Trouble Running Single Finger Test - TriFinger Robot

Hi Felix,

Absolutely no problem at all. I highly appreciate the responses regardless of how long they take – I would be stuck without them!!

I did figure this out after a while of troubleshooting today. I discovered that the issue was that I had already installed the booster packs and removed the jumpers – simply switching the third switch to “ON” was throwing cryptic memory errors. I’m not sure exactly what is required, but to flash the boards, I returned them to their out-of-the-box configuration: no booster packs, jumpers JP1,2,3,4,5,7 installed, JP6 not installed. Once I did this, I was able to follow the instructions and flash the boards.

Another thing of note is that I did this all using CCS 10.3.1 rather than 9.1.0 – it all worked exactly the same.

Tomorrow I will go to the lab and test to see if everything is working. I hope to have no issues and run the single_finger_test!!

Hi Felix,

Unfortunately I was not able to get the single_finger_test running. I just want to provide a quick overview of what I have done to see if there are any obvious issues that you see. This is a long post so please feel free to take some time to respond!!

The current situation:
I have a fully built single finger:

(Obviously, there’s a bit of wire management that I need to do, but I’ve been moving everything around trying to get it all working)

The “upper leg” module is connected to the bottom microcontroller (the microcontroller with only 1 Booster Pack). The “hip fe” module is connected to Booster Pack A on the top microcontroller, and the “hip aa” module is connected to Booster Pack B on the top microcontroller. On both microcontrollers, only jumpers JP3, JP6, and JP7 are installed. Both microcontrollers are set to the “ON-ON-OFF” position. Both microcontrollers have been successfully flashed with the provided software using the instructions within the Github page. Both motors are connected to a nearby lab power supply supplying around 3.3V at around .3 A:

When turning the power supply on, the green D1 LEDs are illuminated on both boards:

Playing with the amperage supplied can also turn on the blue D10 LEDs as well:

If I set the voltage and/or amperage too high, the red D9 LEDs will illuminate:

I don’t know what any of these mean, but I generally assumed that the green was the best and left it in that configuration.

On all three Booster Packs, the red “nFault” LED, and the green “D3 3.3V” LED are illuminated:

I assumed the “nFault” LED was not supposed to be illuminated, but the official documentation says that “A fault may appear [after turning on your power supply] on the nFAULT LED. This is normal and should be cleared when the status registers are read or EN_GATE is taken HIGH.”, so I didn’t look too far into it.

Both microcontrollers are connected via CAN to a temporary computer that I built for this project:

The components on this computer are very bad and around 10+ years old (core i3-2100, 8gb ddr3, no gpu, etc.) but I assume this does not matter. On the SSD is Ubuntu 20.04. The CAN ports have been initialized by running the “initialize_can_bus.sh” script provided in another Discourse thread, which allows me to see that they are being recognized by the computer.

With your help, on the computer, I’ve been able to fully get the Singularity image working and the demo to run as seen here:
image (4)

You can see the warning “Warning this thread is not going to be real time.” which I assume is a warning notifying me that I have no yet configured my Ubuntu install to be an RT_PREEMPT installation – I attempted doing this but had some issues which led me to stop trying for the time being. Seeing as the demo worked, I assumed this meant that it would be possible to run the single_finger_test, just without real-time results.

Next, I try to run the single_finger_test and see the following:

One thing to note here is that there was a code update on Mar 1 which set the value of “homing_with_index” to “false” which is not compatible with my codebase – it threw some error that I needed to use “homing_method” instead, which I set to “none” and it seemed to work.

After this, I get the message “waiting for board and motors to be ready” as seen in the screenshot. I’ve tried various things that I don’t really know if they matter such as: changing the CAN ports for each microcontroller, swapping the microcontrollers out for different ones (I have 6 fully prepared ones), connecting the different modules to different power sources (not sure if there is a required order for this, but I assume not), and various other software things (restarting my computer, etc.) and I just can’t seem to get it working…

I’m not sure what I’m doing wrong, and what should do next but I would love your input.

Once again, thank you very much for designing this awesome robot and helping me build it!!
Ross

Hi Ross,

some thoughts:

Real Time Linux

The RT_PREEMPT patch should indeed not be needed for a first test, so that should be fine.

Power Supply

I think the power supply should actually be set to 24 V (unless I’m missing some difference in our setups). @fgrimminger can you confirm?

Connection of Motors to the Boards

Regarding the connection of motors to the boards: They need to be connected in a specific order (not sure if this might be missing in the public documentation?)

Joint Board/Connector
lower (= upper leg?) Board 1 / Motor 2
middle (= hip fe?) Board 1 / Motor 1
upper (= hip aa?) Board 2 / Motor 1

I assume that “Motor 1” corresponds to Booster Pack A and “Motor 2” to Pack B.
@fgrimminger Can you double-check that I translated upper/middle/lower to the correct joints?

I don’t think that this is the reason for the current issues, though. The software should still be able to connect to the robot it would just have the joints in a wrong order.

Checking CAN Port Order

I’m using the following trick to find out which CAN port corresponds to which name (can0, can1, …):

  1. Initialise the CAN bus but do not connect anything
  2. In a terminal run watch netstat -i
  3. Power on the robot and connect one CAN plug
  4. Check the output in the terminal. For one of the ports some of the numbers should increase. That’s the port you are currently connected to.
  5. Once identified, you can add labels to the ports. In my experience they don’t change as long as you don’t add/remove any CAN carts in your computer.

But again, if they are connected in the wrong order, it should still be able to connect and only result in a wrong joints order in the software.

2 Likes

Hi Ross,

thanks for the pictures and your description.

The supply voltage is too low - I would increase it to 12V or 24V.
(make sure that the power jumpers are removed if you connect the cards via usb again)

I observe the same behavior here when I reduce the voltage below 5V.

Below is a screenshot with the motor assignment that we use - I will also document this on GitHub.

The naming of the degrees of freedom was changed for the fingers - below is a screenshot with the convention we use.
It’s as @felixwidmaier described in his reply.

Best. Felix

2 Likes

Thank you, and @felixwidmaier, for your detailed answers!! I’m going to head into the lab and work on these modifications ASAP.

Hi Felix,

I did get a chance to try this out in the lab. After moving the wires around to the correct positions for each motor, increasing the voltage on the power supply, and turning it on, I was able to actually run the single_finger_test.

What I saw was this screen appears for around 0.5 seconds:

Followed by this screen:

Everything looks good – it appears to all be plugged in correctly, I can “watch netstat -i” to see that data is coming in through the CAN ports, the boards all appear to be powered on, etc.

Have you ever seen this issue? Or do you have any other ideas of what could potentially be wrong?

Sorry for the delayed response, and thanks again,
Ross

Hi Ross,

that you reached the first screen, showing observation, action, etc. means that the software could successfully connect to and initialise the robot, so that’s good news!

The “Action did not end on time” error means that the robot backend detected a violation of the timing constraints. To overcome this, you most likely need to install the PREEMPT_RT kernel.
We are using this script for this: ubuntu_installation_scripts/install_rt_preempt_kernel.sh at master · machines-in-motion/ubuntu_installation_scripts · GitHub
I think we don’t yet have a good documentation on how to use it, though. So some hints:

  • The VERSION_ variables at the top refer to the kernel version that will be installed. Usually you can leave the default values.

  • DEFAULT_CONFIG needs to be modified to match the kernel version you are currently using.

  • In the beginning (after downloading some things) you will be asked to manually adjust some configuration settings. Before entering the menu, the script prints the following instructions:

    Please apply the following configurations in the next step:
    
    General setup
      Local version - append to kernel release: [Enter] Add '-preempt-rt'
    
    Processor type and features ---> [Enter]
      Preemption Model (Voluntary Kernel Preemption (Desktop)) [Enter]
        Fully Preemptible Kernel (RT) [Enter] #Select
    

    However, I observed that depending on the kernel version the “Preemption Model” setting may be found in the “General setup” menu, not in “Processor type and features”.

  • Once the kernel is installed, you should configure Grub to load it by default and add your user to the “realtime” group.

  • When using the real-time kernel, the following command should be run after each reboot:

    sudo chmod 0666 /dev/cpu_dma_latency
    

    Alternatively you can set up the following udev rule to do this automatically:

    KERNEL=="cpu_dma_latency", NAME="cpu_dma_latency", MODE="0666"
1 Like

Hi Felix,

First of all, sorry for the late reply – things were hectic around the end of the semester here.

I did end up installing the real time Kernel using the script provided, so thank you for that. I have also ran the command you provided here successfully:

sudo chmod 0666 /dev/cpu_dma_latency

I’m able to confirm that the real time kernel is the one which I am using as seen here:

In addition, we can also see that it’s properly being recognized within Singularity as seen here:

sing

Unfortunately, I’m still getting the same errors about timing. I see the error warning that the thread is not real time during both the “demo_fake_finger”:

And the “single_finger_test”:

I was wondering if you had any idea of what else this could potentially be, or any other ideas of how to proceed?

Sorry for the long wait between responses and thanks again for all your help,
Ross

Hi Ross,

No need to apologise!
You probably need to rebuild the workspace. There are some build flags that are set automatically based on the detected kernel, so to get a real-time capable build, the preempt-rt kernel needs to be running at build time (or alternatively you can set the flag explicitly: colcon build --cmake-args -DOS_VERSION=preempt-rt).

So I expect that when you do a clean rebuild (rm -rf build install; colcon build) while the preempt-rt kernel is active, it should solve the issue.

I’ll update the documentation to add this information.

1 Like

I added a section regarding the real-time stuff to the documentation: Real Time Setup — robot_interfaces documentation

2 Likes

Hi Felix,

Thanks again for the input here. I did rebuild the project with these flags and it worked as expected. When I run the program now, I do not see the “warning” any more.

I think at this point I’ve done everything possible – with the updated Singularity image, I’ve successfully built the package, am able to successfully run the “demo_fake_finger”, installed the RT_PREEMPT patch to my kernel, and everything else above (i.e. the motors are connected to the correct positions on the board, the power supply is supplying 24V, the CAN ports are initialized, everything is plugged into the correct CAN ports, I ran all of the commands on startup, etc. etc.) but the robot still doesn’t actually work. I also built a second finger using higher resolution 3D printing in the hopes that potentially the printing quality might make it work, but it did not.

The situation is that I begin the “single_finger_test”, the screen outputs “Waiting for board and motors to be ready”. The LEDs on the TI boards begin flashing as seen here:
trifinger_lights

I’m not sure what this flashing means but it never stops, the status message of “Waiting for board and motors to be ready” never changes, the robot never moves, and that’s basically where I’m stuck at.

I’d love any insight you have as to where to look next! I don’t mind putting in some effort to get this working, I’m just not sure exactly where I should focus that effort…

Thanks again!
Ross

Regarding the blinking LED:

Behaviour Meaning
Off System is disabled.
On System is enabled. Both motors are disabled.
Slow Blinking (1Hz) System and at least one motor are enabled.
Fast Blinking (4Hz) Same as above + at least one motor is currently aligned (happens when the motor is enabled for the first time after power cycling and takes around 5 seconds).

So blinking means that the initial communication is working but then it gets stuck for some reason.
You can monitor the messages on the CAN bus with candump (from the can-utils package):

candump -e can0

If I recall correctly, you are using a “PCAN-PCI Express FD Four Channel”, is this correct?
We recently started to test with one of those ourselves (so far, we always used the ones without “FD”) and are running into similar issues (sometimes the connection is interrupted a moment after the motor is enabled, sometimes it is running but has some notable delays).
The motor boards do not support CAN FD but in theory it should be backward compatible. It might be that the firmware needs to be configured a bit differently but I unfortunately haven’t figured out yet what the problem is.

I had some hope that our card simply has some damage but given that you have the same problem, it seems to be more fundamental :confused:
There is an ongoing discussion about this in the PEAK Forum.

So I’m afraid I can’t help with this at the moment :frowning: However, I’m still investigating. To my understanding it has to work with the FD card, it might just be a matter of finding the correct configuration.

1 Like

Hi Felix,

That is correct, I do have the “PCAN-PCI Express FD Four Channel” CAN cards. I’m hoping to figure something out to make these work as I don’t think I can return them at this point.

I’ll go through the thread on the PEAK forums and see what I think, but let me know if there’s anything specific I can do to help figure this out!

Thanks again,
Ross

Hi Ross,

I think I have good news! With the help of the PEAK support, I got it working. It seems there is some problem with the driver of the CAN FD cards, so we need to change some parameters there (unfortunately this involves rebuilding the kernel). See the PEAK forum.

I didn’t do any extensive tests yet and I think it needs some fine tuning (and a proper write-up from my side), but I wanted to share the good news immediately :slight_smile:

I don’t have time now but will provide more information on how to update the driver tomorrow, in case you want to test it yourself.

2 Likes

Wow, this is great news! I don’t have time to test it out today, but I’ll hopefully be able to make it into the lab tomorrow and see.

Thanks for your tireless work on this project!!
Ross

Here are the steps I followed to update the driver:

To rebuild the kernel with the modified parameter I used the setup from installing the preempt_rt kernel.

I still had the ~/Downloads/rt_preempt_kernel_install directory in which the install script downloads and builds the kernel source. So I just went there and modified the parameter in the file linux-XXX/drivers/net/can/peak_canfd/peak_pciefd_main.c.

Then I modified the install_rt_preempt_kernel.sh to skip the part that downloads the source and applies the preempt_rt patch, so that I could use it to just rebuild and install the kernel with the modified source. Here is a diff of what I commented:

37,39c37,39
< wget -nc https://mirrors.edge.kernel.org/pub/linux/kernel/v$VERSION_MAJOR.x/linux-$VERSION.tar.xz
< wget -nc http://cdn.kernel.org/pub/linux/kernel/projects/rt/$VERSION_MAJOR.$VERSION_SECOND/older/patch-$VERSION_PATCH.patch.xz
< xz -cd linux-$VERSION.tar.xz | tar xvf -
---
> #wget -nc https://mirrors.edge.kernel.org/pub/linux/kernel/v$VERSION_MAJOR.x/linux-$VERSION.tar.xz
> #wget -nc http://cdn.kernel.org/pub/linux/kernel/projects/rt/$VERSION_MAJOR.$VERSION_SECOND/older/patch-$VERSION_PATCH.patch.xz
> #xz -cd linux-$VERSION.tar.xz | tar xvf -
43c43
< xzcat ../patch-$VERSION_PATCH.patch.xz | patch -p1
---
> #xzcat ../patch-$VERSION_PATCH.patch.xz | patch -p1

I haven’t tested this yet but I assume that the part for re-creating the config can be skipped as well.

Then I simple re-ran install_rt_preempt_kernel.sh and rebooted.

I also applied a fix in the blmc_drivers package (see pull request) which might be relevant when using a CAN FD card.

Further, I changed the CAN bit timing on the motor board which I am using for testing (see the discussion in the PEAK forum), but this might actually not be relevant.

Unfortunately, I’m still running into issues that sometimes it get’s stuck in the “waiting for board and motors to be ready” phase. It seems the communication on the CAN bus just stops in this case for some reason. Turning the board off and on again and then restarting the application usually helps but it’s still an issue I need to investigate.

So to clarify: The driver fix I’m talking about above fixes an issue that once the robot was running, messages are sometimes delayed, resulting in instable motor control. It seems that the “gets stuck in the beginning”-issue is a separate problem :(.

Hi Felix,

No problem – I understand that these things are very complex and can be difficult to solve. I was wondering if there has been any movement on this in the last couple weeks?

Also, is there anything that I can help out with – maybe providing some data or other troubleshooting? I would love to step in to help out where I can.

Thanks again,
Ross

Hi Ross,

I don’t really have news yet. The problem (at least for me) seems to be that the CAN communication is unstable and sometimes the CAN controller on the board is shutting down due to some error (probably because sent messages are not acknowledged by the PC for some reason). I have not yet figured out why, though. On the latest tests, I actually wasn’t able to reproduce the issue, i.e. the robot was running fine. We did some modifications on the CAN cable for debugging, not sure if this had some influence.

Right now I’m on vacation, so I didn’t work on it in the last days but I plan to continue when I am back in mid September.

One thing you could test, if not done already:
Connect the board, power it on (without starting the robot software on the PC) and then monitor the CAN interface with candump -e can0. It should at this point receive one message per second (a status message from the board). If you see this, it means that the communication is generally working. Then start the robot software and see what happens in the can dump. In the cases where it is not working for me, the communication simply stops after a few moments (due to error on the board).

Best,
Felix