Regarding the blinking LED:
Behaviour |
Meaning |
Off |
System is disabled. |
On |
System is enabled. Both motors are disabled. |
Slow Blinking (1Hz) |
System and at least one motor are enabled. |
Fast Blinking (4Hz) |
Same as above + at least one motor is currently aligned (happens when the motor is enabled for the first time after power cycling and takes around 5 seconds). |
So blinking means that the initial communication is working but then it gets stuck for some reason.
You can monitor the messages on the CAN bus with candump
(from the can-utils
package):
candump -e can0
If I recall correctly, you are using a āPCAN-PCI Express FD Four Channelā, is this correct?
We recently started to test with one of those ourselves (so far, we always used the ones without āFDā) and are running into similar issues (sometimes the connection is interrupted a moment after the motor is enabled, sometimes it is running but has some notable delays).
The motor boards do not support CAN FD but in theory it should be backward compatible. It might be that the firmware needs to be configured a bit differently but I unfortunately havenāt figured out yet what the problem is.
I had some hope that our card simply has some damage but given that you have the same problem, it seems to be more fundamental 
There is an ongoing discussion about this in the PEAK Forum.
So Iām afraid I canāt help with this at the moment
However, Iām still investigating. To my understanding it has to work with the FD card, it might just be a matter of finding the correct configuration.
1 Like
Hi Felix,
That is correct, I do have the āPCAN-PCI Express FD Four Channelā CAN cards. Iām hoping to figure something out to make these work as I donāt think I can return them at this point.
Iāll go through the thread on the PEAK forums and see what I think, but let me know if thereās anything specific I can do to help figure this out!
Thanks again,
Ross
Hi Ross,
I think I have good news! With the help of the PEAK support, I got it working. It seems there is some problem with the driver of the CAN FD cards, so we need to change some parameters there (unfortunately this involves rebuilding the kernel). See the PEAK forum.
I didnāt do any extensive tests yet and I think it needs some fine tuning (and a proper write-up from my side), but I wanted to share the good news immediately 
I donāt have time now but will provide more information on how to update the driver tomorrow, in case you want to test it yourself.
2 Likes
Wow, this is great news! I donāt have time to test it out today, but Iāll hopefully be able to make it into the lab tomorrow and see.
Thanks for your tireless work on this project!!
Ross
Here are the steps I followed to update the driver:
To rebuild the kernel with the modified parameter I used the setup from installing the preempt_rt kernel.
I still had the ~/Downloads/rt_preempt_kernel_install
directory in which the install script downloads and builds the kernel source. So I just went there and modified the parameter in the file linux-XXX/drivers/net/can/peak_canfd/peak_pciefd_main.c
.
Then I modified the install_rt_preempt_kernel.sh
to skip the part that downloads the source and applies the preempt_rt patch, so that I could use it to just rebuild and install the kernel with the modified source. Here is a diff of what I commented:
37,39c37,39
< wget -nc https://mirrors.edge.kernel.org/pub/linux/kernel/v$VERSION_MAJOR.x/linux-$VERSION.tar.xz
< wget -nc http://cdn.kernel.org/pub/linux/kernel/projects/rt/$VERSION_MAJOR.$VERSION_SECOND/older/patch-$VERSION_PATCH.patch.xz
< xz -cd linux-$VERSION.tar.xz | tar xvf -
---
> #wget -nc https://mirrors.edge.kernel.org/pub/linux/kernel/v$VERSION_MAJOR.x/linux-$VERSION.tar.xz
> #wget -nc http://cdn.kernel.org/pub/linux/kernel/projects/rt/$VERSION_MAJOR.$VERSION_SECOND/older/patch-$VERSION_PATCH.patch.xz
> #xz -cd linux-$VERSION.tar.xz | tar xvf -
43c43
< xzcat ../patch-$VERSION_PATCH.patch.xz | patch -p1
---
> #xzcat ../patch-$VERSION_PATCH.patch.xz | patch -p1
I havenāt tested this yet but I assume that the part for re-creating the config can be skipped as well.
Then I simple re-ran install_rt_preempt_kernel.sh
and rebooted.
I also applied a fix in the blmc_drivers package (see pull request) which might be relevant when using a CAN FD card.
Further, I changed the CAN bit timing on the motor board which I am using for testing (see the discussion in the PEAK forum), but this might actually not be relevant.
Unfortunately, Iām still running into issues that sometimes it getās stuck in the āwaiting for board and motors to be readyā phase. It seems the communication on the CAN bus just stops in this case for some reason. Turning the board off and on again and then restarting the application usually helps but itās still an issue I need to investigate.
So to clarify: The driver fix Iām talking about above fixes an issue that once the robot was running, messages are sometimes delayed, resulting in instable motor control. It seems that the āgets stuck in the beginningā-issue is a separate problem :(.
Hi Felix,
No problem ā I understand that these things are very complex and can be difficult to solve. I was wondering if there has been any movement on this in the last couple weeks?
Also, is there anything that I can help out with ā maybe providing some data or other troubleshooting? I would love to step in to help out where I can.
Thanks again,
Ross
Hi Ross,
I donāt really have news yet. The problem (at least for me) seems to be that the CAN communication is unstable and sometimes the CAN controller on the board is shutting down due to some error (probably because sent messages are not acknowledged by the PC for some reason). I have not yet figured out why, though. On the latest tests, I actually wasnāt able to reproduce the issue, i.e. the robot was running fine. We did some modifications on the CAN cable for debugging, not sure if this had some influence.
Right now Iām on vacation, so I didnāt work on it in the last days but I plan to continue when I am back in mid September.
One thing you could test, if not done already:
Connect the board, power it on (without starting the robot software on the PC) and then monitor the CAN interface with candump -e can0
. It should at this point receive one message per second (a status message from the board). If you see this, it means that the communication is generally working. Then start the robot software and see what happens in the can dump. In the cases where it is not working for me, the communication simply stops after a few moments (due to error on the board).
Best,
Felix
Hi Felix,
Sorry itās been a bit since I have responded ā things were hectic with the school year starting back up.
So here is what I did:
I first plugged in one single finger into CAN ports ā4ā and ā5ā. I ran the above command, and then I turned the power onto the system. After which, this was output:
I believe this is similar to what you see, with an error statement appearing once per second.
One odd thing is that I see NO messages coming through can5, which is the other port that I am plugged into.
Second, I started the āsingle_finger_testā while monitoring can4, which output this:
There were no further messages after this, just like you said. It just output this error and then stopped outputting anything.
Hope this helps and let me know if you want me to run any other tests!!
Ross
Hi Ross,
Thanks for running the test. Your issue with can4 looks indeed like the issue I am observing. Unfortunately, I still donāt have a solution for this.
Itās weird that you donāt receive anything on can5. Maybe there is something wrong with the cable or the corresponding board is not flashed correctly?
Best,
Felix
Hi Ross,
whenever you have time, could you repeat the test with the following modification in the initialize_can_bus.sh
?
Append sample-point 0.667 fd off fd-non-iso off
to the ip
commands configuring the bitrate, e.g. for can0:
sudo ip link set can0 type can bitrate 1000000 sample-point 0.667 fd off fd-non-iso off
The sample point that is set there is actually not matching with the one used by the motor boards (which is 0.867) but when I tested today, it was mostly working for me with this setting (interestingly, it did not work when I set 0.867 explicitly).
Iām not yet sure if this is really improving something or if it was just coincidence (this all seems to be a bit non-deterministicā¦) so Iāll do some more tests in the next days but in any case it would be interesting to see if this improves anything for you.
Instead of running single_finger_test
, you can also use this script, which just enables a single board without sending any further commands. When running it, simply pass the can interface (e.g. ācan0ā) as argument.
One important note: While it was always working for me today, running the script linked above, I observed a somewhat weird behaviour when running an application on the whole finger: It always failed on the first attempt after initialising the CAN interfaces. However, when killing the application, power-cycling the finger and trying again (without re-initialising the CAN interfaces) it was working. Could you try that as well?
For all the tests, I had candump
running so I could immediately see when the communication stopped.
1 Like
Hi Felix,
I got into the lab today to run this test. First, I changed the initialize_can_bus.sh script as you stated. I changed each line, even though right now I am only using can4 and can5.
While running the candump -e can4
command, I plugged the boards into the power supply and turned it on, which began producing many messages of can4 010 [1] 01
.
Next, I ran the linked script (start_motors_via_can.sh
) which produced the following stream of errors, after which the communication stopped:
Next, as you stated, I turned the power supply off, waited a bit, turned it back on, and re-ran the start_motors_via_can.sh
script, at which point a never-ending stream of messages began appearing on screen:
Similar to this, but unlike before, the communication never stopped ā it just continued to display errors extremely quickly (100s of lines per second) until killed the process.
Sorry itās been a couple days here ā hope this helps and please let me know if there are any other tests youād like me to run!!
Thanks,
Ross
Thanks, this is good news! Those messages (IDs 020-050) are no errors but the sensor measurements (position, velocity and current) of the motors (i.e. it means that the board is sending data as requested). So it seems you are getting the same behaviour as I.
Can you try again with this configuration?
ip link set can4 type can \
tq 12 prop-seg 26 phase-seg1 26 phase-seg2 27 sjw 2 \
fd off fd-non-iso off
This worked best for me in the last tests.
Do you still have the issue with can5? In case this is fixed, you could try running the single_finger_test
again.
1 Like
Oh, thatās great news, yes! I did get a chance to get into the lab and test this new configuration out.
When running it, both can4 and can5 would receive consistent communication with the 020-050 messages.
I decided to run the single_finger_test and it ran the program, but didnāt really work as expected:
For example, the Single Finger Test Application screen (seen above) would come on, display that there were no errors, and just sort of jostle between -0.01 to 0.01 torque at each joint without changing the position. The actual robot itself didnāt physically move even though you could see it sort of ājoltingā to life when you begin the test.
The āqā button (for āquitā) worked successfully, and after quitting, I could restart the Single Finger Test again and get similar results.
Does your single_finger_test work with this configuration or is there something else wrong with mine?
Thanks!
Ross
Glad to hear that!
The single_finger_test
just runs a position controller to hold the finger at its initial position, i.e. as long as you donāt touch it, there is no motion to be expected. However, when you manually move the finger away from the initial position, the controller should increase the torque, trying to bring it back. Likewise, you should see the position and velocity of the observation changing in this case. Does this happen?
Hi Felix,
Sorry itās been a while since Iāve been working on this project, but Iām back on it again.
When running the single_finger_test, demo_single_finger_position_control, or demo_sparse_position_control, I get the following error.
Start homing.
Reached end stop.
BlmcJointModule::update_homing(): ERROR: Failed to find index with joint [0].
terminate called after throwing an instance of āstd::runtime_errorā
what(): Robot needs to be initialized before applying actions. Run the initialize()
method.
So, the initialize method is being called earlier in these scripts, but for whatever reason it isnāt able to finish initialization, and is throwing an error. Iāve looked at this part in the cpp code, but I havenāt been able to figure out why itās happening. This is a consistent and repeatable error which shows up even after power cycling and restarting the computer.
What Iāve done so far is first confirm that all of the hardware is built correctly ā I took the entire finger apart, double checked all of the connections with a multimeter to make sure they were working and everything was plugged in correctly, etc. I checked every port, every wire, etc. ā everything looked good.
Next, I used the ācandumpā command along with the āstart_motors_via_can.shā script to observe the movement of the motors. I can see the data coming through the CAN ports and changing when I physically move each motor ā this also looks good to me.
It looks like weāve already solved everything with the CAN communication, so I tend to think it might be a hardware issue, but the hardware looks good to me, so Iām a bit confusedā¦Have you ever seen this error or have any idea what this might be?
Thanks again for your help!
Ross
Hi Ross,
Glad to see youāre still on it! 
The homing relies on the encoder index (a special encoder tick that appears only once per revolution). Maybe there is some problem with it on the upper joint (ājoint [0]ā).
The position of the index tick is transmitted via CAN using id 0x060 every time it is observed (see documentation).
So you should be able to check if it is detected using candump
(ideally filtering, so it only shows the 0x060 messages). When you manually move the joint, it should send the message once per motor revolution.
If this is not the case, then it is probably either an issue with the encoder itself (e.g. damaged or dirty encoder disc) or with the cables connecting the encoder with the board. Maybe you can attach an oscilloscope to the āIā line of the encoder cables (see here) to check if the index shows up there.
Wow, thank you so much for this! I didnāt see this documentation get launched, but itās super helpful. Iāll take a look at what else Iāve missed in the last couple of months in Github.
I was able to quickly see that the 0x060 messages are coming through for both motors on can1 (joints 1 and 2), but nothing comes through with id 0x060 through can0 (joint 0). I will take this part of the finger apart and inspect it, test the āIā line, potentially put a different encoder in there.
Iāll let you know what I find.
Thank you for this help!!
Ross