7i77 loses connection while encoders count (PREEMPT-RT)

07 Jul 2025 15:14 #331451 by lonnox
Hi folks,
I have a problem with a LinuxCNC Mesa card configuration. Once the encoders on the 7i77 have been counting for a while, the 7i77 loses its connection and LinuxCNC stops working. It is not possible to calibrate the connected drives, because the error comes up after driving only a few cm.
When the error occurs, the flashing green LED on the 7i77 turns constant red and the pin hm2_5i25.0.write.time no longer changes. LinuxCNC itself responds only very slowly, while the rest of the OS works normally. On closing, I get a timeout error from LinuxCNC.
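
The stall symptom described here (hm2_5i25.0.write.time no longer changing) could be watched for from a second terminal while LinuxCNC runs. This is only a rough sketch: it assumes `halcmd getp` returns the value for that name on this setup, and the helper names, the polling interval and the repeat count are made up for illustration.

```python
import subprocess
import time

NAME = "hm2_5i25.0.write.time"  # name taken from the post


def read_hal_value(name):
    """Return the current value of a HAL pin or parameter as text.

    Assumes a running LinuxCNC session and that `halcmd getp` accepts the name.
    """
    result = subprocess.run(
        ["halcmd", "getp", name], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def looks_stalled(samples, min_repeats=5):
    """True if the last `min_repeats` samples are all identical."""
    if len(samples) < min_repeats:
        return False
    return len(set(samples[-min_repeats:])) == 1


def watch(name=NAME, interval_s=0.5, min_repeats=5):
    """Poll the value until it stops changing, then print a timestamp."""
    samples = []
    while True:
        samples.append(read_hal_value(name))
        samples = samples[-min_repeats:]  # only the recent history matters
        if looks_stalled(samples, min_repeats):
            print(f"{time.strftime('%H:%M:%S')} {name} stuck at {samples[-1]}")
            return
        time.sleep(interval_s)
```

Calling `watch()` right after starting LinuxCNC would give a wall-clock time for the stall that can be compared with the kernel log timestamps. A value that happens to repeat for a few samples on a healthy system would trigger a false alarm, so the repeat count may need tuning.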

I think the problem is related to the preempt kernel.

I have retrofitted about 20 different machines to LinuxCNC with Mesa cards, custom panels and custom HAL components, and two of those configurations show this error. The test system I built to analyse the error consists only of the live image (2.9.4), a 6i25, a 7i77 and an encoder.
What I have tried so far without a positive result:
  • tried 2 different PCs with different mainboards (Biostar B450MH, ASUS Prime A320M-R), RAM, SSD and graphics card
  • tried different 7i77 and 6i25 cards
  • tried different encoders (5V, 3V)
  • tried Debian Bookworm and Debian Buster (with PREEMPT-RT)
  • tried different jumper settings
  • tried different firmware on the 6i25
  • tried different encoder inputs
  • changed the configuration string
  • verified that latency is not the problem (less than 50000 servo thread jitter)
  • verified that it is not triggered by the Mesa watchdog (set it to 10 seconds)
  • viewed the journalctl logs (no hints related to the problem)
  • viewed the LinuxCNC debug output by running it in a terminal at the maximum debug level
What I tried with a positive result:
  • the problem did not occur on Debian Wheezy
  • the problem did not occur on Debian Bookworm with the RTAI kernel
  • the problem did not occur on every mainboard

Maybe this is a kind of solution, but I guess the better way would be to find out what in the PREEMPT-RT kernel causes the problem. So if anyone has an idea how the problem can be solved, please feel free to answer the post.


 

07 Jul 2025 15:48 - 07 Jul 2025 15:50 #331454 by PCW
That's odd, might be worth trying a newer Preempt-RT kernel
(current is 6.16)

Does dmesg show any errors?

(clear the dmesg log first, run LinuxCNC and then run dmesg again)
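
The log can be cleared with `sudo dmesg -C` before starting LinuxCNC. For the second step, a small filter can pull out the lines most likely to matter. This is a sketch only: the keyword list is a guess at what PCIE or machine-check problems tend to log, and the function names are hypothetical.

```python
import re
import subprocess

# Keywords are an assumption, not an exhaustive list of relevant messages.
PATTERN = re.compile(
    r"\b(?:mce|aer)\b|pcie|hardware error|memory failure", re.IGNORECASE
)


def suspicious_lines(dmesg_text):
    """Return the kernel log lines that match any of the keywords above."""
    return [line for line in dmesg_text.splitlines() if PATTERN.search(line)]


def read_dmesg():
    """Capture the kernel log; this may need root, depending on the system."""
    return subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    ).stdout
```

Running `print("\n".join(suspicious_lines(read_dmesg())))` after the fault has occurred would print only the matching lines. An empty result does not prove the log is clean, so it is still worth reading the full output once.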

Is there any PCIE power management enabled in the BIOS?
Last edit: 07 Jul 2025 15:50 by PCW.

07 Jul 2025 17:25 #331467 by tommylight

> Hi folks,
> I have a problem with a LinuxCNC Mesa card configuration. Once the encoders on the 7i77 have been counting for a while, the 7i77 loses its connection and LinuxCNC stops working.

Most probably a weak 5V power supply and/or an encoder drawing more than 50-60mA.
This is easily checked if you have a DVM with min/max memory, by measuring the 5V at the 7i77 connector and at the encoder connector.

07 Jul 2025 17:50 #331469 by PCW
> pin hm2_5i25.0.write.time did not change anymore.

This suggests a PCIE communication issue that is probably
not related to the 7I77 5V.

It may indicate a grounding/EMI issue. Do you have an issue when
the 7I77 is not connected to the rest of the machine?

This is why I asked for a dmesg log, as PCIE issues should show up there.

07 Jul 2025 19:45 #331479 by lonnox
Thanks for your guesses. The encoder that is connected now is a microcontroller that simulates an encoder in order to trigger the error as fast as possible. The controller uses the single-ended mode of the Mesa card and outputs 3V. I looked up the datasheet of the encoder IC on the 7i77, and 3V is within the IC's range. On a second encoder input I have connected a rotary encoder that uses the 5V of the 7i77.
In the beginning, the encoder inputs were connected to a 5V glass scale.
We also tried the encoder outputs of an ESTUN servo drive.
All gave the same (bad) result.

07 Jul 2025 19:54 #331480 by PCW
It's unlikely this is specific to the encoder, but it may be related to changing PCIE data.

Did you check the dmesg kernel log?
 

07 Jul 2025 20:03 #331482 by lonnox

> pin hm2_5i25.0.write.time did not change anymore.
>
> Suggests this is a PCIE communication issue and probably
> not related to 7I77 5V
>
> It may indicate a grounding/EMI issue. Do you have an issue when
> the 7I77 is not connected to the rest of the machine?
>
> This is why i asked for a dmesg log as PCIE issues should show up there.

EMI was also one of my suspicions back when the 6i25, the 7i77 and the PC were part of the machine, so I decided to use shielded wires between the servos and their drives. The wires from the drives to the Mesa card are shielded too.
But the problem stayed the same.
Now I have built a system with only the cards, the PC and a rotary encoder, but the problem is still the same.

> Does dmesg show any errors?

I checked dmesg, and maybe we have a lead there. The problem occurs immediately after starting LinuxCNC. This is the output:

[ 1323.534393] mce: Uncorrected hardware memory error in user-access at fcb01000
[ 1323.534404] Memory failure: 0xfcb01: memory outside kernel control
[ 1323.534405] mce: Memory error not recovered
[ 1323.534528] mce: [Hardware Error]: Machine check events logged
[ 1323.534532] [Hardware Error]: Uncorrected, software restartable error.
[ 1323.534535] [Hardware Error]: CPU:3 (17:8:2) MC0_STATUS[-|UE|MiscV|AddrV|-|-|-|UECC|-|Poison|-]: 0xbc002800000c0135
[ 1323.534541] [Hardware Error]: Error Addr: 0x00000000fcb01000
[ 1323.534542] [Hardware Error]: IPID: 0x000000b000000000
[ 1323.534544] [Hardware Error]: Load Store Unit Ext. Error Code: 12, DC Data error type 1 and poison consumption.
[ 1323.534546] [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DRD
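
For reference, the MC0_STATUS value in this log can be unpacked by hand. The sketch below assumes the machine-check status bit layout that the Linux kernel uses when it prints these lines on AMD CPUs; the bit positions and name tables are written from memory and should be checked against the processor documentation, and `decode_mca_status` is a hypothetical helper, not an existing tool.

```python
# Assumed bit positions of the status flags (verify before relying on them).
FLAGS = {
    62: "Over", 61: "UC", 59: "MiscV", 58: "AddrV", 57: "PCC",
    55: "TCC", 53: "SyndV", 46: "CECC", 45: "UECC", 44: "Deferred", 43: "Poison",
}
# Name tables for the low 16-bit error code, again an assumption.
CACHE_LEVEL = ["RESV", "L1", "L2", "L3/GEN"]
TX_TYPE = ["INSN", "DATA", "GEN", "RESV"]
MEM_TX = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]


def decode_mca_status(status):
    """Split a 64-bit MCA status value into flags and error-code fields."""
    result = {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "ext_error_code": (status >> 16) & 0x3F,
        "error_code": status & 0xFFFF,
    }
    code = result["error_code"]
    # Memory-hierarchy errors use the pattern 0000 0001 RRRR TTLL.
    if (code & 0xFF00) == 0x0100:
        rrrr = (code >> 4) & 0xF
        result["mem_tx"] = MEM_TX[rrrr] if rrrr < len(MEM_TX) else "?"
        result["tx"] = TX_TYPE[(code >> 2) & 0x3]
        result["cache_level"] = CACHE_LEVEL[code & 0x3]
    return result


print(decode_mca_status(0xBC002800000C0135))
```

Under these assumptions the value decodes to the flags UC, MiscV, AddrV, UECC and Poison, extended error code 12, and an L1 / DATA / DRD memory error, which agrees with the decoded lines in the log above.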

> Is there any PCIE power management enabled in the BIOS?

Indeed there was, but I disabled it before running dmesg.
 

07 Jul 2025 20:53 #331490 by PCW
Looking around, this may be a Ryzen/kernel version issue.
