7i77 lost connection while encoder counts (PREEMPT-RT)

lonnox
Topic Author
New Member

07 Jul 2025 15:14 #331451 by lonnox

7i77 lost connection while encoder counts (PREEMPT-RT) was created by lonnox

Hi folks,
I have a problem with a LinuxCNC Mesa card configuration. After the encoders on the 7i77 have been counting for a while, the 7i77 loses its connection and LinuxCNC stops working. I can't calibrate the connected drives, because they can't be moved more than a few cm before the error comes up.
When the error occurs, the flashing green LED on the 7i77 turns constant red and the pin hm2_5i25.0.write.time stops changing. LinuxCNC itself responds only very slowly, while the rest of the OS works normally. On closing, I get a timeout error from LinuxCNC.
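The frozen hm2_5i25.0.write.time pin described above can be watched for automatically. The following is a minimal Python sketch, assuming `halcmd` is on the PATH and LinuxCNC is running. The function names, the polling interval and the window size are illustrative choices and not part of LinuxCNC.

```python
import subprocess
import time

PIN = "hm2_5i25.0.write.time"  # pin name taken from the post above


def read_pin(name: str) -> str:
    """Return the current value of a HAL pin via `halcmd getp`."""
    result = subprocess.run(
        ["halcmd", "getp", name], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def is_stalled(samples, window=5):
    """True if the last `window` samples are all identical."""
    if len(samples) < window:
        return False
    return len(set(samples[-window:])) == 1


def watch(name=PIN, interval=0.2, window=5):
    """Poll the pin and report once it has stopped changing."""
    samples = []
    while True:
        samples.append(read_pin(name))
        if is_stalled(samples, window):
            print(f"{name} stuck at {samples[-1]} for {window} reads")
            return
        time.sleep(interval)
```

Calling `watch()` from a terminal while the encoder is counting should return shortly after the card drops out. A healthy system can occasionally report the same value twice in a row, which is why several identical reads are required before the pin counts as stalled.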

I think the problem is related to the preempt kernel.

I have retrofitted about 20 different machines to LinuxCNC with Mesa cards, custom panels and custom HAL components, and 2 of those configurations show this error. The test system I built to analyse the error consists only of the live image (2.9.4), a 6i25, a 7i77 and an encoder.
What I have tried so far without a positive result:

tried 2 different PCs with different mainboards (Biostar B450MH, ASUS Prime A320M-R), RAM, SSD and graphics card
tried different 7i77 and 6i25 cards
tried different encoders (5V, 3V)
tried Debian Bookworm and Debian Buster (with PREEMPT-RT)
tried different jumper settings
tried different firmware on the 6i25
tried different encoder inputs
changed the configuration string
verified that latency is not the problem (servo thread jitter below 50000)
verified that it is not triggered by the Mesa watchdog (set it to 10 seconds)
checked the journalctl logs (no hints that relate to the problem)
checked the LinuxCNC debug output by running it in a terminal at the maximum debug level

What I tried with a positive result:

the problem does not occur on Debian Wheezy
the problem does not occur on Debian Bookworm with the RTAI kernel
the problem does not occur on every mainboard

Maybe this is a kind of solution, but I guess the better way would be to find out what in the PREEMPT-RT kernel causes the problem. So if anyone has an idea how the problem can be solved, please feel free to answer the post.


PCW
Moderator

07 Jul 2025 15:48 - 07 Jul 2025 15:50 #331454 by PCW

Replied by PCW on topic 7i77 lost connection while encoder counts (PREEMPT-RT)

That's odd, might be worth trying a newer Preempt-RT kernel
(current is 6.16)

Does dmesg show any errors?

(clear the dmesg log first, run LinuxCNC and then run dmesg again)
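The check suggested above can be narrowed down to the lines that matter. This is a rough Python sketch that filters kernel log text for machine-check and PCIe error reports. The keyword list is an assumption about what is worth flagging, not an exhaustive set, and `hardware_lines` and `read_dmesg` are hypothetical helper names.

```python
import re
import subprocess

# Assumed keywords: machine-check (mce), PCIe Advanced Error Reporting (AER)
# and generic hardware or memory failure reports.
PATTERN = re.compile(
    r"\bmce\b|hardware error|\baer\b|memory failure", re.IGNORECASE
)


def hardware_lines(dmesg_text):
    """Return only the log lines that look like hardware error reports."""
    return [line for line in dmesg_text.splitlines() if PATTERN.search(line)]


def read_dmesg():
    """Return the current kernel log (may require root on some systems)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

A typical session would be: clear the log with `sudo dmesg -C`, start LinuxCNC, reproduce the fault, then print the result of `hardware_lines(read_dmesg())`.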

Is there any PCIE power management enabled in the BIOS?

Last edit: 07 Jul 2025 15:50 by PCW.


tommylight
Moderator

07 Jul 2025 17:25 #331467 by tommylight

Replied by tommylight on topic 7i77 lost connection while encoder counts (PREEMPT-RT)

lonnox wrote:
Hi folks,
I have a problem with a LinuxCNC Mesa card configuration. After the encoders on the 7i77 have been counting for a while, the 7i77 loses its connection and LinuxCNC stops working.

Most probably a weak 5V power supply and/or an encoder drawing more than 50-60mA.
This is easily checked if you have a DVM with min/max memory, by measuring the 5V at the 7i77 connector and at the encoder connector.


PCW
Moderator

07 Jul 2025 17:50 #331469 by PCW

Replied by PCW on topic 7i77 lost connection while encoder counts (PREEMPT-RT)

lonnox wrote:
pin hm2_5i25.0.write.time did not change anymore.

This suggests a PCIE communication issue that is probably
not related to the 7I77 5V.

It may indicate a grounding/EMI issue. Do you have the issue when
the 7I77 is not connected to the rest of the machine?

This is why I asked for a dmesg log, as PCIE issues should show up there.

The following user(s) said Thank You: tommylight


lonnox
Topic Author
New Member

07 Jul 2025 19:45 #331479 by lonnox

Replied by lonnox on topic 7i77 lost connection while encoder counts (PREEMPT-RT)

Thanks for your guesses. The encoder that is connected now is a microcontroller that simulates an encoder to trigger the error as fast as possible. The controller uses the single-ended mode of the Mesa card and outputs 3V. I looked up the datasheet of the encoder IC on the 7i77, and 3V is within the IC's range. On a second encoder input I have connected a rotary encoder that uses the 5V of the 7i77.
In the beginning the encoder inputs were connected to a 5V glass scale.
We also tried the encoder outputs of an ESTUN servo drive.
All gave the same (bad) result.


PCW
Moderator

07 Jul 2025 19:54 #331480 by PCW

Replied by PCW on topic 7i77 lost connection while encoder counts (PREEMPT-RT)

It's unlikely that this is specific to the encoder, but it may be related to changing PCIE data.

Did you check the dmesg kernel log?


lonnox
Topic Author
New Member

07 Jul 2025 20:03 #331482 by lonnox

Replied by lonnox on topic 7i77 lost connection while encoder counts (PREEMPT-RT)

PCW wrote:
pin hm2_5i25.0.write.time did not change anymore.

Suggests this is a PCIE communication issue and probably
not related to 7I77 5V

It may indicate a grounding/EMI issue. Do you have an issue when
the 7I77 is not connected to the rest of the machine?

This is why I asked for a dmesg log as PCIE issues should show up there.

EMI was also one of my suspicions back when the 6i25, the 7i77 and the PC were part of the machine, so I decided to use shielded wires between the servos and their drives. The wires from the drives to the Mesa card are shielded too.
But the problem stayed the same.
Now I have built a system with only the cards, the PC and a rotary encoder, and the problem is still the same.

PCW wrote:
Does dmesg show any errors?

I checked dmesg, and maybe we have a lead there. The problem occurs immediately after starting LinuxCNC. This is the output:

[ 1323.534393] mce: Uncorrected hardware memory error in user-access at fcb01000
[ 1323.534404] Memory failure: 0xfcb01: memory outside kernel control
[ 1323.534405] mce: Memory error not recovered
[ 1323.534528] mce: [Hardware Error]: Machine check events logged
[ 1323.534532] [Hardware Error]: Uncorrected, software restartable error.
[ 1323.534535] [Hardware Error]: CPU:3 (17:8:2) MC0_STATUS[-|UE|MiscV|AddrV|-|-|-|UECC|-|Poison|-]: 0xbc002800000c0135
[ 1323.534541] [Hardware Error]: Error Addr: 0x00000000fcb01000
[ 1323.534542] [Hardware Error]: IPID: 0x000000b000000000
[ 1323.534544] [Hardware Error]: Load Store Unit Ext. Error Code: 12, DC Data error type 1 and poison consumption.
[ 1323.534546] [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DRD
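The MC0_STATUS value in the log above can be unpacked by hand to cross-check the flags the kernel printed. Below is a minimal Python sketch. The bit positions follow the machine-check status layout as I understand it from the Linux kernel's MCA definitions, so treat the table as an assumption to verify against the AMD documentation for this CPU family. `decode_status` is a hypothetical helper name.

```python
# Assumed bit positions in the 64-bit machine-check status register.
STATUS_FLAGS = {
    63: "Val",       # register contents are valid
    62: "Overflow",  # an earlier error was overwritten
    61: "UC",        # uncorrected error (printed as "UE" in dmesg)
    60: "En",        # error reporting enabled
    59: "MiscV",     # MISC register is valid
    58: "AddrV",     # error address is valid
    57: "PCC",       # processor context corrupt
    46: "CECC",      # corrected ECC error
    45: "UECC",      # uncorrected ECC error
    44: "Deferred",  # deferred error
    43: "Poison",    # poisoned data was consumed
}


def decode_status(status):
    """Split a status value into set flags and error code fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "ext_error_code": (status >> 16) & 0x3F,  # bits 21:16
        "error_code": status & 0xFFFF,            # bits 15:0
    }


print(decode_status(0xBC002800000C0135))
```

For the value from the log this gives the flags Val, UC, En, MiscV, AddrV, UECC and Poison with extended error code 12, which agrees with the decoded lines the kernel printed. PCC is not set, which fits the "software restartable" wording in the log.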

PCW wrote:
Is there any PCIE power management enabled in the BIOS?

Indeed there was, but I disabled it before running dmesg.


PCW
Moderator

07 Jul 2025 20:53 #331490 by PCW

Replied by PCW on topic 7i77 lost connection while encoder counts (PREEMPT-RT)

Looking around, this may be a Ryzen/kernel version issue.


lonnox
Topic Author
New Member

08 Jul 2025 12:52 #331517 by lonnox

Replied by lonnox on topic 7i77 lost connection while encoder counts (PREEMPT-RT)

Thanks for your fast reply. I read some Linux forum posts too and played with some CPU BIOS settings again, but with no positive result. Trying a newer PREEMPT-RT kernel seems to be some additional work.
So I think switching to RTAI is the best choice at the moment.

Thanks again for your suggestions.


tommylight
Moderator

08 Jul 2025 13:17 #331518 by tommylight

Replied by tommylight on topic 7i77 lost connection while encoder counts (PREEMPT-RT)

In the BIOS, set the PCI-E slot that the 6i25 is plugged into to GEN1, save and reboot.
Other hardware can also cause such issues on the PCI-E lanes, as the main ones (usually 20 lanes) are on the CPU, so setting the graphics port to GEN2 or GEN3 might also help, especially if you have a GEN4 board with a GEN3 graphics card.
And since most of the errors mention memory, try disabling XMP/EXPO.
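Whether a BIOS link speed setting actually took effect can be checked from Linux after the reboot. The sketch below reads the `current_link_speed` attribute that the kernel exposes in sysfs for PCIe devices and maps it to a generation. The PCI address is something you have to look up yourself (for example with `lspci`), and the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Per-lane transfer rate in GT/s mapped to the PCIe generation.
GENERATION = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def link_generation(speed_text: str) -> Optional[int]:
    """Map a sysfs string such as '2.5 GT/s PCIe' to a PCIe generation."""
    try:
        rate = float(speed_text.split()[0])
    except (IndexError, ValueError):
        return None  # empty or unrecognised text
    return GENERATION.get(rate)


def current_generation(pci_address: str) -> Optional[int]:
    """Read the negotiated link speed of one device, e.g. '0000:01:00.0'."""
    path = Path("/sys/bus/pci/devices") / pci_address / "current_link_speed"
    return link_generation(path.read_text())
```

If the slot was forced to GEN1, `current_generation` should return 1 for the 6i25 or for the port it hangs off. Which of the two addresses carries the attribute on a given board is an assumption to check.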
