Installing LinuxCNC and Debian Bookworm on problematic hardware (e.g. Realtek NIC

19 Feb 2023 06:20 - 19 Feb 2023 06:24 #264795 by nf1z
A follow-up to my earlier post about the N5105-based mini-PC.
After a more thorough latency test, I got some very high numbers (not far short of 1 ms of jitter). I didn't see how the NIC could cause the symptoms, since disabling the interface made no difference. But I had ended up with a messed-up system (somehow I "upgraded" from gcc-12 to gcc-10, which prevented the r8168-dkms package from installing), so I decided to do a clean build, closely following Rod's instructions, to see whether fixing the NIC driver helped latency.

The first surprise was that I could not install the recommended weekly build ISO. GRUB would not get past its prompt, and the GRUB shell didn't even see the root partition. I couldn't fix it, so I tried the debian-bookworm-DI-alpha1-amd64-netinst ISO, and that worked. This release (kernel 6.1.0-3) includes support for the UHD graphics, but the RTL8821ce WiFi driver needed to be built from source again. Using Synaptic, I installed linux-headers-6.1.0-3-rt-amd64 and linux-image-6.1.0-3-rt-amd64 from the bookworm repository, along with r8168-dkms, linuxcnc-uspace, linuxcnc-uspace-dev and mesaflash (2.9.0~pre1+git20230208.f1270d6ed7).

I did a quick test with Axis and a 7i96s on the bench. It seemed to work OK the first time, but on restarting Axis I got the "read finish" error (servo period 1 ms). After exiting Axis, a PC restart was needed to avoid the error. That aside, it seemed to work, and I got the following results:
sudo chrt 99 ping -i.001 -q
--- ping statistics ---
9269 packets transmitted, 9269 received, 0% packet loss, time 9281ms
rtt min/avg/max/mdev = 0.054/0.186/1.632/0.164 ms

I don't know what the root cause of the "read finish" error is, but it sounds to me as though the inability to clear it is a driver bug. I did find that increasing the servo period to 2 ms in pncconf eliminated the "read finish" error, even after restarting Axis.
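As a rough sanity check on why the longer period might help, the worst-case ping round trip above can be compared with the servo period. This is only a sketch: it assumes the ping RTT is a fair stand-in for the round trip of one hm2_eth read, and the helper name `fits_in_period` is made up for illustration.

```python
# Rough check: does the worst-case ping RTT fit inside one servo period?
# Assumption: ping RTT approximates the round trip of one hm2_eth read.

MAX_RTT_MS = 1.632  # max RTT from the RTL8168 ping run above


def fits_in_period(max_rtt_ms: float, servo_period_ms: float) -> bool:
    """Return True if the worst-case round trip is shorter than the period."""
    return max_rtt_ms < servo_period_ms


for period_ms in (1.0, 2.0):
    verdict = "fits" if fits_in_period(MAX_RTT_MS, period_ms) else "does not fit"
    print(f"servo period {period_ms} ms: max RTT {MAX_RTT_MS} ms {verdict}")
```

With these numbers the worst case overruns a 1 ms period but fits within 2 ms, which would be consistent with the error going away at the longer period.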

Out of interest, I installed a TP-Link UE300, a $10 USB Ethernet dongle using the RTL8153. This gave me the following results:
sudo chrt 99 ping -i .001 -q
--- ping statistics ---
11455 packets transmitted, 11455 received, 0% packet loss, time 11454ms
rtt min/avg/max/mdev = 0.271/0.349/0.876/0.031 ms

The min and average RTTs are higher than for the RTL8168, which I suppose is due to the USB, but the max is much lower, perhaps suggesting the RTL8168 driver needs some work? Maybe I don't have it installed correctly, or maybe even have a hardware issue; I'd be interested to see what numbers other people get.

As another comparison, my previous system, running Debian (kernel 4.19.0-23) and LinuxCNC 2.6.1 with a very old AMD 79C970 NIC, gives the following:
sudo chrt 99 ping -i .001 -q
--- ping statistics ---
rtt min/avg/max/mdev = 0.245/0.275/0.497/0.019 ms
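To put the three runs side by side, the summary lines can be parsed and the max-minus-min spread computed for each. This is a minimal Python sketch; the labels, the `parse_rtt` helper and the choice of "spread" as a jitter measure are my own assumptions, and only the numbers are taken from the ping output above.

```python
import re

# Summary lines copied from the three ping runs above; labels are informal.
RUNS = {
    "RTL8168 (built-in)": "rtt min/avg/max/mdev = 0.054/0.186/1.632/0.164 ms",
    "RTL8153 (UE300 USB)": "rtt min/avg/max/mdev = 0.271/0.349/0.876/0.031 ms",
    "AMD 79C970 (old PC)": "rtt min/avg/max/mdev = 0.245/0.275/0.497/0.019 ms",
}

RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(line: str) -> dict:
    """Parse a ping rtt summary line into floats (all values in ms)."""
    match = RTT_RE.search(line)
    if match is None:
        raise ValueError(f"not a ping rtt summary: {line!r}")
    lo, avg, hi, mdev = (float(g) for g in match.groups())
    # "spread" (max - min) is used here as a crude worst-case jitter figure.
    return {"min": lo, "avg": avg, "max": hi, "mdev": mdev, "spread": hi - lo}


for name, line in RUNS.items():
    s = parse_rtt(line)
    print(f"{name:20s} avg {s['avg']:.3f}  max {s['max']:.3f}  "
          f"spread {s['spread']:.3f} ms")
```

By this measure the built-in NIC has the lowest average but by far the widest spread, while the old 79C970 has the narrowest.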

My surmise is that the built-in RTL8168H Ethernet will probably work as well as the USB dongle or the older NIC, provided the "read finish" error can be avoided by increasing the servo period. I plan to substitute the mini PC for the current PC on my mill, so we'll see if that's so.
Last edit: 19 Feb 2023 06:24 by nf1z. Reason: typo, formatting
The following user(s) said Thank You: rodw

19 Feb 2023 13:09 #264805 by chris@cnc
I found a way to check the "hardware-irq-coalesce-rx-usecs" status.
sudo ethtool -c "your net card"
will show it, and
sudo ethtool -C "your net card" rx-usecs 0
will change hardware-irq-coalesce-rx-usecs.
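For anyone who wants to script the check, here is a minimal Python sketch that reads the current rx-usecs value. It assumes `ethtool -c` prints a line of the form `rx-usecs: <value>`; the function names are made up for illustration, and the interface name is whatever your card is called.

```python
import re
import subprocess


def parse_rx_usecs(ethtool_output: str):
    """Extract the rx-usecs value from `ethtool -c` text.

    Assumes a line like 'rx-usecs: 3'. Returns None if the line is
    missing or its value is not a plain number (for example 'n/a').
    """
    match = re.search(r"^rx-usecs:\s*(\S+)", ethtool_output, re.MULTILINE)
    if match is None or not match.group(1).isdigit():
        return None
    return int(match.group(1))


def read_rx_usecs(interface: str):
    """Run `ethtool -c <interface>` and return rx-usecs, or None.

    Needs ethtool installed; raises CalledProcessError if ethtool
    exits with an error.
    """
    result = subprocess.run(
        ["ethtool", "-c", interface],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_rx_usecs(result.stdout)
```

Calling `read_rx_usecs("your net card")` before and after the `-C` command shows whether the change took effect. If ethtool reports an error for the driver, `check=True` turns that into an exception instead of a silent `None`.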

20 Feb 2023 00:17 #264841 by nf1z
I tried this command on the UE300 USB dongle, and it appears to reduce the average ping RTT by about 15%. However, the -c and -C commands are not available with the driver for the built-in RTL8168H. It turns out the r8168-dkms driver is a "New API" driver (version 8.051.02-NAPI). From my quick googling, it seems NAPI is an alternative to IRQ coalescing, which may or may not be better, but it means ethtool can't be used to fine-tune the algorithm. We get what we get, I suppose.

20 Feb 2023 16:02 - 20 Feb 2023 16:20 #264876 by PCW
"error finishing read" means that too many sequential read timeouts have occurred.
You can change the threshold and the attack/decay timing (see man hm2_eth).

It can be cleared in HAL, but this requires resetting the watchdog and
any downstream watchdogs (like sserial devices) before continuing.
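
To illustrate the attack/decay idea, here is a toy Python model of an error counter: it rises by an "attack" step on each read timeout, falls by a "decay" step on each good read, and trips once it passes a limit. This is a sketch of the general scheme only, not the hm2_eth source; the function name and the numbers (increment 2, decrement 1, limit 10) are illustrative assumptions, so check man hm2_eth for the real pins and defaults.

```python
def first_trip(timeouts, increment=2, decrement=1, limit=10):
    """Return the index of the read at which the error level exceeds limit.

    timeouts: iterable of booleans, True meaning that read timed out.
    Returns None if the level never exceeds the limit.
    All parameter values are illustrative, not the driver's defaults.
    """
    level = 0
    for i, timed_out in enumerate(timeouts):
        if timed_out:
            level += increment  # attack: each timeout pushes the level up
        else:
            level = max(0, level - decrement)  # decay: good reads bleed it off
        if level > limit:
            return i
    return None


# Isolated timeouts separated by good reads: the level keeps decaying.
print(first_trip([True, False, False] * 20))
# A run of back-to-back timeouts: the level climbs until it trips.
print(first_trip([True] * 10))
```

In this model an occasional late packet is absorbed, while a burst of consecutive timeouts trips the limit within a few reads.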
Last edit: 20 Feb 2023 16:20 by PCW.
The following user(s) said Thank You: nf1z
