7i96 issue read/write

05 Mar 2023 12:52 - 05 Mar 2023 13:35 #265910 by evengravy
Hi,

I run two CNC machines from a single PC, a CNC lathe and a CNC router, both using a 7i96. Each machine has its own 7i96, PSUs, etc. I never need to run them at the same time, so I use one PC for both (switching the network cable and LCNC instance as necessary).

Recently, I have been getting sporadic errors on both machines:

hm2/hm2_7i96.0: error queuing read! iter="number"
or
hm2/hm2_7i96.0: error queuing write! iter="number"

This happens on both the CNC router and the lathe (with different network cables), so I expect it is something to do with the host PC or its networking.

The strange thing is, this worked fine until recently. Nothing has changed on the machine (it is not connected to a network or the internet), so there have been no updates or software changes.

Are there some initial things I can do to test what might be happening? BIOS settings are all good from what I can see. I have used this exact machine for LCNC for many years and it has been solid, although the 7i96 is fairly recent for me, with only on-and-off use over the past 12 months or so.

Latency figures are very low, but I can run new tests to see if anything shows up there. Hyper-threading is not enabled. The motherboard is a Gigabyte. From my limited searching, I have read that certain kernels with Debian can cause this issue with certain network hardware. Is there something there I can look at?

I am running 2.9.0-pre0-3879-g9a26e69fe (I need dual synchronised axes in the Y for my router).

I just find it quite strange that this has only just started to occur. That it happens on both machines would suggest it is the networking. This sounds very unlikely, but is there any potential for issues with temperature? I'm running a long program just now and so far it is running just fine; earlier, when the space was cold, it errored out. I seem to encounter it more when the space is cold. Maybe a long shot, and I can't think of why that would matter. The 5V supplies are all over-specified and test out fine. Anything I could try?

CPU: E-350D Dual core (isolcpus set to 1 for lowest latency here)
Network: Realtek GbE Lan 

Thanks, best, John

Edit: running a latency test as we speak; so far Servo is 12k, Base 10k. I'll leave it to run for a while to see if I get any spikes, although I am not sure that makes a lot of difference here. I ran two long programs when the machine was warm (dry run), no issues. Very strange.
Last edit: 05 Mar 2023 13:35 by evengravy. Reason: typo correction

05 Mar 2023 14:16 #265915 by evengravy
Replied by evengravy on topic 7i96 issue read/write
I ran a few tests after changing INI settings.

Increased the servo period from 1,000,000 to 2,000,000.

Changed joint P values from 1000 to 500 (a suggestion read elsewhere).

One program ran fine, the next had the same issue.

I have disconnected all VFDs during tests to eliminate noise from the equation as best as possible. Still no joy. If there is a solid recommendation for an Ethernet card/chipset for Debian/Mesa, I'm happy to get one to see if that remedies the situation.

05 Mar 2023 15:00 #265916 by PCW
Replied by PCW on topic 7i96 issue read/write
If no software was changed, those errors sound like a hardware
failure somewhere, perhaps a loss of link?

Does dmesg show any link up/down transitions (other than at host or 7I96 power-up)?
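
A minimal sketch of that check, assuming the kernel log text has already been captured (for example from dmesg) and that the NIC driver reports link changes with wording like "Link is Up" / "Link is Down". The exact wording varies by driver, and the sample lines and the helper name `find_link_transitions` are illustrative only, not taken from this machine:

```python
import re

# Matches common link-state messages such as "Link is Down" or
# "NIC Link is Up 1000 Mbps". The wording is an assumption: drivers differ.
LINK_RE = re.compile(r"\blink is (up|down)\b", re.IGNORECASE)


def find_link_transitions(log_text):
    """Return (state, line) pairs for every link up/down message found."""
    transitions = []
    for line in log_text.splitlines():
        match = LINK_RE.search(line)
        if match:
            transitions.append((match.group(1).lower(), line.strip()))
    return transitions


# Illustrative log excerpt (made up for this example).
sample = """\
[    5.100000] r8169 0000:03:00.0 eth0: Link is Up - 100Mbps/Full
[ 1234.500000] r8169 0000:03:00.0 eth0: Link is Down
[ 1236.900000] r8169 0000:03:00.0 eth0: Link is Up - 100Mbps/Full
"""

for state, line in find_link_transitions(sample):
    print(state, "|", line)
```

Any "down" entry that does not line up with powering the host or the 7I96 on or off would point towards a link problem.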

05 Mar 2023 15:10 #265918 by evengravy
Replied by evengravy on topic 7i96 issue read/write
Yes. It does seem to be a loss of connection. There are no physical issues with the wiring, but I will check as you say and report back. When it occurs, LCNC halts all comms to the drivers (no movement possible); restarting LCNC resolves it without having to touch anything physical. It seems that enough packets fail that it halts. I'm away from the workshop at the moment but will check ASAP.

05 Mar 2023 15:39 - 05 Mar 2023 15:41 #265921 by evengravy
Replied by evengravy on topic 7i96 issue read/write
Typical error stream for reference. Edit: photo too large, I'll send it later.
Last edit: 05 Mar 2023 15:41 by evengravy.

05 Mar 2023 16:14 #265923 by PCW
Replied by PCW on topic 7i96 issue read/write
The error you get does suggest something odd with the host hardware.
Normally if you lose the link, you will get "error finishing read"

Here's the result of either setting the interface down
or simply pulling the RJ45:

hm2/hm2_7i92.0: error finishing read! iter=2782471
hm2/hm2_7i92.0: error finishing read! iter=2782471

I did this at least 20 times and never got any queue-related errors.
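
To make that distinction easier to see in a long error stream, here is a small sketch that tallies hm2 messages by stage (queuing vs finishing) and direction (read vs write). It assumes the messages follow the `hm2/<board>: error <stage> <direction>! iter=<n>` shape quoted in this thread; the function name `tally_hm2_errors` is hypothetical:

```python
import re
from collections import Counter

# Pattern based only on the error lines quoted in this thread.
HM2_ERR_RE = re.compile(
    r"hm2/(?P<board>\S+): error (?P<stage>queuing|finishing) "
    r"(?P<direction>read|write)! iter=(?P<iter>\d+)"
)


def tally_hm2_errors(log_text):
    """Count hm2 errors keyed by (stage, direction)."""
    counts = Counter()
    for match in HM2_ERR_RE.finditer(log_text):
        counts[(match.group("stage"), match.group("direction"))] += 1
    return counts


# The two lines from the link-loss test above.
sample = """\
hm2/hm2_7i92.0: error finishing read! iter=2782471
hm2/hm2_7i92.0: error finishing read! iter=2782471
"""

for (stage, direction), n in tally_hm2_errors(sample).items():
    print(f"{stage} {direction}: {n}")
```

A log dominated by "finishing" entries would match the link-loss behaviour shown here, while "queuing" entries would be the odd case being discussed.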

05 Mar 2023 16:20 #265924 by evengravy
Replied by evengravy on topic 7i96 issue read/write
Right, yes. Here it alternates between error queuing read and error finishing write, almost sequentially, but in some cases the read error happens twice in a row. Since this happens on both of the 7i96 boards that I have, it seems unlikely to be the boards, to me. I really don't think it is a physical issue with connectivity: it happens on two cables to two machines. I suspect the NIC, or kernel strangeness. If anyone can recommend a PCI Intel card or chipset, I'd like to get one, at least to rule some things out.

05 Mar 2023 16:31 #265925 by PCW
Replied by PCW on topic 7i96 issue read/write
The errors are odd in that they are both related to Ethernet transmit failing
(so unrelated to latency). I would not expect a kernel issue unless you changed
the kernel.

Are you using the 10.10.10.10 IP address?

If you want to try a different NIC, I would choose an Intel-based card.

05 Mar 2023 17:23 - 05 Mar 2023 17:27 #265928 by evengravy
Replied by evengravy on topic 7i96 issue read/write
Yes. That was my impression too. I would need to double check the IP; it's been a while since I configured these, but I am 99% sure they are on the 10.10.10.10 address. Are there any particular Intel NIC chipsets that are recommended? I know they generally work well with Linux.

The reason I mentioned the kernel is that, from my small amount of searching, this seems similar to github.com/LinuxCNC/linuxcnc/issues/927

I haven't changed the kernel, no. But maybe I would benefit from an update of the master. I don't want to change that just yet.
Last edit: 05 Mar 2023 17:27 by evengravy.

05 Mar 2023 17:28 #265929 by PCW
Replied by PCW on topic 7i96 issue read/write
Any Intel 100BT or 1000BT should be OK.

The kernel error linked is a latency issue; this does not seem to be.
