Axis 2.9.0~pre: jogging moves the tool cone, but not the steppers

09 Sep 2022 19:31 #251565 by PCW
If you have no WD bite I have no idea other than maybe marginal
timing (try doubling the step length)

What is the host CPU and what step drives are you using?
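To see what doubling the step length costs, here is a minimal sketch of the usual timing model, in which one step occupies the step length plus the step space. The function names and every number below are hypothetical placeholders, not values from this configuration; the real figures come from the hal/ini files and the drive's datasheet.

```python
def max_step_rate_hz(steplen_ns: int, stepspace_ns: int) -> float:
    """Highest step frequency when each step occupies steplen + stepspace."""
    return 1e9 / (steplen_ns + stepspace_ns)


def max_velocity(step_rate_hz: float, steps_per_unit: float) -> float:
    """Top speed in machine units per second at a given step rate."""
    return step_rate_hz / steps_per_unit


# Hypothetical example values (assumptions, not taken from the thread).
STEPLEN_NS = 5_000      # step pulse width
STEPSPACE_NS = 5_000    # minimum gap between pulses
STEPS_PER_MM = 800.0    # axis scale

for label, steplen in (("current", STEPLEN_NS), ("doubled", 2 * STEPLEN_NS)):
    rate = max_step_rate_hz(steplen, STEPSPACE_NS)
    speed = max_velocity(rate, STEPS_PER_MM)
    print(f"{label}: {rate:.0f} steps/s, {speed:.1f} mm/s")
```

With these placeholder numbers, doubling the step length lowers the step-rate ceiling from 100 kHz to about 67 kHz, so the test is cheap as long as the configured maximum velocity stays below the new ceiling.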

09 Sep 2022 19:41 #251566 by Dr. John
The CPU is an n3160, 4 cores, 1.6GHz. It's one of those micro numa class computers, all in one, no fan, pretty much sealed. Perfect for the environment it will live in next to the mill.

I'm using DM542 drivers. I've been using these for years with my 5i25 I/O card without difficulty. The configuration for the 5i25 was essentially identical to the current configuration with the 7i96s.

I've said this before, but I'll repeat it. I'm monitoring the outputs that drive the stepper drivers with an oscilloscope. I can see clearly whether there are pulses coming out. There aren't any. With the pncconf test, things work as they should.

My sense is that we're going around in circles. I keep hearing the same disbelief, the same suggestions. I KNOW that it SHOULD work.

Does it make sense to try using the 2.8.0 version of the software, modified per BigJohnT's instructions? If so, can you tell me where to find the source code?

09 Sep 2022 20:48 - 09 Sep 2022 20:51 #251573 by PCW
I just tried 2.9 (latest) and a 7I96S with your hal/ini files and
saw no issue with step generation.

I would not expect this to be LinuxCNC version related
as nothing in the stepgen has changed recently and there have been
no complaints with similar symptoms.

My guess is that it's hardware (7I96S or PC) or
network related somehow (packet loss during setup?)
Last edit: 09 Sep 2022 20:51 by PCW. Reason: correct word

10 Sep 2022 14:55 #251617 by Dr. John
Well, this gets harder all of the time. Of course, your test shows that the configuration CAN work, not that it WILL work, at least not all of the time. Alas, that's how all engineered systems are, especially those that are software based and have to work in a huge variety of environments. So, the question is: how to troubleshoot from here?

FWIW, I left axis running all night and this morning found that it had been bitten by the watchdog at some point and showed the following error message:

hm2/hm2_7i96s.0: error finishing read!
iter=4136016

Assuming that messages are passed via the network at a rate of 1000 per second, that's about 69 minutes of continuous operation without an error. While that's not great, it does suggest that the network link is working "reasonably well."
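The estimate can be checked with a few lines of arithmetic. This is only a sketch: the 1000-per-second rate is the assumption stated above (one read per 1 ms cycle), and the function name is made up for illustration.

```python
ITERATIONS = 4_136_016  # "iter" value from the error message
RATE_HZ = 1_000         # assumed: one network read per 1 ms cycle


def uptime_seconds(iterations: int, rate_hz: int) -> float:
    """Elapsed time implied by an iteration count at a fixed rate."""
    return iterations / rate_hz


seconds = uptime_seconds(ITERATIONS, RATE_HZ)
hours, rem = divmod(int(seconds), 3600)
minutes, secs = divmod(rem, 60)
print(f"{seconds:.0f} s = {hours} h {minutes} min {secs} s")
```

If the actual cycle rate differs from 1 kHz, the elapsed time scales inversely with it.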

It is also clear that at least one aspect of axis is working properly: the enable and disable of the drivers. This works as expected all of the time. It's another indication that the network link is ok.

The firmware for the 7i96s is from 15 July 2022 or later, downloaded from the Mesanet site on 25 Aug. Unfortunately, I couldn't find a rev indicator nor a time stamp in one of the root files to give me a better indication of the revision. However, my guess is that it is the most recent. As previously noted, the system works correctly with the test from pncconf. This leads me to believe that neither the firmware is bad, nor is the network failing, at least not frequently enough to be a cause of this problem.

It's always hard to rule out hardware, but the fact that there is a distinct difference between the results of running the test from pncconf and running axis points strongly in the direction of a software issue. Having not been involved in the design of the software nor having spent much time trying to understand it (alas, I'm a user, not a developer), it's hard for me to know what tests to undertake to bisect the problem and isolate it further.

(BTW, I did runtests before running the latest compilation of LinuxCNC. The results are:

Runtest: 246 tests run, 246 successful, 0 failed + 0 expected, 2 skipped

I assume that this is a positive test result.)

So, I need your help in troubleshooting the system. I understand the reluctance to believe that there IS a software problem, but if you'll take the evidence as given, then it behooves us to accept the possibility that there is. The question then is how to isolate it, or how to eliminate the possibility that it is related to LinuxCNC.

I look forward to your help in this.

Many thanks in advance.

10 Sep 2022 15:29 #251618 by PCW
I still think this is a network issue specific to your hardware
or OS software (as suggested by the communications loss)

The reason I suggest this is that out of 10s of thousands of identical systems
yours is the only one to exhibit these symptoms.

To eliminate possible hardware issues I think it would be best to first try
another PC with a stock LinuxCNC installation, and then, if you have identical
issues, to return the 7I96S to Mesa for evaluation.

10 Sep 2022 18:07 #251627 by Dr. John
With all due respect, the logic of your hypothesis fails me. Here's why:

the hardware/network/and some level of software WORKS!

Yes, it only works in pncconf, but it DOES WORK. It does not fail in the time frame that I do testing, which is much longer than it takes to determine that axis does NOT WORK. Everything else is still the same except the application with which I'm working.

Your hypothesis does nothing to explain these results. Rather, it creates a considerable amount of additional work that, in the end, may lead nowhere.

As to your comment about 10's of thousands of identical systems, I know that to be a fib. I spoke with the person at Mesanet who convinced me that the 7i96s was the way to go given that I was forced into changing computers. There aren't 10's of thousands of 7i96s I/O cards in the universe. It's more likely that there are only a few hundred. I'm one of the lucky few who got to purchase from the first batch.

As to suggesting returning the 7i96s card to Mesanet for evaluation, what can I say? The card apparently works. It enables the stepper motor drivers as it should. It sends out pulses and direction signals as it should in accordance with the commands it receives from the test software in pncconf. The stages physically move in accordance with those as they should. There is no data to suggest that the problem is there.

I won't argue about whether my setup is the only one out there that presents these problems. I don't know. However, the fact that it does, while the underlying hardware/software seems to work at least some of the time, suggests that the problem exists elsewhere in the system, most likely in LinuxCNC, although the OS can't be ruled out completely.

Is there no other way to try to isolate the problem? It seems to me that we should be focusing on the differences between how communications occur between pncconf and the I/O card and axis and the I/O card. That is where the essential difference lies.

10 Sep 2022 18:34 #251629 by PCW
I did not mean there are 10s of thousands of 7I96S cards out there (there are only about 500 7I96S cards shipped so far) but there are 10s of thousands of Ethernet FPGA cards using identical step generator and Ethernet firmware.

Your issue is unique, which points to either hardware or perhaps something OS related

10 Sep 2022 19:15 #251630 by Dr. John
I accept that my situation is unique. Let's see if we can do a little better in terms of trying to find out why it exists.

First of all, I think my point about the fact that the hardware is functional still stands. It works as expected with pncconf. Let's take it off the list, please.

I haven't ruled out the OS as a cause of the problem. However, it does seem strange that it might affect axis uniquely and not pncconf.

Can you direct me to documentation that might help me understand the interfaces that pncconf uses and likewise those that axis uses? Perhaps by understanding the data flow, we can find out where a difference lies that could result in the symptoms that I'm experiencing.

10 Sep 2022 20:14 #251635 by PCW
You can look at the pncconf source and axis source (both in src/emc/usr_intf)
(pncconf uses halrun so ends up using the same low level interface a LinuxCNC hal file does)

Since LinuxCNC 2.9+7I96S works for me (and many others) that leaves the
hardware (and possibly OS/network driver) as the unique items.

14 Sep 2022 17:39 #251909 by Dr. John
I've had a chance to dig into things based upon PCW's hint about halrun.

My first response is: Wow! Now I know why the resistance to change anything above halrun, in spite of the clear evidence that that's where one should look. The complexity is immense!

Mostly undocumented Python scripts, tcl scripts, bash scripts, C++ and C code all seem to be a part of what is the user I/F portion of LinuxCNC, all working "together" in ways that, at least for me, are hard to comprehend. Unfortunately, there seems to be virtually no code to facilitate debugging either. So, I had to figure out a way to add it myself.

Based upon my read of the code (I don't use any of these, except C, for programming microcomputers, and I avoid that, so my reading skills are limited), I assume that halrun (and halcmd, haltcl) are the funnels through which communications go in order to send commands to the hardware.

Therefore, I've created a trace log that shows the commands that are being sent to halrun and halcmd. I've collected the tracelog for both pncconf and axis.

While running pncconf, I just test the X-axis. As previously mentioned, everything works under pncconf, just as you would expect. In the tracelog, you can see the pins for axistest.0.jog-plus and axistest.0.jog-minus being changed as I push the respective jog "buttons" on the test window (lines 65 - 316 in the attached file).

With axis running, no commands creating such activity can be found. In fact, the only input that shows up in the tracelog when running axis is the loading of the initial configuration file created by pncconf. There is nothing thereafter.
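A comparison like this can be automated. The sketch below assumes a hypothetical tracelog format with one halcmd-style command per line (for example `setp <pin> <value>`); the sample lines are invented for illustration and are not copied from the attached file. Note also that an empty result for axis is only meaningful if axis actually routes its jog commands through halcmd or halrun, which this thread has not established.

```python
from collections import Counter


def pin_activity(trace_lines):
    """Count setp/sets commands per pin or signal name in a trace."""
    counts = Counter()
    for line in trace_lines:
        parts = line.split()
        if len(parts) >= 3 and parts[0] in ("setp", "sets"):
            counts[parts[1]] += 1
    return counts


def only_in_first(first, second):
    """Names that were written in the first trace but never in the second."""
    return sorted(set(first) - set(second))


# Hypothetical excerpts; the real tracelog format may differ.
pncconf_trace = [
    "loadrt hostmot2",
    "setp axistest.0.jog-plus 1",
    "setp axistest.0.jog-plus 0",
    "setp axistest.0.jog-minus 1",
    "setp axistest.0.jog-minus 0",
]
axis_trace = [
    "loadrt hostmot2",
]

missing = only_in_first(pin_activity(pncconf_trace), pin_activity(axis_trace))
print("written under pncconf but not under axis:", missing)
```

Feeding the two real tracelogs through `pin_activity` (one list of lines per file) would give a per-pin count for each application, which is easier to compare than reading several hundred lines by eye.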

This is consistent with the evidence mentioned previously, namely that testing works through pncconf. This is enough to prove that everything below halrun, including the hardware, is working fine.

It seems like Murphy rules! Of course, Murphy's law is a simplification of the concept that Mama Nature will eventually explore every freedom of action. And, with the complexity of this system design, there are plenty of them.

Yes, I expect to receive even more pushback. (Hey, you broke something! No, you're not capturing everything! Etc.) OK, so please tell me why/how. Meanwhile, the evidence that something is broken in axis seems pretty clear.

Given that the axis code is reasonably well tested in the field by many users (including me with my 5i25 card), let's assume that the code, as such, isn't broken. A more probable cause is configuration. This has (at least) two aspects: 1) an error made by me while building LinuxCNC; 2) differences between the platform upon which I'm working and those that have been tried before.

When compiling the code, I just follow the instructions. I use the --with-realtime=uspace option with configure. Everything seems to go swimmingly. There are some warnings, but no errors. Runtests completes, seemingly without errors. So, I think 1) is ok, but let's still leave that on the table as a possibility. 2) seems more likely to me.

Here's my system configuration:

- computer hardware: n3160 CPU, 4 cores, 1.6GHz; 8GB RAM; 60 GB SATA SSD.
- OS: Xubuntu 22.04.1 LTS; Linux 5.15.55-rt48 #2 SMP PREEMPT_RT (real time kernel compiled by me).
- LinuxCNC: 2.9.0~pre0, compiled by me; run in place configuration.
- 7i96s firmware: 7i96s_d.bin; Mesaflash programs and verifies the firmware correctly.
- ethernet link: enp1s0 configured with static IP address 10.10.10.9; direct cable connection between computer and 7i96s card.

Any thoughts about what's different about it that could cause a failure? Has anybody tried a similar configuration, especially Xubuntu 22.04.1? In other words, is the failure in the autogen/configure/make scripts?

Thanks.
