Random run-time errors

19 Sep 2012 23:42 #24440 by billooms
I'm getting some weird random run-time errors:

"F word missing with inverse time g1 move" -- yes, I'm using inverse time mode but there really is an f-word on each line. Running from the beginning worked OK.

"Multiple z words on one line" -- no, there were no multiple z words. Running from the beginning worked OK.

The display area (with the g-code listing) jumps to the end as if the run was complete in the middle of a run. The run continued normally, but the display area was stuck at the last line.

"bad number format (trailing characters) parsing '-0.340.37170'" -- there was no such line in the g-code. Running again was OK.

System info: Intel D525MW board with 4GB RAM, LinuxCNC version 2.5.0, hyper-threading is disabled, both CPUs are running (see note below), Max Base Period latency 15256, config file has BASE_PERIOD=100000, Gecko G540 drivers.

I initially had disabled the 2nd CPU with isolcpus=1 as described in the wiki. However, about a month ago I started seeing occasional erratic behavior. I noted that the single CPU was hitting 100% (which I figured was not too good) so I went back to using both processors.

The reason that I have such a high CPU usage is probably because I've got lots of closely spaced points and I'm guessing the trajectory planner is kept busy doing the calculations. By enabling the 2nd CPU, I saw some increase in latency but now I'm not hitting the 100% CPU. When I saw the random errors I was at about 50% on one CPU and 80% on the 2nd CPU.

Some days I can run for hours without a problem, but today I was plagued by these random errors. Any ideas on what causes them? Or how to fix them?

20 Sep 2012 08:30 #24447 by ArcEye
Replied by ArcEye on topic Re:Random run-time errors
Hi Bill

config file has BASE_PERIOD=100000

First thing I would look at is the base period. If it is struggling to produce all the pulses required this could be impacting on other areas.

Max Base Period latency 15256

In theory you could run with a base period of about 20K. In practice it should be more like 30 - 40K. The gecko 540s are well documented, so you can calculate it precisely.
linuxcnc.org/docs/2.5/html/motion/tweaking_steppers.html

In any case it is probably 2 to 3 times slower than it needs to be. 100K is the default stepconf figure, deliberately slow because everything should run with it.

CPU was hitting 100% (which I figured was not too good) so I went back to using both processors.

That doesn't necessarily follow. The theory behind isolcpus and the CPU hog is that if the CPU is kept fully employed on the main task, it does not take other data into the cache and thus runs faster, because the data it needs is already in cache. Reducing to 1 CPU ensures full employment and prevents tasks being spawned to another CPU with its own cache.

As a first step, I would reinstate isolcpus and try tuning the base period and assess what impact that has on your problem.

regards

20 Sep 2012 14:55 #24456 by billooms
Replied by billooms on topic Re:Random run-time errors
Thanks for the detailed response. Today, I'll boot with isolcpus=1 and run the machine through its motions on the same files without actual cutting to see what happens.

I was wondering why stepconf gave such a large base period. From the calculations I was thinking that a lower number should work with the gecko 540. Your explanation is appreciated.

About a month ago, the problem I had with isolcpus=1 was that there was a step cut into my work as if some small offset had suddenly been introduced. It almost looked as if the steppers had lost count. I stopped the run and went back to the point where I had done a touch-off and everything was OK -- no actual offset or loss of count. It wasn't a matter of steppers losing count from pushing too hard into the work because the step was the wrong way -- a step into the work (not out of the work).

I'll let you know what I find after doing some runs with isolcpus=1.

20 Sep 2012 15:16 #24459 by BigJohnT
Replied by BigJohnT on topic Re:Random run-time errors
ArcEye wrote:

In any case it is probably 2 to 3 times slower than it needs to be, 100K is the default stepconf figure, deliberately slow because everything should run with it


Stepconf will adjust that number down if you increase the step rate required.

John

20 Sep 2012 17:09 #24488 by billooms
Replied by billooms on topic Re:Random run-time errors
This morning, running with isolcpus=1, the random events seem to coincide with network activity. I generate my g-code on another design computer networked to the D525MW. Occasionally, when I write a large file from the design computer to the D525MW, I get errors similar to those experienced yesterday.

It's interesting to note that the network activity does not cause any hiccup in the latency. I ran the latency test for a while and wrote a number of large files over the network and the Max Base Period latency never exceeded 5086.

I'm guessing that when the CPU is running at or near 100% and some network activity occurs, then something happens to cause these random errors.

With isolcpus=1 and keeping BASE_PERIOD=100000 I'm getting CPU usage of about 50% when nothing is running (i.e. Axis is displayed but no stepper movement). When moving a larger distance in a single g-code instruction, the CPU usage jumps up to about 70% presumably from generating stepper pulses. I get max CPU (close to 100%) whenever I run about 50 to 100 g-code instructions per second.

I'm going to dust off the old Sony PC I used to use and see if I get similar results.

20 Sep 2012 19:35 #24492 by billooms
Replied by billooms on topic Re:Random run-time errors
The old Sony PC will also generate the occasional random error when writing large files to it over the network. So it's not some peculiarity of the D525MW board. (Good!)

Moral of the story: Don't write big files while doing a run that's pushing close to 100% CPU.

I can live with that restriction.

21 Sep 2012 08:14 #24500 by ArcEye
Replied by ArcEye on topic Re:Random run-time errors
Good that you have identified the source of the problem.

There is still a lot of scope for tuning your base period by the look of it.
You should get smoother operation and be able to increase rapids if you need to.

regards

21 Sep 2012 13:40 #24512 by billooms
Replied by billooms on topic Re:Random run-time errors
Yes, I agree that I could do some changes on the base period.

At the present setting of BASE_PERIOD = 100000 and SCALE = 20000.0 I'm limited to a MAX_VELOCITY = 0.25 which is OK for my application. I don't need to go any faster.
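As a rough check on that limit, here is a minimal sketch of the arithmetic. It assumes the software step generator needs two base periods per step (one with the step line high, one with it low); that is an assumption about this configuration, not something confirmed in this thread. The function name max_velocity is made up for illustration.

```python
def max_velocity(base_period_ns, scale_steps_per_unit, periods_per_step=2):
    """Upper bound on axis velocity (units/s) for a software step generator.

    periods_per_step=2 is an assumption: one base period with the step
    line high and one with it low.
    """
    base_thread_hz = 1e9 / base_period_ns            # base thread runs per second
    max_step_rate_hz = base_thread_hz / periods_per_step
    return max_step_rate_hz / scale_steps_per_unit   # steps/s -> units/s


print(max_velocity(100000, 20000.0))  # 0.25 with the settings quoted above
print(max_velocity(30000, 20000.0))   # same bound with a 30K base period
```

Under that assumption the bound scales inversely with the base period, so a shorter period only raises the ceiling; it does not force the machine to run any faster.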

I can make the base period smaller with my G540 driver, but won't that just take up more CPU load? By changing to 30K won't that just call the step generator 3X as often?

You mention smoother operation -- can you explain this aspect a bit more? If I don't need the faster speed, I'm thinking that calling the step generator more often won't produce any difference?

Thanks for the assistance.

21 Sep 2012 14:55 - 21 Sep 2012 14:56 #24518 by PCW
Replied by PCW on topic Re:Random run-time errors
With a 100000 ns thread time (= 100 usec period or 10 kHz frequency), pulses can only be generated at 100 usec intervals. So, for example, at the maximum 10 kHz step rate a step is generated every base thread. If your step rate is 5 kHz, a step is generated every other base thread.

Where this causes a problem is at intermediate speeds. For example, at a 9 kHz step rate you will get 9 step pulses in a row and then a skipped pulse. This gives you the correct _average_ step rate, but now you have an undesirable 1 kHz vibration component added to the motion (the skipped pulse happens at 1 kHz). If you decrease your base thread period, the magnitude of these unwanted vibrations will decrease (at a given speed).
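The 9-in-10 pattern can be reproduced with a simple accumulator model. This is only a sketch of the idea, not the actual step generator code; the function name step_pattern is hypothetical, and the model assumes one step can be emitted on every base thread run, as in the 10 kHz example above.

```python
def step_pattern(step_rate_hz, base_rate_hz, ticks):
    """Return one 0/1 flag per base-thread run (1 = a step was emitted)."""
    acc = 0
    pattern = []
    for _ in range(ticks):
        acc += step_rate_hz          # accumulate the requested rate
        if acc >= base_rate_hz:      # a whole step is due on this run
            acc -= base_rate_hz
            pattern.append(1)
        else:                        # nothing due yet: this run is skipped
            pattern.append(0)
    return pattern


one_second = step_pattern(9000, 10000, 10000)  # 10000 runs = 1 s at 10 kHz
print(one_second[:10])      # [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(sum(one_second))      # 9000 steps, so the average rate is correct
print(one_second.count(0))  # 1000 skipped runs, i.e. the 1 kHz component
```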

These variations in step rate can trigger resonances that cause stalls and limit the maximum speed.

So choosing the base thread period is a compromise between smoothness and CPU cycles burned on I/O.
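To put rough numbers on that compromise, the sketch below uses a simplified accumulator model (an assumption, not the real step generator) to list the pulse spacings that occur at a 9 kHz step rate for two base periods, next to how often the base thread runs. The name step_intervals_ns is made up for illustration. The ideal spacing at 9 kHz is about 111111 ns.

```python
def step_intervals_ns(step_rate_hz, base_period_ns, ticks=10000):
    """Return the set of distinct times (ns) between consecutive steps."""
    acc = 0
    last_step_tick = None
    intervals = set()
    for tick in range(ticks):
        acc += step_rate_hz * base_period_ns   # integer math, units of Hz*ns
        if acc >= 1_000_000_000:               # one full step period has elapsed
            acc -= 1_000_000_000
            if last_step_tick is not None:
                intervals.add((tick - last_step_tick) * base_period_ns)
            last_step_tick = tick
    return intervals


for period_ns in (100000, 30000):
    runs_per_second = 1_000_000_000 // period_ns
    print(period_ns, runs_per_second, sorted(step_intervals_ns(9000, period_ns)))
# 100000 10000 [100000, 200000]
# 30000 33333 [90000, 120000]
```

In this model the spacing error drops from one 100 usec period to one 30 usec period, while the base thread runs a little over three times as often.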
Last edit: 21 Sep 2012 14:56 by PCW.

21 Sep 2012 15:31 #24519 by ArcEye
Replied by ArcEye on topic Re:Random run-time errors
A far better technical explanation from Peter than I could have made.

By changing to 30K won't that just call the step generator 3X as often?

No, because you only need a certain number of steps to move a finite distance; they will just be generated 3 times quicker (ish).

regards
