Seeking help finding the cause of LinuxCNC freezing

16 May 2022 20:07 #243107 by Apollo
Dear forum members,

I have started a trial build of a LinuxCNC-based DC servo setup, as a test for an intended self-built CNC mill. Along the way I have frequently consulted this forum and the LinuxCNC documentation, which has already helped me a lot. Unfortunately, my LinuxCNC installation keeps crashing after a short time. Initially I thought this was because I was using a demo simulation configuration, but even now that I have completely configured a MESA 7i76e using PnCconf, I still run into the same issue. I have tried various things based on information I found here and elsewhere on the internet, but I am new to Linux and so far none of them have been successful. I hope someone with more experience in debugging hardware is willing to give me a couple of hints.

Here is a description of the issue. When I start Linux, all goes well and I can do what I like (such as writing to this forum). I can also start LinuxCNC, and if I do nothing afterwards it stays responsive for a short while. However, after some time (a couple of minutes) the program freezes, and with it all of Linux. The only thing that still moves is the mouse pointer. When I zoom in and out a couple of times right after starting LinuxCNC, the freeze seems to come sooner; usually the system freezes after only a few turns of the mouse wheel. The latency test gives a maximum jitter of roughly 9000 ns on the servo thread and 14000 ns on the base thread.
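
To put the latency-test numbers in perspective, one rough check is to express the worst-case jitter as a fraction of each thread's period. The sketch below is only that arithmetic; the periods it uses (1 ms servo thread, 25 µs base thread) are assumptions, so substitute the values shown in your own latency-test window.

```python
# Rough sketch: worst-case jitter as a percentage of the thread period.
# The periods below are ASSUMED values; use the ones shown in your own
# latency-test window.

SERVO_PERIOD_NS = 1_000_000  # assumed 1 ms servo thread
BASE_PERIOD_NS = 25_000      # assumed 25 us base thread


def jitter_percent(max_jitter_ns: int, period_ns: int) -> float:
    """Return worst-case jitter as a percentage of the thread period."""
    if period_ns <= 0:
        raise ValueError("period must be positive")
    return 100.0 * max_jitter_ns / period_ns


if __name__ == "__main__":
    # Figures reported in the post above.
    print(f"servo: {jitter_percent(9_000, SERVO_PERIOD_NS):.1f}% of period")
    print(f"base:  {jitter_percent(14_000, BASE_PERIOD_NS):.1f}% of period")
```

With a MESA card such as the 7i76e, step generation is done in the FPGA rather than in a software base thread, so the servo-thread figure is the one that matters most for this setup.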

The workstation I use for this LinuxCNC installation is an HP Z400 with an NVIDIA GeForce GTX 660 Ti graphics card (this HP motherboard has no onboard video). If I understand correctly, the latter is a "usual suspect"; however, I have also read that as long as the mouse cursor still responds, the cause cannot be the video card. The installed kernel is SMP PREEMPT RT Debian 4.19.146-1 (2020-09-17).
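
Since the graphics card is a suspect, it is useful to know which kernel driver is actually bound to it (for example the open-source 'nouveau' driver or NVIDIA's proprietary 'nvidia' driver). 'lspci -k' (from the pciutils package) prints a "Kernel driver in use:" line under each PCI device; the sketch below just filters that output down to the display adapter. It assumes the indented layout that 'lspci -k' normally prints, and the function names are hypothetical.

```python
import subprocess


def lspci_k() -> str:
    """Return the raw output of 'lspci -k' (requires the pciutils package)."""
    return subprocess.run(["lspci", "-k"], capture_output=True,
                          text=True, check=True).stdout


def gpu_drivers(lspci_output: str) -> dict:
    """Map each VGA/3D controller line to its 'Kernel driver in use' value."""
    drivers = {}
    current = None
    for line in lspci_output.splitlines():
        if line and not line[0].isspace():
            # Unindented line: a new PCI device starts here.
            is_gpu = "VGA compatible controller" in line or "3D controller" in line
            current = line if is_gpu else None
        elif current and "Kernel driver in use:" in line:
            # Indented detail line belonging to the current GPU.
            drivers[current] = line.split(":", 1)[1].strip()
    return drivers
```

For example, 'print(gpu_drivers(lspci_k()))' should list one entry for the GTX 660 Ti together with the driver in use.
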
After some more reading I thought the freezing might be caused by CPU overheating. I changed the fan idle speed in the HP BIOS and monitored the CPU temperatures using lm-sensors and the "watch sensors" command in a terminal. This showed no significant increase after starting LinuxCNC or in the time up to the next freeze (around 55 °C, with high = 80 °C). So CPU overheating does not look like the issue.
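
Because the whole desktop freezes, a "watch sensors" window stops updating at exactly the moment the readings become interesting. One way around that is to log the temperatures to a file and force each line to disk, so the last readings before a freeze survive the reboot. A minimal sketch, assuming the standard Linux hwmon sysfs interface (the same data lm-sensors reads), where the 'tempN_input' files hold millidegrees Celsius; the function names and the log location are hypothetical.

```python
import glob
import os
import time


def read_temps(root: str = "/sys/class/hwmon") -> dict:
    """Return {'hwmonN:<chip>/tempM': degrees_C} read from the hwmon sysfs tree."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        hwmon_dir = os.path.dirname(path)
        try:
            with open(os.path.join(hwmon_dir, "name")) as f:
                chip = f.read().strip()
            with open(path) as f:
                millideg = int(f.read().strip())  # sysfs reports millidegrees C
        except (OSError, ValueError):
            continue  # sensor missing or unreadable: skip it
        sensor = os.path.basename(path)[: -len("_input")]
        temps[f"{os.path.basename(hwmon_dir)}:{chip}/{sensor}"] = millideg / 1000.0
    return temps


def log_forever(logfile: str, interval_s: float = 5.0) -> None:
    """Append a timestamped line of readings every interval, forced to disk."""
    with open(logfile, "a") as f:
        while True:
            readings = " ".join(f"{k}={v:.1f}" for k, v in read_temps().items())
            f.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {readings}\n")
            f.flush()
            os.fsync(f.fileno())  # so the line survives a hard freeze
            time.sleep(interval_s)
```

Started with something like 'log_forever(os.path.expanduser("~/temps.log"))' before launching LinuxCNC, the last line in the log after a reboot shows the temperatures shortly before the freeze, and its timestamp also gives a rough time at which the system stopped.
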
I then found some discussions on CPU idle states, the so-called C-states. Using an online source, I found out that the CPU idle driver used on this installation is "intel_idle". Based on further information, I added the kernel parameter "intel_idle.max_cstate=0" (to disable the intel_idle driver) as well as the kernel parameter "idle=poll". Unfortunately, this did not stop LinuxCNC from freezing after a while. Now I am more or less lost, which is a real pity, because I do think the MESA 7i76e is working (mesaflash in a terminal gives sensible feedback), and by now I have a single DC servo working with a DG4S-16035 driver, connected to one of the MESA step/dir channels. I would love to get this connected to LinuxCNC and gain some confidence that I can make this work before I buy additional hardware.
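
It is worth confirming that the two parameters actually reached the running kernel: if they were added via /etc/default/grub, they only take effect after running 'sudo update-grub' and rebooting. The sketch below reads /proc/cmdline and the cpuidle sysfs entry that reports the active idle driver. The simple whitespace split is an assumption that no parameter value contains quoted spaces, and the helper names are hypothetical.

```python
def kernel_params(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Naive whitespace split: assumes no parameter value contains quoted spaces.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_first_line(path: str) -> str:
    """Return the first line of a (procfs/sysfs) file, or a placeholder."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return "<unavailable>"


if __name__ == "__main__":
    params = kernel_params(read_first_line("/proc/cmdline"))
    for wanted in ("intel_idle.max_cstate", "idle"):
        print(f"{wanted}: {params.get(wanted, '<not set>')}")
    # Active cpuidle driver; depending on kernel config and parameters this
    # file may be missing or report something other than intel_idle.
    print("cpuidle driver:",
          read_first_line("/sys/devices/system/cpu/cpuidle/current_driver"))
```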

Any suggestions on how to do further, efficient fault finding are highly appreciated. Note that it might be something "obvious" related to Linux, as I am a very inexperienced user of this operating system.
 

16 May 2022 22:36 #243115 by tommylight
Remove all but one memory module and test the modules one by one.
--Power off the PC before removing or installing memory DIMMs.

17 May 2022 04:28 #243128 by katedoyle

Check the module again.
 

17 May 2022 15:51 #243175 by andypugh

 Unfortunately my LinuxCNC installation keeps crashing after a small period of time. 
 

What are the symptoms when it has crashed? You mention that the mouse pointer works. 

Does motion still work? The motion parts of LinuxCNC will generally continue to operate very reliably, even if the GUI has become unresponsive.

The only time I have had GUI crashes has been due to a dodgy wireless keyboard. Leaving the dongle out except when needed has fixed that (it's running a touchscreen GUI, so this isn't an issue).
 

18 May 2022 20:15 #243292 by arvidb
This type of problem can be difficult to track down. I would start by installing an SSH server ('sudo apt install openssh-server' on Debian) and logging in from another machine. Then watch the system log while the machine crashes ('sudo journalctl -f'). If you're lucky, this might give some hint as to what's wrong.

If you have trouble repeating the issue, or don't want to log in from another machine with SSH, it's also possible to check the logs from earlier boots; e.g. 'sudo journalctl -b -1' displays the log from the previous boot. Note that, depending on the type of issue, the logs from earlier crashes might not have been saved to disk, so I'd prefer to follow the current log via SSH.
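
Once a log from the frozen boot is available, the interesting lines are usually few among many. A rough filter, as a sketch: the patterns below are heuristic picks for messages that commonly accompany hard hangs (soft/hard lockup watchdog reports, RCU stalls, oopses, machine-check errors, GPU driver errors from nouveau or NVIDIA's 'NVRM'), not an exhaustive or authoritative list, and the function names are hypothetical. Reading the journal normally needs sudo, and '-b -1' only returns something if the previous boot's journal was kept on disk.

```python
import re
import subprocess

# Heuristic patterns (an assumption, not an exhaustive list) for kernel
# messages that often accompany hard freezes.
SUSPICIOUS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"soft lockup",                          # watchdog soft-lockup report
        r"hard lockup",                          # NMI watchdog hard-lockup report
        r"rcu.*stall",                           # RCU stall warnings
        r"\boops\b",                             # kernel oops
        r"hardware error|machine check",         # MCE / hardware errors
        r"nouveau.*(error|fail|fault|timeout)",  # open-source NVIDIA driver problems
        r"NVRM: Xid",                            # proprietary NVIDIA driver errors
    )
]


def previous_boot_kernel_log() -> str:
    """Kernel messages (-k) from the previous boot (-b -1); run with sudo."""
    return subprocess.run(["journalctl", "-k", "-b", "-1", "--no-pager"],
                          capture_output=True, text=True).stdout


def suspicious_lines(log_text: str) -> list:
    """Return the log lines that match any of the heuristic patterns."""
    return [line for line in log_text.splitlines()
            if any(p.search(line) for p in SUSPICIOUS)]
```

For example: 'for line in suspicious_lines(previous_boot_kernel_log()): print(line)'. An empty result does not prove nothing went wrong; the kernel may not have managed to write anything before locking up, which is why following the log live over SSH is the better option.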

19 May 2022 20:01 #243359 by Apollo
Dear tommylight and katedoyle,

Thank you for your suggestions. Based on your replies I opened up the HP workstation and did some further research on the internet about RAM DIMMs and how they can be configured on the Z400 motherboard. While taking the DIMMs out I noticed they were not identical, and initially I had a hard time finding out why, as each module stated it was 2 GB. After some searching I found out that there are "dual-rank" and "single-rank" modules, and that my motherboard supports a total of 8 ranks. The terminology is still a bit vague to me, but you can recognize the dual-rank modules because they have chips on both sides; on mine the label says either 2Rx8 or 1Rx8. As I have three of each, I can install 8 GB (three single-rank modules plus one dual-rank module: 3x1 + 1x2 = 5 ranks). I also read that some claim it is best for performance not to mix dual-rank and single-rank DIMMs. HP provides information on which of the six slots to use for three 2 GB DIMMs (slots 1, 3 and 5; note that the slot numbering is a bit odd).

I have tested a whole bunch of combinations, but unfortunately none of them solved the problem. With only one 2 GB DIMM installed in the first slot, the system seemed to keep working the longest; however, this time the UI really got scrambled in the end (in the graphical part of the window several images were copied over each other and the numbers became unrecognizable). Most other combinations gave the same result: the complete system became unresponsive and I could only still move the mouse. Even when I replaced the 6 GB of single-rank modules entirely with the 6 GB of dual-rank modules, the issue remained. From this I tend to conclude that the DIMMs are not the (only) issue.
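
The rank bookkeeping is simple addition: each single-rank (1Rx8) module counts as one rank, each dual-rank (2Rx8) module as two, and the total must stay within the board's limit (8 ranks, as quoted above). A small sketch, taking the 8-rank limit and the 2 GB module size from the post; it does not model any slot-population rules HP may impose, and the names are hypothetical.

```python
from dataclasses import dataclass

MAX_RANKS = 8  # limit as quoted in the post for the Z400 board


@dataclass(frozen=True)
class Dimm:
    size_gb: int
    ranks: int  # 1 for "1Rx8" (single rank), 2 for "2Rx8" (dual rank)


def summarize(dimms):
    """Return (total size in GB, total ranks, fits within MAX_RANKS)."""
    total_gb = sum(d.size_gb for d in dimms)
    total_ranks = sum(d.ranks for d in dimms)
    return total_gb, total_ranks, total_ranks <= MAX_RANKS


if __name__ == "__main__":
    single, dual = Dimm(2, 1), Dimm(2, 2)
    configs = {
        "3 single + 1 dual": [single] * 3 + [dual],
        "3 single + 3 dual": [single] * 3 + [dual] * 3,
    }
    for name, cfg in configs.items():
        gb, ranks, ok = summarize(cfg)
        print(f"{name}: {gb} GB, {ranks} ranks, within limit: {ok}")
```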

Nonetheless, thanks for providing these hints; as a starting point they really help a lot. For now I will try the other suggestions, in the hope that they give me further clues about what is going on.

19 May 2022 20:16 #243360 by Apollo
Dear arvidb,

Thanks for this suggestion. I will first try to retrieve the log from the previous boot to see if it exists. If not, I will need to do some more setting up, as unfortunately this HP workstation is the only Linux machine I have (I know this might be a delicate subject, but all my professional PC work is on Windows machines). I guess it is also possible to connect to the HP LinuxCNC workstation from a Windows machine, but I will need to dig into this, as I have no clue how to set it up at the moment. Luckily, a lot can be found on the internet. If I manage to get a log of the crash, I will probably come back to ask for some advice on how to interpret its content, and try new things based on that.

19 May 2022 20:32 #243361 by Apollo
Dear andypugh,

Unfortunately I am not able to determine whether the motion part is still running, as I have not yet succeeded in setting up the MESA 7i76e + driver + servo to the point where the motor should actually turn (I did tune the servo + driver with the software tool from CNCdrive, but have not connected the step/dir signals to the MESA board yet). I first wanted to see whether the MESA 7i76e board works with LinuxCNC.

Your remark about the wireless keyboard made me realize that for the (wired USB) mouse and keyboard I use the USB ports on the front of the Z400 housing (which have a separate connection to the motherboard), while there are also USB ports mounted directly on the motherboard, on the back of the housing. I will quickly check whether connecting the mouse and keyboard there makes any difference (probably not, but it is easy to check).

When the crash happens, the complete GUI freezes: nothing changes anymore, and the system no longer reacts when I click any button. The only thing that still responds is the mouse pointer (the small arrow); when I move the mouse, the arrow moves as it should. So I can still hover over menus or buttons on the screen, but the system does not react. The graphical view does not react either (I cannot pan, zoom, rotate, etc.). I hope this helps to explain what is going on.

19 May 2022 20:41 #243362 by Apollo
Dear all,

Based on the feedback above, I have some new ideas on what I could try to find the cause of this crash. Let me give a bit more information on the PC; maybe I am just making a beginner's error. The Z400 has two hard disk drives and one SSD installed. One hard disk holds an existing Windows installation (which I can select from a boot menu when I start the machine), the other hard disk only holds (old) data, and LinuxCNC is installed on the SSD.

Could the way I did this installation cause the issue? Would it help to remove the hard disk drives? Or is it perhaps best to install LinuxCNC on an ordinary hard disk drive rather than an SSD?

19 May 2022 22:00 #243365 by tommylight

Could the way I did this installation cause the issue? Would it help to remove the hard disk drives? Or is it perhaps best to install LinuxCNC on an ordinary hard disk drive rather than an SSD?

Not really. I doubt it. It does not matter.
