Serious Hard Drive memory leak bug in Debian

07 Aug 2014 19:09 #49575 by akb1212
I would like to report a bug I have encountered when using LinuxCNC 2.6 on Debian Wheezy. I used the hybrid ISO file downloaded from LinuxCNC.org to install.

It's installed on the computer mentioned in this thread: linuxcnc.org/index.php/english/forum/39-...of-2-avail-pci-slots

It's a memory leak that is filling up the HD to a point where it's no longer possible to log in to the system in a normal way.

I left the HD with 2.5.4 installations I had from earlier in the computer and installed a 400Gig HD I had lying around to put the new version on. It had a 330Gig partition that was wiped and the new system was installed on that space. Grub configured everything to work, and I was able to boot in to all versions of Linux I had on the system, on the other HD installed.

I'm having problems getting Pncconf to make configurations that work, so I have been doing a lot of testing with different .hal and .ini files to try to work out what's wrong.

Then at some point after a few days of work like that I was no longer able to start the Debian Linux installation!
In the text on the screen while booting up one of the lines was saying:
[warn] Root file system has insufficient free space: mounting tmpfs on /tmp ... (warning)

I had to use my mobile camera to take a photo of the screen before it disappeared to be able to read it. If needed I can post this picture, although it's a bit blurry. So if the server compresses it I don't think it will be readable.

After this the computer only shows a blinking cursor in the upper left-hand corner; nothing more happens.

I was able to start a console by pressing CTRL + ALT + F1. But console is of no use to me, so I can't do much from there. I tried to start X by typing startx, but it didn't work.
I tried to start it in recovery mode, and from that it worked to issue startx. I was then able to look around, but I didn't know what to do. In retrospect I could have used that to remove the offending file.

The only other idea I had was to try a reinstall. I decided to reinstall on top of the previous install without wiping it. After all, there were files in there I had been working on for days and I didn't want to lose them.
And that made it possible for me to start up Debian Linux again. Everything was back as before the problem. For a while, that is....

At one point (before I reinstalled) I tried to figure out why it was saying there is no space by finding out how much free space there was on the disks. But this is where I found Linux is not very user friendly. None of the ways I know from Windows gave me a clue of how much space was free on the HD. I expected to find something in properties, but no..... So in my case I blame Linux' poor file handling user interface. I find it hugely lacking, and I have found there is no way I will be using Linux for any other tasks than LinuxCNC. I'm sorry to say this, but it's my honest opinion. File handling in general is a major pain! And it shouldn't have to be.

After the reinstall I was able to continue testing, and did that for a while. Then I left my computer running last night with everything still working. When I came back this morning, I did a small edit on an .ini file and tried to save it, but it refused to save. The message said it was unable to save, with no reason or anything (I don't remember the exact message). Other things also didn't work. There were 2 updates available, so I tried to install them, but this also failed. It wasn't able to read file permissions or something like that; I don't remember exactly.

So the only thing I knew I could do was to reboot. And again it failed to start Linux. Only a blinking cursor in the upper left corner.....

So I started it up in Ubuntu 10.04. This install has the Teamviewer application installed on it so I can remote control it (so I can sit in my living room and monitor the mill in my garage). I expected to have to reinstall again, but this time I figured I needed to install another HD. So I wanted to copy out the files I had been working on. Teamviewer has a file transfer module, and I started that. I figured it was best to copy my whole home folder over to my Windows laptop (where I'm able to be in control as I want to).
Then it told me that was 260-something Gig of data! One file named .xsession-errors.old was all but about 2 Gigs of that. So I figured that file might be the cause of all of this! So I used the Teamviewer file manager to delete that file.
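Editor's sketch: the oversized file above was found more or less by accident through a file-transfer tool. A minimal way to look for the same thing directly is to walk the home folder and list the largest files. The function name `largest_files` and its parameters are illustrative assumptions, not part of LinuxCNC or Debian.

```python
import heapq
import os
import stat


def largest_files(root, count=10):
    """Return the `count` largest regular files under `root` as (size_bytes, path) pairs."""
    sizes = []
    # os.walk also visits hidden files and folders (names starting with a dot).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so linked files are not counted twice.
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode):
                sizes.append((info.st_size, path))
    # Largest first.
    return heapq.nlargest(count, sizes)
```

Calling `largest_files(os.path.expanduser("~"))` and printing the result would put a file like .xsession-errors.old at the top of the list if it has grown out of control.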

Then I did a reboot, and now my Debian Linux install started as normal again!

So there is a huge memory leak bug that is filling up the HD in this install! It is filling it up very fast, taking only a few days to fill up my whole 330Gig partition.

I think my extensive testing of .ini and .hal files might be the cause of this. I have set the debug level in LinuxCNC high in at least one of the configurations. And it's writing quite a bit of data each time LinuxCNC is started (or fails while trying to start).
I thought backup files like these were purged regularly to prevent fatal errors like this. But with this install ISO that's not the case!

I don't know if this is the correct place to post a bug report like this. And I'm not sure what I need to include. So if I need to do more or post files please inform me of what I need to do. The implication of this bug is so severe it should be given high priority. And I'd like to do what I can to help.

Anders

07 Aug 2014 19:41 #49576 by Todd Zuercher
I am no Linux expert, but I am pretty sure you should look at what is in that giant error log that's filling your HD, to see what is polluting it, then post back with that info for someone (not me) to be able to help you.

07 Aug 2014 20:23 - 07 Aug 2014 20:24 #49577 by ArcEye
Hi

The error is not a memory leak; it is an error log that is not being kept under control, by the look of it.

First thing, turn off debugging in all your configs, unless you know quite a lot about Linuxcnc internals, it is not going to tell you anything useful.
The size of the linuxcnc.log can be controlled by rsyslog
www.rsyslog.com/doc/log_rotation_fix_size.html

However this is not about Linuxcnc, or at least not directly; this .xsession-errors log problem is quite common.
It can be caused by a variety of things. As Todd says, we need to know what is in the log to have any idea what that might be.

There are many threads on the net, just google xsession-error huge and you will get these and many more
debian-user.blogspot.co.uk/2008/01/giant...logs-fill-drive.html
ubuntuforums.org/showthread.php?t=1946716
vsido.org/index.php?topic=279.0

The top one gives a Debian specific fix which zeros the file each boot
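Editor's sketch: the linked fix itself is not reproduced here. As a rough illustration of the same idea (emptying the log once it gets too big), the following assumes a hypothetical helper `truncate_if_larger` and a size limit you choose yourself. It is not the method from the linked page.

```python
import os


def truncate_if_larger(path, max_bytes):
    """Empty the file at `path` if it is larger than `max_bytes`.

    Returns True if the file was emptied, False otherwise.
    """
    try:
        size = os.path.getsize(path)
    except OSError:
        return False  # no such file, or it cannot be read: nothing to do
    if size <= max_bytes:
        return False
    # Opening in "w" mode truncates the existing file to zero length
    # instead of deleting it and creating a new one.
    with open(path, "w"):
        pass
    return True
```

For example, `truncate_if_larger(os.path.expanduser("~/.xsession-errors"), 50 * 1024 * 1024)` would empty the log only once it passes roughly 50 MB. Where and when to run such a check (at login, at boot, from cron) is left open here.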

As far as Linux not being user friendly, you are entitled to your opinions, but mine is windoze is a crock of shite.
Windows users have been spoon fed on some graphical file manager for so long that they have no idea how to use a terminal or the commandline.

To find out how much disk space is free, use df
To find out how much memory is free, use free, etc etc
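Editor's sketch: for anyone who would rather script the disk check than read `df` output, Python's standard library exposes similar numbers through `shutil.disk_usage`. The wrapper name `disk_report` is an illustrative assumption.

```python
import shutil


def disk_report(path="/"):
    """Return (total, used, free) in GB (10**9 bytes) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    gb = 1e9
    return usage.total / gb, usage.used / gb, usage.free / gb


total, used, free = disk_report("/")
print(f"total {total:.1f} GB, used {used:.1f} GB, free {free:.1f} GB")
```

Passing the home folder instead of "/" reports on whichever partition holds it, which matters when /home is on a separate partition.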

If you are using samba or similar for a windows share, that is another possible source of the errors.

regards
Last edit: 07 Aug 2014 20:24 by ArcEye.

08 Aug 2014 07:20 #49590 by akb1212
You are right, it's not a memory leak in RAM. A hard-drive-space-consuming bug might be a more precise term for it? But to me a leak in the available HD space is equally problematic.

And having this kind of thing happening for someone not that familiar with Linux is quite scary!

And I'm aware this isn't a LinuxCNC error as such, more a Linux in general error. But this time it seems to be caused by LinuxCNC.

I took a look at the file now, and now it's only 4k in size. So it seems it's behaving now. And since it's not misbehaving now I don't know if it's useful to post it? The one that got big was deleted to be able to load the system again, so I don't know how to access any remaining parts of that.

It's odd though... I thought all the error and configuration messages I would get came from dmesg. But there are other messages in that file that are of use (they explain details of what is happening during program loading that aren't shown by dmesg). The content of dmesg is the info that is given each time LinuxCNC crashes. But I learned that this file (and probably the one with no .old extension as well) actually has info that might help while debugging, beyond what dmesg gives. So something good also came out of this. At least for me, as I found another source of info I'm trying to get my head around.

BTW, the config file where debugging is turned on is the one I found here on this forum a while back. The comments in it state it's derived from Ted Hyde's original hm2-servo config. And now it's the only one I'm able to make run. So I'm using that as a reference and trying to work out why the configs I get with Pncconf don't work.
This config is able to find my 7i77 card (I have tested it and found it's putting out voltage on the analogue out) even though it's installed on P2 (and I have reprogrammed my 5i25 to expect the 7i77 on this port). This is how I want it to be wired. But I had no idea that small difference should give me this much extra work! If I had known I wouldn't even consider it.

I'm trying to work out what is preventing the other configs from looking for it on this port (although I'm really not even sure if this is what's making it crash). But I haven't been able to find which entry in the .ini or .hal files controls this. Is this controlled by any of these files, or is this info taken from somewhere else? In that case I'd appreciate knowing where. Nothing seems to control this in the config that is working, as this config also worked when I had the 7i77 on the P3 connector. So it doesn't seem to differentiate.

I know I'm harsh with my comments on Linux' user friendliness. But I'm not a programmer. And I have come to understand that programmers tend to think differently about things like this. And unfortunately for me Linux is still a platform made by programmers with their own ways of thinking in mind when deciding on how things should work.

Anyway, I just wanted to show other users of LinuxCNC that this can happen (and what to do in case they experience it themselves). I have been on this forum and reading about LinuxCNC for many years, but have never heard of this bug. I did find out how to fix it myself, so I posted this mostly in the interest of others. And in case it was a bug that wasn't known.

I will monitor this file (and read it to search for clues on why some of my configs crash) to see if anything like this happens again. Is there a way to read huge files like this? In case it gets huge again?

No samba, remote desktop with Teamviewer is all I need. This computer will only be used for LinuxCNC.

Thanks for clarifying! Now I know I don't need to fear this, and that it's relatively easy to prevent if it keeps happening.

Anders

08 Aug 2014 13:35 - 08 Aug 2014 19:35 #49592 by ArcEye
Hi

I can sympathise with your struggles with the Mesa cards, they are very powerful but not easy to get your head around.

If you frame a specific query on the Boards section, I'm sure PCW will assist (he makes them)

No samba, remote desktop with Teamviewer is all I need. This computer will only be used for LinuxCNC.


Anything that involves remoting the X display could be a factor. Attach the error file anyway; there could be a clue in there for the future.

There is a huge amount to Linux and no-one knows or can remember it all.

Google-fu is often your best skill, you don't need to know everything, just how to find out.

regards

PS

Didn't see this question initially

Is there a way to read this kind of huge files? In case it gets huge again?


split - look it up on google or do man split in a terminal

Will break it down into files of a specified size. You should not have to look too far, this is liable to be the same error output repeated over and over again.

In extreme circumstances where disk space is totally used up, you can redirect the output of split to another medium, say a usb stick, but that is probably getting too involved.
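Editor's sketch: since a runaway log is likely to be the same message repeated over and over, another option besides `split` is to count which lines occur most often without opening the whole file in an editor. The helper `most_common_lines` and its default limits are illustrative assumptions.

```python
from collections import Counter
from itertools import islice


def most_common_lines(path, max_lines=1_000_000, top=5):
    """Return the `top` most frequent lines among the first `max_lines` lines of `path`."""
    counts = Counter()
    # errors="replace" keeps going if the log contains bytes that are not valid UTF-8.
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        # The file is read one line at a time, so it is never loaded into memory whole.
        for line in islice(log, max_lines):
            counts[line.rstrip("\n")] += 1
    return counts.most_common(top)
```

The result is a list of (line, count) pairs. If one message dominates, that line is the one worth posting or searching for.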
Last edit: 08 Aug 2014 19:35 by ArcEye.

08 Aug 2014 19:21 #49596 by cncbasher
Perhaps if you archive your config folder and post it, we can help.
What errors are you getting? Looking at dmesg from a terminal, or starting linuxcnc from a terminal window, will give many answers.
What version of Debian or live CD are you using, and which version of linuxcnc?
