What's all this about, Fatty?
The damned Linux PC has been running in the background, on and off, for the past month or so. I'm doing this to try to convince myself that it's robust, having seen a number of worrying issues during that time.
Generally that's involved the screen, mouse and keyboard randomly freezing. This can happen at any moment and the only remedy is a hard reboot. Not quite what you hope for in a CNC controller really.
It doesn't seem to be associated with any particular operation or program - or indeed any actual input from the user. True, it's happened while I've been using it but equally it's more often happened when it's been left to run by itself.
I'm no Linux power user by any means, so what can I do to troubleshoot this? What's available by way of tools?
Crash logs?
Yes, Linux creates crash logs that might give some insight, but when I looked at them they were (for me) overwhelmingly detailed and voluminous. The text output also looked pretty inconsistent, i.e. as if it had been written by a whole host of authors. I'm guessing each module of code comes with its own debug messages. So I deleted mine, so that any new messages would be easier to spot.
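One way to make the log mountain less overwhelming is to count the suspicious-looking lines per source rather than read them all. The following is only a rough sketch of that idea in Python, not a proper log analyser. It assumes the log text is in the traditional syslog layout ("Sep  1 09:00:00 hostname source[pid]: message"), and the keyword list and the name `summarise` are my own inventions for illustration.

```python
import re
import sys
from collections import Counter

# Assumed layout: "Sep  1 09:00:00 hostname source[pid]: message"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+([^\s:\[]+)(?:\[\d+\])?:\s*(.*)$"
)

# Hypothetical keyword list; adjust to taste.
KEYWORDS = ("error", "fail", "hang", "timeout", "oops", "panic")


def summarise(lines):
    """Count lines whose message contains a keyword, grouped by source."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not in the assumed layout
        source, message = match.groups()
        if any(word in message.lower() for word in KEYWORDS):
            counts[source] += 1
    return counts


# Only runs when a log file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log_file:
        for source, count in summarise(log_file).most_common():
            print(f"{count:6d}  {source}")
```

To try it, save the previous boot's log with `journalctl -b -1 > lastboot.txt` (this only works if the journal is kept across reboots, which I haven't checked on this machine) and pass `lastboot.txt` to the script. A copy of `/var/log/syslog` should work the same way.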
Update Manager - was that the answer?
These crashes could happen minutes or even a day or more apart. In line with good / obvious practice, it seemed sensible to do an(other) system update before getting more serious about this. This happened 2 days ago and in itself caused a crash towards the end of the update process, followed by another crash after restarting. However, since then, I've been unable to provoke any further misbehaviour.
Fixed, Fatty?
Can it really have been fixed by that last update? The problem with intermittent faults is that you can never be 100% certain you've fixed them. Another failure could happen seconds after you celebrate the problem being "fixed", particularly if you didn't actually find a clear "smoking gun" root cause and implement a convincing corrective action.
Stress testing!
Yes, let's try to stress test the system by loading it up with a range of programs and then running it flat out for several days.
- Run LinuxCNC Axis Sim with an endless program loop.
- Stream music videos on YouTube
- Gobble up more memory with Visual Studio Code
- Run the "stress" program from the Linux command prompt from time to time
- Report system status (up time, core temp etc)
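For the last item, a small script can append a status line to a file every few minutes, so that after a freeze the last line shows roughly when the machine died and how hot and busy it was. This is a minimal sketch in Python, assuming the usual Linux `/proc` and `/sys/class/thermal` files are present; the function names and the output layout are made up for illustration.

```python
import glob
import sys
import time


def format_uptime(seconds):
    """Format seconds as e.g. '1d 19h 02m'."""
    minutes = int(seconds) // 60
    days, remainder = divmod(minutes, 24 * 60)
    hours, mins = divmod(remainder, 60)
    return f"{days}d {hours:02d}h {mins:02d}m"


def read_first_field(path):
    """Return the first whitespace-separated field of a small text file."""
    with open(path) as handle:
        return handle.read().split()[0]


def snapshot():
    """Build one status line: timestamp, uptime, 1-minute load, hottest zone."""
    uptime = float(read_first_field("/proc/uptime"))
    load_1min = read_first_field("/proc/loadavg")
    temps = []
    for path in sorted(glob.glob("/sys/class/thermal/thermal_zone*/temp")):
        try:
            # The kernel reports these values in thousandths of a degree C.
            temps.append(int(read_first_field(path)) / 1000.0)
        except (OSError, ValueError, IndexError):
            pass  # some zones cannot be read; ignore them
    hottest = f"{max(temps):.1f}C" if temps else "n/a"
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} up {format_uptime(uptime)} load {load_1min} temp {hottest}"


# Only runs when a log file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "a") as log_file:
        log_file.write(snapshot() + "\n")
```

Run it from cron (or a simple loop) with a file name of your choice as the argument. Because each line is appended and the file is closed straight away, most of the history should survive a hard reboot.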
1st Sept, 09:00
Fan Speeds (RPM): N/A
Repos: No active apt repos in: /etc/apt/sources.list
Active apt repos in: /etc/apt/sources.list.d/additional-repositories.list
1: deb http://cnc.beaglebrainz.net/mintcnc/ bionic 2.8-rtpreempt
Active apt repos in: /etc/apt/sources.list.d/hardkernel-ppa-bionic.list
1: deb http://ppa.launchpad.net/hardkernel/ppa/ubuntu bionic main
Active apt repos in: /etc/apt/sources.list.d/kelebek333-kablosuz-bionic.list
1: deb http://ppa.launchpad.net/kelebek333/kablosuz/ubuntu bionic main
Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list
1: deb http://packages.linuxmint.com tricia main upstream import backport #id:linuxmint_main
2: deb http://archive.ubuntu.com/ubuntu bionic main restricted universe multiverse
3: deb http://archive.ubuntu.com/ubuntu bionic-updates main restricted universe multiverse
4: deb http://archive.ubuntu.com/ubuntu bionic-backports main restricted universe multiverse
5: deb http://security.ubuntu.com/ubuntu/ bionic-security main restricted universe multiverse
6: deb http://archive.canonical.com/ubuntu/ bionic partner
Active apt repos in: /etc/apt/sources.list.d/vscode.list
1: deb [arch=amd64,arm64,armhf] http://packages.microsoft.com/repos/code stable main
Info: Processes: 217 Uptime: 1d 19h 02m Memory: 7.50 GiB used: 2.19 GiB (29.2%) Init: systemd
v: 237 runlevel: 5 Compilers: gcc: 7.5.0 alt: 7 Client: Unknown python3.6 client
inxi: 3.0.32
- I suspect I didn't have the "microcode" installed until I ran the Update Manager. This includes (comprises?) the i915 graphics driver.
- I've disabled the Intel HD graphics "PSR" (Panel Self Refresh) implementation that came with the "microcode", as it seems to have an "open bug".
- I've also disabled that KWallet password app that was causing errors.
- No crashes since then, 4 days later, with LinuxCNC and YouTube running continuously.
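Since intermittent faults make it hard to be sure a change actually took effect, it may be worth checking that the PSR setting is really in force after a reboot. The sketch below assumes PSR was turned off with the `i915.enable_psr=0` kernel boot parameter, which is one common way of doing it and not necessarily how it was done on this machine. The helper names are hypothetical.

```python
import os


def kernel_param(cmdline, name):
    """Return the last value given for `name` on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            value = val  # later occurrences override earlier ones
    return value


def psr_disabled(cmdline):
    """True if the command line carries i915.enable_psr=0 (assumed method)."""
    return kernel_param(cmdline, "i915.enable_psr") == "0"


# On Linux, report what the running kernel was actually booted with.
if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as handle:
        print("PSR disabled on kernel command line:", psr_disabled(handle.read()))
```

If the option was set through a modprobe configuration file instead, it won't appear on the kernel command line. In that case the live value can usually be read (as root) from `/sys/module/i915/parameters/enable_psr`.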