Tuesday, March 3, 2009

Weird crash with linux 2.6.26

I've had a rather funny crash for the second time now. It goes like this: the interface freezes (it processes events extremely slowly) and it is nearly impossible to do anything (though apparently everything still works, just extremely slowly!). The kernel keeps on saying this:

Mar  3 18:56:50 tanyaivinco kernel: [ 2385.443069] hdd: status timeout: status=0xd0 { Busy }
Mar  3 18:56:50 tanyaivinco kernel: [ 2385.443069] ide: failed opcode was: unknown
Mar  3 18:56:56 tanyaivinco kernel: [ 2393.777714] hdd: status timeout: status=0xd0 { Busy }
Mar  3 18:56:56 tanyaivinco kernel: [ 2393.777714] ide: failed opcode was: unknown
Mar  3 18:57:01 tanyaivinco kernel: [ 2398.982260] hdd: status timeout: status=0xd0 { Busy }
Mar  3 18:57:01 tanyaivinco kernel: [ 2398.982260] ide: failed opcode was: unknown

One of the funny things is that this time, like the previous one, the problem started with this kernel message, for which I could find no documentation:

Mar  3 18:56:35 tanyaivinco kernel: [ 2367.012218] Clocksource tsc unstable (delta = 4686267423 ns)

If anyone has ever come across this, or has a clue, I'm interested!

4 comments:

SEJeff said...

Yeah this means the tsc clocksource is unstable.

It is often due to CPU frequency scaling, such as powernowd or Intel SpeedStep, or anything else that adjusts the CPU clock rate.

You just need to change your cpu's clocksource. Go into the computer's BIOS and see if you see something like "HPET" or "High Precision Timers". If so enable it.

Then the linux side of things...
sudo cat /sys/devices/system/clocksource/clocksource0/available_clocksource

You can take any of the available clocksources and echo them into /sys/devices/system/clocksource/clocksource0/current_clocksource.

HPET is the highest precision clocksource and is suggested unless you care about gettimeofday(2) latency (if you do you'll know about this already). If you see hpet in your available_clocksource, do this as root:
# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

Then, to make the change persist across reboots, put this at the very end of your kernel's command line in your grub.conf:
clocksource=hpet

You can also do acpi_pm if hpet isn't available and tsc isn't stable. jiffies is a fallback and isn't very reliable.
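
The preference order described above (hpet first, acpi_pm if hpet is missing) could be scripted roughly as follows. This is only a sketch: the function names pick_clocksource and set_clocksource are made up for illustration, and the sysfs directory is the one quoted earlier in this comment. Writing to current_clocksource requires root.

```shell
#!/bin/sh
# Hypothetical helper: print the preferred clocksource found in a
# space-separated list, trying hpet first and acpi_pm second.
pick_clocksource() {
    for wanted in hpet acpi_pm; do
        for have in $1; do
            if [ "$have" = "$wanted" ]; then
                echo "$wanted"
                return 0
            fi
        done
    done
    return 1
}

# Hypothetical helper: read the available list from a sysfs directory
# (defaulting to the path mentioned above) and select the preferred entry.
set_clocksource() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    choice=$(pick_clocksource "$(cat "$dir/available_clocksource")") || {
        echo "neither hpet nor acpi_pm is available" >&2
        return 1
    }
    echo "$choice" > "$dir/current_clocksource"
}
```

Running set_clocksource as root with no argument would apply the choice to the running system; it does not touch grub.conf, so the clocksource= kernel parameter is still needed to keep the setting after a reboot.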

Let me know if you have any further questions.

---
Jeff Schroeder
http://www.digitalprognosis.com

Vincent Fourmond said...

Wow ! Many thanks ! I'll try as soon as I have an opportunity (not before saturday, I'm afraid). I'll keep you posted here.

Vincent Fourmond said...

For the record, it seems that this bug was due to a hardware problem: one of the IDE CD-ROM drives would simply switch off, which led to wild system instability. The clock wasn't the real problem.

SEJeff said...

Good to know. Thanks for the followup. Either way,