kernel default cpufreq governor fails to take action when system is overheating

Bug #746924 reported by Steve Langasek
This bug affects 3 people
Affects          Status      Importance   Assigned to   Milestone
linux (Ubuntu)   Expired     High         Unassigned
Natty            Won't Fix   High         Unassigned

Bug Description

While running a compile on my laptop, I suddenly got a console message that the system was shutting down for poweroff because the critical temperature had been reached. I expected that, when the warning threshold was reached, the cpu frequency would be lowered to avoid overheating; this doesn't appear to be what happened.

Assigning this to the kernel since AFAIK cpu frequency is now handled by default entirely in the kernel through the ondemand governor, so this doesn't seem to be a userspace policy problem.

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: linux-image-2.6.38-7-generic 2.6.38-7.39
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.38-7.39-generic 2.6.38
Uname: Linux 2.6.38-7-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: CONEXANT Analog [CONEXANT Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: vorlon 2720 F.... pulseaudio
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf2520000 irq 43'
   Mixer name : 'Intel IbexPeak HDMI'
   Components : 'HDA:14f15069,17aa2155,00100302 HDA:80862804,17aa21b5,00100000'
   Controls : 12
   Simple ctrls : 6
Card29.Amixer.info:
 Card hw:29 'ThinkPadEC'/'ThinkPad Console Audio Control at EC reg 0x30, fw 6QHT30WW-1.11'
   Mixer name : 'ThinkPad EC 6QHT30WW-1.11'
   Components : ''
   Controls : 1
   Simple ctrls : 1
Card29.Amixer.values:
 Simple mixer control 'Console',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
Date: Thu Mar 31 17:40:20 2011
HibernationDevice: RESUME=UUID=f6ab3c43-61b4-4af7-bf03-fa3b147a1de0
InstallationMedia: Ubuntu 10.04.1 LTS "Lucid Lynx" - Release amd64 (20100816.1)
Lsusb:
 Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 0a5c:217f Broadcom Corp. Bluetooth Controller
 Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: LENOVO 3249CTO
ProcEnviron:
 LANGUAGE=en_US:en
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-2.6.38-7-generic root=/dev/mapper/hostname-root ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-2.6.38-7-generic N/A
 linux-backports-modules-2.6.38-7-generic N/A
 linux-firmware 1.49
SourcePackage: linux
UpgradeStatus: Upgraded to natty on 2011-03-24 (7 days ago)
WifiSyslog:

dmi.bios.date: 08/23/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 6QET52WW (1.22 )
dmi.board.name: 3249CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6QET52WW(1.22):bd08/23/2010:svnLENOVO:pn3249CTO:pvrThinkPadX201:rvnLENOVO:rn3249CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 3249CTO
dmi.product.version: ThinkPad X201
dmi.sys.vendor: LENOVO

Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Steve Langasek (vorlon) wrote :

This has happened to me at least four more times since I filed this bug.

Changed in linux (Ubuntu):
importance: Undecided → High
Revision history for this message
manuell (manuell) wrote :

I have the same problem.

Revision history for this message
Phillip Susi (psusi) wrote :

This may be a duplicate of bug #751689. Try running the test from that report and see if you get the same results:

stress -c 4 & watch -n 0.5 'cat /proc/acpi/ibm/thermal; grep -E "(speed|level):" /proc/acpi/ibm/fan; grep MHz /proc/cpuinfo'

Does the frequency drop once the temperature rises?

Revision history for this message
Steve Langasek (vorlon) wrote :

No, the frequency does not drop when the temperature rises, on either the maverick or the natty kernel.

Although I never saw this problem when I was running natty, running a machine-stressing compile on a maverick kernel now did get within a couple of degrees of critical before I ^Zed the build. I do have the same problem that Jamie reports of the fans never coming up to full speed; in my case they max out around 3800rpm on 'auto', and about 6350rpm on 'disengaged'. When running at full speed, this is enough to keep the machine 10°C away from the critical trip point.

The last problem is that the acpi thermal thresholds themselves are screwed up. With a single thermal zone on the system, I have:

$ cat /sys/class/thermal/thermal_zone0/trip_point_{0,1}_{temp,type}
100000
critical
127500
passive
$

The 'passive' trip point is higher than the 'critical' trip point. And the cooling devices are all tied to the passive trip point:

$ cat /sys/class/thermal/thermal_zone0/cdev*_trip_point
1
1
1
1
$

So screwy ACPI tables seem to be at least part of the problem.
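
To check another machine for the same inversion, a minimal sketch follows. It assumes the standard sysfs thermal layout shown above (trip_point_N_type and trip_point_N_temp files, temperatures in millidegrees Celsius). The function name check_trip_points is made up for illustration and is not part of any existing tool.

```shell
#!/bin/sh
# Sketch: warn when a thermal zone's passive trip point is not below its
# critical trip point. check_trip_points is a hypothetical helper name.
# Usage: check_trip_points [zone-directory]
check_trip_points() {
    zone=${1:-/sys/class/thermal/thermal_zone0}
    passive=""
    critical=""
    for type_file in "$zone"/trip_point_*_type; do
        [ -r "$type_file" ] || continue
        temp_file=${type_file%_type}_temp
        [ -r "$temp_file" ] || continue
        kind=$(cat "$type_file")
        temp=$(cat "$temp_file")    # millidegrees Celsius
        case $kind in
            passive)  passive=$temp ;;
            critical) critical=$temp ;;
        esac
    done
    if [ -z "$passive" ] || [ -z "$critical" ]; then
        echo "passive or critical trip point not found"
        return 2
    fi
    if [ "$passive" -ge "$critical" ]; then
        echo "inconsistent: passive ${passive} >= critical ${critical}"
        return 1
    fi
    echo "ok: passive ${passive} < critical ${critical}"
}
```

With the values quoted above (critical 100000, passive 127500) this prints "inconsistent: passive 127500 >= critical 100000" and returns 1, which matches the diagnosis that passive throttling could never start before the critical shutdown.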

Revision history for this message
Steve Langasek (vorlon) wrote :

Well, the linked cooling devices are all of type: Processor anyway, so that's not really related to the fan.
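
For reference, the binding between cooling devices and trip points can be listed in one pass. This is only a sketch: it assumes each cdevN entry in the zone directory is the usual sysfs link to a cooling device that exposes a type file, and list_zone_cdevs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print each cooling device bound to a thermal zone, the trip point
# index it is tied to, and its type. list_zone_cdevs is a hypothetical name.
# Usage: list_zone_cdevs [zone-directory]
list_zone_cdevs() {
    zone=${1:-/sys/class/thermal/thermal_zone0}
    for trip_file in "$zone"/cdev*_trip_point; do
        [ -r "$trip_file" ] || continue
        link=${trip_file%_trip_point}   # e.g. .../thermal_zone0/cdev0
        name=${link##*/}                # e.g. cdev0
        trip=$(cat "$trip_file")
        if [ -r "$link/type" ]; then
            kind=$(cat "$link/type")
        else
            kind=unknown
        fi
        echo "$name trip=$trip type=$kind"
    done
}
```

On a zone like the one described here, every line would show trip=1 and type=Processor, i.e. only CPU throttling is attached to the passive trip point and no fan device is listed.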

Revision history for this message
Phillip Susi (psusi) wrote : Re: [Bug 746924] Re: kernel default cpufreq governor fails to take action when system is overheating

On 4/14/2011 2:39 AM, Steve Langasek wrote:
> The last problem is that the acpi thermal thresholds themselves are
> screwed up. With a single thermal zone on the system, I have:

Jamie reported that his was working correctly. I wonder if you are
running a different rev of the bios? Can you see if there is an upgrade
and what rev you are currently running?

> well, the linked cooling devices are all of type: Processor anyway, so
> that's not really related to the fan.

Right; it looks like the fan is normally on hardware auto pilot and that
the hardware just won't ramp it up to full speed. The ACPI tables only
configure the passive trip point to throttle the CPU; fans would be
hooked up to the active trip point if there were one and the ACPI tables
actually complied with the standard.

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Marking this Won't Fix for Natty, but the Oneiric task will remain. We will evaluate a fix for SRU if a viable patch is found.

~JFo

Changed in linux (Ubuntu Natty):
status: Confirmed → Won't Fix
penalvch (penalvch)
tags: added: kernel-therm
tags: added: maverick
Revision history for this message
penalvch (penalvch) wrote :

Steve Langasek, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, as a potential WORKAROUND, could you perform the following at the terminal and report the results:
sudo cpufreq-selector --cpu=0 --governor=powersave && sudo cpufreq-selector --cpu=1 --governor=powersave && sudo cpufreq-selector --cpu=2 --governor=powersave && sudo cpufreq-selector --cpu=3 --governor=powersave && cpufreq-info
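
If cpufreq-selector is not installed, the same workaround can be sketched against the cpufreq sysfs files directly. This assumes the kernel exposes scaling_governor and scaling_available_governors under /sys/devices/system/cpu/cpuN/cpufreq; set_governor is a hypothetical helper name, and the second argument exists only so the function can be pointed at a different directory tree.

```shell
#!/bin/sh
# Sketch: select a cpufreq governor on every CPU through sysfs.
# set_governor is a hypothetical helper name; writing needs root.
# Usage: set_governor GOVERNOR [cpu-root-directory]
set_governor() {
    governor=$1
    cpu_root=${2:-/sys/devices/system/cpu}
    status=0
    for dir in "$cpu_root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        # Skip CPUs whose driver does not offer the requested governor.
        if ! grep -qw -- "$governor" "$dir/scaling_available_governors" 2>/dev/null; then
            echo "skip ${dir}: ${governor} not available" >&2
            status=1
            continue
        fi
        echo "$governor" > "$dir/scaling_governor" || status=1
    done
    return "$status"
}
```

Run as root, `set_governor powersave` followed by `cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor` should show powersave for each CPU. Like the command above, this only limits heat output; it does not address the trip point ordering.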

In addition, could you please test for this with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available? It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please do not test the kernel in the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. As well, please comment on which kernel version specifically you tested.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream', and comment as to why specifically you were unable to test it.

Please let us know your results. Thanks in advance.

Helpful Bug Reporting Links:
https://help.ubuntu.com/community/ReportingBugs#Bug_Reporting_Etiquette
https://help.ubuntu.com/community/ReportingBugs#A3._Make_sure_the_bug_hasn.27t_already_been_reported
https://help.ubuntu.com/community/ReportingBugs#Adding_Apport_Debug_Information_to_an_Existing_Launchpad_Bug
https://help.ubuntu.com/community/ReportingBugs#Adding_Additional_Attachments_to_an_Existing_Launchpad_Bug

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Steve Langasek (vorlon) wrote :

This bug is still present in quantal. Sorry for being so slow to respond.

Haven't tested with the upstream kernel yet. This is not a failure scenario I'm particularly eager to test.

Changed in linux (Ubuntu):
status: Expired → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
penalvch (penalvch)
tags: added: quantal