Battery and thermal info disappear a few minutes after booting on MSI EX600 laptop

Bug #286169 reported by anagor

Affects          Status         Importance   Assigned to      Milestone
Linux            Fix Released   Medium
linux (Ubuntu)   Fix Released   High         Andy Whitcroft

Bug Description

Laptop: MSI EX600, Intel Core 2 Duo T7100 on the Intel 965M chipset.
Upon booting, the battery state is shown normally and the battery info is correct, but after a few minutes of work
cat /proc/acpi/battery/BAT1/state shows
present: no
In addition, cat /proc/acpi/thermal_zone/THRM/temperature shows the correct temperature of around 56C right after booting, but after a few minutes of work it always shows 146C.
Both problems appear at the same time, so I believe they have the same cause.
I've attached the dmesg output, in which you can see that the battery is detected and working.

Oct 19 23:13:21 mark-laptop kernel: [ 15.096408] ACPI: Battery Slot [BAT1] (battery present)
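
One way to catch the moment the readings go wrong is to poll both files and log every change. The following is only a minimal sketch and not part of the original report: the helper names battery_present, thermal_celsius and watch_acpi are hypothetical, and it assumes the usual /proc/acpi layout of "present: yes|no" and "temperature: <N> C".

```shell
#!/bin/sh
# Print the value of the "present:" field (yes/no) from a battery state file.
battery_present() {
    awk '$1 == "present:" { print $2; exit }' "$1"
}

# Print the integer temperature from a thermal_zone temperature file.
thermal_celsius() {
    awk '$1 == "temperature:" { print $2; exit }' "$1"
}

# Poll both files every INTERVAL seconds and print a timestamped line
# whenever either reading changes. Runs until interrupted.
# The default paths are the ones quoted in this report.
watch_acpi() {
    bat=${1:-/proc/acpi/battery/BAT1/state}
    thrm=${2:-/proc/acpi/thermal_zone/THRM/temperature}
    interval=${3:-10}
    last=""
    while :; do
        now="present=$(battery_present "$bat") temp=$(thermal_celsius "$thrm")"
        if [ "$now" != "$last" ]; then
            echo "$(date '+%H:%M:%S') $now"
            last=$now
        fi
        sleep "$interval"
    done
}
```

Calling watch_acpi with no arguments uses the paths above and a 10 second interval; the timestamps can then be matched against dmesg and the lshal output.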

Also, lshal -m shows the following:

23:16:54.113: computer_power_supply_battery_BAT1 property battery.voltage.current = 12510 (0x30de)
23:17:24.106: computer_power_supply_battery_BAT1 property battery.voltage.current = 12508 (0x30dc)
23:17:54.111: computer_power_supply_battery_BAT1 property battery.voltage.current = 12509 (0x30dd)
23:18:24.111: computer_power_supply_battery_BAT1 property battery.voltage.current = 12508 (0x30dc)
23:18:54.107: computer_power_supply_battery_BAT1 property battery.voltage.current = 12509 (0x30dd)
23:20:24.108: computer_power_supply_battery_BAT1 property battery.voltage.current = 12508 (0x30dc)
23:20:54.107: computer_power_supply_battery_BAT1 property battery.voltage.current = 12509 (0x30dd)
23:21:54.107: computer_power_supply_battery_BAT1 property battery.voltage.current = 12508 (0x30dc)
23:22:54.111: computer_power_supply_battery_BAT1 property battery.charge_level.percentage = 96 (0x60)
23:22:54.116: computer_power_supply_battery_BAT1 property battery.charge_level.current = 45792 (0xb2e0)
23:22:54.118: computer_power_supply_battery_BAT1 property battery.reporting.current = 4240 (0x1090)
23:22:54.120: computer_power_supply_battery_BAT1 property battery.voltage.current = 37008 (0x9090)
23:23:24.006: computer_power_supply_battery_BAT1 property battery.charge_level.percentage = 0 (0x0)
23:23:24.009: computer_power_supply_battery_BAT1 property battery.charge_level.design = 48000 (0xbb80)
23:23:24.012: computer_power_supply_battery_BAT1 property battery.charge_level.last_full = 43760 (0xaaf0)
23:23:24.014: computer_power_supply_battery_BAT1 property battery.charge_level.current = 0 (0x0)
23:23:24.017: computer_power_supply_battery_BAT1 property battery.reporting.current = -1 (0xffffffff)
23:23:24.018: computer_power_supply_battery_BAT1 property battery.reporting.rate = -1 (0xffffffff)
23:23:24.019: computer_power_supply_battery_BAT1 property battery.voltage.current = 10000 (0x2710)

As you can see, after a few minutes of work the battery status gets requested, and after that the battery is lost.
If I try to reload the battery module, dmesg shows that the battery is absent. ACPI also no longer detects whether the AC adapter is plugged in.
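
To pin down when the readings collapse in a longer lshal -m capture, the log can be filtered mechanically. This is a minimal sketch, assuming the output was saved to a file and has the same line layout as the lshal lines quoted in this report; first_zero_charge is a hypothetical helper name.

```shell
# Print the timestamp of the first lshal -m line that reports
# battery.charge_level.percentage = 0, or nothing if there is none.
first_zero_charge() {
    awk '$3 == "property" &&
         $4 == "battery.charge_level.percentage" &&
         $6 == "0" {
             sub(/:$/, "", $1)   # drop the trailing colon from the timestamp
             print $1
             exit
         }' "$1"
}
```

For example, after saving the capture with lshal -m > lshal.log, running first_zero_charge lshal.log on the lines quoted in this report would print 23:23:24.006.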

The kernel is 2.6.27-7-generic #1 SMP Fri Oct 17 22:24:30 UTC 2008 x86_64 GNU/Linux.
If I boot the previous 2.6.27-6 kernel, everything works fine.
If you need any other information, I will be happy to provide it.
If needed, I can check or install other kernels, or recompile with specific options, and with a little help I can apply patches to the kernel.

anagor (anagor) wrote :

What bugs me is that the battery is detected all right during boot.
On a side note: as long as the battery status keeps changing, the reported state stays correct. For example, if I work on battery to drain it a little and then plug the AC in, the status is correct while the battery is charging, but a few minutes after it is fully charged, when the state no longer changes, the battery state disappears.

Leann Ogasawara (leannogasawara) wrote :

Hi anagor,

Which specific 2.6.27-6 kernel version was working, and with which specific 2.6.27-7 kernel did this begin failing? For example, was it working in, say, 2.6.27-6.9 and did it begin failing with, say, 2.6.27-7.10? You can find the specific version by running 'cat /proc/version_signature'. That can at least help narrow down where a patch that causes this regression may have been introduced. The changelog can be seen at:

https://edge.launchpad.net/ubuntu/+source/linux/

The best approach would likely be a git bisect to narrow down the offending patch. Would you be comfortable doing a git bisect (http://www.kernel.org/doc/local/git-quick.html#bisect)? I've previously tried to outline performing a git bisect on an Ubuntu kernel in another bug report, https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/273266/comments/6 , so that may help. Let me know if you have any questions. Thanks.
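
For reference, a bisect session between a working and a failing kernel might look like the sketch below. It is only an outline under assumptions: start_bisect is a hypothetical wrapper, and the tag names in the usage comment are guesses at how the uploads are tagged in the Ubuntu tree, so list the real tags first.

```shell
# Start a bisect between a known-good and a known-bad revision.
# Must be run inside the kernel git tree.
start_bisect() {
    good=$1
    bad=$2
    git bisect start &&
        git bisect bad "$bad" &&
        git bisect good "$good"
}

# Example (tag names are assumptions, verify with: git tag -l):
#   start_bisect Ubuntu-2.6.27-6.9 Ubuntu-2.6.27-7.10
# Then build and boot the revision git checks out, and report the result:
#   git bisect good    # battery and temperature stay readable
#   git bisect bad     # battery state shows "present: no"
# Repeat until git names the first bad commit, then run: git bisect reset
```

Each round halves the number of candidate commits, so even a few hundred commits between two uploads need fewer than ten build-and-boot cycles.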

Changed in linux:
status: New → Incomplete
anagor (anagor) wrote :

Hi Leann,
The first part of your question is easy to answer: in 2.6.27-6.9, which is the last of the 27-6 series, it was still working.
But on the very first of 27-7, which was 2.6.27-7.10 I think, it wasn't, and it still doesn't work on 2.6.27-7.13.
So I'm pretty sure it was some patch introduced in the 2.6.27-7.10 version.
As to the second part of your question, I can try to do the git bisect myself. It will probably take me until the beginning of next week, but I will try nevertheless.
However, should you prefer a quicker solution, I suspect that the offending patch is this one:
linux (2.6.27-7.10) intrepid; urgency=low

  [ Alexey Starikovskiy ]

  * SAUCE: ACPI: EC: do transaction from interrupt context
    - LP: #277802

you can see the original here:
http://lkml.org/lkml/2008/10/11/43

I suspect this patch because I know that on this line of MSI laptops the BIOS EC region is not standards compliant, as you can see in the following:
http://bugzilla.kernel.org/show_bug.cgi?id=9697
and
http://bugzilla.kernel.org/show_bug.cgi?id=9627

So there is a good chance that this specific patch by Alexey Starikovskiy is the problematic one, as it is the only one I saw that changes the ACPI EC directly.
So if you are not too busy :) and have the opportunity to create a kernel without this patch before I do, please let me know and I will try it.
Thanks a lot.

Leann Ogasawara (leannogasawara) wrote :

Hi anagor,

Let me try to build a .deb for you to install with that patch removed so you can test it. Thanks.

Leann Ogasawara (leannogasawara) wrote :

Ok, here's a .deb of the kernel to try.

anagor (anagor) wrote :

Ohh, I'm sorry, but this is i386 and I need a 64-bit one :(
If it would be quicker for you to attach the source, I will recompile it myself rather than have you compile and upload it again.

Leann Ogasawara (leannogasawara) wrote :

Ok, that'll work better since I don't have a 64bit arch machine here. I've attached the patch that we want to try reverting. Basically do the following:

$ sudo apt-get install git-core linux-kernel-devel fakeroot build-essential makedumpfile
$ git clone git://kernel.ubuntu.com/ubuntu/ubuntu-intrepid.git
$ cd ubuntu-intrepid
$ patch -p1 -R < 286169.patch
$ CONCURRENCY_LEVEL=2 AUTOBUILD=1 NOEXTRAS=1 fakeroot debian/rules binary-generic skipabi=true

Once that finishes building, there should be something like linux-image-2.6.27-7-generic_2.6.27-7.14_i386.deb (with an _amd64 suffix instead of _i386 on a 64-bit build) one directory level above the ubuntu-intrepid directory.
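
The architecture suffix in the package file name follows the build machine rather than the example name. As a small sketch, the expected suffix can be derived from the machine name printed by uname -m. The helper name deb_arch is hypothetical, and only the two architectures discussed in this report are mapped.

```shell
# Map a machine name as printed by `uname -m` to the Debian/Ubuntu
# architecture string used in .deb file names.
deb_arch() {
    case $1 in
        x86_64)              echo amd64 ;;
        i386|i486|i586|i686) echo i386 ;;
        *)                   echo unknown; return 1 ;;
    esac
}
```

From inside the ubuntu-intrepid directory, ls ../linux-image-*_"$(deb_arch "$(uname -m)")".deb would then list the freshly built image package. On a Debian or Ubuntu system, dpkg --print-architecture reports the same string directly.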

The following wiki page also describes building the Ubuntu kernel source: https://help.ubuntu.com/community/Kernel/Compile

Hope that helps. Thanks.

anagor (anagor) wrote :

Well, I'm happy to say that it works. Without this patch, the kernel correctly detects the GPE storm and starts working in polling mode instead of interrupt mode.

Oct 24 23:29:13 mark-laptop kernel: [ 0.600610] ACPI: EC: non-query interrupt received, switching to interrupt mode
Oct 24 23:29:13 mark-laptop kernel: [ 0.608051] ACPI: EC: GPE storm detected, disabling EC GPE
Oct 24 23:29:13 mark-laptop kernel: [ 1.108037] ACPI: EC: missing confirmations, switch off interrupt mode.
Oct 24 23:29:13 mark-laptop kernel: [ 1.153390] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
Oct 24 23:29:13 mark-laptop kernel: [ 1.153390] ACPI: EC: driver started in poll mode

So the battery and temperatures are detected correctly and everything is working ok.
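
When comparing several test kernels, the same check can be scripted instead of reading dmesg by eye. A minimal sketch, assuming the EC messages keep the wording quoted in this report; the helper names ec_mode and ec_gpe_storm are hypothetical.

```shell
# Read kernel log text on stdin and print the mode the ACPI EC driver
# started in ("poll" or "interrupt"), taken from the last
# "ACPI: EC: driver started in <mode> mode" message.
ec_mode() {
    sed -n 's/.*ACPI: EC: driver started in \([a-z]*\) mode.*/\1/p' | tail -n 1
}

# Succeed if the log on stdin contains the GPE storm message.
ec_gpe_storm() {
    grep -q 'ACPI: EC: GPE storm detected'
}
```

Typical use would be dmesg | ec_mode. On this laptop, a result of poll together with a detected GPE storm corresponds to the working case described above.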

Leann Ogasawara (leannogasawara) wrote :

Thanks anagor. So the patch in question was actually cherry-picked from the upstream kernel, but I'll notify the kernel team of the regression it introduced. However, I'm curious whether you'd be willing to test the upstream vanilla kernel to verify that this issue exists upstream as well and that reverting the patch resolves it there. Information on how to build the upstream kernel from git can be found at https://wiki.ubuntu.com/KernelTeam/GitKernelBuild . Thanks.

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → High
status: Incomplete → Triaged
anagor (anagor) wrote :

I will try with the vanilla kernel. Though I'm sure of the outcome, I also know that the kernel team won't take the bug seriously enough until it is shown that the vanilla kernel behaves the same way.

Thanks.

anagor (anagor) wrote :

Ok, I've tried the vanilla kernel, which is currently 2.6.28-rc2.
Sure enough, the problem exists there. Here is the relevant excerpt from dmesg:

[ 0.156016] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 0.172212] PCI: MCFG area at e0000000 reserved in ACPI motherboard resources
[ 0.185424] PCI: Using MMCONFIG at e0000000 - efffffff
[ 0.193727] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[ 0.193727] ACPI: EC: driver started in interrupt mode

The full dmesg.log is attached.
Also, I booted this kernel without any restricted drivers (nvidia, for example), so the kernel is not tainted.
So the problem is the patch mentioned above, and it still exists in vanilla 2.6.28-rc2.

Thanks.

Leann Ogasawara (leannogasawara) wrote :

Hi anagor,

Thanks for all the testing and the feedback. It's greatly appreciated. Since you've confirmed this also exists in the upstream kernel, I think it would be good to open an upstream bug report at http://bugzilla.kernel.org as well. That way the upstream developers are also notified of the issue. So if it's not too much extra trouble, would you care to open the upstream report, since you have the affected hardware and can confirm the issue with the patch? We can then link this report to the upstream report. Thanks.

Changed in linux:
milestone: none → intrepid-updates
anagor (anagor) wrote :

Ok, I've submitted this bug to kernel.org's bugzilla,
you can see it here:
http://bugzilla.kernel.org/show_bug.cgi?id=11892

I don't know how to add a watch for remote bug reports, but you probably do :)

Thanks.

anagor (anagor) wrote :

Ohh, I see it was added automatically.

Changed in linux:
status: Unknown → In Progress
Changed in linux:
status: In Progress → Fix Released
anagor (anagor) wrote :

I think we can safely close this bug, or mark it as Fix Released, now.
The patches that fixed this issue are in the mainline kernel now, and I tested the 2.6.27-10 Ubuntu kernel, which already has those patches in it, and it all works well.

Thanks.

Andy Whitcroft (apw) wrote :

@anagor -- thanks for the heads up. I have read through the upstream bug; good work there with those guys to find and fix this one. I have checked our 2.6.27-10.20 release and it does indeed contain the upstream fix, which came through as part of the 2.6.27.7 stable updates.

Moving this to Fix Committed as this is in -proposed.

Changed in linux:
assignee: ubuntu-kernel-team → apw
status: Triaged → Fix Committed
Leann Ogasawara (leannogasawara) wrote :

Marking this Fix Released as the 2.6.27-11.31 kernel is currently available through -updates.

https://edge.launchpad.net/ubuntu/+source/linux

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Changed in linux:
importance: Unknown → Medium