Natty fails to boot on Gigabyte GA-MA78GPM-UD2H

Bug #658955 reported by Angus Turnbull
This bug affects 5 people
Affects                  Status   Importance  Assigned to  Milestone
Linux                    Invalid  Undecided   Unassigned
linux (Ubuntu) Natty     Invalid  Medium      Unassigned
linux (Ubuntu) Oneiric   Invalid  Medium      Unassigned
linux (Ubuntu) Precise   Invalid  Medium      Unassigned

Bug Description

Both the Lucid and Maverick kernels suspend to RAM very slowly (a 5-minute delay) on the first suspend after a cold boot. The kernel appears to hang for that time while suspending the AHCI device, per the attached dmesg log. I first noticed the problem halfway through the updates during the 6 months I ran Lucid, and it is still present in the Maverick release (Karmic seemed to be OK). Interestingly, under Lucid at least, only the first suspend showed the delay; subsequent suspends did not.

I have a 1 TB Western Digital Green HDD and an SH-S223F DVD-RW drive installed, both running under AHCI with no ATA devices configured in the BIOS. After suspend the system often comes up normally within several seconds; sometimes it fails to resume, showing disk access errors and requiring a hard reset.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: linux-image-2.6.35-22-generic 2.6.35-22.33
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.35-22.33-generic 2.6.35.4
Uname: Linux 2.6.35-22-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: angus 1815 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'SB'/'HDA ATI SB at 0xfe024000 irq 16'
   Mixer name : 'Realtek ALC889A'
   Components : 'HDA:10ec0885,1458a102,00100101'
   Controls : 38
   Simple ctrls : 21
Card1.Amixer.info:
 Card hw:1 'HDMI'/'HDA ATI HDMI at 0xfdffc000 irq 19'
   Mixer name : 'ATI RS690/780 HDMI'
   Components : 'HDA:1002791a,00791a00,00100000'
   Controls : 4
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'IEC958',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
Date: Tue Oct 12 17:56:35 2010
HibernationDevice: RESUME=UUID=8c630d28-1be9-412c-9fa2-c55f86d413da
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release amd64 (20101007)
MachineType: Gigabyte Technology Co., Ltd. GA-MA78GPM-UD2H
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.35-22-generic root=UUID=95f1b306-00f2-4060-a24e-9a9f783ae9ad ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=en_NZ.utf8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.38
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
dmi.bios.date: 10/08/2009
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F7
dmi.board.name: GA-MA78GPM-UD2H
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF7:bd10/08/2009:svnGigabyteTechnologyCo.,Ltd.:pnGA-MA78GPM-UD2H:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-MA78GPM-UD2H:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: GA-MA78GPM-UD2H
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Angus Turnbull (angus-twinhelix) wrote : Re: Slow suspend to RAM with clocksource TSC on Maverick AMD64

I've found a solution. The problem is that the TSC clocksource is unstable; it is not just the AHCI driver. Adding:

clocksource=hpet

to the kernel boot command line makes the first suspend work properly.
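
To check which clocksource the kernel picked, the sysfs clocksource entries can be read. The following is only a minimal sketch and was not part of the original report: the function name show_clocksource and its optional directory argument are made up for illustration, while the default sysfs path is the standard one.

```shell
#!/bin/sh
# Print the active clocksource and the ones the kernel offers.
# The optional directory argument is a hypothetical convenience so the
# function can be pointed at a copy of the sysfs files.
show_clocksource() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    echo "current:   $(cat "$dir/current_clocksource")"
    echo "available: $(cat "$dir/available_clocksource")"
}
```

Running show_clocksource with no argument reads the live system. If "hpet" appears in the available list, the kernel should be able to honour clocksource=hpet.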

summary: - Slow suspend of AHCI driver on Lucid/Maverick AMD64
+ Slow suspend to RAM with clocksource TSC on Maverick AMD64
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
summary: - Slow suspend to RAM with clocksource TSC on Maverick AMD64
+ Natty fails to boot on Gigabyte GA-MA78GPM-UD2H
Angus Turnbull (angus-twinhelix) wrote :

The problem has worsened in Natty -- the default kernel will not boot from the hard disk (freezes after you select it in GRUB2) unless you add "clocksource=hpet" to the command line, in which case it boots perfectly well.

For anyone else: edit "/etc/default/grub", add "clocksource=hpet" to the Linux command line there (the GRUB_CMDLINE_LINUX_DEFAULT entry), and then run "sudo update-grub" in a terminal.
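
The GRUB edit can also be scripted. This is a rough sketch under a few assumptions: add_kernel_param is a hypothetical helper name, the file contains a double-quoted GRUB_CMDLINE_LINUX_DEFAULT="..." line (such as the stock "quiet splash"), the parameter contains no characters special to sed, and GNU sed is available for "sed -i".

```shell
#!/bin/sh
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT unless the line
# already contains it. The second argument defaults to /etc/default/grub.
add_kernel_param() {
    param="$1"
    file="${2:-/etc/default/grub}"
    # Simple substring check; good enough for a one-off parameter.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0
    fi
    # Re-emit the quoted value with the new parameter appended.
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"/" "$file"
}
```

On a real system this would be run as root, for example "add_kernel_param clocksource=hpet", followed by "sudo update-grub" and a reboot. Backing up /etc/default/grub first is prudent.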

Can anyone else replicate this? Is it possible to add this motherboard to a kernel workaround-list?

Angus Turnbull (angus-twinhelix) wrote :

I have solved the problem -- the upstream bug on the Linux kernel Bugzilla is linked.

Changing the kernel boot parameters to solely "acpi_skip_timer_override" in /etc/default/grub results in the motherboard reliably booting and suspending/resuming. This appears to affect a wide range of Gigabyte AM2+ desktop motherboards as well as boards from other manufacturers (ASUS and Toshiba are mentioned in the upstream report).
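
After rebooting, it can be worth confirming that the parameter actually reached the kernel. A small sketch, assuming a hypothetical helper named has_kernel_param; /proc/cmdline is the standard place where the running kernel exposes its command line.

```shell
#!/bin/sh
# Succeed if the given parameter appears as a whole word on the kernel
# command line. The second argument defaults to /proc/cmdline.
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # One word per line, then look for an exact, fixed-string match.
    tr -s ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}
```

For example, "has_kernel_param acpi_skip_timer_override && echo active" prints "active" only when the running kernel was booted with that option.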

Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Angus Turnbull (angus-twinhelix) wrote :

Here's the patch I have submitted upstream to LKML: https://lkml.org/lkml/2011/11/22/511

This automatically works around the issue and allows systems using this motherboard to boot and suspend/resume successfully, by detecting the motherboard in a quirks list. Hope this helps.

tags: added: patch
Tim Gardner (timg-tpi) wrote :

Angus - Your patch needs correct provenance, e.g., your Signed-off-by.

Changed in linux (Ubuntu):
assignee: nobody → Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Natty):
assignee: nobody → Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Oneiric):
assignee: nobody → Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Natty):
status: New → In Progress
Changed in linux (Ubuntu Oneiric):
status: New → In Progress
Changed in linux (Ubuntu Precise):
status: Confirmed → In Progress
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Patch for kernel ACPI code to work around this issue" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch, you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are a member of the ubuntu-reviewers team, please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

Angus Turnbull (angus-twinhelix) wrote :

Apologies for missing that; I've added the Signed-off-by and it passes scripts/checkpatch.pl. Patch reattached.

I've had no feedback yet on my upstream submissions (the linux-kernel list post above, and also the ACPI list: http://article.gmane.org/gmane.linux.acpi.devel/51512). Any chance of getting the patch in there?

Tim Gardner (timg-tpi) wrote :

Angus - is this still an issue with the 3.2 and 3.3 kernels? If so, I'll see if I can get your patch upstreamed.

Angus Turnbull (angus-twinhelix) wrote :

Interestingly, while it failed in 3.2.0 (which is what the above patch is against), the motherboard boots out of the box with 3.3-rc6. However, there's a new dmesg error on the first and subsequent resumes:

[ 695.000381] Enabling non-boot CPUs ...
[ 695.000381] Booting Node 0 Processor 1 APIC 0x1
[ 695.000381] smpboot cpu 1: start_ip = 9a000
[ 694.690076] Calibrating delay loop (skipped) already calibrated this CPU
[ 694.690076] [Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
[ 694.690076] perf: IBS APIC setup failed on cpu #1
[ 695.031768] NMI watchdog enabled, takes one hw-pmu counter.
[ 695.031828] Switch to broadcast mode on CPU1
[ 695.031941] CPU1 is up

which repeats for CPUs 2 & 3. Full DMESG attached. Searching LKML reveals that Robert Richter from AMD has written a bunch of driver code in this area, but this error appears to be more than the expected AMD "bug" notices for this chipset family. Still, the system is operating normally and suspends/resumes OK.
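
To see at a glance which CPUs report the message, the kernel log can be filtered. This is just a sketch; firmware_bug_cpus is a hypothetical helper name, and the pattern assumes the message format shown in the excerpt above.

```shell
#!/bin/sh
# Read a kernel log on standard input and print the distinct CPU numbers
# that appear in "[Firmware Bug]: cpu N, ..." lines, in numeric order.
firmware_bug_cpus() {
    sed -n 's/.*\[Firmware Bug\]: cpu \([0-9][0-9]*\),.*/\1/p' | sort -un
}
```

Typical use would be "dmesg | firmware_bug_cpus" after a resume.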

Tim Gardner (timg-tpi) wrote :

Angus - If you're interested I can get Joe Salisbury to work with you to bisect down to the first bad commit. It'll require you to install 6-12 kernels.

Angus Turnbull (angus-twinhelix) wrote :

Brief update: Tweak is still necessary with the Precise default kernel, but the final 3.3.0 kernel I tested under Oneiric was OK.

Looks like there's some upstream work in the area of buggy AMD motherboard BIOSes: https://lkml.org/lkml/2012/3/28/249

If you want me to try for a bisect I can do it; although is it needed if the latest upstream is alright?

Tim Gardner (timg-tpi) wrote :

Angus - the problem with the quirk patch is that it only works for your specific model of motherboard. It looks like upstream has a more general solution, since I cannot find a quirk in the upstream code that references your model. Given that these Gigabyte MBs are problematic in general, it's likely beneficial to bisect to the upstream solution. Once we know what the patch(es) are, we can evaluate whether a backport is possible.

Joseph Salisbury (jsalisbury) wrote :

Angus,

To bisect, we first need to find the last known good kernel version and the first known bad kernel version. It sounds like v3.3 final does not have this bug. Since we are looking for the commit that fixed this bug upstream, not the one that introduced a regression, we will perform a "reverse bisect".
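
For anyone working from a kernel git tree rather than prebuilt packages, a reverse bisect can be set up with git's custom bisect terms, so "old" means "still broken" and "new" means "fixed". This is a sketch only: start_reverse_bisect is a hypothetical wrapper name, it assumes git 2.7 or later and a checkout of the mainline kernel tree, and the revisions are passed in as arguments.

```shell
#!/bin/sh
# Start a bisect that searches for the first commit where the bug is FIXED.
# Custom terms avoid having to mentally swap "good" and "bad".
start_reverse_bisect() {
    broken_rev="$1"   # older revision that still shows the bug
    fixed_rev="$2"    # newer revision where the bug is gone
    git bisect start --term-old=broken --term-new=fixed &&
    git bisect broken "$broken_rev" &&
    git bisect fixed "$fixed_rev"
}
```

After something like "start_reverse_bisect v3.2 v3.3-rc6", each kernel that git checks out is built and booted, then marked with "git bisect broken" or "git bisect fixed" until git names the first fixed commit. "git bisect reset" ends the session.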

To start, can you test the latest v3.2 stable version[0] to see if it has the bug?

Also, did you test any of the v3.3 release candidates, or just the final version of v3.3?

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2.16-precise/

tags: added: kernel-bug-fixed-upstream kernel-da-key
removed: needs-upstream-testing
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Natty):
importance: Undecided → Medium
Changed in linux (Ubuntu Oneiric):
importance: Undecided → Medium
Changed in linux (Ubuntu Precise):
importance: Undecided → Medium
tags: added: precise
removed: maverick
Leszek (bigl-aff) wrote :

I also have this bug on my Gigabyte GA-MA770T-UD3P motherboard with an nVidia 9500 GT and an AMD Phenom(tm) II X2 550 processor. The exact message is:

[Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu

And my GPU:

VGA compatible controller: NVIDIA Corporation G96 [GeForce 9500 GT] (rev a1)

I get it on every resume. But Ubuntu works OK - it's just a delay of a few seconds after resume, during which I see a black screen with this message.

Angus Turnbull (angus-twinhelix) wrote :

I've tested 3.2.20-generic and the bug is still there. The first fixed version was 3.3-rc6 (both from the Mainline PPA).

Changed in linux:
status: Confirmed → Incomplete
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu):
assignee: Tim Gardner (timg-tpi) → nobody
Changed in linux (Ubuntu Natty):
assignee: Tim Gardner (timg-tpi) → nobody
Changed in linux (Ubuntu Oneiric):
assignee: Tim Gardner (timg-tpi) → nobody
Changed in linux (Ubuntu Precise):
assignee: Tim Gardner (timg-tpi) → nobody
Changed in linux (Ubuntu):
status: In Progress → Triaged
Changed in linux (Ubuntu Natty):
status: In Progress → Triaged
Changed in linux (Ubuntu Oneiric):
status: In Progress → Triaged
Changed in linux (Ubuntu Precise):
status: In Progress → Triaged
Changed in linux:
status: Incomplete → In Progress
billstei (billstei) wrote :

I have an Asus M4A88TD-V EVO/USB3 motherboard running Ubuntu 12.10 with the 3.5.0-26-generic kernel, and was getting the typical "[Firmware Bug]: cpu 1, try to use APIC500 ..." error message(s) on resume from a suspended state. Adding the "acpi_skip_timer_override" option to GRUB_CMDLINE_LINUX_DEFAULT= in /etc/default/grub causes the error message(s) to stop. As I recall, this error message goes back at least as far as Ubuntu 12.04, and probably further, for my hardware.

billstei (billstei) wrote :

Re: #17 -- Now I am getting the error message even with the acpi_skip_timer_override option.

dino99 (9d9) wrote :

Some releases have reached EOL: https://wiki.ubuntu.com/Releases

Changed in linux (Ubuntu Natty):
status: Triaged → Invalid
Changed in linux (Ubuntu Oneiric):
status: Triaged → Invalid
penalvch (penalvch)
tags: added: latest-bios-f7 needs-upstream-testing
Angus Turnbull (angus-twinhelix) wrote :

FYI, I no longer own this hardware (sold and upgraded)

penalvch (penalvch) wrote :

Angus Turnbull, this bug report is being closed due to your last comment https://bugs.launchpad.net/ubuntu/+source/linux/+bug/658955/comments/20 stating that you no longer have the hardware. For future reference, you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop-down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

Changed in linux (Ubuntu Precise):
status: Triaged → Invalid
no longer affects: linux (Ubuntu)
Changed in linux:
importance: Medium → Undecided
status: In Progress → New
status: New → Invalid