[amdgpu] Screen Glitching & Kernel Panic

Bug #1848108 reported by Jarrod Farrell
This bug affects 1 person
Affects                             Status     Importance  Assigned to  Milestone
linux (Ubuntu)                      Confirmed  Undecided   Unassigned
xserver-xorg-video-amdgpu (Ubuntu)  New        Undecided   Unassigned

Bug Description

I've recently updated to Eoan, but it seems like some nasty bugs came with it, particularly from the amdgpu package.

It started with visual glitches, but occasionally the screen would freeze with whatever was playing in the background, requiring a hard shutdown to escape. Attached is an abridged `journalctl -b -1` showing the lines where the kernel panic occurred. I'll probably attach more as they occur, if they're not redundant.

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: xorg 1:7.7+19ubuntu12
ProcVersionSignature: Ubuntu 5.3.0-18.19-generic 5.3.1
Uname: Linux 5.3.0-18-generic x86_64
ApportVersion: 2.20.11-0ubuntu8
Architecture: amd64
CompositorRunning: None
Date: Mon Oct 14 21:27:34 2019
DistUpgraded: 2019-10-13 05:03:20,327 DEBUG entry '# deb http://linux.teamviewer.com/deb stable main # disabled on upgrade to eoan' was disabled (unknown mirror)
DistroCodename: eoan
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes
GpuHangFrequency: Very infrequently
GraphicsCard:
 Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] [1002:15dd] (rev c3) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] [17aa:506f]
InstallationDate: Installed on 2019-03-22 (206 days ago)
InstallationMedia: Ubuntu 18.04.2 LTS "Bionic Beaver" - Release amd64 (20190210)
Lsusb:
 Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 003 Device 003: ID 13d3:56a6 IMC Networks Integrated Camera
 Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: LENOVO 20KVCTO1WW
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.3.0-18-generic root=/dev/mapper/ubuntu--vg-root ro DEFAULT quiet splash ivrs_ioapic[32]=00:14.0
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: Upgraded to eoan on 2019-10-13 (1 days ago)
dmi.bios.date: 12/07/2018
dmi.bios.vendor: LENOVO
dmi.bios.version: R0UET68W (1.48 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20KVCTO1WW
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrR0UET68W(1.48):bd12/07/2018:svnLENOVO:pn20KVCTO1WW:pvrThinkPadE585:rvnLENOVO:rn20KVCTO1WW:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:
dmi.product.family: ThinkPad E585
dmi.product.name: 20KVCTO1WW
dmi.product.sku: LENOVO_MT_20KV_BU_Think_FM_ThinkPad E585
dmi.product.version: ThinkPad E585
dmi.sys.vendor: LENOVO
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.99-1ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 19.2.1-1ubuntu1
version.libgl1-mesa-glx: libgl1-mesa-glx 19.2.1-1ubuntu1
version.xserver-xorg-core: xserver-xorg-core 2:1.20.5+git20191008-0ubuntu1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.0.1-1ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20190815-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.16-1

Jarrod Farrell (jarrodmaddy) wrote :
Daniel van Vugt (vanvugt) wrote :

This looks like one or two bugs. And at least the first one is a kernel bug so reassigning there...

https://launchpadlibrarian.net/446679771/gpf-amdpu

tags: added: amdgpu
summary: - Screen Glitching & Kernal Panic
+ [amdgpu] Screen Glitching & Kernel Panic
affects: xorg (Ubuntu) → linux (Ubuntu)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Jarrod Farrell (jarrodmaddy) wrote :

@vanvugt
That is fine. The screen glitching might be a symptom, but if it still persists after a patch then I'll create a new report exclusively for the screen glitching.

Kai-Heng Feng (kaihengfeng) wrote :
Jarrod Farrell (jarrodmaddy) wrote :

@kaihengfeng
Following instructions here,
https://wiki.ubuntu.com/Kernel/MainlineBuilds
I installed the generic packages, rebooted, and was welcomed to a blinding white void.
Hazarding a guess, I typed in my system's encryption password; the screen subtly changed brightness, as it usually does, and a mouse cursor appeared. So that's something.
Switching to a virtual console still showed only white, but I was able to log in blind, type `sudo reboot now`, and switch back to a stable kernel.

Removing the workaround described here,
https://forums.lenovo.com/t5/Other-Linux-Discussions/ThinkPad-E485-E585-Firmware-bug-ACPI-IVRS-table/td-p/4191484
particularly the `ivrs_ioapic[32]=00:14.0` parameter, didn't resolve it and just left the boot hung after GRUB, as it normally does without the workaround.
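For anyone wanting to confirm whether the workaround parameter is actually active on a given boot, here is a minimal Python sketch that splits a kernel command line into parameters. The function name `parse_cmdline` is made up for illustration, and the sample string is copied from the ProcKernelCmdLine field of this report; on a live system the running kernel's command line can be read from /proc/cmdline instead.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict of parameters.

    Bare flags such as "quiet" map to None; "key=value" maps key to value.
    Quoted values containing spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


# Command line copied from the ProcKernelCmdLine field of this report.
REPORTED = (
    "BOOT_IMAGE=/vmlinuz-5.3.0-18-generic root=/dev/mapper/ubuntu--vg-root "
    "ro DEFAULT quiet splash ivrs_ioapic[32]=00:14.0"
)

if __name__ == "__main__":
    params = parse_cmdline(REPORTED)
    if "ivrs_ioapic[32]" in params:
        print("workaround present:", params["ivrs_ioapic[32]"])
    else:
        print("workaround not present")
```

This only reports what was passed at boot; it says nothing about whether the kernel accepted or applied the parameter.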

Jarrod Farrell (jarrodmaddy) wrote :

Another kernel panic, this time related to web content.

Jarrod Farrell (jarrodmaddy) wrote :

I'm using 5.0.0-31-generic at the moment and it seems to have quelled the issue, including the graphical issues. So I guess the graphical issues are a symptom of a problem with the kernel, barring something else conflicting with it.

Daniel van Vugt (vanvugt) wrote :

That last one in comment #7 does not look GPU related, but just generally memory related. Maybe do a memtest just to be sure the RAM is OK...

https://www.memtest.org/
https://www.memtest86.com/

Jarrod Farrell (jarrodmaddy) wrote :

@vanvugt
I ran my laptop's built-in memory diagnostic and it passed. I didn't have a flash drive I was willing to sacrifice. Attached is its log.

Jarrod Farrell (jarrodmaddy) wrote :

I did a reinstall of Eoan using last night's daily image, and even in the live session it still had the visual issues along with the occasional crash, both of which continued after installation. I've reverted to 5.0.21 in the meantime.

Kai-Heng Feng (kaihengfeng) wrote :
Jarrod Farrell (jarrodmaddy) wrote :

@kaihengfeng
Same white screen issue. But this time I actually collected the last boot's journal and noticed that this boot and the last boot share a similar stack trace at boot. Searching for it comes up with,
https://bugs.freedesktop.org/show_bug.cgi?id=107296
The comments there mention blank screens. But this stack trace appears on every boot, whether I get a usable desktop or the white void, albeit with slight differences that seem to be due to file differences, so it might not even be related. It's just something I've observed.

Jarrod Farrell (jarrodmaddy) wrote :
Kai-Heng Feng (kaihengfeng) wrote :

Please test latest drm-tip kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/current/

And file an upstream bug if the issue persists:
https://gitlab.freedesktop.org/drm/amd/issues

Jarrod Farrell (jarrodmaddy) wrote :

@kaihengfeng

Sorry about the delay. Life being distracting.

But it seems like the issue is /mostly/ fixed in the provided 2019-12-16 build, and so far it hasn't required a hard shutdown, which is good. What isn't particularly good is the occasional "freeze" where everything on screen stops for a few moments. Occasionally during these freezes, items that were occluded by something appear in front of it instead, like desktop icons in front of Firefox, along with mild graphical glitching on a terminal window (parts of a window tiled somewhere else). It seems that anything even mildly graphically intensive (searching through a large webpage, alt-tabbing, etc.) will cause one of these freezes. I've even had it happen several times in a row before it finally recovered.

I did open journalctl -f and noted this:

Dec 16 16:12:29 DarkBolt kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
Dec 16 16:12:29 DarkBolt kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Dec 16 16:12:39 DarkBolt kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
Dec 16 16:12:39 DarkBolt kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
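For anyone wanting to tally these messages rather than watch `journalctl -f`, here is a minimal Python sketch. It assumes the line format shown above; the function name `count_amdgpu_errors` and the regular expression are illustrative only and not part of any amdgpu or systemd tooling.

```python
import re
from collections import Counter

# Matches kernel lines of the form seen above:
#   "<timestamp> <host> kernel: [drm:<function> [amdgpu]] *ERROR* <message>"
AMDGPU_ERROR = re.compile(
    r"kernel: \[drm:(?P<func>\S+) \[amdgpu\]\] \*ERROR\* (?P<msg>.+)$"
)


def count_amdgpu_errors(lines):
    """Return a Counter mapping (function, message) to number of occurrences."""
    counts = Counter()
    for line in lines:
        match = AMDGPU_ERROR.search(line.rstrip("\n"))
        if match:
            counts[(match.group("func"), match.group("msg"))] += 1
    return counts


# The four lines quoted in this comment, used as sample input.
SAMPLE = [
    "Dec 16 16:12:29 DarkBolt kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!",
    "Dec 16 16:12:29 DarkBolt kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered",
    "Dec 16 16:12:39 DarkBolt kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!",
    "Dec 16 16:12:39 DarkBolt kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered",
]

if __name__ == "__main__":
    for (func, msg), n in count_amdgpu_errors(SAMPLE).items():
        print(f"{n}x {func}: {msg}")
```

The same function could be fed the lines of a saved journal dump instead of `SAMPLE`; lines that don't match the assumed format are simply skipped.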

Kai-Heng Feng (kaihengfeng) wrote :

Please file an upstream bug to let AMDGPU maintainers know.
