[amdgpu] System locks up with VMC page fault on Ryzen 2400G and HWE kernel 5.0.

Bug #1839758 reported by Oleh Dmytrychenko
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Since kernel 5.0 rolled out through HWE last week, my system has been consistently locking up when launching Steam or 3D applications in WINE.

Typical log: https://paste.ubuntu.com/p/WhPYXdkZVw/

Worked around by booting the pre-update 4.18 kernel.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: xorg 1:7.7+19ubuntu7.1
ProcVersionSignature: Ubuntu 5.0.0-23.24~18.04.1-generic 5.0.15
Uname: Linux 5.0.0-23-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.7
Architecture: amd64
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Sun Aug 11 15:47:27 2019
DistUpgraded: 2018-03-26 22:50:10,989 ERROR got error from PostInstallScript ./xorg_fix_proprietary.py (g-exec-error-quark: Failed to execute child process “./xorg_fix_proprietary.py” (No such file or directory) (8))
DistroCodename: bionic
DistroVariant: ubuntu
DkmsStatus:
 virtualbox, 5.2.32, 4.18.0-25-generic, x86_64: installed
 virtualbox, 5.2.32, 5.0.0-23-generic, x86_64: installed
EcryptfsInUse: Yes
ExtraDebuggingInterest: Yes
GpuHangFrequency: Continuously
GpuHangReproducibility: Yes, I can easily reproduce it
GpuHangStarted: Within the last week or two
GraphicsCard:
 Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] [1002:15dd] (rev c6) (prog-if 00 [VGA controller])
   Subsystem: Gigabyte Technology Co., Ltd Radeon RX Vega 11 [1458:d000]
InstallationDate: Installed on 2018-03-26 (502 days ago)
InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
MachineType: Gigabyte Technology Co., Ltd. AX370-Gaming K7
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.0.0-23-generic root=/dev/mapper/ubuntu--vg-root ro
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: Upgraded to bionic on 2018-03-26 (502 days ago)
dmi.bios.date: 01/16/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F25
dmi.board.asset.tag: Default string
dmi.board.name: AX370-Gaming K7
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: se3
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF25:bd01/16/2019:svnGigabyteTechnologyCo.,Ltd.:pnAX370-GamingK7:pvrDefaultstring:rvnGigabyteTechnologyCo.,Ltd.:rnAX370-GamingK7:rvrse3:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: Default string
dmi.product.name: AX370-Gaming K7
dmi.product.sku: Default string
dmi.product.version: Default string
dmi.sys.vendor: Gigabyte Technology Co., Ltd.
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.97-1ubuntu1~18.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 19.0.2-1ubuntu1.1~18.04.2
version.libgl1-mesa-glx: libgl1-mesa-glx 19.0.2-1ubuntu1.1~18.04.2
version.xserver-xorg-core: xserver-xorg-core N/A
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati N/A
version.xserver-xorg-video-intel: xserver-xorg-video-intel N/A
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau N/A

Oleh Dmytrychenko (nitrogen-ua) wrote :
summary: - System lock-up with VMC page fault since HWE kernel update.
+ System locks up with VMC page fault on Ryzen 2400G and HWE kernel 5.0.
summary: - System locks up with VMC page fault on Ryzen 2400G and HWE kernel 5.0.
+ [amdgpu] System locks up with VMC page fault on Ryzen 2400G and HWE kernel 5.0.
affects: xorg (Ubuntu) → linux (Ubuntu)
tags: added: regression-update
tags: added: amdgpu
Kai-Heng Feng (kaihengfeng) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Oleh Dmytrychenko (nitrogen-ua) wrote :

Tested with the latest mainline kernel: GNOME fails to start, with a similar error in the log.

kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1, emitted seq=3
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2210 thread gnome-shel:cs0 pid 2239
kernel: [drm] GPU recovery disabled.

Chi-Thanh Christopher Nguyen (chithanh) wrote :

> kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1, emitted seq=3

That looks more like https://bugs.freedesktop.org/show_bug.cgi?id=111122
Try setting the AMD_DEBUG="nodcc" environment variable.
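
For anyone trying this suggestion: a variable exported in one terminal is only inherited by programs started from that terminal, so it can help to set it explicitly for the process being tested. A minimal sketch, assuming Python 3.7 or newer; `run_with_env` is a hypothetical helper name, and the child command here is a harmless stand-in that only echoes the variable rather than a real game launcher.

```python
import os
import subprocess
import sys


def run_with_env(cmd, extra_env):
    """Run cmd with extra_env merged into a copy of the current environment.

    Hypothetical helper for illustration; the parent environment is untouched.
    """
    env = dict(os.environ)
    env.update(extra_env)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process that prints the variable it received.
    # Assumption: on an affected system this list would be replaced by the
    # actual application command being tested.
    child = [sys.executable, "-c",
             "import os; print(os.environ.get('AMD_DEBUG'))"]
    result = run_with_env(child, {"AMD_DEBUG": "nodcc"})
    print(result.stdout.strip())
```

Setting the variable per process keeps the rest of the session unchanged, which makes it easier to compare a run with the setting against a run without it.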

Oleh Dmytrychenko (nitrogen-ua) wrote :

Tried your suggestion, but it hasn't helped so far. With the latest kernel GNOME still fails to start, while the 5.0 HWE kernel produces results similar to the previous ones.

I have verified that the environment variable was set successfully:
$ printenv | grep -i amd_debug
AMD_DEBUG=nodcc

syslog:
steam.desktop[3741]: STEAM_RUNTIME_HEAVY: ./steam-runtime-heavy
kernel: gmc_v9_0_process_interrupt: 49 callbacks suppressed
kernel: amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 ring:158 vmid:2 pasid:32784, for process vulkandriverque pid 4088 thread vulkandriverque pid 4088)
kernel: amdgpu 0000:08:00.0: in page starting at address 0x0000000000000000 from 27
kernel: amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0020153D

kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=5205, emitted seq=5207
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process vulkandriverque pid 4088 thread vulkandriverque pid 4088
kernel: [drm] GPU recovery disabled.
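
When comparing logs across kernels, it can help to pull the key fields out of these messages instead of reading them by eye. The following is a rough sketch, not an official tool: the regular expressions are written against the exact line shapes quoted in this report, `summarize` is a hypothetical helper name, and other kernel versions may format these messages differently. The difference between the emitted and signaled sequence numbers is reported as-is; reading it as the amount of submitted but unfinished work is an assumption.

```python
import re

# Patterns matched to the two message shapes quoted in this report.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\w+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] VMC page fault \(src_id:(?P<src_id>\d+) "
    r"ring:(?P<ring>\d+) vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+), "
    r"for process (?P<process>\S+) pid (?P<pid>\d+)"
)


def summarize(lines):
    """Return a list of (kind, name, number) tuples for recognized lines.

    ("timeout", ring name, emitted - signaled) for ring timeouts,
    ("page_fault", process name, vmid) for VMC page faults.
    """
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            gap = int(m["emitted"]) - int(m["signaled"])
            events.append(("timeout", m["ring"], gap))
            continue
        m = FAULT_RE.search(line)
        if m:
            events.append(("page_fault", m["process"], int(m["vmid"])))
    return events


if __name__ == "__main__":
    sample = [
        "kernel: amdgpu 0000:08:00.0: [gfxhub] VMC page fault (src_id:0 "
        "ring:158 vmid:2 pasid:32784, for process vulkandriverque pid 4088 "
        "thread vulkandriverque pid 4088)",
        "kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
        "signaled seq=5205, emitted seq=5207",
    ]
    for event in summarize(sample):
        print(event)
```

Lines that match neither pattern, such as the "GPU recovery disabled" message, are skipped.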

Oleh Dmytrychenko (nitrogen-ua) wrote :

Not reproducible anymore with all the latest updates applied. Further regression testing isn't possible either, because the hardware has changed. It's fair to close this.
