AGP disablement leaves GPUs without working alternative (PCI fallback is broken), makes very-capable ATI TeraScale GPUs unusable

Bug #1899304 reported by Thomas Debesse
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

This system runs Ubuntu 20.04, freshly installed 3~4 months ago (July 2020).

There are two kernels available on this system:

- 5.4.0-47-generic
- 5.4.0-48-generic

With kernel 5.4.0-47-generic:

- GNOME shell loads properly on Radeon X1950 PRO,
- Unvanquished game runs on ATI Radeon X1950 PRO at 70 fps on 1280×720 resolution,
- Unvanquished game runs on ATI Radeon 9500 at 40 fps on 640×480 resolution.

Everything looks consistent with the limits and the age of the hardware.

With kernel 5.4.0-48-generic:

- GNOME Shell never finishes loading on the Radeon X1950 PRO: either a grey screen is displayed and keyboard shortcuts do not respond, or the top bar is stuck between the center and the top of the screen and the shell does not respond, or the top bar is at the top of the screen but the shell does not respond. To get a desktop I run `sudo systemctl stop display-manager` then `startx /usr/bin/lxsession` from a TTY. Running GNOME Shell with startx, or from a lone xterm started with startx, leads to the same issues.
- Unvanquished game runs on ATI Radeon X1950 PRO at 7 fps on 1280×720 resolution,
- Unvanquished game runs on ATI Radeon 9500 at 3 fps on 640×480 resolution.

Note: for unknown reasons, GNOME Shell loads properly on the ATI Radeon 9500 but not on the Radeon X1950 PRO.

Everything is slow. When the game is running, htop reports a very high load, unlike what is seen on the 5.4.0-47-generic kernel. When the game is running, cycling between windows with Alt-Tab takes many seconds, while it is immediate on the 5.4.0-47-generic kernel. Also, even without the game running, or on a lightweight desktop like LXDE, cycling windows is not smooth and window refreshing is slow enough to be noticeable.

About the hardware, note that:

- the CPU only has one core, no hyperthreading (AMD Athlon 64 FX for socket 939),
- the GPUs are AGP ones using R300 and R500 technology (pre-TeraScale),
- there is 3GB of DDR RAM,
- there is no on-disk swap but zram-based swap in compressed ram is used,
- the system is stored on and boots from a USB 3.1 key plugged into a USB 2.0 port,
- /tmp is a tmpfs ram disk,
- CPU is set to performance profile,
- the install is not really messy and not many packages are installed; this USB key is meant for hardware/system testing and for diagnosing such issues.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-48-generic 5.4.0-48.52
ProcVersionSignature: Ubuntu 5.4.0-48.52-generic 5.4.60
Uname: Linux 5.4.0-48-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.9
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: illwieckz 7503 F.... pulseaudio
CasperMD5CheckResult: skip
Date: Sun Oct 11 01:47:48 2020
InstallationDate: Installed on 2020-07-09 (93 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
IwConfig:
 enp0s11 no wireless extensions.

 lo no wireless extensions.
MachineType: MSI MS-6702E
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=fr_FR.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-48-generic root=UUID=10314d0c-ec6b-4f7f-b926-ed8b80185331 ro
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-48-generic N/A
 linux-backports-modules-5.4.0-48-generic N/A
 linux-firmware 1.187.3
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/12/2006
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 080011
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: MS-6702E
dmi.board.vendor: MSI
dmi.board.version: 1.0
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr080011:bd10/12/2006:svnMSI:pnMS-6702E:pvr1.0:rvnMSI:rnMS-6702E:rvr1.0:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: MS-6702E
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: 1.0
dmi.sys.vendor: MSI

Revision history for this message
Thomas Debesse (illwieckz) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
description: updated
description: updated
Thomas Debesse (illwieckz) wrote : Re: Huge performance regression, Unvanquished game goes from 70fps to 7fps, GNOME Shell never finishes to load, GNOME desktop unusable

I forgot to mention:

After switching from the 5.4.0-47-generic kernel to the 5.4.0-48-generic one, the game becomes very slow to load levels and related assets (textures, etc.) from the filesystem, so the issue is likely not only about graphical performance.

Thomas Debesse (illwieckz) wrote :

I tested with another ATI GPU, the Radeon HD 4670 (AGP), a high-end TeraScale 1 generation card. The computer did not manage to display the desktop, dmesg was full of errors about GPU lockup, and I did not manage to reboot the computer properly (I had to use the magic SysRq keys).

I tested with an Nvidia GPU, the GeForce 8400 GS rev.2 (PCI), a low-end Tesla 1.0 generation card, running nouveau. The computer did not manage to display the desktop (but displayed garbage instead). Though, I noticed nothing relevant in dmesg and could reboot properly from another host through SSH.

I entirely reinstalled the 5.4.0-48-generic kernel and modules packages and regenerated the initramfs; the problem is still there.

I noticed that when the 5.4.0-48-generic kernel was installed, many other packages were upgraded as well (including some related to firmware and initramfs), so to rule them out I regenerated the 5.4.0-47-generic one again, making sure it is generated against the same files and with the same tools, and I still do not reproduce the bug on the 5.4.0-47-generic kernel.

So this really looks kernel related: all the other packages are the same, and the other kernel runs with the exact same files, or files generated with the exact same tools, without reproducing the bug.

I also tested the system on another computer but could not reproduce the bug on that other computer. So, it looks like the bug is tied to the 5.4.0-48-generic kernel on that particular hardware.

This bug makes Ubuntu 20.04 and this hardware completely unusable once the 5.4.0-48-generic kernel is installed and used.

Thomas Debesse (illwieckz) wrote :
summary: - Huge performance regression, Unvanquished game goes from 70fps to 7fps,
- GNOME Shell never finishes to load, GNOME desktop unusable
+ Linux 5.4.0-48 causes GPU lockup, huge performance drop, makes GNOME
+ desktop fail to start and games going from 70fps to 7fps, slow file
+ loading, audio issues
Thomas Debesse (illwieckz) wrote : Re: Linux 5.4.0-48 causes GPU lockup, huge performance drop, makes GNOME desktop fail to start and games going from 70fps to 7fps, slow file loading, audio issues

I forgot to say that when I manage to get a non-composited LXDE desktop running on the Radeon X1950 PRO, started by hand using `startx`, and I run the Unvanquished game, more than 51% of the CPU time is spent in the OpenAL thread, i.e. the audio thread, which is wrong. This thread is usually really lightweight compared to the other threads. Also, sound crackling is heard.

Thomas Debesse (illwieckz) wrote :
Thomas Debesse (illwieckz) wrote :
Thomas Debesse (illwieckz) wrote :

I reproduce the bug with the 5.4.0-49-generic kernel from `proposed` when running the Radeon X1950 PRO.

Note: I previously attached a dmesg log file about GPU lockup. Such a lockup is not always logged, but the GNOME Shell session gets stuck every time it is started.

Thomas Debesse (illwieckz) wrote :

I reproduce the bug with 5.4.0-51-generic and 5.4.0-52-generic.
The 5.4.0-47-generic one is the last known kernel to work on that system.

summary: - Linux 5.4.0-48 causes GPU lockup, huge performance drop, makes GNOME
- desktop fail to start and games going from 70fps to 7fps, slow file
- loading, audio issues
+ Linux 5.4.0-48 (and later) causes GPU lockup, huge performance drop,
+ makes GNOME desktop fail to start and games going from 70fps to 7fps,
+ slow file loading, audio issues
Thomas Debesse (illwieckz) wrote : Re: Linux 5.4.0-48 (and later) causes GPU lockup, huge performance drop, makes GNOME desktop fail to start and games going from 70fps to 7fps, slow file loading, audio issues

I reproduce the bug on another computer, using 5.4.0-52-generic kernel:

Motherboard: Asrock AM2NF3 VSTA
CPU: AMD Phenom II X4 970 (Quad core)
RAM: 16GB DDR2 800MHz (4×4 GB)
GPU: ATI Radeon HD 4670 AGP
VRAM: 1GB DDR3

Of course the same computer works flawlessly with 5.4.0-47-generic kernel.

I noticed that if I plug in a Radeon 9250 instead of the Radeon HD 4670, the desktop uses LLVMpipe software rendering and GNOME manages to start successfully, but it is so slow that I can see the desktop being drawn. Note that LLVMpipe rendering with that same GPU on the 5.4.0-47-generic kernel is not that slow. When running 5.4.0-53-generic I can see the pixels being painted, which says a lot about how slow Ubuntu becomes on kernels newer than 5.4.0-47-generic.

Thomas Debesse (illwieckz) wrote :

Here is another photo of the display glitch happening when the computer hangs. I got it with both the Radeon X1950 and the Radeon HD 4670.

What happens is that at startup, the GNOME Shell desktop plays an animation, expanding itself from the center of the screen. In this photo, we see that the computer hung before the full size of the screen was reached (the blue square is the frame of the screen).

Kai-Heng Feng (kaihengfeng) wrote :

I guess it's caused by commit "drm/radeon: disable AGP by default".

Please test latest mainline kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.10-rc1/amd64/

Hopefully it's already fixed.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Thomas Debesse (illwieckz) wrote :

Hi, thank you for your answer and your attention,

This issue is confirmed again.

1. 5.10.0-rc1 does not fix the problem introduced in 5.4.0-48 regarding ATI/AMD AGP GPUs.
2. PCI GPUs have been broken on the AMD K8/K10 platform for years but they work on the Intel platform; the GPU being ATI/AMD or Nvidia makes no difference, so there is no display fallback if AGP is disabled.
3. This issue is not only a graphical performance regression but a global performance regression of the computer as a whole: slow disk IO, audio glitches…

Because the commit you mentioned seems to be related to Radeon and AGP, and I have experienced bugs with PCI Nvidia hardware too, I did more testing.

I previously reported issues with the PCI Nvidia GeForce 8400 GS rev.2 on the 5.4.0-48 kernel, but at the time I did not check whether it worked on 5.4.0-47 on this host computer. This is now done, and in fact this hardware does not work on this host computer with the 4.15.0-118-generic kernel from Ubuntu 16.04 LTS (Xenial), nor with 4.8.0-36 and 4.4.0-190 (Xenial). I am not saying such hardware does not work at all: it works on Intel-based computers, including an Intel-based PCIe computer with Ubuntu 20.04 (I have not tested older releases).

I have tested Nvidia and ATI PCI GPUs on the K8 AGP-based computer, the K10 AGP-based computer, a K8 PCIe-based computer, and an Intel-based computer. The tested PCI ATI GPU is an ATI Radeon HD 4350, a not-so-old GPU (TeraScale generation) with HDMI output.

Whether ATI or Nvidia, those GPUs work on the Intel-based computer but not on the AMD K8 AGP or K10 AGP computers, and not even on the K8 PCIe-based computer.

So, here is the list of GPU tested:

AGP ATI Radeon HD 4670 (RV730 XT, TeraScale 1), HDMI + DVI-I + VGA
AGP ATI Radeon X1950 PRO (RV570, R500), DVI-I + DVI-I
PCI ATI Radeon HD 4350 (RV710, TeraScale 1), HDMI + DVI-I + VGA
PCI Nvidia GeForce 8400 GS rev.2 (NV98, Tesla 1.0), DVI-I + VGA

AGP GPUs worked up to 5.4.0-47 and stopped working with 5.4.0-48. They still do not work on 5.10.0-rc1: I reproduced the issue on 5.10.0-rc1 on both the K8 and K10 AGP-based computers. I do not have access right now to Intel-based AGP computers to test them.

Those are the computers tested:

K10 AGP based: ASRock AM2NF3-VSTA motherboard with AMD Phenom II X4 970 CPU (quad core), Nvidia nForce3 bridge, 16GB DDR2 800MHz, AGP + PCI
K8 PCIe based: Dell Optiplex 740 motherboard with AMD Athlon 64 X2 CPU (dual core), Nvidia C51 host bridge, C51 PCI Express bridge, 6GB DDR2 667MHz, PCIe + PCI
K8 AGP based: MSI MS-6702E motherboard with AMD Athlon 64 3200+ CPU (single core), VIA K8T800Pro, VT8237/8251 bridge, 3GB DDR 400MHz, AGP + PCI
Intel PCIe based: Lenovo ThinkCentre M58 motherboard with Pentium E5200 CPU (dual core), Intel 82801 PCI Bridge, 1GB DDR2 800MHz, PCIe + PCI

On all K8 or K10-based computers with AGP, the screen may display garbage but the host can be accessed through SSH, so I got some logs (more to come).

So to sum it up:

- PCI graphics have been broken in the kernel for ages on the K8/K10 platform, but this probably went unnoticed since they work on the Intel platform and those GPUs are not very common.
- AGP Graphics regressed starting with 5.4.0-48.

Because PCI graphics seems to be...


Kai-Heng Feng (kaihengfeng) wrote :

Would it be possible for you to do a kernel bisection?

First, find the last -rc kernel that works and the first -rc kernel that doesn't work from http://kernel.ubuntu.com/~kernel-ppa/mainline/

Then,
$ sudo apt build-dep linux
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git bisect start
$ git bisect good <the working version you found>
$ git bisect bad <the non-working version you found>
$ make localmodconfig
$ make -j`nproc` deb-pkg
Install the newly built kernel, then reboot with it.
If it still has the same issue,
$ git bisect bad
Otherwise,
$ git bisect good
Repeat from "make -j`nproc` deb-pkg" until you find the offending commit.
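
The mechanics of that loop can be tried on a throwaway repository before spending hours on kernel builds. The sketch below is only an illustration: the `bisect_demo` function, the ten dummy commits and the `regression` marker file are made up for the example. It also uses `git bisect run` to give the good/bad verdict automatically, which is not possible for this bug, since each step needs a reboot on the affected hardware.

```shell
# Toy bisection on a temporary repository (hypothetical example, not the kernel tree).
bisect_demo() (
    repo=$(mktemp -d) || exit 1
    cd "$repo" || exit 1
    git init -q . >/dev/null 2>&1 || exit 1
    git config user.name "Bisect Demo"
    git config user.email "demo@example.invalid"
    git config commit.gpgsign false

    # Ten commits; commit 7 introduces the "regression" (a marker file).
    i=1
    while [ "$i" -le 10 ]; do
        echo "$i" > version.txt
        if [ "$i" -eq 7 ]; then touch regression; fi
        git add -A
        git commit -q -m "commit $i" || exit 1
        i=$((i + 1))
    done

    # HEAD is known bad, the root commit is known good.
    good=$(git rev-list --max-parents=0 HEAD)
    git bisect start HEAD "$good" >/dev/null 2>&1 || exit 1

    # Exit status 0 means "good", 1 means "bad" for each tested commit.
    git bisect run sh -c 'test ! -e regression' >/dev/null 2>&1

    # refs/bisect/bad now points at the first bad commit; print its subject.
    git log -1 --format=%s refs/bisect/bad

    git bisect reset >/dev/null 2>&1
    cd / && rm -rf "$repo"
)
```

Calling `bisect_demo` prints the subject of the commit that introduced the marker file. On the real kernel tree the `git bisect run` step is replaced by the manual build, install, reboot and `git bisect good`/`git bisect bad` cycle described above.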

Thomas Debesse (illwieckz) wrote :

I built the v5.5.0 version from Torvalds's branch and it works.

So, if it does not work on Ubuntu's 5.4.0-48, I can assume it was broken by some Ubuntu custom patch or some backport, making it harder for me to identify what may have introduced the regression.

I'll try to find the vanilla version that started to fail anyway, that will be helpful.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Thomas Debesse (illwieckz) wrote :

For some reason I was able to compile v5.4 and v5.5 from Torvalds's branch with `make -j$(nproc) deb-pkg`, but starting with v5.6 I had to use `make -j$(nproc) bindeb-pkg`. In the end I lacked some modules (like my network driver, which did not help me), but the radeon one was there, so tests could be done.

The GNOME desktop opened properly with 5.4, 5.5, 5.6, 5.7 and 5.8. But with v5.9, the desktop (GDM is configured to autologin) just restarted again and again, only showing me a mouse cursor before dying.

The errors in dmesg were similar to the ones I found with PCI devices (not AGP ones!) on the same AMD K8 or K10-based motherboards where I reproduce the issue with AGP devices.

Also, the errors in vanilla 5.9 look to be the same as the one seen in 5.10-rc1 from mainline PPA.

Here is a sample of captured dmesg error on vanilla 5.9:

```
[ 5.242322] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[ 5.242359] radeon 0000:01:00.0: disabling GPU acceleration

[ 34.558885] ------------[ cut here ]------------
[ 34.558889] trying to bind memory to uninitialized GART !
[ 34.559048] WARNING: CPU: 1 PID: 2516 at drivers/gpu/drm/radeon/radeon_gart.c:299 radeon_gart_bind+0xdf/0xf0 [radeon]
[ 34.559050] Modules linked in: zram snd_usb_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_usbmidi_lib snd_hda_core snd_hwdep snd_pcm snd_seq_midi kvm_amd snd_seq_midi_event ccp joydev kvm snd_seq snd_rawmidi input_leds snd_timer snd_seq_device snd soundcore k10temp mac_hid serio_raw binfmt_misc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear uas usb_storage hid_generic usbhid hid radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm psmouse forcedeth i2c_nforce2
[ 34.559107] CPU: 1 PID: 2516 Comm: gnome-shell Not tainted 5.9.0 #1
[ 34.559109] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./AM2NF3-VSTA, BIOS P3.20 10/09/2009
[ 34.559178] RIP: 0010:radeon_gart_bind+0xdf/0xf0 [radeon]
[ 34.559184] Code: 00 48 89 ef 48 8b 40 60 e8 0e 2f 44 df 31 c0 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 38 6f 6b c0 e8 23 0c 6d de <0f> 0b b8 ea ff ff ff eb dc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 34.559187] RSP: 0018:ffffc030838f7a28 EFLAGS: 00010282
[ 34.559191] RAX: 0000000000000000 RBX: ffffa0cf6b88eb80 RCX: 0000000000000027
[ 34.559193] RDX: 0000000000000027 RSI: 0000000000000086 RDI: ffffa0cf6fc98d08
[ 34.559196] RBP: ffffc030838f7b28 R08: ffffa0cf6fc98d00 R09: 0000000000000004
[ 34.559198] R10: 0000000000000000 R11: 0000000000000001 R12: ffffc030838f7b28
[ 34.559201] R13: ffffa0cf6a622868 R14: ffffa0cf6c7cc6e8 R15: ffffc030838f7b28
[ 34.559204] FS: 00007f46ae245cc0(0000) GS:ffffa0cf6fc80000(0000) knlGS:0000000000000000
[ 34.559207] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 34.559210] CR2: 000056494261c1c8 CR3: 000000040bfe6000 CR4: 00000000000006e0
[ 34.559212] Call Trace:
[ 34.559286] radeon_ttm_backend_bind+0x58/0x210 [radeon]
[ 34...

```

Thomas Debesse (illwieckz) wrote :
Thomas Debesse (illwieckz) wrote :
Thomas Debesse (illwieckz) wrote :

Related and similar issue with PCI graphic cards (not AGP ones):
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1902795

While PCI graphics cards have been broken on the AMD K8/K10 platform for years (I have reproduced this on Linux 4.4, 4.8 and 4.15 from Ubuntu 16.04 Xenial), AGP cards started to break on Ubuntu with 5.4.0-48-generic (they were still working with 5.4.0-47-generic).

Thomas Debesse (illwieckz) wrote :
Thomas Debesse (illwieckz) wrote :

So, the 5.4.0-48 error is the same as the one that appears with 5.9 (and the one we see with PCI GPUs):

```
[ 0.000000] Linux version 5.4.0-48-generic (buildd@lcy01-amd64-010) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 (Ubuntu 5.4.0-48.52-generic 5.4.60)

[ 3.366387] PCI Interrupt Link [LKLN] enabled at IRQ 21
[ 3.426007] [drm] radeon kernel modesetting enabled.
[ 3.435680] radeon 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
[ 3.456237] radeon 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xfebe0000 -> 0xfebeffff
[ 3.478847] checking generic (e0000000 130000) vs hw (e0000000 10000000)
[ 3.478853] fb0: switching to radeondrmfb from VESA VGA
[ 3.490892] Console: switching to colour dummy device 80x25
[ 3.490909] radeon 0000:01:00.0: vgaarb: deactivate vga console
[ 3.491182] PCI Interrupt Link [LNKD] enabled at IRQ 19
[ 3.491321] [drm] initializing kernel modesetting (RV730 0x1002:0x9495 0x1002:0x0028 0x00).
[ 3.491325] [drm] Forcing AGP to PCIE mode
[ 3.491353] resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000dffff window]
[ 3.491359] caller pci_map_rom+0x71/0x18c mapping multiple BARs
[ 3.492982] ATOM BIOS: RV730XT
[ 3.493101] radeon 0000:01:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
[ 3.493104] radeon 0000:01:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
[ 3.493108] [drm] Detected VRAM RAM=1024M, BAR=256M
[ 3.493110] [drm] RAM width 128bits DDR
[ 3.493193] [TTM] Zone kernel: Available graphics memory: 8231890 KiB
[ 3.493195] [TTM] Zone dma32: Available graphics memory: 2097152 KiB
[ 3.493198] [TTM] Initializing pool allocator
[ 3.493203] [TTM] Initializing DMA pool allocator
[ 3.493221] [drm] radeon: 1024M of VRAM memory ready
[ 3.493223] [drm] radeon: 1024M of GTT memory ready.
[ 3.493230] [drm] Loading RV730 Microcode
[ 3.493316] [drm] Internal thermal controller without fan control
[ 3.512582] [drm] radeon: dpm initialized
[ 3.512696] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 3.534540] [drm] PCIE GART of 1024M enabled (table at 0x000000000014C000).
[ 3.534726] radeon 0000:01:00.0: WB enabled
[ 3.534730] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x(____ptrval____)
[ 3.534733] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x(____ptrval____)
[ 3.541143] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c598 and cpu addr 0x(____ptrval____)
[ 3.541147] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 3.541149] [drm] Driver supports precise vblank timestamp query.
[ 3.541151] radeon 0000:01:00.0: radeon: MSI limited to 32-bit
[ 3.541204] [drm] radeon: irq initialized.

[ 5.677888] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[ 5.677934] radeon 0000:01:00.0: disabling GPU acceleration
[ 5.696050] [drm] Radeon Display Connectors
[ ...

```
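
The signatures shared by the logs quoted in this report can be looked for mechanically in a saved dmesg. A minimal sketch, assuming the log was first saved to a file; the `radeon_log_check` name is hypothetical and the strings are copied from the excerpts above:

```shell
# Scan a saved kernel log for the radeon failure signatures quoted in this report.
# Prints one "found: ..." line per signature present; returns 1 if any was found.
radeon_log_check() {
    log_file=$1
    hits=0
    for sig in \
        "Forcing AGP to PCIE mode" \
        "ring 0 test failed" \
        "disabling GPU acceleration" \
        "trying to bind memory to uninitialized GART"
    do
        # -F: fixed string, -q: quiet, only the exit status matters
        if grep -qF -- "$sig" "$log_file"; then
            echo "found: $sig"
            hits=1
        fi
    done
    return "$hits"
}
```

For example, `dmesg > dmesg.txt` (possibly with sudo) followed by `radeon_log_check dmesg.txt` lists which of the signatures appear on the running kernel.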

Kai-Heng Feng (kaihengfeng) wrote :

So please bisect between 5.8 and 5.9.

summary: - Linux 5.4.0-48 (and later) causes GPU lockup, huge performance drop,
+ Linux 5.4.0-48 (and later) causes AGP GPU lockup, huge performance drop,
makes GNOME desktop fail to start and games going from 70fps to 7fps,
slow file loading, audio issues
summary: - Linux 5.4.0-48 (and later) causes AGP GPU lockup, huge performance drop,
- makes GNOME desktop fail to start and games going from 70fps to 7fps,
- slow file loading, audio issues
+ AGP disablement leaves GPUs without working alternative (PCI fallback is
+ broken), makes very-capable ATI TeraScale AGP GPUs unusable
summary: AGP disablement leaves GPUs without working alternative (PCI fallback is
- broken), makes very-capable ATI TeraScale AGP GPUs unusable
+ broken), makes very-capable ATI TeraScale GPUs unusable
tags: added: kernel-bug
Thomas Debesse (illwieckz) wrote :

Before bisecting, I investigated the PCI issue:
https://bugs.launchpad.net/bugs/1902795

I faced the PCI issue before the AGP one, but it was less critical. I have submitted a patch that may fix some issues (with the drawback of being non-optimal on platforms where PCI graphics are already known to work), but not everything is fixed. To be clear: it fixes neither the PCI GPU failure on K8 / K10 nor the AGP-as-PCI failure, but it is at least a step forward and would help specialists investigate the issue further. At this point I may have reached my skill cap on this topic.

Because PCI and AGP-as-PCI behave differently and the latter code path may have specific bugs, I opened an issue dedicated to tracking bugs when AGP cards are driven as PCI ones:
https://bugs.launchpad.net/bugs/1902981

I did not bisect anything after having identified that the breakage was introduced by 5.9-rc1; I directly tried to revert commit ba806f98f868ce107aa9c453fef751de9980e4af, which disabled AGP at kernel build. I reverted this commit on top of the 5.10-rc2 tag from Torvalds's branch, and both the ATI Radeon HD 4670 on the K10 computer and the ATI Radeon X1950 PRO on the AMD K8 computer started to work again immediately. Game performance was as expected and the desktop experience was really smooth, as can be expected from such aging hardware that was high end in its time.
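
For anyone wanting to repeat that test, the revert and build can be sketched as below. This is an assumption-laden outline, not the exact commands that were run: the `run` and `revert_agp_commit` helpers and the `DRY_RUN` variable are hypothetical, the tree path is whatever clone of torvalds/linux.git is at hand, and the build targets are the ones mentioned earlier in this thread.

```shell
# Print the command instead of running it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Revert the "drm/radeon: disable AGP by default" commit on top of a tag,
# then build Debian packages from the result.
revert_agp_commit() {
    tree=$1                  # path to a clone of torvalds/linux.git
    tag=${2:-v5.10-rc2}      # tag used in this report
    commit=ba806f98f868ce107aa9c453fef751de9980e4af
    run git -C "$tree" checkout "$tag" &&
    run git -C "$tree" revert --no-edit "$commit" &&
    run make -C "$tree" localmodconfig &&
    run make -C "$tree" -j"$(nproc)" bindeb-pkg
}
```

Running `DRY_RUN=1 revert_agp_commit ~/linux` first only prints the four commands, which makes it easy to review them before starting a long build.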

It looks like I misread your first comment: I believed you were asking me to try the 5.10-rc1 build as if that commit had already been reverted (the only action that seems able to fix the issue at this point).

After this commit is reverted, I have not yet noticed the other issues I've reported (disk IO seems to be OK, I have not tested audio yet), so the other ones may have been collateral damage of the AGP/PCI one.

So, you were right, the regression was introduced by the "drm/radeon: disable AGP by default" commit. It seems too early to disable, without an alternative, hardware that was still sold brand new in 2012 (as the ATI Radeon HD 4670 AGP was).
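
To see which mode a running kernel picked, the radeon module's `agpmode` parameter can be read back. This is a sketch under assumptions: it supposes the parameter is exposed under /sys/module/radeon/parameters/ (its modinfo description reads "AGP Mode (-1 == PCI)"), the `radeon_agp_mode` name is hypothetical, and nothing in this thread establishes whether setting another value restores AGP on an affected kernel.

```shell
# Report the AGP mode the loaded radeon module is using.
# The optional argument overrides the sysfs path (useful for testing).
radeon_agp_mode() {
    param=${1:-/sys/module/radeon/parameters/agpmode}
    if [ ! -r "$param" ]; then
        echo "unknown (cannot read $param)"
        return 1
    fi
    mode=$(cat "$param")
    if [ "$mode" = "-1" ]; then
        # -1 is documented by the module itself as "PCI"
        echo "agpmode=$mode: AGP disabled, card driven through the PCI path"
    else
        echo "agpmode=$mode: AGP not forced off by this parameter"
    fi
}
```

A value of -1 would be consistent with the "Forcing AGP to PCIE mode" line in the 5.4.0-48 log quoted earlier.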

Thomas Debesse (illwieckz) wrote :

This reverts commit ba806f98f868ce107aa9c453fef751de9980e4af.

Disabling AGP leaves some hardware without working alternative
on some platforms. For example, PCI GPUs are known to be broken
on K8 and K10 platforms for years: the breakage was reproduced
from Linux 4.4 on Ubuntu 16.04 Xenial to Linux 5.10-rc1 on Ubuntu
20.04 Focal, and it is expected to be older than Linux 4.4.

Also, there may be some bugs specific to AGP GPUs being driven
as PCI ones since fixing some PCI bugs introduces newer bugs
that are very specific to AGP GPUs driven as PCI ones and not
to PCI native ones.

Some AGP GPUs are still relevant to this day, like the high-end
ATI Radeon HD 4670 AGP (RV730 XT), a very capable TeraScale GPU
designed for OpenGL 3.3 and OpenCL 1.0 and featuring HDMI port
and 1GB of VRAM. This GPU was distributed by various manufacturers
and was still sold as brand new in 2012, for example this one:
http://www.hisdigital.com/un/product2-448.shtml
https://web.archive.org/web/2012/https://www.amazon.com/gp/product/B003CYKCG8/

As an example, this AGP GPU still gets 140+ fps on the competitive
Xonotic game in 2020, as verified during the XDWC 2020 event, also
when compared to other games on the Unvanquished GPU compatibility
matrix, we can notice that to outperform such GPU, Intel users
have to acquire an UHD 600 graphic chip from 2016, and Nvidia users
relying on the free open source nouveau driver have to acquire a
GTX 1060 from 2016:
https://wiki.unvanquished.net/wiki/GPU_compatibility_matrix

Motherboards compatible with powerful CPUs like the quad core AMD
AM3 Phenom II CPU X4 970 (3.5GHz) supporting virtualization, 16GB
of RAM and featuring AGP and PCI slots (not PCI Express ones) were
sold, like this motherboard from 2006 supporting this CPU from 2010:
https://www.asrock.com/mb/nvidia/am2nf3-vsta/
https://www.cpu-world.com/cgi-bin/IdentifyPart.pl?PART=HDZ970FBK4DGM

This is basically among the best the market had to offer in 2012
for AGP users. Disabling AGP turns such very capable computers and
their AGP GPUs into paperweights.

Even if PCI and AGP-as-PCI issues are fixed, disabling AGP is
expected to strongly affect performance of such GPUs, and disabling
AGP may hide bugs that may be introduced after the disablement.

A boot command line switch to disable AGP and rely on the PCI fallback
may be welcome to help test the PCI code and prevent it from rotting,
as it is easier to find AGP cards than PCI ones.

See related bugs:

- https://bugs.launchpad.net/bugs/1899304 (this one)
> AGP disablement leaves GPUs without working alternative
> (PCI fallback is broken), makes very-capable ATI TeraScale GPUs
> unusable

- https://bugs.launchpad.net/bugs/1902981
> AGP GPUs driven as PCI ones (when AGP is disabled at kernel build
> time) are known to fail on K8 and K10 platforms

- https://bugs.launchpad.net/bugs/1902795
> PCI graphics broken on AMD K8/K10 platform (while it works on Intel)
> verified from Linux 4.4 to 5.10-rc1

Thomas Debesse (illwieckz) wrote :

See patch and comments on https://lkml.org/lkml/2020/11/5/308

The patch was rewritten so that the message is shorter.

tags: added: patch