Comment 15 for bug 820746

a.r.karthick@gmail.com (a-r-karthick) wrote :

@Mynk: Thanks for your efforts in testing out the patch. I know you spent sleepless nights to get this verified amidst other preemptions.
Regarding the error log on resume, it's a different issue altogether. It has to do with the fact that radeon_pcie_gart_enable never really happened (the GART is the graphics address remapping table, effectively an IOMMU for the PCI Express slot).
So on device startup, GPU acceleration was disabled for your hardware, which also releases the GART table (radeon_gart_table_vram_free is invoked on GART finalize, which frees the VRAM).

Aug 8 05:34:29 mayankr-T400 kernel: [ 10.481258] [drm:radeon_ring_write] *ERROR* radeon: writting more dword to ring than expected !
Aug 8 05:34:29 mayankr-T400 kernel: [ 10.626140] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xFFFFFFFF)
Aug 8 05:34:29 mayankr-T400 kernel: [ 10.626146] radeon 0000:01:00.0: disabling GPU acceleration

However, the radeon driver doesn't record this fact when it disables GPU acceleration in r600_startup (no flag is set).
So during resume, when it tries to re-enable the GART table, it finds an empty VRAM object in r600_pcie_gart_enable.
The resume then fails, but since your ring and GPU were initialized anyway and suspend doesn't touch them, you were not impacted and were just left with a "Resume failed" message.
I think this has to do with your hardware returning invalid (~0U) values during initialization for the ring buffer write index. We are now ANDing that value with the ring buffer size, which effectively leaves us with 1 byte of write space at the tail (or otherwise reduced ring buffer write space) for the GPU acceleration feature to be enabled.
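
To illustrate the masking described above, here is a minimal sketch. It is not the driver's actual code: the ring size, the names sanitize_wptr and space_to_tail, and the dword units are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical ring size in dwords; must be a power of two for masking. */
#define RING_SIZE_DW  1024u
#define RING_PTR_MASK (RING_SIZE_DW - 1u)

/* Force a raw write index read from the hardware back inside the ring. */
static uint32_t sanitize_wptr(uint32_t raw_wptr)
{
    return raw_wptr & RING_PTR_MASK;
}

/* Slots left between the write index and the end of the ring. */
static uint32_t space_to_tail(uint32_t wptr)
{
    return RING_SIZE_DW - wptr;
}
```

With an all-ones read (~0U), the masked index lands on the last slot of the ring, so only a single slot remains before the tail. The access stays in bounds (no Oops), but there is almost no room to write.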
So I guess our quirk for your hardware is letting you _live_ with broken/crazy hardware :)
Otherwise you would be Oopsing as before. If you want to enable GPU acceleration, we could retry a finite number of times on receiving invalid ring buffer write index values, as in my original patch, in the expectation that a subsequent retry works. But I guess it's not worth it, and it makes sense to live without GPU acceleration for your seemingly broken graphics chipset.
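
The bounded-retry idea could look roughly like the sketch below. This is an illustration only, not the original patch: the retry count, the callback type, and the fake_hw test double are hypothetical stand-ins for a real register read.

```c
#include <stdbool.h>
#include <stdint.h>

#define WPTR_INVALID     0xFFFFFFFFu  /* the ~0U value seen on the bad hardware */
#define WPTR_MAX_RETRIES 5            /* hypothetical retry budget */

/* Hypothetical callback standing in for the real register read. */
typedef uint32_t (*read_wptr_fn)(void *ctx);

/* Re-read the write index a finite number of times; give up if it stays invalid. */
static bool read_wptr_with_retry(read_wptr_fn read_wptr, void *ctx, uint32_t *out)
{
    for (int attempt = 0; attempt < WPTR_MAX_RETRIES; attempt++) {
        uint32_t value = read_wptr(ctx);
        if (value != WPTR_INVALID) {
            *out = value;
            return true;
        }
    }
    return false; /* caller falls back to running without acceleration */
}

/* Test double: returns the invalid value for the first few reads. */
struct fake_hw {
    int bad_reads_left;
    uint32_t good_value;
};

static uint32_t fake_read_wptr(void *ctx)
{
    struct fake_hw *hw = ctx;
    if (hw->bad_reads_left > 0) {
        hw->bad_reads_left--;
        return WPTR_INVALID;
    }
    return hw->good_value;
}
```

If every attempt returns the invalid value, the function reports failure and the caller ends up where the current quirk already leaves it: running without GPU acceleration.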

I also believe that we can mark a flag in rdev->flags, like rdev->flags |= RADEON_IS_GART_DISABLED, when r600_startup disables acceleration and frees the GART VRAM object. Then, when pcie_gart_enable fails on resume, we check for this flag, clear it with rdev->flags &= ~RADEON_IS_GART_DISABLED, and continue with the resume instead of failing it.
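
A rough sketch of that flag handling, using a simplified stand-in for the device structure, is shown below. RADEON_IS_GART_DISABLED is only the flag proposed here; its bit position, the fake_rdev struct, and the fake_* functions are hypothetical and do not mirror the real driver code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Proposed flag; the bit position is an assumption for this sketch. */
#define RADEON_IS_GART_DISABLED (1u << 0)

/* Simplified stand-in for the device structure. */
struct fake_rdev {
    uint32_t flags;
    void *gart_vram_obj;
};

/* Startup path when the ring test fails: free the GART table and record it. */
static void fake_disable_accel(struct fake_rdev *rdev)
{
    rdev->gart_vram_obj = NULL;
    rdev->flags |= RADEON_IS_GART_DISABLED;
}

/* Enabling the GART fails when its VRAM object is missing. */
static int fake_gart_enable(struct fake_rdev *rdev)
{
    if (rdev->gart_vram_obj == NULL)
        return -EINVAL;
    return 0;
}

/* Resume path: tolerate the failure if we disabled the GART ourselves. */
static int fake_resume(struct fake_rdev *rdev)
{
    int r = fake_gart_enable(rdev);
    if (r && (rdev->flags & RADEON_IS_GART_DISABLED)) {
        rdev->flags &= ~RADEON_IS_GART_DISABLED; /* as proposed above */
        return 0;
    }
    return r;
}
```

A missing VRAM object without the flag set is still reported as an error, so genuine failures on resume are not hidden.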

But I don't think it's a big deal, and we can treat it as benign for now for the reasons mentioned above.

So to cut it short, let's now pull the trigger for the patch to be pushed upstream :)