S3 stress test fails with amdgpu errors

Bug #1909453 reported by AceLan Kao
This bug affects 1 person
Affects                   Status         Importance   Assigned to   Milestone
HWE Next                  Fix Released   Undecided    Unassigned
linux-oem-5.6 (Ubuntu)    Invalid        Undecided    Unassigned
  Focal                   Fix Released   Undecided    AceLan Kao

Bug Description

[Impact]
The system fails to resume from S3 with the following error messages:
   Nov 17 03:15:27 u kernel: amdgpu 0000:04:00.0:[drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring vcn_dec test failed (-110)
   Nov 17 03:15:27 u kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] ERROR resume of IP block <vcn_v1_0> failed -110
   Nov 17 03:15:27 u kernel: [drm:amdgpu_device_resume [amdgpu]] ERROR amdgpu_device_ip_resume failed (-110).

[Fix]
AMD provided these 2 commits in 5.9-rc1 to fix this issue, and Groovy already has them via a stable update:
   429f3d24384b drm/amdgpu: asd function needs to be unloaded in suspend phase
   90937420c44f drm/amdgpu: add TMR destory function for psp

[Test]
Verified on the affected Dell machine; it passes 500 S3 test cycles.

[Where problems could occur]
The TMR is created again after resume, so it must be destroyed when entering S3. The patch does exactly that, so these commits should be safe to include.

AceLan Kao (acelankao)
Changed in linux-oem-5.6 (Ubuntu):
status: New → Invalid
Changed in linux-oem-5.6 (Ubuntu Focal):
status: New → In Progress
assignee: nobody → AceLan Kao (acelankao)
AceLan Kao (acelankao)
description: updated
tags: added: oem-priority originate-from-1908327 somerville
AceLan Kao (acelankao)
Changed in linux-oem-5.6 (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote:

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
AceLan Kao (acelankao)
tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote:

This bug was fixed in the package linux-oem-5.6 - 5.6.0-1048.52

---------------
linux-oem-5.6 (5.6.0-1048.52) focal; urgency=medium

  * focal/linux-oem-5.6: 5.6.0-1048.52 -proposed tracker (LP: #1913153)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * udpgro.sh in net from ubuntu_kernel_selftests seems not reflecting sub-test
    result (LP: #1908499)
    - selftests: fix the return value for UDP GRO test

  * CVE-2020-27815
    - jfs: Fix array index bounds check in dbAdjTree

  * CVE-2020-25704
    - perf/core: Fix a memory leak in perf_event_parse_addr_filter()

  * CVE-2020-25643
    - hdlc_ppp: add range checks in ppp_cp_parse_cr()

  * CVE-2020-25641
    - block: allow for_each_bvec to support zero len bvec

  * CVE-2020-25284
    - rbd: require global CAP_SYS_ADMIN for mapping and unmapping

  * CVE-2020-25212
    - nfs: Fix getxattr kernel panic and memory overflow

  * CVE-2020-28588
    - lib/syscall: fix syscall registers retrieval on 32-bit platforms

  * CVE-2020-29371
    - romfs: fix uninitialized memory leak in romfs_dev_read()

  * CVE-2020-29369
    - mm/mmap.c: close race between munmap() and expand_upwards()/downwards()

  * CVE-2020-29368
    - mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()

  * CVE-2020-29660
    - tty: Fix ->session locking

  * CVE-2020-29661
    - tty: Fix ->pgrp locking in tiocspgrp()

  * CVE-2020-35508
    - fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent

  * CVE-2020-24490
    - Bluetooth: fix kernel oops in store_pending_adv_report

  * CVE-2020-14314
    - ext4: fix potential negative array index in do_split()

  * CVE-2020-10135
    - Bluetooth: Consolidate encryption handling in hci_encrypt_cfm
    - Bluetooth: Disconnect if E0 is used for Level 4

  * CVE-2020-27152
    - KVM: ioapic: break infinite recursion on lazy EOI

  * CVE-2020-28915
    - fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h
    - Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts

  * CVE-2020-15437
    - serial: 8250: fix null-ptr-deref in serial8250_start_tx()

  * CVE-2020-15436
    - block: Fix use-after-free in blkdev_get()

  * switch to an autogenerated nvidia series based core via dkms-versions
    (LP: #1912803)
    - [Config] dkms-versions -- add transitional/skip information for nvidia
      packages
    - [Packaging] nvidia -- use dkms-versions to define versions built
    - [Packaging] update-version-dkms -- maintain flags fields

  * S3 stress test fails with amdgpu errors (LP: #1909453)
    - drm/amdgpu: asd function needs to be unloaded in suspend phase
    - drm/amdgpu: add TMR destory function for psp

 -- Timo Aaltonen <email address hidden> Thu, 18 Feb 2021 13:11:14 +0200

Changed in linux-oem-5.6 (Ubuntu Focal):
status: Fix Committed → Fix Released
Timo Aaltonen (tjaalton)
Changed in hwe-next:
status: New → Fix Released