Samsung SSD corruption (fsck needed)

Bug #1746340 reported by Lucas Zanella
This bug affects 15 people
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed  High        Unassigned

Bug Description

Ubuntu 4.13.0-21.24-generic 4.13.13

I have a Razer Blade Stealth 2016. The first Ubuntu I installed was Ubuntu 17.04, which gave me this error after 2 weeks of usage. After that, I installed 16.04 and used it for MONTHS without any problems, until it produced the same error this week. I think it has to do with the Ubuntu updates, because I ran one recently and another one today, just before this problem. It could be a coincidence, though.

I notice the error when I try to save something to disk and it tells me that the disk is in read-only mode:

lz@lz:/var/log$ touch something
touch: cannot touch 'something': Read-only file system

lz@lz:/var/log$ cat syslog
Jan 29 01:07:39 lz kernel: [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0

lz@lz:/var/log$ dmesg
[62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.377374] Aborting journal on device nvme0n1p2-8.
[62984.379343] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
[62984.379516] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.381486] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.383484] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.385469] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.387278] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.389262] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.391252] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[62984.393341] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
[63285.618078] audit: type=1400 audit(1517195560.393:63): apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=22495 comm="cupsd" capability=12 capname="net_admin"
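A log like the one above repeats the same EXT4 error many times. As a rough aid for attaching such logs, the Python sketch below collapses them: it counts `EXT4-fs error` lines per device, function and inode, and records devices that were remounted read-only. The names `summarize_ext4_errors` and `format_summary` are hypothetical, and the regular expressions assume the exact message formats shown above; other kernel versions may word them differently.

```python
import re
from collections import Counter

# Assumed format, e.g.:
# [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: ...
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): (?P<func>\w+):\d+: inode #(?P<inode>\d+)"
)
# Assumed format, e.g.: EXT4-fs (nvme0n1p2): Remounting filesystem read-only
REMOUNT_RO = re.compile(r"EXT4-fs \((?P<dev>[^)]+)\): Remounting filesystem read-only")


def summarize_ext4_errors(lines):
    """Count EXT4 errors per (device, function, inode) and list read-only remounts."""
    errors = Counter()
    remounted = []
    for line in lines:
        m = EXT4_ERROR.search(line)
        if m:
            errors[(m["dev"], m["func"], int(m["inode"]))] += 1
            continue
        m = REMOUNT_RO.search(line)
        if m and m["dev"] not in remounted:
            remounted.append(m["dev"])
    return errors, remounted


def format_summary(errors, remounted):
    """Render the counts as one short line per distinct error, then the remounts."""
    out = [
        f"{dev}: {func} inode #{inode} x{count}"
        for (dev, func, inode), count in errors.most_common()
    ]
    out += [f"{dev}: remounted read-only" for dev in remounted]
    return "\n".join(out)
```

It accepts any iterable of lines, so saved `dmesg` output or the lines of /var/log/syslog can be passed in directly; `search` is used rather than `match` so that the syslog timestamp prefix does not matter.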

Rebooting Ubuntu gives me a black terminal where I can run fsck /dev/nvme0n1p2 (or something like that), and it fixes a lot of orphaned inodes. Most of the time it boots back into a working Ubuntu, but sometimes it boots into a broken Ubuntu (no images, lots of things broken), and then I have to reinstall Ubuntu.

Every time I reinstall Ubuntu, I have to try many times until it installs without an Input/Output error. Once it installs, I can use it for some hours without the problem, but if I run the software updates, it ALWAYS crashes and enters read-only mode, specifically during the part that installs kernel updates.

I noticed that Ubuntu installs updates automatically when they're for security reasons. Could this be the reason my Ubuntu worked for months without the problem, but then an update was applied and it broke?

I thought this bug was happening: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 and tried different nvme_core.default_ps_max_latency_us= values; all of them gave errors. I just changed it to 0 and had no error while using Ubuntu (although I didn't test it for long), but I still got the error after trying to update Ubuntu.

My Samsung 512gb SSD is:

SAMSUNG MZVLW512HMJP-00000, FW REV: CXY7501Q

on a Razer Blade Stealth.

I also asked this on ask ubuntu, without success: https://askubuntu.com/questions/998471/razer-blade-stealth-disk-corruption-fsck-needed-probably-samsung-ssd-bug-afte

Please help me, as I need this computer to work on lots of things :c
---
ApportVersion: 2.20.7-0ubuntu3.7
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: lz 1088 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 17.10
InstallationDate: Installed on 2018-01-30 (0 days ago)
InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
MachineType: Razer Blade Stealth
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-21-generic.efi.signed root=UUID=0ca062da-7e8f-425a-88b1-1f784fb40346 ro quiet splash button.lid_init_state=open nvme_core.default_ps_max_latency_us=0
ProcVersionSignature: Ubuntu 4.13.0-21.24-generic 4.13.13
RelatedPackageVersions:
 linux-restricted-modules-4.13.0-21-generic N/A
 linux-backports-modules-4.13.0-21-generic N/A
 linux-firmware 1.169.1
Tags: wayland-session artful
Uname: Linux 4.13.0-21-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 01/12/2017
dmi.bios.vendor: Razer
dmi.bios.version: 6.00
dmi.board.name: Razer
dmi.board.vendor: Razer
dmi.chassis.type: 9
dmi.chassis.vendor: Razer
dmi.modalias: dmi:bvnRazer:bvr6.00:bd01/12/2017:svnRazer:pnBladeStealth:pvr2.04:rvnRazer:rnRazer:rvr:cvnRazer:ct9:cvr:
dmi.product.family: 1A586752
dmi.product.name: Blade Stealth
dmi.product.version: 2.04
dmi.sys.vendor: Razer

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.15 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1746340

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Lucas Zanella (lucaszanella) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected artful wayland-session
description: updated
Revision history for this message
Lucas Zanella (lucaszanella) wrote : CRDA.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : IwConfig.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : JournalErrors.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : Lspci.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : Lsusb.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : ProcEnviron.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : ProcModules.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : PulseList.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : RfKill.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : UdevDb.txt

apport information

Revision history for this message
Lucas Zanella (lucaszanella) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Lucas Zanella (lucaszanella) wrote :

Which kernel should I install exactly, and how? I don't feel safe downloading over HTTP.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

This is a known issue for Samsung NVMe.

Please attach the output of `sudo nvme id-ctrl /dev/nvme0` and `sudo nvme get-feature -f 0x0c -H /dev/nvme0 | less`, Thanks!

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Uhh sans the "less", thanks.

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

Thank you for your answer. I'm desperate. I just installed Debian, so I won't be able to do it right now, but I have output from the last time I was using Ubuntu.

I tried nvme_core.default_ps_max_latency_us=5500 and it didn't work. Then I set it to 0, which didn't work either. Well, with 0 it didn't generate errors during normal use, but it still failed while updating my machine, which always happens, so I don't know anymore. I remember seeing APST Disabled in the output, but the error always happens when I try to update my software...

Shouldn't this bug already be fixed? Or is the fix not in my kernel? I could pay to get to the bottom of this, because I really need my computer right now, this bug happens every day, and I can't continue my work!

The last kernel I had on ubuntu was 4.13.0-26-generic, now I'm on debian and I have 4.9.0-4.

sudo nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S33UNX0J324060       SAMSUNG MZVLW512HMJP-00000               1         25,30 GB / 512,11 GB       512 B + 0 B      CXY7501Q

NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S33UNX0J324060
mn : SAMSUNG MZVLW512HMJP-00000
fr : CXY7501Q
rab : 2
ieee : 002538
cmic : 0
mdts : 0
cntlid : 2
ver : 10200
rtd3r : 186a0
rtd3e : 4c4b40
oaes : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x16
lpa : 0x3
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 341
cctemp : 344
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 512110190592
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ps 0 : mp:7.60W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:6.00W operational enlat:0 exlat:0 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:5.10W operational enlat:0 exlat:0 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0400W non-operational enlat:210 exlat:1500 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2200 exlat:6000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-
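The power-state table above is what the `nvme_core.default_ps_max_latency_us` parameter is compared against. The following Python sketch is a simplified model of the APST heuristic as I understand it from the kernel's `nvme_configure_apst()` (drivers/nvme/host/core.c, around v4.15), not the kernel code itself: a limit of 0 turns APST off; otherwise only non-operational states whose exit latency (`exlat`) is within the limit become idle targets, and the idle time before a transition is about 50 times the state's total entry plus exit latency, rounded up to whole milliseconds. It assumes the controller supports APST (`apsta : 0x1` above) and ignores device-specific quirks; `PowerState` and `apst_table` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PowerState:
    index: int
    non_operational: bool
    enlat_us: int  # entry latency in microseconds ("enlat" in nvme id-ctrl)
    exlat_us: int  # exit latency in microseconds ("exlat" in nvme id-ctrl)


# Power states of the SAMSUNG MZVLW512HMJP-00000, copied from the id-ctrl output above.
MZVLW512_STATES = [
    PowerState(0, False, 0, 0),
    PowerState(1, False, 0, 0),
    PowerState(2, False, 0, 0),
    PowerState(3, True, 210, 1500),
    PowerState(4, True, 2200, 6000),
]


def apst_table(
    states: List[PowerState], max_latency_us: int
) -> Optional[Dict[int, Tuple[int, int]]]:
    """Model the idle transitions APST would program (simplified, quirks ignored).

    Returns None when APST is off, else {from_state: (target_state, idle_ms)}.
    """
    if max_latency_us == 0:
        return None  # a latency limit of 0 disables APST
    table: Dict[int, Tuple[int, int]] = {}
    target: Optional[Tuple[int, int]] = None
    # Walk from the lowest-power (highest-numbered) state towards state 0.
    for ps in sorted(states, key=lambda p: p.index, reverse=True):
        if target is not None:
            table[ps.index] = target
        # Only non-operational states whose exit latency fits the limit are idle targets.
        if not ps.non_operational or ps.exlat_us > max_latency_us:
            continue
        total_us = ps.enlat_us + ps.exlat_us
        # About 50x the total latency, i.e. total_us / 20 ms, rounded up, capped at 24 bits.
        idle_ms = min((total_us + 19) // 20, (1 << 24) - 1)
        target = (ps.index, idle_ms)
    return table


if __name__ == "__main__":
    for limit in (0, 250, 5500, 100000):
        print(limit, apst_table(MZVLW512_STATES, limit))
```

Under this model, a limit of 5500 excludes ps4 (exit latency 6000 µs) but still lets the drive idle into ps3, a limit of 250 leaves no usable idle state, and 0 disables APST altogether.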

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote : Re: [Bug 1746340] Re: Samsung SSD corruption (fsck needed)

Kai-Heng

> On 31 Jan 2018, at 1:38 PM, Lucas Zanella <email address hidden> wrote:
>
> Thank you for your answer. I'm desperated. I just installed debian
> therefore I'm not going to able to do it right now, but I have output
> from the last time I was using Ubuntu.
>
> I tried nvme_core.default_ps_max_latency_us=5500 and it didn't work.
> Then I've put it to 0, which didn't work too. Well, with 0 it didn't
> generate errors while using, but while trying to update my machine,
> which always happens too, so I don't know anymore. I remember seeing
> ATSP Disabled at the output, but the error always happens when I try to
> update my software…

I’d like to see the output of `sudo nvme get-feature -f 0x0c -H /dev/nvme0` when you use nvme_core.default_ps_max_latency_us=0.

>
> Shouldn't this bug be already fixed? Or not in my kernel? I could pay to
> get to the bottom of this, because I need my computer so much right now
> and this bug is happening every day and I can't continue my work!

This is more likely a low-level NVMe/PCIe issue. If possible, please try to upgrade the firmware of the NVMe drive.

>
> The last kernel I had on ubuntu was 4.13.0-26-generic, now I'm on debian
> and I have 4.9.0-4.

You’ll get hit by this issue (again) once the next Debian release uses a newer kernel.

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

Hi. I've been trying to install Windows 10 in order to try to update my SSD firmware, but I'm getting an error:

https://imgur.com/a/BM0gG

Could it be that my SSD has a real hardware problem? I tried many different pen drives in different USB ports, but I always get the same error.

I'm trying to install Ubuntu to get the output with nvme_core.default_ps_max_latency_us=0, but the installation always fails.

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

Hi! I managed to install Ubuntu again. These are the outputs you asked for, with nvme_core.default_ps_max_latency_us=0:

NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S33UNX0J324060
mn : SAMSUNG MZVLW512HMJP-00000
fr : CXY7501Q
rab : 2
ieee : 002538
cmic : 0
mdts : 0
cntlid : 2
ver : 10200
rtd3r : 186a0
rtd3e : 4c4b40
oaes : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x16
lpa : 0x3
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 341
cctemp : 344
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 512110190592
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ps 0 : mp:7.60W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:6.00W operational enlat:0 exlat:0 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:5.10W operational enlat:0 exlat:0 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0400W non-operational enlat:210 exlat:1500 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2200 exlat:6000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

get-feature:0xc (Autonomous Power State Transition), Current value:00000000
 Autonomous Power State Transition Enable (APSTE): Disabled
 Auto PST Entries .................
 Entry[ 0]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 1]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 2]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 3]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 4]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 5]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 6]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 7]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 8]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 9]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[10]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[11]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[12]
 ...
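The `Current value:00000000` line and the all-zero entries above are consistent with APST being off. As I understand the NVMe specification's layout for feature 0x0C, bit 0 of the feature value is APSTE, and each 64-bit table entry holds ITPS in bits 7:3 and ITPT (in milliseconds) in bits 31:8. A small Python decoder under that assumption (the helper names are hypothetical):

```python
from typing import Tuple


def apste_enabled(feature_value: int) -> bool:
    """Bit 0 of the Autonomous Power State Transition feature value is APSTE."""
    return bool(feature_value & 0x1)


def decode_apst_entry(entry: int) -> Tuple[int, int]:
    """Split a 64-bit APST table entry into (ITPS, ITPT in milliseconds)."""
    itps = (entry >> 3) & 0x1F         # bits 07:03, Idle Transition Power State
    itpt_ms = (entry >> 8) & 0xFFFFFF  # bits 31:08, Idle Time Prior to Transition
    return itps, itpt_ms


def encode_apst_entry(itps: int, itpt_ms: int) -> int:
    """Inverse of decode_apst_entry; reserved bits are left as zero."""
    return ((itps & 0x1F) << 3) | ((itpt_ms & 0xFFFFFF) << 8)


if __name__ == "__main__":
    # "Current value:00000000" from the get-feature output above.
    print(apste_enabled(int("00000000", 16)))
    print(decode_apst_entry(0))
```

Decoding `00000000` gives APSTE off, and an all-zero entry decodes to ITPS 0 with ITPT 0 ms, which is what the `-H` output prints for every entry.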

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

I just installed 4.15.0-041500-generic

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

The problem persists with 4.15.0-041500-generic; it just happened.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

So you have the issue on Linux v4.15 with nvme_core.default_ps_max_latency_us=0, but not on v4.9?

APST isn't enabled on either of them.

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

On Debian (4.9) I didn't notice the issue, but I didn't use it much. HOWEVER, when I run apt-get upgrade on Debian I do get the issue. The upgrade had only updated the kernel files; it wasn't running the new kernel yet (that would require a reboot).

I'm not sure whether nvme_core.default_ps_max_latency_us=0 was still set on v4.15; I think I set it before upgrading to v4.15. But I can try again.

This is all very strange

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

I forgot to mention that I reinstalled Windows and everything is fine. I even ran a benchmark on the SSD, and I'm downloading lots of files to test it.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

I am not familiar with Windows; is there any way to check its APST table? I'd like to see whether the deepest power state is enabled.

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

I searched and found nothing.

So, even with APST disabled, my SSD fails on Linux. What should I do?
Does it work normally for other people when they disable it?

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

I found someone with the same problem who also had a Razer Blade Stealth, but he didn't post anything after that; he was in a thread with you. I also found other people with this same problem on the same SSD. Together with the fact that I have had no problem on Windows (more than 24 hours of use by now), I think it can be fixed in the kernel.

I had no luck updating my SSD's firmware, as it's an OEM drive and Samsung's updater won't work with it. Do you have any ideas? I don't have money to buy a new SSD, and I really need to work. I'd be very grateful if you could help with a solution.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Does the issue happen after system suspend?

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

Initially I noticed that it happened after opening the laptop lid, so yes. But now, right after I install Ubuntu, it immediately starts looking for software updates, and that's when the problem first happens, before I've even had time to close the lid and suspend it.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Please try [1]. It will do a PCI reset of the NVMe device after resume.

[1] people.canonical.com/~khfeng/lp1746340/

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

Thanks. What's a 'PCI reset for NVMe device after resume'?

Here's the output of running sudo dpkg -i *.deb on the 4 files:

Selecting previously unselected package linux-headers-4.15.0+.
(Reading database ... 137951 files and directories currently installed.)
Preparing to unpack linux-headers-4.15.0+_4.15.0+-2_amd64.deb ...
Unpacking linux-headers-4.15.0+ (4.15.0+-2) ...
Selecting previously unselected package linux-image-4.15.0+.
Preparing to unpack linux-image-4.15.0+_4.15.0+-2_amd64.deb ...
Unpacking linux-image-4.15.0+ (4.15.0+-2) ...
Selecting previously unselected package linux-image-4.15.0+-dbg.
Preparing to unpack linux-image-4.15.0+-dbg_4.15.0+-2_amd64.deb ...
Unpacking linux-image-4.15.0+-dbg (4.15.0+-2) ...
dpkg-deb (subprocess): decompressing archive member: lzma error: compressed data is corrupt
dpkg-deb: error: subprocess <decompress> returned error exit status 2
dpkg: error processing archive linux-image-4.15.0+-dbg_4.15.0+-2_amd64.deb (--install):
 cannot copy extracted data for './usr/lib/debug/lib/modules/4.15.0+/kernel/drivers/iio/pressure/zpa2326.ko' to '/usr/lib/debug/lib/modules/4.15.0+/kernel/drivers/iio/pressure/zpa2326.ko.dpkg-new': unexpected end of file or stream
Selecting previously unselected package linux-libc-dev.
Preparing to unpack linux-libc-dev_4.15.0+-2_amd64.deb ...
Unpacking linux-libc-dev (4.15.0+-2) ...
Setting up linux-headers-4.15.0+ (4.15.0+-2) ...
Setting up linux-image-4.15.0+ (4.15.0+-2) ...
update-initramfs: Generating /boot/initrd.img-4.15.0+
W: Possible missing firmware /lib/firmware/i915/skl_dmc_ver1_27.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_dmc_ver1_04.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_guc_ver9_39.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_guc_ver9_29.bin for module i915
W: Possible missing firmware /lib/firmware/i915/skl_guc_ver9_33.bin for module i915
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-4.15.0+
Found initrd image: /boot/initrd.img-4.15.0+
Found linux image: /boot/vmlinuz-4.13.0-21-generic
Found initrd image: /boot/initrd.img-4.13.0-21-generic
Adding boot menu entry for EFI firmware configuration
done
Setting up linux-libc-dev (4.15.0+-2) ...
Errors were encountered while processing:
 linux-image-4.15.0+-dbg_4.15.0+-2_amd64.deb
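The `lzma error: compressed data is corrupt` message above points to a damaged download, which re-downloading fixed, and the reporter was also wary of downloading over HTTP. Checking the `.deb` files against published checksums before running `dpkg -i` would catch this. The Python sketch below assumes a checksum list in the `sha256sum` format (`<hex digest>  <filename>`); whether and where such a list is published for a given build is an assumption to verify, and a checksum fetched over the same unauthenticated connection only guards against accidental corruption, not tampering. The function names are hypothetical.

```python
import hashlib
import re
from pathlib import Path
from typing import Dict

SHA256_HEX = re.compile(r"[0-9a-fA-F]{64}")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large kernel packages need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_sha256_lines(text: str) -> Dict[str, str]:
    """Parse '<digest>  <filename>' lines; comments and other lines are skipped.

    Filenames containing spaces are not supported by this simple split.
    """
    expected: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and SHA256_HEX.fullmatch(parts[0]):
            # sha256sum marks binary mode with a leading '*' on the name.
            expected[parts[1].lstrip("*")] = parts[0].lower()
    return expected


def verify(directory: Path, expected: Dict[str, str]) -> Dict[str, bool]:
    """Return {filename: digest matches} for every listed file present in directory."""
    results: Dict[str, bool] = {}
    for name, digest in expected.items():
        path = directory / name
        if path.exists():
            results[name] = sha256_of(path) == digest
    return results
```

Any file that maps to `False` should be downloaded again before it is passed to `dpkg -i`.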

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

I downloaded again and it seems that this time it wasn't corrupted.

Output:

Preparing to unpack linux-headers-4.15.0+_4.15.0+-2_amd64.deb ...
Unpacking linux-headers-4.15.0+ (4.15.0+-2) over (4.15.0+-2) ...
Preparing to unpack linux-image-4.15.0+_4.15.0+-2_amd64(1).deb ...
Unpacking linux-image-4.15.0+ (4.15.0+-2) over (4.15.0+-2) ...
Preparing to unpack linux-image-4.15.0+-dbg_4.15.0+-2_amd64(1).deb ...
Unpacking linux-image-4.15.0+-dbg (4.15.0+-2) ...
Preparing to unpack linux-libc-dev_4.15.0+-2_amd64.deb ...
Unpacking linux-libc-dev (4.15.0+-2) over (4.15.0+-2) ...
Setting up linux-headers-4.15.0+ (4.15.0+-2) ...
Setting up linux-image-4.15.0+ (4.15.0+-2) ...
update-initramfs: Generating /boot/initrd.img-4.15.0+
W: Possible missing firmware /lib/firmware/i915/skl_dmc_ver1_27.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_dmc_ver1_04.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_guc_ver9_39.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_guc_ver9_29.bin for module i915
W: Possible missing firmware /lib/firmware/i915/skl_guc_ver9_33.bin for module i915
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-4.15.0+
Found initrd image: /boot/initrd.img-4.15.0+
Found linux image: /boot/vmlinuz-4.13.0-21-generic
Found initrd image: /boot/initrd.img-4.13.0-21-generic
Adding boot menu entry for EFI firmware configuration
done
Setting up linux-image-4.15.0+-dbg (4.15.0+-2) ...
Setting up linux-libc-dev (4.15.0+-2) ...

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

After installing everything, I rebooted into the new kernel. I then installed updates to see if the problem would happen (the easiest way to trigger it is to run an update). After the update, wireless stopped working. I restarted many times and it still doesn't work.

Could it be that the update triggered the error and the so-called PCIe reset in this kernel broke the wireless?

I'm going to keep using this kernel to see whether the read-only filesystem error happens, though.

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

I added a USB wireless adapter to get internet access and download things, so I can see if something happens. I installed more system updates through Ubuntu's Software Updater. Is this OK? The kernel will still be yours, right?

Changed in linux (Ubuntu):
assignee: nobody → Kai-Heng Feng (kaihengfeng)
tags: added: patch
Changed in linux (Ubuntu):
assignee: Kai-Heng Feng (kaihengfeng) → nobody
Brad Figg (brad-figg)
tags: added: cscc
[141 comments hidden; the full thread has 221 comments]
Revision history for this message
Lucas Zanella (lucaszanella) wrote :

Hi tronglx, the only way I found was to install Qubes OS

Revision history for this message
trong luu (tronglx) wrote :

Thanks Lucas, have you tried Arch Linux? Qubes OS feels very strange to me. I'm a developer, and the OS community is very important to me.

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

Hi tronglx,

I didn't try Arch, but I THINK I tried Manjaro, which is based on it. If I did, it didn't work. I remember trying lots of Linux distributions and all of them failed.

Qubes OS seems to work because it doesn't run directly on a Linux kernel; it uses the Xen hypervisor, which somehow avoids the bug. You can install Arch as a Qubes VM: there's a script that generates an image you can install. They also provide Ubuntu, Kali and others.

Since this bug is rare, I don't think they'll try to fix it; the person who was helping here gave up.

Revision history for this message
trong luu (tronglx) wrote :

Thanks Lucas,
It just happened to my laptop. I will try to find a solution.

Revision history for this message
trong luu (tronglx) wrote :

I switched to recovery mode and ran: mount -o remount,rw /. The problem no longer appears; it seems to be fixed.

Revision history for this message
trong luu (tronglx) wrote :

The error still happens.

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

Hi trong luu, for me the error happened every day, which is why I ended up using Qubes. It's the only way I could find, apart from Windows.

You can try older kernels, but that didn't work for me. Remember that downloading an older Ubuntu release will still give you a recent kernel; you have to downgrade it yourself. However, no Ubuntu or Debian kernel fixed the problem for me.

Revision history for this message
trong luu (tronglx) wrote :

I think switching to another SSD is the last resort. But I really want to find the root cause of the problem. As I understand it, the system booted with the mount option errors=remount-ro; then something went wrong and the system switched to read-only mode to protect the file system. Have you checked the system log for anything abnormal? NVMe is becoming more and more popular, so this is a big problem for Linux users.
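As described above, a root filesystem mounted with `errors=remount-ro` is switched to read-only when ext4 detects an error. To check the current state, the options can be read from /proc/mounts. The Python sketch below parses text in that format (device, mount point, type, options, dump, pass) for one mount point; `mount_options` is a hypothetical helper, and `errors=` may not be listed when it matches the filesystem's built-in default, so its absence is not conclusive.

```python
from typing import Dict, Optional


def mount_options(mounts_text: str, mount_point: str) -> Optional[Dict[str, Optional[str]]]:
    r"""Return the options of mount_point from /proc/mounts-style text, or None.

    Fields are: device, mount point, fs type, options, dump, pass.
    Spaces in paths appear escaped as \040 and are not unescaped here.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            opts: Dict[str, Optional[str]] = {}
            for opt in fields[3].split(","):
                key, sep, value = opt.partition("=")
                opts[key] = value if sep else None  # flags like "ro" map to None
            found = opts  # a later entry for the same path shadows earlier ones
    return found
```

On a running system the input would be the contents of /proc/mounts; `"ro" in opts` then tells whether the filesystem has been flipped to read-only, and `opts.get("errors")` shows the error policy when it is listed.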

Revision history for this message
Juan Carlos Carvajal Bermúdez (jucajuca) wrote :

For anyone struggling with this hideous bug, try the following:

add "nvme_core.default_ps_max_latency_us=250" in /etc/default/grub, for example:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off nvme_core.default_ps_max_latency_us=250"

then run "sudo update-grub" and reboot.
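The edit above can also be scripted. This Python sketch (the helper `set_kernel_param` is hypothetical) rewrites the `GRUB_CMDLINE_LINUX_DEFAULT` line of an /etc/default/grub-style text so that a parameter such as `nvme_core.default_ps_max_latency_us` appears exactly once with the new value. It only returns a string; writing the file back (as root, ideally after a backup), running `update-grub` and rebooting are still manual steps, and it assumes the value is enclosed in double quotes as in the example.

```python
import re

# Matches e.g. GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"(.*)"\s*$')


def set_kernel_param(grub_text: str, key: str, value: str) -> str:
    """Return grub_text with key=value set on the GRUB_CMDLINE_LINUX_DEFAULT line.

    Any existing occurrence of key is dropped first, so running it twice is harmless.
    If the line is missing, the text is returned unchanged (apart from a final newline).
    """
    out = []
    for line in grub_text.splitlines():
        m = CMDLINE.match(line)
        if m:
            # Keep every other parameter; compare only the part before '='.
            params = [p for p in m.group(2).split() if p.split("=", 1)[0] != key]
            params.append(f"{key}={value}")
            line = f'{m.group(1)}"{" ".join(params)}"'
        out.append(line)
    return "\n".join(out) + "\n"
```

Calling it once per parameter (for example once for `nvme_core.default_ps_max_latency_us` and once for `pcie_aspm`) reproduces the example line while leaving the rest of the file untouched.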

My laptop has been running smoothly for a week now. (/dev/nvme0n1 S444NY0K600040 SAMSUNG MZVLB256HAHQ-00000 1 81.09 GB / 256.06 GB 512 B + 0 B EXD7101Q)

see more infos here: https://wiki.archlinux.org/index.php/Solid_state_drive/NVMe

@kernel developers, wouldn't it be great to detect such disks and automatically lower nvme_core.default_ps_max_latency_us? This bug is really hard to detect and solve because there are NO logs whatsoever. The disk goes read-only, you know?

Revision history for this message
trong luu (tronglx) wrote :

Hi Juan, I have tried your suggestion many times, but the problem still happens. I also tried nvme_core.default_ps_max_latency_us=0, but no luck. I'm not sure APST is actually disabled. How can I check the APST status once the system has booted?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/core.c#n2282

Revision history for this message
Juan Carlos Carvajal Bermúdez (jucajuca) wrote :

Hi
try:

cat /sys/module/nvme_core/parameters/default_ps_max_latency_us

sudo nvme get-feature -f 0x0c -H /dev/nvme0

Please read the link provided carefully; the info is there.
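The two checks above can be combined. In the sketch below, `read_default_latency` reads the module parameter from sysfs and `apste_from_get_feature` pulls the APSTE line out of `nvme get-feature -f 0x0c -H` output, which reports what the drive itself has been told. The helper names are hypothetical, and the pattern assumes nvme-cli prints `Enabled` or `Disabled` on that line, as in the outputs in this thread. As far as I know the module parameter is only a default applied when the controller is set up, so the drive's own report is the more direct answer.

```python
import re
from pathlib import Path
from typing import Optional

# Module parameter from the reply above.
PARAM_PATH = Path("/sys/module/nvme_core/parameters/default_ps_max_latency_us")
# nvme-cli's human-readable (-H) line for the APSTE bit.
APSTE_LINE = re.compile(r"\(APSTE\):\s*(Enabled|Disabled)")


def read_default_latency(path: Path = PARAM_PATH) -> int:
    """Read the nvme_core default latency limit in microseconds."""
    return int(path.read_text().strip())


def apste_from_get_feature(text: str) -> Optional[bool]:
    """Return True/False from `nvme get-feature -f 0x0c -H` output, or None if absent."""
    m = APSTE_LINE.search(text)
    return None if m is None else m.group(1) == "Enabled"


def summarize(latency_us: int, feature_text: str) -> str:
    """Combine both checks and flag the one inconsistent combination."""
    apste = apste_from_get_feature(feature_text)
    if apste is None:
        return "APSTE line not found in the get-feature output"
    state = "Enabled" if apste else "Disabled"
    if latency_us == 0 and apste:
        return f"unexpected: limit is 0 but APSTE={state}"
    return f"limit={latency_us} us, APSTE={state}"
```

For example, the output of `sudo nvme get-feature -f 0x0c -H /dev/nvme0` saved to a file can be passed as `feature_text`, together with the value returned by `read_default_latency()`.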

Revision history for this message
trong luu (tronglx) wrote :

Running the cat /sys/module/nvme_core/parameters/default_ps_max_latency_us command outputs 0.
sudo nvme get-feature -f 0x0c -H /dev/nvme0n1p2
get-feature:0xc (Autonomous Power State Transition), Current value:00000000
 Autonomous Power State Transition Enable (APSTE): Disabled
 Auto PST Entries .................
 Entry[ 0]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 1]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 2]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 3]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 4]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 5]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 6]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 7]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 8]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 9]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[10]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[11]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[12]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[13]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[14]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[15]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[16]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[17]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[18]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[19]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[20]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
...

trong luu (tronglx) wrote :

Hi Lucas, would it work to install Windows and run Ubuntu in VMware?

Lucas Zanella (lucaszanella) wrote :

Hi trong luu, I didn't test it, but I think that it depends on the way VMware virtualizes access to the disk. There may be multiple ways, one of which will work.

trong luu (tronglx) wrote :

Hi Lucas, have you tried a new SSD? I don't think this is a hardware issue; my SSD's power cycle count is only 807. If there is no other solution, I think I will eventually buy a new SSD. Do you know which SSDs work properly with Linux?
Smartctl output:
sudo smartctl -t long -a /dev/nvme0n1p2
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-5.0.0-37-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: LITEON CA3-8D512
Serial Number: 0028104000DN
Firmware Version: C49640A
PCI Vendor ID: 0x14a4
PCI Vendor Subsystem ID: 0x1b4b
IEEE OUI Identifier: 0x002303
Total NVM Capacity: 512,110,190,592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Thu Dec 19 08:54:28 2019 +07
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x001f): Security Format Frmw_DL NS_Mngmt *Other*
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 83 Celsius
Critical Comp. Temp. Threshold: 85 Celsius

Supported Power States
 St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
  0 +     8.00W       -        -    0  0  0  0        0       0
  1 +     4.50W       -        -    1  1  1  1        5       5
  2 +     3.00W       -        -    2  2  2  2        5       5
  3 -   0.0700W       -        -    3  3  3  3     1000    5000
  4 -   0.0100W       -        -    4  4  4  4     5000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -     512       0         1
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning: 0x00
Temperature: 47 Celsius
Available Spare: 100%
Available Spare Threshold: 0%
Percentage Used: 0%
Data Units Read: 5,773,150 [2.95 TB]
Data Units Written: 6,405,757 [3.27 TB]
Host Read Commands: 78,674,228
Host Write Commands: 91,754,035
Controller Busy Time: 10,405
Power Cycles: 807
Power On Hours: 312
Unsafe Shutdowns: 104
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 47 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
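The "Supported Power States" table above shows which states APST could use. This rough bash sketch checks which of them remain candidates under a given nvme_core.default_ps_max_latency_us value.

It approximates the check in nvme_configure_apst() in drivers/nvme/host/core.c (the file linked earlier in this thread). In recent kernels, a non-operational state is skipped when its exit latency exceeds the cap, and a cap of 0 disables APST. Older 4.x kernels are believed to have compared entry plus exit latency instead.

The sketch ignores per-device quirks, and details differ between kernel versions. The function name apst_candidates is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list non-operational power states (Op column "-") from smartctl's
# "Supported Power States" table whose exit latency (last column, in us)
# fits under a latency cap, approximating the kernel's exit-latency check.
# Ignores device quirks; apst_candidates is a hypothetical name.
apst_candidates() {
  local cap_us=$1
  awk -v cap="$cap_us" '
    BEGIN { if (cap + 0 <= 0) exit }            # a cap of 0 disables APST
    /^Supported Power States/ { sect = 1; next }
    sect && NF == 0 { sect = 0 }                # blank line ends the table
    sect && $1 ~ /^[0-9]+$/ && $2 == "-" && $NF + 0 <= cap + 0 { print $1 }
  '
}
```

Usage: `sudo smartctl -a /dev/nvme0 | apst_candidates 5500`. For the LITEON table above, this exit-latency check keeps state 3 (Ex_Lat 5000) at a cap of 5500. At the kernel's default cap of 100000 it keeps both states 3 and 4.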

Lucas Zanella (lucaszanella) wrote :

Hi trong luu. I did buy a new SSD because I thought mine was faulty. However, I bought one of the same brand (Samsung); it didn't occur to me to buy another brand. Anyway, the brand-new SSD also has the problem.

In my case it is definitely not a hardware problem. With Linux the problem happens every day, sometimes more than once a day. With Windows the error never happened, and I have been running Qubes for more than 2 months without any problems. So it's not the hardware; something is definitely wrong with the Linux kernel.

trong luu (tronglx) wrote :

Thanks Lucas, I think I will buy a different SSD. Do you have any suggestions?

Lucas Zanella (lucaszanella) wrote :

The other two good brands I know are Corsair and WD Black. Don't buy Samsung; the majority of people with this problem have Samsung drives.

trong luu (tronglx) wrote :

Thank you. My SSD is a LITEON CA3-8D512, not a Samsung. So should I buy a different type of SSD? A non-NVMe one?

Lucas Zanella (lucaszanella) wrote :

Non-NVMe SSDs are pretty slow, something like 8 times slower. Stick with NVMe, and if nothing works, install Qubes.

trong luu (tronglx) wrote :

Thanks, Lucas.

Craigums Carlonious (craigsidcarlson) wrote :

It's 2020; is there still no solution to this problem? I'm getting this error with Ubuntu 18 LTS and Ubuntu 19.

Kai-Heng Feng (kaihengfeng) wrote :

Craigums Carlonious,

Is the system exactly the same?

Craigums Carlonious (craigsidcarlson) wrote :

Hi, yes. Like many of the other people above, I'm trying to install onto a Razer Blade Stealth with the SAMSUNG MZVLB256HAHQ-00000 NVMe SSD. I'm getting the I/O error and have tried most of the fixes mentioned above with no luck, and I would rather keep using Windows than switch to Qubes.

Kai-Heng Feng (kaihengfeng) wrote :

Can you please attach `sudo nvme id-ctrl /dev/nvme0`?
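When comparing id-ctrl output from several machines, a few fields are most relevant to this bug:

- mn: the model.
- fr: the firmware revision.
- apsta: whether APST is supported (bit 0).
- npss: the number of power states. This is a zero-based value, so npss 4 means five states.

Here is a minimal bash sketch that extracts these fields. It assumes nvme-cli's plain-text "name : value" layout, and the function name idctrl_fields is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: pull a few identifying fields out of `nvme id-ctrl` output on stdin.
# Assumes nvme-cli's "name   : value" text layout; idctrl_fields is hypothetical.
idctrl_fields() {
  awk -F' *: ' '
    $1 == "mn" || $1 == "fr" || $1 == "apsta" || $1 == "npss" {
      v = $2
      sub(/[ \t]+$/, "", v)      # model strings are space-padded
      print $1 "=" v
    }'
}
```

Usage: `sudo nvme id-ctrl /dev/nvme0 | idctrl_fields`.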

Lucas Zanella (lucaszanella) wrote :

Hi Kai-Heng Feng, please note that after I installed Qubes, I never ever had the problem again. It may be useful in the debug process, and maybe the way Xen, PCIe and Linux work together in Qubes can give a hint on what's happening. Thank you for all your help to this day.

Juan Carlos Carvajal Bermúdez (jucajuca) wrote :

An update on this:

It was actually pcie_aspm=off that solved the problem.

I think the problem is related to the power management of PCIe ports.

Without pcie_aspm=off I started seeing errors like the following ones:

- [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=7841 end=7842) time 291 us, min 1063, max 1079, scanline start 1044, end 1092

 pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
 pcieport 0000:00:1d.0: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
 pcieport 0000:00:1d.0: AER: device [8086:a330] error status/mask=00000001/00002000

I think the bug is not with the NVMe controller but somewhere in ASPM. But I am not a kernel developer.
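Several reports in this thread depend on whether a boot parameter actually reached the running kernel: pcie_aspm=off, and nvme_core.default_ps_max_latency_us set to 0 or 5500. Here is a minimal bash sketch for checking that, assuming the parameters are passed on the kernel command line. has_kparam is a hypothetical helper name. The comment at the top of the block notes the usual Ubuntu way of making a parameter persistent through GRUB.

```bash
#!/usr/bin/env bash
# Sketch: check whether the running kernel was booted with a given parameter.
# To make a parameter persistent on Ubuntu, add it to
#   GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
# (e.g. "quiet splash pcie_aspm=off"), run `sudo update-grub`, and reboot.

# has_kparam PARAM [CMDLINE]: succeed if PARAM appears as a whole token.
# CMDLINE defaults to the contents of /proc/cmdline.
has_kparam() {
  local want=$1 cmdline
  if (( $# >= 2 )); then
    cmdline=$2
  else
    cmdline=$(</proc/cmdline)
  fi
  local -a tokens
  read -r -a tokens <<< "$cmdline"     # split on whitespace, no globbing
  local tok
  for tok in "${tokens[@]}"; do
    [[ $tok == "$want" ]] && return 0  # quoted RHS: literal comparison
  done
  return 1
}
```

Usage: `has_kparam pcie_aspm=off && echo "pcie_aspm=off is active"`. Matching whole tokens avoids false positives, such as `pcie_aspm` matching `pcie_aspm=off`.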

Lucas Zanella (lucaszanella) wrote :

Juan, which hardware are you on? Razer?

Juan Carlos Carvajal Bermúdez (jucajuca) wrote :

No, I have a laptop from XMG, a German brand.

Ramon Fontes (ramonreisfontes) wrote :

Hello all!

I'm experiencing the same problem with an ADATA SU800NS38. My SSD works fine with the 4.17.0-041700-generic kernel, but unfortunately that is the only kernel version with which it works perfectly. Besides other 4.x kernels, I also tried 5.0 through 5.5. The disk becomes read-only during use and I need to run fsck every time I start the system.

Ramon Fontes (ramonreisfontes) wrote :

BTW, pcie_aspm=off and nvme_core.default_ps_max_latency_us=5500 didn't work.

Lucas Zanella (lucaszanella) wrote :

Ramon, which hardware is yours? Razer?

On 6 Mar 2020 16:34, Ramon Fontes wrote:

> BTW, pcie_aspm=off and nvme_core.default_ps_max_latency_us=5500 didn't
> work.

Ramon Fontes (ramonreisfontes) wrote :

I have a Dell Inspiron 14 5000 Series-5480. The strangest thing is that I bought my laptop about a year ago, installed Ubuntu 18.04.1 with kernel 4.15.0-65-xxx (the default installation), and everything worked as expected. However, the same problem happened with every other kernel version (including 4.17.0-041700-generic).

Then, after having some problems with my system, I installed Ubuntu 18.04.4. The kernel installed with it was 5.3. After seeing the same disk problem, I tried installing 4.15.0-65, but that didn't solve it (I don't remember exactly which kernel I had the first time, i.e. what the xxx was). Finally, I found that 4.17.0-041700-generic works, and I don't know why: it didn't work with Ubuntu 18.04.1, but it works with 18.04.4. This is really strange, and I need to use recent kernel versions because I need some features I implemented for v5.5-rc1.

[1] https://github.com/torvalds/linux/commit/b5764696ac409523414f70421c13b7e7a9309454#diff-21081ef83e1374560c2e244926168e49
[2] https://github.com/torvalds/linux/commit/7dfd8ac327301f302b03072066c66eb32578e940#diff-21081ef83e1374560c2e244926168e49

Lucas Zanella (lucaszanella) wrote :

I had the same experience: it worked for a year, then I updated and it stopped working. Then I rolled back the kernel and it still wouldn't work.

On 7 Mar 2020 10:17, Ramon Fontes wrote:

> I have a Dell Inspiron 14 5000 Series-5480. The most strange thing is
> that I bought my laptop about 1 year ago and I've installed Ubuntu
> 18.04.1 with kernel 4.15.0-65-xxx (default installation) and everything
> worked as expected. However, the same problem happened with any other
> kernel version (including 4.17.0-041700-generic).

Kai-Heng Feng (kaihengfeng) wrote :

Ramon,
Please file a separate bug, since it's platform-specific.

Ramon Fontes (ramonreisfontes) wrote :

I thought I could help in some way with more information. By the way, I've found the solution and my SSD works fine now. You may want to take a look at https://bugzilla.kernel.org/show_bug.cgi?id=201685. Comment #294 (https://bugzilla.kernel.org/show_bug.cgi?id=201685#c294), in particular, helped me solve the problem.

Lucas Zanella (lucaszanella) wrote :

I just want to say that after 2 years I remembered I had an SSD from a brand other than Samsung, a Kingston. I installed it in my Razer and it worked perfectly for days; I ran several SSD stress tests with no errors.

The error is definitely with Samsung AND Linux. And it's not a faulty SSD, because it happens on both of my Samsung SSDs. It does not happen on Windows or Qubes with any SSD.

I tested the latest Ubuntu 21.04 and the problem still happens on Samsung SSDs right on the installation screen.

Anyway, I'm not even using this computer anymore (I bought a Dell XPS 13), but the error persists, and it's either Samsung's or Linux's fault. Probably Samsung's, since other brands work OK with Linux.

Anthony Durity (anthony-durity) wrote :

I've hit this "bug". I have a nice Clevo ODM-based laptop, and luckily it has two NVMe drives, so it's not a showstopper for me, but it's obviously a concern. The Intel one is the boot drive and the Samsung one is the data drive, and I have a dual-boot setup. So, two data points to note: the Intel NVMe drive works in both Windows and Linux, while the Samsung works in Windows but not in Linux. By "doesn't work in Linux" I mean that the system brings the drive up, I can mount it read-write, and everything looks good, but as soon as I try to write files to it, it craps out with nothing written:

[369.798910] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[369.798916] nvme nvme0: Does your device have a faulty power saving mode enabled?
[369.798918] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
[369.870912] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[369.871064] nvme nvme0: Removing after probe failure status: -19
[369.890931] nvme0n1: detected capacity change from 1953525168 to 0

Output of `dmesg` attached.
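The log above shows a different symptom from the original report. Here the controller itself stops responding: a CSTS of 0xffffffff typically means register reads return all ones because the device is no longer reachable over PCIe. The original report instead shows ext4 detecting corruption and remounting read-only.

Here is a minimal bash sketch that filters kernel log text for both signatures, using strings copied from the logs quoted in this bug. The function name is hypothetical, and the pattern list is not exhaustive.

```bash
#!/usr/bin/env bash
# Sketch: print kernel log lines (read from stdin) that match the failure
# signatures quoted in this bug report. nvme_failure_lines is hypothetical.
nvme_failure_lines() {
  grep -E 'controller is down; will reset|Remounting filesystem read-only|Aborting journal on device'
}
```

Usage: `sudo dmesg | nvme_failure_lines`. Output like the NVMe reset line above points toward the controller or PCIe link dropping out. Output containing only the ext4 lines matches the original report.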

tetebueno (tetebueno) wrote :

Bump:

PC: Lenovo Legion Y520-15IKBN

SSD: Samsung SM951 M.2 PCIe SSD Drive (MZ-HPV256)

OS: Elementary OS 7.1 Horus (Ubuntu 22.04.1)

Kernel: 6.5.0-14-generic

---

lshw relevant parts:

computer
    description: Notebook
    product: 80WK (LENOVO_MT_80WK_BU_idea_FM_Lenovo Y520-15IKBN)
    vendor: LENOVO
    version: Lenovo Y520-15IKBN
    serial: PF0UM7F3
    width: 64 bits
    capabilities: smbios-3.0.0 dmi-3.0.0 smp vsyscall32
    configuration: administrator_password=disabled boot=normal chassis=notebook family=IDEAPAD frontpanel_password=disabled keyboard_password=disabled power-on_password=disabled sku=LENOVO_MT_80WK_BU_idea_FM_Lenovo Y520-15IKBN uuid=e974c0b6-54a9-11ef-8ff5-54e13d454041
(...)
              *-disk
                   description: ATA Disk
                   product: SAMSUNG MZHPV256
                   physical id: 0.0.0
                   bus info: scsi@3:0.0.0
                   logical name: /dev/sdb
                   version: 500Q
                   serial: S1X2NYAG810617
                   size: 238GiB (256GB)
                   capabilities: gpt-1.00 partitioned partitioned:gpt
                   configuration: ansiversion=5 guid=094df7bd-71f3-4587-b36f-8cb0bd3ba964 logicalsectorsize=512 sectorsize=512

---

Update: I first tried the changes in comment #190, but the error persisted. Then I tried adding only the pcie_aspm=off parameter (removing the nvme_core.default_ps_max_latency_us parameter) and things got better: I went about three weeks straight without errors, and then the error appeared again. Note that the day of the failure was the only day the computer was intentionally suspended. For now I'll keep this configuration, as it seems to be the most "stable" one.

Displaying first 40 and last 40 comments. View all 221 comments or add a comment.