Comment 184 for bug 263555

Jesse Brandeburg (jesse-brandeburg) wrote :

In the interest of adding some closure to this bug: the issue turns out never
to have been the e1000e driver's fault. The fault lies with the
CONFIG_DYNAMIC_FTRACE option. Specifically, when the ftrace code was enabled,
it was doing a locked cmpxchg instruction on memory that had previously been
used as __INIT code by some other module.

a) some other module loads
b) that module's init code calls into ftrace which stores the EIP
c) that module discards its init code
d) e1000e loads
e) e1000e asks the kernel for memory to ioremap onto, gets the memory
location of the code from b), and maps the flash/NVM control registers there.
f) ftraced runs and writes to bytes 4-8 of the memory location from b)/e)
g) since the lock/cmpxchg instruction is undefined for memory-mapped registers,
random junk is written to the b)/e) location
h) depending on the contents of the junk in g), the NVM is either byte-corrupted
or block-erased, which is detected the next time the e1000e driver is loaded.
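
For anyone who wants to see the shape of the sequence above, here is a rough
user-space toy model in C. It is only a sketch, not kernel code: the names
(range, sim_cmpxchg32, run_scenario) are made up for illustration, and the
"device" is just a counter. It assumes the x86 behaviour where cmpxchg issues
a write cycle to its destination even when the compare fails; what real
hardware does on device registers is, as said in g), undefined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy model: one 16-byte address range that gets reused over time. */
static uint8_t range[16];
static int range_is_mmio;  /* 0 = ordinary code bytes, 1 = device registers */
static int device_writes;  /* write cycles the pretend device has seen */

/* Hypothetical stand-in for a locked cmpxchg on 4 bytes at range[off].
 * Assumption: the destination is written even if the compare fails
 * (the old value is written back). Returns 1 if the swap happened. */
static int sim_cmpxchg32(size_t off, uint32_t expected, uint32_t desired)
{
    uint32_t cur, out;

    assert(off + sizeof cur <= sizeof range);
    memcpy(&cur, &range[off], sizeof cur);
    out = (cur == expected) ? desired : cur;
    memcpy(&range[off], &out, sizeof out);
    if (range_is_mmio)
        device_writes++;  /* a real device could act on this write */
    return cur == expected;
}

/* Walks through steps a) to g) and returns the number of device writes. */
int run_scenario(void)
{
    size_t recorded_off = 4;  /* the "EIP" the tracer remembers */
    uint32_t expected;
    int swapped;

    range_is_mmio = 0;
    device_writes = 0;

    /* a), b): some module's init code occupies the range; the tracer
     * records where it should patch later and what it expects there. */
    memset(range, 0x90, sizeof range);
    memcpy(&expected, &range[recorded_off], sizeof expected);

    /* c): init code is discarded. d), e): the same range is handed out
     * again, this time as a mapping of device registers. */
    memset(range, 0, sizeof range);
    range_is_mmio = 1;

    /* f): the deferred patcher runs with the stale record. */
    swapped = sim_cmpxchg32(recorded_off, expected, 0x12345678u);

    /* g): the compare failed, yet the device still saw a write. */
    printf("swapped=%d device_writes=%d\n", swapped, device_writes);
    return device_writes;
}
```

Calling run_scenario() from a small main() shows the point: the patch itself
does not take effect, but the range that now belongs to the device is written
to anyway.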

A short-term workaround is in 2.6.27.1 (disable CONFIG_DYNAMIC_FTRACE), and the
longer-term fix is a rewrite of the cmpxchg code (which is already done and will
be in 2.6.28-rc1).

I strongly recommend that 2.6.27.1 be picked up in Ubuntu immediately.