[intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

Bug #263555 reported by Chris Jones
This bug affects 14 people
Affects               Status         Importance   Assigned to     Milestone
Linux                 Fix Released   Medium
linux (Fedora)        Fix Released   Medium
linux (Gentoo Linux)  Fix Released   Medium
linux (Mandriva)      Fix Released   Critical
linux (Suse)          Fix Released   Critical
linux (Ubuntu)        Fix Released   Critical     Tim Gardner
  Intrepid            Fix Released   Critical     Tim Gardner
linux-lpia (Ubuntu)   Fix Released   Critical     Amit Kucheria
  Intrepid            Fix Released   Critical     Amit Kucheria

Bug Description

In some circumstances it appears possible for the 2.6.27-rc kernels to corrupt the NVRAM used by some Intel network parts to store data such as MAC addresses.
This is limited to the new e1000e driver, and reports have only appeared from users of "82566 and 82567 based LAN parts (ich8 and ich9)" (to quote Intel). The reports seem to be isolated to laptops, but it is not clear if this is because desktop/server parts are not vulnerable, or if use cases simply increase the chances of laptop users being hit.

Once this corruption has occurred, recovery may be possible via a BIOS update, but may well require replacement of the hardware. Use of Intel's IABUTIL.EXE is strongly discouraged, as it will worsen the problem to the point where the network part will no longer appear on the PCI bus.

(this is a new description, the original one was based on too much guesswork. Below are the URLs originally referenced)
(the driver is blacklisted in Ubuntu for 2.6.27-rc in the latest releases, so if your network is not working, it is not necessarily damaged; it may just be disabled in order to prevent any accidents until this bug is solved. Don't worry!)
http://www.blahonga.org/~art/rant.html (search for "em0")
http://<email address hidden>/msg00360.html
http://<email address hidden>/msg00398.html

Revision history for this message
In , Michal (michal-redhat-bugs) wrote :

Description of problem:
I am unable to use my Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 03). The system does not see it. Please find the dmesg output below.

e1000e: Intel(R) PRO/1000 Network Driver - 0.2.0
e1000e: Copyright (c) 1999-2007 Intel Corporation.
ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:19.0 to 64
iTCO_vendor_support: vendor-support=0
0000:00:19.0: The NVM Checksum Is Not Valid
ACPI: PCI interrupt for device 0000:00:19.0 disabled
e1000e: probe of 0000:00:19.0 failed with error -5

Version-Release number of selected component (if applicable):
Driver version 0.2.0

How reproducible:
Happens every time

Steps to Reproduce:
1. Boot computer

Revision history for this message
In , Yanko (yanko-redhat-bugs) wrote :

What kernel version is this? Has this adapter ever worked under Fedora? If yes, when did it stop?

Revision history for this message
In , Michal (michal-redhat-bugs) wrote :

I am sorry, I totally forgot about these details.
Kernels which I have:
2.6.25.11-97.fc9.x86_64
2.6.25.14-108.fc9.x86_64

I guess it stopped shortly after I upgraded to F9. It must have been one of the first kernel updates. I am not sure whether it ever worked in F9.

Strangely, I cannot use it on Ubuntu either. I do not have dmesg output from there yet. I will try and see if it matches.

Revision history for this message
In , Chuck (chuck-redhat-bugs) wrote :

Can you post the output of 'lspci -nn -s 0000:00:19.0'?

Revision history for this message
In , Michal (michal-redhat-bugs) wrote :

Output you have requested:

00:19.0 Ethernet controller [0200]: Intel Corporation 82566DC Gigabit Network Connection [8086:104b] (rev 03)
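
Editorial aside: the bracketed `[8086:104b]` pair at the end of this line is the vendor:device ID that later comments use to compare affected machines. A minimal sketch of pulling it out of an `lspci -nn` line follows; the function name `parse_pci_id` is hypothetical and only the line format shown above is assumed.

```python
import re

# Matches a "[vvvv:dddd]" vendor:device pair as printed by `lspci -nn`.
# The class code "[0200]" has no colon, so it is not matched.
PCI_ID_RE = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def parse_pci_id(lspci_line):
    """Return (vendor, device) as integers, or None if no ID pair is found."""
    matches = PCI_ID_RE.findall(lspci_line)
    if not matches:
        return None
    vendor, device = matches[-1]  # take the last pair on the line
    return int(vendor, 16), int(device, 16)


line = ("00:19.0 Ethernet controller [0200]: Intel Corporation 82566DC "
        "Gigabit Network Connection [8086:104b] (rev 03)")
vendor, device = parse_pci_id(line)
print("vendor=%04x device=%04x" % (vendor, device))
```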

Chris Jones (cmsj)
Changed in linux:
importance: Undecided → Critical
Revision history for this message
Chris Jones (cmsj) wrote :

I'm wondering whether we could patch out the sections of the driver which write to the NVRAM, in case Intel are not able to make suitable changes to prevent this before 2.6.27 is released (e.g. by splitting the writing parts out into a separate module which is not loaded by default).

Revision history for this message
Ben Collins (ben-collins) wrote :

Removed the regression-2.6.27 tag from this. The 2.6.26 kernel and 2.6.27 kernel have the exact same e1000e driver (one which we downloaded from Intel's e1000 sf.net project).

Still a serious issue, but I don't want it to be classified as a regression.

Revision history for this message
Chris Jones (cmsj) wrote :

http://marc.info/?t=122038337000003&r=1&w=2 is another interesting thread about this, on linux-netdev.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Chris,

Just an update here in case you missed the chatter in #kernel on Sept 03: Tim has already begun investigating this issue.

Changed in linux:
assignee: nobody → timg-tpi
status: New → Triaged
Revision history for this message
In , Stbinner (stbinner) wrote :

Today I updated my workstation, after two or three weeks, to the current Factory kernel, and since then the onboard network card no longer shows up:

 e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k2
 e1000e: Copyright (c) 1999-2008 Intel Corporation.
 e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
 e1000e 0000:00:19.0: setting latency timer to 64
 input: PC Speaker as /devices/platform/pcspkr/input/input3
 0000:00:19.0: 0000:00:19.0: The NVM Checksum Is Not Valid
 e1000e 0000:00:19.0: PCI INT A disabled
 e1000e: probe of 0000:00:19.0 failed with error -5

I booted an openSUSE 11.0 installation and the same issue occurs there now too. Did some BIOS/checksum get broken?

Revision history for this message
In , Stbinner (stbinner) wrote :

Created attachment 239061
dmesg of Factory installation

Revision history for this message
In , Stbinner (stbinner) wrote :

Created attachment 239062
dmesg of 11.0 installation

Revision history for this message
In , Jesse (jesse-redhat-bugs) wrote :

The driver you have supports your hardware, but is erroring out on load.
The "NVM checksum is not valid" means that something corrupted your system BIOS flash.

Can you please give us details about the hardware in your system by attaching the output of:
# lspci -vvv > lspci.txt

# dmidecode > dmiout.txt

We have some reports that Lenovo systems (a lot of them) are starting to have this issue.

Please DO NOT run IABUTIL to try to fix this issue, as some sites on the web suggest. It will likely cause you to have to replace your motherboard to get LAN functionality back.

Revision history for this message
In , Michal (michal-redhat-bugs) wrote :

Created attachment 316491
dmiout.txt

Revision history for this message
In , Michal (michal-redhat-bugs) wrote :

Created attachment 316492
lspci.txt

Revision history for this message
In , Michal (michal-redhat-bugs) wrote :

I have messed around a little with my card. I just wanted to check some of the suggestions pointed out here: http://www.thinkwiki.org/wiki/Problem_with_e1000:_EEPROM_Checksum_Is_Not_Valid#Solutions

The little orange LED on my Ethernet port is constantly flashing; unloading the e1000e module did not change anything. When I plugged in a cable, it stopped and the green LED came on, meaning that the connection is OK, though the driver still failed to load.
If you need any other info, I will gladly help.

Revision history for this message
In , Jesse (jesse-redhat-bugs) wrote :

Okay, so you have an HP machine with an ICH8 chipset. I don't know what the little orange LED flashing means; I will have to check on that.

Can you get into the iAMT setup just after BIOS completes by pressing CTRL-p?
I am not sure whether that will help you or not.

If I attach a debug driver here would you be willing to compile and run it?

Revision history for this message
In , Michal (michal-redhat-bugs) wrote :

I am not able to open the iAMT setup. I believe I do not have that option: from what I have found, enabling it requires turning it on in the Power section of the BIOS settings, and I do not have it there.

Yes, please attach driver.

Revision history for this message
Yingying Zhao (yingying-zhao) wrote :

We just hit a similar issue while testing Intrepid Alpha5. At first, the LAN worked fine on the x86 system. But after a system hang (caused by gfx) on the x86_64 system on the same machine, we found that the Ethernet card no longer works. "lspci" does not show the correct Ethernet card info, and the x86 system on which e1000e worked before cannot recognize the card either.

Our investigation is underway now.

Revision history for this message
In , Stbinner (stbinner) wrote :

*** Bug 428180 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Intel did clean up e1000 and e1000e so that no PCI IDs are duplicated across the two drivers. Maybe they removed this one from the wrong driver.
Can you please try to unload e1000e and load e1000 manually? If the card is not detected, then please add the IDs to the driver at runtime:

echo "vendor device subvendor subdevice class class_mask driver_data" > \
/sys/bus/pci/drivers/e1000/new_id

All fields are passed in as hexadecimal values (no leading 0x).
The vendor and device fields are mandatory, the others are optional.
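
Editorial aside: a minimal sketch of building the string described above, assuming only what this comment states (hexadecimal fields, no leading 0x, vendor and device mandatory, the rest optional and positional). The helper names `format_new_id` and `write_new_id` are hypothetical; the write itself needs root and is deliberately not called here, given what this bug is about.

```python
# Path quoted in the comment above; writing to it requires root.
NEW_ID_PATH = "/sys/bus/pci/drivers/e1000/new_id"


def format_new_id(vendor, device, *optional):
    """Build a new_id line: hex fields without a leading 0x, space separated.

    `optional` holds, in order, any of: subvendor, subdevice, class,
    class_mask, driver_data. Fields are positional, so none can be skipped.
    """
    if len(optional) > 5:
        raise ValueError("at most five optional fields are accepted")
    return " ".join(format(value, "x") for value in (vendor, device, *optional))


def write_new_id(line, path=NEW_ID_PATH):
    """Hand the ID line to the driver (not executed in this sketch)."""
    with open(path, "w") as handle:
        handle.write(line + "\n")


print(format_new_id(0x8086, 0x104B))  # the vendor/device pair from this report
```

For the 82566DC discussed in this thread the helper yields the same string that is echoed into new_id in the next comment.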

Revision history for this message
In , Andreas Jaeger (jaegerandi) wrote :

See bug 428180 - I can load the e1000 just fine but it does not work at all.

after: echo "8086 104b" > /sys/bus/pci/drivers/e1000/new_id

I see the following in dmesg:
e1000 0000:00:19.0: setting latency timer to 64
e1000: 0000:00:19.0: e1000_probe: The EEPROM Checksum Is Not Valid
/*********************/
Current EEPROM Checksum : 0xffff
Calculated : 0xbaf9
Offset Values
======== ======
00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
00000010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
00000020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
00000030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
00000040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
00000050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
00000060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Include this output when contacting your support provider.
This is not a software error! Something bad happened to your hardware or
EEPROM image. Ignoring this problem could result in further problems,
possibly loss of data, corruption or system hangs!
The MAC Address will be reset to 00:00:00:00:00:00, which is invalid
and requires you to set the proper MAC address manually before continuing
to enable this network device.
Please inspect the EEPROM dump and report the issue to your hardware vendor
or Intel Customer Support.
/*********************/
e1000: 0000:00:19.0: e1000_probe: Invalid MAC Address
e1000: 0000:00:19.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:00:00:00:00:00
e1000: 0000:00:19.0: e1000_probe: This device (id 8086:104b) will no longer be supported by this driver in the future.
e1000: 0000:00:19.0: e1000_probe: please use the "e1000e" driver instead.
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
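
Editorial aside: the "Calculated : 0xbaf9" value in the log above can be reproduced from the dump. The sketch below assumes the usual e1000 convention, which is not stated anywhere in this thread: the first 0x40 little-endian 16-bit words must sum to 0xBABA, with the checksum stored in word 0x3F. Under that assumption an all-0xff image gives exactly the logged numbers. All function names are hypothetical.

```python
EEPROM_SUM = 0xBABA    # assumed target sum of words 0x00..0x3F
CHECKSUM_WORD = 0x3F   # assumed index of the stored checksum word


def parse_dump(lines):
    """Turn 'offset: b0 b1 ...' hex dump lines into raw bytes."""
    data = bytearray()
    for line in lines:
        _, _, values = line.partition(":")
        data.extend(int(token, 16) for token in values.split())
    return bytes(data)


def to_words(data):
    """Group bytes into little-endian 16-bit words."""
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data) - 1, 2)]


def calculated_checksum(words):
    """Checksum that would make words 0x00..0x3F sum to EEPROM_SUM."""
    return (EEPROM_SUM - sum(words[:CHECKSUM_WORD])) & 0xFFFF


def checksum_is_valid(words):
    return sum(words[:CHECKSUM_WORD + 1]) & 0xFFFF == EEPROM_SUM


# Rebuild the eight all-ff dump lines from the log above (0x80 bytes).
DUMP = ["%08x: %s" % (offset, " ".join(["ff"] * 16))
        for offset in range(0, 0x80, 0x10)]
words = to_words(parse_dump(DUMP))
print(hex(words[CHECKSUM_WORD]), hex(calculated_checksum(words)),
      checksum_is_valid(words))
# prints: 0xffff 0xbaf9 False
```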

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

I fear that the EEPROM was erased. The commit below may point to the cause; that fix is for e1000, but it seems that e1000e has the same problem.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=78566fecbb12a7616ae9a88b2ffbc8062c4a89e3

I hope that Intel can help here and has a way to reprogram the EEPROM.

Revision history for this message
In , John Ronciak (john-ronciak) wrote :

And how was the EEPROM deleted? This is very hard to do without having things wrong with the system itself.

First, is the kernel 2.6.27-rcX? If so, did the system experience a panic of some sort (not NIC related)? Was some other tool run on the system once it got into this state? Also, does this system have some sort of manageability enabled on it? If so, disable it and try things again. Let me know about the other questions.

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Yes, it is based on 2.6.27-rc6 and we have no idea how the system got into this state, but we have multiple reports now :-(, all the same: after installing Beta1 with an e1000e card (I will collect the PCI IDs of all reports later), during the first driver load you see:

e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k2
e1000e: Copyright (c) 1999-2008 Intel Corporation.
e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
e1000e 0000:00:19.0: setting latency timer to 64
input: PC Speaker as /devices/platform/pcspkr/input/input3
0000:00:19.0: 0000:00:19.0: The NVM Checksum Is Not Valid
e1000e 0000:00:19.0: PCI INT A disabled
e1000e: probe of 0000:00:19.0 failed with error -5

in the dmesg output (for more details see attachment #1 for the full log)
If you then try to boot into a other OS version (which was working before) the network card does not work anymore with the same error, which let me think that
the eeprom was overwritten or deleted and later I found the commit in later kernels for e1000 (comment #7) which sounds somehow related for me.

Our e1000e driver differs from mainline in 3 additional patches requested by
Kent Liu (in CC now)
1. http://tinyurl.com/6253yl
2. http://tinyurl.com/5bd8v2
3. http://tinyurl.com/6rj8j7

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Stephan, reading your first comment again: you only installed a new kernel, you did not install Beta1, correct?

Then it is unlikely that some program or configuration probing is causing this; it is the e1000e driver itself.
Please also give us the PCI IDs of your e1000e card.

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

*** Bug 428322 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Stbinner (stbinner) wrote :

> you did only install a new kernel, you did not install Beta1, correct ?

Correct, just "zypper dup" + reboot. openSUSE 11.1 Beta 1 didn't exist yet when I reported that bug. :-)

29: PCI 19.0: 0200 Ethernet controller
  [Created at pci.318]
  UDI: /org/freedesktop/Hal/devices/pci_8086_294c
  Unique ID: kpGf.nWnnnRlG_JE
  SysFS ID: /devices/pci0000:00/0000:00:19.0
  SysFS BusID: 0000:00:19.0
  Hardware Class: network
  Model: "Intel 82566DC-2 Gigabit Network Connection"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x294c "82566DC-2 Gigabit Network Connection"
  SubVendor: pci 0x8086 "Intel Corporation"
  SubDevice: pci 0x0000
  Revision: 0x02
  Memory Range: 0x92200000-0x9221ffff (rw,non-prefetchable)
  Memory Range: 0x92224000-0x92224fff (rw,non-prefetchable)
  I/O Ports: 0x3400-0x341f (rw)
  IRQ: 216 (no events)
  Module Alias: "pci:v00008086d0000294Csv00008086sd00000000bc02sc00i00"
  Driver Info #0:
    Driver Status: e1000e is active
    Driver Activation Cmd: "modprobe e1000e"
  Config Status: cfg=no, avail=yes, need=no, active=unknown

Revision history for this message
In , Andreas Jaeger (jaegerandi) wrote :

I have an 11.0 system with just this new kernel. I even went into Windows (which could not get a DHCP address) and installed a new BIOS version. The BIOS shows me that my MAC ID contains only FFs.

Revision history for this message
In , Andreas Jaeger (jaegerandi) wrote :

Karsten, is the fix mentioned in #7 part of our kernel CVS?

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Yes with the update to rc7. But note, this is not a fix to this issue.

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Some more info about affected PCI IDs:
We got 2 reports about (this and bug #428322)
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x294c "82566DC-2 Gigabit Network Connection"

And one report with (bug #428180)
Vendor: pci 0x8086 "Intel Corporation"
Device: pci 0x104b "82566DC Gigabit Network Connection"

And I got one report about a working installation of Beta1 with e1000e driver

Vendor: pci 0x8086 "Intel Corporation"
Device: pci 0x109a "82573L Gigabit Ethernet Controller"

maybe that helps.

John, I added you to the duplicate bugs as well.

Changed in linux:
status: Unknown → Incomplete
Changed in linux:
status: Unknown → Confirmed
Revision history for this message
In , Bob Mahar (bob-o-rama) wrote :

In just looking at this quickly, I think something is getting
balled up in one of the e1000_nvm_operations structures. This would misdirect
the driver to the improper NVM operation and potentially cause the erasure of
the EEPROM.

Changed in linux:
status: Unknown → Confirmed
Revision history for this message
Jeffrey Baker (jwbaker) wrote :

This is just my humble opinion, but the Alpha CD downloads should be pulled from the archive. This kernel can partially ruin your hardware, and unsuspecting users shouldn't be able to merrily download it.

Revision history for this message
In , Jesse (jesse-redhat-bugs) wrote :

Created attachment 317425
driver with csum check bypass

here is a driver that just prints the message but doesn't error out if the checksum validation fails.

This should allow you to run ethtool -e ethX after loading the driver.

Revision history for this message
In , Jesse (jesse-redhat-bugs) wrote :

the difference in the driver I just attached is:
diff -rup e1000e-0.4.1.7.orig/src/netdev.c e1000e-0.4.1.7/src/netdev.c
--- e1000e-0.4.1.7.orig/src/netdev.c 2008-06-23 09:27:33.000000000 -0700
+++ e1000e-0.4.1.7/src/netdev.c 2008-09-22 16:06:59.000000000 -0700
@@ -56,7 +56,7 @@

 #define DRV_DEBUG

-#define DRV_VERSION "0.4.1.7" DRV_NAPI DRV_DEBUG
+#define DRV_VERSION "0.4.1.7_nocsum" DRV_NAPI DRV_DEBUG
 char e1000e_driver_name[] = "e1000e";
 const char e1000e_driver_version[] = DRV_VERSION;

@@ -5309,8 +5309,10 @@ static int __devinit e1000_probe(struct
                        break;
                if (i == 2) {
                        e_err("The NVM Checksum Is Not Valid\n");
+ /* JJJ skip around error path
                        err = -EIO;
                        goto err_eeprom;
+ JJJ end */
                }
        }

Revision history for this message
Steve Langasek (vorlon) wrote :

Jorge brought this bug to my attention just now; this really needs to be fixed one way or another for beta, even if that would mean blacklisting e1000e altogether until this is resolved. Even with as little as I use the wired ethernet on my laptop, I wouldn't enjoy having to RMA it to fix it after a kernel bug. :/

Changed in linux:
milestone: none → ubuntu-8.10-beta
Revision history for this message
In , Jesse Brandeburg (jesse-brandeburg) wrote :

Bob, can you elaborate?

We have a report in the Ubuntu Launchpad bug of a user who had a graphics panic and then ran into the issue: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555

Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

Upstream bug references to this bug:

        http://lkml.org/lkml/2008/8/8/123
        http://lkml.org/lkml/2008/9/22/23
        http://bugzilla.kernel.org/show_bug.cgi?id=11382

Revision history for this message
In , Jesse (jesse-redhat-bugs) wrote :

also, whole piles of reports now starting to converge, many of them linked here:

http://bugzilla.kernel.org/show_bug.cgi?id=11382

I'm trying to work a plan to help address this soonest.

Revision history for this message
Colin Watson (cjwatson) wrote :

Jeffrey, we can't afford to do that; we need to be able to test with the Alpha CDs on the wide variety of hardware not affected by this bug, or our development schedules for 8.10 will be seriously compromised. However, I'd be happy to add a warning to the cdimage web pages. Can anyone suggest some text?

Revision history for this message
In , Andreas Jaeger (jaegerandi) wrote :

In my case: I saw the error message with a previous 2.6.27 kernel first but did not report it :-( My log files show the first occurrence on the 10th of September, which would mean that this was 2.6.27-rc6 or even 2.6.27-rc5 with SUSE patches.

I had some crashes during boot where my graphics display was somehow screwed up (did not succeed in debugging with serial console, so no report for that yet).

So, yes it could be that another error broke this but since I mainly use wireless and not the ethernet port, I only noticed this problem recently and cannot say for sure when and how it happened.

Revision history for this message
In , John Ronciak (john-ronciak) wrote :

All,

If you read the entire thread pointed to in #7, you will have seen Intel's response to this. The thread has to do with VMWare and not Linux. VMWare is based on a very old kernel that has some poor kernel locking issues. We pointed this out to them and asked whether any of the people responding had actually seen this problem on Linux. This is where things went awry a bit.

All (_all_) of the reports that we have so far have had this gfx panic just before the problem comes up. The current belief is that the gfx panic is scribbling all over our NVM space somehow. We are not sure how this is happening. Since we can't repro the problem without this panic happening first, this is very hard for us to look at. If the gfx panic does not happen, there have been no problems reported. I do not know the status of a fix for the gfx panic. We are working on repro cases, but because the problem relies on a panic in another driver, this is very hard for us to work on. Work will continue.

Revision history for this message
In , Jesse Brandeburg (jesse-brandeburg) wrote :

Okay, so two votes for a graphics panic possibly related to the issue.
Andreas, would you be willing to comment out the code just after the
nvm.validate (validate_eeprom) in netdev.c and then try to dump your EEPROM
using ethtool -e ethX?

If it returns all 0xff 0xff ..., then can you please download the ethregs utility
from sourceforge.net/projects/e1000, build and run it, and attach or send me the
output?

We are still trying to reproduce. I'm raising the priority of the issue and we will continue to update as we make progress.

Revision history for this message
Alacrityathome (alacrityathome) wrote :

Colin,

Seems that a warning may be insufficient. I would think most of the folks testing a pre-release may not know they have an e1000e driver or affected NIC.

Maybe blacklist e1000e asap and then re-instate e1000e after a fix is found.

Perhaps have the "warning" state something about the e1000e being temporarily withheld from the pre-release with certain Intel NICs affected.

John

Revision history for this message
In , Chuck (chuck-redhat-bugs) wrote :

Michal, have you ever booted a Fedora 10 Alpha or rawhide disk on that system?

Revision history for this message
In , Bob Mahar (bob-o-rama) wrote :

Jesse, the question I'm left with is "what TYPE of EEPROM interface do the broken NICs implement?" I.e. what specific chip, and what interface does it provide to the EEPROM? Part of that you get from the customer, and part you get from Intel. Considering the Intel 8256x and 8257x parts, you have 4 different interfaces that I know of.

For SPI / uW it's hard to "accidentally" overwrite the PROM, as you have to send it the write enable opcode first and then shift in the data. That's typically too complicated to happen via random garbage from a crash. If these are parts with SPI / uW addressed PROMs, then the overwrite is most likely the result of e1000e code being called. If that's the case, the debug builds of the driver would dump out the "I'm writing to the prom" messages. (Hint: that's not the case.)

The parallel EEPROMs, which are memory mapped, can be overwritten by writing junk over the top of their address space. I think the 82573 has this type. The 82566 part also has a similar, but different, memory mapped EEPROM.

Point being, if the SPI based parts are having issues, this points towards a programmatic overwrite, as it's unlikely to happen by accidental overwrite of I/O or memory space. Damage to the memory mapped parts, on the other hand, could happen because accidental overwrites are poorly defended against.

Perhaps John can give us a clue as to which Intel parts use SPI vs uW vs shadow RAM vs memory mapped. I don't see much in the code that latches the memory mapped EEPROMs against accidental overwrite. For the shadow RAM method, it wants the write-through flag to be set to true; well, FFs are "true", so if this is the means to protect the underlying PROM data, it's a pretty feeble one. If the gfx crash writes all FFs over a swath of memory, there you have it.
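
Editorial aside: the argument above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a model of any Intel part: the opcodes 0x06 (write enable) and 0x02 (write) are the ones common on 25xx-series SPI EEPROMs, the class names are hypothetical, and the "stray writes" are a fixed stream of 0xFF values standing in for crash garbage.

```python
WREN = 0x06   # write-enable opcode on common 25xx-series SPI EEPROMs
WRITE = 0x02  # write opcode on the same family


class SpiEeprom:
    """Toy SPI part: a write only lands after a separate write-enable."""

    def __init__(self, contents):
        self.data = bytearray(contents)
        self.write_enabled = False

    def command(self, opcode, addr=0, value=0):
        if opcode == WREN:
            self.write_enabled = True
        elif opcode == WRITE and self.write_enabled:
            self.data[addr] = value
            self.write_enabled = False  # latch clears after each write
        # any other opcode is ignored


class MemoryMappedNvm:
    """Toy memory-mapped part: any store to its window changes it."""

    def __init__(self, contents):
        self.data = bytearray(contents)

    def store(self, addr, value):
        self.data[addr] = value


original = bytes(range(16))
spi = SpiEeprom(original)
mapped = MemoryMappedNvm(original)

# A stream of 0xFF "garbage" aimed at both parts.
for addr in range(len(original)):
    spi.command(0xFF, addr, 0xFF)  # 0xFF is neither WREN nor WRITE
    mapped.store(addr, 0xFF)

print("SPI contents intact:", bytes(spi.data) == original)
print("mapped contents all ff:", bytes(mapped.data) == b"\xff" * 16)
```

The outcome is built into the model, so it demonstrates the shape of the reasoning rather than proving anything about the affected hardware.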

Revision history for this message
ArbitraryConstant (anthony-spamtrap) wrote :

Is Ubuntu willing to risk the liability of distributing software known to destroy hardware?

Revision history for this message
In , Andreas Jaeger (jaegerandi) wrote :

Ad comment #24: I'm not using vmware at all on this machine.

Ad comment #25: Karsten, could you compile a kernel and the tool for me? I'll then test it (note I'm the whole day in meetings and have only little time to do anything extra myself). Stephan, will you test as well?

Revision history for this message
In , Jesse Brandeburg (jesse-brandeburg) wrote :

Hi Bob, thanks for the informative reply.

all the broken ones seem to use the FLAG_HAS_FLASH, at this point all the
reports I've heard have BAR1 mapped to an area that has registers used for
read/writing the NVM. The people I've heard reporting this have a laptop based
on the ICH8 or ICH9 chipsets, aka the lan part is the 82566 or 82567.

This NVM is usually part of a larger flash which contains the BIOS and possibly
the PXE boot code as well as the LAN Non-volatile-memory (NVM)

none of our "configuration" areas use direct memory mapped mode, and unless you
call the "write_eeprom" function using ethtool, nothing should be calling
e1000e_write_flash_data_ich8lan.

We have some patches ready that set the registers mapped at the flash BAR1
area to read-only. They are not tested, but we will test a little
and post them to the mailing list tomorrow.

If you have any references to any users that *don't* have an 82566 or 567, then
please point it out.

Revision history for this message
Scruffynerf (scruffynerf) wrote :

Unless Canonical wants liability for
a) Individual user's destroyed hardware
b) Crippling reputation damages, especially against the 'new to linux' groups
I'd echo the suggestion to pull the live CDs until this is fixed.

When new Linux users discover permanently corrupted hardware after trying Ubuntu, and this gets out onto the wider web, all of Ubuntu's efforts at promoting Ubuntu will also be destroyed.

Breaking known good hardware is a problem greater than keeping to a self-imposed delivery schedule.

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Ad comment #23:
Do you still have this kernel around with the exact version (best would be the rpm or kernel source) ?
I added the 3 additional patches on Wed Sep 10 17:18:30 CEST 2008. So this may be related.
Ad comment #27:
I will prepare a kernel based on 11.0 and the tools.

Revision history for this message
eentonig (eentonig) wrote :

It's alpha software, people should be considered as being aware that using it might break stuff.

Furthermore, people should be smart enough to read about the known issues prior to installing it.

Yes, a warning and blacklisting of the e1000e driver should be done, but revoking an alpha because of a (serious) bug just doesn't seem the answer to me, because it blocks you from finding other issues that might bite people when the official release gets out.

Revision history for this message
In , Michal (michal-redhat-bugs) wrote :

Yes, I have rawhide on my system.
The last two kernels I have are:
2.6.27-0.226.rc1.git5.fc10.i686
2.6.27-0.244.rc2.git1.fc10.i686

I do not know which one killed my port. If you want me to run something, note that I have no internet connection on those kernels: wifi does not work, and Ethernet, well, you know.

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

I got another report about a working system (82566MM Gigabit Network Connection [8086:1049] (rev 03)), but with a pre-Beta1 kernel which is mostly identical but _without_ the 3 additional patches.

Revision history for this message
In , Wstephenson-9 (wstephenson-9) wrote :

As per comment #19, I also have 8086:109a here and beta1 (kernel-pae-2.6.27-7.2) works and has not broken the hardware yet. I installed beta1 at the weekend not knowing about this bug, but am loath to boot it again if it will cook my ethernet.

Revision history for this message
Chris Jones (cmsj) wrote :

Colin: FWIW, I think some kind of warning on cdimage and in the alpha release notes seems highly prudent (not because of the bogus liability claims here, but just because it's the good thing to do). I would suggest:

"Due to an unresolved bug in the Linux kernel currently used in Ubuntu 8.10, users with Intel network hardware supported by the e1000e driver should not download and run these images. Doing so may render your network hardware permanently inoperable.
Older Intel network hardware which uses the e1000 driver is not affected by this, however, use of the e1000 driver in older Ubuntu releases is not a reliable indication of which driver will be used by Ubuntu 8.10. Support for hardware which uses a PCI Express bus has been moved from e1000 to e1000e. If in doubt, do not run these images and subscribe to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555 to receive notifications when the bug is fixed."

Steve: I am not sure exactly where the responsibility for handling this for the Alphas falls (other than being quite sure it's not mine, and suspecting it's yours ;) but I think we should put warnings out fairly prominently, as SuSE has done. The obvious safe default would be to yank e1000e.ko and replace the above warning with something similar which explains why newer Intel network hardware won't work in the Alphas. It's a bit of a nuclear option since there is a lot of this hardware around and the bug mostly seems to be affecting laptops, but since they tend to be doing a lot more "interesting" kernel work (suspending, frequent loading of modules, etc) it could simply be that they are exposing it more easily and server hardware is just as capable of being affected.

For those wishing to discuss this bug, its implications, etc. there is a forum thread which seems more suitable for this, see: http://ubuntuforums.org/showthread.php?t=912666

Revision history for this message
In , Stefan-seyfried (stefan-seyfried) wrote :

I had a machine (hp 2510p) which had a lockup with a garbled X display and refused to boot at all afterwards (I got the mainboard replaced on warranty), and we have another 2510p which, after updating to an early 2.6.27rc, suddenly "forgot" its 1280x800 video mode. Neither has shown e1000e trouble (yet ;), but it hints in the "graphics problem overwrites system flash" direction.

The 2510p has

00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)
00:19.0 Ethernet controller: Intel Corporation 82566MM Gigabit Network Connection (rev 03)

00:02.0 0300: 8086:2a02 (rev 0c)
00:19.0 0200: 8086:1049 (rev 03)

Revision history for this message
In , Martin-wilck-d (martin-wilck-d) wrote :

Sorry to interfere: our ESX expert told me that under ESX 3.5 it wasn't necessary to have processes _writing_ to the EEPROM for this problem to occur. Rather, it happened if two processes were just _reading_ the EEPROM at the same time, due to a broken locking bit in the HW of some NICs (82546, that's what I was told).

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

OK, I built the ethregs package and an 11.0 kernel with the error paths commented out. The kernel is only available internally in mbuild diabelli-kkeil-179 (kernel-default).
ethregs is available in the build service; search for it on http://software.opensuse.org/search

Revision history for this message
In , John Ronciak (john-ronciak) wrote :

Please stop talking about VMWare. That OS has a problem that was fixed by a work-around in the driver. What the VMWare guys saw has never been seen in Linux. It has _nothing_ to do with this bug.

As Jesse said in his mail we are working on some NVM protection patches for our driver. This won't fix the root cause of writing over our NVM area but it might help to find what/who is writing all over it.

For comment #30 and #31, if the system did not see one of the gfx driver panics our NVM remains fine. So if that panic does not happen our NIC NVM remains fine as well.

As soon as the patches are out of test we'll be pushing them upstream. We'll post links to this BZ once this happens today.

Revision history for this message
Arnd (arnd-arndnet) wrote :

> It's alpha software, people should be considered as being aware that using it might break stuff.

That's absolutely ridiculous. I'm aware that ubuntu alpha or beta can break some stuff (like eating my filesystem or deleting my partitions etc). In fact it already did. However, this is a whole different thing from BREAKING HARDWARE.
To make this clear, we are talking about RMAing laptops and mainboards because of this bug. And we are also talking about reasonably popular hardware. With the statement standing in the room that I should expect my hardware to die when I try ubuntu, I certainly won't try any alpha or beta ubuntu software ever again.

In my opinion the Alpha should be pulled NOW. Then you can discuss what steps have to be taken to address this problem. (e.g. one easy solution: disable e1000e and republish) This won't cost you more than a few days.

Just my 2 cents

Revision history for this message
Christian Wolf (christianwolf) wrote :

Folks,

I suggest to remove the respective Intrepid AlphaX images from the mirrors ASAP.

Although testing is testing, and everybody knows that there is a risk (I remember a similar issue with Mandrake Linux and CD-Rom drives) and you, as a tester, take a known risk, we also have the responsibility to minimize impact of this issue.

I think only the latest Alpha6 has this flaw?

Revision history for this message
John Dong (jdong) wrote :

Shall we pull in e1000e-prevent-corruption-of-eeprom-nvm.patch? It seems from the discussion that it isn't a 100% fix (other methods of reaching mmio'ed EEPROM probably exist) but should at least eliminate this disaster scenario of just booting up the distribution causing the card to be hosed.

Revision history for this message
In , Andi-nbz (andi-nbz) wrote :

If the graphics crash uses DMA to overwrite the MMIO area (assuming it's really
the graphics crash) then it would require VT-d to protect it. But write-protecting it at the CPU level is probably a good start, it just might not be enough.

Revision history for this message
In , Warren (warren-redhat-bugs) wrote :

Does this mean Fedora 9 is not to blame for killing e1000e?

Slashdot reported that Fedora 9 and 10 are affected, but it sounds like only rawhide has the problem.

Revision history for this message
In , John Ronciak (john-ronciak) wrote :

Agreed Andi. This is all we can do however. Let's see what happens with the patches with the people that are actually seeing the problem.

Revision history for this message
In , Andi-nbz (andi-nbz) wrote :

Those would first need to fix their mac addresses to try again right?
Also is there other vital information in that EEPROM?

Some of those systems should have VT-d. Presumably it would give a message
on a stray DMA? Of course the CPU protection would be needed too.

Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

(In reply to comment #38 from Andi N Kleen)
> Those would first need to fix their mac addresses to try again right?
> Also is there other vital information in that EEPROM?

By the way, the current driver doesn't even get bound to a card that has a wrong EEPROM CRC, right? So it's not even possible to easily fix its contents up using ethtool from within a default installation.

Karsten already patched and built the kernel so that it binds the driver to the card even in cases of broken EEPROM checksum.

Revision history for this message
In , Jon (jon-redhat-bugs) wrote :

FWIW, I've heard of similar problems with recent -RT kernels.

Revision history for this message
Jesse Brandeburg (jesse-brandeburg) wrote : RE: [Bug 263555] Re: [intrepid] 2.6.27 e1000e kernel places Intel gigEchipsets at risk

John Dong wrote:
> Shall we pull in e1000e-prevent-corruption-of-eeprom-nvm.patch? It
> seems from the discussion that it isn't a 100% fix (other methods of
> reaching mmio'ed EEPROM probably exist) but should at least eliminate
> this disaster scenario of just booting up the distribution causing
> the card to be hosed.

no, this patch is for e1000, and has nothing to do with this problem.
Right now, the only reports of this issue are with 82566 and 82567 based
LAN parts (ich8 and ich9).

the eeprom is not MMIO mapped, the registers for accessing it are. I'm
still not clear if a random write to a memory location could corrupt
things, we'll be looking at that today.

Chris Jones (cmsj)
description: updated
Revision history for this message
Ralf Nieuwenhuijsen (ralf-nieuwenhuijsen) wrote :

>http://www.ubuntu.com/testing/intrepid/alpha6

No warning

>http://cdimage.ubuntu.com/releases/intrepid/alpha-6/

No warning

>http://cdimage.ubuntu.com/releases/intrepid/alpha-6/intrepid-desktop-i386.iso

Download works.

How many people test these iso's? How many of them are using an intel motherboard? (10%-40% ?)
How many will ever test again if testing an ISO means you are frying your motherboard.

This isn't a blame game. But if top priority is not removing the alpha, it will very soon be...

We are talking about thousands, if not millions, of laptops and pc's that will be broken beyond repair, if I'm not mistaken...

And some people actually say things like:

>Jeffrey, we can't afford to do that; we need to be able to test with the Alpha CDs on the wide variety of hardware not affected by this bug,

Don't you get it. If you don't pull now, NOBODY WILL TEST THE NEXT VERSION.
There will be no NEXT VERSION because nobody DARES to install it.

>It's alpha software, people should be considered as being aware that using it might break stuff.

Yes, it may corrupt data. [but if it does that beyond its own partition; it should be a big issue as well]
But BREAKING hardware?

I'm quite sure that afterwards, following some quality control and reflection, there will be SOME policy to prevent these mistakes (NOT TAKING THE IMAGE DOWN) ..

But it will be too late.

PULL THE IMAGE: THEN DISCUSS!

Revision history for this message
abingham (abingham) wrote :

It's been almost a day since discussion on this issue resumed.

The Alpha 6 images are still up with no warning present.

I always assume that an Alpha or Beta release may break things to where I need to reinstall the OS. Battery life could be bad. Etc. But this is literally capable of *destroying* people's hardware. It's a whole different ball game. Even the LiveCDs are affected, and people testing them can reasonably assume there will be no hardware impact on their system even if it is an Alpha.

These images need to be pulled from availability *now*. Major mirror sites need to be notified.

If the release becomes '8.11' instead of '8.10' because of it, so what. We are talking about destroying motherboards here. Replacing a laptop motherboard can cost > $500.

If this is not dealt with, I will no longer be able to recommend Ubuntu to friends and family. The attitude of 'release on time at all costs' already caused many issues with 8.04, and now people are seriously suggesting continuing distribution of disc images that literally destroy hardware?

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

The modified driver is in mbuild diabelli-kkeil-184 (diabelli-kkeil-179 was wrong and crashed).
With this driver I see the card. ethtool -e shows all FF and the MAC is also read as FF, but I set the old MAC address with ifconfig and it seems that the card is working, but only at 10 Mbit. ethregs output follows.

Revision history for this message
Jeffrey Baker (jwbaker) wrote :

There's no reason to be hysterical, but a re-spin of Alpha 6 CDs without the e1000e module may be called for. This is a separate bug, but the recommended workaround of adding "blacklist e1000e" in /etc/modprobe.d/blacklist doesn't work. Somehow, udev or some other thing manages to load it anyway. I had to unlink it.

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Created attachment 241194
ethregs eth1 output

Revision history for this message
abingham (abingham) wrote :

Intel has ~80% CPU market share and >=70% of the chipset market for their own CPU.

So at least 56% of machines sold are Intel CPUs with Intel chipsets that are susceptible to this bug.

1 in 2 of Ubuntu testers could be vulnerable to this.

Revision history for this message
In , Andreas Jaeger (jaegerandi) wrote :

Ad comment #29: The kernel is not available anymore.

Revision history for this message
In , Jesse Brandeburg (jesse-brandeburg) wrote :

I strongly recommend if you are going to test for this bug or haven't seen it
yet on your ich8/9 system, that you RIGHT NOW, do
ethtool -e ethX > savemyeep.txt

Having a saved copy of your eeprom means we can help you write it back to your
system.
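
A saved dump is only useful if it was taken before the corruption. The following is a minimal sketch, not an Intel tool, for checking whether a saved `ethtool -e` dump already looks erased (all 0xFF, as reported in comment #40). The function names are made up for illustration, and the parser assumes the usual `0xOFFSET: xx xx ...` line layout of the ethtool dump.

```python
import re


def parse_ethtool_dump(text):
    """Collect the EEPROM bytes from `ethtool -e` style text output.

    Assumes data lines look like "0x0010:  ff ff ..."; header and
    separator lines do not match the pattern and are skipped.
    """
    data = bytearray()
    for line in text.splitlines():
        match = re.match(r"\s*0x[0-9a-fA-F]+:\s+(.*)$", line)
        if match:
            data.extend(int(token, 16) for token in match.group(1).split())
    return bytes(data)


def looks_erased(data):
    """True if the dump is non-empty and every byte reads back as 0xFF."""
    return len(data) > 0 and all(byte == 0xFF for byte in data)
```

Typical use would be `looks_erased(parse_ethtool_dump(open("savemyeep.txt").read()))`. A result of True suggests the dump was taken too late to serve as a backup.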

Revision history for this message
Jesse Brandeburg (jesse-brandeburg) wrote : Re: [intrepid] 2.6.27 e1000e kernel places Intel gigE chipsets at risk

I strongly recommend if you are going to test for this bug or haven't seen it
yet on your ich8/9 system, that you RIGHT NOW, do
ethtool -e ethX > savemyeep.txt

Having a saved copy of your eeprom means we can help you write it back to your
system.

Revision history for this message
sog (sogrady) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

For those who might be interested in testing the Alpha, but have an at-risk
machine, is there an accepted workaround that removes the e1000e driver
without jeopardizing the hardware?

Revision history for this message
Jesse Brandeburg (jesse-brandeburg) wrote : RE: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8and ICH9 gigE chipsets at risk

okay, let's just use the data we *have* now. What we know is that some
users have reported a corrupt NVM. Intel networking does not have a
current reproduction but is *fully engaged* on trying to solve this
problem. We have only had reports on 82566 and 82567 based machines, no
others. Trying to extrapolate this out to "1 of 2" users is just fear
mongering.

These kernels being released with this problem are still in alpha/beta,
which means our testing audience is smaller, but so is the potential
impact of any problem.

The process is working as far as I can see, we have a set of users that
is reporting the problem, which will help keep the kernels with the
issue from being promoted to full production status.

If you have some useful data to add to this bug, please comment, we're
listening. I think the discussion about pulling alpha cds or whatever
should go to some mailing list, and not be inside this bug.


Revision history for this message
In , Andreas Jaeger (jaegerandi) wrote :

Jesse, What happens with those that have already a broken eeprom?

Revision history for this message
In , Eich-m (eich-m) wrote :

Could this be the same as bug #57976?

At the time I did not notice any connection with a graphics corruption - but would not rule it out either.
In any case the intel graphics driver was a totally different one, the one we used
at the time (i810) is no longer shipped with 11.1.

Can someone elaborate a little on the 'graphics panic'? Was this a total lockup or just a screen corruption?

Revision history for this message
In , Eich-m (eich-m) wrote :

(In reply to comment #44 from Andreas Jaeger)
> Jesse, What happens with those that have already a broken eeprom?
>

Point is: if it's related to #57976 writing back the eeprom doesn't work.
At the time I attributed this to preproduction hardware.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Uploaded module-init-tools_3.3-pre11-4ubuntu10 to temporarily blacklist e1000e.

Revision history for this message
In , Bob Mahar (bob-o-rama) wrote :

(In reply to comment #46 from Egbert Eich)
> Point is: if it's related to #57976 writing back the eeprom doesn't work.
> At the time I attributed this to preproduction hardware.

I don't have access to that one... what was the issue? Can you elaborate?

Revision history for this message
In , John Ronciak (john-ronciak) wrote :

Regarding comment #40, just setting the MAC address in the NVM is most likely not going to restore everything to working. The HW reads a lot of things out of it when it's coming up. You will probably have to do a BIOS update to get everything in the system back. You will also have to put the MAC address back in as well. The BIOS update might do this for you, so make sure it's right before trying to update.

We would like to know that this works so if you could try it on this system it would help a lot.

Revision history for this message
Harry (harry2o) wrote :

Even if it means making myself look like an idiot: May I also suggest publishing (maybe along with the warnings) hints about how to find out whether your hardware is / will be / might be affected. And when does that eeprom writing actually happen - at boot time, when using the lan interface, or elsewhen?

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

I could write the MAC address with ethtool, but now the driver does not load completely: insmod hangs for about a minute and then it disables the IRQ.
After this there is no eth1. This is the dmesg:
e1000e: Intel(R) PRO/1000 Network Driver - 0.2.0
e1000e: Copyright (c) 1999-2007 Intel Corporation.
ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 20 (level, low) -> IRQ 20
PCI: Setting latency timer of device 0000:00:19.0 to 64
ACPI: PCI interrupt for device 0000:00:19.0 disabled
e1000e: Intel(R) PRO/1000 Network Driver - 0.2.0
e1000e: Copyright (c) 1999-2007 Intel Corporation.
ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 20 (level, low) -> IRQ 20
PCI: Setting latency timer of device 0000:00:19.0 to 64
ACPI: PCI interrupt for device 0000:00:19.0 disabled

I think, now with a valid checksum, it interprets some of the still 0xFF values as valid and sets some registers wrong.
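
To illustrate why a valid checksum says nothing about the rest of the contents, here is a minimal sketch. It assumes the e1000-family convention (not stated in this thread) that the first 64 little-endian 16-bit words of the NVM must sum to 0xBABA, with the checksum stored in word 0x3F. The helper names are hypothetical.

```python
import struct

NVM_SUM = 0xBABA       # assumed target sum for the e1000 family
CHECKSUM_WORD = 0x3F   # assumed index of the checksum word


def words_from_bytes(data):
    """Convert an EEPROM byte dump into little-endian 16-bit words."""
    count = len(data) // 2
    return list(struct.unpack("<%dH" % count, data[:count * 2]))


def checksum_ok(words):
    """True if words 0x00..0x3F sum to NVM_SUM modulo 2**16."""
    if len(words) <= CHECKSUM_WORD:
        return False
    return (sum(words[:CHECKSUM_WORD + 1]) & 0xFFFF) == NVM_SUM


def fix_checksum(words):
    """Return a copy with only the checksum word recomputed."""
    fixed = list(words)
    partial = sum(fixed[:CHECKSUM_WORD]) & 0xFFFF
    fixed[CHECKSUM_WORD] = (NVM_SUM - partial) & 0xFFFF
    return fixed
```

Running `fix_checksum` on an image of 64 words that are all 0xFFFF gives an image that passes `checksum_ok` while the other 63 words are still 0xFFFF. That matches the situation described above: the driver accepts the image and then acts on garbage configuration words.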

Revision history for this message
In , Eich-m (eich-m) wrote :

(In reply to comment #47 from Bob Mahar)
> (In reply to comment #46 from Egbert Eich)
> > Point is: if it's related to #57976 writing back the eeprom doesn't work.
> > At the time I attributed this to preproduction hardware.
>
> I don't have access to that one... what was the issue? Can you elaborate?
>

In bug #57976 an SPI type eeprom which seemed to still hold valid content (at least not 0xff) but a bogus checksum could not be restored as no matter to which byte offset a value was written it always ended up at offset 0 or 1.

However looking at comment #49 it doesn't seem to be related as in this case the checksum could be fixed.

Revision history for this message
Alacrityathome (alacrityathome) wrote :

I had the duplicate bug 272630 and consider myself lucky. I had the Intel NIC but had used Alpha 5. I only had the dmesg error and not the hardware eeprom failure. I had Alpha 6 ready to test until I found this bug thread. For folks like me, it will be a good decision to blacklist e1000e pending a resolution. Most 1st time Alpha testers would not be as lucky or have the time to seek out a full bug thread.

Revision history for this message
Ralf Nieuwenhuijsen (ralf-nieuwenhuijsen) wrote :

>The process is working as far as I can see, we have a set of users that
is reporting the problem, which will help keep the kernels with the
issue from being promoted to full production status.

I'm sorry .. will you buy these people a new laptop? If so, then the process is working.

Try advertising: there is a 50% chance that this will destroy your hardware.
How many testers will you have got left? ZERO.

THE IMPLIED RISK OF TESTING ALPHA SOFTWARE IS DATA CORRUPTION.

What's next? The machine blows up and kills people?
Would that be a reason to remove a machine-destroying cd-image?

YES, it's not Ubuntu's fault .. the alpha is shipped. The process is working, but the process is not done until somebody QUICKLY removes the image.

No tester signed up for this; and I for sure will not ever put an alpha disc into one of my machines. I'll even wait a couple of months after the release.

It's not that this happened. It's that afterwards official developers consider losing half the hardware of the volunteers who test a sane and good move. Just part of the process.

This discussion SHOULD NOT BE MOVED TO A MAILING LIST, until the CD IMAGE IS GONE.

I know this is not the UBUNTU code of conduct. But neither is WILLFULLY DAMAGING PEOPLE'S PROPERTY.

You know of the BUG, REMOVE THE IMAGE.

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Since we do not have a similar machine, we cannot dump the EEPROM. The board is an Intel DQ35JO. Maybe someone from Intel has access to this board and can provide an ethtool -e dump?

Changed in linux:
status: Incomplete → In Progress
Changed in linux:
status: Confirmed → In Progress
Revision history for this message
M. Salivar (mfsalivar) wrote :

One alpha tester has already been lost, at least partially. When the problem occurred I sucked it up and said to myself, you know, it is alpha. Things like this shouldn't happen, but they do from time to time. Now that I see the indifference of Ubuntu devs to people losing their hardware, and even worse, to the extreme likelihood of more people losing theirs because of a stupidly strict adherence to release schedules, I'll never test an alpha outside of a virtual machine again (your loss, not mine). I'm not sure yet, but I may be through with Ubuntu, period.

You should pull all the current alphas, and quickly release an alpha 7 with the e1000e module removed or an older kernel. It's the only reasonable thing to do. Pulling the alphas and waiting for a fix will cause too many delays, but leaving up the current alphas is just plain immoral.

Revision history for this message
In , Jesse (jesse-redhat-bugs) wrote :

I suggest this is severity urgent now.

Revision history for this message
Thomas McKay (tom-mckay1) wrote :

I for one second the motion to remove the alpha images until they have been prepared so that there is no risk of hardware damage. I would be seriously concerned about the negative effects of ignoring this issue.

There is a liability on Canonical for the distribution of software which causes permanent and irreparable damage. The problem has been identified, and not shielding their customers from this issue is nothing less than WILLFUL NEGLECT.

REMOVE THE IMAGES NOW, and re-issue them when the driver has been blacklisted.

Revision history for this message
mmomjian (matthew-momjian) wrote :

Thomas, why do you feel that the present warnings are not enough? I for one feel that they are certainly sufficient warning to people running those chipsets to not download them. Also, this isn't a democracy; we don't vote on whether to pull images or not. That's up to the core developers/Canonical.

Revision history for this message
Thomas McKay (tom-mckay1) wrote :

Why do I not feel the warnings are enough?

For one, because users who download via BitTorrent will never see those warnings, and like somebody above noted, not all testers know for certain the hardware they are running. Simply saying "if your computer breaks, tough luck, we warned you" will not garner any respect among Linux users. The alpha testers are doing Canonical a great service, and taking that for granted would be a shame.

I'm not saying it's a democracy, I am just warning of the consequences of ignoring this issue and bricking people's computers.

Revision history for this message
Ing0R (ing0r) wrote :

I think a warning (with Jesse Brandeburg's advice) should go to *every* tester who is running such a system.
I just read about it on a computer news site and I wish I had got this news from Canonical (maybe via update manager).

Ing0R

Revision history for this message
Thomas McKay (tom-mckay1) wrote :

Not to mention the thousands of people who downloaded these images before the warning was posted.

Revision history for this message
Daniel Kulesz (kuleszdl) wrote :

I was really shocked to see that it took more than one day between the issue becoming apparent and the warning being placed on the website. This is a very serious issue and can cause severe damage to really expensive hardware (i.e. most recent Lenovo Thinkpads like the X200, X301, T400, T500 and so on).

Also, please be aware that some testers might simply change their old download URL in a download manager and increment from 5 to 6. Therefore I really suggest to at least move the images away to some different place and replace the ISO files with text files containing the same warning together with the real download location.

Is there any way to issue a warning to all the testers who are already using or began downloading the ISO? Does the installer query some URL through which the warning could be injected?

Revision history for this message
mmomjian (matthew-momjian) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

The only place where I can find official download links to the
BitTorrent is at http://cdimage.ubuntu.com/releases/intrepid/alpha-6/,
where the warning exists.

Revision history for this message
Chris Jones (cmsj) wrote :

Please listen to what Jesse said in comment #24. This is a bug report, not a discussion forum.

Calls for ISOs to be pulled, legal claims, accusations and use of capslock should be on the ubuntu-devel or ubuntu-devel-discuss mailing list. For one thing, by posting to those lists your opinion will be seen by a much wider audience.

The people subscribed to this bug have either been affected by this bug (such as myself, I filed this bug), or are trying to fix it.

Yelling at those people (which is what a number of you are doing) will solve nothing. Stop it please. The only relevant discussion here is that which is gathering information about the bug, or attempting to fix it.

I appreciate this is a contentious issue (since my laptop was affected by this), but I want to read about progress, I don't want to read lots of ranting. I also don't want this post to be perceived as negative, or whining, or whatever. I fully sympathise with people who are trying to protect their fellow users from harm, and in that respect I apologise for not shouting more about this bug when it was first uncovered. All I did was make sure as many of the people as possible who could fix it, knew about it.
If you would like to argue with me about this, please do not do it here, email me personally (see my Launchpad overview page for my addresses) or via <email address hidden>.

Revision history for this message
Michael W. (hotdog003-gmail) wrote :

If we don't pull the images (we should, but I won't comment since it's already being discussed), it might be a good idea to at least make the words "permanently inoperable" on the Alpha 6 testing page in big, bold letters so users have less of a chance to skim over that part.

Think about it: How many times do we read warning labels on the stuff we eat? My point exactly. Having a "WARNING" section on a testing page where people are already expecting things not to work perfectly might not be an accurate indicator of exactly how grave this problem really is.

I think we should do everything in our power to at least let users know what they're dealing with here. Somehow, we've managed to produce a stick of dynamite with a lit fuse. A lot of people are expecting testing images to be imperfect and may skip right over the warning section because they already know the typical "This is just alpha software, hopefully nothing major will happen" lecture that warning sections typically give them. Making "permanently inoperable" in bold letters will make it much more eye-catching than it is now.

Changed in linux:
status: Unknown → Confirmed
Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

Intel has just posted patches to lkml [1] [2] [3] that mark the memory mapped EEPROM region as read-only. Therefore if the EEPROM is garbled by any bug in kernel code, after these patches are applied, the EEPROM would no longer be overwritten, and stack trace would be dumped instead, which will hopefully point to the code that is corrupting the memory.

If, however, userspace is corrupting the memory region (most probably X.Org), then this protection is rendered useless, but it still is worth trying so that we can potentially rule out either userspace or kernelspace code completely.

I have built the kernel with these three patches applied, for those who are willing (and able) to test. The kernel RPM can be obtained from

        http://labs.suse.cz/jikos/download/bug-425480/

Any testing would be highly appreciated.

[1] http://lkml.org/lkml/2008/9/23/427
[2] http://lkml.org/lkml/2008/9/23/431
[3] http://lkml.org/lkml/2008/9/23/432

Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

(In reply to comment #52 from Jiri Kosina)
> If, however, userspace is corrupting the memory region (most probably X.Org),
> then this protection is rendered useless, but it still is worth trying so that
> we can potentially rule out either userspace or kernelspace code completely.

In fact, testing whether booting the system only in text-mode (so that xorg won't be started at all) also triggers the bug or not would also be a valuable test.

Revision history for this message
In , John (john-redhat-bugs) wrote :

Patches to the e1000e driver to protect the NVM were posted to netdev a few hours ago. They need to be tried on this problem. Either they will fix the problem or they should point to what is causing it. The patches are obviously for the 2.6.27-rc kernels.

Revision history for this message
In , John Ronciak (john-ronciak) wrote :

Thanks Jiri. Yes, these patches need to be tried. We tested them but we have not been seeing the problem.

Andreas, could you please give us the model and serial number from the system that had the NVM corrupted? We have some people here at Intel who think that they can track down the EEPROM image. They want to see if things are correct with it.

Thanks.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.27-4.6

---------------
linux (2.6.27-4.6) intrepid; urgency=low

  [ Tim Gardner ]

  * Disable e1000e until the NVRAM corruption problem is found.
    - LP: #263555

  [ Upstream Kernel Changes ]

  * Revert "[Bluetooth] Eliminate checks for impossible conditions in IRQ
    handler"

 -- Ben Collins <email address hidden> Tue, 23 Sep 2008 09:53:57 -0400

Changed in linux:
status: Triaged → Fix Released
Revision history for this message
Jojo (kuzniarpawel) wrote :

according to http://groups.google.com/group/linux.kernel/browse_thread/thread/a5ef7deff8551186/d05c233ecb430178

this bug might be related to xorg and Intel graphics.

I used e1000e for 5 days with a lot of traffic on eth0 and nothing happened (luck?) but I have a T61p with an NV Quadro.

Revision history for this message
Joel Stanley (shenki) wrote :

I was using 2.6.27-rc era e1000e on a Dell desktop machine and my ThinkPad X300 for a few weeks, and they operated fine.

Both machines have the Intel 965 video chipset.

Revision history for this message
In , Andreas Jaeger (jaegerandi) wrote :

John, my system is a Lenovo thinkpad X61s:
  System Info: #1
    Manufacturer: "LENOVO"
    Product: "766929G"
    Version: "ThinkPad X61s"
    Serial: "L3A2878"
    UUID: undefined, but settable
    Wake-up: 0x06 (Power Switch)
  Board Info: #2
    Manufacturer: "LENOVO"
    Product: "766929G"
    Version: "Not Available"
    Serial: "1ZDMN77215Z"

The e1000 is:
26: PCI 19.0: 0200 Ethernet controller
  [Created at pci.310]
  UDI: /org/freedesktop/Hal/devices/pci_8086_104b
  Unique ID: kpGf.mInfNyjoCrB
  SysFS ID: /devices/pci0000:00/0000:00:19.0
  SysFS BusID: 0000:00:19.0
  Hardware Class: network
  Model: "Intel 82566DC Gigabit Network Connection"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x104b "82566DC Gigabit Network Connection"
  SubVendor: pci 0x8086 "Intel Corporation"
  SubDevice: pci 0x0000
  Revision: 0x03

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

The motherboard from this bug is:

Handle 0x0007, DMI type 2, 20 bytes
Base Board Information
        Manufacturer: Intel Corporation
        Product Name: DQ35JO
        Version: AAD82085-801
        Serial Number: BQJO749006WD

The onboard NIC:
33: PCI 19.0: 0200 Ethernet controller
  [Created at pci.310]
  UDI: /org/freedesktop/Hal/devices/pci_8086_294c
  Unique ID: kpGf.CUCsNZz8jz8
  SysFS ID: /devices/pci0000:00/0000:00:19.0
  SysFS BusID: 0000:00:19.0
  Hardware Class: network
  Model: "Intel 82566DC-2 Gigabit Network Connection"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x294c "82566DC-2 Gigabit Network Connection"
  SubVendor: pci 0x8086 "Intel Corporation"
  SubDevice: pci 0x0000
  Revision: 0xfd
  Memory Range: 0x92200000-0x9221ffff (rw,non-prefetchable)
  Memory Range: 0x92224000-0x92224fff (rw,non-prefetchable)
  I/O Ports: 0x3400-0x341f (rw)
  IRQ: 20 (no events)
  Module Alias: "pci:v00008086d0000294Csv00008086sd00000000bc02sc00i00"

The original MAC: 00:1c:c0:2b:74:3a

Revision history for this message
In , Stefan-seyfried (stefan-seyfried) wrote :

(In reply to comment #53 from Jiri Kosina)
> In fact, testing whether booting the system only in text-mode (so that xorg
> won't be started at all) also triggers the bug or not would also be a valuable
> test.

It is, unfortunately, not that easy. I have rebooted my machine (hp 2510p) with e1000e 17 times since Sep 15 with 2.6.27-rc5-git9+ Kernels (always pretty recent STABLE) and I did not encounter any problems.
So it is pretty hard to prove the absence of this bug.

Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

(In reply to comment #57 from Stefan Seyfried)
> It is, unfortunately, not that easy. I have rebooted my machine (hp 2510p) with
> e1000e 17 times since Sep 15 with 2.6.27-rc5-git9+ Kernels (always pretty
> recent STABLE) and I did not encounter any problems.
> So it is pretty hard to prove the absence of this bug.

And did this machine expose the problem at least once previously? Apparently not all systems with e1000e hardware are hit by the issue; either only specific product IDs are affected, or it might be chipset-dependent, etc.

Also, please do not forget to back up the contents of your EEPROM before you start playing with this :)

Revision history for this message
In , Jesse Brandeburg (jesse-brandeburg) wrote :

(In reply to comment #49 from Karsten Keil)
> I could write the MAC address with ethtool but now the driver do not load
> completely insmod hangs for about a minute and then it disable the IRQ.
> After this here is no eth1 this are the dmesg:
> e1000e: Intel(R) PRO/1000 Network Driver - 0.2.0
> e1000e: Copyright (c) 1999-2007 Intel Corporation.
> ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 20 (level, low) -> IRQ 20
> PCI: Setting latency timer of device 0000:00:19.0 to 64

At this point (without trying to activate the device), does ethtool -e still
work? I would assume not.

> ACPI: PCI interrupt for device 0000:00:19.0 disabled

I looked at your ethregs dump (thank you!!!) and in the EECD register, bit 8 is
not set, indicating that the valid bits in the EEPROM are not set.
Bit 9 is set, indicating that the hardware tried to read the EEPROM.
Bit 22 is only valid if bits 8 and 9 are both set, but it would indicate which of the
two EEPROM banks had a valid signature.
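
As a rough illustration of that bit reading, here is a minimal Python sketch. The constant and function names are made up for this example and the bit meanings are taken only from the description above, so treat it as an assumption-laden aid for reading a register dump, not as driver code.

```python
# Bit positions as described in the comment above; the names are illustrative.
EECD_PRESENT = 1 << 8    # valid-signature bits were found in the EEPROM
EECD_AUTO_RD = 1 << 9    # hardware attempted its automatic EEPROM read
EECD_BANK_SHIFT = 22     # which of the two banks had the valid signature


def decode_eecd(value):
    """Return a dict summarising the three EECD bits discussed above."""
    present = bool(value & EECD_PRESENT)
    auto_read = bool(value & EECD_AUTO_RD)
    # Bit 22 only carries meaning when bits 8 and 9 are both set.
    if present and auto_read:
        bank = (value >> EECD_BANK_SHIFT) & 1
    else:
        bank = None
    return {"present": present, "auto_read": auto_read, "valid_bank": bank}


if __name__ == "__main__":
    # Hypothetical register value with only bit 9 set, as in the reported state.
    print(decode_eecd(0x200))
```

A value with bit 9 set and bit 8 clear decodes to "read attempted, nothing valid found", which is the state reported here.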

I'm curious whether the other bank on the EEPROM might still be okay. I'll have to
figure out tomorrow if we can switch to the other bank. I may be able to get
you some internal tools since this is an Intel board; I'll have to see what is
available.

BTW this is the first desktop machine I've heard of that reported the problem.

Revision history for this message
In , Stefan-seyfried (stefan-seyfried) wrote :

(In reply to comment #58 from Jiri Kosina)
> And did this machine expose the problem at least once previously? Apparently
> not all systems having e1000e hardware are being hit by the issue, either only
> specific product IDs are affected, or it might be chipset-dependent, etc.

I am not sure. See comment #32. It might of course also just have been a broken joint on the mainboard.

> Also, please do not forget to back up contents of your EEPROM before you start
> playing with this :)

If it hits me the same as last time, this won't help :) (and yes, I backed it up)

(In reply to comment #59 from Jesse Brandeburg)
> BTW this is the first desktop machine I've heard of that reported the problem.
Regarding the desktop: it also has Intel integrated graphics, and it had recurring problems with the graphics driver (lockups) before the ethernet broke. Maybe that's one common factor.

Revision history for this message
In , Eich-m (eich-m) wrote :

(In reply to comment #52 from Jiri Kosina)
> Intel has just posted patches to lkml [1] [2] [3] that mark the memory mapped
> EEPROM region as read-only. Therefore if the EEPROM is garbled by any bug in
> kernel code, after these patches are applied, the EEPROM would no longer be
> overwritten, and stack trace would be dumped instead, which will hopefully
> point to the code that is corrupting the memory.
>

This is indeed a valuable test.

> If, however, userspace is corrupting the memory region (most probably X.Org),
> then this protection is rendered useless, but it still is worth trying so that
> we can potentially rule out either userspace or kernelspace code completely.

Not necessarily. If X overwrites this memory from user space, yes.
However if it is overwritten from kernel space (by DRM - either
from the Xserver or from a DRM client) we will be able to catch it.

For now I would rule out user space. From user space the Xserver
cannot access memory unless it is explicitly mapped. You can find
out the mapped memory ranges from /proc/<pid>/maps. It would be
instructive to know which ranges show up there on the affected
machines and compare them to an lspci -v output.
I may have missed this, but I have not seen any analysis of which
access method is used on the affected systems to write to the
EEPROM.

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

(In reply to comment #59 from Jesse Brandeburg)
Yes, ethtool does not work any more.
What I saw while I was programming the MAC was that another word in the EEPROM
changed as well (I assume the checksum), and one other byte also got a different
value (maybe BF, but I'm not sure). It was in another line of the ethtool -e dump (2nd or 3rd line), between the MAC (first 6 bytes) and the checksum.
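
For anyone comparing a saved dump against a current one, here is a hedged Python sketch that checks the checksum word in the text output of ethtool -e. It assumes the convention used by the e1000 family, as far as I know: the first 64 little-endian 16-bit words, the last of which is the checksum word, sum to 0xBABA. The function names are invented for this example and the dump layout ("0x0000  00 1c ...") is assumed.

```python
import re

NVM_SUM = 0xBABA      # assumed target sum for the e1000 family
CHECKSUM_WORDS = 64   # words 0x00..0x3F; the last one is the checksum word


def parse_ethtool_dump(text):
    """Collect byte values from `ethtool -e`-style lines ("0x0000  00 1c ...")."""
    data = bytearray()
    for line in text.splitlines():
        m = re.match(r"\s*0x[0-9a-fA-F]+:?\s+((?:[0-9a-fA-F]{2}\s*)+)$", line)
        if m:
            data.extend(int(b, 16) for b in m.group(1).split())
    return bytes(data)


def checksum_ok(data):
    """True if the first 64 little-endian words sum to NVM_SUM (mod 2**16)."""
    if len(data) < 2 * CHECKSUM_WORDS:
        raise ValueError("dump is shorter than 64 words")
    total = 0
    for i in range(CHECKSUM_WORDS):
        total += data[2 * i] | (data[2 * i + 1] << 8)
    return (total & 0xFFFF) == NVM_SUM
```

Running parse_ethtool_dump on a saved backup and on a fresh dump, then comparing the two byte strings position by position, would also show which offsets changed besides the MAC and the checksum.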

Revision history for this message
Kabelsalat (kabelsalat) wrote :

Nothing happened to me either. Probably it depends on the system BIOS / firmware. My Vaio Z does not have an ordinary BIOS, but a new (U)EFI firmware with BIOS emulation. Perhaps the emulation denies access to (some parts of) the NVRAM!? Just to make sure, I blacklisted and removed e1000e.

Please note: I'm not using Ubuntu Intrepid, but only its 2.6.27-3 kernel package and the new xorg packages. Ethernet was used only once for traffic. The driver was loaded at every system start and the card was recognized. The only time I really used the ethernet port, I had to reload the driver manually as the network cable wasn't recognized. After reloading the driver everything worked fine.

Revision history for this message
Kabelsalat (kabelsalat) wrote :

I'm using X4500 Intel graphics, too.

Revision history for this message
Daniel Kulesz (kuleszdl) wrote :

Jojo: Your T61p is ICH8 based, right? Is the following ethernet card built into all ICH8-based machines? (just run lspci):

00:19.0 Ethernet controller: Intel Corporation 82566MM Gigabit Network Connection (rev 03)

Revision history for this message
Peter Frühberger (peter-fruehberger) wrote :

I own an X61s, with an Intel Graphics Card and the following Network device.
00:19.0 Ethernet controller: Intel Corporation 82566MM Gigabit Network Connection (rev 03)

I have used the e1000e module since alpha5 and with all new kernels until yesterday. I did not run into any problem with this driver - lucky me.

I booted another OS from Redmond and it still worked. So I can count myself lucky, I think. I did save the EEPROM contents with the ethtool command mentioned above - just in case.

Do we know exactly what hardware is "affected"? Or what circumstances lead to this damage?

Just out of the blue:
Did someone use the "onboard" SD card reader? - This is the only component I have not used in the last month on my X61s. As I understand it, this could be caused by nearly anything misbehaving in the kernel (stray pointers, etc.)

Someone mentioned a garbled display. I did have this yesterday after booting Ubuntu, having used the Redmond OS just before.

Any hints on what I could test (without damaging my hardware)? - or is my hardware not affected at all?

Thx
Peter

Revision history for this message
Jojo (kuzniarpawel) wrote :

my t61p has

00:19.0 Ethernet controller: Intel Corporation 82566MM Gigabit Network Connection (rev 03)

I work on
2.6.27-4-generic i686

Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

(In reply to comment #63 from Egbert Eich)
> For now I would rule out user space. From user space the Xserver
> cannot access memory unless it is explicitly mapped. You can find
> out the mapped memory ranges from /proc/<pid>/maps. It would be
> instructive to know which ranges show up there on the affected
> machines and compare them to an lspci -v output.

It could be some temporary mapping that goes away after a while, so that it doesn't show in /proc/<pid>/maps permanently, but yes, this of course can be tried.

Could someone who has access to affected hardware please provide the output of the

       cat /proc/`pidof Xorg`/maps
       lspci -v

commands, so that we can see if there is possibly some lethal overlap?
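
To make the comparison less error-prone once such output is posted, here is a small Python sketch. It rests on an assumption worth stating loudly: it only looks at /dev/mem mappings, where the offset column of the maps file is the physical address. Mappings made another way (for example through sysfs resource files) would need different handling. All function names are invented for this example.

```python
def devmem_physical_ranges(maps_text):
    """Return (start, end) physical ranges for /dev/mem lines of a maps file.

    Maps columns: address-range perms offset dev inode pathname.
    For /dev/mem the offset column is the physical start address.
    """
    ranges = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[5] != "/dev/mem":
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        offset = int(fields[2], 16)
        ranges.append((offset, offset + (end - start)))
    return ranges


def overlaps(a, b):
    """True if half-open ranges a and b share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


def find_overlaps(maps_text, bars):
    """Pair every /dev/mem mapping with every NIC memory region it touches."""
    return [(r, bar)
            for r in devmem_physical_ranges(maps_text)
            for bar in bars
            if overlaps(r, bar)]
```

The bars argument would be filled in by hand from the "Memory at ..." lines of lspci -v for the ethernet controller, e.g. (0xfe225000, 0xfe226000) for a 4K region at fe225000.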

Revision history for this message
Fahim Abdun-Nur (fahim-a) wrote :

Came in to work this morning and updated my intrepid system, noticed a new kernel and had to update. When the system came back up, eth0 disappeared. Now I have no network connection at all. :( Did an lspci, and I have an Intel 82566DM-2 Gigabit Network Connection (rev 02)

I'm guessing (and I'll admit I'm a *nix novice) that this is due to the e1000e driver being disabled and eth0 never being loaded. An ifconfig returns nothing other than lo. Booting into Windows XP seems a-OK and I have full network access (haha, haven't booted into XP in quite a while and was prompted to update a gazillion Win-updates...anyways, I digress).

This was marked "fixed" (?) last night; seems to have caused me more trouble. Any ideas/recommendations? If and when a fix is released, I guess I'll have to copy over and install the debs on a usb stick. Oh well, it is alpha after all.

================================================
linux (2.6.27-4.6) intrepid; urgency=low

  [ Tim Gardner ]

  * Disable e1000e until the NVRAM corruption problem is found.
    - LP: #263555
================================================

Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

Also, the locking in e1000e does indeed seem to be dodgy. Thomas Gleixner reported that lockdep spotted the following on his system with 2.6.27-rc7:

e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k2
e1000e: Copyright (c) 1999-2008 Intel Corporation.
e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
e1000e 0000:00:19.0: setting latency timer to 64
0000:00:19.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:15:58:84:9f:94
0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
0000:00:19.0: eth0: MAC: 4, PHY: 6, PBA No: ffffff-0ff
------------[ cut here ]------------
WARNING: at /home/tglx/work/kernel/git/linux-2.6/kernel/mutex.c:135 mutex_lock_nested+0x5c/0x26d()
Modules linked in: e1000e i915 drm ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp bnep rfcomm l2cap bluetooth autofs4 coretemp fuse sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi e1000 cpufreq_ondemand acpi_cpufreq ext2 dm_mirror dm_log dm_multipath dm_mod ipv6 kvm_intel kvm snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq arc4 snd_seq_device snd_pcm_oss ecb snd_mixer_oss snd_pcm video crypto_blkcipher snd_timer snd_page_alloc iwlagn i2c_i801 i2c_core firewire_ohci iwlcore mac80211 snd_hwdep firewire_core crc_itu_t iTCO_wdt iTCO_vendor_support rtc_cmos snd soundcore output ac battery pcspkr cfg80211 sr_mod thinkpad_acpi rfkill cdrom sg hwmon button joydev ata_piix ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Pid: 3484, comm: ip Not tainted 2.6.27-rc7-00006-gcec5eb7-dirty #89

Call Trace:
 <IRQ> [<ffffffff8103654d>] warn_on_slowpath+0x51/0x77
 [<ffffffff810572e9>] __lock_acquire+0x6ad/0x716
 [<ffffffff8129c285>] mutex_lock_nested+0x5c/0x26d
 [<ffffffffa04f85c2>] e1000_acquire_swflag_ich8lan+0x59/0x74 [e1000e]
 [<ffffffffa04fd753>] e1000e_read_kmrn_reg+0x18/0x62 [e1000e]
 [<ffffffffa04f8606>] e1000e_gig_downshift_workaround_ich8lan+0x29/0x71 [e1000e]
 [<ffffffffa0503e07>] e1000_intr_msi+0x46/0xec [e1000e]
 [<ffffffff81076fa5>] handle_IRQ_event+0x1e/0x51
 [<ffffffff81078295>] handle_edge_irq+0xe8/0x12b
 [<ffffffffa04fb312>] e1000e_update_mc_addr_list_generic+0x0/0x18e [e1000e]
 [<ffffffff8100ea88>] do_IRQ+0x6c/0xd4
 [<ffffffff8100c556>] ret_from_intr+0x0/0xf
 <EOI> [<ffffffffa04fb312>] e1000e_update_mc_addr_list_generic+0x0/0x18e [e1000e]
 [<ffffffffa04fb38f>] e1000e_update_mc_addr_list_generic+0x7d/0x18e [e1000e]
 [<ffffffffa04fb359>] e1000e_update_mc_addr_list_generic+0x47/0x18e [e1000e]
 [<ffffffffa0500ace>] e1000_set_multi+0xe2/0x11b [e1000e]
 [<ffffffff8121a1e8>] dev_set_rx_mode+0x21/0x2d
 [<ffffffff8121d1a6>] dev_open+0x85/0x9e
 [<ffffffff8121b172>] dev_change_flags+0xa6/0x15d
 [<ffffffff81262588>] devinet_ioctl+0x242/0x58a
 [<ffffffff812109a5>] sock_ioctl+0x1d8/0x1ff
 [<ffffffff810b80a9>] vfs_ioctl+0x21/0x6b
 [<ffffffff810b834c>] do_vfs_ioctl+0x259/0x272
 [<ffffffff81055a14>] trace_hardirqs_on_caller+0xf2/0x115
 [<ffffffff810b83b6>] sys_ioctl+0x51/0x73
 [<ffffffff8100bf4b>] system_call_fastpath+0x16/0x1b

Haven't looked into the code yet to see if this cou...

Revision history for this message
Chris Jones (cmsj) wrote :

Fahim: the driver has been disabled until a proper fix emerges. This does mean that anyone using e1000e with Intrepid is unable to do so at the moment, but this is felt to be a better workaround than continuing to place unknown numbers of machines at risk of permanently losing their ethernet chips.
One option you may have is to boot your Intrepid system with a Hardy kernel (this is what I am doing), or an older Intrepid kernel (but then you may be at risk again).

Tim: I'm not sure I agree with the status of this being "Fix Released", I would personally opt for "In Progress" - otherwise it may be excluded from searches which (IMHO) absolutely should not exclude it.

Revision history for this message
William Grant (wgrant) wrote :

Not fixed - LP closed the bug due to the reference in the changelog.

Changed in linux:
status: Fix Released → In Progress
Revision history for this message
In , Eich-m (eich-m) wrote :

(In reply to comment #65 from Jiri Kosina)
> It could be some temporary mapping that goes away after a while, so that it
> doesn't show in /proc/<pid>/maps permanently, but yes, this of course can be
> tried.

The Xserver itself doesn't have any temporary mappings. It could be buffers
requested from DRM which (depending on the implementation) could be requested and discarded during runtime of the server.

Revision history for this message
Daniel Stoyanov (dankh) wrote :

All this is just crazy!

I have been testing Intrepid since Alpha 3, and I have tested all Ubuntu releases at a very early stage (Alpha 2) starting from 6.10. I'm not a developer myself and the only way I can help improve Ubuntu is to test and submit bugs. I accept that I can lose some data and that I often won't be very productive on the testing machine; I'm OK with that. BUT damaging my hardware is something different: my machine costs ~ 2500 $ (ThinkPad T61p). I just upgraded the kernel this morning via Synaptic; sometimes I don't even bother to check what's being upgraded, because I ASSUME that Alphas can damage my data (which is backed up daily). When I submitted a bug (a duplicate of this one) this morning I saw the danger. I'm currently on Intrepid with the old kernel (2.6.27-3), and luckily my ethernet adapter is running well.

I'm sorry to say this, but I think this will be the last time I run an Ubuntu Alpha on my machine. I'm willing to help, but I have to be 100% sure that Ubuntu developers have a very CLEAR policy on what will be pushed to the repositories for testing.

Regards,
Daniel

Revision history for this message
In , Renato-yamane (renato-yamane) wrote :

Created attachment 241376
ethtool -e eth0 > e1000e.txt

Requested by Karsten Keil (comment #51).

Hardware: Laptop Lenovo Thinkpad T61

$ uname -a
Linux mandachuva 2.6.26.2 #1 SMP Sat Aug 16 19:08:09 BRT 2008 i686 GNU/Linux

$lspci -vvv

00:19.0 Ethernet controller: Intel Corporation 82566MM Gigabit Network Connection (rev 03)
        Subsystem: Lenovo Device 20b9
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 216
        Region 0: Memory at fe200000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at fe225000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at 1840 [size=32]
        Capabilities: <access denied>
        Kernel driver in use: e1000e
        Kernel modules: e1000e

Revision history for this message
Jacob Godserv (fun2program8) wrote :

Daniel, to be honest, no one can really protect anyone from bugs until they know about the bugs to protect people against. This is the first time in a long time (perhaps ever) that this sort of report has been confirmed.

They've done the best they can to protect people by educating everyone on the mailing lists the _day_ the report comes in, and sending out an update that disables the driver. What more can they do?

Revision history for this message
Ralf Nieuwenhuijsen (ralf-nieuwenhuijsen) wrote :

@JavaJake
>What more can they do?

Well, they could:

a) Pull the images immediately, instead of entering a discussion about just
how important the hardware of those who volunteer to test really is.

b) Put the warning on planet-ubuntu, instead of entering a discussion about
just how important the hardware of those who volunteer to test really is.

c) Put out a security bulletin, instead of entering a discussion about just
how important the hardware of those who volunteer to test really is.

d) Shout it out on IRC, instead of entering a discussion about just how
important the hardware of those who volunteer to test really is.

e) Upload a blacklist-fix immediately, instead of entering a
discussion about just how important the hardware of those who volunteer to test
really is.

YES THESE THINGS CAN HAPPEN.

But like with a security flaw; it's all about HOW FAST and IF YOU CARE
they respond.
The first things that people will read about this bug, after it
destroyed their hardware is that they shouldn't complain: it's alpha
software ..

WHAT KIND OF POLICY IS THIS?

I mean, the comment asking for suggestions of the text yesterday ..
seems to indicate that nobody was in a hurry ..

Colin .. a question for you... between the time you asked for
suggestions of the possible warning text and the time that an actual
warning text was put on the release notes .. ..how many people have
downloaded? You have the information .. in the very best case only 10%
or something will be affected.

Might be an interesting number to consider.

Revision history for this message
Alan (mrintegrity) wrote :

As previously mentioned:

Discussion, flames, complaints, anything not immediately useful to getting the bug fixed, etc. goes here:

 https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel-discuss

Anything else that *is relevant* to getting the bug fixed goes here.

Revision history for this message
Renato S. Yamane (renatoyamane) wrote :

Has anyone tried these patches from Jeff Kirsher (Intel)?
http://lkml.org/lkml/2008/9/23/427
http://lkml.org/lkml/2008/9/23/431
http://lkml.org/lkml/2008/9/23/432

Best regards,
Renato

Revision history for this message
In , Renato (renato-redhat-bugs) wrote :

Has anyone tried these patches from Jeff Kirsher (Intel)?
http://lkml.org/lkml/2008/9/23/427
http://lkml.org/lkml/2008/9/23/431
http://lkml.org/lkml/2008/9/23/432

And I think it would be a good idea to raise the priority and severity, because this bug can DAMAGE hardware.

Best regards,
Renato

Revision history for this message
Ravindran K (ravindran-k) wrote :

Hi,

I understand Intel Gigabit controllers are affected. But which ones exactly? What PCI IDs and models?

Currently I have:

00:19.0 Ethernet controller: Intel Corporation 82566DC-2 Gigabit Network Connection (rev 02)
        Subsystem: Intel Corporation Unknown device 0001
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin A routed to IRQ 217
        Region 0: Memory at e0400000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at e0424000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at 2400 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                Address: 00000000fee0300c Data: 4182
        Capabilities: [e0] Vendor Specific Information

[OT: I was using 2.6.27 for some days (from Intrepid, though on Hardy) and thankfully nothing has gone wrong. I was using 2.6.26-5-server but it was removed and all my modules vanished. Now I'm still on the older kernel 2.6.26-5-generic (it has been removed too, but its modules are still working, not sure until when) because VMware Server only works on 2.6.26-5-X.]

Just waiting for Intrepid to be released (and of course some patches from VMware for VMware Server) :)

Revision history for this message
In , Okir (okir) wrote :

Vladimir Botka just posted this to an internal mailing list:
--------------------------------
So, I am the one who tried (without an intention). The installation of
SLED11 Beta1 on TP T61 crashed in the moment installer was probing the
X configuration. Instead of the well known notification "Dont panic ..."
an error message appeared for a few seconds telling something about
"yast, proposal, etc ...". Then the machine restarted. The network card
stopped responding.

Here is the card info:

00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 03)
        Subsystem: Intel Corporation Device 0000
        Flags: fast devsel, IRQ 20
        Memory at fe000000 (32-bit, non-prefetchable) [size=128K]
        Memory at fe025000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 1840 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
        Kernel modules: e1000e
---------------------------------

Revision history for this message
Hew (hew) wrote :

Ravindran, it looks like you are affected. Here's a list of affected devices taken from http://episteme.arstechnica.com/eve/forums/a/tpc/f/96509133/m/302000834931?r=202000364931#202000364931 .

8086:1049 82566MM Gigabit Network Connection (Centrino Pro notebooks made since 2007)
8086:104a 82566DM Gigabit Network Connection (vPro desktops made since 2006)
8086:104b 82566DC Gigabit Network Connection (some desktops made since 2006)
8086:104c 82562V 10/100 Network Connection (some desktops made since 2006)
8086:104d 82566MC Gigabit Network Connection (notebooks made since 2007)
8086:105e PRO/1000 PT Dual Port Network Connection
8086:105f PRO/1000 PF Dual Port Server Adapter
8086:1060 PRO/1000 PB Dual Port Server Connection
8086:107d PRO/1000 PT Server Adapter
8086:107e PRO/1000 PF Server Adapter
8086:107f PRO/1000 PB Server Connection
8086:108b PRO/1000 PM Network Connection (82573V integrated)
8086:108c PRO/1000 PM Network Connection (82573E integrated)
8086:1096 PRO/1000 EB Network Connection with I/O Acceleration (servers since 2006)
8086:1098 PRO/1000 EB Backplane Connection with I/O Acceleration (servers since 2006)
8086:109a PRO/1000 PL Network Connection
8086:10a4 PRO/1000 PT Quad Port Server Adapter
8086:10a5 PRO/1000 PF Quad Port Server Adapter
8086:10b9 PRO/1000 PT Desktop Adapter
8086:10ba PRO/1000 EB1 Network Connection with I/O Acceleration (servers since 2007)
8086:10bb PRO/1000 EB1 Backplane Connection with I/O Acceleration (servers since 2007)
8086:10bc PRO/1000 PT Quad Port LP Server Adapter
8086:10bd 82566DM-2 Gigabit Network Connection (vPro desktops since 2007)
8086:10bf 82567LF Gigabit Network Connection (desktops since 2008)
8086:10c0 82562V-2 10/100 Network Connection
8086:10c2 82562G-2 10/100 Network Connection
8086:10c3 82562GT-2 10/100 Network Connection
8086:10c4 82562GT 10/100 Network Connection
8086:10c5 82562G 10/100 Network Connection
8086:10cb 82567V Gigabit Network Connection (desktops since 2008)
8086:10cc 82567LM-2 Gigabit Network Connection
8086:10cd 82567LF-2 Gigabit Network Connection
8086:10ce 82567V-2 Gigabit Network Connection
8086:10d5 Gigabit PT Quad Port Server ExpressModule
8086:10d9 82571EB Dual Port Gigabit Mezzanine Adapter
8086:10da 82571EB Quad Port Gigabit Mezzanine Adapter
8086:10f5 82567LM Gigabit Network Connection
8086:294c 82566DC-2 Gigabit Network Connection (desktops since 2007)

Revision history for this message
In , Okir (okir) wrote :

Given that this is increasingly looking like it's closely related to video,
can people with affected machines please post the lspci output for their gfx
chip?

Revision history for this message
In , Renato-yamane (renato-yamane) wrote :

Olaf, I really don't feel comfortable testing kernel 2.6.27-rc if it can damage my ethernet device, so I don't know if my hardware is affected, but my video card is:

01:00.0 VGA compatible controller: nVidia Corporation Quadro NVS 140M (rev a1)
        Subsystem: Lenovo Device 20d8
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at d6000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at d4000000 (64-bit, non-prefetchable) [size=32M]
        Region 5: I/O ports at 2000 [size=128]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidia, nvidiafb

And my ethernet device is listed in Comment #68

Best regards,
Renato

Revision history for this message
Chris Jones (cmsj) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

Hew McLachlan wrote:
> Ravindran, it looks like you are affected. Here's a list of affected
> devices taken from
> http://episteme.arstechnica.com/eve/forums/a/tpc/f/96509133/m/302000834931?r=202000364931#202000364931

That looks kinda like it's just a list of PCI IDs claimed by the driver.

AFAIK there is no indication thus far that things like server parts are
affected, but it's likely that those systems would be less likely to
trigger the bug in the first place, so, to the best of my knowledge,
it's currently unclear exactly which systems are at risk (hence
blacklisting/removing the entire e1000e module)

Revision history for this message
Pelládi Gábor (pelladigabor) wrote :

Can this bug destroy hardware even if Intrepid alpha 6 is run as a guest on a stable Hardy? That is, does a virtual machine protect my hardware from this bug?

Revision history for this message
In , Will (will-redhat-bugs) wrote :

kernel-2.6.27-0.352.rc7.git1.fc10 (http://koji.fedoraproject.org/koji/buildinfo?buildID=64060) includes a fix for e1000 and (temporarily) disables e1000e.

This is probably sufficient for F10Beta (pending some regression testing)

Revision history for this message
In , Andy (andy-redhat-bugs) wrote :

I guess that will work, but you've now killed the wired network on quite a few hardware platforms. Pulling the patches from comment #20 would probably be better for F10Beta.

Revision history for this message
Leonardo Silva Amaral (leleobhz) wrote :

I've created a VERY stupid script to find out whether the system has a NIC vulnerable to the failure (using the list from http://episteme.arstechnica.com/eve/forums/a/tpc/f/96509133/m/302000834931?r=202000364931#202000364931)

Here is: http://pastebin.com/f6e9042ad
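
For readers who cannot reach the pastebin, the idea can be sketched in a few lines of Python. This is not the script linked above, only a hedged illustration: the ID set holds just the 82566/82567 entries from the list quoted in this thread, nobody has confirmed which of them are really affected, and the function name is made up. It expects the text printed by lspci -n (lines such as "00:19.0 0200: 8086:104b (rev 03)").

```python
import re

# 82566/82567 entries copied from the list quoted in this thread.
# Assumption: being listed here does NOT prove a device is affected.
LISTED_IDS = {
    "8086:1049", "8086:104a", "8086:104b", "8086:104d", "8086:10bd",
    "8086:10bf", "8086:10cb", "8086:10cc", "8086:10cd", "8086:10ce",
    "8086:10f5", "8086:294c",
}

_LINE = re.compile(r"(\S+)\s+[0-9a-fA-F]{4}:\s+([0-9a-fA-F]{4}:[0-9a-fA-F]{4})")


def find_listed_nics(lspci_n_output, ids=LISTED_IDS):
    """Return (slot, vendor:device) pairs whose ID appears in `ids`."""
    hits = []
    for line in lspci_n_output.splitlines():
        m = _LINE.match(line)
        if m and m.group(2).lower() in ids:
            hits.append((m.group(1), m.group(2).lower()))
    return hits


if __name__ == "__main__":
    # Hypothetical sample input in `lspci -n` format.
    sample = "00:19.0 0200: 8086:104b (rev 03)\n00:02.0 0300: 8086:2a02 (rev 0c)"
    for slot, pci_id in find_listed_nics(sample):
        print(slot, pci_id)
```

An empty result only means the ID is not in this short set; it says nothing about whether the machine is safe.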

Revision history for this message
John Dong (jdong) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

On Wed, Sep 24, 2008 at 11:18 AM, Pelládi Gábor <email address hidden> wrote:

> Can this bug destroy hardware even if Intrepid alpha 6 is run as a guest
> on a stable Hardy? That is, does a virtual machine protect my hardware
> from this bug?
>

No, it cannot. It may or may not be capable of damaging the virtual
machine's virtual e1000 card though (I think I read one report of this!)

Revision history for this message
Daniel Kulesz (kuleszdl) wrote :

Pelládi: AFAIK a "full" virtual machine protects you from this bug, since it only emulates a network card and does not grant direct access to the hardware. I've seen that VirtualBox can emulate an e1000 gigabit adapter; maybe you can try to corrupt your virtual NVRAM if you want to.

Something else might happen with para-virtualized machines (like OpenVZ, Xen, para_virt ops etc.) where all virtual instances share one physical instance, that is your real e1000 card in this case.

Revision history for this message
Daniel Kulesz (kuleszdl) wrote :

I suggest changing the bug title, provided Hew McLachlan's list is reasonable. Since those chips are also found on add-on and riser cards, systems with non-ICH8/9 chipsets are possibly affected too. My suggestion:

2.6.27 e1000e driver places Intel gigE network chips (affected: ICH8 / ICH9 chipsets and possibly stand-alone cards) at risk

Revision history for this message
In , Stefan-seyfried (stefan-seyfried) wrote :

The Ethernet on this Thinkpad R400 is already toasted, after an installation attempt some time ago (Micha, do you still know when you tried it?)

00:19.0 Ethernet controller: Intel Corporation 82566DC-2 Gigabit Network Connection (rev 03)
        Subsystem: Intel Corporation Device 0000
        Flags: fast devsel, IRQ 218
        Memory at fc000000 (32-bit, non-prefetchable) [size=128K]
        Memory at fc024000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 1820 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Count=1/1 Enable+
        Capabilities: [e0] PCIe advanced features <?>
        Kernel modules: e1000e

00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device 20e4
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f4400000 (64-bit, non-prefetchable) [size=4M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 1800 [size=8]
        Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Count=1/1 Enable-
        Capabilities: [d0] Power Management version 3

Revision history for this message
Jesse Brandeburg (jesse-brandeburg) wrote :

On Wed, 24 Sep 2008, Leonardo Silva Amaral wrote:
> Ive created a VERY stupid script to find if the system have a NIC
> vulnerable to the faillure (Used the list from
> http://episteme.arstechnica.com/eve/forums/a/tpc/f/96509133/m/302000834931?r=202000364931#202000364931)
>
> Here is: http://pastebin.com/f6e9042ad

Please update your script with the list from this thread. The first list
lists every Intel adapter, which is quite a bit of overkill.

http://episteme.arstechnica.com/eve/forums/a/tpc/f/96509133/m/283000364931

Revision history for this message
Leonardo Silva Amaral (leleobhz) wrote :

On Wednesday 24 September 2008 13:18:53 Jesse Brandeburg wrote:
> On Wed, 24 Sep 2008, Leonardo Silva Amaral wrote:
> > Ive created a VERY stupid script to find if the system have a NIC
> > vulnerable to the faillure (Used the list from
> > http://episteme.arstechnica.com/eve/forums/a/tpc/f/96509133/m/30200083493
> >1?r=202000364931#202000364931)
> >
> > Here is: http://pastebin.com/f6e9042ad
>
> Please update your script wtih the list from this thread. The first list
> lists every intel adapter, which is quite a bit of overkill.
>
> http://episteme.arstechnica.com/eve/forums/a/tpc/f/96509133/m/283000364931

OK! Updated! http://pastebin.com/f201dac8c

Trying to attach a copy to this bug.

--
Leonardo Amaral - Linux Systems Administrator
Tel: 31 8432-5025 / 31 4062-7411
LPIC-1 certified, LPI000106747

"I like things. Things, yes!
People get in the way. They are everywhere.
They multiply to excess. Things are quiet.
They are sufficient unto themselves. They meddle with no one. And they demand nothing.
Only that they not be moved from where they are." - Mario Quintana

Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

Is all hardware that is currently known to be affected by the bug driven by the i915 DRM driver? ("dmesg | grep -i drm" should be enough to find out.)

Revision history for this message
In , Eich-m (eich-m) wrote :

(In reply to comment #71 from Renato Yamane)
> Olaf, I really don't feel comfortable testing kernel 2.6.27-rc if it can
> damage my ethernet device, so I don't know if my hardware is affected, but my
> video card is:
>
> 01:00.0 VGA compatible controller: nVidia Corporation Quadro NVS 140M (rev a1)
> Subsystem: Lenovo Device 20d8
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-

Definitely not.
And if this system does turn out to be affected, it would point away from the gfx driver.

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

But all the others have i915.

Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

I would also be really interested to know whether the bug triggers when booting with the 'nopat' kernel option, as that's another piece that might go wrong on the way between Xorg, graphics card, ethernet card and MMIO.

Revision history for this message
Peter Frühberger (peter-fruehberger) wrote :

Please fix your script: you are printing awk $1, which is not what you want to compare; use $3. Please do not post scripts that make users think they are safe when they are not.

I am affected - and your script said not.

Peter
PS: I wrote myself a script in Perl; I will NOT post it. Ubuntu developers, please post a working script so that users are not lulled into a false sense of safety!

You are affected! 00:19.0 0200: 8086:1049 (rev 03)
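
For readers following the $1 versus $3 point above, here is a minimal sketch of the comparison, not the pastebin script itself. It assumes the input is `lspci -n` output, where the third whitespace-separated field is the vendor:device ID. The function name `find_affected_nics` and the `AFFECTED_IDS` variable are hypothetical, and the ID list is deliberately incomplete: 8086:1049 is the only ID quoted in this report, so the rest must be filled in from the list referenced earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: flag PCI functions whose vendor:device ID is on a watch list.
# Reads `lspci -n` style lines on stdin, e.g. "00:19.0 0200: 8086:1049 (rev 03)".
# ASSUMPTION: this list is illustrative only. 8086:1049 is the single ID quoted
# in this report; add the remaining IDs from the list referenced in the thread.
AFFECTED_IDS="8086:1049"

find_affected_nics() {
    # Field 1 is the slot, field 2 the class, field 3 the vendor:device ID.
    while read -r slot class id rest; do
        for want in $AFFECTED_IDS; do
            if [ "$id" = "$want" ]; then
                echo "You are affected! $slot $class $id${rest:+ $rest}"
            fi
        done
    done
}
```

It could be run as `lspci -n | find_affected_nics`. An empty result only means no ID on the (incomplete) list matched, not that the machine is safe.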

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

And as far as I understand comment #71, Renato's machine is not affected yet; it is only on the list of potential victims, i.e. NICs with writable FLASH.

So yes, up to now all affected machines have been using the 915 DRM driver.

Revision history for this message
In , Andreas Jaeger (jaegerandi) wrote :

My laptop uses the 915 DRM:
dmesg | grep -i drm
[drm] Initialized drm 1.1.0 20060810
[drm] Initialized i915 1.6.0 20060119 on minor 0

# hwinfo --gfxcard
27: PCI 02.1: 0380 Display controller
  [Created at pci.310]
  UDI: /org/freedesktop/Hal/devices/pci_8086_2a03
  Unique ID: ruGf.a6pkzICrUB2
  SysFS ID: /devices/pci0000:00/0000:00:02.1
  SysFS BusID: 0000:00:02.1
  Hardware Class: graphics card
  Model: "Intel Mobile GM965/GL960 Integrated Graphics Controller"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x2a03 "Mobile GM965/GL960 Integrated Graphics Controller"
  SubVendor: pci 0x17aa "Lenovo"
  SubDevice: pci 0x20b5
  Revision: 0x0c
  Memory Range: 0xf8200000-0xf82fffff (rw,non-prefetchable)
  Module Alias: "pci:v00008086d00002A03sv000017AAsd000020B5bc03sc80i00"
  Config Status: cfg=no, avail=yes, need=no, active=unknown

28: PCI 02.0: 0300 VGA compatible controller (VGA)
  [Created at pci.310]
  UDI: /org/freedesktop/Hal/devices/pci_8086_2a02
  Unique ID: _Znp.3gR64TvADaC
  SysFS ID: /devices/pci0000:00/0000:00:02.0
  SysFS BusID: 0000:00:02.0
  Hardware Class: graphics card
  Model: "Intel 965 GM"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x2a02 "965 GM"
  SubVendor: pci 0x17aa "Lenovo"
  SubDevice: pci 0x20b5
  Revision: 0x0c
  Memory Range: 0xf8100000-0xf81fffff (rw,non-prefetchable)
  Memory Range: 0xe0000000-0xefffffff (rw,prefetchable)
  I/O Ports: 0x1800-0x1807 (rw)
  IRQ: 16 (2124 events)
  I/O Ports: 0x3c0-0x3df (rw)
  Module Alias: "pci:v00008086d00002A02sv000017AAsd000020B5bc03sc00i00"
  Driver Info #0:
    XFree86 v4 Server Module: intel
  Driver Info #1:
    XFree86 v4 Server Module: intel
    3D Support: yes
    Extensions: dri
  Config Status: cfg=no, avail=yes, need=no, active=unknown

Primary display adapter: #28

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

> I'm sorry to say that but I think this will be the last time I'll run
> Ubuntu Alpha on my machine. I'm willing to help, but I have to be 100%
> sure that Ubuntu developers have very CLEAR policy on what will be
> pushed in the repositories for testing.

Such a policy can do nothing to address bugs like this one that are unknown
at the time the upload is pushed to the repositories. We're not about to
make empty promises that no Ubuntu alpha can ever damage your hardware - the
best we can do is to ensure that if we become aware of such a problem, we
publicize this by the same channels we used to solicit testing to begin
with.

Changed in linux:
status: Confirmed → Fix Committed
Revision history for this message
Remo (ubuntu-remo) wrote :

HELP!! Intrepid Ibex broke the wireless of my brand new Latitude e6500.. (the wireless NIC is an Intel WiFi 5100)
Does anyone here have an idea how to fix it??

Ubuntu disappointed a big big Ubuntu fan with this!!
I mean.. We are talking about breaking HW not partitions or SW..

If someone can help me in any way.. Thanks a lot!

Revision history for this message
Steve Langasek (vorlon) wrote :

Remo, this bug is about the e1000e driver, which is a Gigabit ethernet driver. Any issues you're seeing with your wireless are unrelated to this bug.

What reason do you have to believe your wireless has been broken, as opposed to simply being unsupported by Intrepid?

Revision history for this message
In , Jpallen (jpallen) wrote :

I had the problem on my T60p. It does not appear to use the 915 DRM.

hwinfo --gfxcard
11: PCI 100.0: 0300 VGA compatible controller (VGA)
  [Created at pci.318]
  UDI: /org/freedesktop/Hal/devices/pci_1002_71c4
  Unique ID: VCu0.s+GMGFf+eZ0
  Parent ID: vSkL.rxAOeWuq8i6
  SysFS ID: /devices/pci0000:00/0000:00:01.0/0000:01:00.0
  SysFS BusID: 0000:01:00.0
  Hardware Class: graphics card
  Model: "Lenovo ThinkPad T60p"
  Vendor: pci 0x1002 "ATI Technologies Inc"
  Device: pci 0x71c4 "Mobility FireGL V5200"
  SubVendor: pci 0x17aa "Lenovo"
  SubDevice: pci 0x2007 "ThinkPad T60p"
  Memory Range: 0xd0000000-0xdfffffff (rw,prefetchable)
  I/O Ports: 0x2000-0x2fff (rw)
  Memory Range: 0xee100000-0xee10ffff (rw,non-prefetchable)
  Memory Range: 0xee120000-0xee13ffff (ro,prefetchable,disabled)
  IRQ: 11 (no events)
  I/O Ports: 0x3c0-0x3df (rw)
  Module Alias: "pci:v00001002d000071C4sv000017AAsd00002007bc03sc00i00"
  Driver Info #0:
    XFree86 v4 Server Module: radeonhd
  Config Status: cfg=no, avail=yes, need=no, active=unknown
  Attached to: #27 (PCI bridge)

Primary display adapter: #11

Revision history for this message
In , Jpallen (jpallen) wrote :

I guess I really don't know for sure if I have the same problem. I did not inspect the contents of the EEPROM using ethtool, so I really can't verify. I have subsequently flashed my BIOS to recover.

What I experienced was a blank screen upon rebooting my system after installing SLED 11 Beta - build47. I was unable to install anything after that (SLED 10, XP, etc.). I continued to get a blank screen during installation. After updating my BIOS using Lenovo's bootable BIOS CD I was able to install build48 (without loading the e1000e module).

Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

That would be the first case when this would be reported to happen on non-i915 hardware.

On the other hand, this mail that came to LKML just a while ago is quite interesting too:

           http://lkml.org/lkml/2008/9/24/133

Apparently some Realtek ethernet device stopped working because it has a lot of 0xff somewhere in its configuration space ....

Revision history for this message
In , Jesse Brandeburg (jesse-brandeburg) wrote :

I think the problem with the T60p is different: it has a different LAN chip, the 82573, with a real EEPROM (not NVM based) that should not be able to be corrupted in the same manner as the 82566/82567.

let's leave Jared's problem off to the side as a (possibly) new issue, maybe a new bug if it is reproducible?

Revision history for this message
Leonardo Silva Amaral (leleobhz) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

On Wednesday 24 September 2008 14:27:31 Peter Frühberger wrote:
> Please fix your script: you are printing awk $1, which is not what you
> want to compare; use $3. Please do not post scripts that make users
> think they are safe when they are not.
>
> I am affected - and your script said not.
>
> Peter
> PS: I wrote myself a script in Perl; I will NOT post it. Ubuntu developers,
> please post a working script so that users are not lulled into a false sense of safety!
>
> You are affected! 00:19.0 0200: 8086:1049 (rev 03)

Fix released: http://pastebin.com/f3d063ed6

Sorry for the attachment

Revision history for this message
Leonardo Silva Amaral (leleobhz) wrote :

On Wednesday 24 September 2008 20:36:41 Brian Murray wrote:
> ** Attachment removed: "DETECT_INTEL_E1000E_BUGGY.sh"
>
> http://launchpadlibrarian.net/17925250/DETECT_INTEL_E1000E_BUGGY.sh

Thanks

Revision history for this message
In , Bob Mahar (bob-o-rama) wrote :

(In reply to comment #83 from Jesse Brandeburg)
> I think the problem with the T60p is different: it has a different LAN chip,
> the 82573, with a real EEPROM (not NVM based) that should not be able to be
> corrupted in the same manner as the 82566/82567.

Oh, if it were only that simple. The T60p has the 82573L; cf. the Intel docs...

http://download.intel.com/design/network/products/LAN/manuals/316080.pdf

See section 2.3... "The 82573E/82573V/82573L supports both FLASH memory and EEPROM; however, only one device can be connected at a time (not both)."

So while the 8257x parts for the most part use SEEPROMs, the -E, -V and -L suffixed parts could go either way - oh joy!

Revision history for this message
Scruffynerf (scruffynerf) wrote :

This might be a red herring, however over at http://lkml.org/lkml/2008/9/24/133 is a report of the same symptoms on a RealTek ethernet system.

Revision history for this message
In , Martin-wilck-d (martin-wilck-d) wrote :

Forgive me this dumb question - if this is due to an accidental overwrite with random data (DMA), why would the EEPROM contain only FFs afterwards? Can't we infer something from that?

Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

(In reply to comment #85 from Martin Wilck)
> Forgive me this dumb question - if this is due to an accidental overwrite with
> random data (DMA),

It's not DMA but MMIO. The data are not that random, it is really all 0xff.

> why would the EEPROM contain only FFs afterwards? Can't we infer something from
> that?

Well, we haven't been able to use this to identify the source of the corruption so far. We have patches that could help to point to the culprit, but first we need a reliable way to restore the EEPROM contents; otherwise debugging is almost impossible.

Revision history for this message
In , Okir (okir) wrote :

Looking at the lspci output from the system mentioned on LKML (comment #82)
that machine seems to have an i945G graphics controller, which AFAIK is
also driven by the i915 driver. The chipset is ICH7.

Has anyone tried to resurrect AJ's laptop via a BIOS update?

Is there any way Intel can help resuscitate these e1000e NVMs - this is
really preventing us from doing further debugging.

Revision history for this message
In , Andreas Jaeger (jaegerandi) wrote :

Olaf, I did a BIOS update to the latest Lenovo BIOS. It did not help at all.

Revision history for this message
In , Okir (okir) wrote :

Jesse, I'm setting this bug as NEEDINFO to you.
The biggest roadblock right now is our inability to bring those dead
NICs back to life. Without this, we cannot proceed with testing, and
we are somewhat reluctant to try this ourselves, as it seems someone at
RedHat has bricked a laptop this way.

We tried a BIOS update on one of the affected laptops, but this didn't
help. And since we weren't aware of the problem in advance, we don't
have an ethregs dump of these.

So can you please get someone from Intel to help us with restoring the
NVM to a working condition? Thanks a lot!

Revision history for this message
Richard Kleeman (kleeman) wrote :

I am using a 2.6.26 kernel in hardy which loads the e1000e module on a Thinkpad X300. I have not seen any problems but as a precaution have blacklisted the module by hand by adding the line

blacklist e1000e

to the end of the file

 /etc/modprobe.d/blacklist

This prevents the module loading when I boot the 2.6.26 kernel but allows the e1000 module to load with the stock hardy 2.6.24 kernel.

No wired connection with the 2.6.26 kernel but hopefully no corruption of hardware either....
I use the 2.6.26 kernel because the wireless driver is less buggy.

Comments?
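
The manual step described above can be sketched as a small idempotent helper. This is only an illustration of the same edit, not an official procedure: the function name `ensure_blacklisted` is hypothetical, and the file path is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: append "blacklist MODULE" to FILE unless that exact
# line is already present, so running it twice does not duplicate the entry.
ensure_blacklisted() {
    file=$1
    module=$2
    # -x matches whole lines only, so "blacklist e1000" does not count as e1000e.
    if grep -qxF "blacklist $module" "$file" 2>/dev/null; then
        return 0
    fi
    printf 'blacklist %s\n' "$module" >> "$file"
}
```

Run as root, `ensure_blacklisted /etc/modprobe.d/blacklist e1000e` would make the edit described in this comment. The module stays blacklisted until the line is removed again.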

Revision history for this message
In , Jpallen (jpallen) wrote :

I realize that my issue might be something different (see comment #83). However, updating my BIOS seemed to bring my bricked T60p back to life.

Revision history for this message
Jesse Brandeburg (jesse-brandeburg) wrote : RE: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

2.6.26, as far as we know, is unaffected. The problem reports all started
with 2.6.27. So in short I think you have nothing to worry about.

Changed in linux:
status: Unknown → Confirmed
Revision history for this message
In , Okir (okir) wrote :

One more question to Intel.

There's a question whether the NVM we're talking about here is actually larger,
and is used by components other than the e1000e. If for instance the video BIOS
maps all of the NVM and, due to some bug, scribbles over parts of it that
include the e1000e's config space - is there a way to verify this?

Revision history for this message
Tim Gardner (timg-tpi) wrote :

@Jesse - Regardless of the reasons unrelated crashes are presumed to be corrupting flash, why not modify the driver so that NV RAM is only mapped while ethtool is actually using the interface? Even if the root cause is found this time, it seems like something that could happen again.

Revision history for this message
In , Eich-m (eich-m) wrote :

Update:
Since numerous people have seen this problem during installation when the Xserver is probed I've looked into what's happening at this stage:
The Xserver is started with a standard config - only the line containing the bus id and driver name is special. The installation program then connects to the Xserver to obtain the randr version and information about the available outputs. Nothing is ever drawn (except for the standard X background).
For now I would also rule out drm, as it is initialized but never used (2D operations don't use drm on this driver).
The probing scenario during installation can be reproduced on any running system with the command:
sysp -s xstuff
I'm currently condensing down the part that involves the X connection of this for better reproduction.
My goal is to narrow down where to look. The X driver is still a considerable chunk of code so it would be beneficial to reduce the possible sources of the problem.

Revision history for this message
In , Eich-m (eich-m) wrote :

(In reply to comment #91 from Olaf Kirch)
> One more question to Intel.
>
> There's a question whether the NVM we're talking about here is actually larger,
> and is used by components other than the e1000e. If for instance the video BIOS
> maps all of the NVM and, due to some bug, scribbles over parts of it that
> include the e1000e's config space - is there a way to verify this?
>

I don't think this is the case: the driver only maps the POSTed copy of the VBIOS. This is copied into RAM at POST time (to the 0xC-segment). This copy is then made read only. This copy is (should be) entirely independent of the EEPROM containing the PCI ROM.

Revision history for this message
In , Eich-m (eich-m) wrote :

Another question came up: does this happen on both 64 and 32 bit installations?

Revision history for this message
In , Jesse Brandeburg (jesse-brandeburg) wrote :

(In reply to comment #89 from Olaf Kirch)
> Jesse, I'm setting this bug as NEEDINFO to you.
> The biggest roadblock right now is our inability to bring those dead
> NICs back to life. Without this, we cannot proceed with testing, and
> we are somewhat reluctant to try this ourselves, as it seems someone at
> RedHat has bricked a laptop this way.

We'll get you a utility today to help with this, and at the same time we're working on a quick hack to the driver to take in an ethtool eeprom dump and push it back to the NVM. We hope to have that done and working today.

> We tried a BIOS update on one of the affected laptops, but this didn't
> help. And since we weren't aware of the problem in advance, we don't
> have an ethregs dump of these.

That depends on whether the BIOS version has the LAN part included. Some BIOS versions do, and some do not. I know that in particular there were a couple of BIOS versions for the X60/T60 line that had LAN NVM updates.

> So can you please get someone from Intel to help us with restoring the
> NVM to a working condition? Thanks a lot!

We are working on it, a couple of different avenues.

Revision history for this message
In , Jesse Brandeburg (jesse-brandeburg) wrote :

(In reply to comment #91 from Olaf Kirch)
> There's a question whether the NVM we're talking about here is actually larger,
> and is used by components other than the e1000e. If for instance the video BIOS
> maps all of the NVM and, due to some bug, scribbles over parts of it that
> include the e1000e's config space - is there a way to verify this?

The NVM in question is a single part that is shared by the entire machine (VGA, BIOS, LAN, Manageability, AHCI, etc.).

I couldn't tell you how to verify if something else is mapping over the top of the LAN area of the NVM. The only reports I've heard are that the LAN NVM is corrupted. If you managed to corrupt the BIOS area, the machine wouldn't boot.

Changed in linux:
status: In Progress → Incomplete
Revision history for this message
In , Jesse Brandeburg (jesse-brandeburg) wrote :

(In reply to comment #94 from Egbert Eich)
> Another question came up: does this happen on both 64 and 32 bit installations?

At this point we don't know. At least one reporter I worked with was running 32 bit.

Revision history for this message
In , Stefan-seyfried (stefan-seyfried) wrote :

(In reply to comment #96 from Jesse Brandeburg)

> The only reports I've heard are that the LAN NVM is
> corrupted. If you managed to corrupt the BIOS area, the machine wouldn't boot.

Helmut Schaa has an HP 2510p that lost some of its display modes after a hard X crash on an early 2.6.27-rc kernel (it now no longer knows that it has a 1280x800 panel but thinks that it only has 1024x768, the BIOS screen is in the upper left corner instead of centered on the screen). Even though we don't know that this is the same problem, it shows that sh*t happens.

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Do we have any dumps of the gfx related crashes? Comment #98 seems to indicate that the video ROM may have also become corrupted (either that or the EEPROM containing the EDID), but I don't currently have any theories about how the gfx driver could cause that...

Revision history for this message
In , Warren (warren-redhat-bugs) wrote :

> And I think it is a good idea to change priority and severity to higher,
> because this bug can DAMAGE hardware.

Nobody is changing priority and severity because those fields are meaningless. We should really remove those fields from the interface.

Revision history for this message
NekoNemo (nemeyes) wrote :

"Due to an unresolved bug in the Linux kernel included in Alpha 6, it should not be used on Intel ethernet hardware handled by the e1000e driver (Intel GigE). Doing so may render your network hardware permanently inoperable."

yeah, but i have resolved on my hp 8510w with an old image of windows, and my network card is reborn :O

Revision history for this message
Craig (candrews-integralblue) wrote :

@NekoNemo
Thank you very much for the utterly nonproductive, and frankly insulting to the people who actually work on and care about Ubuntu/Linux, comment you supplied. Congratulations on using an *alpha* copy of Ubuntu, having a problem, then switching to Windows. Once again, congratulations on the insightful comment.

Can we PLEASE keep this bug report on the topic of how to *fix* or deal with the issue and not have "contributions" by people who threaten to or actually switch to ${PROPRIETARY_OS}, or other forms of non-productive whiny comments? Thanks.

Revision history for this message
NekoNemo (nemeyes) wrote :

@Craig
Sorry, if you think this...

I just wanna say that your network card is not permanently inoperable.
however, there is no need to be rude.
thanks

Revision history for this message
Tom Jaeger (thjaeger) wrote :

I just checked my syslog and interestingly, the first time I got the "e1000e: probe of 0000:00:19.0 failed with error -5" error was during wake-up from standby. The syslog doesn't indicate an X server crash (or restart) between booting the computer and suspending it. I'm not sure how useful this information is, but I thought I'd mention it just in case. Let me know if you want more information.
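
To find when the failure first showed up on another machine, the same log check can be sketched as a one-line filter. The function name `first_probe_failure` is hypothetical; the message text is the one quoted in this comment, and the log location varies between systems.

```shell
#!/bin/sh
# Hypothetical helper: print the earliest e1000e probe failure from a log
# supplied on stdin (the message text is taken from the comment above).
first_probe_failure() {
    grep 'e1000e: probe of .* failed with error -5' | head -n 1
}
```

For example, `first_probe_failure < /var/log/messages` prints the first matching line, whose timestamp can then be compared against suspend, resume and X server events in the same log.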

Revision history for this message
Vincenzo Ciancia (vincenzo-ml) wrote :

Craig: most of us never use Windows but have been forced to pay for it, so if it's able to restore a dead hardware device, at least I paid for something! :) It's better than never being able to use the ethernet card again, isn't it? It is better for this page to say that the hardware damage is reversible than to leave a page that states "use ubuntu and you can kill your only integrated NIC forever"...

It would be interesting to discover how Windows "restores" the card, so that somebody could create free software that does the same.

Revision history for this message
Jesse Brandeburg (jesse-brandeburg) wrote :

Tom Jaeger wrote:
> I just checked my syslog and interestingly, the first time I got the
> "e1000e: probe of 0000:00:19.0 failed with error -5" error was during
> wake-up from standby. The syslog doesn't indicate an X server crash
> (or restart) between booting the computer and suspending it. I'm not
> sure how useful this information is, but I thought I'd mention it
> just in case. Let me know if you want more information.

Tom your machine may need a bios update to fix your suspend issue. I'm
not sure yet if this is related to the issue we're seeing with other
users who appear to just be rebooting. It is a very interesting data
point however, so thanks for posting. If you can post if you're running
i686 or x86_64, and let us know if you have an ethtool -e dump from
after the failure (may need modified e1000e that doesn't goto or set err
= after printing failed NVM checksum)

Revision history for this message
Anil (anil-omkar) wrote :

what else is more important than fixing some bug like this ?
Fedora seems to have already done it.

Revision history for this message
nanog (sorenimpey) wrote :

I have an 82566DC-2 and I had no idea why eth0 disappeared until I found this bug report.
Please include some sort of warning if this does not get fixed soon.

Revision history for this message
Jacob Godserv (fun2program8) wrote :

Anil, the "fix" Fedora released was nothing more than disabling the e1000e driver.

Revision history for this message
In , Jesse (jesse-redhat-bugs) wrote :

please see my message on lkml titled "e1000e NVM corruption issue status"

Revision history for this message
Tom Jaeger (thjaeger) wrote :

Jesse Brandeburg wrote:
> Tom your machine may need a bios update to fix your suspend issue. I'm
> not sure yet if this is related to the issue we're seeing with other
> users who appear to just be rebooting. It is a very interesting data
> point however, so thanks for posting. If you can post if you're running
> i686 or x86_64, and let us know if you have an ethtool -e dump from
> after the failure (may need modified e1000e that doesn't goto or set err
> = after printing failed NVM checksum)
>
There actually wasn't any issue with suspend, it's just that the
corruption happened between boot and suspend (which caused the driver to
be reloaded, I guess) -- without any X crash or reboot. I have seen a
lot of random X crashes lately though, and also a few instances of the
system hanging and NumLock blinking (are these kernel panics?). I'm
attaching the relevant part of /var/log/messages. I did do a bios
update a few days ago, though. I'm running i686 (on a thinkpad x61t).
The dump isn't very interesting, it's all 0xff. I'll also attach a
modified kernel module for the latest intrepid kernel, in case somebody
else is in a similar situation. Would it make sense to write some
random junk in the EEPROM to see if the issue appears again? I don't
need a network adapter at the moment.
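
A quick way to tell whether a dump is "all 0xff", as described above, is sketched below. It assumes the usual `ethtool -e` text layout (an offset column such as `0x0000:` followed by hex bytes); the function name `nvm_is_blank` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every byte in an `ethtool -e` style
# dump read from stdin is ff. Header lines and blank lines are skipped.
nvm_is_blank() {
    seen=0
    while read -r offset bytes; do
        case $offset in
            0x*) ;;            # data line: first field is the offset
            *) continue ;;     # header, separator or empty line
        esac
        for b in $bytes; do
            seen=1
            case $b in
                ff|FF) ;;
                *) return 1 ;; # found a byte that is not 0xff
            esac
        done
    done
    # An input with no data bytes at all is not reported as blank.
    [ "$seen" -eq 1 ]
}
```

Something like `ethtool -e eth0 | nvm_is_blank && echo "NVM reads back as all 0xff"` only reads the NVM. As the follow-up comment shows, writing to it is a different matter.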

Revision history for this message
Tom Jaeger (thjaeger) wrote :

> Would it make sense to write some
> random junk in the EEPROM to see if the issue appears again? I don't
> need a network adapter at the moment.

Bad idea. Don't try this. Now my ethernet controller isn't even listed in lspci anymore and the driver won't see it. Damn!

Revision history for this message
In , Warren (warren-redhat-bugs) wrote :

http://lkml.org/lkml/2008/9/25/510
This appears to be the post Jesse is referring to.

Revision history for this message
sam tygier (samtygier) wrote :

Are there any CD images which have the blacklisting in place? It can take a few days before a kernel propagates to the CD.

The latest daily live image seems to be 20080923. I guess this is still vulnerable.

Revision history for this message
In , Renato-yamane (renato-yamane) wrote :

About Comment #86:
> but first we need reliable way to restore the EEPROM contents, otherwise the
> debugging is almost impossible.

There is a strange comment in the Ubuntu bug report that may help:
"...I have resolved on my hp 8510w with an old image of windows, and my network card is reborn..."
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/75

Does anyone have a dual-boot setup (with Windows) and can try this?

Best regards,
Renato

Revision history for this message
In , John Ronciak (john-ronciak) wrote :

The Windows drivers do not restore NVM images, so I don't think this report was seeing the same issue. If the NVM is really corrupted, loading the Windows driver is not going to fix it. The Windows driver does not calculate and check the checksum, so the device could be using whatever is in the corrupted NVM and running with those settings. This is much like some cases on this bug where, if you comment out the checksum check, the driver works for some people (probably with some random MAC address).
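
The checksum check mentioned here can be sketched outside the driver. This is an illustration under stated assumptions, not the driver's code: it assumes the e1000-family rule that the first 64 16-bit words of the NVM must sum to 0xBABA (modulo 2^16), and that `ethtool -e` prints each word low byte first. The function name `nvm_checksum_ok` is hypothetical, and the separate NVM valid bit mentioned elsewhere in this report is not examined.

```shell
#!/bin/sh
# Hypothetical helper: recompute the NVM checksum from an `ethtool -e` style
# dump on stdin. ASSUMPTIONS: words 0x00..0x3F must sum to 0xBABA mod 0x10000,
# and the dump lists each 16-bit word as low byte followed by high byte.
nvm_checksum_ok() {
    sum=0
    count=0     # number of bytes consumed so far
    lo=0
    while read -r offset bytes; do
        case $offset in
            0x*) ;;
            *) continue ;;   # skip header and separator lines
        esac
        for b in $bytes; do
            [ "$count" -lt 128 ] || break 2   # only the first 64 words count
            if [ $((count % 2)) -eq 0 ]; then
                lo=$((0x$b))
            else
                sum=$(( (sum + lo + (0x$b << 8)) & 0xFFFF ))
            fi
            count=$((count + 1))
        done
    done
    [ "$count" -eq 128 ] && [ "$sum" -eq $((0xBABA)) ]
}
```

Under these assumptions an all-0xff dump sums to 0xFFC0 rather than 0xBABA, which fits a driver that checks the checksum refusing to load while one that skips the check carries on.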

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Jesse, John, we have one case where the NVM is not completely destroyed. It seems only the
NVM valid bit is no longer set, and it shows a checksum error.
The Lenovo T61 worked until an attempt to install Beta1 via a network install.
During YaST's X server setup it rebooted, and since then it no longer loads
e1000e because of the checksum error. I will attach ethtool -e and ethregs output from
this machine. I have not set the NVM valid bit so far, so the NIC is still in this state.

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Created attachment 242026
T61 ethtool -e dump

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Created attachment 242027
T61 ethregs output

Revision history for this message
Shwan (shwan-ciyako) wrote :

For those of you whose network card is not functioning now, try booting an older kernel; mine works when booted with an older one (I have a Lenovo X61 tablet). For those of you with a wiped EEPROM, I guess it is possible to get an EEPROM dump from another computer with the same LAN card, edit its MAC address and flash it to the device. But how? You just need to wait; somebody will write a program for that!

Revision history for this message
nanog (sorenimpey) wrote :

Shwan, Good idea. However, I cannot find 2.6.26-5 on archive.ubuntu.com. Does anyone have a link to the source, image, headers, restricted modules for this kernel?

(Hardy's 2.6.24 kernel works somewhat but is not an ideal temporary solution.)

Revision history for this message
user (amani-julian) wrote :

I used alpha 6 with this kernel (64-bit version) on a ThinkPad R61. I installed Ubuntu Intrepid on its release date and have used it until today. In that period I used the (wired) ethernet device twice (checking mail and surfing, so not much traffic). When I read this bug report I immediately started the (never used) Windows Vista. The ethernet device works fine! Either the 64-bit version of the kernel doesn't harm anything or my ThinkPad is not affected. Because some other ThinkPad users have reported no problems in this bug report, I imagine the latter is more probable.
Should I try to avoid booting intrepid until this bug is fixed?

Thank you for working on that Problem! Good luck!

Revision history for this message
nanog (sorenimpey) wrote :

For those who are stuck with a box without eth0 the rt kernel appears to be a good temporary work around.

~$ dpkg -l | grep 2.6.26
ii linux-headers-2.6.26-1-rt 2.6.26-1.15 Linux kernel headers for version 2.6.26 on Ingo Molnar'
ii linux-headers-rt 2.6.26.1.5 Rt Linux kernel headers
ii linux-image-2.6.26-1-rt 2.6.26-1.15 Linux kernel image for version 2.6.26 on Ingo Molnar's

Revision history for this message
Larry Hastings (larry-hastings) wrote :

In case this helps: I've been using Intrepid for about a month now and my wired NIC is fine; I tested it by booting into Windows. It only stopped working this week when I got the module blacklist. The machine is a Thinkpad T61p with NVidia graphics. Before the blacklisting I used the wired network all the time, so if this bug was bound to happen sooner or later I'd guess it simply doesn't affect me.

Revision history for this message
Stefan Skotte (screemo) wrote :

I also have a ThinkPad T61p which is hit by the same bug, but booting other operating systems (and older Linux kernels) makes the wired ethernet work just fine. So it doesn't seem to be permanently corrupted as many bug reports suggest.

Shwan (shwan-ciyako)
description: updated
Revision history for this message
ka (kandresen2000) wrote :

I have an Acer AM5620-E1204A (quad-core desktop) which has the Intel e1000e card (Intel Corporation 82566DC-2 Gigabit Network Connection (rev 02)).

I had no problem running the supported kernels 2.6.27-2 or 2.6.27-3 but today I upgraded to the latest 2.6.27-4, and with this version my network cannot be detected.

My only current workaround is to boot using 2.6.27-3 (Linux version 2.6.27-3-generic (buildd@crested) (gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu6) ) #1 SMP Wed Sep 10 16:18:52 UTC 2008 (Ubuntu 2.6.27-3.4-generic))

Revision history for this message
Hew (hew) wrote :

If your card cannot be detected as of 2.6.27-4.6, please stop commenting as it is creating unnecessary bug mail. Since the e1000e was at risk of being damaged, it has been intentionally disabled to prevent hardware damage while a fix is developed. Please comment only if you have new information to add to the report. Thanks.

Revision history for this message
Rich (rincebrain) wrote :

Hey world,
Are we still having trouble finding a machine to reproduce this bug on? I have a T61p with an 82566MM (8086:1049 (rev 03)) that I'd be willing to volunteer for testing purposes, if someone wants it. :)

Revision history for this message
In , Luis (luis-redhat-bugs-1) wrote :

Another message from Jesse Brandeburg on LKML lists the patches being used to debug the issue and under test as possible fixes:

  http://lkml.org/lkml/2008/9/25/515

Changed in linux:
status: Fix Committed → Confirmed
Revision history for this message
Gert van Dijk (gertvdijk) wrote :

@Rich: I have the same laptop with the same card, and I ran the September 17th daily build of the Intrepid live CD at the time (not knowing this bug was present), but my Ethernet card is still working. I see more reports (see the related bug trackers at the top of the page) from T61p users who are somehow not affected (EEPROM reset on reboot or something).

Revision history for this message
Anil (anil-omkar) wrote :

Did T61 users see the "Starting up..." message shown for a long time (just after pressing Enter at the GRUB menu)?
Maybe that's when the EEPROM was getting into a RESET state?

Revision history for this message
Tim Gardner (timg-tpi) wrote :

UBUNTU: SAUCE: e1000e: Map NV RAM dynamically only when needed.

I'm going to go with this until upstream converges on a solution that I'm happy with. One point of contention with upstream is that Ubuntu is using the e1000e driver from Intel's SourceForge project, when we ought to be using the in-kernel version (since that is where any permanent fixes for this issue will go).

Changed in linux:
status: In Progress → Fix Committed
Revision history for this message
Matt Zimmerman (mdz) wrote :

An update on the current status of this bug:

This issue has never affected a released version of Ubuntu, only an alpha milestone of Ubuntu 8.10 (Intrepid).

The problem was worked around in kernel version 2.6.27-4.6 in Intrepid by temporarily disabling the e1000e driver (the e1000 driver is still provided). There is no risk of hardware damage with the current Intrepid packages.

A more complete fix is still in progress upstream, though Tim is preparing an interim fix which will allow the e1000e driver to be restored (see comment #98).

Revision history for this message
Matt Zimmerman (mdz) wrote :

Removing beta milestone since the workaround is sufficient for beta

Changed in linux:
milestone: ubuntu-8.10-beta → none
Revision history for this message
ThyMythos (thymythos) wrote :

For more information also have a look at: http://article.gmane.org/gmane.linux.kernel/738578

According to http://article.gmane.org/gmane.linux.kernel/738618 it is definitely an X server problem.
Are we really sure, deactivating the e1000e removes the problem?

Revision history for this message
Chris Jones (cmsj) wrote :

ThyMythos: I don't think that post proves "it is definitely a X server problem" - it's a strong suspicion among the people investigating this, but as yet there is no solid evidence of what is causing it and allowing it to happen.

Removing the e1000e module will mean that the EEPROM is not mapped into memory, which would make it really quite hard to break (and indeed, the laptop which broke within 24 hours of running Intrepid a month ago, has been back with me for over a week now and running without e1000e.ko has been absolutely fine).

Steve Langasek (vorlon)
Changed in linux:
milestone: none → ubuntu-8.10
Revision history for this message
Rich (rincebrain) wrote :

@gertvdijk: I'm curious to see what works out in terms of what's going on, but from what I've read, it looks like A) it's not a consistent problem with easy reproducibility, and B) isn't Sept 17th after the blacklist of the affected driver went in?

In any event, I'm glad there's a story for resolving this. :)

Revision history for this message
Pumalite (pumalite) wrote :

Same thing happened to me on an Intel D975XBX with Gigabit Ethernet. Fortunately just disabled it. I'll wait until December and see what happens. In the meantime I have Sabayon there.

Revision history for this message
Gert van Dijk (gertvdijk) wrote :

@Rich:
No, it wasn't disabled the 17th (afaik). It was disabled in 2.6.27-4.6 which was published 2008-09-24:
https://launchpad.net/ubuntu/intrepid/+source/linux/2.6.27-4.6
True, it's not a very consistent problem. My guess is that some BIOS routine in some(!) ThinkPads (and maybe others) is automatically recovering the EEPROM at boot/POST time or that this bug does not appear to show up on some specific hardware or configuration we haven't identified yet. All speculations.

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

Guys, I have an HP 8510w which is showing some interesting behaviour. I believe I may have had graphics corruption first, though I don't recall if the problems started directly afterward. I'm definitely running the e1000e driver; the machine has an NVIDIA Quadro FX570M (mobile version). The first thing I noticed was that the Intel Boot Agent in the BIOS reports the following:

Initializing Intel (R) Boot Agent GE v1.2.45
PXE-E05: The LAN adapter's configuration is corrupted or has not been initialized. The Boot Agent cannot continue.

Then the eth0 device would no longer work. I found a link (posted at the end of this comment) which describes some workarounds using FreeDOS and resetting the Intel Boot Agent with an Intel program called IBAUTIL. At this point I was able to use the NIC under Windows but not under Linux; YaST reported with a standard message that the card was corrupted and that the module was therefore not loaded.

I ran the procedure outlined using IBAUTIL and voila my linux ethernet worked again. However, upon booting up a day after it is back to being dead.

If this is indeed the same situation, this may be all we need to get info out of the card. Also, I may potentially have access to more of these machines that ARE going if that helps.

The windows OS will now not get an IP address either, which I assume isn't just about the address and rather about hardware failure. Event Viewer shows nothing as usual, where's the Windows DMESG!!!! Windows was working fine all day though.

I shall try this procedure again, but I expect I am now out of luck :(

If someone wants me to post some kind of image from a going one of these machines it might be possible, but I'll need to do it from an older version of Linux I expect :)

http://dance.richii.com/article238.html

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

I am now definitely in the same boat as everyone else: I don't even have lights on my NIC at the hardware level, and the driver has been automatically removed from Windows! The worst part is that wireless doesn't work on Linux in Beta 1 for me, so no network in Linux at all! Now where is that old Cisco wireless card......

Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

(In reply to comment #105 from Quentin Jackson)
> Guys, I have an HP 8510w which is experiencing some interesting behaviour. I
> believe I may have had a graphics corruption first, though I don't recall if
> the problems started directly afterward. I'm definitely running the e1000e
> driver, the machine has an NVIDIA Quadro FX570M (Mobile Version). The first
> thing I noticed was the Intel Boot agent in the BIOS reports the following;
>
> Initializing Intel (R) Boot Agent GE v1.2.45
> PXE-E05: The LAN adapter's configuration is corrupted or has not been
> initialized. The Boot Agent cannot continue.

Quentin,

could you please post a lspci output from the affected machine? If you are experiencing the problem on a system that doesn't have intel graphics chip at all, you'd be the first one whatsoever, and this would really change the direction of our debugging efforts -- currently the main suspect is intel graphics driver in X.org, which apparently couldn't be blamed in such case.
In addition to that, could you please attach your /etc/X11/xorg.conf?

Thanks.

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

Created attachment 242732
LSPCI.txt

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

Created attachment 242733
Xorg.conf

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

Done :) I don't think the NIC is showing up in lspci at all from what I can see. I also noticed my FireWire connector (which shows up as a NIC in Windows) has an X through it; I really hope that's unrelated!

Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

(In reply to comment #110 from Quentin Jackson)
> Done

Thanks. So apparently, you are really the first one, to my knowledge, who reports the problem on ICH chipset, but with no Intel graphics chip at all. This really seems to rule out the xorg graphics driver issue in my eyes.

Could you please boot a "Kernel Of The Day" from

          ftp://ftp.suse.com/pub/projects/kernel/kotd/HEAD/

This kernel contains a load of fixes for the e1000e driver. Unfortunately it is not currently able to bring your network card back to life, but it will write an EEPROM contents dump to the 'dmesg' output even if the contents are corrupt. Could you please attach this output then?

This will allow us to verify whether you are really hitting the very same problem.

Thanks.

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Quentin, it is very important to get the NIC NVM image for this machine with ethtool -e. You could use an old SuSE 11.0 CD for this; the rescue system is enough. You can mount a USB stick and save the ethtool -e eth0 output on it.
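
A dump saved this way can be compared offline against one from a working, identical machine. The sketch below is only an illustration: it assumes the usual `ethtool -e` text layout (an offset column followed by hex bytes), and the names `parse_ethtool_dump` and `diff_dumps` are made up for this example.

```python
import re

# Matches one data row of an `ethtool -e` text dump, for example:
#   0x0010:    ff ff 00 10 ...
# The colon after the offset is treated as optional (assumption).
ROW = re.compile(r"^\s*0x([0-9a-fA-F]+):?\s+((?:[0-9a-fA-F]{2}\s*)+)$")


def parse_ethtool_dump(text):
    """Return the NVM contents of a text dump as a bytes object."""
    data = bytearray()
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip the header, the dashed rule and blank lines
        offset = int(m.group(1), 16)
        if offset != len(data):
            raise ValueError("unexpected offset 0x%04x" % offset)
        data.extend(int(b, 16) for b in m.group(2).split())
    return bytes(data)


def diff_dumps(good, bad):
    """List (offset, good_byte, bad_byte) for every differing position."""
    return [(i, g, b) for i, (g, b) in enumerate(zip(good, bad)) if g != b]
```

Saving the text with something like `ethtool -e eth0 > /mnt/usb/nvm.txt` (the path is hypothetical) and feeding two such files through `parse_ethtool_dump` and `diff_dumps` would show which offsets changed.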

Revision history for this message
Scruffynerf (scruffynerf) wrote :

FWIW, I apologise to all and retract my inane statement earlier in the thread at:
 https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/263555/comments/11

I'll plead a very bad month.

Revision history for this message
vjohn (vnibs119) wrote :

Hi! I have a DESKTOP with motherboard Intel DG965WH with a onboard ethernet card 82566DC: 00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 02)

Unfortunately, I installed the Ubuntu kernel 2.6.27-3-generic on my Debian Etch and the onboard network stopped working...

It appears in the lspci output (as you can see above), but when the kernel module loads nothing happens and I can't activate the interface...

I will try updating the BIOS to solve this problem and will post the results as soon as I can, but I am very angry that something like this happened. So, hasn't Intel opened the code for its hardware? So how did this happen?

Thanks
Vinicius

Revision history for this message
In , S-puch (s-puch) wrote :

Hi guys, so far I'm not affected by this bug, although (according to Jesse Brandeburg) I would be a very hot candidate.
This bug does not seem to be specific to SuSE Linux (I mostly use Mandriva), but since SUSE Labs seemed to me very active on LKML in getting this problem fixed, I subscribed to this bug tracking system too.

I would like to offer my help if desired, because I've got a Lenovo T61 as in comment #102 and a graphics adapter from NVIDIA (Quadro 140M), which should use the same driver as the HP 8510w from comment #107.

I've got a backup of my working NIC NVM, so if it would help I could post it here. As I need my laptop for daily business work, I can only do further testing if there is a valid method to get a broken NIC back to work. I know that some people at Intel are working on a tool for that, but I don't know if it has been released yet.

Revision history for this message
[_SHIN_] (cruiser-infinito) wrote :

I have a desktop too, MBoard Intel DP35DP with:

00:19.0 Ethernet controller: Intel Corporation 82566DC-2 Gigabit Network Connection (rev 02)

After installing Intrepid 8.10, the version without the blacklisted drivers, I saw no network. The device was unlisted.
On the same machine I have Ubuntu Gutsy (latest and updated), and after the installation of Ibex there are no connectivity problems in Gutsy.
In case you are interested: on the same machine (again) I have a third OS, Windows XP. There I had to reinstall the network card as if it were a new one. After that, no problems were detected. Maybe that OS recognizes the hardware by an ID (or something like it) stored on the motherboard and detected a change. That would mean something changed on the hardware.

I've decided to leave the Ibex OS unused until the patch or the final release. It's the first time I've installed a beta version; now I know I must be careful.

If you want more detail just ask.
Bye.

Revision history for this message
vjohn (vnibs119) wrote :

I did a BIOS update from Intel, but I still can't activate the Ethernet.
When the module (e1000e) loads, it shows this message: "The NVM Checksum Is Not Valid"
Any help?
Thanks
Vinicius
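
Some context on that message, offered as a hedged sketch rather than the driver's actual code: in the e1000-family drivers the checksum test, as far as can be told from the kernel sources, sums the first 64 little-endian 16-bit words of the NVM (word 0x3F being the checksum word) and expects 0xBABA. The constants below reflect that reading and the function name is made up for this example.

```python
NVM_SUM = 0xBABA          # expected 16-bit sum (assumption from the driver sources)
NVM_CHECKSUM_WORD = 0x3F  # last word covered by the sum (assumption)


def nvm_checksum_ok(nvm):
    """Return True if the first 64 little-endian words sum to NVM_SUM."""
    needed = (NVM_CHECKSUM_WORD + 1) * 2
    if len(nvm) < needed:
        raise ValueError("need at least %d bytes of NVM data" % needed)
    total = 0
    for i in range(0, needed, 2):
        total += nvm[i] | (nvm[i + 1] << 8)  # little-endian 16-bit word
    return (total & 0xFFFF) == NVM_SUM
```

Under those assumptions, a single flipped byte anywhere in the first 128 bytes is enough to make the check fail, which would fit a card that still works under another OS but is refused by the driver.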

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

OK, I'll have to do the kernel of the day tonight when I'm at home, but I should be able to use the ethtool dump today, hunting down a laptop now :)

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

Created attachment 242921
ethtool dump from HP 8510w

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

Please advise, if this suffices. Sounds like you've been looking for it for a while. Theoretically I have one of these machines to play with whenever needed, both dead and not dead.

Revision history for this message
Anil (anil-omkar) wrote :

when will we get a kernel with this fix ?

Revision history for this message
Michael Chang (thenewme91) wrote :

The current "fix" is to disable the driver to prevent damage to your hardware.

Various developers are still working on finding the actual cause of the problem, please be patient. You are recommended to use the latest stable LTS (8.04) for anything mission-critical.

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

Created attachment 242983
DMESG output after latest kernel of the day

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

The kernel upgrade complained that I was upgrading over a newer version; I forced it, as it was dated October, but thought I should mention it in case anything doesn't come through correctly. After the kernel was loaded and the machine rebooted, one of the network card lights now comes on; I don't think it was doing this in Windows, and definitely not in Linux. Let me know if there is anything else I can provide, and let me know if this is this bug or if I need to log it somewhere else! :)

Revision history for this message
Nicorac (nicorac) wrote :

This seems to give some hope, even on de-bricking hardware...

http://lkml.org/lkml/2008/10/1/368

Is it?

Revision history for this message
In , Mmeeks-i (mmeeks-i) wrote :

I can volunteer too - I have a T60p with a:
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
and (amusingly) the socket is physically broken (by myself), so I seldom to never use it.

Revision history for this message
Lorenzo Zolfanelli (lorenzo-zolfa) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

On Wed, Oct 1, 2008 at 7:46 PM, vjohn <email address hidden> wrote:

> So, the Intel haven't opened the code of your hardware? So, how it's
> happen?
>

It happened because it isn't a stable kernel release, and because Linux is
developed by humans, who can make mistakes. And I think the Linux
testing team was too small to catch this bug.

Revision history for this message
Ralf Nieuwenhuijsen (ralf-nieuwenhuijsen) wrote :

And because, after they learned about the problem, they failed to
communicate effectively and never removed the CD image.

This has never happened before, and obviously there was no policy for this
situation. Which automatically means that some developers turn to 'it's
your fault' mode and are less supportive of the community members who have
bricked their hardware. It turned into a blame game.

There were more "it's an alpha .. you knew what you were getting into"
expressions than any compassion. Worse, apparently it's easier to type "it's
an alpha .. of course it's going to break your hardware .. and you test at
your own risk" .. 10 times .. than to put a warning up. At least: it was
said at least 10 times before the warning was up.

That speaks volumes about priority. And the image is still available and
linked to from many places that do not contain the warning.

The mistake was just an oopsie .. the aftermath, how it was being dealt
with, (and still is) .. a PR horror.

2008/10/2 Lorenzo Zolfanelli <email address hidden>

> On Wed, Oct 1, 2008 at 7:46 PM, vjohn <email address hidden> wrote:
>
>
> > So, the Intel haven't opened the code of your hardware? So, how it's
> > happen?
> >
>
> It's happen because it isn't a stable kernel release, and because linux is
> developed by humans, and they might make mistakes. And I think the linux
> testing team was to small to reveal this bug.

Apologetic.


Revision history for this message
Vincenzo Ciancia (vincenzo-ml) wrote :

Indeed the alpha release should have been replaced in a hurry by a new release with the module blacklisted. What I see from "outside" is that the bureaucratic need to release images only at established dates is preventing a well-known practice (releasing a fixed ISO in a hurry) from happening.

Revision history for this message
Leslie Viljoen (leslieviljoen) wrote :
Revision history for this message
sam tygier (samtygier) wrote :

Some people might have been around long enough to remember the Xorg-breaking Dapper update in 2006. There are quite a few similarities in what happened and in people's responses.

After the incident there was a report, and changes were made to make sure it did not happen again [0,1]. Once this has settled down I am sure there will be a report with recommendations.

More angry messages here are not needed. Take discussions to the forum or the devel-discuss mailing list.

[0] http://www.markshuttleworth.com/archives/54
[1] http://err.no/personal/blog/tech/Ubuntu/2006-08-24-11-36_broken_X_in_Ubuntu.html

Changed in linux:
status: Incomplete → In Progress
Revision history for this message
vjohn (vnibs119) wrote :

Hi people, sorry for the angry message, but I really need my onboard network working in GNU/Linux... Yesterday I installed m$ windows on my machine to test the onboard network card (it's good for something! hehe) and it works fine in Windows... only the Linux box gives "The NVM Checksum Is Not Valid"... So the network card still works... Of course GNU/Linux is developed by humans, but the kernel team needs to do more testing on new releases, because it has a great responsibility for the thousands of machines running this great operating system!
Thanks a lot, and I will wait for the solution when it appears...
Vinicius

Revision history for this message
Amon_Re (ochal) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

Hi,

Quoting vjohn <email address hidden>:

> Hi people, sorry by angry message, but I'm realy needing my onboard
> network working in gnu/linux... Yesterday I have installed m$
> windows in my machine to test the onboard network card (for
> something it's work! hehe)

You definitely should *NOT* be running Alpha or Beta software on a
work-critical machine; the workaround is to install an older kernel,
see the big warning topic on the forums.
(http://ubuntuforums.org/showpost.php?p=5882185&postcount=38)

> and it work fine in windows... only in linux box have the "The NVM
> Checksum Is Not Valid"... So the network card remain working... Of
> course the gnu/linux is develloped by humans, but the kernel team
> need do more testing in new releases,

What do you think the point is of Beta & Alpha releases? To test the
code on a larger ecosystem containing a lot more diverse hardware. The
whole discussion is pointless anyway; people clearly don't read
warnings or even solutions when they're staring them in the face.

> because have a great responsability in thousands machines runing
> this great operating system!
> Thanks a lot and I will wait the solution, when it appear...
> Vinicius

If my reply sounds a bit bitter, it's probably because I am; there's
been a lot of whining & wailing about how they were supposed to do this
& that & pull ISOs etc etc etc etc ad infinitum, blah

Revision history for this message
Richard Kleeman (kleeman) wrote :

This bug is in an alpha release which is using a release candidate version of the kernel so I think your comments are too tough. Why don't you use hardy as suggested in many places in Ubuntu documentation?

My only concern about the present bug is that it discourages testers of alpha software who may not wish to see their hardware compromised. This is not good for the eventual stability of the final release.

Revision history for this message
Thomas McKay (tom-mckay1) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

What makes an alpha or beta tester's hardware less valuable than the
hardware the final release is installed on? In many cases it is the same
hardware.

While I agree many install alpha or beta releases when they shouldn't, a
large number of people are actually working to do testing, and their
complaints are valid. If I or anybody sounds like they are "whining &
wailing" it is because their hardware is valuable, and they are doing a
service to you and everybody else.

--
Tom McKay

Revision history for this message
Michael W. (hotdog003-gmail) wrote :

Hey guys,

Let's use this bug tracker to get the problem fixed. Can we please keep the discussion of whether or not to pull the CD image to the forums or mailing lists? This is a place to talk about the bug *itself* and how to get the bug *itself* fixed, not to discuss the implications of this bug.

I bet we can get this dilemma fixed MUCH faster if we stop pointing fingers, stop bickering about what to do with the CD image, and start hacking.

Why do I say this? Nicorac posted a link to the Linux Kernel Mailing list. I'll repost it here so you don't have to scroll up:
http://lkml.org/lkml/2008/10/1/368
From the page: "Currently we (Intel Ethernet) are reproducing the issue on
multiple machines in house, we are working on the issue with the
other core Linux teams here at Intel and within the community. No
resolution yet but we are much closer now.

Later we will post patches to help users who have had this
problem restore their eeprom from either a saved image from
ethtool -e or from another identical system."

This means that your hardware is NOT permanently bricked. When someone writes the Lazarus program to resurrect your NIC from the dead, everything will be OK and your computer will return to normal.

So, because your hardware is not permanently in danger, let's focus on getting the bug fixed now before 8.10 ships and clean up the aftermath later.

Revision history for this message
Martin Capitanio (capnm) wrote :

A small Button with the phrase Don't Panic on it ...

http://lkml.org/lkml/2008/10/1/368

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4a7703582836f55a1cbad0e2c1c6ebbee3f9b3a7

This patch is meant to prevent all future corruptions of the
e1000e NVM (non volatile memory) after the driver is loaded. The
registers stay locked until the machine is power cycled.

----

> Does this impose any user-visible behavior change? (such as not being
> able to set up wake-on-lan, change MAC address, whatever).

no, because none of that is stored permanently in the eeprom unless you
do writes with ethtool -E. Our policy for the driver is generally don't
ever write to the eeprom. So all the normal paths (except for initial
start on preproduction hardware and ethtool -E writes) do not write to
the eeprom.

Currently the driver will let you try to commit a change but with this
patch it will never get written to NVM unless you reboot, load driver
(the first time!) with WriteProtectNVM=0 and *then* do ethtool -E.

Revision history for this message
moschops (simon-waddington) wrote :

So can we expect an Ubuntu Alpha 7 with this Intel-provided EEPROM
protection patch and the removal of the e1000e blacklisting? This will
allow me to continue testing Intrepid on a machine with e1000e (and it
needs at least Alpha 6 anyway because it has GMA 4500 graphics). Or
is the hope that beta will wait for this critical problem to be fixed?

Revision history for this message
lod (altoas) wrote :

I'm using an Intel-based HP dc5800 desktop with an Ethernet card that uses the bad driver. I migrated to Intrepid after alpha 2 was released and had no problems until the e1000e driver was blacklisted (I noticed there was no eth0 :) ). Now I'm using external LAN, and my dual-boot Vista is working great with the internal card. No corruption for me.
If someone wants to be safe, use Hardy, Gutsy, Feisty or Debian..
"Here be dragons", isn't that right?

Revision history for this message
Michael Chang (thenewme91) wrote :

Well, those patches were released in the last two or three days -- basically, we have to wait for upstream kernel devs who are capable/willing to reproduce the issue to test the patches, and then consider how to get that code into the Ubuntu kernel. (Directly applying patches? Go through Debian first? etc. etc.)

Another issue is that the kernel freeze is on October 16th[1]. I don't think it would be nice to ship with e1000e blacklisted, so if the resolution isn't tested/approved by then, a release manager will need to grant an exemption to push the release date back into November.

[1] https://wiki.ubuntu.com/IntrepidReleaseSchedule

Revision history for this message
sam tygier (samtygier) wrote :

The patch is already committed to the Ubuntu git branch of the kernel. All Ubuntu packages have been frozen for a few days for the beta release. I don't think there is any chance of putting a new kernel in the beta at this stage. After the beta is out there will most likely be a kernel update with the fix/workaround. I would expect this in the next few days, assuming nobody finds any problems with it.

There should also be a restore tool quite soon, for anyone who has had their card corrupted. I am not sure if any Ubuntu testers were actually affected.

Revision history for this message
ddumont (ddumont) wrote :

So will the nic drivers be blacklisted in the beta? And hopefully updated later with a patch?

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

On Thu, Oct 02, 2008 at 10:53:21AM -0000, Vincenzo Ciancia wrote:
> Indeed the alpha release should have been replaced in a hurry by a new
> release with the module blacklisted. What I see from "outside" is that
> the burocratic need to release images only at established dates is
> preventing a well known practice - release a fixed iso in a hurry - to
> happen.

This is not "bureaucratic". A working milestone image can't be produced at
random, it takes approximately a week to prepare and validate a set of
images and we can only muster one of these once every two weeks (at best).
Doing a new alpha release right would have meant deferring the beta; doing
it wrong is no better than just deleting the images, since there's no
guarantee they'll work.

This does not mean that the Ubuntu developers don't care about the integrity
of testers' hardware (and I would appreciate it if the various apologists,
who are not Ubuntu developers, would stop claiming that users should expect
the possibility of hardware damage). However, the alpha milestones are
targeted at testers who are part of the development community, they're not
intended for general consumption - people using these images should be
subscribed to ubuntu-devel-announce, where notice of the problem has been
posted, so making the alpha images unavailable for download is really not
warranted and would only hinder getting necessary feedback from users who
don't have the affected hardware.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.27-4.7

---------------
linux (2.6.27-4.7) intrepid; urgency=low

  [ Ben Collins ]

  * build/abi: Add gfs1 to perm blacklist
  * build/abi: Ignored changes in gfs2 symbols

  [ Fabio M. Di Nitto ]

  * Revert "SAUCE: Export gfs2 symbols required for gfs1 kernel module"
  * ubuntu: update GFS Cluster File System

  [ Stefan Bader ]

  * SAUCE: x86: Reserve FIRST_DEVICE_VECTOR in used_vectors bitmap.
    - LP: #276334

  [ Tim Gardner ]

  * Revert "Disable e1000e until the NVRAM corruption problem is found."
  * Add atl1e and atl2 to Debian installer bits
    - LP: #273904
  * SAUCE: e1000e: Map NV RAM dynamically only when needed.
    - LP: #263555

 -- Tim Gardner <email address hidden> Fri, 26 Sep 2008 20:51:22 -0600

Changed in linux:
status: Fix Committed → Fix Released
Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

Seems to have gone quiet around here :)

Can someone please explain to me what path they expect this bug to take? I am sitting with an unusable system and am wondering whether to go back to OpenSuSE 10.3 as at least I can have working wireless in that version. Unless I can get some direction I see no point in leaving Beta1 on my system as I cannot continue with bug fixing with no network access.

Revision history for this message
In , Okir (okir) wrote :

Currently, we're busy testing the patches we've put into beta2. These
are mostly patches from Intel, also posted upstream on LKML.

On beta1, we're able to reproduce the issue pretty reliably by simply booting
into runlevel 3 and shutting down the machine 1 minute later. The problem will
usually show up within 3-20 reboots. With beta2, we have so far run 350
reboots or more without hitting the problem.

We're currently still discussing with Intel and LKML what the cause of the
problem may be. We're chasing a number of leads, but it seems at least
one of the patches we have so far is effective in stopping the corruption
from happening.
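
As a rough sanity check on those reboot counts, here is a minimal sketch. It assumes each beta1 reboot independently triggers the corruption with a fixed probability; the 1-in-20 rate is an assumption taken from the pessimistic end of the 3-20 reboot range above, not a measured value, and `chance_of_clean_run` is a hypothetical helper name.

```python
def chance_of_clean_run(per_boot_failure: float, reboots: int) -> float:
    """Probability of seeing no corruption in `reboots` independent boots."""
    if not 0.0 <= per_boot_failure <= 1.0:
        raise ValueError("per_boot_failure must be between 0 and 1")
    return (1.0 - per_boot_failure) ** reboots


# Assumed rate (not measured): one corruption per 20 reboots.
ASSUMED_RATE = 1 / 20

p_clean = chance_of_clean_run(ASSUMED_RATE, 350)
print(f"chance of 350 clean reboots by luck alone: {p_clean:.2e}")
```

Under that assumption, 350 clean reboots would be extremely unlikely to happen by chance, which is consistent with at least one of the patches being effective. It says nothing about which patch, or why.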

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

That's a good update, thanks. More specifically, is someone able to advise:

a) is it possible eventually for this hardware to be repaired via some kind of software programming?

b) If so are we awaiting Intel or can this be done by my providing the ethtool dump above or something more specific?

c) If so, can we presume that a fix will be available within 2-4 weeks?

If not, then it would make sense to get my hardware repaired, and no doubt others will be interested in ETAs on this too.

Thanks.

Revision history for this message
Daniel Kutik (danielkutik) wrote :

How can I install this fix manually (not over the internet)?
At the moment I have no connection other than the wired one. Is it possible to update the system via a USB stick?

Revision history for this message
Shwan (shwan-ciyako) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

Please ask in the forum, not here.


Revision history for this message
Fred (eldmannen+launchpad) wrote :

It is very creepy that Linux can permanently damage hardware!
How can this be? Why?
How can it be prevented?
Does it really need to do these risky operations?

Similar things must never happen in the future! Completely unacceptable!
If Linux permanently damages people's hardware, then nobody will dare to try Linux, and how can I recommend it to friends?

Revision history for this message
Mathieu Marquer (slasher-fun) wrote :

Fred, please comment only if you can provide relevant information about this bug, and use the forums if you just want to complain.

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

Hello Quentin, to answer your questions:
a) Yes, I'm working on a GPL tool for that.
b) You need an ethtool dump, ideally from the machine itself or otherwise from
   a similar machine (in which case you need to give the MAC address to the tool).
   To see whether another machine has the same device, you need the PCI IDs from
   the machine before the overwrite happened. In most cases the NVM overrides
   the device's default IDs; if the NVM is corrupted, the device falls back to
   the generic IDs.
c) I hope to have a verified working version early next week.
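
To compare PCI IDs between two machines, the vendor:device pair can be read from `lspci -nn` output. A minimal sketch, assuming the usual `[vvvv:dddd]` bracket format that `lspci -nn` prints at the end of the device description; `pci_ids` is a hypothetical helper name, and the sample line is the 82567LM entry quoted elsewhere in this report.

```python
import re
from typing import Optional, Tuple

# Matches a "[vendor:device]" pair of 4-digit hex IDs, as printed by `lspci -nn`.
_PCI_ID = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def pci_ids(lspci_nn_line: str) -> Optional[Tuple[str, str]]:
    """Return (vendor, device) from one `lspci -nn` line, or None if absent."""
    matches = _PCI_ID.findall(lspci_nn_line)
    if not matches:
        return None
    # The class code such as "[0200]" has no colon, so it never matches;
    # take the last pair in case the line carries more than one.
    vendor, device = matches[-1]
    return vendor.lower(), device.lower()


sample = ("00:19.0 Ethernet controller [0200]: Intel Corporation 82567LM "
          "Gigabit Network Connection [8086:10f5] (rev 03)")
print(pci_ids(sample))
```

If the IDs recorded before the corruption differ from the ones the device reports now, that would be consistent with the fallback to generic IDs described in b).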

Revision history for this message
Michael Losonsky (michl) wrote :

The bug is not fixed yet, except by disabling the driver.
I just upgraded from 8.04 to 8.10 beta using update
manager, and the card on an HP dc7800 was
disabled.

One problem with this is that once upgraded, you
have no connection to the internet to get fixes.
I had to do a fresh reinstall of 8.04. I understand this
is a beta version, but it seems a beta release
should be further along.

Changed in linux:
status: Fix Released → In Progress
Changed in linux:
status: Confirmed → Fix Released
Revision history for this message
Brian Curtis (bcurtiswx) wrote :

Michael,
I don't believe the new package has been made available for download in the repositories yet (I just checked). Once the new package is there, I would test the upgrade again; for now you can use an older kernel and it should work. I am not going to change the status back to Fix Released in case I am mistaken, but I don't think the fix has been made available yet.

Revision history for this message
Benjamin Prosnitz (aetherane) wrote :

Is there a way for me to unblacklist this until the update? The card worked fine until I updated to the kernel with the blacklist. - Or is this really a danger even if it works?

Revision history for this message
Anil (anil-omkar) wrote :

Everyone here needs one answer: if we run 'apt', when are we going to get the 'fix'?

Revision history for this message
vjrj (vjrj) wrote :

https://launchpad.net/ubuntu/+source/linux/2.6.27-4.7/+build/728333

Status: Failed to upload

(...)
2008-10-03 02:46:59 INFO Rejection during accept. Aborting partial accept.
2008-10-03 02:46:59 WARNING Upload was rejected:
2008-10-03 02:46:59 WARNING Unable to find source package linux/2.6.27-4.7 in intrepid
(...)

Revision history for this message
Colin Watson (cjwatson) wrote :

2.6.27-4.7 was rejected because 2.6.27-5.8 had already been uploaded. I just accepted the binaries for 2.6.27-5.8 into the archive, and they should be available within the hour; it will take a little bit longer for the 'linux' etc. metapackages to catch up with this.

Changed in linux:
status: In Progress → Fix Released
Revision history for this message
vjrj (vjrj) wrote :

Thanks Colin.

2.6.27-5.8 works for me:

00:19.0 Ethernet controller [0200]: Intel Corporation 82567LM Gigabit Network Connection [8086:10f5] (rev 03)

Revision history for this message
Tomasz Czapiewski (xeros) wrote :

Will you update/rebuild the beta CD/DVD images once the fix is released?

Sorry for the negative opinion, but it's ridiculous that many people who have such Ethernet cards can't use this beta because the network card is not functional, and can't even get the fixed packages by update without another Internet connection.
It should at least be mentioned as a warning in the release announcement.
In my opinion any release should be delayed because of such bugs.

I really like the Kubuntu and Ubuntu distributions, but their release cycles should be more flexible to make the product as good as it can be. For example, I can't even boot the shipped Ubuntu Hardy Heron (LTS) CDs on about 80% of the PCs around me on which Gutsy Gibbon worked without problems, and I still can't use my DVD+/-RW drive at home with the latest Hardy kernel updates - I need to use the Gutsy kernel.

And once more sorry for a little OT.

Keep up good work.

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

> Will You update/rebuild beta cd/dvd images once fix is released?

No. The fix will be included in the subsequent daily images, in the release
candidate image, and in the final release.

> It should at least be mentioned as a warning in the release announcement.

It was.

Revision history for this message
Benjamin Prosnitz (aetherane) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

You could also probably just put the kernel packages on a flash drive, and
install them manually.


Revision history for this message
In , Jesse Brandeburg (jesse-brandeburg) wrote :

It appears that the patch to use set_memory_ro/rw changes the timings enough in our test boxes that the problem no longer occurs.

We are not currently sure why this patch fixes it, but I wanted to share our findings.

We also have a patch (will attach here soon) to restore the eeprom from an ethtool -e dump, using a sysfs interface to the driver.
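
Before restoring from a dump, it may be worth checking that the dump itself is sane. A minimal sketch, assuming the e1000-family convention that the 16-bit little-endian words 0x00 through 0x3F of the NVM sum to 0xBABA. Both constants and the byte order are assumptions that should be verified against the driver sources for the specific part; `nvm_checksum_ok` is a hypothetical helper, and `image` stands for the raw bytes of a dump.

```python
NVM_CHECKSUM_WORDS = 0x40  # assumed: words 0x00..0x3F are covered
NVM_SUM = 0xBABA           # assumed: expected 16-bit sum of those words


def nvm_checksum_ok(image: bytes) -> bool:
    """Return True if the first 0x40 little-endian words sum to NVM_SUM."""
    needed = NVM_CHECKSUM_WORDS * 2
    if len(image) < needed:
        raise ValueError(f"need at least {needed} bytes, got {len(image)}")
    total = 0
    for i in range(NVM_CHECKSUM_WORDS):
        word = int.from_bytes(image[2 * i:2 * i + 2], "little")
        total = (total + word) & 0xFFFF  # wrap to 16 bits, as a u16 sum would
    return total == NVM_SUM
```

A dump that fails this check is probably not a good candidate for restoring onto another machine; a dump that passes is not thereby proven correct, since the check only covers the summed words.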

Revision history for this message
In , David (david-redhat-bugs) wrote :

*** Bug 465127 has been marked as a duplicate of this bug. ***

Revision history for this message
Christian Becker (c-becker-88) wrote :

No package here yet; when will it arrive? It's really amazing when you have no network ...

Revision history for this message
In , Boricua (boricua-redhat-bugs) wrote :

I was just hit by this bug after doing a preupgrade from F9 64-bit to F10 beta 64-bit. The system states "no network device available". I'm including the output I got after running dmesg and other commands (hope it helps):
[Francisco@localhost ~]$ su -
Password:
[root@localhost ~]# /sbin/ifconfig
lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:124 errors:0 dropped:0 overruns:0 frame:0
          TX packets:124 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:10080 (9.8 KiB) TX bytes:10080 (9.8 KiB)

[root@localhost ~]# dmesg | grep eth
Driver 'sd' needs updating - please use bus_type methods
Driver 'sr' needs updating - please use bus_type methods
[root@localhost ~]# "dhclient eth0" //
-bash: dhclient eth0: command not found
[root@localhost ~]# dhclient eth0
Device "eth0" does not exist.
Cannot find device "eth0"
[root@localhost ~]# dhclient eth1
Device "eth1" does not exist.
Cannot find device "eth1"
[root@localhost ~]# lscpi -v|grep -i ethernet
-bash: lscpi: command not found
[root@localhost ~]# lspci -v|grep -i ethernet
00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 02)
[root@localhost ~]# ifconfig -a
lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:668 errors:0 dropped:0 overruns:0 frame:0
          TX packets:668 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:55232 (53.9 KiB) TX bytes:55232 (53.9 KiB)
[root@localhost ~]#

Revision history for this message
In , Boricua (boricua-redhat-bugs) wrote :

I was able to solve this by manual installation of the latest available kernel, 2.6.27-0.382.rc8.git4.fc10, along with the equivalent kernel-firmware. Worked immediately.

Revision history for this message
Michael Chang (thenewme91) wrote :

If you have no network whatsoever, you'll need to grab the package using another machine or OS, use a daily build CD image, or downgrade to a version where the driver was not disabled and then upgrade to the current one.

http://packages.ubuntu.com/intrepid/linux-image-2.6.27-5-generic
http://packages.ubuntu.com/intrepid/linux-image-2.6.27-5-server

(As Colin mentioned, it will take a while for linux-* packages to be updated to automatically bring in the new package.)

Revision history for this message
Simon Sigre (simon-sigre) wrote :

Since the kernel upgrade I still appear to be having some trouble with the network card. I'm using an X200 laptop that ships with an 82566DC-2 network card:

simonsigre@penfold:~$ uname -a
Linux penfold 2.6.27-5-generic #1 SMP Fri Oct 3 00:38:23 UTC 2008 i686 GNU/Linux

simonsigre@penfold:~$ sudo lshw
       *-network UNCLAIMED
             description: Ethernet controller
             product: 82566DC-2 Gigabit Network Connection
             vendor: Intel Corporation
             physical id: 19
             bus info: pci@0000:00:19.0
             version: 03
             width: 32 bits
             clock: 33MHz
             capabilities: pm msi cap_list
             configuration: latency=0
        *-usb:0

simonsigre@penfold:~$ sudo cat /var/log/dmesg | grep e1000
[ 2.567349] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k4
[ 2.567352] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[ 2.567394] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 2.567405] e1000e 0000:00:19.0: setting latency timer to 64
[ 2.636692] e1000e 0000:00:19.0: PCI INT A disabled
[ 2.636708] e1000e: probe of 0000:00:19.0 failed with error -5

I went straight from 2.6.27.4 to .5; could that be it?

Revision history for this message
Anil (anil-omkar) wrote :

Once you have the kernel running, do: modprobe e1000e


Revision history for this message
In , Jkosina-d (jkosina-d) wrote :

(In reply to comment #125 from Renato Yamane)
> Fixed?
> <http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4a7703582836f55a1cbad0e2c1c6ebbee3f9b3a7>
>

Yes, that's a workaround that prevents corruption of the EEPROM contents, but it doesn't fix the real problem; it just prevents bad things from happening when the bug triggers.

Changed in linux:
status: In Progress → Fix Released
Revision history for this message
Arve Bersvendsen (arve-bersvendsen) wrote :

@Anil: While that re-enables the driver for the session, it's not a permanent solution.

I just upgraded from 2.6.27.4 to 2.6.27.5, but /etc/modprobe.d/blacklist-e1000e still remains on disk, so you need to either comment out the line blacklisting the driver, or move/delete the file to get this "fixed".
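
To check whether a leftover file still blacklists the driver, something along these lines can help. A minimal sketch, assuming the standard modprobe.d syntax of one `blacklist <module>` directive per line with `#` starting a comment; `is_blacklisted` is a hypothetical helper that takes the file contents as text.

```python
def is_blacklisted(modprobe_conf_text: str, module: str = "e1000e") -> bool:
    """Return True if an uncommented 'blacklist <module>' line is present."""
    for raw in modprobe_conf_text.splitlines():
        # Drop anything after '#', so commented-out directives are ignored.
        line = raw.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) == 2 and parts[0] == "blacklist" and parts[1] == module:
            return True
    return False


print(is_blacklisted("blacklist e1000e\n"))    # active directive
print(is_blacklisted("# blacklist e1000e\n"))  # commented out
```

In practice the text would come from reading /etc/modprobe.d/blacklist-e1000e (or any other file under /etc/modprobe.d/) and passing its contents to the function.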

Revision history for this message
Brian Curtis (bcurtiswx) wrote :

I've finally received the -5 kernel and I still don't have an internet connection available. The only kernel I'm having any luck with is -3.

Revision history for this message
mrbean71 (m-marti) wrote :

Hi all, my NIC won't work with any 2.6.27 kernel. I'm now writing from:

Linux marcoPC 2.6.24-21-generic #1 SMP Mon Aug 25 17:32:09 UTC 2008 i686 GNU/Linux

This is the lspci result:

00:19.0 Ethernet controller: Intel Corporation 82562V-2 10/100 Network Connection (rev 02)

It seems the card is recognized by 2.6.27.5 but doesn't work; I can bring it up and down with ifconfig with no result.

It works perfectly after rebooting into 2.6.24-21. If needed, I'm willing to help investigate.

Revision history for this message
Brian Curtis (bcurtiswx) wrote :

Arve's fix was good. I had blacklisted the e1000e driver as a workaround and never removed the blacklist entry. With it removed from /etc/modprobe.d/blacklist, it works fine.

Revision history for this message
Benjamin Prosnitz (aetherane) wrote :

2.6.27-5 fixed my problems, just to confirm that this works for some people.


Revision history for this message
Noel J. Bergman (noeljb) wrote :

Just confirming that it works fine with my T61p 6457-7WU.

$ uname -r
2.6.27-5-generic

$ lsmod | grep 100
e1000e 128040 0

lshw also shows the device properly setup, and no longer unclaimed.

Revision history for this message
Khairul Aizat Kamarudzzaman (fenris) wrote :

Mine is a T61 6464 - AP3.

I upgraded to 2.6.27-5 this morning:

fenris@thinkbuntu:~$ uname -r
2.6.27-5-generic
fenris@thinkbuntu:~$ lsmod | grep 1000

It doesn't give me any result. Should I remove the blacklist manually, or will it be done automatically after upgrading?

Revision history for this message
Khairul Aizat Kamarudzzaman (fenris) wrote :

I've tried manually removing blacklist-e1000e, but it doesn't work for me at all. Any suggestions on what I should do next?

Revision history for this message
Khairul Aizat Kamarudzzaman (fenris) wrote :

Sorry, I missed the previous comment. I have now done it manually, and this is what I get from dmesg:

[ 25.596061] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k4
[ 25.596065] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[ 25.596142] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 25.596159] e1000e 0000:00:19.0: setting latency timer to 64
[ 25.658013] e1000e 0000:00:19.0: PCI INT A disabled
[ 25.673108] e1000e: probe of 0000:00:19.0 failed with error -5

Revision history for this message
mrbean71 (m-marti) wrote :

I don't know whether to open a new bug. After further investigation, I think e1000e still has some problems here.
The behaviour is the same with 27.3 and 27.5 (in the latter the driver is not blacklisted).
Sometimes the network works out of the box; sometimes I need to:

sudo modprobe -r e1000e
sudo modprobe e1000e

I'm now writing from the 27.5 kernel, and I had to remove and re-add the driver twice to make it work.

Revision history for this message
Benjamin Prosnitz (aetherane) wrote :

For those who got this working on the 2.6.27-5.4 kernel: does the newer
2.6.27-5.5 kernel that is in the repositories now also work for you?


Revision history for this message
William Marques (williamarques-gmail) wrote :

I confirm that the new binaries are working for me.
uname -r
2.6.27-5-generic

Thanks a lot for the great job!

Revision history for this message
desertoak (danielc-brikks) wrote :

"The fix will be included in the subsequent daily images"

Is the fix available now in the daily builds? http://cdimage.ubuntu.com/daily-live/current/
If not, when will it be? Or is there another link?

Revision history for this message
mrbean71 (m-marti) wrote :

The driver seems to be OK; eth0 comes up automagically.
Now I think there are problems with KNetworkManager: I wrote resolv.conf manually and I have to restart the network from a shell to make DNS work.
But that is probably another story.

Revision history for this message
Andrew Tamoney (tamoneya) wrote :

The fix was not in 20081004, but it should be in 20081005 and is definitely in 20081006. Therefore either of those ISOs should have the correct kernel.

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

> The fix was not in 20081004 but it should be in 20081005 and is
> definitely in 20081006. Therefore any of the ISO would have the correct
> kernel.

No, 20081007 will be the first daily image that includes this module again.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message
Juan Cuevas (jdcuevas) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

Hi,

I confirm that it works for me. I just updated the kernel to 2.6.27-5. At
first it seemed to do nothing, but I just did:

sudo ifconfig eth0 down
sudo ifconfig eth0 up

And that was all, my network started to be ok.

Bye,

Juan David Cuevas Guarnizo
Investigador - Grupo GASURE
Tel: +57 4 219 8548
Bloque 19 - Facultad de Ingeniería
Universidad de Antioquia
Medellín - Colombia

"La actividad social de la gente de la universidad debe ser total y
radicalmente ajena a toda actitud de conformismos con la injusticia social,
la desigualdad económica y la opresión intelectual". - Eduardo Umaña Luna.


Revision history for this message
Andrew Tamoney (tamoneya) wrote :

the updated kernel is in the 20081006 manifest:
linux-generic 2.6.27.5.5
linux-headers-2.6.27-5 2.6.27-5.8
linux-headers-2.6.27-5-generic 2.6.27-5.8
linux-headers-generic 2.6.27.5.5
linux-image-2.6.27-5-generic 2.6.27-5.8
linux-image-generic 2.6.27.5.5
linux-libc-dev 2.6.27-5.8
linux-restricted-modules-2.6.27-5-generic 2.6.27-5.7
linux-restricted-modules-common 2.6.27-5.7
linux-restricted-modules-generic 2.6.27.5.5
 That worked fine for me.

Revision history for this message
Khairul Aizat Kamarudzzaman (fenris) wrote :

Is there any fix or patch for the EEPROM/NVM problem? After upgrading to 2.6.27-5.8, the Ethernet still won't work.

$ dmesg | grep e1000e
[ 5.604448] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k4
[ 5.604456] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[ 5.604557] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 5.604586] e1000e 0000:00:19.0: setting latency timer to 64
[ 5.683292] e1000e 0000:00:19.0: PCI INT A disabled
[ 5.683335] e1000e: probe of 0000:00:19.0 failed with error -5

The Ethernet works well if I boot into Windows XP.

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

I'm hanging out to restore my ethernet card firmware. Any chance on getting that EEPROM restore application? Or if not public yet any chance of emailing it to quentin dot jackson at exclamation dot co dot nz? :)

Revision history for this message
In , Karsten-keil (karsten-keil) wrote :

The restore application works now; I successfully restored a broken ThinkPad X61s. I'm now preparing a mini ISO with the application and our rescue system, so you can boot from this CD and use the application in a sane environment.

Revision history for this message
Michael Fritscher (michael-fritscher) wrote :

Same for me:
under Linux, the e1000e driver refuses to load:

[ 388.961230] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k6
[ 388.961249] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[ 388.961346] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 388.961374] e1000e 0000:00:19.0: setting latency timer to 64
[ 389.013108] 0000:00:19.0: 0000:00:19.0: The NVM Checksum Is Not Valid
[ 389.023427] e1000e 0000:00:19.0: PCI INT A disabled
[ 389.023567] e1000e: probe of 0000:00:19.0 failed with error -5

(Khairul, you missed the most important line because of the grep!)

But it works fine under Windows.

uname -a:
Linux michis-ibm 2.6.27-6-generic #1 SMP Tue Oct 7 04:15:04 UTC 2008 i686 GNU/Linux
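
The reason the grep hides that line is that the NVM checksum message is prefixed only with the PCI address, not with the driver name. A minimal sketch of a filter that keeps both kinds of line; `driver_lines` is a hypothetical helper, and the default PCI address 0000:00:19.0 is simply the one shown in the logs above, so it is an assumption for any other machine.

```python
from typing import List


def driver_lines(dmesg_text: str,
                 driver: str = "e1000e",
                 pci_addr: str = "0000:00:19.0") -> List[str]:
    """Keep dmesg lines that mention the driver name OR the device's PCI address."""
    return [line for line in dmesg_text.splitlines()
            if driver in line or pci_addr in line]


sample_log = "\n".join([
    "[ 388.961249] e1000e: Copyright (c) 1999-2008 Intel Corporation.",
    "[ 389.013108] 0000:00:19.0: 0000:00:19.0: The NVM Checksum Is Not Valid",
    "[ 389.023567] e1000e: probe of 0000:00:19.0 failed with error -5",
    "[ 390.000000] usb 1-1: new high speed USB device",
])

for line in driver_lines(sample_log):
    print(line)
```

Filtering on the PCI address as well as the driver name keeps the checksum message in view, which is the line that distinguishes a corrupted NVM from other probe failures.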

Changed in linux:
status: Confirmed → In Progress
Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

Well, I have gotten hold of and applied the recovery tool. Unfortunately it does not work for the following reason:

The device does not list in lspci or lspci -n because it is dead, therefore I cannot find the new device ID because it doesn't have one. The tool relies on this information to work. Apparently there are other tools that will get around it via some kind of BIOS update direct from intel. Thought you would all like to know.

Revision history for this message
In , Quentin Jackson (quentin-jackson) wrote :

I should have said, this is the case on my device, apparently it is not the case for all devices, you will need to check if your device is listed in lspci or not.

Changed in linux:
status: In Progress → Fix Released
Revision history for this message
Simon Sigre (simon-sigre) wrote :

If it helps, I am also having this problem; I have included the output of lshw as well.
//
Linux penfold 2.6.27-6-generic #1 SMP Tue Oct 7 04:15:04 UTC 2008 i686 GNU/Linux

[ 2.319073] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k6
[ 2.319076] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[ 2.319116] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 2.319126] e1000e 0000:00:19.0: setting latency timer to 64
[ 2.377161] e1000e 0000:00:19.0: PCI INT A disabled
[ 2.377199] e1000e: probe of 0000:00:19.0 failed with error -5

        *-network UNCLAIMED
             product: 82566DC-2 Gigabit Network Connection
           *-network
  *-network DISABLED
\\

Revision history for this message
In , Michal (michal-redhat-bugs) wrote :

I have tried the newest rawhide kernel and it does not help.
I have also tried the attached drivers. That did not change anything: still no Ethernet. This time I did not touch ethtool or any Intel software.

Output of dmesg:

e1000e: Intel(R) PRO/1000 Network Driver - 0.4.1.7_nocsum-NAPI
e1000e: Copyright (c) 1999-2008 Intel Corporation.
ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:19.0 to 64
0000:00:19.0: : Failed to initialize MSI interrupts. Falling back to legacy interrupts.
0000:00:19.0: 0000:00:19.0: The NVM Checksum Is Not Valid
BUG: soft lockup - CPU#0 stuck for 61s! [modprobe:3703]
Modules linked in: e1000e(+) rfkill_input bridge bnep rfcomm l2cap vboxdrv ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi fuse sunrpc arc4 ecb crypto_blkcipher b43 ssb rfkill mac80211 cfg80211 input_polldev ipt_REJECT xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables cpufreq_ondemand acpi_cpufreq freq_table dm_mirror dm_log dm_multipath dm_mod ipv6 sr_mod cdrom pcspkr snd_hda_intel serio_raw joydev snd_seq_dummy sg snd_seq_oss snd_seq_midi_event i915 snd_seq ata_piix snd_seq_device pata_acpi snd_pcm_oss snd_mixer_oss video output ata_generic wmi battery ac drm hci_usb snd_pcm i2c_algo_bit i2c_core iTCO_wdt iTCO_vendor_support snd_timer snd_page_alloc bluetooth snd_hwdep snd soundcore ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: e1000e]
CPU 0:
Modules linked in: e1000e(+) rfkill_input bridge bnep rfcomm l2cap vboxdrv ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi fuse sunrpc arc4 ecb crypto_blkcipher b43 ssb rfkill mac80211 cfg80211 input_polldev ipt_REJECT xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables cpufreq_ondemand acpi_cpufreq freq_table dm_mirror dm_log dm_multipath dm_mod ipv6 sr_mod cdrom pcspkr snd_hda_intel serio_raw joydev snd_seq_dummy sg snd_seq_oss snd_seq_midi_event i915 snd_seq ata_piix snd_seq_device pata_acpi snd_pcm_oss snd_mixer_oss video output ata_generic wmi battery ac drm hci_usb snd_pcm i2c_algo_bit i2c_core iTCO_wdt iTCO_vendor_support snd_timer snd_page_alloc bluetooth snd_hwdep snd soundcore ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: e1000e]
Pid: 3703, comm: modprobe Not tainted 2.6.26.5-45.fc9.x86_64 #1
RIP: 0010:[<ffffffffa0649c24>] [<ffffffffa0649c24>] :e1000e:e1000_flash_cycle_ich8lan+0x34/0x60
RSP: 0018:ffff81003c0699d8 EFLAGS: 00000202
RAX: 000000000000e028 RBX: ffff81003c0699f8 RCX: 000000005351a052
RDX: 00000000000006e8 RSI: 00000000000001f4 RDI: 00000000000006c3
RBP: ...

Revision history for this message
In , Thomas (thomas-redhat-bugs) wrote :

As far as I know, the current fixes in the newest kernel only prevent this from happening to undamaged hardware. They don't repair hardware that is already damaged.

Some people from Intel and Novell were talking about developing a tool to repair it, if you have a backup of the original eeprom contents or access to an identical system. However, I don't know if that tool is already done or where you can get it from.

Revision history for this message
In , Michal (michal-redhat-bugs) wrote :

Well, I did not back up my EEPROM, but my laptop is a popular model, so I may be able to get someone else's EEPROM image to restore it. I'll just ask someone for an image.

The thing is, I had to disable loading of e1000e (I am using the drivers attached to this bug), as it constantly crashes with the message I pasted above, and I cannot boot my kernel unless I blacklist the e1000e module.
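
For reference, blacklisting is done with a `blacklist <module>` line in a file under /etc/modprobe.d/. The Python sketch below just renders such lines; the helper name is made up, and the exact file name to put the result in varies by distribution, so treat that part as an assumption. Note that a blacklist entry stops automatic loading of the module, not an explicit modprobe.

```python
def blacklist_lines(modules):
    """Render modprobe.d 'blacklist' directives, one per module name.

    Hypothetical helper: it only builds the text and writes nothing to disk.
    """
    return "".join(f"blacklist {name}\n" for name in modules)


# The result would be saved as root in a file under /etc/modprobe.d/
# (the file name is up to you or your distribution).
print(blacklist_lines(["e1000e"]), end="")
```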

I hope the guys will find a way to fix it soon.

Revision history for this message
Dan Cashman (dcashman) wrote : RE: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8and ICH9 gigE chipsets at risk

Bounces,

I am filling in for John S while he is on vacation.

What system is this referring to?

Invoice #?

Linux? What version of Linux?

Please advise.

Thanks,

Dan Cashman
7002 S. Revere Parkway
Ste #90
Centennial, CO 80112
720.488.9800
800.381.1083
F- 720.488.9885
<email address hidden>
   www.microsel.com

Revision history for this message
Craig (candrews-integralblue) wrote :

I have been active in that launchpad bug, but I'm not sure who John S is.

This information is about Ubuntu Linux, 8.10 (which is in testing, and has
not been released). This bug has since been resolved.

~Craig

Changed in linux:
status: Confirmed → Fix Released
Revision history for this message
Michael Fritscher (michael-fritscher) wrote :

I fixed the network card in my x61t with the Vista drivers from intel.com ( http://downloadcenter.intel.com/Filter_Results.aspx?strOSs=All&strTypes=All&ProductID=2775&lang=eng&OSFullName=All%20Operating%20Systems ). I just installed them and played with the diagnostic tools in the driver.

Both PXE and normal networking under Linux are working again :-)
A BIOS upgrade alone did not help (actually, it was a downgrade ^^)

Please mention this solution in the error message about the wrong CRC; it would help many people.
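
Some background on that message, as a hedged sketch: to my understanding, e1000-family parts validate the NVM by summing its first 64 16-bit words (0x00 through 0x3F) and expecting the low 16 bits of the sum to be 0xBABA, with word 0x3F holding whatever value makes that true. The Python below models only that arithmetic on a list of integers. The function names are made up, and nothing here reads or writes real hardware.

```python
NVM_SUM = 0xBABA       # expected 16-bit sum (assumed e1000-family convention)
CHECKSUM_WORD = 0x3F   # index of the word that stores the checksum


def nvm_checksum_valid(words):
    """True if words 0x00..0x3F sum to NVM_SUM modulo 2**16."""
    return (sum(words[:CHECKSUM_WORD + 1]) & 0xFFFF) == NVM_SUM


def with_fixed_checksum(words):
    """Return a copy whose checksum word makes the sum come out right."""
    fixed = list(words)
    partial = sum(fixed[:CHECKSUM_WORD]) & 0xFFFF
    fixed[CHECKSUM_WORD] = (NVM_SUM - partial) & 0xFFFF
    return fixed


# Made-up image: 64 words of arbitrary data.
image = with_fixed_checksum([0x1234] * 64)
corrupted = list(image)
corrupted[0] ^= 0x0001  # flip a single bit, as stray writes might

print(nvm_checksum_valid(image), nvm_checksum_valid(corrupted))
```

This illustrates why even a one-bit corruption is enough for the driver to refuse the device with a checksum error.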

Thanks,
Michael Fritscher

Revision history for this message
Simon Sigre (simon-sigre) wrote :

Michael, as I'm having the same issue, I might try the same. It is quite disappointing that we have to rely on Windows to bail us out. I've even tried compiling the Intel drivers under Linux, with the same error. Perhaps a live Windows distro like BartPE? I don't want to have to put the Vista disks in and blow away my Ubuntu install.

Revision history for this message
Michael Fritscher (michael-fritscher) wrote :

Try to get a Vista DVD from somebody. You can actually boot it as a sort of live system; perhaps you can install the drivers in it (the installation does not need a restart).

Revision history for this message
zika (4zika4) wrote :

Hello,

A week ago I upgraded Hardy to the Intrepid beta on 3 machines. The oldest one is still running Intrepid, but the two newer ones had to be downgraded to Hardy since the network did not work.

Is it safe now to upgrade them to the Intrepid beta, or should I wait for the official release?

Revision history for this message
In , John Ronciak (john-ronciak) wrote :

Here is a patch which we at Intel LAD have been testing today. This looks to be a work-around, with the .28 changes being a fix for the root cause of the problem. The problem was with ftrace, which is what we bisected to last week. On systems that failed within minutes, we have not been able to make it happen once ftrace was disabled. So I think the .28 ftrace needs to get included into SLES11.

>---------- Forwarded message ----------
>From: Steven Rostedt <email address hidden>
>Date: Wed, Oct 15, 2008 at 3:21 PM
>Subject: [PATCH -stable] disable CONFIG_DYNAMIC_FTRACE due to possible
>memory corruption on module unload
>To: LKML <email address hidden>, <email address hidden>
>Cc: Linus Torvalds <email address hidden>, Andrew Morton
><email address hidden>, Arjan van de Ven <email address hidden>,
><email address hidden>, <email address hidden>, Thomas Gleixner
><email address hidden>, Ingo Molnar <email address hidden>
>
>
>
>While debugging the e1000e corruption bug with Intel, we discovered
>today that the dynamic ftrace code in mainline is the likely source of
>this bug.
>
>For the stable kernel we are providing the only viable fix
>patch: labeling
>CONFIG_DYNAMIC_FTRACE as broken. (see the patch below)
>
>We will follow up with a backport patch that contains the
>fixes. But since
>the fixes are not a one liner, the safest approach for now is to
>disable the code in question.
>
>The cause of the bug is due to the way the current code in mainline
>handles dynamic ftrace. When dynamic ftrace is turned on, it also
>turns on CONFIG_FTRACE which enables the -pg config in gcc that places
>a call to mcount at every function call. With just CONFIG_FTRACE this
>causes a noticeable overhead. CONFIG_DYNAMIC_FTRACE works to ease this
>overhead by dynamically updating the mcount call sites into nops.
>
>The problem arises when we trace functions and modules are unloaded.
>The first time a function is called, it will call mcount and the mcount
>call will call ftrace_record_ip. This records the calling site and
>stores it in a preallocated hash table. Later on a daemon will
>wake up and call kstop_machine and convert any mcount callers into
>nops.
>
>The evolution of this code first tried to do this without the
>kstop_machine
>and used cmpxchg to update the callers as they were called. But I
>was informed that this is dangerous to do on SMP machines if another
>CPU is running that same code. The solution was to do this with
>kstop_machine.
>
>We still used cmpxchg to test if the code that we are modifying is
>indeed code that we expect to be before updating it - as a final
>line of defense.
>
>But on 32bit machines, ioremapped memory and modules share the same
>address space. When a module would load its code into memory
>and execute
>some code, that would register the function.
>
>On module unload, ftrace incorrectly did not zap these functions from
>its hash (this was the bug). The cmpxchg could have saved us in most
>cases (via luck) - but with ioremap-ed memory that was exactly
>the wrong
>thing to do - the results of cmpxchg on device memory are undefined.
>(and will likely result in a write)
>
>The pending .28 ftrace tree does not have this bug a...

Revision history for this message
In , Gregkh-n (gregkh-n) wrote :

This patch is now included in our SLE11 kernel, as it is in 2.6.27.1, which is the base of our kernel tree.

So, I guess we can close this out now, thanks for all of the work everyone!

Changed in linux:
status: In Progress → Fix Released
Revision history for this message
In , John (john-redhat-bugs) wrote :

It looks like the root cause of this problem has been found. Included here is the work-around for it as well as the reference to the 2.6.28-rc fix for the problem.

>---------- Forwarded message ----------
>From: Steven Rostedt <email address hidden>
>Date: Wed, Oct 15, 2008 at 3:21 PM
>Subject: [PATCH -stable] disable CONFIG_DYNAMIC_FTRACE due to possible
>memory corruption on module unload
>To: LKML <email address hidden>, <email address hidden>
>Cc: Linus Torvalds <email address hidden>, Andrew Morton
><email address hidden>, Arjan van de Ven <email address hidden>,
><email address hidden>, <email address hidden>, Thomas Gleixner
><email address hidden>, Ingo Molnar <email address hidden>
>
>
>
>While debugging the e1000e corruption bug with Intel, we discovered
>today that the dynamic ftrace code in mainline is the likely source of
>this bug.
>
>For the stable kernel we are providing the only viable fix
>patch: labeling
>CONFIG_DYNAMIC_FTRACE as broken. (see the patch below)
>
>We will follow up with a backport patch that contains the
>fixes. But since
>the fixes are not a one liner, the safest approach for now is to
>disable the code in question.
>
>The cause of the bug is due to the way the current code in mainline
>handles dynamic ftrace. When dynamic ftrace is turned on, it also
>turns on CONFIG_FTRACE which enables the -pg config in gcc that places
>a call to mcount at every function call. With just CONFIG_FTRACE this
>causes a noticeable overhead. CONFIG_DYNAMIC_FTRACE works to ease this
>overhead by dynamically updating the mcount call sites into nops.
>
>The problem arises when we trace functions and modules are unloaded.
>The first time a function is called, it will call mcount and the mcount
>call will call ftrace_record_ip. This records the calling site and
>stores it in a preallocated hash table. Later on a daemon will
>wake up and call kstop_machine and convert any mcount callers into
>nops.
>
>The evolution of this code first tried to do this without the
>kstop_machine
>and used cmpxchg to update the callers as they were called. But I
>was informed that this is dangerous to do on SMP machines if another
>CPU is running that same code. The solution was to do this with
>kstop_machine.
>
>We still used cmpxchg to test if the code that we are modifying is
>indeed code that we expect to be before updating it - as a final
>line of defense.
>
>But on 32bit machines, ioremapped memory and modules share the same
>address space. When a module would load its code into memory
>and execute
>some code, that would register the function.
>
>On module unload, ftrace incorrectly did not zap these functions from
>its hash (this was the bug). The cmpxchg could have saved us in most
>cases (via luck) - but with ioremap-ed memory that was exactly
>the wrong
>thing to do - the results of cmpxchg on device memory are undefined.
>(and will likely result in a write)
>
>The pending .28 ftrace tree does not have this bug anymore, as
>a general push
>towards more robustness of code patching, this is done
>differently: we do not
>use cmpxchg and we do a WARN_ON and turn the tracer off if
>anything deviates
>from its expected state. Furthermo...

Revision history for this message
Yingying Zhao (yingying-zhao) wrote :

This is the patch that has passed Intel's testing; with it applied, the issue can no longer be reproduced. It looks to be a work-around, with the .28 changes being a fix for the root cause of the problem.

> Date: Wed, 15 Oct 2008 18:21:44 -0400 (EDT)
> From: Steven Rostedt <email address hidden>
> To: LKML <email address hidden>, <email address hidden>
> cc: Linus Torvalds <email address hidden>,
> Andrew Morton <email address hidden>,
> Arjan van de Ven <email address hidden>, <email address hidden>,
> <email address hidden>, Thomas Gleixner <email address hidden>,
> Ingo Molnar <email address hidden>
> Subject: [PATCH -stable] disable CONFIG_DYNAMIC_FTRACE due to possible memory
> corruption on module unload
>
>
> While debugging the e1000e corruption bug with Intel, we discovered
> today that the dynamic ftrace code in mainline is the likely source of
> this bug.
>
> For the stable kernel we are providing the only viable fix patch: labeling
> CONFIG_DYNAMIC_FTRACE as broken. (see the patch below)
>
> We will follow up with a backport patch that contains the fixes. But since
> the fixes are not a one liner, the safest approach for now is to
> disable the code in question.
>
> The cause of the bug is due to the way the current code in mainline
> handles dynamic ftrace. When dynamic ftrace is turned on, it also
> turns on CONFIG_FTRACE which enables the -pg config in gcc that places
> a call to mcount at every function call. With just CONFIG_FTRACE this
> causes a noticeable overhead. CONFIG_DYNAMIC_FTRACE works to ease this
> overhead by dynamically updating the mcount call sites into nops.
>
> The problem arises when we trace functions and modules are unloaded.
> The first time a function is called, it will call mcount and the mcount
> call will call ftrace_record_ip. This records the calling site and
> stores it in a preallocated hash table. Later on a daemon will
> wake up and call kstop_machine and convert any mcount callers into
> nops.
>
> The evolution of this code first tried to do this without the kstop_machine
> and used cmpxchg to update the callers as they were called. But I
> was informed that this is dangerous to do on SMP machines if another
> CPU is running that same code. The solution was to do this with
> kstop_machine.
>
> We still used cmpxchg to test if the code that we are modifying is
> indeed code that we expect to be before updating it - as a final
> line of defense.
>
> But on 32bit machines, ioremapped memory and modules share the same
> address space. When a module would load its code into memory and execute
> some code, that would register the function.
>
> On module unload, ftrace incorrectly did not zap these functions from
> its hash (this was the bug). The cmpxchg could have saved us in most
> cases (via luck) - but with ioremap-ed memory that was exactly the wrong
> thing to do - the results of cmpxchg on device memory are undefined.
> (and will likely result in a write)
>
> The pending .28 ftrace tree does not have this bug anymore, as a general push
> towards more robustness of code patching, this is done differently: we do not
> use cmpxchg and we do a ...

Revision history for this message
jagdfalke (mathias-javafalke) wrote :

Is this problem resolved now? I think I read somewhere that there is already a patch that you guys from Ubuntu just need to integrate. Is that true? If so, why is it taking so long? (no offense, just curious)

Revision history for this message
Chris Jones (cmsj) wrote :

jagdfalke: Please see the top of the page, it's marked as "Fix Released"

Revision history for this message
zika (4zika4) wrote : Re: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

I downloaded the LiveCDs for AMD 64 and for x86 64 yesterday, and when
I tried the AMD 64 version as a live session I was not able to use the network
on the computer that I am now using to write this message .... :)) (I am now
writing from Hardy), so it is not yet released AFAIAC.

Revision history for this message
hefeweiz3n (philschmidt) wrote :

For your information: Launchpad is NOT a support forum; in future situations, please refer to the forums. As for your problem: the driver is fixed in the current kernel release, but as the live CD still ships with the old kernel, networking is of course disabled with this card. I recommend waiting for the final release, or downloading the new kernel packages onto a USB stick or similar and installing them by hand. As for HOW to do that, please refer to the forums.

Revision history for this message
zika (4zika4) wrote :

I am very sorry that I have misused this place. I hope that you will
be able to forgive and forget. I am just an old guy ... ;)

I will wait for the official release.

Thank you very much.
Once again, sorry for the noise.

Revision history for this message
Jesse Brandeburg (jesse-brandeburg) wrote :

So, in the interest of adding some closure to this bug: the issue turns out to
have never been the e1000e driver's fault. The fault lies with the
CONFIG_DYNAMIC_FTRACE option. Specifically, when the FTRACE code was
enabled, it was doing a locked cmpxchg instruction on memory that had
previously been used as __INIT code from some other module.

a) some other module loads
b) that module's init code calls into ftrace which stores the EIP
c) that module discards its init code
d) e1000e loads
e) e1000e asks the kernel for memory to ioremap onto, and gets the memory
location of the code at b) and maps the flash/NVM control registers there.
f) ftraced runs and rewrites onto bytes 4-8 of the memory location from b/e
g) since the lock/cmpxchg instruction is undefined for memory mapped registers,
random junk is written to the b/e location
h) depending on the contents of the junk in g) the NVM is either byte corrupted
or block erased, which is detected the next time the e1000e driver is loaded.
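
The sequence a) to h) can be replayed in a toy model. This is only an illustrative Python sketch, not kernel code: the class, its methods and the example address are all made up, addresses are plain integers, and the undefined result of cmpxchg on device memory is modeled simply as a flag saying the NVM got damaged. The `purge_on_free` switch models whether stale call sites are removed from the recorded set when the code is discarded.

```python
class ToyKernel:
    """Hypothetical model of steps a) to h); none of these names exist in Linux."""

    def __init__(self, purge_on_free):
        self.purge_on_free = purge_on_free
        self.memory = {}             # address -> what currently lives there
        self.recorded_sites = set()  # call sites the tracer has recorded
        self.nvm_intact = True

    def run_init_code(self, addr):
        # a) + b): init code runs once and its call site is recorded
        self.memory[addr] = "mcount call"
        self.recorded_sites.add(addr)

    def discard_init_code(self, addr):
        # c): the code is freed; the buggy variant keeps the stale record
        del self.memory[addr]
        if self.purge_on_free:
            self.recorded_sites.discard(addr)

    def ioremap(self, addr):
        # d) + e): the freed address range is reused for device registers
        self.memory[addr] = "NVM control registers"

    def ftraced_run(self):
        # f) to h): the daemon patches every recorded call site
        for addr in sorted(self.recorded_sites):
            contents = self.memory.get(addr)
            if contents == "mcount call":
                self.memory[addr] = "nop"
            elif contents == "NVM control registers":
                # modeled outcome of the locked cmpxchg hitting device memory
                self.nvm_intact = False
        self.recorded_sites.clear()


def replay(purge_on_free):
    kernel = ToyKernel(purge_on_free)
    addr = 0xF8000000  # arbitrary example address
    kernel.run_init_code(addr)
    kernel.discard_init_code(addr)
    kernel.ioremap(addr)
    kernel.ftraced_run()
    return kernel.nvm_intact
```

With the stale record kept, `replay(False)` reports the NVM as damaged; with the record purged, `replay(True)` leaves it intact.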

A short-term workaround is in 2.6.27.1 (disable CONFIG_DYNAMIC_FTRACE), and the
longer-term fix is a rewrite of the cmpxchg code (which is already done and will
be in 2.6.28-rc1).

I strongly recommend that 2.6.27.1 be picked up in Ubuntu immediately.
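
To see whether a given kernel build still has the option enabled, its config file can be inspected. The Python sketch below assumes the usual kernel config text format, where an enabled option appears as `CONFIG_X=y` and a disabled one as `# CONFIG_X is not set`. The function name is hypothetical, and where the config text comes from (for example a /boot/config-* file) is an assumption that depends on the distribution.

```python
def dynamic_ftrace_enabled(config_text):
    """True if the kernel config text enables CONFIG_DYNAMIC_FTRACE."""
    for raw_line in config_text.splitlines():
        if raw_line.strip() == "CONFIG_DYNAMIC_FTRACE=y":
            return True
    return False


# Made-up config fragments for illustration.
risky = "CONFIG_FTRACE=y\nCONFIG_DYNAMIC_FTRACE=y\n"
safe = "CONFIG_FTRACE=y\n# CONFIG_DYNAMIC_FTRACE is not set\n"

print(dynamic_ftrace_enabled(risky), dynamic_ftrace_enabled(safe))
```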

Changed in linux:
status: Fix Released → Confirmed
Revision history for this message
rostedt (rostedt) wrote :

> So in the interests of adding some closure to this bug. The issue turns out to
> have never been the e1000e driver's fault.

Just to clarify: there were two bugs here. Yes, the ftrace code should have been more careful in using cmpxchg, and should have tried harder not to write into code that might have been swapped out (note: 2.6.28 has this fixed).

But the e1000e driver absolutely did have a bug. The driver should never have left open the possibility that a random write into it could brick the board. I'm actually glad that ftrace was the culprit, because it allowed for a consistent reproducer. Just imagine if ftrace had not caused this: any little bug in the kernel could have bricked your card. And guess what? You would be out of luck, because it would be extremely hard to ever reproduce it again.

I'm not denying that ftrace had a bug. I just want the record to state that ftrace was not the only one at fault here.

Amit Kucheria (amitk)
Changed in linux-lpia:
assignee: nobody → amitk
importance: Undecided → Critical
milestone: none → ubuntu-8.10
status: New → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-lpia - 2.6.27-4.9

---------------
linux-lpia (2.6.27-4.9) intrepid; urgency=low

  [ Amit Kucheria ]

  * SAUCE: Start new release Ignore: yes
  * SAUCE: Add LPIA keyword in front of all our tags
  * SAUCE: Disable DYANMIC_FTRACE
    - LP: #263555
  * SAUCE: Disable ath5k from configs
    - LP: #288148
  * SAUCE: Fix rebase script some more
  * SAUCE: Change default TCP congestion algorithm to cubic
    - LP: #278801
  * SAUCE: Enable vesafb module

 -- Amit Kucheria <email address hidden> Thu, 23 Oct 2008 20:07:26 +0000

Changed in linux-lpia:
status: Fix Committed → Fix Released
Changed in linux:
status: Confirmed → Fix Released
Revision history for this message
John (jsobernheim) wrote : unsubscribe

unsubscribe

Basilisk (bluebal-1)
Changed in linux:
assignee: timg-tpi → nobody
assignee: nobody → bluebal-1
William Grant (wgrant)
Changed in linux:
assignee: bluebal-1 → timg-tpi
Revision history for this message
In , Jesse (jesse-redhat-bugs) wrote :

That cpu-stuck bug was a problem in the way the e1000e driver loops to read the NVM.

Part of the threads on LKML covered a fix for that issue.

Please contact me directly for assistance restoring your eeprom image if you need help.

Revision history for this message
dave graham (david-graham) wrote :

While we should expect no further reports of flash corruption due to this bug, I would like to know of any systems which did fall foul of the bug and have not yet had their flash restored. Please let me know if you have a system that had proper (e1000e) LAN functionality prior to installing a 2.6.27-rc kernel, and lost it while running the rc kernel.

So as not to confuse this bug report, please contact me offline and I will try to help you restore your LAN.

david.graham_at_intel_dot_com

Revision history for this message
Pieter (diepes) wrote :

Is there a link as to how to recover ?

Revision history for this message
BeigeGenius (beigegenius) wrote :

Prior to upgrading to kernel 2.6.27-9, no kernel panics were experienced.
The kernel panics seem "random", but they appear to occur during high network activity when using the Ethernet device (Intel Corporation 82573L Gigabit Ethernet Controller), which uses the e1000e driver. No reports of hardware dying yet, and hopefully this bug will be limited to kernel panics!

Revision history for this message
In , Wstephenson-9 (wstephenson-9) wrote :

Was the recovery tool ever published? I just ran into a beta user who still has a trashed e1000e.

Revision history for this message
In , Jesse Brandeburg (jesse-brandeburg) wrote :

<email address hidden> can help

Revision history for this message
In , dave graham (david-graham) wrote :

I have been dealing with a lot of these recovery requests, and have been using a tool developed by Karsten Keil. The tool reads the (probably) corrupted content, which is sent to me. I repair the image (usually a single-byte corruption), and then I return the corrected image to be written back to the NVM using the same tool.

Below are the instructions that I have been providing in response to the individual reports I have had.
---- Start of instructions -------
Go to:

      ftp://ftp.suse.com/pub/people/kkeil/testing/e1000e/

Copy & paste this link in to a browser window, and you should see a list of files, including one:

      e1000e_recover.iso

This is an ISO image of a CD, so save it to your local system, then burn it to a CD, and use it to boot your problem system. From finding the ISO to actually booting your system is quite a few steps - if you get stuck of course just let me know and I'll guide you through the detail, but for now I'll assume that you're still with me.

From the boot options presented by the CD, select "rescue system", as that's where we'll find the eeprom recovery tool.

When prompted for user, log on as root. There's no password, so just hit return.

1) Read the current eeprom and save it to file. Be patient !

      e1000_nvm -r -u -o ethtool.dmp

2) mount a USB disk to save the file, and send the file to me <email address hidden>

I will then fix up the image, and mail it back to you as ethtoola.dmp, and then, you can boot again to the CD, and

3) Write the new eeprom content back to your system NVM, using something like the command below (it may differ depending on the device id that is indicated in the NVM, but I will provide any update to this step along with the fixed-up NVM image that I return)

      e1000_nvm -u -P 10498086 ethtoola.dmp

And select YES when prompted.

4) You should then be able to remove the recovery CD, and successfully boot back to a working ethernet.

---- End of instructions -------
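
Because the repair is described as usually a single-byte change, it may be reassuring to compare the original dump with the repaired one before writing anything back. The following is only a sketch and is not part of the recovery CD. It assumes that both files are raw binary images of equal length, which the thread does not confirm, and the function name `diff_dumps` is made up for illustration.

```python
import sys
from pathlib import Path


def diff_dumps(original: bytes, repaired: bytes):
    """Return (offset, old_byte, new_byte) for every byte that differs."""
    if len(original) != len(repaired):
        raise ValueError("dumps differ in length; refusing to compare")
    return [
        (offset, old, new)
        for offset, (old, new) in enumerate(zip(original, repaired))
        if old != new
    ]


# Only runs when two file names are given on the command line, e.g.
#   python diff_dumps.py ethtool.dmp ethtoola.dmp
# Treating the files as raw binary is an assumption.
if __name__ == "__main__" and len(sys.argv) == 3:
    before = Path(sys.argv[1]).read_bytes()
    after = Path(sys.argv[2]).read_bytes()
    for offset, old, new in diff_dumps(before, after):
        print(f"0x{offset:04x}: {old:02x} -> {new:02x}")
```

A repaired image that differs from the original in many places, or in length, would be worth querying before step 3 is attempted.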

Revision history for this message
dave graham (david-graham) wrote :

I am still contacted about once per month for instructions on how to recover ethernet functionality on systems that have had their 1Gb flash content corrupted, possibly by this defect.

If you believe that you are affected by this issue, you can safely perform steps 1 through 5 from the list below, then contact me with the result, from which I will prepare a fully repaired image and post it back to you. You can then continue with steps 6, 7 and 8.

1) Download a CD image of the recovery program (originally created by Karsten Keil, formerly of SuSE) from http://e1000.sourceforge.net/e1000e_recover.iso. Please type the address in your browser window and choose "save to file"; you cannot search for this file.

2) Burn the iso to CD, & boot the CD. When prompted, select “Rescue System”
Linux will load, you’ll see an openSUSE splashscreen, and eventually a login prompt.

3) Log on as root. There's no password, so just hit return.

4) Read the current eeprom and save it to file. Be patient !

       e1000e_nvm -r -d eth0 -o ethtool.dmp

5) mount a USB disk to save the file, and send the file to me david_dot_graham_at_intel_dot_com

I will then fix up the image, and mail it back to you as ethtoola.dmp.
When you receive the updated file:

6) Write the new eeprom content back to your system NVM

            e1000e_nvm -d eth0 -P 108C8086 ethtoola.dmp

7) You will see some warnings, select YES when prompted.

8) You should then be able to remove the recovery CD, and successfully boot back to a working ethernet using Linux, Windows, OpenSolaris, or anything else.

Revision history for this message
bonsiware (bonsiware-deactivatedaccount) wrote :

Burned the iso, followed your instructions, but:

            eth0 EEprom len 4096
            checksum ed0e wrong should be 830e

So I can't send you my ethtool.dmp

My notebook is a Fujitsu Siemens Lifebook E8410

Thank you Dave!!!

Revision history for this message
dave graham (david-graham) wrote : RE: [Bug 263555] Re: [intrepid] 2.6.27 e1000e driver places Intel ICH8 and ICH9 gigE chipsets at risk

This is unusual, as the

 e1000e_nvm -r -d eth0 -o ethtool.dmp

command normally dumps out the 1Gb portion of the system flash even if it _does_ have a bad checksum, and then I've been fixing the checksum & content. Are you sure that there isn't an ethtool.dmp file created in the local directory from which you ran e1000e_nvm ?

If there really is no ethtool.dmp, please send me

1) lspci -tv
2) lspci -xxx
3) dmesg (that includes the failure of the e1000e driver to load)

and I'll send you an instrumented driver that will dump out the 1Gb flash content, and we may be able to fix it from there.

At least I hope so. As I say, I've fixed a lot of these corruptions, but do not recall seeing this particular failure mode before.

Dave
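
For readers wondering what "fixing the checksum" involves, here is a minimal sketch of the checksum rule used by the Linux e1000/e1000e drivers: the 16-bit words 0x00 through 0x3F of the NVM must sum to 0xBABA, with word 0x3F holding the value that makes this true. The sketch assumes the dump is a raw image of little-endian 16-bit words, which the thread does not confirm. The helper names are made up, and how the "ed0e"/"830e" values printed by e1000e_nvm relate to this sum is not established here.

```python
import struct

NVM_SUM = 0xBABA      # expected 16-bit sum of words 0x00..0x3F
CHECKSUM_WORD = 0x3F  # index of the word that stores the checksum


def read_words(image: bytes, count: int):
    """Decode the first `count` little-endian 16-bit words of the image."""
    if len(image) < count * 2:
        raise ValueError("image too short")
    return list(struct.unpack_from(f"<{count}H", image, 0))


def checksum_ok(image: bytes) -> bool:
    """True if words 0x00..0x3F (inclusive) sum to NVM_SUM modulo 2**16."""
    words = read_words(image, CHECKSUM_WORD + 1)
    return sum(words) & 0xFFFF == NVM_SUM


def expected_checksum(image: bytes) -> int:
    """Value word 0x3F must hold so that the whole range sums to NVM_SUM."""
    words = read_words(image, CHECKSUM_WORD)
    return (NVM_SUM - sum(words)) & 0xFFFF
```

Recomputing the checksum only makes the image self-consistent. It does not restore whichever byte was corrupted, which is why the content is repaired first and the checksum afterwards.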

Revision history for this message
bonsiware (bonsiware-deactivatedaccount) wrote :

lspci -tv:

+-19.0 Intel Corporation 82566DC Gigabit Network Connection

lspci -xxx:
00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 03)
00: 86 80 4b 10 03 01 10 00 03 00 00 02 00 00 00 00
10: 00 00 40 fe 00 40 42 fe 21 18 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00

dmesg:
[ 1.652371] e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
[ 1.652374] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[ 1.652440] e1000e 0000:00:19.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 1.652447] e1000e 0000:00:19.0: pci_enable_pcie_error_reporting failed 0xfffffffb
[ 1.652456] e1000e 0000:00:19.0: setting latency timer to 64
[ 1.652628] alloc irq_desc for 28 on node -1
[ 1.652630] alloc kstat_irqs on node -1
[ 1.652647] e1000e 0000:00:19.0: irq 28 for MSI/MSI-X
[ 1.739443] ohci1394 0000:1c:03.4: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 1.757784] 0000:00:19.0: 0000:00:19.0: The NVM Checksum Is Not Valid
[ 1.787486] e1000e 0000:00:19.0: PCI INT A disabled
[ 1.787495] e1000e: probe of 0000:00:19.0 failed with error -5
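
The `lspci -xxx` rows above are the raw PCI configuration space, and its first four bytes hold the vendor and device IDs in little-endian order. The sketch below decodes them from the pasted text. The function names are invented for illustration. Combining the two IDs into a string of the form used with `-P` in the recovery instructions (device ID followed by vendor ID) is only an inference from the examples in this thread, not documented behaviour of e1000e_nvm.

```python
import re

# One hex-dump row of `lspci -xxx`: "<offset>: <byte> <byte> ..."
ROW = re.compile(r"([0-9a-fA-F]{2,3}):((?: [0-9a-fA-F]{2})+)")


def parse_config_space(lspci_xxx: str) -> bytes:
    """Collect the bytes from the hex-dump rows, ignoring all other lines."""
    data = bytearray()
    for line in lspci_xxx.splitlines():
        match = ROW.fullmatch(line.strip())
        if match:
            data.extend(bytes.fromhex(match.group(2)))
    return bytes(data)


def vendor_device_ids(cfg: bytes):
    """Return (vendor_id, device_id) from config-space offsets 0x00 and 0x02."""
    if len(cfg) < 4:
        raise ValueError("need at least 4 bytes of config space")
    vendor = int.from_bytes(cfg[0:2], "little")
    device = int.from_bytes(cfg[2:4], "little")
    return vendor, device


def id_string(cfg: bytes) -> str:
    """Device ID then vendor ID as 8 hex digits (format inferred, see above)."""
    vendor, device = vendor_device_ids(cfg)
    return f"{device:04X}{vendor:04X}"
```

Applied to the dump above, the first row `86 80 4b 10` decodes to vendor 0x8086 and device 0x104B.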

Revision history for this message
dave graham (david-graham) wrote :

Thanks,

I still don't understand why e1000e_recover didn't work for you, but I admit that
I have been using it as a tool, and don't understand its inner workings.

Let's try another approach to get the invalid NVM content listed, this time
by the driver when it reads the data. I attach a patch "e1000e-1.0.15.shownvm.patch"
which can be applied to our latest e1000e sourceforge release.
To install the driver and collect the result, please proceed as follows:

1) Copy this patch to a local directory
2) Download e1000e-1.0.15.tar.gz from http://sourceforge.net/projects/e1000/files/
3) Untar the tarball to a local directory,
         tar xvzf e1000e-1.0.15.tar.gz
4) cd e1000e-1.0.15/src
5) Apply the patch
        patch -p2 <../../e1000e-1.0.15.shownvm.patch
6) Remove the old driver, build & install the new one
        rmmod e1000e
        make
        insmod e1000e.ko
7) The system message log should have the NVM content that was read.

The driver should also load even in the presence of the errored NVM. Please let me know whether it does load
and work, and send me the dmesg log that includes the NVM dump. I will see if I can fix it up and return
it to you with instructions on how to apply the fixed-up version.

Thanks
Dave

Changed in linux (Ubuntu):
status: Fix Released → Confirmed
Revision history for this message
Steve Langasek (vorlon) wrote :

Please don't change bug statuses without explanation

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
In , Neo (neo-redhat-bugs) wrote :

Can anybody provide me with the fix for the cpu-stuck bug?

I also need an EEPROM image to restore the Intel® 82573L Gigabit Ethernet LAN controller on my D5400XS motherboard.

Changed in linux:
importance: Unknown → Medium
Changed in linux (Gentoo Linux):
importance: Unknown → Medium
Changed in linux (Mandriva):
importance: Unknown → Critical
Revision history for this message
Troex Nevelin (troex) wrote :

I have a ThinkPad X60 with an 82573L, and after upgrading to the 11.04 beta with the latest kernel it has almost completely stopped working.
I tested the e1000e 1.2.20-k2 driver (in the stock kernel) and the 1.3.10a driver with no luck, and booting with the option "pcie_aspm=force" doesn't help. I've tried e1000e_recover.iso, but it does not boot on my 32-bit processor. Strangest of all, I cannot read the EEPROM:

@tpx60:~# ifconfig eth0 up
@tpx60:~# ethtool -e eth0
Cannot get driver information: No such device
@tpx60:~# ifconfig eth0 down
@tpx60:~# ethtool -e eth0
Offset Values
------ ------
0x0000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0040 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0060 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

@tpx60:~# dmesg | grep e1000e
[ 1.231354] e1000e: Intel(R) PRO/1000 Network Driver - 1.2.20-k2
[ 1.231358] e1000e: Copyright(c) 1999 - 2011 Intel Corporation.
[ 1.231392] e1000e 0000:02:00.0: Disabling ASPM L1
[ 1.231410] e1000e 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 1.231435] e1000e 0000:02:00.0: setting latency timer to 64
[ 1.231639] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 1.232563] e1000e 0000:02:00.0: Disabling ASPM L0s
[ 1.392249] e1000e 0000:02:00.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:16:d3:3a:47:ae
[ 1.392253] e1000e 0000:02:00.0: eth0: Intel(R) PRO/1000 Network Connection
[ 1.392332] e1000e 0000:02:00.0: eth0: MAC: 2, PHY: 2, PBA No: 005302-003
[ 27.120351] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 27.176320] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 28.762602] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[ 28.762610] e1000e 0000:02:00.0: eth0: 10/100 speed: disabling TSO
[ 32.855747] e1000e 0000:02:00.0: PCI INT A disabled
[ 32.855761] e1000e 0000:02:00.0: PME# enabled
[ 89.756091] e1000e 0000:02:00.0: BAR 0: set to [mem 0xee000000-0xee01ffff] (PCI address [0xee000000-0xee01ffff])
[ 89.756109] e1000e 0000:02:00.0: BAR 2: set to [io 0x2000-0x201f] (PCI address [0x2000-0x201f])
[ 89.756154] e1000e 0000:02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[ 89.756208] e1000e 0000:02:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
[ 89.756278] e1000e 0000:02:00.0: PME# disabled
[ 89.756372] e1000e 0000:02:00.0: Disabling ASPM L1
[ 89.756474] e1000e 0000:02:00.0: irq 44 for MSI/MSI-X
[ 89.758786] e1000e 0000:02:00.0: eth0: MAC Wakeup cause - Link Status Change
[ 89.828165] e1000e 0000:02:00.0: PME# enabled
[ 89.976110] e1000e 0000:02:00.0: BAR 0: set to [mem 0xee000000-0xee01ffff] (PCI address [0xee000000-0xee01ffff])
[ 89.976128] e1000e 0000:02:00.0: BAR 2: set to [io 0x2000-0x201f] (PCI address [0x2000-0x201f])
[ 89.976184] e1000e 0000:02:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[ 89.976234] e1000e 0000:02:00.0: restoring config space at offs...
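
A dump consisting entirely of 0xff, like the `ethtool -e` output above, is the pattern typically returned by erased flash or by a device that is not answering reads. Which of the two applies here is not established in this thread. The sketch below parses pasted `ethtool -e` text and flags that pattern. The function names are made up, and the row format is taken from the paste above (an optional colon after the offset is also accepted).

```python
import re

# One data row of `ethtool -e` output: "0x0000 ff ff ..." (colon optional)
ROW = re.compile(r"^0x([0-9a-fA-F]+):?\s+((?:[0-9a-fA-F]{2}\s*)+)$")


def parse_ethtool_dump(text: str) -> bytes:
    """Collect the bytes from the data rows, skipping headers and prompts."""
    data = bytearray()
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            data.extend(bytes.fromhex(match.group(2)))
    return bytes(data)


def looks_unreadable(data: bytes) -> bool:
    """True if at least one byte was parsed and every byte is 0xff."""
    return len(data) > 0 and all(b == 0xFF for b in data)
```

A dump flagged this way contains no information to repair, so it would make sense to obtain a real read of the NVM before asking anyone to fix up an image.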

Revision history for this message
Jesse Brandeburg (jesse-brandeburg) wrote :

This bug is not a catch-all for all e1000e issues. The original issue this bug was filed against is fixed and is highly unlikely to recur. If you're having e1000e issues, please file a new bug.

Changed in linux (Fedora):
importance: Unknown → Medium
Changed in linux (Suse):
importance: Unknown → Critical