rt2x00 oopses in 2.6.26-4, regression against 2.6.24-3

Bug #249242 reported by Christoph Orsinger
Affects: linux (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Colin Ian King

Bug Description

I have an Edimax EW-7318UG WLAN USB stick which works fine with linux-2.6.26-3 and older.
Under 2.6.26-4 the kernel oopses after plugging in the stick.

uname -r: 2.6.26-4-generic

Oops log:
[ 1144.232094] usb 3-5: new high speed USB device using ehci_hcd and address 3
[ 1144.503721] usb 3-5: configuration #1 chosen from 1 choice
[ 1145.184628] phy0 -> rt2500usb_init_eeprom: Error - Invalid RT chipset detected.
[ 1145.184646] phy0 -> rt2x00lib_probe_dev: Error - Failed to allocate device.
[ 1145.184690] BUG: unable to handle kernel NULL pointer dereference at 00000010
[ 1145.184693] IP: [<c013cd9a>] flush_workqueue+0xa/0x50
[ 1145.184703] *pde = 00000000
[ 1145.184709] Oops: 0000 [#1] SMP
[ 1145.184713] Modules linked in: rt2500usb(+) rt2x00usb rt2x00lib rfkill led_class input_polldev mac80211 cfg80211 ipv6 binfmt_misc rfcomm l2cap bluetooth ppdev cpufreq_conservative cpufreq_powersave cpufreq_ondemand cpufreq_stats freq_table cpufreq_userspace sbs container sbshc video output battery af_packet iptable_filter ip_tables x_tables ac parport_pc lp parport nvidia(P) snd_via82xx gameport snd_ac97_codec ac97_bus snd_mpu401_uart snd_seq_dummy snd_pcsp snd_pcm_oss snd_mixer_oss snd_seq_oss snd_pcm snd_page_alloc snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device via_ircc button snd irda crc_ccitt soundcore i2c_viapro i2c_core shpchp pci_hotplug via_agp agpgart evdev ext3 jbd mbcache usb_storage usbhid hid sg libusual sr_mod sd_mod cdrom pata_acpi ata_generic pata_via uhci_hcd libata scsi_mod dock ehci_hcd ohci_hcd tulip usbcore thermal processor fan fbcon tileblit font bitblit softcursor uvesafb cn fuse
[ 1145.184771]
[ 1145.184775] Pid: 5582, comm: modprobe Tainted: P (2.6.26-4-generic #1)
[ 1145.184779] EIP: 0060:[<c013cd9a>] EFLAGS: 00010246 CPU: 0
[ 1145.184783] EIP is at flush_workqueue+0xa/0x50
[ 1145.184786] EAX: 00000000 EBX: d5ebd0a0 ECX: 00000096 EDX: 00000000
[ 1145.184789] ESI: c047b4f8 EDI: 00000000 EBP: de6fca00 ESP: dec21e44
[ 1145.184791] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 1145.184795] Process modprobe (pid: 5582, ti=dec20000 task=df883400 task.ti=dec20000)
[ 1145.184798] Stack: d5ebd0a0 d5ebd0a0 ffffffed e0cbaf48 00000000 e0cbb056 e0cbd7d0 d5ebc0f0
[ 1145.184804] e0cbd620 e0cbd744 00000000 d5ebd0a0 d5ebc1a0 e0c91307 dee34360 e0cc96ec
[ 1145.184810] c0361a38 d2a01c00 d2a01c00 00000000 de6fca00 e0cc96ec e0cc94a0 e08a6151
[ 1145.184816] Call Trace:
[ 1145.184821] [<e0cbaf48>] rt2x00lib_remove_dev+0x38/0x60 [rt2x00lib]
[ 1145.184838] [<e0cbb056>] rt2x00lib_probe_dev+0xe6/0x1b0 [rt2x00lib]
[ 1145.184850] [<e0c91307>] rt2x00usb_probe+0xe7/0x170 [rt2x00usb]
[ 1145.184860] [<c0361a38>] mutex_lock+0x8/0x20
[ 1145.184871] [<e08a6151>] usb_probe_interface+0xa1/0x110 [usbcore]
[ 1145.184917] [<c029ac60>] really_probe+0x60/0x180
[ 1145.184926] [<e08a5441>] usb_match_id+0x41/0x60 [usbcore]
[ 1145.184943] [<e08a5680>] usb_device_match+0x40/0x80 [usbcore]
[ 1145.184961] [<c029ae51>] __driver_attach+0x71/0x80
[ 1145.184967] [<c029a5c4>] bus_for_each_dev+0x44/0x70
[ 1145.184977] [<c029ab16>] driver_attach+0x16/0x20
[ 1145.184981] [<c029ade0>] __driver_attach+0x0/0x80
[ 1145.184985] [<c0299f57>] bus_add_driver+0x1a7/0x220
[ 1145.184996] [<c029afec>] driver_register+0x5c/0x130
[ 1145.185007] [<e08a63f1>] usb_register_driver+0x81/0x100 [usbcore]
[ 1145.185027] [<c0152ab8>] sys_init_module+0x88/0x1b0
[ 1145.185037] [<c0103f73>] sysenter_past_esp+0x78/0xb1
[ 1145.185055] =======================
[ 1145.185057] Code: 90 8d 50 10 e9 78 fe ff ff 90 8d b4 26 00 00 00 00 31 d2 e9 69 fe ff ff 89 f6 8d bc 27 00 00 00 00 57 89 c7 56 be f8 b4 47 c0 53 <8b> 58 10 b8 f0 b4 47 c0 85 db 0f 45 f0 e8 54 46 22 00 89 f0 e8
[ 1145.185083] EIP: [<c013cd9a>] flush_workqueue+0xa/0x50 SS:ESP 0068:dec21e44
[ 1145.185090] ---[ end trace 591fddf59e09f337 ]---

With linux-2.6.26-3-generic, dmesg shows this:
[ 114.997119] usb 3-5: new high speed USB device using ehci_hcd and address 2
[ 115.268033] usb 3-5: configuration #1 chosen from 1 choice
[ 115.617564] phy0 -> rt2500usb_init_eeprom: Error - Invalid RT chipset detected.
[ 115.617579] phy0 -> rt2x00lib_probe_dev: Error - Failed to allocate device.
[ 115.617661] usbcore: registered new interface driver rt2500usb
[ 115.922985] phy1: Selected rate control algorithm 'pid'
[ 116.030008] Registered led device: rt73usb-phy1:radio
[ 116.030046] Registered led device: rt73usb-phy1:assoc
[ 116.030069] Registered led device: rt73usb-phy1:quality
[ 116.030975] usbcore: registered new interface driver rt73usb
[ 116.084534] firmware: requesting rt73.bin
[ 116.256835] ADDRCONF(NETDEV_UP): wlan0: link is not ready

lsusb:
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 002: ID 046d:c012 Logitech, Inc. Mouseman Dual Optical
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 002: ID 148f:2573 Ralink Technology, Corp. RT2501USB Wireless Adapter
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

Is any additional information needed?

Tags: 2.6.26-4
Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → High
status: New → Triaged
Changed in linux:
assignee: ubuntu-kernel-team → colin-king
status: Triaged → In Progress
Colin Ian King (colin-king) wrote :

Hi,

I've examined the oops. It occurs when the driver's attempt to create a workqueue fails: the cleanup code then tries to free the workqueue that was never created, which causes the oops.
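
As a reading aid for the oops in the description: the faulting address 00000010, together with EAX: 00000000 and the marked instruction bytes `<8b> 58 10` (a 32-bit x86 load from offset 0x10 relative to EAX), is what reading a structure member through a NULL pointer looks like. The following is a minimal user-space sketch in C. `fake_wq` and its layout are hypothetical and are not the kernel's real workqueue structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, chosen only so that one member sits at offset 0x10. */
struct fake_wq {
    uint32_t pad[4];   /* 4 * 4 bytes = 16 bytes = 0x10 */
    uint32_t member;   /* lives at offset 0x10 */
};

/* Address the CPU would touch when reading `member` through `base`. */
static uintptr_t member_address(uintptr_t base)
{
    return base + offsetof(struct fake_wq, member);
}
```

With a NULL base, `member_address(0)` evaluates to 0x10, which matches the address in the "unable to handle kernel NULL pointer dereference" line.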

I was wondering whether this problem occurs every time, or whether it was a transient problem caused by a lack of resources on the one occasion you tried the Edimax EW-7318UG WLAN USB stick. Can you insert the USB stick again and let me know whether the same oops occurs and is always repeatable?

Thanks. Colin.

description: updated
Christoph Orsinger (c-orsinger) wrote :

Hi Colin,

Yes, the bug is reproducible. (That's why I wrote this bug report in the first place :) )
The new 2.6.26-5-generic kernel oopses as well. The new oops message doesn't really differ from the original except for some different memory addresses. (Well, at least I think those are memory addresses.)

I managed to work around the problem by blacklisting the rt2500usb driver.
But why is this driver loaded in the first place? I'm a little confused here. I rmmoded rt2500usb experimentally under 2.6.26-3 and didn't notice any difference. It seems this driver isn't used at all. (Is there any method to verify this?)

Short:
2.6.26-3-generic rt2500usb and rt73usb are loaded --> works
2.6.26-4-generic and 2.6.26-5-generic rt2500usb is loaded --> oopses
2.6.26-4-generic and 2.6.26-5-generic with blacklisted rt2500usb --> works

Regards,
Christoph

Christoph Orsinger (c-orsinger) wrote :

Update:

It seems there is no bug at all.

I just downloaded the latest daily live CD (kernel 2.6.26-5) and retested my WLAN adapter on my housemate's PC. I couldn't reproduce the error on his PC.
Out of curiosity, I booted my PC with the live CD. Again, no oops occurred.

So my hard disk installation must have gotten corrupted somehow.
It didn't take me long to find a possible cause.
free:
             total
Mem: 514544 this is an odd number (502M), should be 512M
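
The arithmetic behind the 502M figure can be checked with a small sketch in C, assuming `free` printed its total in KiB (its default unit). The function names are made up for illustration.

```c
/* Convert a KiB count, as printed by free, to MiB. */
static double kib_to_mib(unsigned long kib)
{
    return (double)kib / 1024.0;
}

/* KiB missing relative to an expected size given in MiB. */
static long missing_kib(unsigned long reported_kib, unsigned long expected_mib)
{
    return (long)(expected_mib * 1024UL) - (long)reported_kib;
}
```

For the values above, 514544 KiB is about 502.5 MiB, which is 9744 KiB short of 512 MiB.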

Some traces in dmesg:
[ 0.615711] system 00:00: iomem range 0xd0000-0xd3fff has been reserved
[ 0.615716] system 00:00: iomem range 0xf0000-0xf7fff could not be reserved
[ 0.615719] system 00:00: iomem range 0xf8000-0xfbfff could not be reserved
[ 0.615722] system 00:00: iomem range 0xfc000-0xfffff could not be reserved
[ 0.615726] system 00:00: iomem range 0x1fff0000-0x1fffffff could not be reserved
[ 0.615729] system 00:00: iomem range 0xfec00000-0xfec00fff has been reserved
[ 0.615733] system 00:00: iomem range 0xffee0000-0xffef2fff has been reserved
[ 0.615736] system 00:00: iomem range 0xffef4000-0xffef8fff has been reserved
[ 0.615740] system 00:00: iomem range 0xffefa000-0xffefafff has been reserved
[ 0.615743] system 00:00: iomem range 0xffefc000-0xffefffff has been reserved
[ 0.615746] system 00:00: iomem range 0xffff0000-0xffffffff could not be reserved
[ 0.615750] system 00:00: iomem range 0x0-0x9ffff could not be reserved
[ 0.615753] system 00:00: iomem range 0x100000-0x1ffeffff could not be reserved
[ 0.615756] system 00:00: iomem range 0xfee00000-0xfee00fff has been reserved
[ 0.615760] system 00:00: iomem range 0xfff80000-0xfffeffff has been reserved

I ran memtest afterwards (from the live CD, to be sure).
It crashes repeatedly after ~58000 errors in test #2 "Moving Inversions, ones & zeros", regardless of which of my two memory modules is inserted.

I suspect a hardware error in the memory controller.

        Christoph

Colin Ian King (colin-king) wrote :

The bug occurs when the driver tries to allocate some memory and fails to do so, which causes the oops. Maybe your system hit this error, which 99.999% of the time does not happen.

I will put a fix in to catch this corner case anyway to make the driver more robust.
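
The kind of guard described here can be sketched in user-space C as follows. This is not the actual patch: every `fake_` name and the `FAKE_` error constants are hypothetical stand-ins, and the early failure is modelled on the "Invalid RT chipset detected" error that precedes the oops in the log from the description.

```c
#include <stddef.h>
#include <stdlib.h>

#define FAKE_ENOMEM 12  /* illustrative error codes, not taken from the driver */
#define FAKE_ENODEV 19

/* Hypothetical stand-ins; not the real rt2x00 or workqueue types. */
struct fake_workqueue {
    int flushed;                       /* number of flushes seen */
};

struct fake_dev {
    struct fake_workqueue *workqueue;  /* NULL until probe creates it */
    int flush_calls;                   /* flushes issued by teardown */
};

/* Like a real flush, this dereferences wq, so wq must not be NULL. */
static void fake_flush_workqueue(struct fake_workqueue *wq)
{
    wq->flushed++;
}

/* Teardown that is safe to call from a half-finished probe. */
static void fake_remove_dev(struct fake_dev *dev)
{
    if (dev->workqueue != NULL) {      /* the guard: skip what was never created */
        fake_flush_workqueue(dev->workqueue);
        dev->flush_calls++;
        free(dev->workqueue);
        dev->workqueue = NULL;
    }
}

/* chipset_ok == 0 models a probe that fails before the workqueue exists. */
static int fake_probe_dev(struct fake_dev *dev, int chipset_ok)
{
    dev->workqueue = NULL;
    dev->flush_calls = 0;

    if (!chipset_ok) {
        fake_remove_dev(dev);          /* must not touch the missing workqueue */
        return -FAKE_ENODEV;
    }

    dev->workqueue = calloc(1, sizeof(*dev->workqueue));
    if (dev->workqueue == NULL) {
        fake_remove_dev(dev);
        return -FAKE_ENOMEM;
    }
    return 0;
}
```

Calling `fake_probe_dev(&dev, 0)` returns an error without flushing anything, while a successful probe followed by `fake_remove_dev(&dev)` flushes and frees the workqueue exactly once.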

Colin

Changed in linux:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.26-5.14

---------------
linux (2.6.26-5.14) intrepid; urgency=low

  [ Ben Collins ]

  * SAUCE: applesmc: Add MacBookAir
  * build: Do not build ddeb unless we are on the buildd
  * build: control: Consistency in arch fields.
  * SAUCE: Update toshiba_acpi.c to version 0.19a
    - LP: #77026
  * build: Added perm blacklist support and per-module support to abi-check
    - Blacklist p80211 module from abi checks
  * ubuntu/lirc: Get rid of drivers symlink and use real include stuff

  [ Colin Ian King ]

  * SAUCE: acerhk module - add support for Amilo A1650g keyboard
    - LP: #84159
  * SAUCE: rt2x00: Fix OOPS on failed creation of rt2x00lib workqueue
    - LP: #249242

  [ Mario Limonciello ]

  * Add LIRC back in

  [ Tim Gardner ]

  * Makefile race condition can lead to ndiswrapper build failure
    - LP: #241547
  * update linux-wlan-ng (prism2_usb) to upstream version 1861
    - LP: #245026

  [ Upstream Kernel Changes ]

  * Fix typos from signal_32/64.h merge

 -- Ben Collins <email address hidden> Fri, 01 Aug 2008 00:05:01 -0400

Changed in linux:
status: Fix Committed → Fix Released