Regression: Does not list available WLAN networks (p54)

Bug #722185 reported by Matthias Klumpp
This bug affects 15 people
Affects                 Status        Importance  Assigned to  Milestone
gcc-4.5 (Ubuntu)        Invalid       Undecided   Unassigned
linux (Ubuntu)          Fix Released  Undecided   Unassigned
wpasupplicant (Ubuntu)  Invalid       Undecided   Unassigned

Bug Description

Since Linux kernel 2.6.37, I can no longer connect to WLAN networks. Some networks are listed, but the one I need to connect to is not shown, although it has the strongest signal.
My WLAN adapter is a Linksys WUSB54G, which uses the p54 driver.
On Linux 2.6.35 I had no problems with WLAN access; the problem occurs only on Linux 2.6.37 and 2.6.38.

Log message:
11.02.2011 23:20:10 localhost NetworkManager[1048] <info> (wlan0): supplicant interface state: starting -> ready
11.02.2011 23:20:10 localhost NetworkManager[1048] <info> (wlan0): device state change: 2 -> 3 (reason 42)
11.02.2011 23:20:11 localhost wpa_supplicant[1085] Failed to initiate AP scan.

I currently use Ubuntu Natty.

Jeremy Foshee (jeremyfoshee) wrote :

Hi Matthias,

Please be sure to confirm this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/daily/current/ . If the issue remains, please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 722185

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Matthias Klumpp (ximion) wrote :

Tested with upstream kernel 2.6.38-020638rc6-generic from Ubuntu mainline PPA, the issue is present upstream too.
Attached more complete kernel log to this report.

Changed in linux (Ubuntu):
status: Incomplete → New
tags: removed: needs-kernel-logs needs-upstream-testing
Matthias Klumpp (ximion) wrote :
Jason Conti (jconti) wrote :

I am having the same problem with the p54usb driver and isl3886 firmware from linux-firmware-nonfree. It fails the AP scan, then loops through association attempts 1, 2, 3 (timed out) and direct probes 1, 2, 3 (timed out). It doesn't connect with the current mainline PPA kernel either.

Strangely, I built a kernel from the stable 2.6.38 release today, and it connects successfully and works without issues. I don't believe anything changed between the releases that would make this work, because in Debian experimental, 2.6.38-rc7 works without issues as well (using the experimental wpasupplicant 0.7.3). So it would appear to be an issue with some Ubuntu patch, although that would mean the mainline kernel should work, right? So perhaps it is something in the kernel configuration, but I haven't managed to track down the issue.

I could include the working kernel config, but I'm not sure it would help, since I disabled everything not required for my system so it would compile quickly. I will include a syslog excerpt with wpasupplicant set to debug level 3 though.

Mathieu Trudel-Lapierre (cyphermox) wrote :

Jason,

From the logs there I assume you're using wpasupplicant directly? If you do, and use the same config on the same machine with the different kernels, then please mark the wpasupplicant task as "Invalid".

Jason Conti (jconti)
Changed in wpasupplicant (Ubuntu):
status: New → Invalid
Jason Conti (jconti)
Changed in linux (Ubuntu):
status: New → Confirmed
Jason Conti (jconti) wrote :

I have been gradually enabling and disabling kernel config options to move my config closer to the one for linux-image-2.6.38-7-generic and figure out where the wireless breaks. I have not yet managed to build a kernel in which the wireless doesn't work. However, today I enabled CONFIG_EXPERT and disabled CONFIG_CC_OPTIMIZE_FOR_SIZE, as in the Ubuntu kernel config, and I got the following error, which kills the build:

CC [M] net/mac80211/rc80211_minstrel_ht.o
net/mac80211/rc80211_minstrel_ht.c: In function ‘minstrel_ht_get_rate’:
net/mac80211/rc80211_minstrel_ht.c:629:1: error: unrecognizable insn:
(insn 490 430 491 14 net/mac80211/rc80211_minstrel_ht.c:317 (set (reg:SI 1 dx)
        (mem/c:QI (plus:SI (reg/f:SI 7 sp)
                (const_int 35 [0x23])) [0 %sfp+-13 S1 A8])) -1 (nil))
net/mac80211/rc80211_minstrel_ht.c:629:1: internal compiler error: in extract_insn, at recog.c:2104
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.5/README.Bugs> for instructions.
make[2]: *** [net/mac80211/rc80211_minstrel_ht.o] Error 1
make[1]: *** [net/mac80211] Error 2
make: *** [net] Error 2

I find this interesting, because mac80211 is used by the p54usb driver. I know nothing about the internals of gcc, but an unrecognizable insn error appears to be an optimization error. So I re-enabled CONFIG_CC_OPTIMIZE_FOR_SIZE and rebuilt the kernel; it compiled fine, booted, and the wireless still worked.

So perhaps an error with gcc 4.5 and optimizations in the mac80211 module? I don't know; I will keep investigating.

Jason Conti (jconti) wrote :

Okay, I just built the kernel source from the linux-source-2.6.38 package using config-2.6.38-7-generic from the linux-image-2.6.38-7-generic, only changing CONFIG_VERSION_SIGNATURE, booted the new kernel, and the wireless connected successfully, so I am stumped.

What could be the difference between the linux-image-2.6.38-7-generic package and building a kernel with the config from that package and the linux-source-2.6.38 package using:
make
make modules_install
cp .config /boot/config-2.6.38-jconti-09
cp System.map /boot/System.map-2.6.38-jconti-09
cp arch/x86/boot/bzImage /boot/vmlinuz-2.6.38-jconti-09
update-initramfs -k 2.6.38-jconti-09 -c

then booting the vmlinuz-2.6.38-jconti-09 kernel and initrd.img-2.6.38-jconti-09?

Jason Conti (jconti) wrote :

I built several kernels from the source for linux-image-2.6.38-7-generic, using various configs that previously worked when built with the method above or with make-kpkg. The resulting kernels failed to connect wirelessly when built the Ubuntu way:

cp /path/to/oldconfig debian.master/configs/i386/config.flavour.generic
debian/rules updateconfigs
debian/rules clean
fakeroot debian/rules binary-generic skipabi=true skipmodule=true no_dumpfile=true

I don't understand why, but every kernel I build using the above method goes into the associating 1,2,3 timed out, direct probe 1,2,3 timed out loop.

Jason Conti (jconti) wrote :

I think I may have tracked down the problem, although I'm not sure why it is a problem. After extensive testing, when I build a kernel passing CONFIG_DEBUG_SECTION_MISMATCH=y to make, as it is in the ubuntu-style build above, the kernel will fail to connect wirelessly with the p54usb module. Without the option, it connects immediately. Tested it by rebuilding linux-image-2.6.38-7-generic with a minimal config, and seems to be working so far. Going to try rebuilding with a full generic config, but that will take about 3 hours.

I will include a patch showing what I changed in the build script. I don't see why the option should cause any problems; it only appears to add more verbose warnings to the build log.

Jason Conti (jconti) wrote :

Tried with the full generic config and it worked as well. Leaning towards a gcc 4.5 bug again. It happens with any 2.6.38 kernel source (ubuntu patched or upstream), gcc 4.5 and CONFIG_DEBUG_SECTION_MISMATCH, which enables -fno-inline-functions-called-once.

Tested today building with the gcc-4.4 from the natty repos and CONFIG_DEBUG_SECTION_MISMATCH=y and the wireless connected without issue.
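
To illustrate what the flag changes, here is a minimal sketch that is not taken from the p54 driver. GCC normally considers a static function with a single caller for inlining; -fno-inline-functions-called-once keeps it as a separate call. The noinline attribute (a GCC/Clang extension) has a similar effect for one function. The names header_len and frame_len and the byte counts are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper with exactly one caller. GCC would normally inline it;
 * the noinline attribute keeps it out of line, similar to building with
 * -fno-inline-functions-called-once. */
static __attribute__((noinline)) size_t header_len(int has_qos)
{
    return has_qos ? 26 : 24;
}

/* The only caller of header_len(). Every value read here is initialized,
 * so the result must be the same whether or not the helper is inlined. */
size_t frame_len(size_t payload, int has_qos)
{
    assert(has_qos == 0 || has_qos == 1);
    return header_len(has_qos) + payload;
}
```

For well-defined code such as this, inlining decisions are invisible in the result. A difference in behavior between the two builds therefore hints at either a compiler bug or code that reads a value it never set.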

Jason Conti (jconti) wrote :

Narrowed it down to the p54common.ko module. If I build a kernel with CONFIG_DEBUG_SECTION_MISMATCH=y, install it, then rebuild just the p54 driver: make SUBDIRS=drivers/net/wireless/p54/ modules; and update just the p54common.ko module, the wireless will connect.

I also enabled mac80211 tracing to compare with and without the above option, and they are mostly identical except in the direct probe phase the driver is returning many instances of ENOMEM:

  wpa_supplicant-926 [000] 408.763132: drv_sw_scan_start: phy0
  wpa_supplicant-926 [000] 408.763133: drv_return_void: phy0
  wpa_supplicant-926 [000] 408.763134: drv_flush: phy0 drop:0
  wpa_supplicant-926 [000] 408.763135: drv_return_void: phy0
  wpa_supplicant-926 [000] 408.763136: drv_prepare_multicast: phy0 prepare mc (2)
  wpa_supplicant-926 [000] 408.763137: drv_return_u64: phy0 - 0
  wpa_supplicant-926 [000] 408.763138: drv_configure_filter: phy0 changed:0x10 total:0x80000010
  wpa_supplicant-926 [000] 408.763138: drv_return_void: phy0
     kworker/u:0-935 [000] 408.790633: drv_flush: phy0 drop:0
     kworker/u:0-935 [000] 408.790635: drv_return_void: phy0
     kworker/u:0-935 [000] 408.890579: drv_config: phy0 ch:0x40 freq:2412
     kworker/u:0-935 [000] 408.890580: drv_return_int: phy0 - -12
     kworker/u:0-935 [000] 408.890581: drv_config: phy0 ch:0x40 freq:2417
     kworker/u:0-935 [000] 408.890582: drv_return_int: phy0 - -12
     kworker/u:0-935 [000] 408.890582: drv_config: phy0 ch:0x40 freq:2422
     kworker/u:0-935 [000] 408.890583: drv_return_int: phy0 - -12

With the help of many printks, I tracked this down to the p54_scan function in fwio.c of the p54common module. It returns -ENOMEM because p54_alloc_skb returns NULL in its second branch: unlikely(skb_queue_len(&priv->tx_pending) > 64)

So the outgoing buffer appears to be full. The question is, why does this only happen when using gcc-4.5 and -fno-inline-functions-called-once?
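
The failure path described above can be sketched in a few lines. This is a simplified, self-contained model and not the p54 source: struct tx_queue, alloc_frame, start_scan and complete_frame are hypothetical names, and only the limit of 64 pending frames comes from the check quoted above. On Linux, ENOMEM has the value 12, which matches the -12 in the trace.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define TX_PENDING_LIMIT 64   /* limit from the quoted check */

/* Hypothetical model of a transmit queue: only the pending count matters. */
struct tx_queue {
    size_t pending;   /* frames queued but not yet completed */
};

/* Stand-in for an skb allocator: refuse once too many frames are pending. */
static void *alloc_frame(struct tx_queue *q, size_t len)
{
    void *frame;

    if (q->pending > TX_PENDING_LIMIT)
        return NULL;
    frame = malloc(len);
    if (frame != NULL)
        q->pending++;
    return frame;
}

/* Called when the (imaginary) hardware reports a frame as sent. */
void complete_frame(struct tx_queue *q)
{
    assert(q != NULL);
    if (q->pending > 0)
        q->pending--;
}

/* Stand-in for a scan request: a NULL allocation becomes -ENOMEM. */
int start_scan(struct tx_queue *q)
{
    void *frame;

    assert(q != NULL);
    frame = alloc_frame(q, 128);
    if (frame == NULL)
        return -ENOMEM;
    /* This sketch has no hardware to hand the frame to, so release the
     * memory. The frame still counts as pending until complete_frame(). */
    free(frame);
    return 0;
}
```

In this model, if completions never arrive, pending stays above the limit and every later scan request fails the same way, which resembles the repeated -12 results in the trace.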

Jason Conti (jconti)
Changed in gcc-4.5 (Ubuntu):
status: New → Invalid
Jason Conti (jconti) wrote :

Fixed, there was an uninitialized variable (extra_len) in p54_tx_80211, which is sometimes set in p54_tx_80211_header. However, the variable only appears to get corrupted when p54_tx_80211_header isn't inlined, which only seems to happen with gcc-4.5 and CONFIG_DEBUG_SECTION_MISMATCH=y (-fno-inline-functions-called-once). Without that option and gcc-4.5 the function is inlined, and the function is also inlined using gcc-4.4 and CONFIG_DEBUG_SECTION_MISMATCH=y. This seems to be what was causing the outgoing buffer to fill up.

Invalidating the gcc-4.5 target; if anything, gcc-4.5 is finally working correctly in this instance. I am also adding a patch that sets the variable to 0 where it is declared.
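
The bug pattern described above can be illustrated with a minimal sketch; it is not the driver code. fill_header, total_len_buggy and total_len_fixed are hypothetical names. Reading a truly uninitialized variable is undefined behavior in C, so the sketch passes the stale value in explicitly as stack_garbage to keep the example well defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes *extra_len on one code path only, like a
 * function that sets an out-parameter just for some frames. */
static void fill_header(int needs_padding, size_t *extra_len)
{
    assert(extra_len != NULL);
    if (needs_padding)
        *extra_len = 4;
    /* Otherwise *extra_len is left untouched. */
}

/* Models the bug: stack_garbage stands in for whatever an uninitialized
 * "size_t extra_len;" would happen to contain. */
size_t total_len_buggy(size_t payload, int needs_padding, size_t stack_garbage)
{
    size_t extra_len = stack_garbage;

    fill_header(needs_padding, &extra_len);
    return payload + extra_len;
}

/* Models the fix: initialize the variable where it is declared. */
size_t total_len_fixed(size_t payload, int needs_padding)
{
    size_t extra_len = 0;

    fill_header(needs_padding, &extra_len);
    return payload + extra_len;
}
```

When the helper does not write the out-parameter, the buggy variant returns a length that depends on the stale value, while the fixed variant always returns the payload length plus a known extra. Whether the stale value happens to be harmless can depend on stack layout, which is one reason such a bug may surface only under a particular compiler and inlining setting.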

Matthias Klumpp (ximion)
tags: added: patch regression-release
Stefan Bader (smb) wrote :

@Jason, the patch looks OK to me. Would you mind sending it upstream (<email address hidden>, Christian Lamparter <email address hidden>) so it gets proper review and upstream integration? Usually, once that happens and there is an upstream commit for it, it can flow back as an SRU.

Matthias Klumpp (ximion) wrote :

@Jason: Thanks for your great work! The patch works excellently here :)

Brad Figg (brad-figg)
tags: added: natty
Flames_in_Paradise (ellisistfroh-deactivatedaccount) wrote :

http://kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.38.5 - May 2 2011

commit c6fedb562695213645926348c9885dcc81324e4c

The Kernel is patched!

Herton R. Krzesinski (herton) wrote :

Yes, the patch from Jason went into 2.6.38.5 and is now in the natty tree, which was updated to 2.6.38.5.

A pre-proposed/proposed/update isn't available yet, but meanwhile I built current master-next natty tree with the fix and uploaded here:
http://people.canonical.com/~herton/lp736490/

Feel free to download and test it.

Matthias Klumpp (ximion) wrote :

@Herton: Works perfectly well! Thanks!
(Compiling the Kernel manually takes so much time...)

Brad Figg (brad-figg) wrote :

The patch was picked up via a stable upstream release commit.

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Ilya Minkov (ilya-minkov) wrote :

Still have this problem in 2.6.38-8 Ubuntu kernel

Ilya Minkov (ilya-minkov) wrote :

Herton's build resolves the problem for me on the i386 platform.

Julian Wiedmann (jwiedmann) wrote :

This patch is in the current -proposed kernel now (2.6.38-10.44).
