UDP checksum offload breaks DHCP on virtual machines

Bug #610391 reported by Neil Wilson
This bug affects 6 people
Affects            Status      Importance  Assigned to      Milestone
isc-dhcp (Ubuntu)  Invalid     Undecided   Unassigned
linux (Ubuntu)     Incomplete  Undecided   Andrew Woodward

Bug Description

When running an Ubuntu server as a guest on a RHEL6 host, dhclient in the Ubuntu guest reports 'bad udp checksum' in response to the DHCP Offer packets coming from the host.

This is on a libvirt-controlled KVM system using libvirt-controlled dnsmasq processes on the host.

Kernel optimisations (UDP checksum offload on the host's virtual interfaces) appear to be upsetting dhclient.

http://www.pubbs.net/201004/dhcp/30492-dhcpd-problem-on-virtual-machines.html

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-24-generic-pae (not installed)
Regression: Yes
Reproducible: Yes
ProcVersionSignature: User Name 2.6.32-24.38-generic-pae 2.6.32.15+drm33.5
Uname: Linux 2.6.32-24-generic-pae i686
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: i386
ArecordDevices: Error: [Errno 2] No such file or directory
CurrentDmesg:
 [ 13.240032] eth0: no IPv6 routers present
 [ 2868.734972] psmouse.c: Explorer Mouse at isa0060/serio1/input0 lost synchronization, throwing 1 bytes away.
Date: Tue Jul 27 09:33:31 2010
Lspci: Error: [Errno 2] No such file or directory
Lsusb: Error: [Errno 2] No such file or directory
MachineType: Red Hat KVM
PciMultimedia:

ProcCmdLine: root=UUID=129537b3-af8e-4e06-bd9d-06626fa40823 ro quiet splash
ProcEnviron:
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
SourcePackage: linux
dmi.bios.date: 01/01/2007
dmi.bios.vendor: Seabios
dmi.bios.version: 0.5.1
dmi.chassis.type: 1
dmi.chassis.vendor: Red Hat
dmi.modalias: dmi:bvnSeabios:bvr0.5.1:bd01/01/2007:svnRedHat:pnKVM:pvrRHEL6.0.0PC:cvnRedHat:ct1:cvr:
dmi.product.name: KVM
dmi.product.version: RHEL 6.0.0 PC
dmi.sys.vendor: Red Hat

Revision history for this message
In , Neil (neil-redhat-bugs) wrote :

Description of problem:

While running an Ubuntu Lucid virtual server under KVM on RHEL6, the DHCP client reports 'bad udp checksum' in response to DHCP offer packets coming from the libvirt-launched dnsmasq process on the host. Comparing RHEL5 to RHEL6 shows that tx checksumming is switched on across the 'vnet' interfaces on RHEL6.
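
To make the offload mechanism concrete: the UDP checksum (RFC 768) is a 16-bit one's complement sum over an IPv4 pseudo-header (source address, destination address, protocol, UDP length) plus the UDP header and payload. With TX checksum offload, Linux marks the packet CHECKSUM_PARTIAL, stores only the pseudo-header sum in the checksum field, and leaves the NIC to finish the job. dhclient reads packets from a raw socket and re-checks the sum in userspace, so a packet that reaches the guest before the sum was completed looks corrupt to it. The Python sketch below models that sequence. It is a simplified illustration, not kernel or dhclient code, and the addresses and payload are hypothetical placeholders.

```python
import struct

UDP_PROTO = 17


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry (odd length zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: bytes, dst_ip: bytes, udp_len: int) -> bytes:
    """IPv4 pseudo-header: src, dst, zero byte, protocol, UDP length."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, UDP_PROTO, udp_len)


def with_checksum(segment: bytes, value: int) -> bytes:
    """Copy of the UDP segment with bytes 6-7 (the checksum field) replaced."""
    return segment[:6] + struct.pack("!H", value) + segment[8:]


def offloaded_tx(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bytes:
    """Sender with TX offload: only the pseudo-header sum is placed in the field."""
    partial = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)))
    return with_checksum(segment, partial)


def checksum_fill(segment: bytes) -> bytes:
    """The step the NIC (or a 'fill' rule) performs: sum the segment, which
    already carries the pseudo-header sum, and store the complement."""
    csum = ~ones_complement_sum(segment) & 0xFFFF
    return with_checksum(segment, csum or 0xFFFF)  # RFC 768: send 0 as 0xFFFF


def receiver_accepts(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """Userspace check: field 0 means 'no checksum'; else the full sum must be 0xFFFF."""
    if segment[6:8] == b"\x00\x00":
        return True
    whole = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ones_complement_sum(whole) == 0xFFFF


# Hypothetical offer from 192.168.122.1:67 to 192.168.122.50:68.
src, dst = bytes([192, 168, 122, 1]), bytes([192, 168, 122, 50])
payload = b"hypothetical DHCPOFFER bytes"
segment = struct.pack("!HHHH", 67, 68, 8 + len(payload), 0) + payload

on_the_wire = offloaded_tx(src, dst, segment)
print(receiver_accepts(src, dst, on_the_wire))                 # False
print(receiver_accepts(src, dst, checksum_fill(on_the_wire)))  # True
```

The partially checksummed packet fails the userspace check, and the same packet passes once the sum has been completed. This is consistent with offload being switched on across the vnet interfaces.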

Version-Release number of selected component (if applicable):

kernel.x86_64 2.6.32-44.1.el6 @anaconda-RedHatEnterpriseLinux6-201004141252.x86_64/6.0

How reproducible:

Set up a virtual network handing out DHCP addresses and run a virtual machine running dhclient3. If you run dhclient3 in the console, you will see 'bad udp checksum in 5 packets' errors. tcpdump shows that the DHCP offer packets are reaching the virtual machine.
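
The error is a periodic summary rather than a per-packet message. The offers do reach the guest, but they are discarded when the checksum fails, so no lease is ever obtained. The toy Python model below shows this kind of rate-limited reporting. The threshold of 5 packets and the "more than half bad" rule are assumptions chosen for illustration, not taken from ISC dhclient's source.

```python
class BadChecksumCounter:
    """Toy model of a receiver that drops UDP packets whose checksum fails and
    logs a summary instead of one line per packet. The threshold and the
    'more than half bad' rule are illustrative assumptions."""

    def __init__(self, min_packets: int = 5) -> None:
        self.min_packets = min_packets
        self.seen = 0
        self.bad = 0
        self.log: list[str] = []

    def on_packet(self, checksum_ok: bool) -> bool:
        """Return True if the packet would be passed on to the DHCP state machine."""
        self.seen += 1
        if checksum_ok:
            return True
        self.bad += 1
        if self.seen >= self.min_packets and 2 * self.bad > self.seen:
            self.log.append(f"{self.bad} bad udp checksums in {self.seen} packets")
            self.seen = self.bad = 0  # start a new reporting window
        return False


# Five offers in a row, all arriving with unfinished checksums:
rx = BadChecksumCounter()
accepted = [rx.on_packet(False) for _ in range(5)]
print(accepted)  # [False, False, False, False, False]
print(rx.log)    # ['5 bad udp checksums in 5 packets']
```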

Steps to Reproduce:
1.
2.
3.

Actual results:

'bad udp checksum in 5 packets'

Expected results:

The virtual machine picks up its IP address as normal.

Additional info:

http://www.pubbs.net/201004/dhcp/30492-dhcpd-problem-on-virtual-machines.html

Revision history for this message
Neil Wilson (neil-aldur) wrote :
Revision history for this message
In , Neil (neil-redhat-bugs) wrote :
Revision history for this message
In , RHEL (rhel-redhat-bugs) wrote :

This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Revision history for this message
In , Dor (dor-redhat-bugs) wrote :

We solved that using netfilter. Michael, please find the bug number to point to and close this bug.
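
The thread does not record which netfilter change was used, and bug 605555 is not visible. One netfilter mechanism aimed at exactly this symptom is the iptables CHECKSUM target: its --checksum-fill option computes the checksum for a packet that still carries only a partial one. The Python sketch below just builds the command line for such a rule. Applying it to DHCP replies (UDP destination port 68) leaving through the virtual network's bridge is an assumption about what a fix could look like, not the content of bug 605555. The rule also needs a kernel and an iptables version that support the CHECKSUM target.

```python
import shlex


def checksum_fill_rule(bridge: str) -> list[str]:
    """argv for an iptables rule that fills in the UDP checksum of DHCP
    replies (destination port 68) leaving the host through `bridge`."""
    return [
        "iptables", "-t", "mangle", "-A", "POSTROUTING",
        "-o", bridge,
        "-p", "udp", "--dport", "68",
        "-j", "CHECKSUM", "--checksum-fill",
    ]


# virbr0 is libvirt's default bridge name, used here only as an example.
rule = checksum_fill_rule("virbr0")
print(shlex.join(rule))
# iptables -t mangle -A POSTROUTING -o virbr0 -p udp --dport 68 -j CHECKSUM --checksum-fill
# To apply it (requires root): subprocess.run(rule, check=True)
```

Matching on the bridge and on port 68 limits the extra checksum work to DHCP traffic towards guests, rather than every packet the host forwards.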

Revision history for this message
In , Michael (michael-redhat-bugs) wrote :

*** This bug has been marked as a duplicate of bug 605555 ***

Revision history for this message
In , Neil (neil-redhat-bugs) wrote :

I can't see that bug. What is the fix?

Revision history for this message
In , Neil (neil-redhat-bugs) wrote :

Verified the bug with libvirt-0.8.1-15.el6.x86_64, kernel-2.6.32-52.el6.x86_64 and iptables-1.4.7-3.el6.x86_64, and PASSED

Revision history for this message
In , Michael (michael-redhat-bugs) wrote :

Change to verified?

Revision history for this message
In , Dor (dor-redhat-bugs) wrote :

*** This bug has been marked as a duplicate of bug 605555 ***

Neil Wilson (neil-aldur)
affects: dhcp (Ubuntu) → isc-dhcp (Ubuntu)
Revision history for this message
Neil Wilson (neil-aldur) wrote :
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in isc-dhcp (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Stefan Bader (smb) wrote :

This sounds somewhat familiar, as the Novell bug discusses the same thing, although back then it was related to a Xen installation. I have not heard of it recently, but it might have gone unnoticed or been fixed. It sounds like the KVM PV driver may behave similarly to netfront and need a similar change, but I would need to investigate a bit more.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Can't see any mention of an actual isc-dhcp bug in there. Marking isc-dhcp task invalid.

Changed in isc-dhcp (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
penalvch (penalvch) wrote :

Neil Wilson, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11-rc5

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

If you are unable to test the mainline kernel, please comment as to why specifically you were unable to test it and add the following tags:
kernel-unable-to-test-upstream
kernel-unable-to-test-upstream-VERSION-NUMBER

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Andrew Woodward (xarses)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Andrew Woodward (xarses) wrote :

I'm reopening this bug, as I just beat my head against it.

I've been able to reproduce this in

Ubuntu 12.04 LTS 3.8.0-31-generic

Running dnsmasq on CentOS 6.4

I'll update the details here in a few days. I'm assigning it to myself to track it.

Changed in linux (Ubuntu):
assignee: nobody → Andrew Woodward (xarses)
penalvch (penalvch)
tags: added: needs-bisect raring
Revision history for this message
Andrew Woodward (xarses) wrote :

OK, here is a breakdown of what I was able to find.

DHCP server: dnsmasq (2.65.5.el6) on CentOS 6.4, running as a KVM guest with virtio
+ Also running Cobbler with some preseed-building scripts based on 12.04 LTS

node-1: a bare guest that PXE boots from the DHCP server, on its own Linux bridge (node_1_net) in routed mode; guest NIC is virtio
node-2: a bare guest that PXE boots from the DHCP server, on its own Linux bridge (node_2_net) in routed mode; guest NIC is e1000
node-3: a bare guest that PXE boots from the DHCP server, on the same Linux bridge as the DHCP server; guest NIC is virtio

On the KVM host I have stopped dnsmasq (which starts automatically) on default, node_1_net and node_2_net.
On the KVM host I have started dhcp-helper -i node_1_net -i node_2_net -s <DHCP server IP> (dhcp-helper 1.1)

When booting node-1, it PXE boots just fine, but when running the installer the node aborts while attempting to request an address from DHCP; running dhclient by hand yields the magical "bad udp checksum" message.

When booting node-2, it boots and installs just fine; the only change is the NIC driver.
When booting node-3, it boots and installs just fine, but in this case the DHCP requests aren't relayed through dhcp-helper.

So it appears that some drivers won't perform the UDP checksum when some DHCP servers/relays are involved in the conversation.
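
The conclusion above comes from comparing three setups. The small Python sketch below encodes those three observations exactly as reported and tests which candidate explanation separates the failing case from the working ones. With only three data points this narrows the search; it does not prove a cause.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    node: str
    nic: str
    relayed: bool  # DHCP traffic forwarded by dhcp-helper on the KVM host
    dhcp_ok: bool  # installer obtained a lease


# Observations as reported in this comment.
CASES = [
    Case("node-1", "virtio", relayed=True, dhcp_ok=False),
    Case("node-2", "e1000", relayed=True, dhcp_ok=True),
    Case("node-3", "virtio", relayed=False, dhcp_ok=True),
]


def explains(predicate) -> bool:
    """True if the predicate holds for exactly the failing cases."""
    return all(predicate(c) == (not c.dhcp_ok) for c in CASES)


print(explains(lambda c: c.nic == "virtio"))                # False: node-3 works
print(explains(lambda c: c.relayed))                        # False: node-2 works
print(explains(lambda c: c.nic == "virtio" and c.relayed))  # True
```

Neither the NIC model nor the relay alone accounts for the failure. Only the combination of a virtio guest NIC and relayed DHCP does, which matches the offload explanation: the relayed reply is forwarded by the host and can reach the virtio guest with its UDP checksum still unfinished.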

I will try to test the same using a 13.10 loader and report back

Changed in linux:
importance: Unknown → Critical
status: Unknown → Invalid
penalvch (penalvch)
no longer affects: linux (Ubuntu)
affects: linux → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Critical → Undecided
status: Invalid → New
Revision history for this message
penalvch (penalvch) wrote :

Andrew Woodward, the Ubuntu versions that you and the original reporter tested are EOL as per https://wiki.ubuntu.com/Releases .

Is this issue reproducible in a supported release?

Also, do you still consider yourself assigned to this problem (i.e. you will soon be providing a patch to fix this issue)?

Changed in linux (Ubuntu):
assignee: nobody → Andrew Woodward (xarses)
status: New → Incomplete