Comment 7 for bug 181258

Anna Jonna Armannsdottir (annaj) wrote :

I can report the same problem. A number of diskless clients go flawlessly through a PXE boot, but the network interface is brought down when the Linux kernel boots, and suddenly DHCP does not work anymore.

Background research on the network: the clients are connected to Cisco switches. The switches are configured to run spanning-tree calculations. This usually takes a few seconds, but on large networks it can take 30 to 40 seconds.
The network port on the switch is turned on only once this calculation has finished. This is primarily a security measure.
See also: http://networkers-online.com/blog/2008/08/what-is-bpdu-filter/
If this security measure is turned off, the client boots without problems. :)

To debug the problem, the following boot line was used:

  kernel amd64/vmlinuz
  append ro initrd=amd64/initrd.img nbdport=2000 debug=100 break=1 ip=dhcp

In my case, this causes a timeout, as can be seen in the attached screenshots. The kernel loads the NIC module e1000e and reports the type of NIC etc. The kernel time reported at that point is 5.927.

The debug shows the following:

configure_networking
[ -n eth0 ]
 [-e /tmp/net-eth0.conf ]
ipconfig -t 60 eth0

At about 6.4 in kernel time, the NIC module reports further details about its IRQ configuration.
About two seconds later, at about 8.56, the NIC module reports that the link is up at 100 Mbps full duplex.
Some milliseconds later it signals ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready.

At this point in time, ipconfig is just waiting for an answer to a DHCP discover that was actually dropped by the switch.
This is an obvious flaw in the script. It would be much more reasonable to retry repeatedly with increasing timeouts.

May I suggest the following timeouts (in seconds):
4
8
16
30
60

The sum of these timeouts is 118 seconds, so about two minutes in the worst case.
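
To make the suggestion concrete, here is a rough sketch of what such a retry loop might look like. It is not the actual initramfs script: the function name retry_dhcp and the variables IPCONFIG, DEVICE and CONF are made up for illustration. The only parts taken from the debug output above are the "ipconfig -t <seconds> <device>" call and the check for /tmp/net-eth0.conf.

```shell
#!/bin/sh
# Sketch only: retry DHCP with growing timeouts instead of one 60-second attempt.
# IPCONFIG, DEVICE and CONF are illustrative names, not taken from the real script.
IPCONFIG=${IPCONFIG:-ipconfig}
DEVICE=${DEVICE:-eth0}
CONF=${CONF:-/tmp/net-${DEVICE}.conf}

retry_dhcp() {
    for timeout in 4 8 16 30 60; do
        # Same call as in the debug trace, but with a shorter timeout at first.
        "$IPCONFIG" -t "$timeout" "$DEVICE"
        # Stop as soon as the lease file shown in the trace exists.
        if [ -e "$CONF" ]; then
            return 0
        fi
    done
    # All five attempts failed.
    return 1
}
```

The early attempts are short, so a client on a port that is already forwarding loses almost no time, while a client behind a slow spanning-tree port gets fresh DHCP discovers sent after the port has come up.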