fails to configure BOOTIF when using iscsi

Bug #2056187 reported by Alexsander de Souza
This bug affects 2 people
Affects                     Status         Importance  Assigned to  Milestone
initramfs-tools (Ubuntu)    Fix Released   High        Unassigned
  Focal                     Fix Committed  Undecided   Unassigned
  Jammy                     Fix Committed  Undecided   Unassigned
open-iscsi (Ubuntu)         Invalid        Undecided   Unassigned
  Focal                     New            Undecided   Unassigned
  Jammy                     New            Undecided   Unassigned

Bug Description

[ Impact ]

 * MAAS cannot PXE-boot a machine that has iSCSI disks

 * Focal is the default Ubuntu distribution deployed by MAAS, so we should
   back-port this to ensure it works out-of-the-box.

[ Test Plan ]

 * reproducing this issue requires a machine with iSCSI disks (Cisco UCS Manager
 in the original report), and a MAAS controller (3.4 or later)

 * the issue can be observed by simply enlisting the machine in MAAS. It will
 fail to boot due to the missing BOOTIF configuration.

[ Where problems could occur ]

 * the problematic code was an attempt to fix LP#2037202, so we should watch out
 for regressions.

[ Original report ]

We have a bad interaction between initramfs-tools and open-iscsi that leaves the boot interface unconfigured.

When the iSCSI interface has a static address, the `local-top/iscsi` script from open-iscsi creates a /run/net-$DEVICE.conf file for that interface. Because this file exists, configure_networking() later skips configuring the BOOTIF, due to this code in `scripts/functions`:

        for x in /run/net-"${DEVICE}".conf /run/net-*.conf ; do
            if [ -e "$x" ]; then
                IP=done
                break
            fi
        done
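
To make the failure mode concrete, here is a minimal sketch; it is not the uploaded patch. It contrasts the wildcard check above with a stricter variant that, when a device is named, accepts only that device's own file. `RUN_DIR` and both function names are hypothetical, introduced only so the logic can be tried outside an initramfs.

```shell
#!/bin/sh
# Sketch only: RUN_DIR and both function names are hypothetical.
RUN_DIR="${RUN_DIR:-/run}"

# Behaviour described in the report: any net-*.conf file counts as "done",
# even if it belongs to a different interface.
networking_done_any() {
    device="$1"
    for x in "${RUN_DIR}/net-${device}.conf" "${RUN_DIR}"/net-*.conf; do
        if [ -e "$x" ]; then
            return 0
        fi
    done
    return 1
}

# Stricter variant: when a device is named, only its own file counts.
# Without a device name, fall back to accepting any net-*.conf file.
networking_done_for_device() {
    device="$1"
    if [ -n "$device" ]; then
        [ -e "${RUN_DIR}/net-${device}.conf" ]
        return
    fi
    for x in "${RUN_DIR}"/net-*.conf; do
        if [ -e "$x" ]; then
            return 0
        fi
    done
    return 1
}
```

With only net-enp6s0.conf present, `networking_done_any enp12s0` succeeds (so the BOOTIF would be skipped), while `networking_done_for_device enp12s0` fails until net-enp12s0.conf exists.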

Alexsander de Souza (alexsander-souza) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in initramfs-tools (Ubuntu):
status: New → Confirmed
Changed in open-iscsi (Ubuntu):
status: New → Confirmed
Benjamin Drung (bdrung) wrote :

Commit 4e7e25c4f ("configure_networking: move the "are we done" check to end of loop body"), released in 0.142ubuntu15, allows this bug to appear. That commit was meant to fix bug #2037202.

Can you provide the content of /proc/cmdline?

Alexsander de Souza (alexsander-souza) wrote :

BOOT_IMAGE=http://10.107.72.10:5248/images/ubuntu/amd64/hwe-22.04/jammy/stable/boot-kernel nomodeset ro root=squash:http://10.107.72.10:5248/images/ubuntu/amd64/hwe-22.04/jammy/stable/squashfs ip=::::crack-gnu:BOOTIF:dhcp ip6=off overlayroot=tmpfs overlayroot_cfgdisk=disabled cc:{'datasource_list': ['MAAS']}end_cc cloud-config-url=http://10.107.72.10:5248/MAAS/metadata/latest/by-id/ky77c8/?op=get_preseed log_host=10.107.72.10 log_port=5247 --- initrd=http://10.107.72.10:5248/images/ubuntu/amd64/hwe-22.04/jammy/stable/boot-initrd BOOTIF=01-00-25-b5-00-00-00
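
For readers unfamiliar with the `ip=` parameter in this command line: the kernel documents its leading fields as client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf. A small sketch (the function name `parse_ip_param` is made up for illustration) shows which field is which:

```shell
#!/bin/sh
# Sketch only: parse_ip_param is a hypothetical helper, not part of
# initramfs-tools. It splits an ip= value on colons and prints three fields.
parse_ip_param() {
    IFS=: read -r client server gw mask host dev proto _rest <<EOF
$1
EOF
    printf 'hostname=%s device=%s autoconf=%s\n' "$host" "$dev" "$proto"
}

parse_ip_param '::::crack-gnu:BOOTIF:dhcp'
```

For the value above, the device field holds the literal string BOOTIF and the autoconf field is dhcp; the first four fields are empty.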

Alexsander de Souza (alexsander-souza) wrote :

network interfaces:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp6s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000\ link/ether 00:25:b5:21:a5:2c brd ff:ff:ff:ff:ff:ff
3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000\ link/ether 00:25:b5:21:b5:2c brd ff:ff:ff:ff:ff:ff
4: enp12s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000\ link/ether 00:25:b5:00:00:00 brd ff:ff:ff:ff:ff:ff

enp6s0 and enp7s0 are iscsi interfaces

Benjamin Drung (bdrung) wrote :

So BOOTIF is set which should translate to DEVICE being set. Let me come up with a patch.
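
The translation from BOOTIF to DEVICE can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from `scripts/functions`: both function names are hypothetical, and the second argument of `find_iface_by_mac` is a stand-in for /sys/class/net so the lookup can be tried on a scratch directory. It also relies on `tr`, which may not be present in a minimal initramfs.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# BOOTIF looks like "01-00-25-b5-00-00-00": a hardware-type field ("01"
# for Ethernet) followed by the MAC address with dashes as separators.
bootif_to_mac() {
    # Drop everything up to and including the first dash...
    mac="${1#*-}"
    # ...then turn the remaining dashes into colons.
    printf '%s\n' "$mac" | tr '-' ':'
}

# Print the name of the interface whose address file matches the MAC.
find_iface_by_mac() {
    want="$1"
    sys_net="${2:-/sys/class/net}"
    for dev in "$sys_net"/*; do
        [ -f "$dev/address" ] || continue
        if [ "$(cat "$dev/address")" = "$want" ]; then
            printf '%s\n' "${dev##*/}"
            return 0
        fi
    done
    return 1
}
```

For the values in this report, `bootif_to_mac 01-00-25-b5-00-00-00` yields 00:25:b5:00:00:00, which is the address listed for enp12s0 above.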

Benjamin Drung (bdrung) wrote :

Please test the attached patch. (My initial patch was shorter: it used grep, but grep is only available in the initramfs when busybox is used.)
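
The patch itself is attached to the bug and is not reproduced here. As a general illustration of doing without grep in POSIX sh, a line-prefix match can be written with `read` and `case`. The function name and the example file content are assumptions made for this sketch:

```shell
#!/bin/sh
# Sketch only: file_has_line_prefix is a hypothetical helper.
# Succeeds if any line of FILE starts with PREFIX, using shell builtins only.
file_has_line_prefix() {
    prefix="$1"
    file="$2"
    while IFS= read -r line; do
        case "$line" in
            "$prefix"*) return 0 ;;
        esac
    done < "$file"
    return 1
}

# Example (assumed file content): does the file mention this device?
# file_has_line_prefix "DEVICE='enp12s0'" /run/net-enp12s0.conf
```

Quoting `"$prefix"` inside the `case` pattern keeps it literal, so characters such as `*` or `?` in the prefix are not treated as wildcards.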

Changed in initramfs-tools (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
Changed in open-iscsi (Ubuntu):
status: Confirmed → Invalid
tags: added: patch
Fabio Augusto Miranda Martins (fabio.martins) wrote :

I've tested launching an Oracle Cloud bare-metal instance (which boots from iSCSI) using this patch, and everything worked well:

https://pastebin.ubuntu.com/p/3cdFdYBVFG/

Benjamin Drung (bdrung)
Changed in initramfs-tools (Ubuntu):
status: Triaged → Fix Committed
Fabio Augusto Miranda Martins (fabio.martins) wrote :

It seems 0.142ubuntu19 would be exposed to the bug. To check whether my use case is affected, I booted a Jammy instance with initramfs-tools 0.142ubuntu19, and I can't hit the problem: it boots fine through iSCSI. My cmdline is below, so my setup might not be exposed to the issue (hence, please disregard my comment above as a valid test for the patch, given that I wasn't exposed to the bug to begin with):

BOOT_IMAGE=/boot/vmlinuz-6.5.0-1018-oracle root=UUID=2792ceda-655f-4995-bf29-6a714f9a200b ro console=tty1 console=ttyS0 nvme.shutdown_timeout=10 libiscsi.debug_libiscsi_eh=1 crash_kexec_post_notifiers

Victor Sartori (victor-sartori) wrote :

Hello all,

I'm currently testing the patch by generating an initrd image for MAAS, with partial success.

The fix for the network is working well. However, when attempting to start a server via PXE using MAAS, strange errors are raised:

Warning: Type of the root file system is unknown, so skipping the check.
mount: bad address 'squash'

This leads to an initramfs prompt.

Here are the steps I follow to generate an initramfs:

1. Apply the patch to the script based on /usr/share/initramfs-tools/scripts.
2. Run the mkinitramfs command.
3. Copy the new initrd file to MAAS.
4. Start the server to boot by network.

I couldn't find clear documentation regarding generation and troubleshooting. Additionally, I'm unsure if this is the right place to ask questions about this issue.

Alexsander de Souza (alexsander-souza) wrote :

Did you execute this procedure on your host, or in a VM deployed by MAAS? If the former, I think your initrd lacks support for netbooting from squashfs images.

Anyway, if you can confirm that we got past the original issue, I think that's enough to merge this.
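
The `mount: bad address 'squash'` error is consistent with that: the MAAS command line earlier in this report uses `root=squash:http://...`, which plain mount cannot interpret. A hedged sketch for checking what kind of root a command line asks for (both function names are hypothetical, and the classification is illustrative only):

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical.

# Print the value of the first root= argument in a kernel command line.
# The body runs in a subshell so "set -f" does not leak to the caller.
root_from_cmdline() (
    set -f   # no globbing while word-splitting the command line
    for arg in $1; do
        case "$arg" in
            root=*)
                printf '%s\n' "${arg#root=}"
                exit 0
                ;;
        esac
    done
    exit 1
)

# Roughly classify a root= value.
root_kind() {
    case "$1" in
        squash:http://*|squash:https://*) echo squashfs-url ;;
        UUID=*) echo uuid ;;
        *) echo other ;;
    esac
}

# Example: root_kind "$(root_from_cmdline "$(cat /proc/cmdline)")"
```

If the result is squashfs-url, the initrd needs whatever hook handles that scheme; an initrd built on an ordinary host may not include it.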

Victor Sartori (victor-sartori) wrote :

Bingo! I'm running this on my host.

Yes, the problem was solved.

Victor Sartori (victor-sartori) wrote :

Definitely, I'm doing something very wrong.

I deployed a new VM (VMware) through MAAS, using the original initrd.

I then applied the patch and repeated the steps from my comment above.

This time, the keyboard does not work (seems to be stuck; I'm not able to scroll up to see what happens).

Alexsander de Souza (alexsander-souza) wrote :

[ Impact ]

 * MAAS cannot PXE-boot a machine that has iSCSI disks

 * Focal is the default Ubuntu distribution deployed by MAAS, so we should
   back-port this to ensure it works out-of-the-box.

[ Test Plan ]

 * reproducing this issue requires a machine with iSCSI disks (Cisco UCS Manager
 in the original report), and a MAAS controller (3.4 or later)

 * the issue can be observed by simply enlisting the machine in MAAS. It will
 fail to boot due to the missing BOOTIF configuration.

[ Where problems could occur ]

 * the problematic code was an attempt to fix LP#2037202, so we should watch out
 for regressions.

Victor Sartori (victor-sartori) wrote :

Something I forgot to mention, and it may be relevant after seeing the comment above: I'm using the HWE kernel for Jammy.

Is it better to roll back to Focal with the default kernel?

Alexsander de Souza (alexsander-souza) wrote :

I think you can continue with your current kernel. We should back-port this to all LTS releases.

Victor Sartori (victor-sartori) wrote :

nice!

Benjamin Drung (bdrung) wrote :

Attached the uploaded diff for jammy. Besides the few lines changed in scripts/functions, I backported the autopkgtest improvements to increase the test coverage and to test on all architectures. See https://code.launchpad.net/~ubuntu-core-dev/ubuntu/+source/initramfs-tools/+git/initramfs-tools/+ref/ubuntu/jammy for details of the individual commits. I tested the autopkgtest changes in my PPA.

description: updated
Benjamin Drung (bdrung) wrote :

Updated jammy patch to include the increased timeout on arm64/armhf from the upcoming 0.142ubuntu23 release.

Benjamin Drung (bdrung) wrote :

And here is the diff for focal (which I also tested in my PPA).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package initramfs-tools - 0.142ubuntu23

---------------
initramfs-tools (0.142ubuntu23) noble; urgency=medium

  [ Daniel van Vugt ]
  * hooks/framebuffer: Only add simple/tiny framebuffer drivers. This is to
    limit the size of initrd when FRAMEBUFFER=y is soon enabled for desktop
    installations (LP: #1970069, #1869655).

  [ Benjamin Drung ]
  * autopkgtest: Increase QEMU timeouts on arm64/armhf
  * hooks/framebuffer:
    - Move adding framebuffer drivers into auto_add_modules
    - Drop looking in $MODULESDIR/initrd/ for kernel modules
    - Support MODULES=dep in framebuffer hook

initramfs-tools (0.142ubuntu22) noble; urgency=medium

  * autopkgtest: update systemd-udevd path from /lib to /usr/lib

initramfs-tools (0.142ubuntu21) noble; urgency=medium

  [ Benjamin Drung ]
  * configure_networking:
    - Increase minimum timeout to 30 seconds
    - Fix configuring BOOTIF when using iSCSI (LP: #2056187)
    - Set interface MTU if provided by the DHCP server (LP: #2056194)
    - log sleep durations before retries
  * Copy /etc/passwd into the initramfs to allow dhcpcd running as dhcpcd user
  * Replace obsolete pkg-config build-dependency by pkgconf

  [ Dan Bungert ]
  * Restore nvdimm and dax pmem-related modules (LP: #1981385)

 -- Benjamin Drung <email address hidden> Thu, 21 Mar 2024 10:57:54 +0100

Changed in initramfs-tools (Ubuntu):
status: Fix Committed → Fix Released
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Alexsander, or anyone else affected,

Accepted initramfs-tools into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/initramfs-tools/0.140ubuntu13.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in initramfs-tools (Ubuntu Jammy):
status: New → Fix Committed
tags: added: verification-needed verification-needed-jammy
Chris Halse Rogers (raof) wrote :

Hello Alexsander, or anyone else affected,

Accepted initramfs-tools into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/initramfs-tools/0.136ubuntu6.8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in initramfs-tools (Ubuntu Focal):
status: New → Fix Committed
tags: added: verification-needed-focal
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (initramfs-tools/0.140ubuntu13.5)

All autopkgtests for the newly accepted initramfs-tools (0.140ubuntu13.5) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

clevis/18-1ubuntu1 (arm64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#initramfs-tools

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Benjamin Drung (bdrung) wrote :

clevis/18-1ubuntu1 ran into a timeout. I retried the test.
