Test suite often fails with "systemd units changed without reload" on s390x

Bug #2048388 reported by Olivier Gayot
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
Netplan              Invalid       Undecided   Unassigned
netplan.io (Ubuntu)  Fix Released  Medium      Unassigned
wpa (Ubuntu)         New           Undecided   Unassigned

Bug Description

The "ethernets" autopkgtest for netplan.io 0.107-5ubuntu2 on s390x often fails with

AssertionError: systemd units changed without reload

Looking at the history of autopkgtest runs, it looks like the error does not always occur during execution of a specific test. I've seen occurrences of this error during the following test cases:

test_dhcp6 (__main__.TestNetworkd.test_dhcp6) ... FAIL
test_link_local_ipv4 (__main__.TestNetworkd.test_link_local_ipv4) ... FAIL
test_eth_mtu (__main__.TestNetworkd.test_eth_mtu) ... FAIL

Example [1]:

781s FAIL: test_dhcp6 (__main__.TestNetworkd.test_dhcp6)
781s ----------------------------------------------------------------------
781s Traceback (most recent call last):
781s File "/tmp/autopkgtest.G0qQU0/build.Snp/src/tests/integration/ethernets.py", line 189, in test_dhcp6
781s self.generate_and_settle([self.state_dhcp6(self.dev_e_client)])
781s File "/tmp/autopkgtest.G0qQU0/build.Snp/src/tests/integration/base.py", line 342, in generate_and_settle
781s self.fail('systemd units changed without reload')
781s AssertionError: systemd units changed without reload

[1] https://autopkgtest.ubuntu.com/results/autopkgtest-noble/noble/s390x/n/netplan.io/20240105_144627_cb35e@/log.gz

Olivier Gayot (ogayot)
description: updated
Lukas Märdian (slyon) wrote :

I assume this might be due to a systemd package upgrade/install happening as part of the same autopkgtest? In that case, unit files would be changed unexpectedly.

Danilo Egea Gondolfo (danilogondolfo) wrote :

There are tests also failing on Debian https://ci.debian.net/data/autopkgtest/testing/s390x/n/netplan.io/41584490/log.gz

These seem to be caused by problems in the VM or host

992s test_per_route_advertised_receive_window (__main__.TestNetworkManager.test_per_route_advertised_receive_window) ... eth42 .ok
992s test_per_route_congestion_window (__main__.TestNetworkManager.test_per_route_congestion_window) ... RTNETLINK answers: Cannot allocate memory
992s ERROR
992s test_per_route_mtu (__main__.TestNetworkManager.test_per_route_mtu) ... ERROR
992s test_route_from (__main__.TestNetworkManager.test_route_from) ... ERROR
992s test_route_on_link (__main__.TestNetworkManager.test_route_on_link) ... ERROR
992s test_route_table (__main__.TestNetworkManager.test_route_table) ... ERROR

Lukas Märdian (slyon)
Changed in netplan.io (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in netplan:
status: New → Invalid
Olivier Gayot (ogayot) wrote :

I did a set of retries in PPA with additional information printed when a test fails.
I haven't yet figured out why, but it looks like the tests do not fail (or fail with a different error) when the trigger=wpa/2:2.10-21 is absent.

http://autopkgtest-ppas.sigexec.com/ogayot/noble-proposed/netplan.io/noble/s390x

When it fails, it looks like the netplan-ovs-cleanup.service is the culprit.

1310s FAIL: test_bridge_path_cost (__main__.TestNetworkd.test_bridge_path_cost)
1310s ----------------------------------------------------------------------
1310s Traceback (most recent call last):
1310s File "/tmp/autopkgtest.PXEHeX/build.QOR/src/tests/integration/bridges.py", line 82, in test_bridge_path_cost
1310s self.generate_and_settle([self.dev_e2_client, self.state_dhcp4('mybr')])
1310s File "/tmp/autopkgtest.PXEHeX/build.QOR/src/tests/integration/base.py", line 373, in generate_and_settle
1310s self.fail(f'systemd units changed without reload: {units}')
1310s AssertionError: systemd units changed without reload: ['netplan-ovs-cleanup.service']
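
For reference, a unit list like the one in the message above can be extracted from the warning that systemctl prints when a unit file changed on disk. This is only a minimal sketch, not the code in tests/integration/base.py: the helper name changed_units is hypothetical, and the warning wording is assumed to match the installed systemd version.

```python
import re
from typing import List

# Warning printed by systemctl when a unit changed on disk since the last
# daemon-reload. The exact wording is an assumption about the systemd version.
_RELOAD_WARNING = re.compile(
    r"drop-ins of (\S+) changed on disk\. "
    r"Run 'systemctl daemon-reload' to reload units\."
)


def changed_units(output: str) -> List[str]:
    """Return the unit names named in daemon-reload warnings, in order."""
    return _RELOAD_WARNING.findall(output)
```

Passing the combined stdout/stderr of the command under test to changed_units() gives the names to print in the assertion message.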

Olivier Gayot (ogayot) wrote :

Added wpa as affected so this bug can show in the proposed migrations report.

tags: added: update-excuse
Olivier Gayot (ogayot) wrote :

After further analysis, it looks like the trigger=wpa/2:2.10-21 greatly increases the likelihood of the error occurring.

At this point, I do not think that there is any regression in wpa 2:2.10-21 compared to 2:2.10-20. The changeset is minimal. OTOH I think the trigger=wpa/... alters the order of events and thereby makes the race condition more likely to occur.

FWIW, with a trigger=wpa/..., wpasupplicant gets upgraded before running the ethernets test:

 96s Unpacking wpasupplicant (2:2.10-21) over (2:2.10-20) ...

https://autopkgtest.ubuntu.com/results/autopkgtest-noble-ogayot-noble-proposed//noble/s390x/n/netplan.io/20240109_171235_03058@/log.gz

Whereas without the trigger, it gets installed:

127s Unpacking wpasupplicant (2:2.10-21) ...

https://autopkgtest.ubuntu.com/results/autopkgtest-noble-ogayot-noble-proposed//noble/s390x/n/netplan.io/20240109_162248_f656d@/log.gz
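
To compare many logs this way, the two "Unpacking" forms can be told apart mechanically. A rough sketch, assuming the dpkg line format shown in the excerpts above; the function name classify_unpack is made up for illustration:

```python
import re
from typing import Optional, Tuple

# dpkg prints "Unpacking PKG (NEW) over (OLD) ..." for an upgrade and
# "Unpacking PKG (NEW) ..." for a fresh install.
_UNPACK = re.compile(
    r"Unpacking (\S+) \(([^)]+)\)(?: over \(([^)]+)\))? \.\.\."
)


def classify_unpack(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (package, action, new_version), or None for unrelated lines."""
    match = _UNPACK.search(line)
    if match is None:
        return None
    package, new_version, old_version = match.groups()
    action = "upgrade" if old_version else "install"
    return (package, action, new_version)
```

The leading timestamp (for example "96s") is ignored because the pattern is searched anywhere in the line.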

Lukas Märdian (slyon) wrote :

IMO the wpasupplicant trigger is a red herring, as we see the same failure on a trigger=systemd/255... test case:

https://autopkgtest.ubuntu.com/results/autopkgtest-noble/noble/s390x/n/netplan.io/20240104_111341_511f7@/log.gz

Olivier Gayot (ogayot) wrote :

The last two autopkgtest runs in the infrastructure are green despite the trigger and they both show:

258s Unpacking wpasupplicant (2:2.10-21) ...

Lukas Märdian (slyon) wrote :

This patch seems to resolve the situation locally.

For now I'll only go with the changes to tests/integration/base.py, though, as that should be enough to avoid test failures, while the service units shouldn't change (besides being re-generated 1:1).

I want to better understand what's going on exactly and why the units are re-generated multiple times, before applying changes to netplan_cli/cli/commands/apply.py
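
To illustrate what a less strict check in a test helper could look like: the sketch below is not the attached patch. The "netplan-" prefix filter, the helper name unexpected_units, and the assumed systemctl warning wording are all hypothetical choices for illustration.

```python
import re
from typing import Iterable, List

# Matches the unit name in systemctl's "changed on disk" warning
# (wording assumed; check it against the systemd version in use).
_CHANGED_ON_DISK = re.compile(r"drop-ins of (\S+) changed on disk\.")


def unexpected_units(
    output: str, tolerated_prefixes: Iterable[str] = ("netplan-",)
) -> List[str]:
    """Return changed units that are not covered by a tolerated prefix.

    Units re-generated 1:1 by netplan itself are assumed to be harmless,
    so only the remaining units would make a test fail.
    """
    prefixes = tuple(tolerated_prefixes)
    return [
        unit
        for unit in _CHANGED_ON_DISK.findall(output)
        if not unit.startswith(prefixes)
    ]
```

A test would then fail only when unexpected_units() returns a non-empty list, instead of failing on any daemon-reload warning.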

tags: added: patch
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package netplan.io - 1.0-1

---------------
netplan.io (1.0-1) unstable; urgency=medium

  * New upstream release 1.0:
    - state/status: add support for identifying bridge/bond/vrf members !420
    - Netplan status --diff !440
    - Netplan state diff !386
    - allow COMMON_LINK_HANDLERS for VRFs (LP: #2031421, Closes: #1049432) !401
    - netplan: add support for WPA3-Enterprise (LP: #2029876) !402
    - wifi: support WPA2 and WPA3 Personal simultaneously !404
    - auth: add support for LEAP and EAP-PWD (LP: #2038811) !415
    - wifi: allow to have a psk and an eap password simultaneously !416
    - Migrate CriticalConnection to KeepConfiguration (LP: #1896799) !424
    - apply: bring "lo" back up if it's managed by NM (LP: #2034595) !408
    - Post 0.107 cleanup & dropping API/ABI compat !400
    - ABI: Refactoring for libnetplan.so.1 !434
    - Add additional bridge port settings !410
    - SR-IOV improvements (VF-LAG support) !439
    Documentation:
    - Add spelling checking to the CI !417
    - doc: libnetplan API reference, using Doxygen and Sphinx.breathe !423
    - doc: Update 'Netplan everywhere' for 23.10 !418
    - added mii-monitor-interval !411
    - Adopt Docs Starter Pack !429
    - Fix howto docs !430
    - docs: add a topic about security !433
    - Document and restructure libnetplan's public API symbols !438
    - Lang. & formatting fixes in API docs. !441
    - Update examples.md !442
    Bug fixes:
    - GitHub CI fixes !405
    - util: don't return a placeholder netdef in the iterator !406
    - tunnels/validation: do not error out if "local" is not defined !407
    - cli/sriov: remove unused code !435
    - wireguard: ignore empty endpoints (LP: #2038811) !414
    - parse: improve the parsing of access-points (LP: #1809994) !413
    - tests: Add autopkgtest for LP: #1959570 !419
    - Fix permissions on folder '/run/NetworkManager/' !422
    - parse-nm/wg: append the correct prefix to IPv6 (LP: #2046158) !428
    - disable StartLimitBurst in the ovs-cleanup service (LP: #2047827) !431
    - ctests: stop including C files in the test files !432
    - workflow/coverity: install missing dependencies !436
    - state_diff: fix filtering of host scoped routes !426
    - sriov: don't generate duplicate entries in the rebind.service file !437
    - check if ovsdb-server.service is active before displaying warning !421
    - parser: accept the special MAC address options (LP: #2045096) !427
    - CI: fix NetworkManager autopkgtest not using deb822 !443
    - tests: Be less strict about systemctl daemon-reload (LP: #2048388)
    - Netplan status --diff refactoring !444
  * d/netplan-generator.install: Fix PLACEHOLDER location
  * d/netplan-generator.lintian-overrides: Clean up unused override
  * d/source/options: Ignore .envrc (direnv)
  * d/control: pkg-config -> pkgconf
  * d/rules: Make use of -Dpython.bytecompile=-1
  * d/control: Update short description
  * d/control,d/libnetplan1.symbols: Prepare for libnetplan1 SOVER bump
  * d/control: Add iproute2 build-dep (for running tests)
  * d/rules: Drop removal of legacy symlink (integrated in meson)
  * d/t/control: execute netplan diff test c...

Changed in netplan.io (Ubuntu):
status: Triaged → Fix Released