Bug #2051733 "Specifying nohz_full breaks CPU frequency reporting" : Bugs : linux-signed-lowlatency-hwe-6.5 package : Ubuntu

Lastique (andysem) wrote on 2024-01-30:

#1

Dependencies.txt (2.6 KiB, text/plain; charset="utf-8")
ProcCpuinfoMinimal.txt (1.8 KiB, text/plain; charset="utf-8")
ProcEnviron.txt (300 bytes, text/plain; charset="utf-8")

Launchpad Janitor (janitor) wrote on 2024-01-31:

#2

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-signed-lowlatency-hwe-6.5 (Ubuntu):
status:	New → Confirmed

Doug Smythies (dsmythies) wrote on 2024-01-31:

#3

I confirm your findings.

In my case I am using the intel_pstate CPU frequency scaling driver with the powersave governor and HWP (hardware P-state) control disabled, on a mainline kernel (6.8-rc1) compiled with the kernel configuration changes being considered in that other bug report. This is my main test server, running Ubuntu 20.04.6.

Note also, in my case, the CPU frequencies actually seem to be scaling properly, it is just that they are not being reported properly via "/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq".

Andrea Righi (arighi) wrote on 2024-01-31:

#4

@dsmythies IIUC this happens also with a mainline kernel, right? Not just the Ubuntu lowlatency one.

Lastique (andysem) wrote on 2024-01-31:

#5

> Note also, in my case, the CPU frequencies actually seem to be scaling properly, it is just that they are not being reported properly via "/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq".

To be clear, I cannot be sure whether that was also the case in my testing. I didn't test the actual performance much. In a few short tests it did seem like the performance was lower, but that was not in any way scientific, so it is possible that the problem is just representation in scaling_cur_freq files.

Doug Smythies (dsmythies) wrote on 2024-01-31:

#6

Yes, this happens also with the mainline kernel.

It also happens with the intel_cpufreq CPU frequency scaling driver (i.e. the intel_pstate driver in passive mode) and with the acpi-cpufreq CPU frequency scaling driver, in both cases with all governors. However, the reported scaling_cur_freq values can be anywhere from wrong to correct.

Example 1: 100% load on all 12 CPUs; acpi-cpufreq; schedutil:

doug@s19:~$ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:4800005
/sys/devices/system/cpu/cpu10/cpufreq/scaling_cur_freq:4101000
/sys/devices/system/cpu/cpu11/cpufreq/scaling_cur_freq:4101000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:4101000
/sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:4101000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:4101000
/sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq:4101000
/sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:4101000
/sys/devices/system/cpu/cpu6/cpufreq/scaling_cur_freq:4101000
/sys/devices/system/cpu/cpu7/cpufreq/scaling_cur_freq:4101000
/sys/devices/system/cpu/cpu8/cpufreq/scaling_cur_freq:4101000
/sys/devices/system/cpu/cpu9/cpufreq/scaling_cur_freq:4101000

Except for CPU 0, which seems to be reporting as if it were using a different driver, the results are correct.

Example 2: 100% load on CPU 4 only; acpi-cpufreq; ondemand:

doug@s19:~$ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:4799876
/sys/devices/system/cpu/cpu10/cpufreq/scaling_cur_freq:800000
/sys/devices/system/cpu/cpu11/cpufreq/scaling_cur_freq:800000
/sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:800000
/sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:800000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:800000
/sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq:4101000
/sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:800000
/sys/devices/system/cpu/cpu6/cpufreq/scaling_cur_freq:800000
/sys/devices/system/cpu/cpu7/cpufreq/scaling_cur_freq:800000
/sys/devices/system/cpu/cpu8/cpufreq/scaling_cur_freq:800000
/sys/devices/system/cpu/cpu9/cpufreq/scaling_cur_freq:800000

Again, except for CPU 0, the results are correct.
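The grep listings above come out in lexical order (cpu10 before cpu1), which makes the odd CPU harder to spot. A minimal sketch for reading the same files in numeric CPU order follows. The function name `read_scaling_cur_freq` and its `root` parameter are made up for this illustration; only the sysfs path comes from the thread.

```python
import re
from pathlib import Path


def read_scaling_cur_freq(root="/sys/devices/system/cpu"):
    """Return {cpu_number: kHz} read from cpu*/cpufreq/scaling_cur_freq.

    The result is ordered by CPU number. `root` is a parameter only so that
    the function can be pointed at a test directory.
    """
    freqs = {}
    for path in Path(root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        # path is .../cpuN/cpufreq/scaling_cur_freq, so parts[-3] is "cpuN".
        match = re.fullmatch(r"cpu(\d+)", path.parts[-3])
        if match is None:
            continue
        try:
            freqs[int(match.group(1))] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric file: skip this CPU
    return dict(sorted(freqs.items()))


if __name__ == "__main__":
    for cpu, khz in read_scaling_cur_freq().items():
        print(f"cpu{cpu:<3d} {khz / 1000:8.0f} MHz")
```

Note that this only reads what the kernel reports; as discussed in this bug, the reported value is not necessarily the frequency the CPU is actually running at.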

Lastique (andysem) wrote on 2024-01-31:

#7

I have run a quick `7z b` benchmark on the lowlatency kernel with and without `nohz_full` parameter, and the results are fairly close:

No `nohz_full` parameter:

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:      82694  1333   6033  80445  |     741767  1584   3994  63265
23:      81022  1400   5896  82552  |     736593  1589   4011  63731
24:      79427  1429   5978  85401  |     722675  1581   4011  63433
25:      77665  1459   6077  88676  |     711778  1587   3990  63346
----------------------------------  | ------------------------------
Avr:            1405   5996  84269  |             1585   4002  63444
Tot:            1495   4999  73856

With `nohz_full=1-15` parameter:

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:      84475  1357   6055  82177  |     738578  1552   4060  62993
23:      80361  1376   5951  81878  |     726439  1482   4240  62852
24:      80275  1437   6006  86312  |     715778  1496   4199  62827
25:      77007  1448   6073  87924  |     708632  1563   4034  63066
----------------------------------  | ------------------------------
Avr:            1404   6021  84573  |             1523   4133  62935
Tot:            1464   5077  73754

In the latter case, decompressing is slightly slower, but definitely not "800MHz" slower, so it looks like the problem is indeed with frequency reporting rather than scaling.
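To put a number on "fairly close", here is a small sketch that computes the relative change of the `Avr` and `Tot` ratings between the two runs. The values are copied from the two tables above; the helper name `percent_change` and the dictionary keys are made up for this illustration.

```python
def percent_change(baseline, value):
    """Relative change of `value` against `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Rating (MIPS) from the "Avr:" and "Tot:" rows of the two 7z b runs above.
without_nohz_full = {"compress": 84269, "decompress": 63444, "total": 73856}
with_nohz_full = {"compress": 84573, "decompress": 62935, "total": 73754}

for name, baseline in without_nohz_full.items():
    change = percent_change(baseline, with_nohz_full[name])
    print(f"{name:>10}: {change:+.2f}%")
```

All three differences come out below one percent in magnitude, which is consistent with run-to-run noise and far smaller than what cores actually held at their minimum frequency would be expected to show.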

summary:
- Specifying nohz_full disables CPU frequency scaling
+ Specifying nohz_full breaks CPU frequency reporting
description:	updated

Doug Smythies (dsmythies) wrote on 2024-01-31:

#8

There is a high probability that the root issue here is related to some work done in August/September.
There was already an outstanding issue with the intel_cpufreq driver and schedutil governor with HWP enabled.

References:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d51847acb018d83186e4af67bc93f9a00a8644f7

https://bugzilla.kernel.org/show_bug.cgi?id=217597

Doug Smythies (dsmythies) wrote on 2024-02-02:

#9

all_cpu_frequencies.png (35.5 KiB, image/png)

CPU frequency scaling driver: intel_pstate
CPU frequency scaling governor: powersave
HWP: disabled.

Purpose: to verify that the driver is working correctly, regardless of the CPU frequencies reported.
A single-threaded load was applied to CPU 5 at a 347 hertz sleep/work frequency. The load was increased, then decreased. The intel_pstate_tracer.py utility was run during the test, capturing the attached graph.
All pstates were used, and they were appropriate for the load.

Doug Smythies (dsmythies) wrote on 2024-02-02:

#10

The way it is currently done, I don't think valid CPU frequency listing via "scaling_cur_freq", or /proc/cpuinfo, is expected to work. Why not? Because the required code is never executed, on purpose. Here is an excerpt from a commit (see the bit about NOHZ full)

commit f3eca381bd49d708073ba1a9af4fa6ea5d5810a6
Author: Thomas Gleixner <email address hidden>
Date: Fri Apr 15 21:20:04 2022 +0200

x86/aperfmperf: Replace arch_freq_get_on_cpu()

Reading the current CPU frequency from /sys/..../scaling_cur_freq involves
in the worst case two IPIs due to the ad hoc sampling.

The frequency invariance infrastructure provides the APERF/MPERF samples
already. Utilize them and consolidate this with the /proc/cpuinfo readout.

The sample is considered valid for 20ms. So for idle or isolated NOHZ full
CPUs the function returns 0, which is matching the previous behaviour.

There were a couple of later commits, and now it prints out the minimum CPU frequency when it thinks the numbers are stale. With NOHZ full, it always thinks the numbers are stale.
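The behaviour described above can be sketched as a toy model. This is not the kernel's code: the function name, parameters and example numbers are all made up for illustration. Only the 20 ms validity window (from the commit message) and the fall-back to the minimum frequency (from the description above) are taken from this thread. The frequency formula is the usual APERF/MPERF estimate, base frequency times delta APERF over delta MPERF.

```python
MAX_SAMPLE_AGE_MS = 20  # validity window quoted in the commit message


def reported_khz(base_khz, min_khz, aperf_delta, mperf_delta, sample_age_ms):
    """Toy model of what a scaling_cur_freq read returns (illustrative only).

    A fresh APERF/MPERF sample yields base_khz * aperf_delta / mperf_delta.
    A stale (or empty) sample falls back to the minimum frequency.
    """
    if sample_age_ms > MAX_SAMPLE_AGE_MS or mperf_delta == 0:
        return min_khz
    return base_khz * aperf_delta // mperf_delta


# Hypothetical numbers: 4.1 GHz base, 800 MHz minimum, CPU running above base.
# A CPU whose sample was refreshed 5 ms ago reports the computed frequency.
print(reported_khz(4_100_000, 800_000, 48, 41, sample_age_ms=5))
# A CPU whose sample is never refreshed (as described for NOHZ full CPUs)
# reports the minimum, whatever frequency it is really running at.
print(reported_khz(4_100_000, 800_000, 48, 41, sample_age_ms=1000))
```

In this model the second call returns the minimum frequency even though the counters say the CPU is running fast, which matches the symptom in this bug: scaling works, reporting does not.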

The intel_cpufreq driver seems to display CPU frequencies okay, but only the pstate that was requested, not the actual frequency granted.

Lastique (andysem) wrote on 2024-02-02:

#11

I'm not familiar with Linux kernel internals, or how scaling_cur_freq is implemented internally, but that doesn't look like valid logic to me. If it takes an IPI (or two, as the commit message suggests) to read the core frequency, then make those IPIs. It doesn't matter how expensive it is - if the user wants to read the current frequency then he is willing to pay for that information. This likely won't be a frequent operation anyway. Providing an interface to read this information and then feeding bogus data through it is not acceptable, IMO.
