[KVM] Lower the default for halt_poll_ns to 200000 ns

Bug #1724614 reported by Jorge Niedbalski
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Xenial
Fix Released
Medium
Unassigned
Zesty
Won't Fix
Medium
Unassigned

Bug Description

[Environment]

Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial

Linux porygon 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

[Description]

We've identified a constant high (~90%) system time load at the host level
when a VCPU in a KVM guest remains or switches/resumes in/from halt/idle state
in a constant frequency, usually for a slightly smaller time than the default polling
period.

The halt polling mechanism has the intention to reduce latency in the cases
on which the guest is quickly resumed saving a call to the scheduler.

We've performed some testing by adjusting the /sys/module/kvm/parameters/halt_poll_ns
value which defines the max time that should be spend polling before calling the
scheduler to allow it to run other tasks (which defaults to 400000 ns in Ubuntu).

With the default value the tests shows that the load remains nearly on 90% on a
VCPU that has a single task in the run queue.

We've also tested altering the halt_poll_ns value to 200000 ns and the results
seems to drop the system time usage from 90% to ~25%.

root@porygon:/home/ubuntu# echo 200000 > /sys/module/kvm/parameters/halt_poll_ns
root@porygon:/home/ubuntu# mpstat 1 -P 6 5
Linux 4.4.0-112-generic (porygon) 01/24/2018 _x86_64_ (64 CPU)

02:06:08 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
02:06:09 PM 6 0.00 0.00 4.85 0.00 0.00 0.00 0.00 16.50 0.00 78.64
[...]
Average: 6 0.00 0.00 4.26 0.00 0.00 0.00 0.00 17.83 0.00 77.91

root@porygon:/home/ubuntu# echo 400000 > /sys/module/kvm/parameters/halt_poll_ns
root@porygon:/home/ubuntu# mpstat 1 -P 6 5
Linux 4.4.0-112-generic (porygon) 01/24/2018 _x86_64_ (64 CPU)

02:06:20 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
02:06:21 PM 6 0.00 0.00 87.13 0.00 0.00 0.00 0.00 11.88 0.00 0.99
[...]
Average: 6 0.00 0.00 89.59 0.00 0.00 0.00 0.00 8.45 0.00 1.96

[Reproducer]

1) Configure a KVM guest with a single pinned VCPU.
2) Run the following program (http://pastebin.ubuntu.com/25731919/) at the KVM guest.
$ gcc test.c -lpthread -o test && ./test 250 0
3) Run mpstat at the host on the pinned CPU and compare the stats
$ sudo mpstat 1 -P 6 5

[Fix]

Change the halt polling max time to half of the current value.

In some fio benchmarks, halt_poll_ns=400000 caused CPU utilization to
increase heavily even in cases where the performance improvement was
small. In particular, bandwidth divided by CPU usage was as much as
60% lower.

To some extent this is the expected effect of the patch, and the
additional CPU utilization is only visible when running the
benchmarks. However, halving the threshold also halves the extra
CPU utilization (from +30-130% to +20-70%) and has no negative
effect on performance.

Signed-off-by: Paolo Bonzini <email address hidden>

* https://github.com/torvalds/linux/commit/b401ee0b85a53e89739ff68a5b1a0667d664afc9

tags: added: sts
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Do you plan on submitting an SRU request to the kernel team mailing list?

Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
Changed in linux (Ubuntu Zesty):
importance: Undecided → Medium
Changed in linux (Ubuntu):
status: New → Triaged
Changed in linux (Ubuntu Xenial):
status: New → Triaged
Changed in linux (Ubuntu Zesty):
status: New → Triaged
tags: added: kernel-da-key
Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

@jsalisbury,

Yes, I do plan to submit this as an SRU request to the kernel ML.

Victor Tapia (vtapia)
description: updated
Victor Tapia (vtapia)
Changed in linux (Ubuntu Zesty):
status: Triaged → Won't Fix
Changed in linux (Ubuntu Xenial):
status: Triaged → Fix Committed
Revision history for this message
Stefan Bader (smb) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Revision history for this message
Victor Tapia (vtapia) wrote :

#VERIFICATION-XENIAL

The tunable default is now 200000 for the kernel in xenial-proposed:

ubuntu@colt:~$ cat /sys/module/kvm/parameters/halt_poll_ns
200000

ubuntu@colt:~$ uname -a
Linux colt 4.4.0-117-generic #141-Ubuntu SMP Tue Mar 13 11:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Victor Tapia (vtapia)
tags: added: verification-done-xenial
removed: verification-needed-xenial
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (56.9 KiB)

This bug was fixed in the package linux - 4.4.0-119.143

---------------
linux (4.4.0-119.143) xenial; urgency=medium

  * linux: 4.4.0-119.143 -proposed tracker (LP: #1760327)

  * Dell XPS 13 9360 bluetooth scan can not detect any device (LP: #1759821)
    - Revert "Bluetooth: btusb: fix QCA Rome suspend/resume"

linux (4.4.0-118.142) xenial; urgency=medium

  * linux: 4.4.0-118.142 -proposed tracker (LP: #1759607)

  * Kernel panic with AWS 4.4.0-1053 / 4.4.0-1015 (Trusty) (LP: #1758869)
    - x86/microcode/AMD: Do not load when running on a hypervisor

  * CVE-2018-8043
    - net: phy: mdio-bcm-unimac: fix potential NULL dereference in
      unimac_mdio_probe()

linux (4.4.0-117.141) xenial; urgency=medium

  * linux: 4.4.0-117.141 -proposed tracker (LP: #1755208)

  * Xenial update to 4.4.114 stable release (LP: #1754592)
    - x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
    - usbip: prevent vhci_hcd driver from leaking a socket pointer address
    - usbip: Fix implicit fallthrough warning
    - usbip: Fix potential format overflow in userspace tools
    - x86/microcode/intel: Fix BDW late-loading revision check
    - x86/retpoline: Fill RSB on context switch for affected CPUs
    - sched/deadline: Use the revised wakeup rule for suspending constrained dl
      tasks
    - can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
    - can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
    - PM / sleep: declare __tracedata symbols as char[] rather than char
    - time: Avoid undefined behaviour in ktime_add_safe()
    - timers: Plug locking race vs. timer migration
    - Prevent timer value 0 for MWAITX
    - drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
    - drivers: base: cacheinfo: fix boot error message when acpi is enabled
    - PCI: layerscape: Add "fsl,ls2085a-pcie" compatible ID
    - PCI: layerscape: Fix MSG TLP drop setting
    - mmc: sdhci-of-esdhc: add/remove some quirks according to vendor version
    - fs/select: add vmalloc fallback for select(2)
    - hwpoison, memcg: forcibly uncharge LRU pages
    - cma: fix calculation of aligned offset
    - mm, page_alloc: fix potential false positive in __zone_watermark_ok
    - ipc: msg, make msgrcv work with LONG_MIN
    - x86/ioapic: Fix incorrect pointers in ioapic_setup_resources()
    - ACPI / processor: Avoid reserving IO regions too early
    - ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
    - ACPICA: Namespace: fix operand cache leak
    - netfilter: x_tables: speed up jump target validation
    - netfilter: arp_tables: fix invoking 32bit "iptable -P INPUT ACCEPT" failed
      in 64bit kernel
    - netfilter: nf_dup_ipv6: set again FLOWI_FLAG_KNOWN_NH at flowi6_flags
    - netfilter: nf_ct_expect: remove the redundant slash when policy name is
      empty
    - netfilter: nfnetlink_queue: reject verdict request from different portid
    - netfilter: restart search if moved to other chain
    - netfilter: nf_conntrack_sip: extend request line validation
    - netfilter: use fwmark_reflect in nf_send_reset
    - ext2: Don't clear SGID when inheriting ACLs
    - reiserfs: fix race in prealloc discard
    - re...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
no longer affects: linux (Ubuntu)
no longer affects: linux (Ubuntu Xenial)
Changed in linux (Ubuntu Xenial):
status: New → Fix Released
no longer affects: linux (Ubuntu Groovy)
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.