waitid (..., WNOWAIT) spins or hangs the kernel inside sys_waitid if the queued event is from ptrace

Bug #161795 reported by Scott James Remnant (Canonical)
262
Affects Status Importance Assigned to Milestone
linux-source-2.6.15 (Ubuntu)
Fix Released
Medium
Kees Cook
linux-source-2.6.17 (Ubuntu)
Fix Released
Medium
Kees Cook
linux-source-2.6.20 (Ubuntu)
Fix Released
Medium
Kees Cook
linux-source-2.6.22 (Ubuntu)
Fix Released
Medium
Kees Cook

Bug Description

Binary package hint: linux-source-2.6.22

waitid () is an improved wait system call added since Linux 2.6.9 that returns a siginfo structure for each event, rather than random bit fields trying to squash everything together. One of its more interesting options is WNOWAIT which returns the siginfo, but leaves the child in a waitable state so that a further call will return the same information (usually this further call is without WNOWAIT).

This has various uses, for example catching SIGCHLD and leaving the process in the table so that /proc/$PID can be examined before the process is finally reaped.

This normally works fine. However if the wait queue item is generated by ptrace, bad things happen. I've seen the process simple spinning at 100% CPU inside sys_waitid () and I've seen the machine stop responding completely -- sometimes if X is running, it behaves as if /dev/console has been closed (newlines get inserted repeatedly into the running application).

CVE References

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

This small program demonstrates the failure.

It spawns a child which issues PTRACE_TRACEME and then raises SIGSTOP (which will be trapped by the parent).

The parent waits for the child with WNOWAIT to get the details early, this will spin at 100% CPU.

Remove the "| WNOWAIT" (and the additional waitid call just below) and it will work as expected.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

I've did some printk debugging of this while on the plane home. You'll probably want a copy of kernel/exit.c open as I describe the cause of the bug; line numbers refer to git head as I write this message.

The spin occurs inside sys_waitid (1696..1729) when called with the WNOWAIT option and a signal from a ptraced process to be waited upon; the syscall never exits, so depending on the kernel state, the effect can be anything from a process spinning to a full system hang.

sys_waitid is largely just a wrapper around do_wait (1534..1694), it's within this function that the infinite loop actually occurs.

The function encloses a loop so that we iterate through every thread of the process (1553..1651), and within that a loop to iterate through the children of each thread (1557..1637). In some cases, a wait entry may vanish during execution, so there is a repeat label (1543) to which we can goto if such a condition is detected.

The case of a traced child being on top of the queue is handled (1569..1581) by checking for a race condition and otherwise falling through to the same handling as a stopped child (1582..1599).

For stopped and traced children, we call wait_task_stopped and check its return value (1591..1598); a zero return value means that we found something to return, and -EAGAIN means that the entry has already been waited upon by another thread, so the function should be begun again since the data has changed.

wait_task_stopped (1355..1464) handles the specific cases, and is passed a "noreap" value set to non-zero of WNOWAIT was specified in the waitid options. Other than the initial checks, noreap is handled completely separately (1383..1395): it obtains the information about the process, checks it, and then calls out to wait_noreap_copyout() to pass the data back to userspace before returning.

It has a sanity check on lines 1389..1391; the first part of this is to make sure that the exit_code of the process is non-zero; if it's become zero, it has been waited upon by another thread, so it jumps to the bail_ref label (1422..1432) which unlocks the tasklist and returns -EAGAIN (thus causing the do_wait() function to be restarted).

This sanity check also checks that the process isn't in the TASK_TRACED state (1390), if it is, it will also return -EAGAIN.

And thus we have our infinite loop.

If the process is in TASK_TRACED, and we specify WNOWAIT, we will forever repeat inside do_wait() since wait_task_stopped will always return -EAGAIN.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

git-blame on kernel/exit.c shows that the offending line of code comes from the following commit:

commit 14bf01bb0599c89fc7f426d20353b76e12555308
Author: Linus Torvalds <email address hidden>
Date: Sat Oct 1 11:04:18 2005 -0700

    Fix inequality comparison against "task->state"

    We should always use bitmask ops, rather than depend on some ordering of
    the different states. With the TASK_NONINTERACTIVE flag, the inequality
    doesn't really work.

    Oleg Nesterov argues (likely correctly) that this test is unnecessary in
    the first place. However, the minimal fix for now is to at least make
    it work in the presense of TASK_NONINTERACTIVE. Waiting for consensus
    from Roland & co on potential bigger cleanups.

    Signed-off-by: Linus Torvalds <email address hidden>

diff --git a/kernel/exit.c b/kernel/exit.c
index ee6d8b8..4307773 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1203,7 +1203,7 @@ static int wait_task_stopped(task_t *p, int delayed_group_

                exit_code = p->exit_code;
                if (unlikely(!exit_code) ||
- unlikely(p->state > TASK_STOPPED))
+ unlikely(p->state & TASK_TRACED))
                        goto bail_ref;
                return wait_noreap_copyout(p, pid, uid,
                                           why, (exit_code << 8) | 0x7f,

This is just a style-changed caused by the addition of a new task; the behaviour should have been consistent with the previous code (currently building a due-diligence kernel to test this assumption).

Unfortunately the previous code exists in the original import of the source tree into git, so there is no annotation history for why that line was introduced.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

The original cause of this patch appears to be http://www.ussg.indiana.edu/hypermail/linux/kernel/0509.3/1322.html, where Oleg Nesterov proposes moving TASK_NONINTERACTIVE to attempt to avoid it being caught by < and > in state checks.

In that thread, Oleg asks why the specific line we are testing is the way it is; noting that it would appear to imply that WNOWAIT for TASK_PTRACED is illegal:
http://www.ussg.indiana.edu/hypermail/linux/kernel/0509.3/1763.html -- he proposes a patch to remove the check altogether.

Linus argues that it is correct for reasons that baffle me: http://www.ussg.indiana.edu/hypermail/linux/kernel/0509.3/1774.html

Oleg argues that it's ok to wait() so must be ok to WNOWAIT: http://www.ussg.indiana.edu/hypermail/linux/kernel/0510.0/0036.html

Roland pitches in with an even more unlikely patch, http://www.ussg.indiana.edu/hypermail/linux/kernel/0510.0/0112.html -- so it appears that this check is to make sure that the ptraceness doesn't "go away" during this code.

That appears to be the end of the thread; the code was changed from > TASK_STOPPED to & TASK_TRACED and otherwise left as it was.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

I've tested the following kernels:

 unmodified - as shipped (recompile for control purposes)
 reverted - Linus's patch reverted (so using > TASK_STOPPED)
 removed - That check removed entirely.

The results (and what I expected) are as follows:

 Kernel | Result | Expected
 unmodified | FAIL | FAIL
 reverted | PASS | FAIL
 removed | PASS | PASS

I'm rather surprised; it appears that reverting Linus's patch actually fixes the problem; I do not understand why yet.

I am recompiling again to make damned sure I got the test right, and didn't copy over the top of it, or something.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

Have re-tested the reverted kernel, this time it FAILs as expected -- I must have copied over it or something. Doing one more test to make sure I'm not hallucinating.

Kees Cook (kees)
Changed in linux-source-2.6.22:
assignee: nobody → keescook
status: New → In Progress
Revision history for this message
Kees Cook (kees) wrote :
Changed in linux-source-2.6.22:
status: In Progress → Triaged
Changed in linux-source-2.6.20:
assignee: nobody → keescook
importance: Undecided → Medium
status: New → Triaged
Changed in linux-source-2.6.22:
importance: Undecided → Medium
Changed in linux-source-2.6.17:
assignee: nobody → keescook
importance: Undecided → Medium
status: New → Triaged
Changed in linux-source-2.6.15:
assignee: nobody → keescook
importance: Undecided → Medium
status: New → Triaged
Kees Cook (kees)
Changed in linux-source-2.6.22:
status: Triaged → Fix Committed
Changed in linux-source-2.6.20:
status: Triaged → Fix Committed
Changed in linux-source-2.6.17:
status: Triaged → Fix Committed
Changed in linux-source-2.6.15:
status: Triaged → Fix Committed
Kees Cook (kees)
Changed in linux-source-2.6.17:
status: Fix Committed → Fix Released
Changed in linux-source-2.6.20:
status: Fix Committed → Fix Released
Changed in linux-source-2.6.22:
status: Fix Committed → Fix Released
Kees Cook (kees)
Changed in linux-source-2.6.15:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public Security information  
Everyone can see this security related information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.