Comment 360 for bug 263555

Revision history for this message
In , John Ronciak (john-ronciak) wrote :

Here is a patch which we at Intel LAD have been testing today. This looks to be a work-around and with the .28 a fix for the root cause of the problem. The problem was with ftrace which is what we bisec'd to last week. On systems that failed with minutes we have not been able to make it happen once ftrace was disabled. So I think the .28 ftrace needs to get included into SLES11.

>---------- Forwarded message ----------
>From: Steven Rostedt <email address hidden>
>Date: Wed, Oct 15, 2008 at 3:21 PM
>Subject: [PATCH -stable] disable CONFIG_DYNAMIC_FTRACE due to possible
>memory corruption on module unload
>To: LKML <email address hidden>, <email address hidden>
>Cc: Linus Torvalds <email address hidden>, Andrew Morton
><email address hidden>, Arjan van de Ven <email address hidden>,
><email address hidden>, <email address hidden>, Thomas Gleixner
><email address hidden>, Ingo Molnar <email address hidden>
>
>
>
>While debugging the e1000e corruption bug with Intel, we discovered
>today that the dynamic ftrace code in mainline is the likely source of
>this bug.
>
>For the stable kernel we are providing the only viable fix
>patch: labeling
>CONFIG_DYNAMIC_FTRACE as broken. (see the patch below)
>
>We will follow up with a backport patch that contains the
>fixes. But since
>the fixes are not a one liner, the safest approach for now is to
>disable the code in question.
>
>The cause of the bug is due to the way the current code in mainline
>handles dynamic ftrace. When dynamic ftrace is turned on, it also
>turns on CONFIG_FTRACE which enables the -pg config in gcc that places
>a call to mcount at every function call. With just CONFIG_FTRACE this
>causes a noticeable overhead. CONFIG_DYNAMIC_FTRACE works to ease this
>overhead by dynamically updating the mcount call sites into nops.
>
>The problem arises when we trace functions and modules are unloaded.
>The first time a function is called, it will call mcount and the mcount
>call will call ftrace_record_ip. This records the calling site and
>stores it in a preallocated hash table. Later on a daemon will
>wake up and call kstop_machine and convert any mcount callers into
>nops.
>
>The evolution of this code first tried to do this without the
>kstop_machine
>and used cmpxchg to update the callers as they were called. But I
>was informed that this is dangerous to do on SMP machines if another
>CPU is running that same code. The solution was to do this with
>kstop_machine.
>
>We still used cmpxchg to test if the code that we are modifying is
>indeed code that we expect to be before updating it - as a final
>line of defense.
>
>But on 32bit machines, ioremapped memory and modules share the same
>address space. When a module would load its code into memory
>and execute
>some code, that would register the function.
>
>On module unload, ftrace incorrectly did not zap these functions from
>its hash (this was the bug). The cmpxchg could have saved us in most
>cases (via luck) - but with ioremap-ed memory that was exactly
>the wrong
>thing to do - the results of cmpxchg on device memory are undefined.
>(and will likely result in a write)
>
>The pending .28 ftrace tree does not have this bug anymore, as
>a general push
>towards more robustness of code patching, this is done
>differently: we do not
>use cmpxchg and we do a WARN_ON and turn the tracer off if
>anything deviates
>from its expected state. Furthermore, patch sites are
>statically identified
>during build time so there's no runtime discovery of dynamic code areas
>anymore, and no room for code unmaps to cause the hash to
>become out of date.
>
>We believe the fragility of dynamic patching has been sufficiently
>addressed in the development code via the static patching
>method, but further
>suggestions to make it more robust are welcome.
>
>Signed-off-by: Steven Rostedt <email address hidden>
>Acked-by: Ingo Molnar <email address hidden>
>Acked-by: Thomas Gleixner <email address hidden>
>---
> kernel/trace/Kconfig | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>Index: linux-compile.git/kernel/trace/Kconfig
>===================================================================
>--- linux-compile.git.orig/kernel/trace/Kconfig 2008-10-02
>10:18:49.000000000 -0400
>+++ linux-compile.git/kernel/trace/Kconfig 2008-10-15
>17:29:34.000000000 -0400
>@@ -103,7 +103,8 @@ config CONTEXT_SWITCH_TRACER
> all switching of tasks.
>
> config DYNAMIC_FTRACE
>- bool "enable/disable ftrace tracepoints dynamically"
>+ bool "enable/disable ftrace tracepoints dynamically (BROKEN)"
>+ depends on BROKEN
> depends on FTRACE
> depends on HAVE_DYNAMIC_FTRACE
> default y
>
>--
>To unsubscribe from this list: send the line "unsubscribe
>linux-kernel" in
>the body of a message to <email address hidden>
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>