Comment 43 for bug 541511

legolas558 (legolas558) wrote:

(In reply to comment #41)
> > --- Comment #40 from legolas558 <email address hidden> 2010-03-26 07:49:31 PST ---
> > In all cases (even 16 whack pages and/or 1000/2000 retries), no more than 2
> > failures are found in dmesg (because as you said it gives up after that).
>
> I've overlooked this, but now that I've checked, this is _very_ curious.
> With v6 you only ever see 2 chipset flush failures, no matter how hard you
> abuse your machine?
>
Yes. I have never seen more than 2 since I started using the v6 patch, but I might be wrong, because I have never done more than 300k flushes in a single session with a v6-patched kernel.

> With the three dmesgs you've posted, these two failures are always in the
> same chipset flush, just opposite directions (gtt->cpu and cpu->gtt
> transfers). They'll also coincide with the chipset flush timed out
> message. Can you please check that this is indeed the case (with the other
> dmesgs you've got lying around) with the other test runs, too? Just
> compare the "expected: xxx" value on each of the three backtraces.
>
Yes, and you can also see it in the v5 patch dmesg in attachment 34233.

From my dmesg logs:
~~ session1 - v6 patch
[ 79.983513] i8xx chipset flush failed, expected: 5807, cpu_read: 5806
[ 79.983771] i8xx chipset flush failed, expected: 5807, gtt_read: 5806
~~ session2 - v6 patch
[ 101.807650] i8xx chipset flush failed, expected: 14194, cpu_read: 14193
[ 101.807844] i8xx chipset flush failed, expected: 14194, gtt_read: 14193
~~ session3 - v5 patch
[ 2832.905107] i8xx chipset flush failed, expected: 113457, cpu_read: 113456
[ 2832.905315] i8xx chipset flush failed, expected: 113457, gtt_read: 113456
[ 2910.626579] i8xx chipset flush failed, expected: 215361, cpu_read: 215360
[ 2910.626872] i8xx chipset flush failed, expected: 215361, gtt_read: 215360
[ 2977.424469] i8xx chipset flush failed, expected: 308976, cpu_read: 308975
[ 2977.424746] i8xx chipset flush failed, expected: 308976, gtt_read: 308975
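
If anyone wants to repeat this comparison on their own logs without reading them by eye, here is a minimal sketch in Python. It assumes the message format is exactly the one in the lines above; the names FAIL_RE, group_failures and paired_flushes are my own and not part of any patch. It groups the failure lines by their "expected" value and reports which flushes failed in both directions.

```python
import re
from collections import defaultdict

# Assumed line format, copied from the dmesg excerpts in this comment:
# [ 79.983513] i8xx chipset flush failed, expected: 5807, cpu_read: 5806
FAIL_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+i8xx chipset flush failed, "
    r"expected: (?P<expected>\d+), (?P<kind>cpu_read|gtt_read): (?P<got>\d+)"
)

def group_failures(dmesg_text):
    """Map each 'expected' value to the read kinds that failed and what they read."""
    groups = defaultdict(dict)
    for line in dmesg_text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            expected = int(m.group("expected"))
            groups[expected][m.group("kind")] = int(m.group("got"))
    return dict(groups)

def paired_flushes(groups):
    """Return the 'expected' values for which both cpu_read and gtt_read failed."""
    both = {"cpu_read", "gtt_read"}
    return sorted(e for e, kinds in groups.items() if set(kinds) >= both)

# The two session1 lines quoted above, used as sample input.
SAMPLE = """\
[ 79.983513] i8xx chipset flush failed, expected: 5807, cpu_read: 5806
[ 79.983771] i8xx chipset flush failed, expected: 5807, gtt_read: 5806
"""

if __name__ == "__main__":
    groups = group_failures(SAMPLE)
    for expected in paired_flushes(groups):
        print(expected, groups[expected])
```

Feeding it a whole dmesg (for example saved with "dmesg > log.txt") should list one entry per failed flush, which makes it easy to see whether the two directions always share the same "expected" value.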

I am going to run more intensive tests later.

> This is strange because my code only gives up on the _current_ chipset
> flush and doesn't bother to report any further timeouts. It still executes
> all chipset flushes and still reports about failed ones. So if your hw
> only ever reports one failure where everything fails (timeout+paranoia
> check failures in both directions) and never fails again, this would be
> _very_ strange indeed.
>
Occam would say: perhaps it didn't fail at all and we are just not being informed correctly.

My rough guess is that something sitting between us and the GPU is touching something it shouldn't. I am inclined to blame the i8042 controller, since I already see keyboard port corruption when the ACPI battery module is in use. But it is hard to link the i8042 to the GPU (and the modules that cause the i8042 keyboard glitch are never loaded here), so I am still out of bullets.

> > I am worried about this fact that our hardware, apparently the same, is not
> > showing same behaviour...my .config is here:
>
> I've compared our configs and tried changing a few relevant ones to your
> setting. Still can't reproduce your failures.
>
As already stated, I am not using the "clip solid fills" patch, in case that is relevant, but I doubt it.

Just FYI, the hangcheck timer crashes still happen (this time with a Wine application, not a video) with the original v6 patch (without any custom tuning of mine).