Comment 55 for bug 541511

Tony W (tonywhite100) wrote:

Hi guys,
Really appreciate the work you guys have been doing to try to fix this issue. And I truly mean that. This is a horrible issue.

If you haven't read it, the Intel datasheet for this graphics chipset is here:
http://www.intel.com/Assets/PDF/datasheet/252615.pdf

It makes for a reasonably interesting read when you consider it from the angle of hunting down what's causing this problem, although needle, haystack, much?

The SMM space restrictions look like a place of interest to me, as does the very liberal way in which Intel has allowed BIOS manufacturers to choose certain things related to the address registers.
Because it's Centrino technology, the 855 is a three-in-one deal: BIOS, 855GM chip and processor, all linked together to render the graphics to the screen. It looks like BIOS manufacturers have done whatever they thought was the best way to make that combination work, so any given 855GM can behave in any number of weird and wonderful ways depending on its possibly unique BIOS implementation. I've certainly seen evidence of that on my 855GM.
I have run Linux on the machine with its original BIOS and again after updating to the latest BIOS from the Asus website, and the two versions behave differently.
The first one only required the nolapic parameter to boot, while the latest one requires mem=1001M (but the memory is supposed to be 1024M, and memtest says 1016M).
Without the mem parameter the kernel boots, but very slowly. With it, it boots fine.
My wild stab in the dark here is that there is an undetected memory hole and that's causing the problem. The actual memory modules are fine.
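
A rough sketch of how such a hole could be looked for, assuming you have a list of the address ranges the firmware reports as usable. The `find_gaps` helper and the sample map are made up for illustration and are not taken from this machine; only the three memory figures come from the numbers above.

```python
# Figures quoted above, in MiB.
INSTALLED_MIB = 1024   # nominal size of the memory modules
MEMTEST_MIB = 1016     # what memtest reports
MEM_PARAM_MIB = 1001   # value passed to the kernel as mem=1001M


def find_gaps(regions):
    """Return the (start, end) ranges not covered by any region.

    regions: iterable of (start, end) address pairs, end exclusive.
    Overlapping or adjacent regions are merged before gaps are reported.
    """
    gaps = []
    cursor = None  # end of the area covered so far
    for start, end in sorted(regions):
        if cursor is not None and start > cursor:
            gaps.append((cursor, start))
        cursor = end if cursor is None else max(cursor, end)
    return gaps


if __name__ == "__main__":
    print("not visible to memtest:", INSTALLED_MIB - MEMTEST_MIB, "MiB")
    print("cut off by mem=1001M:", MEMTEST_MIB - MEM_PARAM_MIB, "MiB")

    # Hypothetical usable-RAM map, NOT read from the real hardware.
    MIB = 1 << 20
    sample = [(0, 640 * 1024), (1 * MIB, 1001 * MIB), (1016 * MIB, 1024 * MIB)]
    for start, end in find_gaps(sample):
        print("gap:", hex(start), "to", hex(end))
```

If the real map showed a gap around the top of memory that the kernel does not know about, that would at least be consistent with the mem= workaround. That is still only a guess.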

As for what you guys have been testing, I have seen the same symptoms. It works, I can use a browser and watch Flash video, and it all seems fine, but after an hour it locks up and needs a manual power down to restore a working system.

If the problem really is the memory, and more specifically the memory buffer filling up and not being flushed in time, does the card use any sort of compression on parts of the buffer which would require a different type of flush to empty? (Multiple overlay?)
Could it be that, because BIOS manufacturers have had such liberal choices in their implementations for this card, the memory addresses to flush are being detected incorrectly? Or could it be that the flush is trying to flush a part of the memory which it is not allowed to (maybe because it has detected the addresses incorrectly), and that in turn triggers some kind of stop in the hardware which prevents any further flushes?

I am of course clutching at straws. My knowledge is limited, and it would be a tall order for me to learn C, fork the code, go through the datasheet and write a proper Linux kernel driver for this card, although I would dearly love to do that.

I wish you guys luck in fixing this problem. You have made some impressive improvements so far compared to the way it was, and in its current form the driver with your patches is very nearly suitable as a fix.

Please don't give up!