Comment 39 for bug 1317811

Stéphan Kochen (stephank) wrote :

I believe my test case is flawed, so I cannot say with certainty whether the issue is fixed. This is the same test case I used before, for which I posted code in a gist: https://gist.github.com/stephank/764e3414d57bc3bcb6b3

Here's what I tried:

 - I started two new c3.large machines from ami-69e76c1e (eu-west-1 HVM 64-bit trusty with instance store)

 - I downloaded io.js 1.2.0 on machine A, together with the pub.js and sub.js scripts from my gist.

 - I installed redis-server on machine B and reconfigured redis to bind to the internal IP (in 10.x.x.x)

 - The machines were initially running linux-virtual 3.13.0.45.52. I reproduced the issue in this setup by running sub.js twice, then pub.js once on machine A, connecting them to redis on machine B. The 'rides the rocket' message showed up in the logs, and the subs lost their connection.

 - I enabled trusty-proposed on both machines with a pin, selectively upgraded linux-virtual, and rebooted both. The kernel on both machines is now linux-virtual 3.13.0.46.53.

 - I ran the same test again: sub.js twice, pub.js once on machine A, connecting to machine B. There were no 'rides the rocket' messages, but the subs still lost their connections. I sporadically got 'net_ratelimit: x callbacks suppressed', but not on every test run.

 - I disabled scatter/gather on both machines, which also dropped their MTU to 1500, and ran the test again several times. There were no more 'net_ratelimit' messages, but the subs still lost their connections.

 - I installed redis-server on machine A the same way, listening on the internal IP, and ran the same test on machine A, but this time connecting to itself on the internal IP. The test now runs indefinitely. (But this probably doesn't touch the driver.)
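
For anyone repeating the scatter/gather step, here is a minimal sketch of how it can be done with ethtool. The interface name eth0 is an assumption (it is not stated above), and the run and disable_sg helpers are hypothetical names used only for illustration. Setting DRY_RUN prints the commands instead of executing them.

```shell
#!/bin/sh
# Interface name is an assumption; adjust to the instance's primary NIC.
IFACE="${IFACE:-eth0}"

# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: turn scatter/gather off, then show the resulting state.
disable_sg() {
    run sudo ethtool -K "$IFACE" sg off   # disable scatter/gather offload
    run ethtool -k "$IFACE"               # list offload settings to confirm
    run ip link show "$IFACE"             # output includes the current MTU
}

# Usage: disable_sg
```

The last two commands only report state, so they make it easy to confirm that scatter-gather reads "off" and to see what MTU the interface ended up with.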

So I'm not sure what to take away from this. I suppose I could continue by fixing my test case so it runs properly without scatter/gather, and only then enable it again. Alternatively, I could find a way to trigger the issue with a different test, such as redis-benchmark.
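
A possible starting point for the redis-benchmark variant is sketched below. This is untested against the bug: the client count, request count, and payload size are guesses, bench_args is a hypothetical helper name, and REDIS_HOST has to be set to the internal IP of machine B. Port 6379 is the redis default.

```shell
#!/bin/sh
# Hypothetical helper: builds redis-benchmark arguments for a run against machine B.
# REDIS_HOST must be set by the caller; no default is assumed here.
bench_args() {
    # -c parallel clients, -n total requests, -d payload size in bytes, -q quiet output.
    # The specific values are guesses, not tuned to reproduce the issue.
    echo "-h ${REDIS_HOST:?set REDIS_HOST to the internal IP of machine B} -p 6379 -c 50 -n 100000 -d 4096 -q"
}

# Usage on machine A:
#   REDIS_HOST=<internal IP of machine B>
#   redis-benchmark $(bench_args)
#   dmesg | grep -i 'rides the rocket'
```

Checking dmesg afterwards shows whether the 'rides the rocket' message appeared during the run.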

Stefan, is it sufficient verification if your own testing now shows it fixed?