back end randomly transitioned to failed
Bug #485976 reported by ChrisW
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
gocept.zeoraid | Confirmed | Critical | Christian Theune | 1.0b8
Bug Description
I just got an alert from our monitoring because zeo1, the back end for the packed storage on zeoraid2, has transitioned to failed.
On zeoraid1, I get "Could not connect to ZEO server at 0.0.0.0:6001" when trying to check the status of the packed storage.
The unpacked storage on both servers is still optimal, which is odd.
Nothing in zeoraid2's event or debug logs.
In zeoraid1's event log I see:
2009-11-20T17:14:34 INFO ZEO.zrpc.
RuntimeError: RAID is inconsistent and was closed.
There's nothing more informative in the zeoraid1 debug log and nothing in zeo1's event or debug log.
So, I'm a little confused :-S
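
One thing worth checking with an error like "Could not connect to ZEO server at 0.0.0.0:6001": 0.0.0.0 is a wildcard listen address, so the status check may be reusing the server's bind address rather than a routable one. A plain TCP probe of each back end can also separate a network-level failure from ZEORaid refusing to serve the storage. A minimal sketch, with hypothetical host names (only port 6001 appears in this report):

```python
# Minimal TCP reachability probe for the ZEO back ends.
# The host names are hypothetical stand-ins; only port 6001
# appears in the report above.
import socket

BACKENDS = {
    "zeo1": ("zeo1.example.internal", 6001),  # hypothetical host
    "zeo2": ("zeo2.example.internal", 6001),  # hypothetical host
}

def probe(name, addr, timeout=5.0):
    """Return True if a TCP connection to the back end succeeds."""
    try:
        sock = socket.create_connection(addr, timeout=timeout)
    except socket.error as exc:
        print("%s %s:%s: could not connect (%s)" % (name, addr[0], addr[1], exc))
        return False
    sock.close()
    print("%s %s:%s: reachable" % (name, addr[0], addr[1]))
    return True

for name, addr in sorted(BACKENDS.items()):
    probe(name, addr)
```

If the probe succeeds but the status check still fails, the problem is above the network layer, i.e. in ZEORaid itself.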
Changed in gocept.zeoraid:
milestone: none → 1.0b8
importance: Undecided → Critical

Changed in gocept.zeoraid:
assignee: nobody → Christian Theune (ct-gocept)

Changed in gocept.zeoraid:
status: New → Confirmed
Ok, so *something* made zeoraid2 decide that zeo1 had failed.
This causes zeoraid2 to stop writing to zeo1. If a transaction then goes through zeoraid1, it still ends up in both zeo1 and zeo2, while anything committed through zeoraid2 only reaches zeo2, so zeo1 will quickly appear inconsistent, which is correct.
I think the major problem here is that you can't see *why* zeoraid2 degraded zeo1, which is an issue already described in another bug.
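
To make that failure mode concrete, here is a toy model of the scenario: two front ends sharing the same back ends, each with its own view of which back ends are optimal. All class and variable names are invented for illustration; this is not the gocept.zeoraid implementation, and the consistency check is reduced to comparing raw transaction histories.

```python
# Toy model of the failure mode in this bug. NOT the gocept.zeoraid
# implementation: all names are invented, and consistency checking
# is reduced to comparing transaction histories.

class Backend:
    """Stand-in for a ZEO back end storage (zeo1/zeo2)."""

    def __init__(self, name):
        self.name = name
        self.transactions = []

    def store(self, tid):
        self.transactions.append(tid)


class RaidFrontend:
    """Stand-in for a ZEORaid front end (zeoraid1/zeoraid2).

    Each front end keeps its *own* view of which back ends are
    optimal, which is what lets the two views drift apart.
    """

    def __init__(self, name, backends):
        self.name = name
        self.backends = backends
        self.status = {b.name: "optimal" for b in backends}

    def degrade(self, backend_name):
        # Whatever caused this bug happens here: the front end
        # marks a back end failed and stops writing to it.
        self.status[backend_name] = "failed"

    def commit(self, tid):
        # Write only to back ends this front end believes are optimal.
        for b in self.backends:
            if self.status[b.name] == "optimal":
                b.store(tid)

    def check(self):
        # A front end that still considers all back ends optimal
        # detects the divergence and shuts down.
        histories = {tuple(b.transactions) for b in self.backends
                     if self.status[b.name] == "optimal"}
        if len(histories) > 1:
            raise RuntimeError("RAID is inconsistent and was closed.")


zeo1, zeo2 = Backend("zeo1"), Backend("zeo2")
zeoraid1 = RaidFrontend("zeoraid1", [zeo1, zeo2])
zeoraid2 = RaidFrontend("zeoraid2", [zeo1, zeo2])

zeoraid2.degrade("zeo1")   # zeo1 "randomly" transitions to failed
zeoraid2.commit("tid-1")   # reaches zeo2 only
zeoraid1.commit("tid-2")   # reaches both, but zeo1 is already behind
zeoraid1.check()           # raises: RAID is inconsistent and was closed.
```

The final check raises the same RuntimeError that appears in zeoraid1's event log above, while zeoraid2, which already wrote zeo1 off, logs nothing, matching the empty zeoraid2 logs in the report.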