ZEO versus creative __getstate__

Bug #98275 reported by Tim Peters
64
This bug affects 9 people
Affects Status Importance Assigned to Milestone
ZODB
Status tracked in 3.9
3.8
Fix Released
Critical
Unassigned
3.9
Fix Released
Critical
Unassigned
Zope 2
Fix Released
Medium
Unassigned
2.13
Fix Released
Medium
Unassigned
Zope 3
Status tracked in 3.4
3.4
Fix Released
Critical
Unassigned

Bug Description

Original report at:

http://mail.zope.org/pipermail/zodb-dev/2005-December/009673.html

Test case attached. A custom __getstate__ changes the state. ZEO complains when it tries to commit a new object of this kind. No complaint if FileStorage is used instead of ClientStorage. Don't know whether this "should" work, but offhand I don't see why not.

Tags: issue zodb
Revision history for this message
Tim Peters (tim-one) wrote :
Revision history for this message
Syver Enstad (syver) wrote :

cur_tid == tid on Linux too.

Revision history for this message
Jim Fulton (jim-zope) wrote :

Changes: importance (medium => critical)

Changed in zope3:
status: Unconfirmed → Confirmed
Revision history for this message
Christian Theune (ctheune) wrote :

Ok, I investigated this and found out what is causing it.

I used Tim's test case.

1. The object (Mine) is instanciated and added to the root object.
2. The serialization will discover the new persistent subobject in the root. When the object is stored, it's __getstate__ method will be called.
3. The __getstate__ modifies the object's state which will cause it to be registered with the connection.
4. The client cache will be invalidated for the new object and updated with the new data.
5. The object will be stored a second time, (due to the explict registration with the connection) and the client cache will be invalidated again.
6. As the current transaction invalidated the cache already in this transaction we hit the point that cur_tid == tid.

I can see multiple solutions here:

- forbid to change an object when its __getstate__ method is called
- make the connection/serialization smarter to avoid having objects registered multiple times
- make the client cache allow invalidating current data

Revision history for this message
Jim Fulton (jim-zope) wrote : Re: [Bug 98275] Re: ZEO versus creative __getstate__

Minor note. The bug is that changes to an object are registered
while __getstate__ is being called. I think the object should remain
in the changed state until later. presumably the object is getting
into the uptodate state too soon. This analysis is without looking
at the code. :)

On Oct 10, 2007, at 11:00 AM, Christian Theune wrote:

> Ok, I investigated this and found out what is causing it.
>
> I used Tim's test case.
>
> 1. The object (Mine) is instanciated and added to the root object.
> 2. The serialization will discover the new persistent subobject in
> the root. When the object is stored, it's __getstate__ method will
> be called.
> 3. The __getstate__ modifies the object's state which will cause it
> to be registered with the connection.
> 4. The client cache will be invalidated for the new object and
> updated with the new data.
> 5. The object will be stored a second time, (due to the explict
> registration with the connection) and the client cache will be
> invalidated again.
> 6. As the current transaction invalidated the cache already in this
> transaction we hit the point that cur_tid == tid.
>
> I can see multiple solutions here:
>
> - forbid to change an object when its __getstate__ method is called
> - make the connection/serialization smarter to avoid having objects
> registered multiple times
> - make the client cache allow invalidating current data
>
> --
> ZEO versus creative __getstate__
> https://bugs.launchpad.net/bugs/98275
> You received this bug notification because you are a member of Zodb-
> developers, which is the registrant for ZODB.

--
Jim Fulton
Zope Corporation

Revision history for this message
Christian Theune (ctheune) wrote :

Am Mittwoch, den 10.10.2007, 15:14 +0000 schrieb Jim Fulton:
> Minor note. The bug is that changes to an object are registered
> while __getstate__ is being called.

Right. Thanks for pointing this out more explicitly than I did.

> I think the object should remain
> in the changed state until later. presumably the object is getting
> into the uptodate state too soon. This analysis is without looking
> at the code. :)

It looks like this happens because the object is new. When looking at
the example, an attribute is immediately changed and the state is
UPTODATE. When the serializiation later calls __getstate__ it is
UPTODATE as well.

Are new objects always UPTODATE?

--
gocept gmbh & co. kg - forsterstrasse 29 - 06112 halle/saale - germany
www.gocept.com - <email address hidden> - phone +49 345 122 9889 7 -
fax +49 345 122 9889 1 - zope and plone consulting and development

Revision history for this message
Jim Fulton (jim-zope) wrote :

On Oct 10, 2007, at 12:42 PM, Christian Theune wrote:

> Am Mittwoch, den 10.10.2007, 15:14 +0000 schrieb Jim Fulton:
>> Minor note. The bug is that changes to an object are registered
>> while __getstate__ is being called.
>
> Right. Thanks for pointing this out more explicitly than I did.
>
>> I think the object should remain
>> in the changed state until later. presumably the object is getting
>> into the uptodate state too soon. This analysis is without looking
>> at the code. :)
>
> It looks like this happens because the object is new. When looking at
> the example, an attribute is immediately changed and the state is
> UPTODATE. When the serializiation later calls __getstate__ it is
> UPTODATE as well.
>
> Are new objects always UPTODATE?

New objects are unsaved. During the commit process, the object
becomes uptodate by virtue of getting _p_jar and _p_oid. This state
change is somewhat implicit. I guess we should set the state to
changed adrer setting the initial _p_oid and _p_jar.

Jim

--
Jim Fulton
Zope Corporation

Revision history for this message
Christian Theune (ctheune) wrote :

Am Mittwoch, den 10.10.2007, 18:18 +0000 schrieb Jim Fulton:
> New objects are unsaved. During the commit process, the object
> becomes uptodate by virtue of getting _p_jar and _p_oid. This state
> change is somewhat implicit.

Looking at the code, I understand this: the 'unsaved' state is defined
implicitly and isn't reflected by a status of the '_p_state' attribute
(it is 0 at this time although I can't find the point where this data is
initialized). As soon as the object receives the _p_oid and _p_jar
attributes the '_p_state' (being 0) becomes relevant and therefore the
object is uptodate (which seems to be the same as 'saved').

> I guess we should set the state to
> changed adrer setting the initial _p_oid and _p_jar.

Ah, sounds reasonable to me.

My first guess was that this would be in cPersistence.c, but I couldn't
find a place where it would fit there (I don't know that code well
though).

The other place I can think of to do this would be serialize.py where it
sets the oid and connection (around line 325 on the trunk).

Christian

--
gocept gmbh & co. kg - forsterstrasse 29 - 06112 halle/saale - germany
www.gocept.com - <email address hidden> - phone +49 345 122 9889 7 -
fax +49 345 122 9889 1 - zope and plone consulting and development

Revision history for this message
Jim Fulton (jim-zope) wrote :

On Oct 11, 2007, at 3:24 AM, Christian Theune wrote:
...
> My first guess was that this would be in cPersistence.c, but I
> couldn't
> find a place where it would fit there (I don't know that code well
> though).

It's way to complicated. Philosophically, the persistence framework
should be minimal and be primarily responsible to notifying data
managers of certain events.

> The other place I can think of to do this would be serialize.py
> where it
> sets the oid and connection (around line 325 on the trunk).

Right. As an aside, the ZODB implementation is overly complex
(overly abstract) IMO. The code in serialize should be moved back
to Connection where it belongs. Not now of course. :)

Jim

--
Jim Fulton
Zope Corporation

Revision history for this message
Christian Theune (ctheune) wrote :

Am Donnerstag, den 11.10.2007, 13:58 +0000 schrieb Jim Fulton:
> > The other place I can think of to do this would be serialize.py
> > where it
> > sets the oid and connection (around line 325 on the trunk).
>
> Right. As an aside, the ZODB implementation is overly complex
> (overly abstract) IMO. The code in serialize should be moved back
> to Connection where it belongs. Not now of course. :)

I tried doing that but stumbled over the fact that Python code can't
change _p_state. :/

I went back to cPersistence.c and played around a bit to try seeing
whether I find a place to stick it in there, but I didn't. (I tried
updating the state when the oid or the jar is set. It didn't work and I
don't think it's the right place anyway.)

I have two ideas how to go on:

a) allow Python code to set an object to CHANGED without triggering the
registration with the transaction

b) make the data manager that the object gets registered with more
robust to avoid duplicate registration. This seems simple when looking
at it, but while thinking about it it's not. Objects don't have to be
known to the data manager because they might be discovered while storing
another object implicitly (which is what happens in this case anyway).

Other suggestions?

Christian

--
gocept gmbh & co. kg - forsterstrasse 29 - 06112 halle/saale - germany
www.gocept.com - <email address hidden> - phone +49 345 122 9889 7 -
fax +49 345 122 9889 1 - zope and plone consulting and development

Revision history for this message
Christian Theune (ctheune) wrote :

Am Freitag, den 12.10.2007, 10:15 +0000 schrieb Christian Theune:
> a) allow Python code to set an object to CHANGED without triggering the
> registration with the transaction

Update: I experimented with this option and it makes the pathological
case go away. I'm attaching the patch. It's not complete as a change to
cPersistence.c makes another test case fail.

--
gocept gmbh & co. kg - forsterstrasse 29 - 06112 halle/saale - germany
www.gocept.com - <email address hidden> - phone +49 345 122 9889 7 -
fax +49 345 122 9889 1 - zope and plone consulting and development

Revision history for this message
Christian Theune (ctheune) wrote :
Revision history for this message
Jim Fulton (jim-zope) wrote :

On Oct 12, 2007, at 6:15 AM, Christian Theune wrote:

> Am Donnerstag, den 11.10.2007, 13:58 +0000 schrieb Jim Fulton:
>>> The other place I can think of to do this would be serialize.py
>>> where it
>>> sets the oid and connection (around line 325 on the trunk).
>>
>> Right. As an aside, the ZODB implementation is overly complex
>> (overly abstract) IMO. The code in serialize should be moved back
>> to Connection where it belongs. Not now of course. :)
>
> I tried doing that but stumbled over the fact that Python code can't
> change _p_state. :/

Yes it can:

   ob._p_changed = True

...

> a) allow Python code to set an object to CHANGED without triggering
> the
> registration with the transaction

Current ZODB registers with the DM, not the transaction.

Since the DM is triggering this, it can simply remove the
registration after setting _p_changed = True,

Jim

--
Jim Fulton
Zope Corporation

Revision history for this message
Jim Fulton (jim-zope) wrote :

On Oct 12, 2007, at 7:02 AM, Christian Theune wrote:

>
> Am Freitag, den 12.10.2007, 10:15 +0000 schrieb Christian Theune:
>> a) allow Python code to set an object to CHANGED without
>> triggering the
>> registration with the transaction
>
> Update: I experimented with this option and it makes the pathological
> case go away. I'm attaching the patch. It's not complete as a
> change to
> cPersistence.c makes another test case fail.

I'm going to veto this change. cPersistence is too risky a place to
change.

See my other post.

Jim

--
Jim Fulton
Zope Corporation

Revision history for this message
Christian Theune (ctheune) wrote :

I have another fix on my machine. It requires me to modify the Connection._registeredobjects list from the serializer. Is this clean enough? I have the feeling that we break the law of demeter. :(

Note that the patch is incomplete because a Stub-Connection object breaks due to the access to internal variables.

Revision history for this message
Christian Theune (ctheune) wrote :

Note: Another plan would be to allow the ZEO client cache to invalidate current data. I don't see whether this would pull in other problems but it might be the most straight forward and clean way with the current set of APIs.

Revision history for this message
Christian Theune (ctheune) wrote :

Another note: making the ZEO client cache invalidate current data fixes the problem and other tests pass.

Revision history for this message
Jim Fulton (jim-zope) wrote :

On Nov 11, 2007, at 7:45 AM, Christian Theune wrote:

> Another note: making the ZEO client cache invalidate current data
> fixes
> the problem and other tests pass.

After looking at this a bit, I think this is probably the best
solution. There are, theoretically, other situations in which an
object can be "committed" more than once in the same transaction. One
can argue about whether this should be the case, but it has been this
way for some time and it's no where near the top of my priority list
to deal with. I think that making ZEO more tolerant is the best
solution at this time.

Jim

--
Jim Fulton
Zope Corporation

Revision history for this message
Christian Theune (ctheune) wrote :

Applied my suggested change that was preferred by Jim. (r82095)

Revision history for this message
Christian Theune (ctheune) wrote :

Backported fix from trunk (3.9)

Revision history for this message
Christian Theune (ctheune) wrote :

Reassigning as was on the duplicate.

Revision history for this message
Christian Theune (ctheune) wrote :

Released in ZODB 3.8.0c1

Revision history for this message
Wolfgang Schnerring (wosc) wrote :

has long been released

Revision history for this message
Tres Seaver (tseaver) wrote :

I don't know who 'MR_JOC' is, or why this long-closed bug should be
newly-assigned to him. His LP homepage[1] indicates that he has newly
joined, and provides no contact info.

[1] https://launchpad.net/~joc-jpg

Lukas Buda (buda-lukas)
summary: - ZEO versus creative __getstate__
+ ZEO versus creative _getstate
Revision history for this message
Martijn Pieters (mjpieters) wrote :

No idea who Lukas Buda is or why there is a need to change the title of this long-closed bug. I've reverted the change.

summary: - ZEO versus creative _getstate
+ ZEO versus creative __getstate__
Revision history for this message
Albertas Agejevas (alga) wrote :

Launchpad karma might be the motive. :-/

Moris (nerimoris)
Changed in zope2:
assignee: Rocky Burt (rocky-burt) → Moris (nerimoris)
Tres Seaver (tseaver)
Changed in zope2:
assignee: Moris (nerimoris) → nobody
api.ng (hektve)
Changed in zope2:
assignee: nobody → HECTOR DAVID (hektve)
Tres Seaver (tseaver)
Changed in zope2:
assignee: HECTOR DAVID (hektve) → nobody
Changed in zope2:
assignee: nobody → อัจฉรา สันทวี (atchaball04)
Changed in zope2:
assignee: อัจฉรา สันทวี (atchaball04) → nobody
Changed in zope2:
assignee: nobody → Kadir Selçuk (turkdevops)
Tres Seaver (tseaver)
Changed in zope2:
assignee: Kadir Selçuk (turkdevops) → nobody
ali (alli1998)
information type: Public → Public Security
Tres Seaver (tseaver)
information type: Public Security → Public
Changed in zope2:
assignee: nobody → Kadir Selçuk (turkdevops)
Tres Seaver (tseaver)
Changed in zope2:
assignee: Kadir Selçuk (turkdevops) → nobody
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.