POSKeyError During Commit¶
A terrifying tale of ZODB errors that only showed during the hardest time to debug.
Believe it or not, I’m a POSKeyError virgin, I’ve never run into one… Until now!
I was working on a very small Plone 3.1.1 to 4.2.2 upgrade. Not too many add-ons, simple skin/theme customizations, 1,810 objects in ~800MB. No problem, right?
Wrong. I got a POSKeyError
after the upgrade step that converts
files and images to BLOBs. The wierd part is that it only happened
during the transaction.commit(), not during ZODB object access:
File "/opt/src/eggs/Zope2-2.13.18-py2.7.egg/Zope2/App/startup.py", line 301, in commit
transaction.commit()
File "/opt/src/eggs/transaction-1.1.1-py2.7.egg/transaction/_manager.py", line 89, in commit
return self.get().commit()
File "/opt/src/eggs/transaction-1.1.1-py2.7.egg/transaction/_transaction.py", line 329, in commit
self._commitResources()
File "/opt/src/eggs/transaction-1.1.1-py2.7.egg/transaction/_transaction.py", line 443, in _commitResources
rm.commit(self)
File "/opt/src/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZODB/Connection.py", line 572, in commit
oid, serial, transaction)
File "/opt/src/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZODB/BaseStorage.py", line 416, in checkCurrentSerialInTransaction
committed_tid = self.getTid(oid)
File "/opt/src/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZODB/FileStorage/FileStorage.py", line 770, in getTid
pos = self._lookup_pos(oid)
File "/opt/src/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZODB/FileStorage/FileStorage.py", line 403, in _lookup_pos
raise POSKeyError(oid)
POSKeyError: 0x055350
I tried all the tools in ZODB.scripts and zc.zodbdgc but they
couldn’t tell me anything about the oid
in question. That’s
probably because it was a different oid
every time I attempted the
upgrade step on the same ZODB.
Without being able to use the typical tools to find out what ZODB
objects were referring to those oids
I was really stuck. Then David
Glick came to my rescue. Actually, he was staying at my apartment
so I drafted him. :-) He suggested that I might have to create a
mapping of oids
to stack frames at the time when objects were added
to the transaction so that I could figure out what had been going on
once I knew which oid
would be the problem. This worked very well
and much thanks to David.
I added the following to the place where the objects were being added
to the transaction, ZODB.Connection.Connection.readCurrent()
:
if not hasattr(self, 'XXX'):
self.XXX = {}
self.XXX[ob._p_oid] = sys._getframe()
Then once I had my POSKeyError
in pdb.post_mortem()
I could invoke
pdb
at the place where the offending oid
had been added:
import pdb; pdbr = pdb.Pdb() ; pdbr.reset() ; pdbr.interaction(self.XXX[oid], None)
Using that I was able to find out what objects were involved. They
were all portal_catalog/path
index Trees.IIBTree.IITreeSet
objects. What’s odd is that the error happens even if I do a catalog
clear and rebuild first.
After much more debugging and discussion, we determined that the
ZODB.Connection.Connection.readCurrent() support was involved
somehow. Also, as I deleted the images that were involved I noticed
another migration error earlier on went away for each image I
removed. That’s the key, when the BLOB migration fails for one file
or image, it uses a savepoint to rollback just that image’s
migration and proceed with the rest. It sure seems like there’s a bug
in the ZODB where when a savepoint is rolled back the objects
registered with the transaction are cleaned up but the objects
registered for readCurrent
are not. Gee, someone should write a
test case. :-)
In the meantime, it turns out the reason the image migrations were failing was something else all together but I thought I should share this debugging technique for capturing stack information when you can’t know which stack you need to inspect until later. I also hope this ZODB bug gets squashed.
Updated on 21 November 2012
Imported from Plone on Mar 15, 2021. The date for this update is the last modified date in Plone.
Comments
comments powered by Disqus