S3/SimpleDB ZODB

by Ross Patterson last modified 2008-11-23T04:07:20+02:00
exploring Amazon S3/SimpleDB ZODB storage

As I understand it, when an EC2 instance stops running for whatever reason, its disk is lost.  Furthermore, I understand that Amazon may occasionally take down or reboot an EC2 instance as a part of their allocation process.  IOW, you can't count on the disk as anything more than a cache that is local to the instance.

As such, data that needs to persist, such as the ZODB, would need some other form of storage.  One solution would be to have only ZEO clients in EC2, which incurs bandwidth costs for reads from and writes to the ZEO server, which would have to be hosted with some other provider.  Another solution would be to have the ZEO server on the EC2 network but keep the ZODB on another storage service such as rsync.net via sshfs, drbd, enbd, etc.  This also incurs bandwidth costs.

For ZEO clusters deployed on EC2, a lot of the bandwidth costs for writes out of the Amazon network could be saved if Amazon SimpleDB and/or Amazon S3 could be used as storage for the ZODB.  One approach would be to write a ZODB storage implementation for S3/SimpleDB.  Another approach would be to write a filesystem or block device layer that uses S3/SimpleDB.  I'm actually very excited about the latter as it would be more generally useful.  I've done a lot of thinking about it and have an implementation sketched out.  It would be a kernel project, though, which is a definite change of direction for me, but that would be good too.
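To make the block device idea a bit more concrete, here is a minimal user-space sketch in Python.  It is not the kernel implementation mentioned above, only an illustration of one way byte offsets could be mapped onto fixed-size objects in a key/value store.  Everything in it is an assumption made for the example: `DictObjectStore` is a hypothetical in-memory stand-in for an S3 bucket (no real S3 calls are made), and the `block/NNN` key naming, the 4096-byte block size and the class names are invented.

```python
class DictObjectStore:
    """Hypothetical stand-in for an S3 bucket: whole-object get/put by key."""

    def __init__(self):
        self._objects = {}

    def get(self, key):
        # Returns None when the key has never been written.
        return self._objects.get(key)

    def put(self, key, data):
        self._objects[key] = bytes(data)


class ObjectBackedBlockDevice:
    """Maps byte offsets onto fixed-size blocks, one stored object per block."""

    def __init__(self, store, block_size=4096):
        self.store = store
        self.block_size = block_size

    def _key(self, index):
        # Invented naming scheme; zero-padded so keys sort in block order.
        return "block/%012d" % index

    def _load(self, index):
        data = self.store.get(self._key(index))
        if data is None:
            # Blocks that were never written read back as zeros.
            return bytes(self.block_size)
        return data

    def read(self, offset, length):
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, self.block_size)
            take = min(self.block_size - start, end - offset)
            out += self._load(index)[start:start + take]
            offset += take
        return bytes(out)

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, self.block_size)
            take = min(self.block_size - start, len(data) - pos)
            # Read-modify-write: the store only handles whole objects,
            # so a partial block update rewrites the entire block.
            block = bytearray(self._load(index))
            block[start:start + take] = data[pos:pos + take]
            self.store.put(self._key(index), block)
            pos += take


if __name__ == "__main__":
    device = ObjectBackedBlockDevice(DictObjectStore(), block_size=8)
    device.write(5, b"hello world")
    print(device.read(5, 11))
```

The read-modify-write in `write` is where the cost shows up: every small write turns into a full block upload, so the block size trades request count against wasted transfer.  A real layer would also need caching, write ordering and some answer to the store's consistency behaviour, none of which this sketch attempts.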

Yet another fun project to work on if I ever have any free time or if anyone wants to sponsor it.



I forgot to link to a thread I started on the Amazon forum about using S3 as a block device.  It's a lengthy discussion but it elucidates a lot of the underlying issues.