S3/SimpleDB ZODB

by Ross Patterson last modified Nov 22, 2008 06:07 PM

exploring Amazon S3/SimpleDB ZODB storage

As I understand it, when an EC2 instance stops running for whatever reason, its disk is lost. Furthermore, I understand that an EC2 instance may occasionally be taken down or rebooted by Amazon as part of their allocation process. In other words, you can't count on the disk as anything more than a cache that is local to the instance.

As such, data that needs to persist, such as the ZODB, needs some other form of storage. One solution would be to have only ZEO clients in EC2, which incurs bandwidth costs for reads from and writes to the ZEO server, which would have to be hosted with some other provider. Another solution would be to have the ZEO server on the EC2 network but keep the ZODB on another storage service such as rsync.net via sshfs, drbd, enbd, etc. This also incurs bandwidth costs.

For ZEO clusters deployed on EC2, a lot of the bandwidth costs for writes out of the Amazon network could be saved if Amazon SimpleDB and/or Amazon S3 could be used as storage for the ZODB. One approach would be to write a ZODB storage implementation for S3/SimpleDB. Another approach would be to write a filesystem or block device layer that uses S3/SimpleDB. I'm actually very excited about the latter, as it would be more generally useful. I've done a lot of thinking about it and have an implementation sketched out. It would be a kernel project, though, which is a definite change of direction for me, but that would be good too.
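
To make the block device idea a bit more concrete, here is a minimal Python sketch of how block numbers might map onto S3 object keys. It is not the implementation mentioned above; the block size, the key format, and the names `block_key` and `blocks_for_range` are all assumptions made for illustration.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes, not taken from any real design


def block_key(volume, block_number):
    """Hypothetical key naming: one S3 object per block of a volume."""
    return "%s/%012d" % (volume, block_number)


def blocks_for_range(offset, length, block_size=BLOCK_SIZE):
    """Return the block numbers touched by `length` bytes starting at `offset`."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))
```

Under a layout like this, even a two-byte write that straddles a block boundary means fetching and re-uploading two whole objects, which is one reason latency and request counts matter so much for this approach.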

Yet another fun project to work on if I ever have any free time or if anyone wants to sponsor it.

Update: I forgot to link to a thread I started on the Amazon forum about using S3 as a block device. It's a lengthy discussion but it elucidates a lot of the underlying issues.

s3storage

Posted by Laurence Rowe at Jan 26, 2008 03:51 AM
Check out http://code.google.com/p/s3storage - I wrote it a year or so back to learn a bit more about ZODB. The big problem with it was latency: it takes a while to get/put data to S3. The changes I would make to it now are:

  * Use one file per transaction rather than one file per object per transaction (should be able to rip the data structure for this from FileStorage).

  * Simplify it so that all writes go through ZEO, since something needs to ensure atomicity.

  * Use a separate storage for the catalog. Once SimpleDB gets sorted results support it should be possible to replace the catalog with an implementation using SimpleDB.
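
The first change above, one file per transaction, could look roughly like the sketch below. This is not code from s3storage; the record layout (8-byte oid, 4-byte length, then the data) and the names `pack_transaction`, `unpack_transaction` and `transaction_key` are assumptions for illustration, and FileStorage's real format carries more metadata.

```python
import struct

# Assumed record header: 8-byte object id, then a 4-byte big-endian data length.
_HEADER = struct.Struct(">8sI")


def pack_transaction(records):
    """Serialize {oid: data} into one blob so a commit needs a single PUT."""
    parts = []
    for oid, data in sorted(records.items()):
        parts.append(_HEADER.pack(oid, len(data)))
        parts.append(data)
    return b"".join(parts)


def unpack_transaction(blob):
    """Inverse of pack_transaction: return {oid: data}."""
    records = {}
    offset = 0
    while offset < len(blob):
        oid, length = _HEADER.unpack_from(blob, offset)
        offset += _HEADER.size
        records[oid] = blob[offset:offset + length]
        offset += length
    return records


def transaction_key(tid):
    """Hypothetical key naming: one S3 object per 8-byte transaction id."""
    return "transactions/%s" % tid.hex()
```

With this shape a commit that touches fifty objects costs one round trip to S3 instead of fifty, at the price of having to fetch the whole transaction blob (or keep an index) to read a single object back.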

If you want access to the repository to play around, send me an email.

Laurence

ZEO Raid

Posted by Ross Patterson at Jan 26, 2008 12:06 PM
I wonder if gocept's ZEO RAID couldn't be used to address this issue. Maybe a ZEO server on an EC2 instance could answer ZEO reads first from a FileStorage instance and only use the S3Storage as a fallback for reads. Likewise, writes could be quickly written to FileStorage and slowly written to S3Storage over time. There might be some painful asynchronous issues though.
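
Here is a rough Python sketch of that read-fallback and delayed-write idea, using plain dicts as stand-ins for the two storages. It doesn't use the real ZEO RAID, FileStorage or S3Storage APIs; the class `TieredStore` and its method names are made up for illustration.

```python
from collections import deque


class TieredStore:
    """Read from a fast local store first, fall back to a slow remote one."""

    def __init__(self, local, remote):
        self.local = local      # stand-in for FileStorage on the instance disk
        self.remote = remote    # stand-in for an S3-backed storage
        self.pending = deque()  # keys written locally but not yet sent remotely

    def load(self, key):
        try:
            return self.local[key]
        except KeyError:
            data = self.remote[key]  # slow path; raises KeyError if truly missing
            self.local[key] = data   # keep a local copy for the next read
            return data

    def store(self, key, data):
        # Fast path: write locally now, queue the remote write for later.
        self.local[key] = data
        self.pending.append(key)

    def flush(self, limit=None):
        """Push queued writes to the remote store; return how many were sent."""
        sent = 0
        while self.pending and (limit is None or sent < limit):
            key = self.pending.popleft()
            self.remote[key] = self.local[key]
            sent += 1
        return sent
```

The painful part shows up right away: anything still sitting in `pending` when the instance goes away is lost along with the local disk, so a commit can't really be called durable until `flush` has caught up.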

s3 over fuse

Posted by http://witsch.myopenid.com/ at Jan 31, 2008 01:59 AM
http://code.google.com/p/s3fs/wiki/FuseOverAmazon might be an option as well, but latency issues might be harder to work around than with a solution from within the zope universe... :)

JungleDisk

Posted by Ross Patterson at Feb 02, 2008 08:56 PM
JungleDisk seems to be maturing nicely, including block level updates. Should be worth looking into:

http://jungledisk.com/

right, but...

Posted by http://witsch.myopenid.com/ at Feb 03, 2008 01:02 AM
afaik you cannot use it to mount s3 into your fs namespace — at least i couldn't find any info saying so on their site. plus it's not open source, either.

and then, what i actually find rather unappealing is that they have a "free download" link in _big_ letters (with the page this leads to not mentioning any license fees whatsoever), but in reality the much less prominent "pricing" link leads you to find "after the trial period, the software is just $20 to purchase"... bit of a strange marketing strategy imho :)

Yeah, but, but...

Posted by Ross Patterson at Feb 03, 2008 11:22 AM
Yeah, they're icky in those ways, but they seem to have good technology and to be going in the directions I'd like to see solutions in. Also, see the link I added above to the thread on S3 block devices.

simpleDB

Posted by https://me.yahoo.com/a/0mMuMu1vieZUq5RyZuU_N6nUHnGi#49444 at Oct 08, 2008 02:10 PM
Has any work been done on this? I would be interested in trying to swap out the FileStorage for SimpleDBStorage or something.

simpleDB

Posted by Ross Patterson at Oct 09, 2008 10:55 AM
Actually, the original idea was much more about S3 than SimpleDB. Also, I just noted that SimpleDB doesn't have Python bindings. Too bad, 'cause with RelStorage, it might have been feasible to use SimpleDB as a ZODB storage. :(

At any rate, most people these days are using Amazon's persistent block storage in combination with its snapshot to S3 functionality to handle this.