Werner Vogels, the CTO at Amazon, has a great post about the contentious idea of “eventual consistency” for the new SimpleDB service. The idea that a database could be inconsistent is a little disconcerting to a lot of people — after all, inconsistent means unpredictable, and that just doesn’t fly for us deterministic computer people. Right?

Well, “eventual consistency” isn’t entirely unpredictable. And, it has it’s benefits — especially when it means avoiding locking on highly concurrent read and write operations. That’s exactly what SimpleDB was designed to do. To quote Vogels:

“Inconsistency can be tolerated for two reasons: for improving read and write performance under highly concurrent conditions and for handling partition cases where a majority model would render part of the system unavailable even though the nodes are up and running.

“Whether or not inconsistencies are acceptable depends on the client application. A specific popular case is a website scenario in which we can have the notion of user-perceived consistency; the inconsistency window needs to be smaller than the time expected for the customer to return for the next page load. This allows for updates to propagate through the system, before the next read is expected.”

(from “Eventually Consistent“)

SimpleDB was intentionally designed to behave this way, which means it certainly wasn’t built to replace traditional ACID relational databases for all scenarios. If you think about how often you require immediate consistency in your web applications, you’ll likely find that a very significant portion of your database interactions don’t.

My biggest concern about SimpleDB isn’t consistency or relationships, it’s latency. SimpleDB queries from outside of the Amazon cloud won’t be fast enough to feed sites that require more than a couple of queries per page — unless those queries can be executed in parallel, which isn’t an easy option in single-threaded web environments (PHP, Rails, etc.).

I’m excited to see how it operates with parallel queries, though. If an application is built to make dozens of queries simultaneously, rather than sequentially, the performance could be excellent.

I have a little Java toolkit for querying web services in parallel, and I’m itching to unleash it on SimpleDB. All this hot air blowing isn’t worth much without real numbers, right? 🙂


Keith Haring Tribute

December 17, 2007

Here’s a little something different to start off the week!

Peat Wollaeger (yes, another Peat) posted a video tribute to Keith Haring, a great mural and graffiti artist from the ’80s who died in 1990 from an AIDS related disease. It’s a short video of a mural being built.

I love Peat’s stencil work, and it’s not just because we share a name. If you’re keen on seeing more, visit his site at www.stenSOUL.com. Amazon also carries a book titled Stencil Graffiti that has some great photos. Flickr has a fantastic collection of stencil art, and I also have a few shots of my own.

Happy Monday!

Amazon SimpleDB

December 14, 2007

Amazon will soon be releasing their SimpleDB service under a limited beta program.

I’m very excited about this. Persistent, high performance databases are a big missing piece in Amazon’s cloud computing initiative — EC2 doesn’t offer storage that persists across reboots, and S3 isn’t structured to provide the IO required by a database.

Conceptually, SimpleDB is very compelling. It’s designed for real time querying, has no hard limits on storage, and is metered based on storage and CPU time. It looks a lot like Amazon’s Dynamo technology … and it wouldn’t surprise me if they released the Dynamo paper to gauge interest in exposing such a service.

But, there are three big caveats.

It’s not SQL. This isn’t actually as big a deal as it seems, but I know there are going to be a lot of people who are bent out of shape on this one. Why isn’t it a big deal? Because …

It’s not relational. SimpleDB provides a big flat table, with arbitrary attributes per row. So, queries are all about filtering through data, and while they can have very complex rules, it doesn’t behave like the “normal” relational databases we’re accustomed to using.

Updates are “eventually consistent.” This means that if you immediately query for data you just pushed into SimpleDB, it may not show up. You have a guarantee that it will show up within a few seconds, but not immediately. Amazon calls this “eventual consistency.”

It may be a little scary for some folks who are most comfortable with the traditional model of building apps around a single relational database. On the other hand, it appears to be a great system for people who have built big websites, and are already comfortable dealing with lazy synchronization and custom data sources.

I’m looking forward to playing with it!

(Tip o’ the hat to @grigs)

Update: Here’s a great post that goes into a little more detail about the give and take of SimpleDB. Fun fact: it’s written in Erlang.

Update:  More stats.  Looks like their opening offering lets you create up to 100 “domains” containing up to 10 GB of data each.   That’s a good start.