Gaming the Amazon Spot Market

Amazon Web Services announced a couple of days ago that they would be auctioning off excess compute capacity in their cloud. This is a huge deal for a bunch of reasons … but I’m really interested in what’s in it for entrepreneurs like me.
Here’s the thing: the prices so far are bargain-basement: a small EC2 instance that normally costs $0.10 per hour is fluctuating between $0.026 and $0.036 an hour. And the price you pay is whatever the current spot rate is … even if your bid is higher than that.
The use case Amazon suggests is bidding on this unused capacity to handle batch processing that isn’t time-critical. But since the spot price has so far never risen anywhere close to the “rack rate” of $0.10/hr, placing a high standing bid (say $0.12/hr) should give you access to Amazon compute resources at drastically reduced rates (currently about a quarter to a third of the on-demand price).
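The mechanics are worth spelling out, since they’re what make the perpetual-high-bid trick work. Here’s a tiny sketch (my own illustration of Amazon’s stated pricing rule, not Amazon’s code): you pay the going spot rate whenever your bid meets or beats it, and lose the instance otherwise.

```python
def hourly_cost(bid, spot_price):
    """Return the hourly cost of a spot instance, or None if outbid.

    Illustrative model of the pricing rule: you pay the current spot
    rate, never your bid, whenever bid >= spot rate.
    """
    if bid >= spot_price:
        return spot_price  # you pay the market rate, not your bid
    return None  # outbid: the instance goes away

# A perpetual $0.12 bid against recent spot prices:
for spot in (0.026, 0.031, 0.036):
    print(hourly_cost(0.12, spot))
```

The point: as long as the spot price stays under your bid, a high bid costs you nothing extra, and simply keeps your instances alive.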
Of course, my perpetual high bid will help keep prices from dropping too low on the spot market, and I’m sure Amazon appreciates that. From the perspective of the spot market, I’m a sucker. It’s only from the perspective of someone who needs compute resources on a constant basis that I’m getting a good deal. But a 65–75% discount on my servers in exchange for a simple change is nothing to sneeze at!
One caveat: I’m assuming that prices on the spot market will never rise much above the “rack rate” for servers. This seems like a safe assumption, since if they ever do, users can always instantiate regular on-demand servers instead of spot ones. But these are early days for computing as a real-time commodity, and no one really knows what this market will be like.
I’ll keep you updated as we at SlideShare experiment with this exciting new pricing model … we’re going to switch half of our conversion servers over to the spot market first, and see what happens over the next several weeks.

When S3 goes down, the internet goes down!

We’re now in hour 4 of an S3 outage that is affecting the entire startup ecosystem. SlideShare is down, as are MuxTape, SmugMug, and almost every other site you can think of. Fingers crossed that they resolve this soon! Here’s the current status from AWS:

9:05 AM PDT We are currently experiencing elevated error rates with S3. We are investigating.
9:26 AM PDT We’re investigating an issue affecting requests. We’ll continue to post updates here.
9:48 AM PDT Just wanted to provide an update that we are currently pursuing several paths of corrective action.
10:12 AM PDT We are continuing to pursue corrective action.
10:32 AM PDT A quick update that we believe this is an issue with the communication between several Amazon S3 internal components. We do not have an ETA at this time but will continue to keep you updated.
11:01 AM PDT We’re currently in the process of testing a potential solution.
11:22 AM PDT Testing is still in progress. We’re working very hard to restore service to our customers.
11:45 AM PDT We are still in the process of testing a series of configuration changes aimed at bringing the service back online.
12:05 PM PDT We have now restored communication between a small subset of hosts. We are working on restoring internal communication across the rest of the fleet. Once communication is fully restored, then we will work to restore request processing.
12:25 PM PDT We have restored communication between additional hosts and are continuing this work across the rest of the fleet. Thank you for your continued patience.

Simple DB: the final piece of the puzzle falls into place

Amazon just announced “SimpleDB”, which sounds a lot like the rumored “SDS” or “Simple Database Service” that we’ve all been waiting for.
This is huge: the single biggest thing stopping you from running a webapp on EC2 is the fact that there’s nowhere safe for your database to live. EC2 is a virtual hosting service, so if a machine crashes and is rebooted, any data written to the hard drive simply disappears. Not good. As a result, EC2 was framed as a great solution for back-end processing (think transcoding videos for YouTube), but not a great fit for an entire web application.
Solutions to this problem (including continually backing up your database to S3) were never very convincing. But it was always clear that SOME major initiative to solve it was planned.
Now we know. This isn’t a vanilla MySQL clustering service: it’s something a little weirder (it’s conceptually similar to a database, but lacks many of the features of a database, and works somewhat differently). As a result, you’ll have to build your app from the ground up as an Amazon app: this isn’t a drop-in replacement for MySQL Cluster.
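To make the “conceptually similar to a database, but weirder” point concrete, here’s a toy in-memory model of the data model as Amazon describes it: domains contain items, items are bags of attributes, every value is a string, attributes can hold multiple values, and there’s no schema at all. This is my own illustrative sketch, not Amazon’s API.

```python
class SimpleDBDomain:
    """Toy model of a SimpleDB domain: schemaless items whose
    attributes are multi-valued and stored as strings."""

    def __init__(self):
        self.items = {}

    def put_attributes(self, item_name, attrs):
        item = self.items.setdefault(item_name, {})
        for name, value in attrs.items():
            # every value is coerced to a string; attributes accumulate
            # multiple values instead of overwriting
            item.setdefault(name, set()).add(str(value))

    def get_attributes(self, item_name):
        return self.items.get(item_name, {})

    def query(self, name, value):
        # SimpleDB-style lookup: names of items whose attribute `name`
        # contains `value`
        return [k for k, item in self.items.items()
                if value in item.get(name, set())]
```

Notice what’s missing compared to MySQL: no joins, no types, no transactions. That’s the “build your app as an Amazon app from the ground up” part.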
But the benefits are potentially huge. Imagine you’re building a Facebook application. You could use SimpleDB, EC2, and S3 to provide the backend, and pay very little in infrastructure costs until you actually started getting real traction. Your system would transparently scale (simply add more EC2 nodes as web/app servers as your server load increases), and you would never, ever have to worry about the huge P.I.T.A. (pain in the ass) that is setting up a database cluster, designing schemas for federating data across multiple databases, etc.
There’s never been a better time to be a software entrepreneur. Amazon has once again lowered the upfront cost of starting up a new web business, and at the same time dramatically increased the number of use cases that their other services can be used for.
Coverage from TechCrunch and GigaOM here. Marcelo Calbucci frames the service as a “directory service rather than a database service”.

Using a CDN with S3

I’ve started shopping for a content delivery network for SlideShare. It’s a market with pretty opaque pricing: if you’re making the jump to using a CDN for the first time it’s not easy to get a real sense of what monthly costs will be.
Conceptually, integration between a CDN and Amazon S3 is pretty straightforward. Here are the basic steps:
1) Dedicate a subdomain to serving up all the content you want to serve via the delivery network.
2) Make a CNAME entry in your DNS so that traffic to that subdomain goes to your CDN instead.
3) Tell the CDN which Amazon S3 bucket your static content is stored in.
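Step 2 is just a single DNS record. A hypothetical zone-file entry might look like this (both hostnames here are made-up placeholders for illustration):

```
; Point the dedicated subdomain at the CDN; the CDN in turn pulls
; from your S3 bucket as its origin.
static.example.com.   IN   CNAME   customer.cdn-provider.example.net.
```

Your HTML keeps referencing your own subdomain, so you can switch CDN vendors later by changing one DNS record.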
The CDN receives the request for content at a geographically local server (so Europeans hit a node in Europe, Asians hit a node in Asia, etc.). The node first looks in its own (in-memory) cache. If it doesn’t find the requested content, it fetches it from S3 and saves it, so that it’s cached for next time. How long content is cached is typically configurable, and APIs are typically provided that let you flush the cache.
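That fetch-on-miss behavior is just a read-through cache in front of S3. Here’s a toy sketch of an edge node (my own model of the general pattern, not any vendor’s actual implementation; real nodes also expire entries after a configurable TTL, which I’ve left out for brevity):

```python
class EdgeNode:
    """Toy read-through cache: serve from memory when possible,
    fall back to the S3 origin on a miss."""

    def __init__(self, origin_fetch):
        self.cache = {}
        self.origin_fetch = origin_fetch  # callable that pulls from S3

    def get(self, path):
        if path in self.cache:
            return self.cache[path]        # cache hit: served locally
        content = self.origin_fetch(path)  # cache miss: fetch from origin
        self.cache[path] = content         # remember it for next time
        return content

    def flush(self, path=None):
        # CDNs typically expose an API call like this to purge stale content
        if path is None:
            self.cache.clear()
        else:
            self.cache.pop(path, None)
```

The first request for each file pays the round-trip to S3; every subsequent request from that region is served from the node’s own memory.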
In my investigations so far the following companies have turned up as potential vendors:
Akamai (the biggest company in the space)
Limelight is another big contender (famously used by Youtube and other Web 2.0 video companies)
Panther Express is a smaller contender. I’ve had the most conversations with these guys.
Level 3 is interesting in that they’ve recently announced that they’ll be selling CDN bandwidth at normal bandwidth rates. I haven’t talked to them yet, but I probably should. ;->
If anyone has any other recommendations for vendors I should check out, feel free to reply on this post! Frankly, I really wish Amazon would just provide this as a service on top of S3: that way we wouldn’t have to change any of our code at all! Unfortunately, it doesn’t seem like this is going to happen in the near future.

Rabble’s ActiveRecord talk at SVRC Rocked!

Rabble’s presentation on ActiveRecord at the Silicon Valley Ruby Conference was the clearest and most coherent explanation of ActiveRecord I’ve seen to date.
Check it out!

Rabble was previously lead developer at Odeo, and is now part of the Yahoo skunkworks team (err.. “semi-autonomous business unit”) called Yahoo BrickHouse.
[Update: more cool ruby presentations have been archived by rubyinside. Sweet!]

Using virtualization to automate deployment: is it a good idea or not?

As the number of servers needed to run SlideShare increases, we are spending more and more of our time simply deploying our software. Each new box has to have a lot of software installed, configured, and tested before it can be hooked up. Scripting common tasks makes things go faster, but doesn’t resolve the fundamental problem, which is that there’s never any way to prove that Server A has the exact same configuration as Server B. This makes troubleshooting tricky, obviously.
One path we’re starting to consider is virtualization. I haven’t heard of this as a common use for virtualization. Typically, people seem to use software like Xen or VMWare to run multiple virtual servers on one physical server, so they can get more use out of existing hardware. We don’t have that problem: all our boxes are in the red! But we would like to be able to roll out new servers reliably, at the push of a button, the way you can make a new instance of an image on Amazon EC2 just by typing a command into your command line.
The way I look at it, the configuration of a machine is valuable intellectual property, and it needs to be captured so that it can be reproduced whenever we need it. Of course there’s a performance penalty: something like 5–10% of CPU will be consumed by the virtualization software, meaning that overall we’ll need more boxes than we would otherwise. But we’ll be able to set up or rebuild boxes faster, and right now that seems more important to me.
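The overhead math is simple enough to sketch. Assuming the hypervisor eats a fixed fraction of each box’s CPU (the 5–10% figure above is my rough estimate, not a benchmark), the extra hardware works out like this:

```python
import math

def boxes_needed(bare_metal_boxes, overhead_fraction):
    """Number of virtualized boxes that match the capacity of
    `bare_metal_boxes` physical boxes, if the hypervisor consumes
    `overhead_fraction` of each box's CPU."""
    return math.ceil(bare_metal_boxes / (1 - overhead_fraction))

# At 5-10% overhead, a 20-box fleet grows by 2-3 machines:
print(boxes_needed(20, 0.05), boxes_needed(20, 0.10))
```

A couple of extra machines seems like a fair price for push-button, provably identical server builds.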
Thoughts? Is this a good idea or not? Has anyone used virtualization in this way? Any recommendations on which software to try first? As always, reply in the comments field below.
Also: a special bonus slideshow on virtualization for your reading pleasure!