Rails being single-threaded causes scalability problems

So anyone who works with Rails even a little bit knows that it is single-threaded: a given mongrel can only process one HTTP request at a time. The solution is to run a large number (or “pack”) of mongrels and load-balance your incoming requests across the pack.
This works fine (it’s what every even moderately big Rails site does), but it has one serious problem. If one of your pages is slow, then requests will “back up” behind the mongrel that is processing the slow request. You’d think you can solve the problem using load balancing (we use the NGINX fair plugin in order to get the smartest load balancing possible). But there’s a point where better load balancing simply won’t solve your problems. For example, let’s say my average request takes 250ms to respond. If I’m serving a request that is going to take a whopping 5 seconds to respond, there’s no way for the load balancer to know that something is fishy for at least 250ms. This guarantees that a certain number of requests are going to “back up” behind the slow request.
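For context, a rough sketch of this kind of setup: a single pack of mongrels behind nginx, with the fair plugin picking a less-busy backend instead of plain round-robin. The ports and names here are illustrative only.

    upstream mongrel_pack {
        fair;                      # from the nginx upstream "fair" plugin
        server 127.0.0.1:8000;
        server 127.0.0.1:8001;
        server 127.0.0.1:8002;
    }

    server {
        listen 80;
        location / {
            proxy_set_header Host $host;
            proxy_pass http://mongrel_pack;
        }
    }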
It’s all well and good to say you should profile your code and fix pages that are slow. But for a big web app like SlideShare, there are LOTS of pages. We work hard on making the important pages fast, but we don’t necessarily always have the time to profile every page. And if we do happen to have some slow pages, those pages don’t just respond slowly: they cause OTHER pages (the ones that are “backed up” behind the slow request) to respond slowly as well. So even an OCCASIONAL slow page will cause a reasonably large number of slow responses.
The solution that we’re currently working on is to route our “important” pages to different mongrels than our less important / slower pages. This should make it so that a slow page doesn’t cause other (fast) pages to slow down as well. But this is extra complexity in our system, and it’s a bit tricky to do for pages that don’t have an easy-to-recognize URL “signature”.
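As a sketch of what that routing could look like (assuming the slow pages can be matched by URL prefix; the paths and ports below are made up), nginx can front two separate mongrel packs and pick one per location:

    upstream fast_pack {
        fair;
        server 127.0.0.1:8000;
        server 127.0.0.1:8001;
    }

    upstream slow_pack {
        fair;
        server 127.0.0.1:9000;
    }

    server {
        listen 80;

        # Slower / less important pages get their own mongrels...
        location /search { proxy_pass http://slow_pack; }
        location /stats  { proxy_pass http://slow_pack; }

        # ...so the important pages never queue up behind them.
        location / { proxy_pass http://fast_pack; }
    }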
I’m a bit surprised I don’t hear more chatter about this problem: it seems to me like the single biggest bottleneck to building a huge Rails site. What are other big Rails sites doing to deal with this problem? Is there an easy solution that I’ve missed? As always, feel free to post suggestions in the comments!

4 thoughts on “Rails being single-threaded causes scalability problems”

  1. Ikai Lan August 4, 2008 / 1:09 pm

    Hi Jon,
    I completely agree that this is a huge problem. LinkedIn owns several extremely high-traffic web properties that are written in Rails. I won’t go as far as to say we’ve solved the problem, but we’ve found ways to work around it:
    1. Asynchronous processing, using MemCache as IPC for long operations to fake out synchronicity. During peak hours, it can take up to a second to do a single row insert. This is unacceptable, since a Mongrel process essentially blocks for that entire second and requests queue up. We couldn’t make the inserts completely asynchronous, since we need to *show* the user the results of the insert. There are two cases for this and two solutions:
    a. After the insert operation, we need to show the User the last inserted row. We do two things: a synchronous write to MemCache (always cheap), and an asynchronous write to the permanent data store (MySQL). In the page following the insert operation, we reference the values inside MemCache.
    b. After the insert operation, a User needs to see all their persistent objects and be able to interact with them as if they were in the database. This is trickier. In the places where we do this, we do a pass-through write into MemCache and the work queue. Both writes are fast, with the persistent write taking the longest (handled by a background job). The front end interacts only with MemCache. This turned out to be a lot easier than I anticipated, with the only problem being that Ruby Marshaling takes a lot of space and we quickly exhausted our available memory; we fixed this by creating our own serialization and deserialization method.
    2. For a long read operation requiring lots of waiting, such as external API calls or a huge database read, the front end web servers only act as a dispatcher. That is, they create a new background job. Again, MemCache acts as IPC. The front end requires users to have JavaScript and polls the application. The background job does the read and populates MemCache when done. In practice, the flow goes something like this (sketched in code after the steps):
    – User requests big chunk of data
    – A job is created
    – Background worker works on job
    – Client polls the Front End once per N seconds. Are you done? Are you done?
    – Front end checks MemCache. Is there a value? Is there a value?
    – When the Front End finds a value, it returns it to the User.
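    Roughly, in Rails 2.x terms (all names below are illustrative rather than our actual code; only the memcache-client calls are real APIs, and the enqueue call is a stand-in for whatever job system you use):

        require 'memcache'

        CACHE = MemCache.new('localhost:11211')

        class ReportsController < ApplicationController
          # Dispatcher: kick off the slow work and return immediately.
          def create
            job_id = "report:#{current_user.id}:#{Time.now.to_i}"
            enqueue_background_job(:build_report, job_id, current_user.id)  # stand-in for your queue
            render :json => { :job_id => job_id }
          end

          # Polled by the client's JavaScript once per N seconds.
          def show
            result = CACHE.get(params[:id])
            if result
              render :json => { :done => true, :data => result }
            else
              render :json => { :done => false }
            end
          end
        end

        class ReportWorker
          # Runs in a background worker process, not inside a mongrel.
          def self.build_report(job_id, user_id)
            data = Report.expensive_query_for(user_id)  # the slow read or external API call
            CACHE.set(job_id, data, 10.minutes.to_i)    # the front end finds this on its next poll
          end
        end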
    There’s no single solution to this problem, and I have complained that it forces developers to create solutions to scale much earlier than developers in PHP or Mod_Python land. What I’ve found, however, is that almost all big web sites, no matter what interpreter they are running, eventually have to use similar solutions. It’s very obvious that Rails hits this ceiling first. That being said, using queues and background processing scales very well, almost infinitely. Queues too full? Create more workers. Queues too slow? Create more queues.
    In the end, it’s all about performance tuning. If the bulk of your application is high latency, low computation calls, this will kill you (the only applications I can think of that have this requirement are web based IM providers like Meebo or Facebook chat that make use of AJAX long polling; they have solved the problem with custom LightTPD compiles and Erlang since even Apache can’t handle the concurrency requirement). If, like most applications, most of your calls are cheap, with a few very expensive operations causing the cheap calls to back up, then the solutions I’ve described are more than adequate.

  2. Lori MacVittie August 4, 2008 / 5:02 pm

    “You’d think you can solve the problem using load balancing (we use the NGINX fair plugin in order to get the smartest load balancing possible). But there’s a point where better load balancing simply won’t solve your problems. For example, let’s say my average request takes 250ms to respond. If I’m serving a request that is going to take a whopping 5 seconds to respond, there’s no way for the load balancer to know that something is fishy for at least 250ms. This guarantees that a certain number of requests are going to “back up” behind the slow request.”
    You might be interested in Joyent’s scaling of RoR. Perhaps most load balancers aren’t capable of helping out, but an intelligent load balancer can and has helped scale RoR.
    This is a blog post with a link to an interview with the folks who’ve done it successfully.
    Scaling RoR
    Best of luck!
    Lori

  3. Gaurav August 15, 2008 / 3:25 pm

    How about having NGINX (or something like Nagios) check the load on each app server periodically and feed that into the load-balancing decision? Send far fewer pages to the heavily loaded mongrel server, and perhaps take it out of the cluster if the problem persists for a longer period of time.
    Disclaimer: I am not an RoR person. All I have done is watch that ’20 minute blog’ demo.
