SlideShare.net has been online for almost a month, and we've learned a lot about running a large-scale social sharing website in that time. The first few days were frantic, but we're settling back into a rhythm now. This series of posts will try to capture some of what we learned from the experience.
1) The first week will be brutal
The day SlideShare launched, my crew woke me up at 4 AM: "TechCrunch has blogged us! OMG OMG OMG!" We had hoped to push out a final revision of the code before launching, but that was not to be. We monitored the system as hundreds of users started logging on and uploading PowerPoint decks. Of course, things started breaking almost immediately, and we were loading hotfixes onto the system from 9 AM onwards.
After the first day, the web app held up pretty well, but other parts of the code kept failing at inopportune times throughout the first week. We stopped working regular hours, slept when we could (whatever the time, day or night), and leapt into action whenever there was a crisis. It was fun but exhausting, and if I did it again I would make sure everyone was well rested going into that first week.
2) Instrumented code (and an interface to look at it) helps you survive
One thing that really helped us survive the initial burst of traffic was that our code logs lots of system activity, and we invested up front in writing software that displays that data. We also email the team whenever there's an error on the server. That made it easy for us to spot problems, edit data, and track trends over time. We call the "shadow app" we use to monitor SlideShare "SlideShadow".
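For readers who want to try the same idea, here is a minimal sketch using Python's standard `logging` module. It is not SlideShadow's actual code: the logger name `app.upload`, the `ErrorAlertHandler` and `ActivityCounter` classes, and the `notify` callback are all hypothetical. One handler forwards errors to the team (here just a list standing in for email; in production the stdlib `logging.handlers.SMTPHandler` can send real mail), and the other tallies events so a dashboard could chart trends.

```python
import logging
from collections import Counter

class ErrorAlertHandler(logging.Handler):
    """Hypothetical handler: forwards ERROR-and-above records to a notify callback.
    A real deployment might email the team instead (e.g. logging.handlers.SMTPHandler)."""
    def __init__(self, notify):
        super().__init__(level=logging.ERROR)  # ignore anything below ERROR
        self.notify = notify

    def emit(self, record):
        self.notify(self.format(record))  # default formatter yields just the message

class ActivityCounter(logging.Handler):
    """Hypothetical handler: counts INFO-and-above events per logger name,
    the kind of tally a monitoring dashboard could plot over time."""
    def __init__(self):
        super().__init__(level=logging.INFO)
        self.counts = Counter()

    def emit(self, record):
        self.counts[record.name] += 1

sent = []  # stand-in for the team's inbox
log = logging.getLogger("app.upload")
log.setLevel(logging.INFO)
counter = ActivityCounter()
log.addHandler(counter)
log.addHandler(ErrorAlertHandler(sent.append))

log.info("upload started")
log.error("conversion failed for deck %s", 42)
```

Both events are counted for trend tracking, but only the error reaches `sent`. Keeping alerting and counting in separate handlers means either can be swapped out (say, for a database writer feeding the dashboard) without touching the application code that calls `log`.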
Having good instrumentation (and interfaces that render that data) also helps us understand user behavior better. We'd love to do even more of this, but we are held back by our unwillingness to run non-critical third-party code on the web client. I would love to run Google Analytics and Crazy Egg, but it's even more important to me that the user interface be as fast as possible.
Coming next time: why your users are better than any QA department.