One of the things I’ve mentioned on this blog before is the scale we have to deal with here at Oversee. We handle a lot of traffic spread across a lot of different domains. While the pages we serve up may seem relatively simple, a lot of configuration and optimization goes into determining the exact page we show a user. This presents some unique problems that we have to consider as we work out the most efficient way to serve up these pages. The thing is, every website is unique. The constraints you face when you try to scale your website will not be the same ones somebody else faces. So where do you even begin?
The good news is that there are some basic principles you should always follow as you think about scaling your website. Through the next several posts, I will lay out some of the industry’s best practices when it comes to building large websites.
Almost any moderately complex website is going to have to access a database. This may be to process a transaction or to look up information about the user, but inevitably your web application is going to have to hit some sort of persistent data store. Because of the nature of an RDBMS, these lookups are going to be expensive and are the most likely cause of your first bottleneck. So what do you do? I am first going to talk about the simplest ways to scale your data. But like many things in life, the simplest things to do are not always the best things to do. The following techniques work in certain circumstances, and I think it is important to understand all possible solutions to a problem.
The simplest answer is to buy a bigger, badder database server. When you run out of CPU or your disk access is not fast enough, you can simply buy a database server that has more cores or really fast solid state drives. This is what people refer to as scaling vertically. It is also the least preferred way to scale, although going down this path does have its merits. It is generally the easiest and fastest thing to do at first because, for most organizations, buying hardware is a lot easier than trying to free up a developer’s time to scale the solution properly. If your website will never reach more than a few thousand visitors, or you have the capital to buy really big, expensive data solutions, this might even be the best path for you to go down. But if you are like the rest of us and want to scale your website for maximum growth at the lowest possible cost, this is not the solution for you.
The next step that one might take in thinking about how to get more throughput on the data side is to use caching. Now, I don’t think this really falls under the “scaling” category as much as it does the performance tweaking part of your website. However, caching will allow your database server to seemingly handle more requests. I am not going to go into this technique in depth, but I do think it is worth mentioning now as it will allow you to “cheaply” get more bang out of your database. To utilize this technique, your application must read your data far more often than it writes it. It must also be OK for your data to be somewhat stale. A cache allows you to store data in RAM for some set period of time. When an application needs to read data, it first asks the cache whether it has the data. If the data is in the cache and has not expired, the result is returned to the application and the database is never queried. If it is not, the application queries the database and stores the result in the cache for next time. This offloads work from your slower database server. If your data does not change very often, or if you get lots of identical requests to your database, caching can be very effective.
Probably the best solution out there is also the cheapest. Memcached is fast, scalable, simple, and open source. Using memcached will not only decrease the load on your database server but will also reduce the latency of your website, since accessing RAM is much faster than doing a database lookup. Memcached is used to run very large websites like Twitter, Craigslist, and Wikipedia, so you know it scales for very large sites. It should be noted that to deploy memcached successfully, you will have to use some of the techniques we talk about next to scale it to multiple servers.
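To make the read path described above concrete, here is a minimal sketch in Python. It is not how we do it at Oversee and it does not talk to memcached; the `TTLCache` class is a hypothetical in-process stand-in for a cache server, and `cached_lookup` and `load_from_db` are names I made up for illustration. The flow is the same one you would follow with a real cache: ask the cache first, fall back to the database on a miss, and store the result with an expiry time.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for a cache server such as memcached.

    Entries live in a plain dict (i.e. in RAM) and expire after ttl_seconds.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # stale entry: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        """Store a value that will expire ttl_seconds from now."""
        self._store[key] = (self._clock() + self._ttl, value)


def cached_lookup(cache: TTLCache, key: str,
                  load_from_db: Callable[[str], Any]) -> Any:
    """Ask the cache first; only hit the database on a miss.

    Assumption: load_from_db never returns None, because None is how
    this sketch signals a cache miss.
    """
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # the slow path
        cache.set(key, value)      # remember the answer for later reads
    return value


if __name__ == "__main__":
    db_calls = []

    def load_from_db(key: str) -> dict:
        # Pretend this is an expensive query against the database.
        db_calls.append(key)
        return {"id": key, "name": "example user"}

    cache = TTLCache(ttl_seconds=60)
    cached_lookup(cache, "user:42", load_from_db)  # miss: goes to the database
    cached_lookup(cache, "user:42", load_from_db)  # hit: served from RAM
    print("database queries:", len(db_calls))
```

The thing to notice is that the second read never touches the database. The trade-off is the one mentioned earlier: until an entry expires, readers may see data that is up to `ttl_seconds` old, so pick an expiry time your application can live with.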
Neither scaling vertically nor using caching is going to allow you to scale to the truly massive size I am sure you wish your website to grow to. To do that, you are going to have to scale horizontally. To do this topic justice, I am going to give it its own blog post next time.


