Saying something nice about programming languages
I recently saw several instances of an interesting proto-meme from a couple years ago: can you find something nice to say about every programming language you’ve used? Here’s my try (limited mostly to languages I’ve used at work or for extended periods), organized chronologically:
Oversee.net is co-sponsoring SCaLE 10x
Just a quick note to point out (boast?) that Oversee.net has signed up to co-sponsor SCaLE 10x. We’re hoping to talk to lots of interesting Linux folks, not least because we’re currently looking to hire several software developers and sysadmins.
Naming the parts of a URL
Tantek Çelik has an interesting summary of how different programming languages and libraries break down the parts of a URL. He’d left Perl off his list, so I pointed him to the URI module on CPAN, which he’s added to his table.
One inter-language discrepancy Tantek found has resulted in the silliest bug-report debate I’ve seen in a while. It’s remarkable how upset people seem to get when arguing over whether the “protocol” part of the URL should include the colon or not!
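To make the naming question concrete, here is a minimal sketch using Python's standard urllib.parse module (Python is my choice for illustration and the example URL is made up). Python calls the first part the "scheme" and returns it without the colon, whereas JavaScript's location.protocol includes the colon.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Return the named parts of a URL as Python's standard library sees them."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # no trailing colon in Python
        "netloc": parts.netloc,      # userinfo, host and port together
        "username": parts.username,
        "hostname": parts.hostname,
        "port": parts.port,          # parsed to an int, or None if absent
        "path": parts.path,
        "query": parts.query,        # without the leading "?"
        "fragment": parts.fragment,  # without the leading "#"
    }


if __name__ == "__main__":
    # Hypothetical URL, chosen only to exercise every part.
    example = "https://user@example.com:8080/docs/index.html?lang=en#intro"
    for name, value in url_parts(example).items():
        print(f"{name:10} {value!r}")
```

Other libraries slice the same string differently, which is exactly the kind of discrepancy Tantek's table documents.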
Making Things Easy Is Not an Add On
I am a busy guy. This means I’m not the type of person who likes to spend lots of time eating lunch. The quicker and easier it is for me to grab lunch the happier I am. For lunch today, I decided to get a sandwich. I discovered that the sandwich shop downstairs allows me to call in my order ahead of time so I can just walk up to the cashier and pay. There is a special, shorter line for call-in orders. Further, if paying by credit card, it is one simple swipe and less than seven seconds later you have your receipt and you are out the door. It is a true model of efficiency.
I am much more likely to be a customer of this establishment in the future because they have made it exceedingly simple for me to get what I want. I didn’t waste time in line behind people who don’t know what they want nor did I have to wait for something as simple as handing over my money. This simple yet important principle seems to be lost on so many of us in technology. As engineers, we are used to dealing with the complex. We don’t mind jumping through hoops to configure web servers, navigate deep directory structures, or manage the dozens of windows on our desktop. We seem to forget that not everyone has the same love affair we have with computers.
We tend to think of usability as an afterthought. We figure that if we got you your sandwich, that should be good enough. Are you still hungry? No? Great, requirements met. It did not matter how we got you there; it only mattered that you got there. This started to change in large part thanks to Apple and Steve Jobs. The introduction of the iPod and then the iPhone changed the way people talked about creating products. When I worked at Microsoft, I cannot tell you the number of times the words “iPod” and “iPhone” were said in discussions about design. It became a running joke on my team and others that no matter what we talked about, we eventually had to tie it back to how great the iPod experience was. Before this, engineers used to talk about cramming more and more features into a product and did not worry about how people were going to access those features.
Ease of use should be a basic product principle, not an afterthought bolted on after all the functionality is complete. If you are working on a feature, and there is not a clear and intuitive way for your users to access it, you should honestly question whether your users will derive any value from it at all. Even worse, having even one hard-to-reach feature can undermine the usefulness of all the other hard work you put into the product. To extend my analogy, imagine what I would have thought if I got all the way to the payment portion of my sandwich transaction only to realize I had to wait five more minutes to pay. The “payment” feature needed to work in concert with “ordering”, “taste”, and the rest to ensure a great overall experience.
We are living in an ever more complex world. While it is very hard to mask this complexity from your user, it will be an ever increasing differentiator between those who succeed and those who do not. Not treating ease of use as a first class citizen when it comes to design, requirements, and planning is just handing over victory to your competitors.
Was Steve Yegge Right?
I spent the first three years of my software career as basically just a guy that wrote code. I came into work, I had some project I was supposed to work on, and so I wrote some code for it. Usually this code worked and people were happy and sometimes it even made the company money, so I considered myself a good programmer. And then I found Steve Yegge’s essays which made me reconsider all of that, and to this point just reading those essays has probably been the most formative experience in my career so far.
Steve was the first person I knew of who spoke about software engineering as a discipline that demanded self-improvement. He wrote about dozens of subjects, but the one huge message I got from reading all of them was, “there is more to this craft than meeting requirements at your day job and learning the syntax of the single programming language you use there.” Some of it was a little intimidating, especially when Steve gave examples of suboptimal behavior that I was directly exhibiting (or when he was bashing on Perl), but as long as you focused on Steve’s overall point and didn’t just argue his potentially imperfect examples in your head, you would realize he always had something great to say.
So I’ve made it a habit to reread his essays from time to time, and the other day I checked out ‘Ten Predictions.’ It’s on his old Amazon blog (as opposed to his personal blog, which sadly he doesn’t update as often as he used to), and I remember first reading this in 2007 and thinking, “hmm, I wonder if he’ll be right.” Well when I reread it, I saw a lot of references to things happening or not happening in 2010 and 2011, and since that’s, well, now, I don’t have to wonder if he’s going to be right — we can actually tell if he’s right or not.
So, let’s see what Steve had to say. Keep in mind the original essay was written in early 2004 (I believe).
Prediction #1: XML databases will surpass relational databases in popularity by 2011.
Reason for prediction: Nobody likes to do O/R mapping; everyone just wants a solution.
XML databases like BaseX never caught on, so while Steve was wrong about that, he was right that people did want solutions to problems that didn’t fit nicely into a relational database. The past five years have seen an explosion of “NoSQL” solutions for structured and unstructured data, so named because they step away from the relational, SQL-centric model: they typically don’t require fixed table schemas and are designed to scale horizontally.
Will these solutions eventually become more popular than relational databases? Well, I think that question is misleading because I don’t think relational and non-relational databases fight over ‘market share.’ We’ve seen a gradual evolution of both sides where relational database solutions are becoming more horizontally scalable and getting better at storing unstructured data, and NoSQL solutions are getting better at providing the same kind of query languages and developer tools available to relational databases. So the idea of one surpassing the other in popularity is kind of a false dichotomy, as their designations will only get less and less binary as time goes on.
DomainSponsor’s distributed server architecture
DomainSponsor is Oversee’s domain parking division. When people have domain names that they’re not ready to use yet, we show ads on them. DomainSponsor gets around a billion visits per month.
To serve ads on millions of pages per day, we used to use a fairly traditional LAMP stack. We used Apache with mod_perl to run a fairly complicated tangle of Perl code which queried several back-end MySQL databases and other back-end servers, in order to choose what to put on each page. One of the biggest problems we had was that we wanted to be able to easily add new (possibly buggy) features for A/B testing, knowing that many of those features would fail and be discarded. The old way we did this was to have a separate cluster of servers running the test version of the code. The trouble was that the test cluster was never 100% identical to the main production cluster, and the redirect needed to send a sample of requests to the test cluster introduced a delay, so we had problems with test results not being reproducible when we later launched the same feature in the main production cluster.
Fighting Boredom
Software engineering is about solving problems. You get hired, start working, and solve problems. Time passes and you get better at solving these problems, so your company gives you harder problems in the same domain space. Eventually you get so good at solving these problems that you become The Guy. “Oh you have a question about the FooWidget manager tool? Ask Joe, he’s the FooWidget guy.” By definition, being The Guy means you’ve reached a local maximum of productivity in the company.
It also means you’re bored. It’s not a case of possibly being bored, or eventually becoming bored. Once you are no longer a problem solver, that means you’re bored, and if you’re bored at your current company for long enough, eventually you’ll find a new company to work for.
Companies talk a lot about retention, but rather than wring their hands about salaries and titles, they’d do well to look at their engineers and ask a simple question: “Who is bored, and what can we do about that?”
How To Scale Your Website – Duplicate Yourself
One of the first things I learned when I was becoming a programmer was to be lazy. If someone had already written a piece of code, I was taught not to try and reinvent the wheel and to liberally use that other piece of code (This is a lesson a lot of developers forget but is a topic for another day). This same principle should be applied as you attempt to scale your website. Learn to copy yourself.
In my last post, I described how important it is to look carefully at your data as you attempt to scale. While I talked about how to increase the performance of your database server, I did not talk about how to really achieve web scale. To do that, you need to scale horizontally. You can achieve this in a few different ways. The most straightforward of those is to clone all of your services and data and then put those clones behind a load balancer. This is a technique we use extensively at Oversee in order to handle the large loads on our systems.
So what does this actually mean? Let’s look at the data side of things. One of the things I said in the last post was to really examine your data to see how “fresh” it has to be and how often it is updated. If your system is like ours, you have a lot more reads of your data than writes and it is OK for the data not to be completely up to date. For Oversee, the ratio of reads to writes is over 1000:1. What this allows you to do is replicate the data over many servers. Almost all major database vendors support data replication right out of the box. Most relational databases will support multiple slaves to a single master. Writes are committed to the master database and eventually get propagated to the read-only slave nodes. You can fine-tune how often the slave nodes get updated to reduce the data latency, but for applications like ours, this is not a primary concern.
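The master/slave arrangement can be sketched as a toy model in Python. This is only an illustration under loud assumptions: the class and method names are hypothetical, plain dictionaries stand in for database servers, and an explicit replicate() call stands in for the asynchronous replication a real database performs on its own.

```python
import random


class ReplicatedDatabase:
    """Toy model of one master and several read-only replicas (hypothetical)."""

    def __init__(self, replica_count):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, key, value):
        # Writes are committed to the master only.
        self.master[key] = value

    def replicate(self):
        # Stand-in for asynchronous replication: copy the master's
        # current state to every replica.
        for replica in self.replicas:
            replica.clear()
            replica.update(self.master)

    def read(self, key):
        # Reads go to any replica, so they may return stale data
        # until the next replication pass.
        return random.choice(self.replicas).get(key)


if __name__ == "__main__":
    db = ReplicatedDatabase(replica_count=3)
    db.write("keyword:42", "sandwiches")
    print(db.read("keyword:42"))  # not yet replicated, so the read misses
    db.replicate()
    print(db.read("keyword:42"))  # visible on every replica now
```

The window between write() and replicate() is the "data latency" mentioned above; with a read-heavy workload that tolerates slightly stale data, that window is an acceptable price for spreading reads over many machines.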
After you have successfully set up database replication, you should put your servers behind a load balancer. All requests for data should go through the load balancer, which will route each request to the server that is best equipped to handle it. You should design your application so the servers behind the load balancer act as a black box. It should not matter which of the servers, master or slave, handles a read request. When done this way, adding capacity to your system is straightforward: you set up replication on another server, place it in the pool of servers behind the load balancer, and requests will automatically be spread across the additional resources. So long as you can perfectly replicate the servers behind the load balancer, this will scale to hundreds if not thousands of servers. This technique has the additional benefit that if any server behind the load balancer fails, the service it provides will still be available, assuming there is another server to take the request. So not only do you achieve scalability, you get high availability for free!
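Here is a minimal sketch of that idea in Python, assuming the simplest possible policy (round-robin) and a hypothetical mark_down() hook in place of real health checks. A production setup would use a dedicated load balancer rather than code like this; the point is only to show why adding a server or losing one is invisible to callers.

```python
class LoadBalancer:
    """Round-robin picker over a pool of interchangeable servers (hypothetical)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def add_server(self, server):
        # Scaling out: a new clone joins the pool and starts taking requests.
        self.servers.append(server)
        self.healthy.add(server)

    def mark_down(self, server):
        # Stand-in for a failed health check.
        self.healthy.discard(server)

    def pick(self):
        # Walk the pool in order, skipping servers that are marked down.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    lb = LoadBalancer(["db1", "db2"])  # server names are made up
    print([lb.pick() for _ in range(4)])
    lb.mark_down("db1")                # db1 fails; callers never notice
    lb.add_server("db3")               # add capacity without touching callers
    print([lb.pick() for _ in range(4)])
```

Callers only ever talk to pick(), which is what makes the pool behind it a black box.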
You can, and should, apply this technique to servers other than your database. We do this for our entire web-serving stack. Almost every server we run sits behind a load balancer. This includes the servers responsible for keyword generation, page configuration, and rendering. This requires that you think carefully about how best to break up the services your web application provides. That is the topic for the next blog post.
How To Scale Your Website – Look at the Database
One of the things I’ve mentioned on this blog before is the scale we have to deal with here at Oversee. We deal with a lot of traffic spread across a lot of different domains. While the pages we serve up may seem relatively simple, there is a lot of configuration and optimization that ends up determining the exact page we show a user. This presents some unique problems that we have to consider as we determine the most efficient way for us to serve up these pages. The thing is, every website is unique. The constraints you will face when you try to scale your website will not be the same as those somebody else will face. So where do you even begin?
The good news is that there are some basic principles you should always follow as you think about scaling your website. Through the next several posts, I will lay out some of the industry’s best practices when it comes to building large websites.
Almost any moderately complex website is going to have to access a database. This may be to process a transaction or to look up information about the user, but inevitably your web application is going to have to hit some sort of persistent data store. Because of the nature of an RDBMS, these lookups are going to be expensive and are most likely to be your first bottleneck. So what do you do? I am first going to talk about the simplest ways to scale your data. But like many things in life, the simplest things to do are not always the best things to do. The following techniques work in certain circumstances and I think it is important to understand all possible solutions to a problem.
The simplest answer is to buy a bigger, badder database server. When you run out of CPU or your disk access is not fast enough, you can simply buy a database server which has more cores or really fast solid state drives. This is what people refer to as scaling vertically. This is also the least preferred way to do it. Going down this path does have its merits. It is generally the easiest and fastest thing to do at first because for most organizations buying hardware is a lot easier than trying to free up a developer’s time to scale the solution properly. If your website will never reach more than a few thousand visitors or you have the capital to buy really big expensive data solutions, this might even be the best path for you to go down. But if you are like the rest of us and want to scale your website for maximum growth at the lowest possible cost, this is not the solution for you.
The next step that one might take in thinking about how to get more throughput on the data side is to use caching. Now, I don’t think this really falls under the “scaling” category as much as it does the performance tweaking part of your website. However, caching will allow your database server to seemingly handle more requests. I am not going to go into this technique in depth but I do think it is worth mentioning now as it will allow you to “cheaply” get more bang out of your database. To utilize this technique, your application must be such that there are many more reads of your data than writes. It must also be true that it is OK for your data to be somewhat stale. A cache allows you to store data in RAM for some set period of time. When an application needs to read data, it will first ask the cache to see if it has the data. If the data is in the cache and is up-to-date, the result is returned to the application and the database is never queried. This offloads the work from your slower database server. If your data does not change very often, or if you get lots of identical requests to your database, caching can be very effective. Probably the best solution out there is also the cheapest. Memcached is fast, scalable, simple, and open source. Using memcached will not only decrease the load on your database server but it will actually reduce the latency of your website, since accessing RAM is much, much faster than doing a database lookup. Memcached is used to run very large websites like Twitter, Craigslist, and Wikipedia, so you know it scales for very large sites. It should be noted that to deploy memcached successfully, you will have to use some of the techniques we talk about next to scale it to multiple servers.
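The read path described above (ask the cache first, fall back to the database on a miss, then store the result for a set period) can be sketched in a few lines of Python. This is an illustration only: an in-process dictionary stands in for memcached, and TTLCache, cached_lookup, and query_database are hypothetical names, not part of any memcached client library.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache server; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Entry is too stale to serve; drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def cached_lookup(cache, key, query_database):
    """Return the cached value if present, else query the database and cache it."""
    value = cache.get(key)
    if value is None:  # cache miss (a stored None also counts as a miss here)
        value = query_database(key)
        cache.set(key, value)
    return value


if __name__ == "__main__":
    def query_database(key):
        # Hypothetical slow lookup standing in for a real SQL query.
        print(f"database hit for {key!r}")
        return key.upper()

    cache = TTLCache(ttl_seconds=60)
    cached_lookup(cache, "user:1", query_database)  # goes to the database
    cached_lookup(cache, "user:1", query_database)  # served from the cache
```

The ttl_seconds value is the knob for how stale the data is allowed to get: a longer lifetime shields the database from more reads but delays how quickly updates become visible.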
Neither scaling vertically nor using caching is going to allow you to scale to the truly massive size I am sure you wish your website to grow to. To do that, you are going to have to scale horizontally. To do this topic justice, I am going to give it its own blog post next time.








