DomainSponsor’s distributed server architecture

DomainSponsor is Oversee’s domain parking division. When people have domain names that they’re not ready to use yet, we show ads on them. DomainSponsor gets around a billion visits per month.

To serve ads on millions of pages per day, we used to use a fairly traditional LAMP stack. We used Apache with mod_perl to run a fairly complicated tangle of Perl code, which queried several back-end MySQL databases and other back-end servers in order to choose what to put on each page. One of the biggest problems we had was that we wanted to be able to easily add new (possibly buggy) features for A/B testing, knowing that many of those features would fail and be discarded. The old way we did this was to have a separate cluster of servers running the test version of the code. The trouble was that the test cluster was never 100% identical to the main production cluster, and the redirect needed to send a sample of requests to the test cluster introduced a delay, so we had problems with test results not being reproducible when we later launched the same feature in the main production cluster. We wanted to be able to isolate test features while still running them in the same production environment that handled the majority of our traffic.

In order to do this, while maintaining or improving our response time, we re-structured our serving system into a tiered, distributed system. There’s a front-end that just understands HTTP requests, but is fairly agnostic about what to do with them. After parsing the requests, it sends them on to a server which implements the main business logic, including the ability to run new features in a sandbox, such that their failure won’t affect the rest of the system. This business logic server queries a variety of back-end data sources, and sends the resulting information to an HTML layer, which does the presentation formatting of the information in HTML/CSS/JavaScript and passes the resulting page back to the HTTP server.
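
To make the tiers concrete, here’s a toy sketch of the hand-off. The names are hypothetical, and in the real system each tier is a separate server process, with the hand-offs happening over the network rather than as function calls:

    use strict;
    use warnings;

    sub front_end {                # tier 1: speaks HTTP, knows little else
        my ($raw_request) = @_;
        my $parsed = { domain => $raw_request->{host} };
        return presentation( business_logic($parsed) );
    }

    sub business_logic {           # tier 2: queries back ends, picks the ads,
        my ($req) = @_;            #         can sandbox experimental features
        return { domain => $req->{domain}, ads => [ 'ad-1', 'ad-2' ] };
    }

    sub presentation {             # tier 3: formats the result as HTML/CSS/JS
        my ($result) = @_;
        return '<html><body>' . join(', ', @{ $result->{ads} }) . '</body></html>';
    }

    print front_end({ host => 'example.com' }), "\n";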

We needed an efficient, but extensible, protocol for these servers to communicate with each other. At previous jobs I’d worked on other projects where we rolled our own binary protocol for similar distributed systems, and I’d seen the headaches involved in upgrading the protocol version, even just to change the format of one field. We looked around, and found several fairly similar libraries for extensible binary protocols. The main open source competitors at that time (mid-2008) were Protocol Buffers and Thrift. Protocol Buffers (or “protobufs”) had just been released by Google, after several years of internal use. Thrift had been created by some ex-Google interns working at Facebook, and so the two are fairly similar. Both produced very compact on-the-wire encodings, though protobufs, with its variable-length numeric fields, had a slight edge. Both were designed to allow easy extensibility; new optional fields could be added without breaking existing code (which would just ignore the new fields).
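
To make that concrete, here’s roughly what a small message definition in the .proto interface description language looks like; the message and field names are made up for illustration, not taken from our actual schema:

    // Hypothetical example, in proto2 syntax (the version current in 2008).
    message AdRequest {
        required string domain         = 1;   // the parked domain being visited
        required int32  visitor_id     = 2;
        // Added in a later release: servers built against the old definition
        // simply skip this field on decode; new servers treat its absence as
        // "not in any experiment".
        optional string experiment_tag = 3;
    }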

We chose protobufs for several additional reasons, none of them overwhelming individually, but combined they made protobufs a better choice for us. First, the documentation of protobufs seemed more complete and thorough. Second, their Perl support, while not part of the core project, was better suited to our plans to write non-blocking event-driven clients and servers. Thrift’s Perl support was very complete on the client side, but there was no server-side implementation, and the client code did its own blocking network I/O, which would be hard to integrate with an event-driven framework.

There are actually at least three Perl bindings for protobufs. protobuf-perl was the first pure-Perl implementation. We initially tried it, but it had some serious performance issues, partly due to teething issues with Moose, which was still relatively new and unoptimized at that point. Around that time, protobuf-perlxs was released, which puts a Perl wrapper around the C++ code generated by Google’s stock protocol buffers compiler. It has the disadvantage that it requires compiling and linking to C++ code, which makes it slightly harder to port and install on unusual architectures, but its speed advantage over pure-Perl implementations more than makes up for that. Google::ProtocolBuffers is a newer pure-Perl implementation that includes its own parser for the .proto interface description file format, and is the only one of the three on CPAN; it was released after we’d finished our evaluation of the other two implementations. It supports creating message definitions at run time (as opposed to build time), which could provide some interesting flexibility, but for our purposes protobuf-perlxs’s speed is more important.
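
As a sketch of the run-time flexibility Google::ProtocolBuffers offers, here is its documented parse/encode/decode interface in action, again with a made-up AdRequest message rather than anything from our production schema:

    use strict;
    use warnings;
    use Google::ProtocolBuffers;

    # Compile the message definition at run time rather than at build time;
    # create_accessors gives the generated class ordinary accessor methods.
    Google::ProtocolBuffers->parse(q{
        message AdRequest {
            required string domain         = 1;
            required int32  visitor_id     = 2;
            optional string experiment_tag = 3;
        }
    }, { create_accessors => 1 });

    # A sender that knows nothing about experiment_tag ...
    my $bytes = AdRequest->encode({ domain => 'example.com', visitor_id => 42 });

    # ... still produces bytes that newer code can decode; the field it never
    # set simply comes back unset.
    my $req = AdRequest->decode($bytes);
    print $req->domain, "\n";                                      # example.com
    print defined $req->experiment_tag ? $req->experiment_tag : '(none)', "\n";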

We wanted our distributed server software to be as efficient as possible. At the time, the old system was using Apache in a pre-forking multi-process mode. In this mode, Apache forks dozens of child processes when it starts up, and hands requests to those children to be processed. At any given time, a child process is handling only one request; therefore, the server as a whole can only handle as many simultaneous requests as there are child processes. Processes are relatively expensive; you probably don’t want the OS to have to time-slice among many thousands of them. Event-driven programming is an alternative model; it gives greater efficiency, but more care must be taken to avoid calling code that blocks, typically for I/O. In an event-driven model (see the Reactor pattern for more details), the framework controls the main loop, and all application programming is done in callback functions. Python’s Twisted framework, Ruby’s EventMachine, and JavaScript’s Node.js are popular event-driven frameworks in other languages. Given that we were already designing a new architecture and working with several pieces of new technology, and that we had a team already familiar with Perl, we decided to stay with that proven, robust, flexible language.
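
Here’s a minimal sketch of the callback style, using AnyEvent (one of the CPAN frameworks discussed in the next paragraph); the point is only the inversion of control, not the specific API:

    use strict;
    use warnings;
    use AnyEvent;

    my $done = AnyEvent->condvar;

    # Instead of blocking on a read, register a callback and hand control to
    # the event loop; the loop calls us back when STDIN becomes readable, and
    # could be servicing thousands of other watchers in the meantime.
    my $stdin_watcher = AnyEvent->io(
        fh   => \*STDIN,
        poll => 'r',
        cb   => sub {
            chomp(my $line = <STDIN>);
            print "got: $line\n";
            $done->send;        # tell the loop we're finished
        },
    );

    $done->recv;                # run the event loop until send() is called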

When it comes to just about any need, the Comprehensive Perl Archive Network (CPAN) provides an embarrassment of riches, and there are several popular event-driven frameworks for Perl. The largest and most mature is POE; a more recent competitor is AnyEvent. The main author of POE, Rocco Caputo, has been working on a cleaner re-implementation of POE using Moose, called Reflex. There are also some smaller event-driven libraries that aren’t as all-encompassing: Event::Lib, Event, EV, and IO::Async. For our purposes, at the time we made the decision, POE provided the most useful ready-made components: asynchronous DNS queries, fully-featured HTTP request processing (including keepalive, HTTPS, etc.), and an easy way to support our own protocol.
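
To give a feel for what writing one of these servers looks like, here’s a minimal POE-based TCP server using POE::Component::Server::TCP; the port is arbitrary, and a real tier would decode a protobuf request in the input callback instead of echoing text:

    use strict;
    use warnings;
    use POE qw(Component::Server::TCP);

    # POE owns the main loop; all of the application code lives in callbacks
    # such as ClientInput, which fires once per line received from a client.
    POE::Component::Server::TCP->new(
        Port        => 8000,
        ClientInput => sub {
            my ($heap, $input) = @_[HEAP, ARG0];
            # A real tier would decode a protobuf message here and dispatch
            # it to the business logic; this sketch just echoes the line.
            $heap->{client}->put("echo: $input");
        },
    );

    POE::Kernel->run();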

Today, as a result, we have a large distributed system handling a huge number of requests across an easily scalable cluster of machines, and it’s relatively easy to write new servers using our POE-based framework. We can also add new fields to the messages we send between the tiers without having to worry about compatibility issues when old and new versions of the code talk to each other.
