boo-box system’s web server infrastructure
Over the last couple of years, a series of architectural patterns for web software have consolidated and become popular through frameworks, which facilitate the development and maintenance of these systems. At the same time, servers have become more accessible and cheaper. Creating a web project has become far easier, faster and cheaper. However, for a project to succeed, there is still one obstacle that is not easy to overcome: scalability.
boo-box has overcome this obstacle: its infrastructure is currently organized in layers that can each be scaled horizontally and are robust enough to serve thousands of requests per minute. In this post we present some of the practices and tools we currently use to guarantee good performance for our Communications System for Social Media, such as Ruby, Merb, CouchDB, Thin, nginx and Beanstalkd.
The boo-box infrastructure
Our infrastructure is a combination of different software. It mixes Open Source software that has been established for years, such as MySQL, with more recent projects which, generally speaking, have fewer features, are simpler, or are simply better suited to the specific case.
It is important to note that this post reflects the current infrastructure (May 2009). The rate of new Publishers joining the System and the growth in the number of visits to the Publishers already in the system lead to weekly changes in the server structure, whether adding new computers or modifying application components.
Server Identification
Naming servers is always a difficult decision for a development team. Some like to use planets, elements of the periodic table, or letters of the Greek alphabet. We like to use characters from the anime Dragon Ball Z.
Static Files
Static files are those that do not depend on server processing, such as images, CSS and JavaScript files. At boo-box, they are served from a subdomain that points directly to a dedicated server, thus relieving our load balancers.
Our static files are preloaded into RAM, which improves the system’s response time. This server runs the nginx HTTP server.
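To make the idea of preloading static files into RAM concrete, here is a minimal Ruby sketch. It is only an illustration of the technique, not how nginx or the boo-box server actually does it; the `StaticFileCache` class and the file names are hypothetical.

```ruby
require "tmpdir"

# Hypothetical in-memory store for static assets: every file under a
# directory is read once at startup and then served from a Hash.
class StaticFileCache
  def initialize(root)
    @files = {}
    Dir.glob(File.join(root, "**", "*")).each do |path|
      next unless File.file?(path)
      # Key each file by its path relative to the root, e.g. "/css/site.css".
      key = path.delete_prefix(root)
      @files[key] = File.binread(path).freeze
    end
  end

  # Returns the file body from memory, or nil when the path is unknown.
  def fetch(request_path)
    @files[request_path]
  end

  def size
    @files.size
  end
end

# Usage with a throwaway directory (illustrative file name and content).
Dir.mktmpdir do |root|
  File.write(File.join(root, "site.css"), "body { margin: 0 }")
  cache = StaticFileCache.new(root)
  puts cache.fetch("/site.css")
end
```

After startup no request touches the disk, which is the whole point of the approach; the cost is that the complete set of static files must fit in memory.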
Load Balancers
Load balancers are the servers that receive user requests and distribute the load among the application servers. At boo-box there are two load balancers, both registered in the DNS for boo-box.com. A load balancer must respond very quickly, so it does not process any business rules. Each of them runs the nginx HTTP server.
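The post does not say which distribution strategy the load balancers use. As a minimal sketch, assuming plain round-robin (the default in nginx upstream blocks), the selection logic could look like this in Ruby. The class name and the backend addresses are hypothetical.

```ruby
# Round-robin selection over a fixed list of application servers.
class RoundRobinBalancer
  def initialize(backends)
    raise ArgumentError, "at least one backend is required" if backends.empty?
    @backends = backends.dup.freeze
    @index = 0
    @lock = Mutex.new
  end

  # Returns the next backend in turn; the mutex keeps the counter
  # consistent when several threads ask at the same time.
  def next_backend
    @lock.synchronize do
      backend = @backends[@index]
      @index = (@index + 1) % @backends.size
      backend
    end
  end
end

# Hypothetical application server addresses.
balancer = RoundRobinBalancer.new(["app1:4000", "app2:4000", "app3:4000"])
4.times { puts balancer.next_backend }
```

Note that the balancer only picks a destination and never inspects the request, which matches the point that business rules stay out of this layer.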
Application clusters
The application cluster is the group of servers that process our business rules. It is this cluster that decides which ad will be shown in the window, what happens when a user clicks, and what to do with the data of a new Publisher registering in the system.
Each server runs roughly 100 instances of the Ruby framework Merb on the Thin HTTP server (no, we do not use Ruby on Rails :).
Database Cluster
When information needs to be registered in our system, such as registering a new Publisher, changes in preference settings, blocking an advertiser, and so on, this information is stored in our database.
The boo-box database cluster contains the Vegeta (Master) server, which receives the information to be written to the database, and the secondary Bulma and Ubb (Slave) servers, from which the application servers read the stored information.
As a reader pointed out, the character’s name is actually Uub, but the ninja who named the server was typing with his toes because our arms were raised gathering energy for a massive Genki Dama. Typing with your toes is hard to do, and he hit the wrong key, and the server name has stuck as Ubb :)
Splitting data writes and reads across different servers was one of the most effective measures we took during the last few months to improve the uptime and performance of the boo-box system.
The database cluster runs MySQL, split between the server that writes data and the servers that read it.
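The read/write split can be sketched in a few lines of Ruby. This is a simplified illustration, not the configuration boo-box uses: the `ReadWriteRouter` class is hypothetical, it classifies statements by their first keyword only, and the strings stand in for real database connections.

```ruby
# Routes statements: writes go to the master, reads rotate over the slaves.
class ReadWriteRouter
  WRITE_VERBS = %w[INSERT UPDATE DELETE REPLACE CREATE ALTER DROP].freeze

  def initialize(master:, slaves:)
    @master = master
    @slaves = slaves
    @next_slave = 0
  end

  # Picks a server name for the given SQL statement.
  def server_for(sql)
    verb = sql.strip.split(/\s+/, 2).first.to_s.upcase
    # Writes, or any statement when no slave exists, go to the master.
    return @master if WRITE_VERBS.include?(verb) || @slaves.empty?

    slave = @slaves[@next_slave % @slaves.size]
    @next_slave += 1
    slave
  end
end

# Server names taken from the post; the SQL is illustrative.
router = ReadWriteRouter.new(master: "vegeta", slaves: %w[bulma ubb])
puts router.server_for("INSERT INTO publishers (name) VALUES ('example')")
puts router.server_for("SELECT * FROM publishers")
puts router.server_for("SELECT * FROM publishers")
```

One consequence of this design is that a read issued right after a write may reach a slave that has not replicated the change yet, so code that needs to read its own write should target the master.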
The true story of a company contributing to the Free Software community :)
We use Sequel as the ORM for communication between the application and the database. When we needed to duplicate the database, recreating the Master and Slave structures, Sequel was unable to read from the Slave servers, no matter how carefully we followed protocol.
We got in touch with the ORM developers on the IRC channel and, after running tests for a few hours, we were able to solve the problem working together.
This is only the most recent example of the different ways in which boo-box has contributed to the Free Software community. We are active in this area because we truly believe that our technology here at boo-box is a fruit of the labors of Open Source.
System log cluster
All operations occurring on our servers are recorded in the system log. Which windows were seen, which ads were clicked, which actions were taken on partner websites: every action is recorded in our log.
From time to time we process the raw log, generate statistics and create a backup. This frees up the raw log to receive more data without losing past data, and at the same time keeps system performance good.
We have been using Analogger as our logging component. However, problems with performance and scalability led us to look for another solution. The system log is currently being migrated to a MySQL structure, split between Master (data writing) and Slave (data reading) servers.
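The periodic cycle of processing the raw log, generating statistics and creating a backup might be sketched like this in Ruby. The event fields (`action`, `window`) and the in-memory `archive` array are assumptions made for illustration; the post does not describe the real log format or backup target.

```ruby
require "json"

# Summarizes raw events into counts per action, backs up the batch,
# then empties the raw buffer so it can receive new data.
def process_raw_log(raw_events, archive)
  stats = Hash.new(0)
  raw_events.each { |event| stats[event[:action]] += 1 }
  archive << JSON.generate(raw_events) # backup of the processed batch
  raw_events.clear                     # free the raw log
  stats
end

# Hypothetical raw events.
raw = [
  { action: "view",  window: 1 },
  { action: "click", window: 1 },
  { action: "view",  window: 2 },
]
archive = []
stats = process_raw_log(raw, archive)
puts stats.inspect
puts "raw events left: #{raw.size}, batches archived: #{archive.size}"
```

The backup is written before the raw buffer is cleared, so a failure while archiving leaves the raw data in place.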
Product Cache
The majority of ads displayed in boo-box windows are for products sold by eCommerce sites. As product information does not need to be kept for a long time, we create a temporary cache of it.
The cache makes the system more robust, allowing it to continue functioning even if an eCommerce site is slow to respond or goes offline.
Our product cache structure is composed of two main components:
Queue
We use Beanstalkd as a queue service for product requests. Each boo-box window has associated tags, and each new tag not yet cached is inserted into this queue. The queue is consumed within the next few seconds, in the background, and thus does not interfere with the application’s functionality.
There is an independent service that consumes the queue, going to eCommerce websites to search for products related to each tag and placing that data in the cache servers.
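The producer and consumer roles described above can be illustrated with a small Ruby sketch. Ruby's built-in thread-safe `Queue` stands in for Beanstalkd, a `Hash` stands in for the cache servers, and `fetch_products_for` is a hypothetical placeholder for the real search on eCommerce websites.

```ruby
tag_queue = Queue.new # stand-in for Beanstalkd
product_cache = {}    # stand-in for the cache servers
cache_lock = Mutex.new

# Hypothetical lookup; a real worker would query eCommerce sites here.
def fetch_products_for(tag)
  ["#{tag} product A", "#{tag} product B"]
end

# Request path: enqueue tags that are not cached yet and return at once.
def enqueue_missing(tags, cache, queue, lock)
  tags.each do |tag|
    cached = lock.synchronize { cache.key?(tag) }
    queue << tag unless cached
  end
end

# Independent consumer: takes tags off the queue and fills the cache
# until it receives the :stop marker.
worker = Thread.new do
  while (tag = tag_queue.pop) != :stop
    products = fetch_products_for(tag)
    cache_lock.synchronize { product_cache[tag] = products }
  end
end

enqueue_missing(%w[camera guitar], product_cache, tag_queue, cache_lock)
tag_queue << :stop
worker.join
puts product_cache.keys.sort.inspect
```

The request path only pushes a tag and returns, so a slow eCommerce site delays the worker rather than the page that triggered the lookup.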
Product Cache cluster
Each server that stores product data runs CouchDB, a JSON document database.
The main resource consumed by these servers is disk space: they fill hundreds of gigabytes in just a few days, largely because of the variety of offers displayed on the boo-box system; there are literally millions of different products.
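As an illustration of what a temporary cache entry could look like as a JSON document, here is a minimal Ruby sketch. Apart from `_id`, which is CouchDB's document identifier field, every field name is hypothetical, and the 24-hour lifetime is an assumption; the post gives no figure.

```ruby
require "json"
require "time"

CACHE_TTL_SECONDS = 24 * 60 * 60 # assumed lifetime, not stated in the post

# Builds the JSON document that would be stored for one tag.
def build_cache_document(tag, products, now = Time.now.utc)
  JSON.generate(
    "_id" => "tag:#{tag}",
    "fetched_at" => now.iso8601,
    "products" => products
  )
end

# A document is stale once it is older than the assumed TTL.
def stale?(document_json, now = Time.now.utc)
  doc = JSON.parse(document_json)
  now - Time.iso8601(doc["fetched_at"]) > CACHE_TTL_SECONDS
end

# Hypothetical product data.
doc = build_cache_document("camera", [{ "name" => "Camera X", "price" => "199.00" }])
puts doc
puts stale?(doc)
```

Storing the fetch time inside each document lets a cleanup job discard old entries, which matters when the data grows by hundreds of gigabytes in a few days.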
The result
During the last few weeks, boo-box’s response time and uptime have visibly improved thanks to the measures described above. These measures are the result of the hard work and experience of our ninjas.
If you have any suggestions or questions, please feel free to comment; the suggestion box is there for your use :)
Posted by Marco Gomes and Mauricio Maia.