Saturday, March 27, 2010
Release It! - Chapter 4.2 Chain Reactions
Horizontal scaling means adding more servers. Vertical scaling means moving to bigger boxes. A horizontally scaled layer gets its fault tolerance from redundancy: a load balancer spreads requests across the servers, so losing one does not take down the service. Horizontally scaled systems can, however, exhibit a failure mode known as a Chain Reaction. The essence of the failure is that when a single node goes dark, the load balancer shifts its traffic onto the remaining servers, and the extra load causes them to fail too, typically because of a resource leak or a load-related bug. Since all servers run the same code, they all have the same bug and will eventually fail in the same way. Each failure raises the load on the survivors, so the remaining servers fail faster and faster until the entire layer is dead.

The only real way to stop the chain reaction is to fix the underlying leak or bug. You can group your servers with the Bulkhead pattern, which splits one large chain reaction into several smaller, separate ones and might buy you enough time to bring up another set of servers. Accept this: one down server places the remaining servers in jeopardy, and a dead layer will in turn endanger the layers that depend on it.

Hunting for resource leaks is important because resource leaks are a primary killer of systems. You must also hunt for obscure timing bugs by load testing your system. Using the Bulkhead pattern on the server side and the Circuit Breaker pattern on the client side can keep a chain reaction from taking out an entire layer.
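
To see why the failures accelerate, here is a toy simulation in Python. It is only a sketch under made-up assumptions, not anything from the book: the function name simulate_chain_reaction, the fixed load of 120 requests per tick, and the per-server "leak budgets" are all hypothetical. Each request is assumed to leak one unit of some resource, and a server dies once it has leaked its whole budget.

```python
def simulate_chain_reaction(total_load, leak_budgets):
    """Return the tick at which each server dies (toy model).

    Assumptions (hypothetical, for illustration only):
    - the load balancer splits total_load evenly across live servers
    - every request leaks one unit of a resource
    - server i dies once it has leaked leak_budgets[i] units
    """
    if total_load <= 0:
        raise ValueError("total_load must be positive or nothing ever fails")

    leaked = [0.0] * len(leak_budgets)
    alive = set(range(len(leak_budgets)))
    death_ticks = {}
    tick = 0

    while alive:
        tick += 1
        # Survivors absorb the traffic of every server that already died.
        share = total_load / len(alive)
        for server in alive:
            leaked[server] += share

        # Servers that have exhausted their budget drop out after this tick.
        dead = {s for s in alive if leaked[s] >= leak_budgets[s]}
        for s in dead:
            death_ticks[s] = tick
        alive -= dead

    return [death_ticks[s] for s in range(len(leak_budgets))]


# One pool of four servers sharing 120 requests per tick.
print(simulate_chain_reaction(120, [300, 500, 650, 800]))  # [10, 15, 18, 19]

# The same servers split into two bulkheads, each taking half the load.
print(simulate_chain_reaction(60, [300, 500]))  # [10, 14]
print(simulate_chain_reaction(60, [650, 800]))  # [22, 25]
```

In the single pool the gaps between failures shrink from 5 ticks to 3 to 1, because every death raises the per-server load. With two bulkheads the first failure still kills its own pool, but the other pool is untouched by it and lasts until tick 25 instead of tick 19. The leak is still there, so the bulkhead only buys time; it does not replace fixing the bug.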