- save part of the ship - Bulkheads partition capacity as a way to
preserve partial system functionality when bad things happen.
- decide whether to accept less efficient use of resources - partitioned
systems need more reserved, but probably unused, capacity. If everything is pooled together, you might need less total reserved capacity.
- pick a useful granularity - you can partition thread pools in an
application, CPUs in a server, or servers in a cluster.
- especially important with shared services models - if you are an SOA provider and your services go down, Chain Reactions will occur and things will come to a halt. Use Bulkheads to contain the damage.
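
As a rough illustration of the thread-pool granularity, here is a minimal Python sketch, not anything taken from the book. The feature names `check_in` and `purchase`, the pool sizes, and the `submit` and `demo` helpers are all made up for the example. Each feature gets its own fixed-size pool, so a feature whose workers are all stuck cannot starve the other.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical features and pool sizes: one small, separate pool per feature.
POOLS = {
    "check_in": ThreadPoolExecutor(max_workers=2, thread_name_prefix="check_in"),
    "purchase": ThreadPoolExecutor(max_workers=2, thread_name_prefix="purchase"),
}


def submit(feature, fn, *args):
    """Run fn on the pool reserved for this feature only."""
    return POOLS[feature].submit(fn, *args)


def demo():
    """Saturate the check_in pool and show that purchase still responds."""
    release = threading.Event()
    # Four blocking tasks on a two-worker pool: both workers are stuck and
    # two more tasks are queued behind them (each waits at most 10 seconds).
    blocked = [submit("check_in", release.wait, 10) for _ in range(4)]
    # The purchase pool has its own workers, so this completes right away.
    result = submit("purchase", lambda: "ticket sold").result(timeout=5)
    release.set()  # unblock the check_in workers so the demo can finish
    for future in blocked:
        future.result(timeout=5)
    return result
```

Calling `demo()` returns the purchase result even while every check-in worker is blocked. The cost is the one noted above: the idle purchase workers are reserved capacity that check-in cannot borrow.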
Wednesday, April 14, 2010
Release It! - Chapter 5.3 Bulkheads
On ships, bulkheads partition the hull into sections that can be sealed off from the rest of the ship if there is a hull breach. This allows the ship to stay afloat even if there is a hole in it. The same idea can be applied to software -- stay afloat even if part of the system has been damaged.

In software this is done via redundancy -- multiple instances of an application server running on multiple pieces of hardware. You can also partition your system by function -- one set of servers for flight check-in and another set for ticket purchases. Scheduled maintenance might be another reason to use Bulkheads, since you can selectively turn off and update discrete portions of the system and still process transactions.

Virtualization is a tool you can use to partition your system and still allow for the ebb and flow of demand. Some companies are using Amazon's EC2 to handle seasonal traffic, essentially renting resources just to cover their temporary needs.

Examining the business cost of a down piece of system functionality can help guide where Bulkheads might make sense. Redundancy has its costs, so only pay to Bulkhead what is really important to the business.

You can also consider a CPU Bulkhead, where specific threads are bound to specific CPUs. That way, if a bad piece of code pegs a CPU, other CPUs might still be available to do work because they have been targeted with a different workload.
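
The CPU Bulkhead idea can be sketched in Python under a few assumptions. The workload names and both helper functions below are hypothetical. The sketch also pins a whole worker process instead of individual threads, which is a simplification of the thread binding described above. `os.sched_setaffinity` is a real standard-library call, but it is only available on some platforms (Linux among them), so the sketch checks for it before using it.

```python
import os


def partition_cpus(cpus, workloads):
    """Split CPU ids into disjoint groups, one per workload (round-robin)."""
    ordered = sorted(cpus)
    if len(ordered) < len(workloads):
        raise ValueError("need at least one CPU per workload")
    groups = {name: set() for name in workloads}
    for index, cpu in enumerate(ordered):
        groups[workloads[index % len(workloads)]].add(cpu)
    return groups


def pin_current_process(cpu_group):
    """Restrict the caller to cpu_group where the platform supports it."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpu_group)  # pid 0 means the caller
        return True
    return False  # no affinity API here; the partition is advisory only
```

For example, `partition_cpus({0, 1, 2, 3}, ["check_in", "purchase"])` assigns CPUs 0 and 2 to check-in and CPUs 1 and 3 to purchase. Each worker process would then call `pin_current_process` with its own group at startup, so a runaway check-in worker can peg only its own CPUs.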