Tuesday, May 11, 2010
Release It! - 16 Case Study: Phenomenal Cosmic Powers, Itty-Bitty Living Space
Long story short: the web site goes dark on Black Friday because a downstream integration can't handle the load. Using Perl scripts, the team was able to resize the resource pools on the running system and bring it back online, albeit not at full throughput. The moral of the story seems to be this: "The ability to restart components, instead of entire servers, is a key concept of recovery-oriented computing. Although we did not have the level of automation that ROC proposes, we were able to recover service without rebooting the world. If we had needed to change the configuration files and restart all the servers, it would have taken more than six hours under that level of load. Dynamically reconfiguring and restarting just the connection pool took less than five minutes (once we knew what to do)."
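The chapter doesn't reproduce the Perl scripts, so here is only a minimal Python sketch of the idea of a component that can be reconfigured and restarted in place. `ResizablePool`, `resize()`, `restart()` and `FakeConn` are hypothetical names for illustration, not the book's code or any application server's real admin API. `resize()` changes the cap on the live pool, and `restart()` throws away every connection (idle ones immediately, busy ones as they are returned) while the rest of the process keeps running.

```python
import threading
from collections import deque


class ResizablePool:
    """Hypothetical connection pool that can be resized and restarted in place.

    `factory` is any zero-argument callable returning an object with a
    close() method; it stands in for the real downstream connection.
    """

    def __init__(self, factory, max_size):
        self._factory = factory
        self._max_size = max_size
        self._generation = 0        # bumped by restart()
        self._idle = deque()        # current-generation connections ready for reuse
        self._checked_out = {}      # id(conn) -> (generation, conn)
        self._cond = threading.Condition()

    def acquire(self, timeout=None):
        with self._cond:
            # Block until fewer than max_size connections are checked out.
            if not self._cond.wait_for(
                    lambda: len(self._checked_out) < self._max_size, timeout):
                raise TimeoutError("connection pool exhausted")
            # Reuse an idle connection, else open one (inside the lock, for simplicity).
            conn = self._idle.popleft() if self._idle else self._factory()
            self._checked_out[id(conn)] = (self._generation, conn)
            return conn

    def release(self, conn):
        with self._cond:
            gen, _ = self._checked_out.pop(id(conn))
            total = len(self._checked_out) + len(self._idle)
            if gen == self._generation and total < self._max_size:
                self._idle.append(conn)   # still valid and there is room: keep it
            else:
                conn.close()              # stale after restart(), or pool shrank
            self._cond.notify()

    def resize(self, new_max):
        """Change the cap on the live pool; no process restart needed."""
        with self._cond:
            self._max_size = new_max
            # Close surplus idle connections; busy ones are trimmed in release().
            while self._idle and len(self._checked_out) + len(self._idle) > new_max:
                self._idle.pop().close()
            self._cond.notify_all()       # a larger cap may unblock waiters

    def restart(self):
        """Discard all connections and start a new generation."""
        with self._cond:
            self._generation += 1
            while self._idle:
                self._idle.pop().close()

    def stats(self):
        with self._cond:
            return len(self._checked_out), len(self._idle)


class FakeConn:
    """Stand-in connection for the demo."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


pool = ResizablePool(FakeConn, max_size=2)
a = pool.acquire()
b = pool.acquire()
pool.resize(4)                 # grow the pool while it is in use
c = pool.acquire(timeout=0.1)  # would have timed out at the old cap of 2
pool.release(a)                # goes back to the idle queue
pool.restart()                 # closes idle connection a
pool.release(b)                # b predates the restart, so it is closed on return
```

Tracking a generation number, rather than forcibly closing connections that callers still hold, lets `restart()` return immediately without yanking a connection out from under a request in flight.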