Monday, April 12, 2010

Release It! - Chapter 5.1 Use Timeouts

This is the chapter I've been really looking forward to.  We've seen some of the ways you can hose a system from a stability standpoint; now let's see how we can remedy some of those situations.  This is a long chapter, so each pattern will be broken out into a separate post.

Modern systems rely heavily on the network, and networks break.  Waiting for an answer that is never going to come is not a wise move.  I like this tagline: "Hope is not a design method."  Make sure your code doesn't wait around forever for an answer to its request.  Any resource pool implementation that blocks a thread until a resource is available should have a timeout enabled.  In Java, always use the form of the concurrency APIs that takes a timeout, never the no-arg one.

Creating reusable code that deals with the sticky issues around thread blocking and timeouts is desirable, not to mention good programming.  That way, a particular set of thread interactions is understood and shared throughout the system.  Use QueryObject and Gateway to encapsulate database access logic, making it easier to apply Circuit Breaker.

Some code attempts to retry immediately after a failure but, generally speaking, that is not a wise thing to do.  Networks and servers don't heal quickly, and making a client wait is usually not a good thing.  A better tactic is to return a result right away, which might be an error code or an indicator that you've queued up the request for retry at a future time.  Making the client wait will likely cause a cascading failure, as its callers have to sit around waiting to get their answer from it.  Store-and-Forward is generally a robust solution to timeouts, but each application has its own definition of "fast enough" which you need to account for.

Timeouts and Circuit Breakers are a good combination because the Circuit Breaker can trip if timeouts become the norm instead of the exception.  Timeouts coupled with Fail Fast are another common combination.  Timeout protects you against somebody else's failure, while Fail Fast is used to report to your callers why you can't complete their request.  Timeouts also have a role to play with Unbounded Result Sets, in that it might take too much time to load those million records you accidentally asked for.
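To make the resource pool advice concrete, here is a minimal sketch of a pool whose checkout always takes a timeout.  `BoundedPool`, `checkOut` and `checkIn` are names made up for this example, not anything from the book; the only real API it leans on is `BlockingQueue.poll(timeout, unit)`, the timed sibling of the no-arg `take()`.

```java
import java.util.Collection;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical pool, invented for illustration only.
final class BoundedPool<T> {
    private final BlockingQueue<T> idle;

    BoundedPool(Collection<T> resources) {
        // ArrayBlockingQueue requires a capacity of at least 1, even for an empty pool.
        idle = new ArrayBlockingQueue<>(Math.max(1, resources.size()), false, resources);
    }

    /** Waits at most the given time for a free resource; empty means "timed out". */
    Optional<T> checkOut(long timeout, TimeUnit unit) {
        try {
            // poll(timeout, unit) gives up at the deadline; take() would block indefinitely.
            return Optional.ofNullable(idle.poll(timeout, unit));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Optional.empty();
        }
    }

    /** Returns a resource to the pool without blocking. */
    void checkIn(T resource) {
        idle.offer(resource);
    }
}
```

A caller would write something like `pool.checkOut(250, TimeUnit.MILLISECONDS)` and treat an empty `Optional` as the signal to give up and report back, rather than parking the thread until a resource happens to free up.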
  • apply Timeout to Integration Points, Blocked Threads, and Slow Response to avert Cascading Failures
  • apply Timeout as a way to recover from unexpected failures.  Sometimes you can't know the precise cause of the failure but you need to give up and move on.
  • consider delayed retries.  Immediate retries are likely to fail and end up delaying the layer calling you.  Queuing up the work and trying again later is usually a better alternative.
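
The delayed retry idea from the last bullet could be sketched roughly like this.  `DelayedRetry`, its `Status` values and the fixed delay are all assumptions made for the example; the point is only that the caller gets an answer immediately while the retry happens later on a scheduler thread.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, invented for illustration only.
final class DelayedRetry {
    enum Status { DONE, QUEUED_FOR_RETRY, GAVE_UP }

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "delayed-retry");
            t.setDaemon(true); // pending retries should not keep the JVM alive
            return t;
        });
    private final int maxAttempts;
    private final long delayMillis;

    DelayedRetry(int maxAttempts, long delayMillis) {
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    /** Tries once on the caller's thread, then answers immediately either way. */
    Status submit(BooleanSupplier work) {
        if (tryOnce(work)) return Status.DONE;
        if (maxAttempts < 2) return Status.GAVE_UP;
        scheduleRetry(work, 2);
        return Status.QUEUED_FOR_RETRY;
    }

    private void scheduleRetry(BooleanSupplier work, int attempt) {
        scheduler.schedule(() -> {
            // On failure, queue another delayed attempt until the budget is spent.
            // A real system would log or dead-letter the work when it gives up.
            if (!tryOnce(work) && attempt < maxAttempts) scheduleRetry(work, attempt + 1);
        }, delayMillis, TimeUnit.MILLISECONDS);
    }

    private static boolean tryOnce(BooleanSupplier work) {
        try {
            return work.getAsBoolean();
        } catch (RuntimeException e) {
            return false; // treat an exception like any other failed attempt
        }
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

The work here lives only in memory, so it is lost if the process dies; a proper Store-and-Forward setup would persist the queued request somewhere durable.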
In the past, I've written utility objects intended for use throughout the system but I never crafted them with an eye towards system stability.  It makes sense to encapsulate all the gory details around network timeouts and retries into a single place.  I guess this is another reason to try and keep your code DRY.  I'll be interested to see what sort of Java idioms emerge from the combination of Timeouts and Circuit Breaker.
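
As a rough idea of what such a single place might look like, here is a small sketch that time-boxes any call using `Future.get(timeout, unit)`.  `TimeBoxed` and its fallback parameter are assumptions of mine, not an idiom from the book.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical utility, invented for illustration only.
final class TimeBoxed {
    // Unbounded pool for brevity; a production version would bound this too.
    private final ExecutorService workers = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "time-boxed");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task, returning the fallback if it fails or exceeds the timeout. */
    <T> T call(Callable<T> task, long timeout, TimeUnit unit, T fallback) {
        Future<T> future = workers.submit(task);
        try {
            return future.get(timeout, unit); // the timed form, never the no-arg get()
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; only helps if the task honors interrupts
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself threw
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    void shutdown() {
        workers.shutdownNow();
    }
}
```

Something like `box.call(() -> remoteLookup(id), 500, TimeUnit.MILLISECONDS, cachedValue)` would then keep the timeout handling out of every call site (`remoteLookup` and `cachedValue` being placeholders for whatever your integration point needs).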
