provides another way for a system to fail: Navel Gazing. Navel Gazing describes the state in which all the threads are sitting around waiting for some impossible event, so even though the runtime hasn't crashed, your system isn't doing any work. There are four major issues around the problem:
- error conditions and exceptions create too many possible paths to test
- unexpected interactions can introduce problems in previously safe code
- timing around thread interactions is crucial to manifesting this type of problem, so you usually see it only under high load, when concurrent requests are more likely
- developers never test their code against 10,000 concurrent users
in your programming language. Try to beat up the vendor before insulating your Integration Point with worker threads. Blocked or slow-responding threads typically appear around Integration Points and can form a feedback loop that quickly results in a cascading failure.
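One way to picture insulating an Integration Point with worker threads is the sketch below. It is only an illustration under stated assumptions: the class name `InsulatedCall`, the method `callWithTimeout`, the pool size of 4, and the fallback value are all hypothetical and do not come from the source. The idea is that the request-handling thread hands the risky call to a small dedicated pool and waits a bounded time for the answer.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: names and sizes are assumptions for illustration only.
public class InsulatedCall {
    // Small dedicated pool: a hung remote call ties up one of these
    // workers instead of a request-handling thread. Daemon threads so
    // a stuck worker cannot keep the JVM alive.
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "integration-worker");
        t.setDaemon(true);
        return t;
    });

    // Runs remoteCall on a worker thread and waits at most timeoutMillis.
    // Returns fallback if the call times out, fails, or the caller is interrupted.
    public static <T> T callWithTimeout(Callable<T> remoteCall, long timeoutMillis, T fallback) {
        Future<T> future = WORKERS.submit(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker if it is still running
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the remote call itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return fallback;
        }
    }

    public static void main(String[] args) {
        String fast = callWithTimeout(() -> "ok", 500, "fallback");
        String slow = callWithTimeout(() -> {
            Thread.sleep(5_000); // stands in for a hung vendor library
            return "late";
        }, 100, "fallback");
        System.out.println(fast + " " + slow);
    }
}
```

Note that `future.cancel(true)` only interrupts the worker; a library that ignores interrupts will keep holding that worker thread, which is why the pool is kept separate from the request-handling threads and sized deliberately.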
Blocked threads are the cause of a high proportion of system failures. Scrutinize resource pools and make sure they are configured for concurrent access. Blocked database connection pools can lead to blocked threads, incorrect exception handling, and cascading failures. Use timeouts so that no thread waits forever. Don't use the no-argument wait() method; use the form that accepts a timeout instead. Use proven libraries: writing correct concurrent code is hard, so leverage the work others have done before you. Beware of code you cannot see. Test and review third-party client libraries, because they will fail and it is best to have an idea of how.
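The advice about timed waits can be sketched as a tiny resource pool. This is a minimal illustration, not a production pool: the class `BoundedPool` and its `checkOut`/`checkIn` methods are hypothetical names chosen here, and returning `null` on timeout is an assumption made for brevity. It uses the timed form `Object.wait(long)` inside a loop with a deadline, so a caller gives up after a bounded time even if no resource is ever returned.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical pool used only to illustrate wait(timeout); not a real library class.
public class BoundedPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();

    public BoundedPool(Iterable<T> resources) {
        for (T resource : resources) {
            idle.push(resource);
        }
    }

    // Returns a resource, or null if none becomes free within timeoutMillis.
    public synchronized T checkOut(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        // Loop because wait() can wake up spuriously or lose the race
        // to another thread that grabbed the resource first.
        while (idle.isEmpty()) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return null; // give up instead of blocking forever
            }
            // Timed form. Never pass 0 here: wait(0) means "wait forever".
            wait(remainingMillis);
        }
        return idle.pop();
    }

    // Returns a resource to the pool and wakes any waiting threads.
    public synchronized void checkIn(T resource) {
        idle.push(resource);
        notifyAll();
    }
}
```

In line with the "use proven libraries" advice, the same bounded wait is available in the standard library without hand-written `wait`/`notifyAll`: `BlockingQueue.poll(long timeout, TimeUnit unit)` and `Semaphore.tryAcquire(long timeout, TimeUnit unit)` in `java.util.concurrent` both return after the timeout rather than blocking indefinitely.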