When developing enterprise-level applications, we often need to call external services and resources, such as a network location, a database server, or a web service. Whenever we call a service, there is a chance that a problem with the network or with the service itself will cause the call to fail. One way to cope with such failures is to queue requests and retry them periodically, which lets us keep accepting requests until the service becomes available again. However, if a network or service is already struggling, hammering it with retry attempts will not help it recover, especially if it is under increased load. Such a pounding can cause even more damage and a longer interruption. If we know a service may fail in this way, we can relieve some of the strain by implementing the Circuit Breaker pattern in the client application.
Circuit breakers in our homes prevent a surge of current from damaging appliances or overheating the wiring. They work by allowing a certain level of current to enter the system. If the current exceeds the threshold, the circuit opens, stopping the current from flowing and preventing further damage. Once the problem has been fixed, the circuit breaker can be reset, which closes the circuit and allows electricity to flow again. The Circuit Breaker pattern uses the same concept, stopping requests to a resource if the number of failures exceeds a certain threshold.
The Circuit Breaker pattern is described in Michael T. Nygard’s book, Release It! Design and Deploy Production-Ready Software. The pattern has three operational states: closed, open and half-open.
In the “closed” state, operations are executed as usual. If an operation throws an exception, the failure count is incremented and an OperationFailedException is thrown. If the failure count exceeds the threshold, the circuit breaker trips into the “open” state. If a call succeeds before the threshold is reached, the failure count is reset.
In the “open” state, all calls to the operation will fail immediately and throw an OpenCircuitException. A timeout is started when the circuit breaker trips. Once the timeout is reached, the circuit breaker enters a “half-open” state.
In the “half-open” state, the circuit breaker allows one operation to execute. If this operation fails, the circuit breaker re-enters the “open” state and the timeout is reset. If the operation succeeds, the circuit breaker enters the “closed” state and the process starts over.
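The three states described above can be sketched in a few lines. This is a minimal illustration in Python, not the downloadable code from this post: the class name CircuitBreaker, the execute method, the threshold and timeout parameters, and the injectable clock are all assumptions made for the example. Only the two exception names come from the description above. The sketch is single-threaded and makes no attempt at thread safety.

```python
import time
from enum import Enum


class OperationFailedException(Exception):
    """Raised when the wrapped operation itself fails."""


class OpenCircuitException(Exception):
    """Raised when a call is rejected because the circuit is open."""


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitBreaker:
    """Hypothetical sketch of the pattern; not thread-safe."""

    def __init__(self, threshold=3, timeout=30.0, clock=time.monotonic):
        self.threshold = threshold  # failures tolerated before tripping
        self.timeout = timeout      # seconds to stay open before a trial call
        self._clock = clock         # injectable so tests can control time
        self._state = State.CLOSED
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        # Move from open to half-open lazily, once the timeout has elapsed.
        if (self._state is State.OPEN
                and self._clock() - self._opened_at >= self.timeout):
            self._state = State.HALF_OPEN
        return self._state

    def _trip(self):
        # Enter the open state and (re)start the timeout.
        self._state = State.OPEN
        self._opened_at = self._clock()

    def execute(self, operation):
        state = self.state
        if state is State.OPEN:
            # Fail immediately without touching the service.
            raise OpenCircuitException("circuit is open")
        try:
            result = operation()
        except Exception as exc:
            self._failures += 1
            # A failed trial call re-opens the circuit; in the closed state
            # the circuit trips once the count exceeds the threshold.
            if state is State.HALF_OPEN or self._failures > self.threshold:
                self._trip()
            raise OperationFailedException("operation failed") from exc
        # Any success resets the count and closes the circuit.
        self._failures = 0
        self._state = State.CLOSED
        return result
```

A caller would wrap each service call, for example breaker.execute(lambda: fetch_orders()), where fetch_orders stands in for whatever call is being protected, and handle OpenCircuitException separately from OperationFailedException so that it can back off while the circuit is open.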
You can download the circuit breaker code and tests here. If you have any comments or suggestions, I would love to hear them!
Update: I have posted a new article that contains a number of additions and improvements to the circuit breaker code.
For more information on this pattern and many other ways to improve software stability, capacity and operational ability, I highly recommend the book Release It! Design and Deploy Production-Ready Software by Michael T. Nygard.
Hi Tim, I enjoyed reading “Release It!” and it’s great to see some of the patterns being implemented in simple solutions like yours.
However, a circuit breaker will typically run in a multithreaded environment, and your implementation lacks thread-safety as I see it.
Hi Søren, thanks for your comment. It’s true that this implementation does not take thread safety into account. If it were to be used in a multi-threaded environment, the code would need to be changed to support this. I will look into implementing a thread-safe version of this example.
Great, I’ll look forward to seeing it.
Tim, it’s great to see an example of this. Thank you very much. It seems like a truly useful pattern, and I’m looking forward to trying it out (am reading Release It! right now).
Have you tried any of the OpsDB stuff? I’m really interested in application/business monitoring and health, so I’m looking for .NET materials related to this.
Hi Tobin, thanks for your comment. Sorry for the late response, I don’t seem to be getting emails from wordpress any more when a new comment is posted! I haven’t tried any OpsDB stuff, but it certainly sounds useful. What I really need is something that can monitor WCF services. I have found these very difficult to troubleshoot in live environments. Do you know of anything useful for this?
Hey Tim. No probs 🙂
Hmmm, I’ve not seen anything for monitoring WCF services I’m afraid. Will let you know if I stumble across anything.
Check out some more diagrams and sample source code for PHP
http://artur.ejsmont.org/blog/PHP-Circuit-Breaker-initial-Zend-Framework-proposal
Can you show a sample call pattern for using this with say a LinqToSQL data access layer? I was trying to work this into two simple methods (one that does an Update and another that does a Select) and don’t see how to cleanly use this.
Thanks.
Very nice post. I just stumbled upon your weblog and wished to say that I have really enjoyed browsing your blog posts. After all I’ll be subscribing to your feed and I hope you write again soon!
I truly enjoyed flicking through this. I think I will take a look through your other posts!
Oops! I missed this great article, and it seems that the source code has been removed. May I please ask you to re-upload the source code for both the old and the new approach? Thank you!
Hi Rupen,
Sorry, it looks like I accidentally removed those files! Thanks for letting me know. I have now restored the links to download the source code and tests.
Great, thank you Tim. I am taking one step at a time; once I’m done with this traditional approach, I’ll look at the new article.
Just read about this on Martin Fowler’s blog. Neat pattern.