Life and death monitoring

The Brookfield Zoo posted this statement on Facebook today and also emailed it out to all members.

On July 10, there was a drop in the oxygen level at Brookfield Zoo’s Stingray Bay habitat. Veterinary staff was promptly on the scene to provide medical treatment to the affected stingrays. Additionally, immediate action was taken by animal care staff to rectify the situation and get the levels back to normal. Despite tireless efforts by staff, all the animals, which included four southern stingrays and 50 cownose rays, succumbed.

“We are devastated by the tragic loss of these animals,” said Bill Zeigler, senior vice president of animal programs for the Chicago Zoological Society, which operates the zoo. “Our staff did everything possible to try and save the animals, but the situation could not be reversed.”

Staff is currently analyzing the life support system to determine the exact cause of the malfunction. At this time, the Chicago Zoological Society has made the decision to not reopen the summer-long temporary exhibit for the remainder of the season. The popular exhibit has been operating since 2007.

The zoo posted a further clarification on what kind of monitoring was used in the enclosure.

Brookfield Zoo 15 minute monitoring - Facebook

I’m not a zoologist, and I have no experience with monitoring systems for animals. But I do have vast experience monitoring critical services. A simple Google search finds $200 monitors for home aquariums that take readings every 6 seconds. It seems reasonable to assume that a commercial system would offer something similar.

In IT, we take painstaking care to ensure that our critical servers stay online. Most of what we do has nothing to do with life and death. Even though we’re only protecting our company’s financials and reputation, any halfway decent system that I stand up has:

  • Independent power feeds from separate areas of the building
  • Dual power distribution units, dual power on all equipment
  • N+1 architecture to sustain the loss of one host in the cluster
  • Redundant storage controllers with redundant paths
  • Battery backup
  • Appropriate monitoring

In general, a monitoring interval of 5 minutes is the maximum I would ever allow for a Production server. Critical servers could be monitored as frequently as 1 minute. Load balancers watch services as frequently as every 5-10 seconds. All of this work is done to ensure availability of the services.

Anything can go wrong, and it’s possible that a more frequent monitoring interval would not have made a difference in this case. But at first glance, a 15-minute interval seems negligent. If I can monitor my goldfish every 6 seconds, it seems that the zoo should have been monitoring the rays more closely. With a 15-minute interval, a problem that starts just after a reading can go unnoticed for almost 15 minutes, and someone still has to see the alert and get to the tank. In that worst case you can’t expect a human response for at LEAST 20 minutes, and that doesn’t seem sufficient when the lives of 54 animals depend on it.
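To put rough numbers on that, here is a minimal Python sketch of the worst-case delay for the polling intervals mentioned above. The 5-minute human response time is my own assumption, chosen only because it is the gap between a 15-minute reading and the 20-minute figure. The function name and labels are made up for illustration and say nothing about how the zoo's system actually works.

```python
def worst_case_response_minutes(poll_interval_s: float,
                                human_response_s: float = 300.0) -> float:
    """Worst-case minutes from the start of a fault to a person acting on it.

    Assumes the fault begins right after a reading, so it goes unseen for
    one full polling interval, and that a person then needs
    human_response_s seconds to react (a hypothetical 5 minutes by default).
    """
    return (poll_interval_s + human_response_s) / 60.0


# Polling intervals taken from the post, in seconds.
intervals = {
    "home aquarium monitor (6 s)": 6,
    "critical server (1 min)": 60,
    "production server (5 min)": 300,
    "stingray exhibit (15 min)": 900,
}

for label, seconds in intervals.items():
    minutes = worst_case_response_minutes(seconds)
    print(f"{label}: up to {minutes:.1f} minutes before a human responds")
```

Under these assumptions the polling interval dominates the total delay once it climbs past a few minutes. Below that, the human response time is the bottleneck.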
