My Journey Towards High Resiliency

What is high resiliency? According to Oxford, resilience means:

the capacity to recover quickly from difficulties; toughness.

If we apply this definition to IT/systems administration, it describes how well your environment fares against events, predictable or not, that have a negative effect. These can range anywhere from an update breaking services to a disaster at your primary location (e.g. fire, earthquake, or flood).

How is resiliency measured? The measure itself seems fairly obvious: what happens when a critical service or infrastructure component goes down. What is more debated is how far you should chase a resilient infrastructure. How many nines should you chase? Some stop at just 2 or 3 nines, which allows them anywhere from roughly 3 days down to about one working day of downtime per year. Although that may sound like very little, a business can lose anywhere from tens of thousands to millions of dollars for every day its infrastructure is down. There are others on the other side of the spe…