The SRE Manifesto

Site Reliability Engineering Practice

Chaos Engineering

Practice code: AUT101
Practice area(s): [x] Automation; [x] Systems Thinking
Practice name: Chaos Engineering
Practice description: Experiment on a system in order to build confidence in its capability to withstand turbulent conditions in production
Practice applicability: Distributed systems, microservices architectures, cloud platforms
Practice technology(ies): Chaos Monkey, Gremlin, LitmusChaos, AWS Fault Injection Simulator
Implementation steps:
1. Define steady-state hypothesis
2. Identify real-world events (e.g. server crash)
3. Run automated experiments in staging/prod
4. Analyze results and improve system resilience
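
The four implementation steps can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `Service`, `Replica`, `run_experiment`, the 99% success threshold, and the round-robin routing are all hypothetical names and choices made for illustration, and none of them is the API of Chaos Monkey, Gremlin, LitmusChaos, or AWS Fault Injection Simulator. A real experiment would measure steady state from production metrics and inject faults with one of those tools.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Replica:
    name: str
    healthy: bool = True


@dataclass
class Service:
    """Toy stand-in for a replicated service behind a round-robin balancer."""
    replicas: List[Replica]
    retry: bool = True  # hypothetical resilience feature under test

    def handle(self, request_id: int) -> bool:
        first = self.replicas[request_id % len(self.replicas)]
        if first.healthy:
            return True
        # With retries enabled, any other healthy replica can serve the request.
        return self.retry and any(r.healthy for r in self.replicas)


def success_rate(service: Service, requests: int = 1000) -> float:
    """Step 1 metric: fraction of requests served successfully."""
    return sum(service.handle(i) for i in range(requests)) / requests


def crash_first_replica(service: Service) -> None:
    """Step 2: the real-world event being simulated (a server crash)."""
    service.replicas[0].healthy = False


def restore_all(service: Service) -> None:
    """Roll back the injected fault."""
    for replica in service.replicas:
        replica.healthy = True


def run_experiment(
    service: Service,
    inject: Callable[[Service], None],
    rollback: Callable[[Service], None],
    threshold: float = 0.99,  # assumed steady-state threshold
) -> Dict[str, object]:
    """Step 3: verify steady state, inject the fault, measure, roll back."""
    baseline = success_rate(service)
    if baseline < threshold:
        raise RuntimeError("steady state not met before injection; aborting")
    inject(service)
    try:
        during = success_rate(service)
    finally:
        rollback(service)  # always undo the fault, even if measurement fails
    # Step 4: the result tells us whether the hypothesis survived the fault.
    return {
        "baseline": baseline,
        "during_fault": during,
        "hypothesis_holds": during >= threshold,
    }


if __name__ == "__main__":
    for retry in (True, False):
        svc = Service([Replica("a"), Replica("b"), Replica("c")], retry=retry)
        print(retry, run_experiment(svc, crash_first_replica, restore_all))
```

Running the same experiment with `retry` on and off shows the point of step 4: when the hypothesis fails, the result identifies a concrete weakness (here, missing retries) to fix before rerunning the experiment.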

Source: Principles of Chaos Engineering