The SRE Manifesto

Site Reliability Engineering Practice

Practice code	Practice area(s)	Practice name	Practice description	Practice applicability	Practice technology(ies)	Implementation steps
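AUT101	Automation; Systems Thinking	Chaos Engineering	Experiment on a system in order to build confidence in its capability to withstand turbulent conditions in production	Distributed systems, microservices architectures, cloud platforms	Chaos Monkey, Gremlin, LitmusChaos, AWS Fault Injection Service (FIS)	1. Define steady state as a measurable output that indicates normal behavior (e.g. request success rate), and hypothesize that it will hold in both a control group and an experimental group; 2. Introduce variables that reflect real-world events (e.g. server crashes, severed network connections); 3. Run automated experiments, first in staging and then in production, keeping the blast radius small; 4. Look for a difference in steady state between the two groups, then fix the weaknesses found to improve system resilience.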

Source: Principles of Chaos Engineering
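The implementation steps for AUT101 can be sketched as a small, self-contained simulation. This is a toy illustration under stated assumptions, not how Chaos Monkey, Gremlin, LitmusChaos or AWS FIS work: the `Cluster` class, the `failover` flag, `crash_one_replica`, `run_experiment` and its `tolerance` parameter are all hypothetical names invented here. The steady-state metric is assumed to be the fraction of requests served successfully, and the injected "real-world event" is a single replica crash applied to the experimental group only.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Set, Tuple


@dataclass
class Cluster:
    """Hypothetical toy service: N replicas behind a load balancer."""
    replicas: int
    failover: bool = True  # assumption: client retries on a healthy replica
    down: Set[int] = field(default_factory=set)

    def handle(self, rng: random.Random) -> bool:
        """Serve one request; return True on success."""
        target = rng.randrange(self.replicas)
        if target not in self.down:
            return True
        # The chosen replica crashed: succeed only if failover is enabled
        # and at least one replica is still healthy.
        return self.failover and len(self.down) < self.replicas


def success_rate(cluster: Cluster, requests: int, rng: random.Random) -> float:
    """Steady-state metric: fraction of requests served successfully."""
    ok = sum(cluster.handle(rng) for _ in range(requests))
    return ok / requests


def crash_one_replica(cluster: Cluster) -> None:
    """Hypothetical real-world event to inject: one server crashes."""
    cluster.down.add(0)


def run_experiment(
    inject: Callable[[Cluster], None],
    replicas: int = 3,
    failover: bool = True,
    requests: int = 1000,
    tolerance: float = 0.01,
    seed: int = 42,
) -> Tuple[float, float, bool]:
    rng = random.Random(seed)  # seeded so runs are reproducible
    # 1. Steady-state hypothesis: the experimental group's success rate
    #    stays within `tolerance` of the control group's.
    control = Cluster(replicas, failover)
    experimental = Cluster(replicas, failover)
    # 2. Inject the event into the experimental group only, which keeps
    #    the blast radius limited to that group.
    inject(experimental)
    # 3. Run the same workload against both groups.
    control_rate = success_rate(control, requests, rng)
    experimental_rate = success_rate(experimental, requests, rng)
    # 4. Try to disprove the hypothesis by looking for a difference.
    holds = abs(control_rate - experimental_rate) <= tolerance
    return control_rate, experimental_rate, holds


if __name__ == "__main__":
    for failover in (True, False):
        c, e, ok = run_experiment(crash_one_replica, failover=failover)
        print(f"failover={failover}: control={c:.3f} "
              f"experimental={e:.3f} hypothesis holds={ok}")
```

With failover enabled, losing one of three replicas leaves the success rate unchanged and the hypothesis holds. With failover disabled, roughly a third of requests fail and the hypothesis is disproved, which is exactly the kind of weakness step 4 asks you to fix before widening the experiment.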