In the previous post, we discussed what chaos engineering is. In this part, we will go through how to plan and implement it.
While planning the tests, keep the following things in mind:
Assume nothing: As with any other testing, assume nothing while analyzing the environment. Quite often, the part of the design we assume to be safe is the one that fails, leaving us unprepared.
Cover everything: Not so much a point as a reminder to go through the infrastructure thoroughly, checklist style. Every component has a failure point; our job is to test what happens when it fails.
Do not take anyone's word for it: Highly available, low failure rate, highly redundant. We come across services and implementations described this way quite often, even more so since the arrival of the cloud and its services. But these are all theoretical or SLA figures, and the failures the provider has accounted for might not cover all of your scenarios. Also, a service failure might be well within the SLA range for the provider and still cause us financial or reputational loss as an organization.
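
To make the SLA point concrete, here is a minimal sketch of how much downtime an availability percentage still permits over a year. The function name and the sample percentages are illustrative assumptions, not figures from any particular provider.

```python
# Illustrative only: the SLA percentages below are example values,
# not figures quoted from any specific provider.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(sla_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime (in hours) that still satisfies the availability SLA."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period_hours * (1 - sla_percent / 100)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(sla)
        print(f"{sla}% availability allows about {hours:.2f} hours of downtime per year")
```

A 99.9% figure still permits roughly 8.76 hours of outage per year, and the number says nothing about whether those hours land during our peak traffic. That gap between "within SLA" and "acceptable to us" is what the tests should probe.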