I am learning chaos engineering and following a tutorial, but my experiment does not behave the way I expect.
The service I am testing:
service.py
import threading
import time
from datetime import datetime  # was missing: update_file() calls datetime.now()
from wsgiref.simple_server import make_server

EXAMPLE_FILE = './example.dat'


def update_file():
    """Write the current time to the file every second.

    Opening the file in 'w' mode creates it if it does not exist.
    """
    print('Updating file...')
    while True:
        with open(EXAMPLE_FILE, 'w') as f:
            f.write(datetime.now().isoformat())
        time.sleep(1)


def simple_app(environ, start_response):
    """A simple WSGI application.

    This application returns the contents of EXAMPLE_FILE (the last time written by update_file).
    """
    status = '200 OK'
    headers = [('Content-type', 'text/plain; charset=utf-8')]
    start_response(status, headers)
    with open(EXAMPLE_FILE, 'r') as f:
        return [f.read().encode('utf-8')]


if __name__ == '__main__':
    # Start the file update thread. daemon=True lets the process exit on Ctrl+C
    # even though update_file() loops forever.
    t = threading.Thread(target=update_file, daemon=True)
    t.start()
    httpd = make_server('', 8000, simple_app)
    print("Serving on port 8000...")
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        print("\nKeyboard interrupt received, exiting.")
        httpd.server_close()  # release the listening socket
        t.join(timeout=1)
        print("Exiting.")
My chaos experiment file
experiment.json
{
    "title": "Does our service tolerate the loss of its example file?",
    "description": "Our service reads data from a file, can it work without it?",
    "tags": ["tutorial", "filesystem"],
    "steady-state-hypothesis": {
        "title": "The exchange file must exist",
        "probes": [
            {
                "type": "probe",
                "name": "service-is-unavailable",
                "tolerance": [200, 503],
                "provider": {
                    "type": "http",
                    "url": "http://localhost:8000"
                }
            }
        ]
    },
    "method": [
        {
            "name": "move-example-file",
            "type": "action",
            "provider": {
                "type": "python",
                "module": "os",
                "func": "rename",
                "arguments": {
                    "src": "./example.dat",
                    "dst": "./example.dat.old"
                }
            }
        }
    ]
}
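To make the experiment's semantics concrete, here is a deliberately simplified Python model of what chaos run does with this file: check the steady-state probe, run the method's action, then check the probe again. It is not the Chaos Toolkit implementation. The helper names (http_status, status_in_tolerance, run_experiment) are hypothetical, and it assumes that a list tolerance on an HTTP probe means "the response status must be one of these codes" and that a python provider is called as func(**arguments).

```python
import importlib
import os
import tempfile
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """GET url and return the HTTP status code, including error codes such as 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still what the probe checks.
        return err.code


def status_in_tolerance(status, tolerance):
    """Assumed rule: a list tolerance means 'the status code must be one of these'."""
    if isinstance(tolerance, list):
        return status in tolerance
    return status == tolerance


def steady_state_met(experiment, fetch_status):
    """Run every HTTP probe of the steady-state hypothesis and check its tolerance."""
    for probe in experiment["steady-state-hypothesis"]["probes"]:
        status = fetch_status(probe["provider"]["url"])
        if not status_in_tolerance(status, probe["tolerance"]):
            return False
    return True


def run_experiment(experiment, fetch_status=http_status):
    """Simplified run: hypothesis, method, hypothesis again."""
    if not steady_state_met(experiment, fetch_status):
        return "failed"
    for action in experiment["method"]:
        provider = action["provider"]
        # Python provider (assumed): import the module and call func(**arguments),
        # i.e. os.rename(src="./example.dat", dst="./example.dat.old") for this experiment.
        func = getattr(importlib.import_module(provider["module"]), provider["func"])
        func(**provider.get("arguments", {}))
    return "completed" if steady_state_met(experiment, fetch_status) else "deviated"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.dat")
        open(src, "w").close()
        experiment = {
            "steady-state-hypothesis": {"probes": [
                {"tolerance": [200, 503],
                 "provider": {"type": "http", "url": "http://localhost:8000"}}]},
            "method": [{"provider": {"type": "python", "module": "os", "func": "rename",
                                     "arguments": {"src": src, "dst": src + ".old"}}}],
        }

        # Fake probe standing in for the service: 200 while the file exists, 500 once it is gone.
        def fake_status(url):
            return 200 if os.path.exists(src) else 500

        print(run_experiment(experiment, fake_status))
```

The demo uses a temporary directory and a fake probe so it never touches the real example.dat. Passing the parsed experiment.json with the default fetch_status would really call the running service and really rename ./example.dat, just as chaos run does.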
But instead of renaming my file, chaos seems to create a new file with the name I provided, and the experiment ends with a success, which I was not expecting.
Please help.
Finally!
Problem: update_file rewrites example.dat every second, and because it opens the file in 'w' mode, it creates the file if it doesn't exist. So when chaos renames example.dat to example.dat.old, update_file simply creates a new example.dat within a second, and the steady-state hypothesis appears to be met every time.
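The race described above can be reproduced without chaos or the HTTP server. This sketch runs a copy of update_file with a configurable interval in a temporary directory, renames the file exactly as the experiment's action does, and checks whether it comes back. The helper file_reappears and the interval and wait parameters are hypothetical, chosen only for the demonstration.

```python
import os
import tempfile
import threading
import time
from datetime import datetime


def update_file(path, interval, stop):
    """Like the service's update_file, but with a configurable interval and a stop flag."""
    while not stop.is_set():
        # 'w' truncates the file, or creates it if it does not exist.
        with open(path, 'w') as f:
            f.write(datetime.now().isoformat())
        stop.wait(interval)


def file_reappears(interval, wait):
    """Rename the file like the experiment's action, then report whether it is back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'example.dat')
        stop = threading.Event()
        writer = threading.Thread(target=update_file, args=(path, interval, stop), daemon=True)
        writer.start()
        while not os.path.exists(path):  # wait for the first write
            time.sleep(0.01)
        os.rename(path, path + '.old')  # same call as the "move-example-file" action
        time.sleep(wait)  # stand-in for the gap before the post-action probe
        back = os.path.exists(path)
        stop.set()
        writer.join()
        return back


if __name__ == '__main__':
    print(file_reappears(interval=0.2, wait=0.5))  # writer interval shorter than the wait
    print(file_reappears(interval=5, wait=0.5))    # writer interval longer than the wait
```

With an interval shorter than the wait, the writer recreates example.dat before the check; with a 5 second interval the file is still missing, which is the effect the longer sleep below relies on.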
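One solution: make update_file wait much longer between writes. In my case, time.sleep(60) worked: after the rename, example.dat stays missing long enough for the post-action probe to see the failure. This only widens the window, though; if the rename happens just before a scheduled write, the file can still be recreated in time.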
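Since a longer sleep only shrinks the window in which the file can be recreated, another option (a change to the service itself, not what was used here) is to let the writer refresh the file only while it still exists. This is a sketch assuming it is acceptable for the service to create the file once at startup; update_file_if_present and start_updater are hypothetical names.

```python
import os
import tempfile
import threading
import time
from datetime import datetime


def update_file_if_present(path, interval, stop):
    """Hypothetical variant of update_file that refreshes the file but never recreates it."""
    while not stop.is_set():
        try:
            # 'r+' opens an existing file for reading and writing and raises
            # FileNotFoundError when it is missing, unlike 'w', which creates it.
            with open(path, 'r+') as f:
                f.seek(0)
                f.truncate()
                f.write(datetime.now().isoformat())
        except FileNotFoundError:
            pass  # The file was moved away (e.g. by the chaos action); keep it missing.
        stop.wait(interval)


def start_updater(path, interval):
    """Create the file once at startup, then keep refreshing it in a daemon thread."""
    with open(path, 'w') as f:
        f.write(datetime.now().isoformat())
    stop = threading.Event()
    thread = threading.Thread(
        target=update_file_if_present, args=(path, interval, stop), daemon=True)
    thread.start()
    return thread, stop


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'example.dat')
        thread, stop = start_updater(path, interval=0.1)
        os.rename(path, path + '.old')  # what the chaos action does
        time.sleep(0.5)  # several update intervals
        print(os.path.exists(path))  # the writer has not recreated the file
        stop.set()
        thread.join()
```

With this variant the file stays missing after the rename no matter how short the interval is, so the probe should observe the failure on every run instead of depending on timing.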
Logs from chaos run experiment.json with time.sleep(60):
[2021-12-07 01:11:19 INFO] Validating the experiment's syntax
[2021-12-07 01:11:19 INFO] Experiment looks valid
[2021-12-07 01:11:19 INFO] Running experiment: Does our service tolerate the loss of its example file?
[2021-12-07 01:11:19 INFO] Steady-state strategy: default
[2021-12-07 01:11:19 INFO] Rollbacks strategy: default
[2021-12-07 01:11:19 INFO] Steady state hypothesis: The exchange file must exist
[2021-12-07 01:11:19 INFO] Probe: service-is-unavailable
[2021-12-07 01:11:21 INFO] Steady state hypothesis is met!
[2021-12-07 01:11:21 INFO] Playing your experiment's method now...
[2021-12-07 01:11:21 INFO] Action: move-example-file
[2021-12-07 01:11:21 INFO] Steady state hypothesis: The exchange file must exist
[2021-12-07 01:11:21 INFO] Probe: service-is-unavailable
[2021-12-07 01:11:23 CRITICAL] Steady state probe 'service-is-unavailable' is not in the given tolerance so failing this experiment
[2021-12-07 01:11:23 INFO] Experiment ended with status: deviated
[2021-12-07 01:11:23 INFO] The steady-state has deviated, a weakness may have been discovered
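The CRITICAL line means the status the probe received was neither 200 nor 503. A plausible explanation, assuming the service code shown in the question: once example.dat is gone, open() inside simple_app raises FileNotFoundError, and wsgiref's handler turns an unhandled exception into 500 Internal Server Error. The sketch below checks this in memory with wsgiref.handlers.SimpleHandler, without starting a server; make_app and status_code are hypothetical helpers.

```python
import io
import os
import tempfile
from wsgiref.handlers import SimpleHandler
from wsgiref.util import setup_testing_defaults


def make_app(path):
    """Same logic as simple_app in service.py, but reading from `path`."""
    def simple_app(environ, start_response):
        start_response('200 OK', [('Content-type', 'text/plain; charset=utf-8')])
        with open(path, 'r') as f:  # raises FileNotFoundError once the file is moved
            return [f.read().encode('utf-8')]
    return simple_app


def status_code(app):
    """Run a WSGI app for one in-memory GET request and return the HTTP status code."""
    environ = {}
    setup_testing_defaults(environ)
    out = io.BytesIO()
    # stderr is captured so the logged traceback of the 500 case stays quiet.
    handler = SimpleHandler(io.BytesIO(), out, io.StringIO(), environ)
    handler.run(app)
    first_line = out.getvalue().split(b'\r\n', 1)[0]  # e.g. b'HTTP/1.0 200 OK'
    return int(first_line.split()[1])


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'example.dat')
        with open(path, 'w') as f:
            f.write('placeholder timestamp')
        print(status_code(make_app(path)))  # file present
        os.rename(path, path + '.old')  # what the experiment's action does
        print(status_code(make_app(path)))  # file missing
```

It reports 200 while the file exists and 500 after the rename, and 500 is not in the probe's tolerance [200, 503], which matches the deviation in the log.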