python microservices chaos

Chaos Toolkit not renaming a file


I am learning chaos engineering and I am following a tutorial, but my code is not running as it should.

This is the service I am testing:

service.py

from datetime import datetime  # needed by update_file()
import io
import time
import threading
from wsgiref.validate import validator
from wsgiref.simple_server import make_server

EXAMPLE_FILE = './example.dat'

def update_file():
    """Write the current time to the file every second."""
    print('Updating file...')
    while True:
        with open(EXAMPLE_FILE, 'w') as f:
            f.write(datetime.now().isoformat())
        time.sleep(1)

def simple_app(environ, start_response):
    """A simple WSGI application.

    This application just writes the current time to the response.
    """
    status = '200 OK'
    headers = [('Content-type', 'text/plain; charset=utf-8')]
    start_response(status, headers)
    with open(EXAMPLE_FILE, 'r') as f:
        return [f.read().encode('utf-8')]

if __name__ == '__main__':
    # Start the file update thread.
    t = threading.Thread(target=update_file)
    t.start()
    httpd = make_server('', 8000, simple_app)
    print("Serving on port 8000...")
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        print("\nKeyboard interrupt received, exiting.")
        httpd.shutdown()
        t.join(timeout=1)
        print("Exiting.")

My chaos experiment file, experiment.json:

{
  "title": "Does our service tolerate the loss of its example file?",
  "description": "Our service reads data from a file, can it work without it?",
  "tags": ["tutorial", "filesystem"],

  "steady-state-hypothesis": {
    "title": "The exchange file must exist",
    "probes": [
      {
        "type": "probe",
        "name": "service-is-unavailable",
        "tolerance": [200, 503],
        "provider": {
          "type": "http",
          "url": "http://localhost:8000"
        }
      }
    ]
  },
  "method": [
    {
      "name": "move-example-file",
      "type": "action",
      "provider": {
        "type": "python",
        "module": "os",
        "func": "rename",
        "arguments": {
          "src": "./example.dat",
          "dst": "./example.dat.old"
        }
      }
    }
  ]
}

But instead of renaming my existing file, chaos appears to create a new file with the provided name, and the experiment ends successfully, which is not what I expect.


Please help.


Solution

  • Finally!

    Problem: update_file rewrites example.dat every second, and because it opens the file in 'w' mode, it creates the file whenever it does not exist. So when chaos renames example.dat to example.dat.old, update_file creates a fresh example.dat within a second, and the steady-state hypothesis appears to be met every time.
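
The effect can be reproduced without the service or Chaos Toolkit. The sketch below is only an illustration: it uses a temporary directory instead of ./example.dat, and it replays by hand the three steps that race in the real setup (the service's write, the chaos rename, the service's next write).

```python
import os
import tempfile

# Hypothetical stand-in paths; the real service uses ./example.dat.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "example.dat")
dst = os.path.join(workdir, "example.dat.old")

# Step 1: the service writes the file.
with open(src, "w") as f:
    f.write("first write")

# Step 2: the chaos action calls os.rename(src, dst).
os.rename(src, dst)
renamed_away = not os.path.exists(src)

# Step 3: the updater loop runs again; mode 'w' creates a missing file.
with open(src, "w") as f:
    f.write("second write")

recreated = os.path.exists(src)
old_kept = os.path.exists(dst)
print(renamed_away, recreated, old_kept)
```

The rename itself works; the file simply comes back on the next write, so both example.dat and example.dat.old end up on disk.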

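    One solution: make update_file wait significantly longer between writes, so that example.dat is still missing when the steady-state hypothesis is checked again after the action. In my case, time.sleep(60) worked!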
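
One way to apply the longer delay without hard-coding it is to pass the interval in as a parameter. This is a rough sketch, not the tutorial's code: start_updater, the interval argument, and the stop_event are names introduced here for illustration, and the Event is used in place of time.sleep only so the thread can be stopped promptly at shutdown.

```python
import threading
from datetime import datetime


def update_file(path, interval, stop_event):
    """Write the current time to `path` every `interval` seconds until stopped."""
    while not stop_event.is_set():
        with open(path, "w") as f:
            f.write(datetime.now().isoformat())
        # Unlike time.sleep, Event.wait returns early once stop_event is set.
        stop_event.wait(interval)


def start_updater(path, interval=60.0):
    """Start the updater thread (hypothetical helper); returns (thread, stop_event)."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=update_file, args=(path, interval, stop_event), daemon=True
    )
    thread.start()
    return thread, stop_event
```

In the service this could be used as thread, stop = start_updater('./example.dat', interval=60), with stop.set() followed by thread.join() in the KeyboardInterrupt handler. The interval only needs to be longer than the time the experiment takes between the rename and the second hypothesis check.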

    Logs from chaos run experiment.json after the change:

    [2021-12-07 01:11:19 INFO] Validating the experiment's syntax
    [2021-12-07 01:11:19 INFO] Experiment looks valid
    [2021-12-07 01:11:19 INFO] Running experiment: Does our service tolerate the loss of its example file?     
    [2021-12-07 01:11:19 INFO] Steady-state strategy: default
    [2021-12-07 01:11:19 INFO] Rollbacks strategy: default
    [2021-12-07 01:11:19 INFO] Steady state hypothesis: The exchange file must exist
    [2021-12-07 01:11:19 INFO] Probe: service-is-unavailable
    [2021-12-07 01:11:21 INFO] Steady state hypothesis is met!
    [2021-12-07 01:11:21 INFO] Playing your experiment's method now...
    [2021-12-07 01:11:21 INFO] Action: move-example-file
    [2021-12-07 01:11:21 INFO] Steady state hypothesis: The exchange file must exist
    [2021-12-07 01:11:21 INFO] Probe: service-is-unavailable
    [2021-12-07 01:11:23 CRITICAL] Steady state probe 'service-is-unavailable' is not in the given tolerance so failing this experiment
    [2021-12-07 01:11:23 INFO] Experiment ended with status: deviated
    [2021-12-07 01:11:23 INFO] The steady-state has deviated, a weakness may have been discovered