This is the pattern I've seen suggested in a few different posts on SO:
metric_definitions = [
    {'Name': 'loss', 'Regex': "'loss': ([0-9]+(.|e\-)[0-9]+),?"},
    {'Name': 'learning_rate', 'Regex': "'learning_rate': ([0-9]+(.|e\-)[0-9]+),?"},
    {'Name': 'eval_loss', 'Regex': "'eval_loss': ([0-9]+(.|e\-)[0-9]+),?"},
    {'Name': 'eval_accuracy', 'Regex': "'eval_accuracy': ([0-9]+(.|e\-)[0-9]+),?"},
    {'Name': 'eval_f1', 'Regex': "'eval_f1': ([0-9]+(.|e\-)[0-9]+),?"},
    {'Name': 'eval_precision', 'Regex': "'eval_precision': ([0-9]+(.|e\-)[0-9]+),?"},
    {'Name': 'eval_recall', 'Regex': "'eval_recall': ([0-9]+(.|e\-)[0-9]+),?"},
    {'Name': 'eval_runtime', 'Regex': "'eval_runtime': ([0-9]+(.|e\-)[0-9]+),?"},
    {'Name': 'eval_samples_per_second', 'Regex': "'eval_samples_per_second': ([0-9]+(.|e\-)[0-9]+),?"},
    {'Name': 'epoch', 'Regex': "'epoch': ([0-9]+(.|e\-)[0-9]+),?"}
]
The issue is that it fails to capture the e-0x exponent after the digits. I've tried a few variants, such as ([0-9]+(.*e\-)[0-9]+)\w+, which I tested on https://regexr.com/. While it works on the website, it still fails to capture the exponent part in CloudWatch.
I noticed the issue because my loss appeared to go up and down, whereas the log itself showed the loss only going down. Every time the loss went from a value like 1.254e-05 to one like 9.365e-06, only the first portion was captured (1.254, then 9.365), so it looked like the loss was going back up and the model was not learning.
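The expression you used has some issues. The group (.|e\-) matches either a single arbitrary character or "e-", never both, so the pattern captures the exponent in "1234e-05" but stops after "1.234" in "1.234e-05". Also, "." has to be escaped with a backslash ("\.") to strictly match a period character.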
Instead, please try (\d+(\.\d+)?(e-\d+)?)
I only tested it with Python's regular expression module, but it should capture all of the following patterns.
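As a rough check, here is a minimal sketch that compares the two patterns using Python's re module. The sample log line and the helper name capture are assumptions made up for illustration; the metric names and the loss values come from the question. As noted above, this only shows the behavior in Python, so whether the engine behind CloudWatch treats \d the same way is not verified here.

```python
import re

# Illustrative log line in the style of the question (values are assumptions).
log_line = "{'loss': 9.365e-06, 'learning_rate': 1.254e-05, 'epoch': 2.0}"

# Pattern from the question: (.|e\-) matches one arbitrary character OR "e-".
old_number = r"([0-9]+(.|e\-)[0-9]+)"
# Suggested pattern: digits, optional fraction, optional negative exponent.
new_number = r"(\d+(\.\d+)?(e-\d+)?)"


def capture(name, number_pattern, line):
    """Return the first capture group for metric `name`, or None if no match."""
    match = re.search(f"'{name}': {number_pattern},?", line)
    return match.group(1) if match else None


old_loss = capture("loss", old_number, log_line)
new_loss = capture("loss", new_number, log_line)
print(old_loss)  # 9.365
print(new_loss)  # 9.365e-06

# Build the metric definitions from the names used in the question.
metric_names = [
    "loss", "learning_rate", "eval_loss", "eval_accuracy", "eval_f1",
    "eval_precision", "eval_recall", "eval_runtime",
    "eval_samples_per_second", "epoch",
]
metric_definitions = [
    {"Name": name, "Regex": f"'{name}': {new_number},?"}
    for name in metric_names
]
```

Note that the suggested pattern only covers negative exponents such as e-05. Values logged with a positive exponent (e+03) or a leading minus sign would need the pattern to be extended.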