I am doing performance testing of Kafka and need to test different large schemas. At the moment, I am working on Avro-based load testing.
Usually, when working with Kafka, you start with data and derive a schema from it. In this scenario, however, I must test several schemas for which I don't have any data, so I need to generate sample Avro data from the existing schemas.
What are the possible solutions?
Tried solutions:
How to generate sample data based on the existing Avro schema?
If you are comfortable with Python, the fastavro library has utilities to generate data from a schema: https://fastavro.readthedocs.io/en/latest/utils.html
As an example:
```python
from fastavro.utils import generate_many

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'long'},
        {'name': 'temp', 'type': 'int'},
    ],
}

# generate_many returns a generator of random records; list() consumes it.
print(list(generate_many(schema, 5)))
```