apache-kafka avro data-generation stub-data-generation

How to generate sample data based on the existing Avro schema?


I am doing performance testing of Kafka and need to test different large schemas. At the moment, I am working on Avro-based load testing.

Usually, when working with Kafka, you already have data and derive a schema from it. In this scenario it is the other way around: I must test several schemas for which I have no data, so I need to generate sample Avro data from the existing schemas.

What are the possible solutions?


Solution

  • If you are comfortable with Python, the fastavro library has utilities for generating data from a schema: https://fastavro.readthedocs.io/en/latest/utils.html

    As an example:

    from fastavro.utils import generate_many
    
    schema = {
        'doc': 'A weather reading.',
        'name': 'Weather',
        'namespace': 'test',
        'type': 'record',
        'fields': [
            {'name': 'station', 'type': 'string'},
            {'name': 'time', 'type': 'long'},
            {'name': 'temp', 'type': 'int'},
        ],
    }
    
    # Generate five random records that conform to the schema and print them.
    print(list(generate_many(schema, 5)))