java avro

Avro schema parsing from data file


Since an Avro data file embeds the schema it was written with, the reader should not need a separate .avsc file to specify the schema. I searched for a Java example that works this way but couldn't find one. Could somebody please share a code sample?

// Parse the schema from a separate .avsc file.
Schema schema = new Schema.Parser().parse(new File("./AvroSchema/emp.avsc"));

// The schema is handed to the datum reader explicitly.
DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(new File("./AvroFileStore/empData.txt"), datumReader);

GenericRecord emp = null;
while (dataFileReader.hasNext()) {
    emp = dataFileReader.next(emp); // reuse the record object across iterations
    System.out.println(emp);
}
dataFileReader.close();

In this example, the Avro schema is supplied separately to the DataFileReader through the datumReader.


Solution

  • GenericDatumReader also has a constructor that takes no parameters. Simply don't pass a schema to it, and the DataFileReader will use the schema stored in the file. Of course, this only works with data files, not with data streams that don't have the schema embedded.

    By the way, once you have constructed the dataFileReader, you can call its getSchema() method to get the schema if you ever need it.

    Sources: Hadoop: The Definitive Guide by Tom White
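
    A minimal sketch of that approach, reusing the file path from the question. The class name ReadEmbeddedSchema and the helper dump are hypothetical names chosen for illustration; the Avro calls (the no-argument GenericDatumReader constructor, DataFileReader, getSchema()) are the ones mentioned above, and the Avro library is assumed to be on the classpath.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

// Hypothetical class name, for illustration only.
public class ReadEmbeddedSchema {

    // Prints the embedded schema and every record; returns the record count.
    static int dump(File avroFile) throws IOException {
        // No schema is passed here: it is taken from the data file's header.
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        int count = 0;
        try (DataFileReader<GenericRecord> dataFileReader =
                 new DataFileReader<>(avroFile, datumReader)) {
            // The schema the file was written with, if you need to inspect it.
            System.out.println(dataFileReader.getSchema().toString(true));

            GenericRecord emp = null;
            while (dataFileReader.hasNext()) {
                emp = dataFileReader.next(emp); // reuse the record object
                System.out.println(emp);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the question; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "./AvroFileStore/empData.txt";
        dump(new File(path));
    }
}
```

    The try-with-resources block closes the reader even if reading fails part way through the file.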