c++ avro apache-iceberg avro-tools

How can I use Avro C++ to write a file for a schema that I defined programmatically?


I have a sample schema that I am defining programmatically. It looks something like this:


struct UserEntry {
  int64_t user_id;
  std::string user_name;
  std::string user_email;
  int64_t user_phn;
};

avro::ValidSchema SampleUserPhEntry() {
    avro::RecordSchema schema("UserEntries");
    schema.addField("user_id", avro::LongSchema());
    schema.addField("user_name", avro::StringSchema());
    schema.addField("user_email", avro::StringSchema());
    schema.addField("user_phone", avro::LongSchema());    
    return avro::ValidSchema(schema);
}

Most of the examples I see in the repo use a generated struct to write an Avro file. Is there any way I can use my own custom-defined struct to write an Avro file instead?

I did try to write it field by field with a GenericDatum, but I ended up running into a compilation error with the codec:

int WriteUserPhEntry() {
  avro::ValidSchema schema = SampleUserPhEntry(); 
  avro::GenericDatum schema_datum(schema);
  
  const char* file_name = "user_entries.avro";
  avro::DataFileWriter<avro::GenericDatum> writer(file_name, schema);

  avro::GenericRecord& record = schema_datum.value<avro::GenericRecord>();

  record.field("user_id").value<int64_t>() = static_cast<int64_t>(64);
  record.field("user_name").value<std::string>() = "avro_user";
  record.field("user_email").value<std::string>() = "avro_user@avro.com";
  record.field("user_phone").value<int64_t>() = 1234567890;

  std::cout << record.field("user_id").value<int64_t>() << std::endl;
  
  // The following line runs into: implicit instantiation of undefined template 'avro::codec_traits<avro::GenericDatum>'
  // writer.write(schema_datum);

  writer.close();
  return 0;
   
}

Thanks


Solution

  • After reading more code in the repo, I realized that Avro C++ relies on class template specialization: serialization goes through codec_traits<T>. All I need to do is specialize codec_traits for my struct with the right encoding / decoding logic, and the writer will call it.

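    namespace avro {
    template<> struct codec_traits<UserEntry> {
        // Encode the fields in the same order as they appear in the schema.
        static void encode(Encoder& e, const UserEntry& user) {
            avro::encode(e, user.user_id);
            avro::encode(e, user.user_name);
            ...
            ...
            ...
        }

        // Decode the fields in the same order they were encoded.
        static void decode(Decoder& d, UserEntry& user) {
            avro::decode(d, user.user_id);
            avro::decode(d, user.user_name);
            ...
            ...
            ...
        }
    };
    }  // namespace avro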
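
One detail worth spelling out for the encode body above: in Avro's binary encoding a record is simply its fields written back to back in schema order, with no field names or separators, so the encode calls have to follow the same order as the addField calls. The sketch below is a stand-alone illustration of that layout (zigzag varint for a long, length-prefixed bytes for a string). It does not use the Avro library; Bytes, putLong, putString and encodeUser are hypothetical helper names introduced only for this example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Same shape as the struct in the question.
struct UserEntry {
    std::int64_t user_id;
    std::string user_name;
    std::string user_email;
    std::int64_t user_phn;
};

using Bytes = std::vector<std::uint8_t>;  // hypothetical alias

// Avro long: zigzag-map the signed value, then emit 7 bits per byte,
// least significant group first, high bit set on all but the last byte.
void putLong(Bytes& out, std::int64_t v) {
    std::uint64_t z = (static_cast<std::uint64_t>(v) << 1) ^
                      static_cast<std::uint64_t>(v >> 63);
    while (z >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((z & 0x7F) | 0x80));
        z >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(z));
}

// Avro string: byte length as a long, followed by the raw bytes.
void putString(Bytes& out, const std::string& s) {
    putLong(out, static_cast<std::int64_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// A record is just its fields concatenated in schema order.
Bytes encodeUser(const UserEntry& u) {
    Bytes out;
    putLong(out, u.user_id);
    putString(out, u.user_name);
    putString(out, u.user_email);
    putLong(out, u.user_phn);
    return out;
}
```

For example, encodeUser(UserEntry{64, "ab", "c", -1}) yields the bytes 80 01 04 61 62 02 63 01. Nothing in that stream says which field is which, which is why a decode that reads fields in a different order would silently produce garbage.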
    
    

    Note: If UserEntry contains members of other struct types, each of those types also needs its own codec_traits specialization.
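
To illustrate the note above, here is a minimal stand-alone sketch of the same specialization pattern with a nested type. It deliberately does not use Avro: the mock namespace, its text-based Encoder, and the Phone and Contact structs are all hypothetical stand-ins assumed only for illustration. The primary template is declared but never defined, so a type without a specialization fails to compile, much like the "implicit instantiation of undefined template" error in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

namespace mock {

// Toy text encoder standing in for a real encoder (NOT the Avro API).
struct Encoder {
    std::ostringstream out;
    void put(const std::string& s) {
        assert(s.find(';') == std::string::npos);  // ';' is the delimiter
        out << s << ';';
    }
};

// Declared, never defined: every encodable type must specialize it.
template <typename T> struct codec_traits;

// Generic entry point that forwards to the specialization for T.
template <typename T>
void encode(Encoder& e, const T& value) {
    codec_traits<T>::encode(e, value);
}

template <> struct codec_traits<std::int64_t> {
    static void encode(Encoder& e, const std::int64_t& v) {
        e.put(std::to_string(v));
    }
};

template <> struct codec_traits<std::string> {
    static void encode(Encoder& e, const std::string& v) { e.put(v); }
};

}  // namespace mock

// Hypothetical user types: Contact contains a nested Phone.
struct Phone { std::int64_t country_code; std::int64_t number; };
struct Contact { std::string name; Phone phone; };

namespace mock {

// The nested type gets its own specialization first...
template <> struct codec_traits<Phone> {
    static void encode(Encoder& e, const Phone& p) {
        mock::encode(e, p.country_code);
        mock::encode(e, p.number);
    }
};

// ...so the outer type can delegate to it like any other field.
template <> struct codec_traits<Contact> {
    static void encode(Encoder& e, const Contact& c) {
        mock::encode(e, c.name);
        mock::encode(e, c.phone);
    }
};

}  // namespace mock

// Convenience wrapper returning the encoded text.
std::string encodeContact(const Contact& c) {
    mock::Encoder e;
    mock::encode(e, c);
    return e.out.str();
}
```

Calling encodeContact(Contact{"avro_user", Phone{1, 1234567890}}) returns "avro_user;1;1234567890;". Removing the Phone specialization makes the Contact one fail to compile, which is the situation the note warns about.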

    To write the data:

    avro::DataFileWriter<UserEntry> writer(file_name, schema);
    UserEntry user;
    ...
    // populate
    ...
    
    writer.write(user);
    writer.close();