
Avro not populating square brackets for Array type


I have the following Avro schema:

{
  "name": "schema_name",
  "type": "record",
  "fields": [
    {
      "name": "schema",
      "type": "string"
    },
    {
      "name": "data",
      "type": {
        "type": "array",
        "items": {
          "name": "data",
          "type": "record",
          "fields": [
            {
              "name": "phone_number",
              "type": "string"
            }
          ]
        }
      }
    },
    {
      "name": "flag",
      "type": "string"
    }
  ]
}

And I am using it to generate Avro messages from a text file:

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import scala.io.Source

def main(args: Array[String]): Unit = {
  val avroSchemaStr = Source.fromFile("avro_schema.txt").mkString
  val avroSchema = new Schema.Parser().parse(avroSchemaStr)

  // Build and print one Avro record per input line
  Source.fromFile("phone_numbers.txt").getLines().foreach { msg =>
    println(fixedWidthToAvro(msg, avroSchema))
  }
}

def fixedWidthToAvro(record: String, avroSchema: Schema): GenericRecord = {
  // Schema of the record type stored inside the "data" array
  val childSchema = new GenericData.Record(avroSchema).getSchema.getField("data").schema.getElementType
  val parentRecord = new GenericData.Record(avroSchema)
  val childRecord = new GenericData.Record(childSchema)

  childRecord.put("phone_number", "1234567890")
  parentRecord.put("schema", "schema_name")
  parentRecord.put("data", childRecord) // a single record goes into the array-typed field
  parentRecord.put("flag", "I")

  println(parentRecord)
  parentRecord
}

The code runs without errors, and for a given message I get the following output:

{"schema": "schema_name", "data": {"phone_number": "1234567890"}, "flag": "I"}

However, since I declared the type of the data field as an array, I expected its value to be wrapped in square brackets, like a collection. Something like:

{"schema": "schema_name", "data": [{"phone_number": "1234567890"}], "flag": "I"}

I want the data field to be wrapped in square brackets. How can I achieve that?


Solution

  • Your schema uses the name data twice: once for the array field and once for the record type of the array's elements. I think that's what is confusing you.

    When you pass getElementType to GenericData.Record, you create a single element record, but you never create the array that should hold these records. GenericData's toString formats whatever value it finds, so printing does not reveal the mismatch, but a bare record does not conform to the array type declared in the schema.

    What you need is an array that holds all your records:

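    import org.apache.avro.Schema
    import org.apache.avro.generic.GenericData

    // schema: the schema JSON string shown in the question
    val avroSchema = new Schema.Parser().parse(schema)
    // Schema of the "data" field itself, i.e. the ARRAY schema
    val childSchema = avroSchema.getField("data").schema

    val parentRecord = new GenericData.Record(avroSchema)
    // The array container; 1024 is only its initial capacity
    val childRecords = new GenericData.Array[GenericData.Record](1024, childSchema)

    // Each element is a record built from the array's element type
    val childRecord = new GenericData.Record(childSchema.getElementType)

    childRecord.put("phone_number", "33333")
    childRecords.add(childRecord)

    parentRecord.put("schema", "schema_name")
    parentRecord.put("data", childRecords)
    parentRecord.put("flag", "I")

    println(parentRecord)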
    

    Yields:

    {"schema": "schema_name", "data": [{"phone_number": "33333"}], "flag": "I"}
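To see why printing hid the problem, here is a minimal sketch that checks both versions against the schema with Avro's GenericData.get().validate. It assumes the schema from the question is inlined as a string. The object name ValidateDataField and the helper parent are hypothetical, chosen only for this illustration. The bare record fails validation because the data field expects a collection, while the GenericData.Array version passes.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.GenericData

object ValidateDataField {
  // The schema from the question, inlined so the sketch is self-contained
  val schemaJson: String =
    """{"name": "schema_name", "type": "record", "fields": [
      |  {"name": "schema", "type": "string"},
      |  {"name": "data", "type": {"type": "array", "items":
      |    {"name": "data", "type": "record", "fields": [
      |      {"name": "phone_number", "type": "string"}]}}},
      |  {"name": "flag", "type": "string"}]}""".stripMargin

  val avroSchema: Schema = new Schema.Parser().parse(schemaJson)
  val arraySchema: Schema = avroSchema.getField("data").schema // ARRAY
  val elementSchema: Schema = arraySchema.getElementType      // RECORD "data"

  // Hypothetical helper: builds the parent record, storing `data` as-is
  def parent(data: AnyRef): GenericData.Record = {
    val r = new GenericData.Record(avroSchema)
    r.put("schema", "schema_name")
    r.put("data", data)
    r.put("flag", "I")
    r
  }

  val phone = new GenericData.Record(elementSchema)
  phone.put("phone_number", "1234567890")

  // Question's version: a bare record where the schema expects an array
  val wrong: GenericData.Record = parent(phone)

  // Answer's version: the same record wrapped in a GenericData.Array
  val items = new GenericData.Array[GenericData.Record](1, arraySchema)
  items.add(phone)
  val right: GenericData.Record = parent(items)

  def main(args: Array[String]): Unit = {
    println(GenericData.get().validate(avroSchema, wrong)) // false
    println(GenericData.get().validate(avroSchema, right)) // true
  }
}
```

Validating before serializing is a cheap way to catch this kind of shape mismatch, since toString alone will happily print either version.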
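If a message can carry more than one phone number, the same pattern extends to a loop that adds one element record per number. This is a minimal sketch, assuming the numbers have already been extracted from the fixed-width line (the question doesn't show that layout). BuildDataArray and toAvro are hypothetical names, and the schema from the question is inlined so the block runs on its own.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}

object BuildDataArray {
  // The schema from the question, inlined so the sketch is self-contained
  val avroSchema: Schema = new Schema.Parser().parse(
    """{"name": "schema_name", "type": "record", "fields": [
      |  {"name": "schema", "type": "string"},
      |  {"name": "data", "type": {"type": "array", "items":
      |    {"name": "data", "type": "record", "fields": [
      |      {"name": "phone_number", "type": "string"}]}}},
      |  {"name": "flag", "type": "string"}]}""".stripMargin)

  // Hypothetical variant of fixedWidthToAvro: takes already-parsed phone
  // numbers and puts one element record per number into the "data" array.
  def toAvro(phoneNumbers: Seq[String], schema: Schema): GenericRecord = {
    val arraySchema = schema.getField("data").schema // the ARRAY schema
    val elementSchema = arraySchema.getElementType   // its element RECORD

    val data = new GenericData.Array[GenericRecord](phoneNumbers.size, arraySchema)
    phoneNumbers.foreach { number =>
      val child = new GenericData.Record(elementSchema)
      child.put("phone_number", number)
      data.add(child)
    }

    val parent = new GenericData.Record(schema)
    parent.put("schema", "schema_name")
    parent.put("data", data)
    parent.put("flag", "I")
    parent
  }

  def main(args: Array[String]): Unit =
    println(toAvro(Seq("1234567890", "0987654321"), avroSchema))
}
```

Running main should print a record whose data value is a bracketed list containing both phone number records.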