scala json4s

Raise an exception when parsing JSON with the wrong schema in an Option field


During JSON parsing, I want to catch an exception for an optional sequence field whose schema differs from my case class. Let me elaborate.

I have the following case classes:

case class SimpleFeature(
  column: String,
  valueType: String,
  nullValue: String,
  func: Option[String])

case class TaskConfig(
  taskInfo: TaskInfo,
  inputType: String,
  training: Table,
  testing: Table,
  eval: Table,
  splitStrategy: SplitStrategy,
  label: Label,
  simpleFeatures: Option[List[SimpleFeature]],
  model: Model,
  evaluation: Evaluation,
  output: Output)

Here is the part of the JSON file I want to draw attention to:

"simpleFeatures": [
  {
    "column": "pcat_id",
    "value": "categorical",
    "nullValue": "DUMMY"
  },
  {
    "column": "brand_code",
    "valueType": "categorical",
    "nullValue": "DUMMY"
  }
]

As you can see, the first element has an error in its schema ("value" instead of "valueType"), and I want parsing to raise an error in that case. At the same time, I want to keep the optional behavior when there is no object to parse.

One idea I've been researching for a while is to create a custom serializer and check the fields manually, but I'm not sure I'm on the right track:

object JSONSerializer extends CustomKeySerializer[SimpleFeatures](format => {
  case jsonObj: JObject => {
    case Some(simplFeatures (jsonObj \ "simpleFeatures")) => {
    // Extraction logic goes here
    }
  }
})

I'm not very proficient in Scala and json4s, so any advice is appreciated.

json4s version: 3.2.10
Scala version: 2.11.12
JDK version: 1.8.0

Solution

  • I think you need to extend the CustomSerializer class instead: CustomKeySerializer is used to implement custom logic for JSON object keys (for example, Map keys of a custom type), not for values:

    import org.json4s._
    import org.json4s.JsonDSL._
    import org.json4s.jackson.JsonMethods._
    
    case class SimpleFeature(column: String,
                             valueType: String,
                             nullValue: String,
                             func: Option[String])
    
    case class TaskConfig(simpleFeatures: Option[Seq[SimpleFeature]])
    
    // Deserializes a SimpleFeature and fails fast when a mandatory key is missing.
    // The Formats supplied by json4s is marked implicit so the extract[...] calls below can use it.
    class SimpleFeatureSerializer extends CustomSerializer[SimpleFeature](implicit format => ( {
      case jsonObj: JObject =>
        val requiredKeys = Set[String]("column", "valueType", "nullValue")
    
        // Mandatory keys that are absent from the JSON object
        val diff = requiredKeys.diff(jsonObj.values.keySet)
        if (diff.nonEmpty)
          throw new MappingException(s"Fields [${requiredKeys.mkString(",")}] are mandatory. Missing fields: [${diff.mkString(",")}]")
    
        val column = (jsonObj \ "column").extract[String]
        val valueType = (jsonObj \ "valueType").extract[String]
        val nullValue = (jsonObj \ "nullValue").extract[String]
        val func = (jsonObj \ "func").extract[Option[String]]
    
        SimpleFeature(column, valueType, nullValue, func)
    }, {
      // Serialization side: SimpleFeature -> JObject
      case sf: SimpleFeature =>
        ("column" -> sf.column) ~
          ("valueType" -> sf.valueType) ~
          ("nullValue" -> sf.nullValue) ~
          ("func" -> sf.func)
    }
    ))
    
    object Main extends App {
    
      // strictOptionParsing makes failures inside Option fields propagate instead of becoming None
      implicit val formats: Formats = new DefaultFormats {
        override val strictOptionParsing: Boolean = true
      } + new SimpleFeatureSerializer()
    
      // case 1: Test single feature
      val singleFeature = """
              {
                  "column": "pcat_id",
                  "valueType": "categorical",
                  "nullValue": "DUMMY"
              }
          """
      val singleFeatureValid = parse(singleFeature).extract[SimpleFeature]
      println(singleFeatureValid)
      //  SimpleFeature(pcat_id,categorical,DUMMY,None)
    
      // case 2: Test task config
      val taskConfig = """{
          "simpleFeatures": [
            {
              "column": "pcat_id",
              "valueType": "categorical",
              "nullValue": "DUMMY"
            },
            {
              "column": "brand_code",
              "valueType": "categorical",
              "nullValue": "DUMMY"
            }]
      }"""
    
      val taskConfigValid = parse(taskConfig).extract[TaskConfig]
      println(taskConfigValid)
      //  TaskConfig(Some(List(SimpleFeature(pcat_id,categorical,DUMMY,None), SimpleFeature(brand_code,categorical,DUMMY,None))))
    
      // case 3: Invalid json ("value" instead of "valueType")
      val invalidSingleFeature = """
              {
                  "column": "pcat_id",
                  "value": "categorical",
                  "nullValue": "DUMMY"
              }
          """
      val singleFeatureInvalid = parse(invalidSingleFeature).extract[SimpleFeature]
      // throws MappingException
    }
    
    

    Analysis: the main question here is how to access the keys of jsonObj in order to check for missing mandatory keys; one way to do that is through jsonObj.values.keySet. First, we assign the mandatory fields to the requiredKeys variable, then we compare them with the keys that are actually present using requiredKeys.diff(jsonObj.values.keySet). If the difference is not empty, a mandatory field is missing, so we throw an exception that names the missing fields. Note that this check only detects missing keys: the misspelled "value" key in the question is reported because it leaves "valueType" missing, not because "value" itself is recognized as unexpected.
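The check above only looks for missing keys. As a minimal sketch, the same set difference can also be run in the other direction to flag unexpected keys such as "value". It is plain Scala with no json4s dependency, so the key set is passed in directly where the serializer would use jsonObj.values.keySet; the object KeyCheck, the optionalKeys set, and the validate helper are hypothetical names:

```scala
// Sketch only: KeyCheck, optionalKeys and validate are hypothetical names.
// `present` stands in for jsonObj.values.keySet from the json4s serializer.
object KeyCheck {
  val requiredKeys: Set[String] = Set("column", "valueType", "nullValue")
  val optionalKeys: Set[String] = Set("func")

  // Returns one message for missing keys and one for unexpected keys (if any).
  def validate(present: Set[String]): List[String] = {
    val missing = requiredKeys.diff(present)
    val unexpected = present.diff(requiredKeys ++ optionalKeys)
    List(
      if (missing.nonEmpty) Some(s"Missing fields: [${missing.toList.sorted.mkString(",")}]") else None,
      if (unexpected.nonEmpty) Some(s"Unexpected fields: [${unexpected.toList.sorted.mkString(",")}]") else None
    ).flatten
  }

  def main(args: Array[String]): Unit = {
    // Keys of the first (invalid) element from the question
    println(validate(Set("column", "value", "nullValue")))
    // Keys of the second (valid) element
    println(validate(Set("column", "valueType", "nullValue")))
  }
}
```

Inside SimpleFeatureSerializer, one could throw new MappingException(errors.mkString("; ")) whenever validate returns a non-empty list. Whether rejecting unknown keys is desirable depends on how strict the config format is meant to be.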

    Note1: don't forget to register the new serializer with the implicit formats (here via DefaultFormats + new SimpleFeatureSerializer()).

    Note2: we throw an instance of MappingException, which is the same exception json4s itself throws when extracting a parsed JSON value into a case class fails.
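Because the error is a MappingException, callers can handle it like any other json4s extraction failure. A minimal sketch, assuming the jackson backend; SafeExtract and parseFeature are hypothetical names. It uses plain DefaultFormats because json4s's default case class extraction already throws MappingException when a non-Option field such as valueType is missing; with SimpleFeatureSerializer registered, the Left would carry the serializer's custom message instead:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._
import scala.util.{Failure, Success, Try}

case class SimpleFeature(column: String, valueType: String, nullValue: String, func: Option[String])

// Sketch only: SafeExtract and parseFeature are hypothetical names.
object SafeExtract {
  implicit val formats: Formats = DefaultFormats

  // Maps a json4s extraction failure to Left(message); other errors are rethrown.
  def parseFeature(json: String): Either[String, SimpleFeature] =
    Try(parse(json).extract[SimpleFeature]) match {
      case Success(feature)             => Right(feature)
      case Failure(e: MappingException) => Left(e.getMessage)
      case Failure(other)               => throw other // not a mapping problem, e.g. malformed JSON text
    }

  def main(args: Array[String]): Unit = {
    println(parseFeature("""{"column": "pcat_id", "valueType": "categorical", "nullValue": "DUMMY"}"""))
    println(parseFeature("""{"column": "pcat_id", "value": "categorical", "nullValue": "DUMMY"}"""))
  }
}
```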

    UPDATE

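    To force validation of Option fields, you need to set the strictOptionParsing option to true by overriding the corresponding member of DefaultFormats. Without it, json4s treats a MappingException raised while extracting an Option field as a missing value and silently returns None, so an invalid element inside simpleFeatures would never reach the caller as an error: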

    implicit val formats: Formats = new DefaultFormats {
      override val strictOptionParsing: Boolean = true
    } + new SimpleFeatureSerializer()
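A minimal sketch of the difference, assuming a json4s release whose Formats exposes strictOptionParsing (for example a 3.6.x release; I'm not sure the option exists in the 3.2.10 version from the question). It relies on json4s's default case class extraction rather than SimpleFeatureSerializer, since a missing valueType already makes element extraction fail; StrictOptionDemo and extractWith are hypothetical names:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._
import scala.util.Try

case class SimpleFeature(column: String, valueType: String, nullValue: String, func: Option[String])
case class TaskConfig(simpleFeatures: Option[Seq[SimpleFeature]])

// Sketch only: StrictOptionDemo and extractWith are hypothetical names.
// Assumes a json4s version where Formats has a strictOptionParsing member.
object StrictOptionDemo {
  // The first element uses "value" instead of "valueType", as in the question.
  val invalidJson: String =
    """{"simpleFeatures": [
      |  {"column": "pcat_id", "value": "categorical", "nullValue": "DUMMY"},
      |  {"column": "brand_code", "valueType": "categorical", "nullValue": "DUMMY"}
      |]}""".stripMargin

  val lenient: Formats = DefaultFormats
  val strict: Formats = new DefaultFormats {
    override val strictOptionParsing: Boolean = true
  }

  // Extracts a TaskConfig with whichever Formats is passed in.
  def extractWith(json: String)(implicit formats: Formats): Try[TaskConfig] =
    Try(parse(json).extract[TaskConfig])

  def main(args: Array[String]): Unit = {
    println(extractWith(invalidJson)(lenient)) // invalid element swallowed by the Option field
    println(extractWith(invalidJson)(strict))  // invalid element surfaces as a failure
    println(extractWith("{}")(strict))         // absent key is still treated as None
  }
}
```

With the lenient formats, the invalid element should make the whole simpleFeatures field quietly become None; with the strict formats, the same input should fail with a MappingException, while a document without the simpleFeatures key should still extract as TaskConfig(None), which keeps the optional behavior the question asks for.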
    

    Resources

    https://nmatpt.com/blog/2017/01/29/json4s-custom-serializer/

    https://danielasfregola.com/2015/08/17/spray-how-to-deserialize-entities-with-json4s/

    https://www.programcreek.com/scala/org.json4s.CustomSerializer