json jsonschema

Dataset not failing on validation for json schema


I have a base schema and an extended schema, shown below.

./resources/json-schemas/simple-person.schema

{
  "$id": "http://example.com/json-schemas/simple-person.schema",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Simple Person",
  "type": "object",
  "people": {
    "items": {
      "properties": {
        "name": {
          "type": "string",
          "description": "The person's name."
        },
        "age": {
          "type": "integer",
          "description": "The person's age."
        }
      },
      "required": [
        "name",
        "age"
      ]
    }
  }
}

./extended-person.schema

{
  "$id": "http://example.com/extended-person.schema",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Extended Person",
  "type": "object",
  "allOf": [
    {
      "$ref": "http://example.com/json-schemas/simple-person.schema"
    },
    {
      "people": {
        "items": {
          "properties": {
            "height": {
              "type": "number",
              "description": "The person's height in centimeters."
            },
            "required": [
              "height"
            ]
          }
        }
      }
    }
  ]
}

I then have a dataset instance that I want to validate against the extended schema.

./person-dataset.json

{
    "people": [
        {
            "name": "Bob",
            "age": 25,
            "new value": "value"
        }
    ]
}

I would expect the validation to fail, but it passes with the code below:

from pathlib import Path

import json

from referencing import Registry, Resource
from referencing.exceptions import NoSuchResource
from jsonschema import Draft202012Validator


def retrieve_from_filesystem(uri: str):
    SCHEMAS = Path("./resources/json-schemas/")

    if uri.startswith("http://example.com/json-schemas/"):
        path = SCHEMAS / Path(uri.removeprefix("http://example.com/json-schemas/"))
    else:
        raise NoSuchResource(ref=uri)

    contents = json.loads(path.read_text())

    return Resource.from_contents(contents)

registry = Registry(retrieve=retrieve_from_filesystem)

schema = json.load(Path("./extended-person.schema").open())
instance = json.load(Path("./person-dataset.json").open())
validator = Draft202012Validator(schema, registry=registry)

validator.validate(instance)

I expected this to fail for two reasons:

  1. no "height" property is included in the dataset
  2. a new property "new value" is included in the dataset that isn't specified in the schemas

How do I fix this to make it so datasets like this will fail?


Solution

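  • This one is a bit tricky because you are adding constraints to an existing schema inside a nested array. Your current schemas pass because "people" sits at the root of each schema instead of under "properties", so the validator ignores it as an unknown keyword. In the extended schema, "required" is also nested inside "properties", where it is read as a property name rather than as the "required" keyword. A corrected extended schema looks like the one below; it assumes the base schema is fixed the same way, with "people" moved under "properties":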

    {
        "$id": "http://example.com/json-schemas/extended-person.schema",
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {
            "people": {
                "type": "array",
                "items": {
                    "unevaluatedProperties": false,
                    "$ref": "simple-person.schema#/properties/people/items",
                    "properties": {
                        "height": {
                            "type": "number"
                        }
                    },
                    "required": [
                        "height"
                    ]
                }
            }
        },
        "required": ["people"]
    }
    

    This article is a great reference for modeling inheritance. JSON Schema doesn't fully support inheritance, but you can model something very close to the expected behavior, with some concessions: https://json-schema.org/blog/posts/modelling-inheritance


    P.S. Since draft 2019-09, $ref is allowed at the root of a schema alongside sibling keywords such as properties or other applicators (but not another $ref), so you no longer need allOf to compose multiple schemas in this fashion.