
How do I make a custom jsonschema keyword (vocabulary) that can be used anywhere in another schema?


I am trying to create a custom JSON Schema keyword for some basic validation. One example is "parallelArrays", which checks that the provided lists all have the same length.
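The question doesn't spell out the exact semantics of "parallelArrays", so the sketch below assumes the keyword's value is a list of property names, and that every named property present as an array must have the same length. `parallel_arrays_valid` is a hypothetical helper. It only illustrates the check itself, not how the keyword is registered with jschon or python-jsonschema.

```python
from collections.abc import Mapping
from typing import Any, Sequence


def parallel_arrays_valid(instance: Any, names: Sequence[str]) -> bool:
    """Hypothetical "parallelArrays" check: every named property that is
    present and holds a list must have the same length as the others."""
    if not isinstance(instance, Mapping):
        # Like most object keywords, ignore instances that are not objects.
        return True
    lengths = {
        len(instance[name])
        for name in names
        if isinstance(instance.get(name), list)
    }
    # Zero or one distinct length means the arrays line up.
    return len(lengths) <= 1


# Usage
print(parallel_arrays_valid({"xs": [1, 2], "ys": ["a", "b"]}, ["xs", "ys"]))  # True
print(parallel_arrays_valid({"xs": [1, 2], "ys": ["a"]}, ["xs", "ys"]))       # False
```

Missing properties are simply skipped here; whether they should count as a mismatch is a design choice the real keyword would need to settle.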

This mostly works, but I recently found one problem. I want to use this keyword in the schema for my actual data, and to make sure I am using it properly I validate that schema against a meta-schema, which checks (among other things) that the "parallelArrays" keyword is well-formed.

This meta-schema validation works when I use my custom keyword in the top-level object of the schema, but it does not catch a malformed keyword when I use it elsewhere (e.g. within a nested subschema).

Below is an example with a custom keyword "myKeyword", which does nothing but must be an object. The example uses the jschon Python library, but I've observed the same issue with python-jsonschema. I assume something is wrong with the way I am defining the vocabulary in the first place, but I can't figure out what.

The following files are assumed to be in the same folder:

meta.schema.json

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/myKeyword/meta.schema.json",
    "$vocabulary": {
      "https://json-schema.org/draft/2020-12/vocab/core": true,
      "https://json-schema.org/draft/2020-12/vocab/applicator": true,
      "https://json-schema.org/draft/2020-12/vocab/validation": true,
      "https://json-schema.org/draft/2020-12/vocab/meta-data": true,
      "https://json-schema.org/draft/2020-12/vocab/format-annotation": true,
      "https://json-schema.org/draft/2020-12/vocab/content": true,
      "https://json-schema.org/draft/2020-12/vocab/unevaluated": true
    },
    "allOf": [
      { "$ref": "https://json-schema.org/draft/2020-12/schema" },
      { "$ref": "https://example.com/myKeyword/vocab.schema.json" }
    ]
}

vocab.schema.json

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/myKeyword/vocab.schema.json",
    "properties": {
      "myKeyword": {
        "description": "myKeyword MUST be an object",
        "type": "object"
      }
    }
}

test.py (you'll need to pip install jschon)

import os
import jschon

# This just helps jschon locate our schema files
catalog = jschon.create_catalog("2020-12")
catalog.add_uri_source(jschon.URI("https://example.com/myKeyword/"), jschon.LocalSource(os.path.dirname(__file__)))


### Test 1 ###
# myKeyword is in the top level of the schema and is an object
user_schema = jschon.JSONSchema({
  "$schema": "https://example.com/myKeyword/meta.schema.json",
  "$id": "https://example.com/mySchema",

  "type": "object",
  "myKeyword": {},
  "properties": {
    "favoriteNumber": {
      "type": "integer",
    }
  }
})

assert user_schema.validate().valid  # Passes


### Test 2 ###
# myKeyword is in the top level of the schema and is NOT an object
user_schema = jschon.JSONSchema({
  "$schema": "https://example.com/myKeyword/meta.schema.json",
  "$id": "https://example.com/mySchema",

  "type": "object",
  "myKeyword": [],
  "properties": {
    "favoriteNumber": {
      "type": "integer",
    }
  }
})

assert not user_schema.validate().valid  # Passes


### Test 3 ###
# myKeyword is NOT in the top level of the schema and is an object
user_schema = jschon.JSONSchema({
  "$schema": "https://example.com/myKeyword/meta.schema.json",
  "$id": "https://example.com/mySchema",

  "type": "object",
  "properties": {
    "favoriteNumber": {
      "type": "integer",
      "myKeyword": {}
    }
  }
})

assert user_schema.validate().valid  # Passes (but trivially, see below)


### Test 4 ###
# myKeyword is NOT in the top level of the schema and is NOT an object
user_schema = jschon.JSONSchema({
  "$schema": "https://example.com/myKeyword/meta.schema.json",
  "$id": "https://example.com/mySchema",

  "type": "object",
  "properties": {
    "favoriteNumber": {
      "type": "integer",
      "myKeyword": []
    }
  }
})

assert not user_schema.validate().valid  # Fails!

# Essentially, when "myKeyword" is not at the top level it is always treated as valid because,
# for some reason, vocab.schema.json is not applied to nested subschemas. This is why I called
# Test 3's pass trivial.
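To pin down the behaviour the meta-schema should have, here is a stdlib-only reference check. It is not a replacement for meta-schema validation and not part of jschon or python-jsonschema. `bad_my_keyword` and the keyword sets are illustrative assumptions that cover only a subset of the 2020-12 subschema locations (a real meta-schema covers every applicator). On the four test schemas above it should flag Test 2 at the root and Test 4 at `/properties/favoriteNumber`, which is what the meta-schema validation is expected to match.

```python
from typing import Any, Iterator

# Keywords whose value is one subschema, a map of subschemas, or an array of
# subschemas. This is a subset of the 2020-12 applicators, for illustration only.
SINGLE = {"additionalProperties", "items", "contains", "not", "if", "then",
          "else", "propertyNames", "unevaluatedItems", "unevaluatedProperties"}
MAPS = {"properties", "patternProperties", "$defs", "dependentSchemas"}
ARRAYS = {"allOf", "anyOf", "oneOf", "prefixItems"}


def bad_my_keyword(schema: Any, path: str = "") -> Iterator[str]:
    """Yield the path ("" is the root; names are not ~-escaped) of every
    subschema whose "myKeyword" value is not an object."""
    if not isinstance(schema, dict):
        return  # boolean schemas (true/false) carry no keywords
    if "myKeyword" in schema and not isinstance(schema["myKeyword"], dict):
        yield path
    for key, value in schema.items():
        if key in SINGLE:
            yield from bad_my_keyword(value, f"{path}/{key}")
        elif key in MAPS and isinstance(value, dict):
            # Walk the values only: a property *named* "myKeyword" is not flagged.
            for name, sub in value.items():
                yield from bad_my_keyword(sub, f"{path}/{key}/{name}")
        elif key in ARRAYS and isinstance(value, list):
            for i, sub in enumerate(value):
                yield from bad_my_keyword(sub, f"{path}/{key}/{i}")


# Usage, with the user schema from Test 4 ("$schema"/"$id" omitted)
test4 = {
    "type": "object",
    "properties": {"favoriteNumber": {"type": "integer", "myKeyword": []}},
}
print(list(bad_my_keyword(test4)))  # ['/properties/favoriteNumber']
```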

Solution

  • One of the JSON Schema authors has written a blog post and documentation on this very topic:

    https://blog.json-everything.net/posts/updating-vocabs/

    https://docs.json-everything.net/schema/vocabs/data-2023/

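    Comparing your files against that pattern, the essential piece missing from your meta-schema is "$dynamicAnchor": "meta". The 2020-12 meta-schema validates nested subschemas (e.g. each value under "properties") through "$dynamicRef": "#meta", which resolves to the outermost schema in the dynamic scope that declares "$dynamicAnchor": "meta". Without that anchor in your meta-schema, the outermost match is the standard 2020-12 meta-schema, so nested subschemas are never checked against vocab.schema.json; that is why the malformed "myKeyword" in Test 4 is accepted.

    I'm fairly certain this should get you where you want to be.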

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$id": "https://example.com/myKeyword/vocab.schema.json",
      "$defs": {
        "myKeyword": {
          "description": "myKeyword MUST be an object",
          "type": "object"
        }
      },
      "title": "my keyword for parallel arrays",
      "properties": {
        "myKeyword": { "$ref": "#/$defs/myKeyword" }
      }
    }
    
    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$id": "https://example.com/myKeyword/meta.schema.json",
      "$dynamicAnchor": "meta",
      "$vocabulary": {
        <All core vocabs>,
        "https://example.com/meta/myKeyword/vocab.schema.json": true
      },
      "allOf": [
        {"$ref": "https://json-schema.org/draft/2020-12/schema"},
        {"$ref": "https://example.com/myKeyword/vocab.schema.json"}
      ]
    }