I am trying to create a custom JSON Schema keyword for some basic validation. One example is "parallelArrays", which would check that the listed arrays all have the same length.
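For concreteness, here is a minimal sketch of the check such a keyword might perform. It assumes, hypothetically, that the keyword's value is a list of property names whose array values must all have the same length. The function name parallel_arrays_ok is illustrative only and is not part of jschon or any other library.

```python
from typing import Any, Mapping, Sequence


def parallel_arrays_ok(instance: Mapping[str, Any], names: Sequence[str]) -> bool:
    """Return True if every named property that holds a list has the same length.

    Hypothetical semantics for a "parallelArrays" keyword: properties that are
    missing or are not lists are ignored here, leaving them to other keywords
    such as "required" and "type".
    """
    lengths = {
        len(instance[name])
        for name in names
        if isinstance(instance.get(name), list)
    }
    # Zero or one distinct length means the arrays line up.
    return len(lengths) <= 1


# Example: "xs" and "ys" are parallel, "zs" is one element short.
data = {"xs": [1, 2, 3], "ys": ["a", "b", "c"], "zs": [True, False]}
print(parallel_arrays_ok(data, ["xs", "ys"]))        # True
print(parallel_arrays_ok(data, ["xs", "ys", "zs"]))  # False
```

Ignoring missing or non-array properties keeps this keyword focused on one concern; whether that is the right choice depends on how the real keyword is meant to behave.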
I have done this successfully for the most part, but I recently found one problem. I want to use this keyword in the schema for my actual data, and to make sure I am using it properly, I want to validate that schema against my meta-schema, which checks (among other things) that the "parallelArrays" keyword is well-formed.
This meta-schema validation works when I use my custom keyword in the top-level object of the schema, but it does not catch mistakes when I use the keyword anywhere else (e.g. within a nested subschema).
Below is an example with a custom keyword "myKeyword" that does nothing but must be an object. The example uses the jschon Python library, but I've observed the same issue with python-jsonschema. I assume something is wrong with the way I am forming the vocabulary in the first place, but I can't figure out what.
The following files are assumed to be in the same folder:
meta.schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/myKeyword/meta.schema.json",
  "$vocabulary": {
    "https://json-schema.org/draft/2020-12/vocab/core": true,
    "https://json-schema.org/draft/2020-12/vocab/applicator": true,
    "https://json-schema.org/draft/2020-12/vocab/validation": true,
    "https://json-schema.org/draft/2020-12/vocab/meta-data": true,
    "https://json-schema.org/draft/2020-12/vocab/format-annotation": true,
    "https://json-schema.org/draft/2020-12/vocab/content": true,
    "https://json-schema.org/draft/2020-12/vocab/unevaluated": true
  },
  "allOf": [
    { "$ref": "https://json-schema.org/draft/2020-12/schema" },
    { "$ref": "https://example.com/myKeyword/vocab.schema.json" }
  ]
}
vocab.schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/myKeyword/vocab.schema.json",
  "properties": {
    "myKeyword": {
      "description": "myKeyword MUST be an object",
      "type": "object"
    }
  }
}
test.py (you'll need to pip install jschon)
import os
import jschon
# This just helps jschon locate our schema files
catalog = jschon.create_catalog("2020-12")
catalog.add_uri_source(jschon.URI("https://example.com/myKeyword/"), jschon.LocalSource(os.path.dirname(__file__)))
### Test 1 ###
# myKeyword is in the top level of the schema and is an object
user_schema = jschon.JSONSchema({
    "$schema": "https://example.com/myKeyword/meta.schema.json",
    "$id": "https://example.com/mySchema",
    "type": "object",
    "myKeyword": {},
    "properties": {
        "favoriteNumber": {
            "type": "integer",
        }
    }
})
assert user_schema.validate().valid is True  # Passes
### Test 2 ###
# myKeyword is in the top level of the schema and is NOT an object
user_schema = jschon.JSONSchema({
    "$schema": "https://example.com/myKeyword/meta.schema.json",
    "$id": "https://example.com/mySchema",
    "type": "object",
    "myKeyword": [],
    "properties": {
        "favoriteNumber": {
            "type": "integer",
        }
    }
})
assert user_schema.validate().valid is False  # Passes
### Test 3 ###
# myKeyword is NOT in the top level of the schema and is an object
user_schema = jschon.JSONSchema({
    "$schema": "https://example.com/myKeyword/meta.schema.json",
    "$id": "https://example.com/mySchema",
    "type": "object",
    "properties": {
        "favoriteNumber": {
            "type": "integer",
            "myKeyword": {}
        }
    }
})
assert user_schema.validate().valid is True  # Passes (but trivially, see below)
### Test 4 ###
# myKeyword is NOT in the top level of the schema and is NOT an object
user_schema = jschon.JSONSchema({
    "$schema": "https://example.com/myKeyword/meta.schema.json",
    "$id": "https://example.com/mySchema",
    "type": "object",
    "properties": {
        "favoriteNumber": {
            "type": "integer",
            "myKeyword": []
        }
    }
})
assert user_schema.validate().valid is False  # Fails! validate() reports this schema as valid
# Essentially, when "myKeyword" is not in the top level, it is always marked as valid because
# for some reason the vocab.schema.json is not checked. This is why Test 3 passes but I called
# it trivial.
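What Tests 3 and 4 expect is that "myKeyword" is checked in every subschema, not only at the root. As an illustration only, here is a hypothetical stdlib helper (not part of jschon, and not a substitute for meta-schema validation) that walks a deliberately partial list of 2020-12 subschema keywords and reports where "myKeyword" is present but is not an object. The paths it returns are JSON Pointer fragments without escaping of "~" or "/" in property names.

```python
from typing import Any, Iterator, List, Tuple

# A simplified subset of 2020-12 keywords whose values are a single subschema,
# a map of subschemas, or a list of subschemas.
SINGLE = ("items", "additionalProperties", "not", "if", "then", "else", "contains")
MAPS = ("properties", "patternProperties", "$defs")
LISTS = ("allOf", "anyOf", "oneOf", "prefixItems")


def subschemas(schema: Any, path: str = "#") -> Iterator[Tuple[str, dict]]:
    """Yield (pointer, subschema) for the schema and each nested subschema."""
    if not isinstance(schema, dict):
        return  # boolean schemas have no keywords to check
    yield path, schema
    for key in SINGLE:
        if key in schema:
            yield from subschemas(schema[key], f"{path}/{key}")
    for key in MAPS:
        for name, sub in schema.get(key, {}).items():
            yield from subschemas(sub, f"{path}/{key}/{name}")
    for key in LISTS:
        for i, sub in enumerate(schema.get(key, [])):
            yield from subschemas(sub, f"{path}/{key}/{i}")


def bad_my_keyword(schema: dict) -> List[str]:
    """Return the pointers where "myKeyword" is present but is not a JSON object."""
    return [
        p for p, s in subschemas(schema)
        if "myKeyword" in s and not isinstance(s["myKeyword"], dict)
    ]


# The schema from Test 4: the bad value is nested under "properties".
test4 = {
    "type": "object",
    "properties": {"favoriteNumber": {"type": "integer", "myKeyword": []}},
}
print(bad_my_keyword(test4))  # ['#/properties/favoriteNumber']
```

A correctly wired meta-schema should flag the same nested location that this walker reports.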
One of the JSON Schema authors has some great blog posts about this very topic.
https://blog.json-everything.net/posts/updating-vocabs/
https://docs.json-everything.net/schema/vocabs/data-2023/
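Looking at your schemas, the piece that's missing is "$dynamicAnchor": "meta" in your meta-schema. The 2020-12 meta-schema does not apply itself to subschemas with a plain $ref. Keywords like "properties" reach their subschemas through "$dynamicRef": "#meta", which resolves to the outermost schema in the dynamic scope that declares "$dynamicAnchor": "meta". Your meta-schema doesn't declare that anchor, so the reference resolves to the standard 2020-12 meta-schema, and nested subschemas are only ever checked against it, never against vocab.schema.json. That is why "myKeyword" is validated at the top level but not inside "properties".
I'm fairly certain the following should get you where you want to be. Note that listing your own vocabulary in "$vocabulary" with the value true means an implementation that doesn't recognize it must refuse to process schemas that use this meta-schema, so use false there unless the keyword is actually registered with your validator.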
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/myKeyword/vocab.schema.json",
  "title": "my keyword for parallel arrays",
  "$defs": {
    "myKeyword": {
      "description": "myKeyword MUST be an object",
      "type": "object"
    }
  },
  "properties": {
    "myKeyword": { "$ref": "#/$defs/myKeyword" }
  }
}
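{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/myKeyword/meta.schema.json",
  "$dynamicAnchor": "meta",
  "$vocabulary": {
    "https://json-schema.org/draft/2020-12/vocab/core": true,
    "https://json-schema.org/draft/2020-12/vocab/applicator": true,
    "https://json-schema.org/draft/2020-12/vocab/validation": true,
    "https://json-schema.org/draft/2020-12/vocab/meta-data": true,
    "https://json-schema.org/draft/2020-12/vocab/format-annotation": true,
    "https://json-schema.org/draft/2020-12/vocab/content": true,
    "https://json-schema.org/draft/2020-12/vocab/unevaluated": true,
    "https://example.com/meta/myKeyword/vocab.schema.json": true
  },
  "allOf": [
    { "$ref": "https://json-schema.org/draft/2020-12/schema" },
    { "$ref": "https://example.com/myKeyword/vocab.schema.json" }
  ]
}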
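To see why the anchor changes the outcome, here is a toy model of how "$dynamicRef" resolves in draft 2020-12. It is not how jschon or python-jsonschema implement it internally. The function name resolve_dynamic_ref and the stripped-down dict "resources" are hypothetical; only the "$id" and "$dynamicAnchor" values mirror the real meta-schemas. The dynamic scope is listed outermost first: your meta-schema (where validation starts), then the standard meta-schema it references, then the applicator meta-schema that handles "properties".

```python
from typing import Any, List, Mapping

Resource = Mapping[str, Any]


def resolve_dynamic_ref(anchor: str, lexical_target: Resource,
                        dynamic_scope: List[Resource]) -> str:
    """Return the "$id" of the resource that "$dynamicRef": "#<anchor>" lands on.

    Toy model of the draft 2020-12 rule: if the lexically resolved target
    declares "$dynamicAnchor": anchor, the reference is redirected to the
    OUTERMOST resource in the dynamic scope that declares the same anchor;
    otherwise it behaves like an ordinary "$ref" to the lexical target.
    """
    if lexical_target.get("$dynamicAnchor") == anchor:
        for resource in dynamic_scope:  # ordered outermost first
            if resource.get("$dynamicAnchor") == anchor:
                return resource["$id"]
    return lexical_target["$id"]


# Simplified stand-ins for the real meta-schemas (only the relevant keys).
standard = {"$id": "https://json-schema.org/draft/2020-12/schema",
            "$dynamicAnchor": "meta"}
applicator = {"$id": "https://json-schema.org/draft/2020-12/meta/applicator",
              "$dynamicAnchor": "meta"}
mine_before = {"$id": "https://example.com/myKeyword/meta.schema.json"}
mine_after = {**mine_before, "$dynamicAnchor": "meta"}

# The applicator meta-schema reaches each subschema of "properties" through
# "$dynamicRef": "#meta", which lexically points at the applicator itself.
print(resolve_dynamic_ref("meta", applicator, [mine_before, standard, applicator]))
# https://json-schema.org/draft/2020-12/schema
print(resolve_dynamic_ref("meta", applicator, [mine_after, standard, applicator]))
# https://example.com/myKeyword/meta.schema.json
```

Without the anchor, nested subschemas are handed back to the standard meta-schema; with it, they come back to your meta-schema, whose allOf includes vocab.schema.json.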