I'm trying to validate a JSON file provided by a user. The JSON contains certain fixed keys as well as some user-defined keys. I want to validate that the object contains the fixed keys in a certain format, and that the user-defined keys are in a certain format too (their values always follow a defined structure).
I came across the post "Validate JSON data using python", but the documentation for jsonschema.validate doesn't really show how to handle user-defined keys, or how to specify that a key should hold a list of dicts, or a dict whose values must each be a list of dicts.
Here's a sample of the JSON structure:
{
    "a": "some value",
    "b": "some value",
    "c": {
        "custom_a": [{...}],
        "custom_b": [{...}]
    },
    "d": [{...}]
}
I have tried doing the following:
import json
from jsonschema import validate

my_json = json.loads(<JSON String following above pattern>)

schema = {
    "a": {"type": "string"},
    "b": {"type": "string"},
    "c": {[{}]},
    "d": [{}]
}

validate(instance=my_json, schema=schema)  # raises TypeError on "c" and "d" in schema spec
I have also tried the following schema spec, but I get stuck on how to handle the custom keys, and also nested lists within dicts, etc.
schema = {
    "a": {"type": "string"},
    "b": {"type": "string"},
    "c": {
        "Unsure what to define here": {"type": "list"}  # but this is a list of dicts
    },
    "d": {"type": "list"}  # but this is a list of dicts
}
Several Python libraries can validate JSON data against schemas that mix fixed and user-defined keys. The most common are jsonschema, marshmallow, and pydantic; each handles dynamic keys differently.
Using jsonschema:
from jsonschema import validate, ValidationError

# Define JSON Schema
schema = {
    "type": "object",
    "properties": {
        "a": {"type": "string"},
        "b": {"type": "string"},
        "c": {
            "type": "object",
            "patternProperties": {
                "^custom_": {  # Any key in "c" must start with "custom_"
                    "type": "array",
                    "items": {"type": "object"}
                }
            },
            "additionalProperties": False
        },
        "d": {
            "type": "array",
            "items": {"type": "object"}
        }
    },
    "required": ["a", "b", "c", "d"],
    "additionalProperties": False
}

# Sample JSON data
data = {
    "a": "some value",
    "b": "another value",
    "c": {
        "custom_a": [{"key1": "value1"}, {"key2": "value2"}],
        "custom_b": [{"key3": "value3"}]
    },
    "d": [{"key4": "value4"}, {"key5": "value5"}]
}

# Validate the JSON data
try:
    validate(instance=data, schema=schema)
    print("Validation successful!")
except ValidationError as e:
    print("Validation failed:", e.message)
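validate() stops at the first problem it finds. If you would rather report every violation in the user's file at once, a validator class with iter_errors() can do that. This is only a minimal sketch, assuming jsonschema 3.0 or later (for Draft7Validator); the collect_errors helper and the bad_data sample are hypothetical names made up for illustration.

```python
from jsonschema import Draft7Validator

# Same schema as in the jsonschema example above.
schema = {
    "type": "object",
    "properties": {
        "a": {"type": "string"},
        "b": {"type": "string"},
        "c": {
            "type": "object",
            "patternProperties": {
                "^custom_": {"type": "array", "items": {"type": "object"}}
            },
            "additionalProperties": False,
        },
        "d": {"type": "array", "items": {"type": "object"}},
    },
    "required": ["a", "b", "c", "d"],
    "additionalProperties": False,
}


def collect_errors(instance, schema):
    """Hypothetical helper: return (path, message) pairs for every violation."""
    validator = Draft7Validator(schema)
    errors = sorted(
        validator.iter_errors(instance),
        key=lambda e: [str(p) for p in e.absolute_path],
    )
    return [
        ("/".join(str(p) for p in e.absolute_path) or "<root>", e.message)
        for e in errors
    ]


# Deliberately invalid sample (made up for illustration).
bad_data = {
    "a": "some value",
    "b": 42,                          # wrong type, should be a string
    "c": {"other_key": [{}]},         # key does not match ^custom_
    "d": [{"k": "v"}, "not a dict"],  # second item is not an object
}

for path, message in collect_errors(bad_data, schema):
    print(f"{path}: {message}")
```

Each printed line names the location of the offending value, which is usually more helpful to a user than a single exception message.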
Using marshmallow:
from marshmallow import Schema, fields, validate, ValidationError

class CustomEntrySchema(Schema):
    # Accept arbitrary keys and values in each dictionary
    class Meta:
        unknown = 'include'

class MainSchema(Schema):
    a = fields.String(required=True)
    b = fields.String(required=True)
    c = fields.Dict(
        keys=fields.String(validate=validate.Regexp(r'^custom_')),
        values=fields.List(fields.Nested(CustomEntrySchema)),
        required=True
    )
    d = fields.List(fields.Nested(CustomEntrySchema), required=True)

# Sample JSON data
data = {
    "a": "some value",
    "b": "another value",
    "c": {
        "custom_a": [{"key1": "value1"}, {"key2": "value2"}],
        "custom_b": [{"key3": "value3"}]
    },
    "d": [{"key4": "value4"}, {"key5": "value5"}]
}

# Validate the JSON data
schema = MainSchema()
try:
    schema.load(data)
    print("Validation successful!")
except ValidationError as e:
    print("Validation failed:", e.messages)
Using pydantic:
from pydantic import BaseModel, ValidationError, RootModel, model_validator
from typing import List, Dict
import re

class CustomEntryModel(RootModel[Dict[str, str]]):
    """This allows arbitrary string key-value pairs in each entry of 'c' and 'd'."""

class MainModel(BaseModel):
    a: str
    b: str
    c: Dict[str, List[CustomEntryModel]]  # We'll validate keys in 'c' manually
    d: List[CustomEntryModel]

    @model_validator(mode="before")
    @classmethod
    def validate_custom_keys(cls, values):
        # Check that all keys in 'c' start with "custom_"
        c_data = values.get("c", {})
        for key in c_data:
            if not re.match(r'^custom_', key):
                raise ValueError(f"Key '{key}' in 'c' must start with 'custom_'")
        return values

# Sample JSON data
data = {
    "a": "some value",
    "b": "another value",
    "c": {
        "custom_a": [{"key1": "value1"}, {"key2": "value2"}],
        "custom_b": [{"key3": "value3"}]
    },
    "d": [{"key4": "value4"}, {"key5": "value5"}]
}

# Validate the JSON data
try:
    model = MainModel(**data)
    print("Validation successful!")
except ValidationError as e:
    print("Validation failed:", e)
Output when I ran all of them at once:
Validation successful!
Validation successful!
Validation successful!