pythonjsonjsonschema

Validate JSON Schema which has fixed keys and user defined keys in Python


I'm trying to validate a JSON file that is provided by a user. The JSON will contain certain fixed keys, but also contain some user-defined keys too. I want to validate that this JSON object contains these fixed keys, in a certain format, and the user-defined keys are in a certain format too (as these keys will always have values in a defined format).

I came across this post Validate JSON data using python, but the documentation for jsonschema.validate doesn't really show anything to do with user-defined keys, and also how to define if a key should have a list of dicts, or a dict which its key-values must be of a list of dicts.

Here's a sample schema:

{
    "a": "some value",
    "b": "some value",
    "c": {
        "custom_a": [{...}],
        "custom_b": [{...}]
    },
    "d": [{...}]
}

I have tried doing the following:

import json
from jsonschema import validate

my_json = json.loads(<JSON String following above pattern>) 

schema = {
    "a" : {"type": "string"},
    "b" : {"type": "string"},
    "c" : {[{}]},
    "d": [{}]
}

validate(instance=my_json, schema=schema) #raises TypeError on "c" and "d" in schema spec

I have also tried the following schema spec, but I get stuck on how to handle the custom keys, and also nested lists within dicts, etc.

schema = {
    "a" : {"type": "string"},
    "b" : {"type": "string"},
    "c" : {
        "Unsure what to define here": {"type": "list"} #but this is a list of dicts
    },
    "d": {"type": "list"} #but this is a list of dicts
}

Solution

  • There are several Python libraries available for validating JSON data, especially when it comes to complex schemas with fixed and user-defined keys. Here are some commonly used libraries, each with unique strengths and options for managing dynamic structures.

    The most common are-

    Using jsonschema,

    from jsonschema import validate, ValidationError
    
    # Define JSON Schema
    schema = {
        "type": "object",
        "properties": {
            "a": {"type": "string"},
            "b": {"type": "string"},
            "c": {
                "type": "object",
                "patternProperties": {
                    "^custom_": {  # Any key in "c" must start with "custom_"
                        "type": "array",
                        "items": {"type": "object"}
                    }
                },
                "additionalProperties": False
            },
            "d": {
                "type": "array",
                "items": {"type": "object"}
            }
        },
        "required": ["a", "b", "c", "d"],
        "additionalProperties": False
    }
    
    # Sample JSON data
    data = {
        "a": "some value",
        "b": "another value",
        "c": {
            "custom_a": [{"key1": "value1"}, {"key2": "value2"}],
            "custom_b": [{"key3": "value3"}]
        },
        "d": [{"key4": "value4"}, {"key5": "value5"}]
    }
    
    # Validate the JSON data
    try:
        validate(instance=data, schema=schema)
        print("Validation successful!")
    except ValidationError as e:
        print("Validation failed:", e.message)
    

    Using marshmallow,

    from marshmallow import Schema, fields, validate, ValidationError
    
    class CustomEntrySchema(Schema):
        # This allows any string keys and values in each dictionary
        class Meta:
            unknown = 'include'
    
    class MainSchema(Schema):
        a = fields.String(required=True)
        b = fields.String(required=True)
        c = fields.Dict(
            keys=fields.String(validate=validate.Regexp(r'^custom_')),
            values=fields.List(fields.Nested(CustomEntrySchema)),
            required=True
        )
        d = fields.List(fields.Nested(CustomEntrySchema), required=True)
    
    # Sample JSON data
    data = {
        "a": "some value",
        "b": "another value",
        "c": {
            "custom_a": [{"key1": "value1"}, {"key2": "value2"}],
            "custom_b": [{"key3": "value3"}]
        },
        "d": [{"key4": "value4"}, {"key5": "value5"}]
    }
    
    # Validate the JSON data
    schema = MainSchema()
    try:
        schema.load(data)
        print("Validation successful!")
    except ValidationError as e:
        print("Validation failed:", e.messages)
    

    Using pydantic,

    from pydantic import BaseModel, Field, ValidationError, RootModel, model_validator
    from typing import List, Dict
    import re
    
    class CustomEntryModel(RootModel[Dict[str, str]]):
        """This allows arbitrary key-value pairs in each entry of 'c' and 'd'."""
    
    class MainModel(BaseModel):
        a: str
        b: str
        c: Dict[str, List[CustomEntryModel]]  # We'll validate keys in 'c' manually
        d: List[CustomEntryModel]
    
        @model_validator(mode="before")
        def validate_custom_keys(cls, values):
            # Check that all keys in 'c' start with "custom_"
            c_data = values.get("c", {})
            for key in c_data:
                if not re.match(r'^custom_', key):
                    raise ValueError(f"Key '{key}' in 'c' must start with 'custom_'")
            return values
    
    # Sample JSON data
    data = {
        "a": "some value",
        "b": "another value",
        "c": {
            "custom_a": [{"key1": "value1"}, {"key2": "value2"}],
            "custom_b": [{"key3": "value3"}]
        },
        "d": [{"key4": "value4"}, {"key5": "value5"}]
    }
    
    # Validate the JSON data
    try:
        model = MainModel(**data)
        print("Validation successful!")
    except ValidationError as e:
        print("Validation failed:", e)
    

    Output when I ran all of them at once

    Validation successful!
    Validation successful!
    Validation successful!