json  google-cloud-platform  avro  google-cloud-pubsub

Validating PubSub message against AVRO JSON schema with multiple union types


I'm having trouble publishing messages to a new Pub/Sub topic that has an Avro schema attached. I publish a message from PHP using the Google\Cloud\PubSub\PubSubClient library and get this error:

{
  "error": {
    "code": 400,
    "message": "Invalid data in message: Message failed schema validation.",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "INVALID_JSON_AVRO_MESSAGE",
        "domain": "pubsub.googleapis.com",
        "metadata": {
          "message": "Message failed schema validation",
          "revisionInfo": "Could not validate message with any schema revision for schema: projects/foo-project/schemas/foo-schema, last checked revision: revision_id=foo-revision-id failed with status: Invalid data in message: JSON object with type string does not match schema which expected object."
        }
      }
    ]
  }
}

I tried to validate my message in the Google Cloud Console (https://console.cloud.google.com/cloudpubsub/schema/detail/foo-schema?project=foo-project) using the "Test message" UI, but every combination returns the error "Invalid JSON-encoded message against Avro schema." without any details.

Adding the optional fields with null values doesn't work, and wrapping action_type inside the action field doesn't help. Adding the nested "name": null inside the account object doesn't help either, nor does any combination of the above. I'm quite desperate now.

Interestingly, according to avro_validator, the message has the correct format.

This is my example message:

{
    "action": "create",
    "url": "https://my-api.com/resource/new_resource_name",
    "operation": "created",
    "callback_url": "https://my-another-api/com/resource/new_resource_name",
    "name": "new_resource_name",
    "source": "service_name",
    "account": {"number": 2830602},
    "operation_metadata": "{\"created_on\":\"2024-06-24T08:47:14+00:00\"}"
}

This is the schema I've created in GCP:

{
  "fields": [
    {
      "name": "action",
      "type": [
        "null",
        {
          "name": "action_type",
          "symbols": [
            "create",
            "another_action_type",
            "another_action_type2",
            "another_action_type3"
          ],
          "type": "enum"
        }
      ]
    },
    {
      "name": "url",
      "type": "string"
    },
    {
      "name": "operation",
      "type": {
        "name": "operation_type",
        "symbols": [
          "created",
          "another_operation_type",
          "another_operation_type2",
          "another_operation_type3"
        ],
        "type": "enum"
      }
    },
    {
      "name": "callback_url",
      "type": "string"
    },
    {
      "name": "name",
      "type": "string"
    },
    {
      "default": "default_service_name",
      "name": "source",
      "type": {
        "name": "source_service",
        "symbols": [
          "service_name1",
          "service_name2"
        ],
        "type": "enum"
      }
    },
    {
      "default": null,
      "name": "homepage_url",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "account",
      "type": [
        "null",
        {
          "fields": [
            {
              "default": null,
              "name": "number",
              "type": [
                "null",
                "int"
              ]
            },
            {
              "default": null,
              "name": "name",
              "type": [
                "null",
                "string"
              ]
            }
          ],
          "name": "account_record",
          "type": "record"
        }
      ]
    },
    {
      "default": null,
      "name": "cluster",
      "type": [
        "null",
        {
          "fields": [
            {
              "default": null,
              "name": "number",
              "type": [
                "null",
                "int"
              ]
            }
          ],
          "name": "cluster_record",
          "type": "record"
        }
      ]
    },
    {
      "default": null,
      "name": "type",
      "type": [
        "null",
        {
          "name": "environment_type",
          "symbols": [
            "DEVELOPMENT",
            "STAGING",
            "PRODUCTION"
          ],
          "type": "enum"
        }
      ]
    },
    {
      "default": null,
      "name": "error",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "operation_metadata",
      "type": [
        "null",
        "string"
      ]
    }
  ],
  "name": "MyFooEvents",
  "type": "record"
}

If anyone has an idea, please give me a hint.


Solution

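  • The message has several issues:

    1. It does not conform to the JSON encoding rules for Avro messages. A non-null value of a union field must be wrapped in an object keyed by the name of the branch type, for example {"string": "..."} or {"action_type": "create"}; a null value is written as a plain null. This is most likely what the error "JSON object with type string does not match schema which expected object" is pointing at.
    2. "service_name" is not a valid enum value for the "source" field; the schema only allows "service_name1" and "service_name2".
    3. Several fields are missing. Even nullable fields with a default must be present in the JSON message.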

    Here is a valid version of the message:

    {
        "action": {
          "action_type": "create"
        },
        "url": "https://my-api.com/resource/new_resource_name",
        "operation": "created",
        "callback_url": "https://my-another-api/com/resource/new_resource_name",
        "homepage_url": null,
        "name": "new_resource_name",
        "source": "service_name1",
        "account": {
          "account_record": {
            "number": {
              "int": 2830602
            },
            "name": null
          }
        },
        "cluster": null,
        "type": null,
        "operation_metadata": {
          "string": "{\"created_on\":\"2024-06-24T08:47:14+00:00\"}"
        },
        "error": null
    }
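
If you would rather keep building plain messages and convert them just before publishing, the three rules above can be sketched as a small helper. This is only an illustration in Python (the question uses PHP, but the logic should port directly); `encode`, `matches`, `type_name` and the trimmed-down `SCHEMA` are hypothetical names of mine, not part of any Pub/Sub or Avro library. It assumes a schema without namespaces and without references to previously defined named types.

```python
import json

def type_name(schema):
    """Key used for a union branch in Avro's JSON encoding."""
    if isinstance(schema, str):
        return schema                      # primitive, e.g. "string" or "int"
    if schema["type"] in ("record", "enum", "fixed"):
        return schema["name"]              # named types use their own name
    return schema["type"]                  # "array" or "map"

def matches(schema, value):
    """Rough check that a plain Python value fits one schema branch."""
    t = schema if isinstance(schema, str) else schema["type"]
    if t == "null":
        return value is None
    if t == "boolean":
        return isinstance(value, bool)
    if t in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    if t in ("float", "double"):
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if t == "string":
        return isinstance(value, str)
    if t == "enum":
        return value in schema["symbols"]
    if t in ("record", "map"):
        return isinstance(value, dict)
    if t == "array":
        return isinstance(value, list)
    return False

def encode(schema, value):
    """Convert a plain value into the Avro JSON encoding for `schema`."""
    if isinstance(schema, list):           # union: wrap every non-null value
        for branch in schema:
            if matches(branch, value):
                if value is None:
                    return None
                return {type_name(branch): encode(branch, value)}
        raise ValueError(f"{value!r} matches no branch of the union")
    t = schema if isinstance(schema, str) else schema["type"]
    if t == "record":
        out = {}
        for field in schema["fields"]:
            name = field["name"]
            if name in value:
                out[name] = encode(field["type"], value[name])
            elif "default" in field:       # every field must appear in the JSON
                out[name] = encode(field["type"], field["default"])
            else:
                raise ValueError(f"missing required field {name!r}")
        return out
    if t == "array":
        return [encode(schema["items"], v) for v in value]
    if t == "map":
        return {k: encode(schema["values"], v) for k, v in value.items()}
    if not matches(schema, value):         # also rejects unknown enum symbols
        raise ValueError(f"{value!r} is not a valid {type_name(schema)}")
    return value

# Trimmed-down copy of the schema from the question (assumption: no namespaces).
SCHEMA = {"type": "record", "name": "MyFooEvents", "fields": [
    {"name": "action", "type": ["null", {"type": "enum", "name": "action_type",
                                          "symbols": ["create"]}]},
    {"name": "source", "type": {"type": "enum", "name": "source_service",
                                "symbols": ["service_name1", "service_name2"]}},
    {"name": "account", "default": None, "type": ["null", {
        "type": "record", "name": "account_record", "fields": [
            {"name": "number", "default": None, "type": ["null", "int"]},
            {"name": "name", "default": None, "type": ["null", "string"]}]}]},
    {"name": "operation_metadata", "default": None, "type": ["null", "string"]},
]}

message = {
    "action": "create",
    "source": "service_name1",
    "account": {"number": 2830602},
    "operation_metadata": '{"created_on":"2024-06-24T08:47:14+00:00"}',
}
encoded = encode(SCHEMA, message)
print(json.dumps(encoded, indent=2))
```

Passing "service_name" as the source raises a ValueError in this sketch, which mirrors issue 2. The union matching is deliberately naive: it picks the first branch whose type fits, so it would need more care for unions with several branches that accept the same JSON type.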