mongodb-kafka-connector

The official MongoDB Kafka source connector does not publish clean extended JSON


I've set up a fairly simple MongoDB Kafka source connector to stream MongoDB's oplog to Kafka. However, in the messages published by the connector, the serialized oplog events do not respect the Extended JSON spec; for instance, a datetime field is represented as:

{"$date": 1597841586927}

whereas the spec says it should be formatted as:

{"$date": {"$numberLong": "1597841586927"}}

Why am I not getting clean extended JSON?

Note: my connector config file looks like this:

{
  "name": "mongosource",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "tasks.max": 1,
    "connection.uri": "...",
    "topic.prefix":"mongosource",
    "database": "mydb",
    "copy.existing": true,
    "change.stream.full.document": "updateLookup"
  }
}

Solution

  • The default JSON formatter of the source connector is a legacy one (see this issue on the connector's JIRA project).

    As of version 1.3.0, the connector has a config option that you can add to make it output proper Extended JSON:

    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.ExtendedJson"
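
For reference, the config from the question with that option added would look roughly like the sketch below. All values are copied from the question, the connection URI is left elided as in the original, and the trailing comma after the last property is dropped because strict JSON parsers reject it. This assumes the connector is at version 1.3.0 or later; on older versions the formatter option is not available.

```json
{
  "name": "mongosource",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "tasks.max": 1,
    "connection.uri": "...",
    "topic.prefix": "mongosource",
    "database": "mydb",
    "copy.existing": true,
    "change.stream.full.document": "updateLookup",
    "output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.ExtendedJson"
  }
}
```

With this in place, date fields should be published in the {"$date": {"$numberLong": "..."}} form shown in the question rather than the legacy {"$date": 1597841586927} form.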