mysql elasticsearch jdbc logstash logstash-jdbc

ELK data ingestion: Elasticsearch treats text as boolean even though the mapping says the type is 'text'


I'm using ELK 7.4.1 with Docker from here, and I need to ingest data from a MySQL database. One of the tables has a 'status' field defined as varchar(128). I used the Logstash JDBC plugin for this, but when I started the Docker image I saw a lot of warning messages saying org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [status] of type [boolean] in document with id '34ZXb24BsfR1FhttyYWt'. Preview of field's value: 'Success'. What puzzles me, however, is that the mapping seems to be correct: "status": { "type": "text", ... }, and the data otherwise appears to be ingested successfully.

I even tried to create the index manually and put the mapping in place before ingesting the data, but that didn't help either.
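For reference, that manual step looked roughly like this (a sketch, collapsed into a single create-index request; non-essential properties omitted, and the status mapping is the one shown further below):

PUT /records
{
  "mappings": {
    "properties": {
      "status": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      }
    }
  }
}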

Any ideas why?

Adding more info:

Table definition

CREATE TABLE records (
  id int(11) NOT NULL AUTO_INCREMENT,
  ...
  status varchar(128) NOT NULL DEFAULT '',
  ...
)

Elasticsearch mapping

{
  "properties": {
    ...
    "status": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    ...
  }
}
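A quick sanity check is to ask the live index what it actually holds for this field; if the template never matched, this returns boolean rather than text (assuming the index is named records):

GET /records/_mapping/field/status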

Data example

+-----------+-----------+
| id        | status    |
+-----------+-----------+
| 452172830 | success   |
| 452172835 | other     |
| 452172840 | success   |
...

More info: Elasticsearch mapping template

PUT /_template/records_template
{
  "index_patterns": ["records"],
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "status": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
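To confirm the installed template still matches what was PUT above (and hasn't been replaced by something else), it can be fetched back through the same legacy template endpoint:

GET /_template/records_template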

Logstash conf

input {
    jdbc {
        tags => "records"
        jdbc_connection_string => "jdbc:mysql://10.0.2.15:3306/esd"
        jdbc_user => "dbuser"
        jdbc_password => "dbpass"
        schedule => "* * * * *"
        jdbc_validate_connection => true
        jdbc_paging_enabled => true
        jdbc_page_size => 100000
        jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
        statement => "select * from records order by id asc"
    }
    ...
}
output {
    if "records" in [tags] {
        elasticsearch {
                hosts => "elasticsearch:9200"
                user => "elastic"
                password => "changeme"
                index => "records"
                template_name => "records_template"
                document_id => "%{id}"
        }
    }
    ...
}

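One variable worth ruling out: the elasticsearch output manages index templates on its own by default (manage_template defaults to true), so the template Logstash installs under template_name is not necessarily the one shown above. A sketch of taking Logstash out of template management entirely, assuming records_template is then PUT by hand before the first run:

output {
    if "records" in [tags] {
        elasticsearch {
            hosts           => "elasticsearch:9200"
            user            => "elastic"
            password        => "changeme"
            index           => "records"
            document_id     => "%{id}"
            # Stop Logstash from installing or maintaining any template;
            # records_template must then exist before documents arrive.
            manage_template => false
        }
    }
}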
Solution

  • Looks like the field name matters somehow. If I change the select clause to something like select status as rd_status, ... then all the errors are gone. I'm not sure whether I'm missing something in the Elasticsearch mapping or whether Logstash internally tries to guess the data type from the name status (I'd be surprised if it were the latter).
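If the SQL should stay untouched, the same workaround can be applied inside the pipeline instead. A sketch using the standard mutate filter; the rd_status name simply mirrors the alias above, and the mapping/template would need a matching entry:

filter {
    if "records" in [tags] {
        mutate {
            # Rename the column after the jdbc input reads it, so the
            # indexed document never carries a field named "status".
            rename => { "status" => "rd_status" }
        }
    }
}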