I am using an ELK stack via docker-compose with ES version 8.4.0.
My goal is to use Logstash to copy an entire table from my MySQL DB to ES. The connection works and Logstash copies about 30 entries with no problems, but then I get a long error message:
[2022-09-10T18:41:26,318][ERROR][logstash.outputs.elasticsearch][main][757e3825fce0788f949869472d03e028630de9d063200717b56bc9ceefe29d81] An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError, :backtrace=>[
"org/jruby/ext/stringio/StringIO.java:1162:in `write'",
"D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:142:in `block in bulk'",
"org/jruby/RubyArray.java:1865:in `each'",
"org/jruby/RubyEnumerable.java:1143:in `each_with_index'",
"D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in `bulk'",
"D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in `safe_bulk'",
"D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:228:in `submit'",
"D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in `retrying_submit'",
"D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch.rb:342:in `multi_receive'",
"org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'",
"D:/logstash/logstash-core/lib/logstash/java_pipeline.rb:300:in `block in start_workers'"]}
I suspect this part is the root cause: {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError}. CP850 is a legacy Windows console code page, which fits the D:\logstash paths in the backtrace, so apparently one string in the bulk request is still tagged with the Windows default encoding while another is UTF-8.
My config file looks like this:
input {
  jdbc {
    clean_run => true
    jdbc_driver_library => "D:\logstash\mysql-connector-java-8.0.30.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/posts"
    jdbc_user => "sqluser"
    jdbc_password => "sqlpassword"
    schedule => "* * * * *"
    statement => "SELECT id, id_post, url, id_subforum, author, text, spread, date, added
                  FROM telegram.channel_results WHERE id > :sql_last_value;"
    use_column_value => true
    tracking_column => "id"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "posts"
    user => "username"
    password => "password"
  }
  stdout {
    codec => rubydebug
  }
}
I noticed that if I remove the text column from the query, the process runs without any problems. In my database, the text column is of SQL type TEXT. I suspect an encoding problem, because the column also contains Russian text and emotes. I still need the texts copied into ES. Maybe it is an encoding problem with the emotes and other non-ASCII characters in the text?
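To narrow it down, here is a minimal pipeline sketch that isolates the failure: only id and the suspect text column, no schedule or tracking, and just the elasticsearch output (since the error is raised while the bulk request is built). Dropping text from the SELECT should make the error disappear, matching what I see with the full pipeline.

input {
  jdbc {
    # Same connection details as the full pipeline above.
    jdbc_driver_library => "D:\logstash\mysql-connector-java-8.0.30.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/posts"
    jdbc_user => "sqluser"
    jdbc_password => "sqlpassword"
    # Only the tracking id and the suspect TEXT column.
    statement => "SELECT id, text FROM telegram.channel_results;"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "posts"
    user => "username"
    password => "password"
  }
}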
Try the columns_charset option on the jdbc input, shown below, to declare the encoding of the text column:
input {
  jdbc {
    clean_run => true
    jdbc_driver_library => "D:\logstash\mysql-connector-java-8.0.30.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/posts"
    jdbc_user => "sqluser"
    jdbc_password => "sqlpassword"
    schedule => "* * * * *"
    statement => "SELECT id, id_post, url, id_subforum, author, text, spread, date, added
                  FROM telegram.channel_results WHERE id > :sql_last_value;"
    use_column_value => true
    tracking_column => "id"
    # Tell Logstash which encoding the text column is in.
    columns_charset => {
      "text" => "ISO-8859-5"
    }
  }
}
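A note on the charset value: because the column also holds emotes, the data in MySQL is most likely stored as utf8mb4, and ISO-8859-5 (plain Cyrillic, no emoji) may then be the wrong label. A variant sketch, assuming the table really is utf8mb4, is to tag the column as UTF-8 and additionally force the driver to deliver UTF-8 via the standard Connector/J URL options useUnicode and characterEncoding:

input {
  jdbc {
    clean_run => true
    jdbc_driver_library => "D:\logstash\mysql-connector-java-8.0.30.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # useUnicode/characterEncoding are standard Connector/J URL options;
    # they make the driver hand rows to Logstash as UTF-8.
    jdbc_connection_string => "jdbc:mysql://localhost:3306/posts?useUnicode=true&characterEncoding=UTF-8"
    jdbc_user => "sqluser"
    jdbc_password => "sqlpassword"
    schedule => "* * * * *"
    statement => "SELECT id, id_post, url, id_subforum, author, text, spread, date, added
                  FROM telegram.channel_results WHERE id > :sql_last_value;"
    use_column_value => true
    tracking_column => "id"
    # Label the column as UTF-8 so the elasticsearch output does not mix it
    # with strings in the Windows console code page (CP850).
    columns_charset => {
      "text" => "UTF-8"
    }
  }
}

Either way, the point is to make the encoding label Logstash attaches to the column match the bytes it actually receives, so the elasticsearch output no longer combines CP850-tagged and UTF-8 strings while assembling the bulk request.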