There is a JDBC task that reads from the database as follows:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://${MYSQL_MAIN_HOST}/${MYSQL_DATABASE}"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_page_size => 10000
    jdbc_paging_enabled => true
    jdbc_password => "${MYSQL_PASSWORD}"
    jdbc_user => "${MYSQL_USER}"
    schedule => "0 1 * * *"
    statement_filepath => "/usr/share/logstash/pipeline/sql/select_posts.sql"
    tracking_column => "updated_at"
    tracking_column_type => "numeric"
    use_column_value => true
    last_run_metadata_path => "/usr/share/logstash/jdbc_last_run/select_posts_last_value"
  }
}
where /usr/share/logstash/pipeline/sql/select_posts.sql contains:
SELECT
  id,
  title,
  body
FROM posts
This task becomes heavy when the body column contains very large data, so I want to drop that column from the first query:
SELECT
  id,
  title
FROM posts
Then take the returned IDs and use them to fetch body in a second query:
SELECT
  body
FROM posts
WHERE id IN (IDs)
The results would then go to the output target; even updating the existing documents in the output target would be okay.
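For reference, the kind of output-side update I have in mind would look roughly like this (only a sketch; it assumes an Elasticsearch output and an illustrative posts index, neither of which is settled yet):
output {
  elasticsearch {
    hosts         => ["${ES_HOST}"]   # assumed environment variable
    index         => "posts"          # illustrative index name
    document_id   => "%{id}"          # use the row id as the document id
    action        => "update"
    doc_as_upsert => true             # create the document or update only the selected fields
  }
}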
So, can Logstash run the statements one after another in this case?
If you want to execute two separate SQL queries with Logstash, you can do it as follows. Put each query in its own file:
query1.sql:
SELECT
  id,
  title
FROM posts
query2.sql:
SELECT
  body
FROM posts
WHERE id IN (IDs)
Then reference each file from its own jdbc input block:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://${MYSQL_MAIN_HOST}/${MYSQL_DATABASE}"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_page_size => 10000
    jdbc_paging_enabled => true
    jdbc_password => "${MYSQL_PASSWORD}"
    jdbc_user => "${MYSQL_USER}"
    schedule => "0 1 * * *"
    statement_filepath => "<PATH TO query1.sql>"   # query1.sql
    tracking_column => "updated_at"
    tracking_column_type => "numeric"
    use_column_value => true
    last_run_metadata_path => "/usr/share/logstash/jdbc_last_run/select_posts_last_value"
  }
  jdbc {
    jdbc_connection_string => "jdbc:mysql://${MYSQL_MAIN_HOST}/${MYSQL_DATABASE}"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_page_size => 10000
    jdbc_paging_enabled => true
    jdbc_password => "${MYSQL_PASSWORD}"
    jdbc_user => "${MYSQL_USER}"
    schedule => "0 1 * * *"
    statement_filepath => "<PATH TO query2.sql>"   # query2.sql
    tracking_column => "updated_at"
    tracking_column_type => "numeric"
    use_column_value => true
    last_run_metadata_path => "/usr/share/logstash/jdbc_last_run/query2_last_value"   # give each input its own state file
  }
}
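If you also need to tell the two streams of events apart on the output side (for example, so the second query only adds body to documents the first query created), you can set a type on each input and route them with a conditional. The snippet below is only a sketch: it assumes an Elasticsearch output, an illustrative posts index, and that query2.sql also returns the id column so each event can be matched to its document.
input {
  jdbc {
    # ... same settings as the query1.sql input above ...
    type => "post_meta"
  }
  jdbc {
    # ... same settings as the query2.sql input above ...
    type => "post_body"
  }
}

output {
  if [type] == "post_meta" {
    elasticsearch {
      hosts       => ["${ES_HOST}"]     # assumed environment variable
      index       => "posts"            # illustrative index name
      document_id => "%{id}"            # use the row id as the document id
    }
  } else if [type] == "post_body" {
    elasticsearch {
      hosts         => ["${ES_HOST}"]
      index         => "posts"
      document_id   => "%{id}"          # needs id in query2's SELECT list
      action        => "update"
      doc_as_upsert => true             # add body to the existing document, or create it
    }
  }
}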