I have question and answers entities represented in a SQL Server database as 2 tables Questions
and Answers
(see below). The relationship between them is OneToMany
.
Questions
table
Id Title
-------------------
1 Question 1
2 Question 2
Answers
table:
Id Answer Question_Id
-------------------------------
1 answer 1 1
2 answer 2 1
3 answer 3 1
4 answer 4 2
5 answer 5 2
I would like to get the ES document with a structure provided below, after moving the data through the Logstash pipeline:
{
“questionId": 1,
"questionTitle": "Question 1",
"questionAnswers": [
{
“answerId": 1,
"answer": "answer 1"
},
{
"answerId": 2,
"answer": "answer 2"
},
{
"answerId": 3,
"answer": "answer 3"
}
]
}
{
"questionId": 2,
"questionTitle": "Question 2",
"questionAnswers": [
{
"answerId": 4,
"answer": "answer 4"
},
{
"answerId": 5,
"answer": "answer 5"
}
]
}
The logstash jdbc input plugin setup uses the Question_Answers view to retrieve the data.
{
jdbc {
type => “Test_1”
jdbc_connection_string => "jdbc:sqlserver://myinstance:1433"
jdbc_user => “root”
jdbc_password => “root”
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_driver_library => "/home/abury/enu/mssql-jdbc-6.2.2.jre8.jar"
schedule => "*/3 * * * *"
statement => "SELECT * from Question_Answers"
}
}
The result set returned by view looks like this:
questionId questionTitle answerId answer
1 Question 1 1 answer 1
1 Question 1 2 answer 2
1 Question 1 3 answer 3
2 Question 2 4 answer 4
2 Question 2 5 answer 5
The Elasticsearch output plugin setup looks like this:
output {
elasticsearch {
hosts => "http://localhost:9200"
index => "question"
document_id => "%{questionId}"
}
}
Question: How can I setup Logstash to identify records related to the same question and build ES document with desired structure provided above? Is it possible to add some aggregation logic into the output.conf file to achieve desired behavior? Or I need to re-write my DB view to return single record for each question:
questionId questionTitle answerId answer
---------------------------------------------------------------------
1 Question 1 1, 2, 3 answer 1, answer 2, answer 3
UPDATED: fix typo in column names
I was able to get the desired structure of Elasticsearch document by using the logstash aggregate filter plugin (see, Example 4):
filter {
aggregate {
task_id => "%{questionId}"
code => "
map['questionId'] ||= event.get('questionid')
map['questionTitle'] ||= event.get('questiontitle')
map['questionAnswers'] ||= []
map['questionAnswers'] << {'answerId' => event.get('answerid'), 'answer' => event.get('answer')}
event.cancel()
"
push_previous_map_as_event => true
timeout => 3
}
}