elasticsearchlogstashlogstash-configurationlogstash-jdbc

SQL Server database > Logstash > Elasticsearch: map result set records related to the same entity to same ES document


I have question and answers entities represented in a SQL Server database as 2 tables Questions and Answers (see below). The relationship between them is OneToMany.

Questions table

Id      Title
-------------------
 1      Question 1
 2      Question 2

Answers table:

Id    Answer        Question_Id
-------------------------------
1     answer 1      1
2     answer 2      1
3     answer 3      1
4     answer 4      2
5     answer 5      2

I would like to get the ES document with a structure provided below, after moving the data through the Logstash pipeline:

{
  “questionId": 1,
  "questionTitle": "Question 1",
  "questionAnswers": [
    {
      “answerId": 1,
      "answer": "answer 1"
    },
    {
      "answerId": 2,
      "answer": "answer 2"
    },
    {
      "answerId": 3,
      "answer": "answer 3"
    }
  ]
}

{
  "questionId": 2,
  "questionTitle": "Question 2",
  "questionAnswers": [
    {
      "answerId": 4,
      "answer": "answer 4"
    },
    {
      "answerId": 5,
      "answer": "answer 5"
    }
  ]
}

The logstash jdbc input plugin setup uses the Question_Answers view to retrieve the data.

{
  jdbc {
    type => “Test_1”
    jdbc_connection_string => "jdbc:sqlserver://myinstance:1433"
    jdbc_user => “root”
    jdbc_password => “root”
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_driver_library => "/home/abury/enu/mssql-jdbc-6.2.2.jre8.jar"
    schedule => "*/3 * * * *"
    statement => "SELECT * from Question_Answers"
  }
}

The result set returned by view looks like this:

questionId  questionTitle   answerId    answer
1           Question 1      1           answer 1
1           Question 1      2           answer 2
1           Question 1      3           answer 3
2           Question 2      4           answer 4
2           Question 2      5           answer 5

The Elasticsearch output plugin setup looks like this:

output {
    elasticsearch {
    hosts => "http://localhost:9200"
    index => "question"
    document_id => "%{questionId}"
  }
}

Question: How can I setup Logstash to identify records related to the same question and build ES document with desired structure provided above? Is it possible to add some aggregation logic into the output.conf file to achieve desired behavior? Or I need to re-write my DB view to return single record for each question:

questionId  questionTitle   answerId    answer
---------------------------------------------------------------------
1           Question 1      1, 2, 3     answer 1, answer 2, answer 3

UPDATED: fix typo in column names


Solution

  • I was able to get the desired structure of Elasticsearch document by using the logstash aggregate filter plugin (see, Example 4):

    filter {
        aggregate {
            task_id => "%{questionId}"
            code => "
                   map['questionId'] ||= event.get('questionid')
                   map['questionTitle'] ||= event.get('questiontitle')
                   
                   map['questionAnswers'] ||= []
                   map['questionAnswers'] << {'answerId' => event.get('answerid'), 'answer' => event.get('answer')}
    
                   event.cancel()
                 "
            push_previous_map_as_event => true
            timeout => 3
        }
    }