jsonelasticsearch

How to 'join' two indexes in ElasticSearch


I have two indexes that must be separated:

// index = `order_item`
{
    "ID": 1,
    "Name": "Shoes",
    "Price": 9.99,
    "OrderID": 82
},{
    "ID": 1,
    "Name": "Hat",
    "Price": 19.99,
    "OrderID": 82
}

// index = `order`
{
    "ID": 82,
    "Customer": "John Smith"
}

How would I 'join' these two tables on a search, such that it would return something along the lines of:

results = {
    "ID": 1,
    "Name": "Shoes",
    "Price": 9.99,
    "Order.ID": 82,
    "Customer": "John Smith"
},{
    "ID": 1,
    "Name": "Hat",
    "Price": 19.99,
    "Order.ID": 82,
    "Customer": "John Smith"
}

Solution

  • As answered in your other question, nothing prevents you from storing the Customer name inside each order_item document at indexing time, while still having a dedicated index orders also containing the Customer data. Remember that it's all about cleverly denormalizing your data so that each of your documents be as "self-contained" as you need.

    curl -XPUT localhost:9200/order_items/order_item/1 -d '{
        "ID": 1,
        "Name": "Shoes",
        "Price": 9.99,
        "OrderID": 82,
        "Customer": "John Smith"
    }'
    
    curl -XPUT localhost:9200/order_items/order_item/2 -d '{
        "ID": 2,
        "Name": "Hat",
        "Price": 19.99,
        "OrderID": 82,
        "Customer": "John Smith"
    }
    

    The advantages of this solution is that each order item is completely self-contained, and you can group/aggregate them on OrderID in order to get all items of a given order.

    Also, as @JohnAment mentioned in his comment, the order/order_item use case is also a good candidate for using either

    1. the parent/child relationship
    2. or nested objects.

    In the first case, you'd have one order "parent" document...

    curl -XPUT localhost:9200/orders/order/82 -d '{
        "ID": 82,
        "Customer": "John Smith"
    }'
    

    And several order_item "children" documents that you index using their parent ID:

    curl -XPUT localhost:9200/order_items/order_item/1?parent=82 -d '{
         "ID": 1,
         "Name": "Shoes",
         "Price": 9.99
    }'
    curl -XPUT localhost:9200/order_items/order_item/2?parent=82 -d '{
         "ID": 2,
         "Name": "Hat",
         "Price": 19.99
    }'
    

    In the second case, your order document would contain all order items in a nested OrderItems property and would look like this:

    curl -XPUT localhost:9200/orders/order/82 -d '{
        "ID": 82,
        "Customer": "John Smith",
        "OrderItems": [
          {
            "ID": 1,
            "Name": "Shoes",
            "Price": 9.99
          },{
            "ID": 2,
            "Name": "Hat",
            "Price": 19.99
          }
        ]
    }'