javascript json lunrjs

Why won't lunr index multi-word strings in JSON arrays?


Lunr is doing a great job finding most results, but I can't figure out why it won't return multi-word strings contained in JSON arrays.

Here's a sample JSON file to get a sense of how my data is structured:

[{
    "title": "Rolling Loud",
    "date": "May 5–7",
    "location": "Miami, FL, USA",
    "rock-artists": [],
    "hh-artists": ["Kendrick Lamar", "Future"],
    "electronic-artists": [],
    "other-artists": []
}]

When I search for "Miami" or "Future", lunr returns the festival. However, when I search for "Kendrick" or "Kendrick Lamar", lunr doesn't return the festival.

Relevant code:

// initialize lunr
var idx = lunr(function () {
    this.field('id');
    this.field('title', { boost: 3 });
    this.field('date');
    this.field('location');
    this.field('rockArtists', { boost: 3 });
    this.field('hhArtists', { boost: 3 });
    this.field('electronicArtists', { boost: 3 });
    this.field('otherArtists', { boost: 3 });

    // add festivals to lunr
    for (var key in data) {
        this.add({
           'id': key,
           'title': data[key].title,
           'date': data[key].date,
           'location': data[key].location,
           'rockArtists': data[key]['rock-artists'],
           'hhArtists': data[key]['hh-artists'],
           'electronicArtists': data[key]['electronic-artists'],
           'otherArtists': data[key]['other-artists']
        });
    }
});

Thanks!


Solution

  • Lunr is indexing the hh-artists field. You should be able to confirm this by looking for one of the values in the index:

    // lunr lowercases tokens before indexing, so look up the lowercased form
    idx.invertedIndex['kendrick lamar']
    

    When a document field is an array, lunr assumes that the elements of the array are already split into tokens for indexing. So instead of adding "Kendrick" and "Lamar" to the index as separate tokens, lunr adds "Kendrick Lamar" as a single token.

    This causes issues when searching, because the search string is split on spaces to get tokens, so a search for "Kendrick Lamar" is actually a search for "Kendrick" OR "Lamar". Neither "Kendrick" nor "Lamar" is in the index, so there are no results. "Future" works because it is a single word and matches its token exactly.

    To get the results you are hoping for, you can convert the array into a string and let lunr handle splitting it into tokens:

    this.add({
      // other fields omitted for brevity
      'hhArtists': data[key]['hh-artists'].join(' ')
    });
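
Applied to all four artist fields from the question, the same idea might look like the sketch below. The helper names `joinArtists`, `toLunrDoc` and `buildIndex` are made up for illustration and are not part of lunr. The sketch assumes `data` is the parsed array from the question and that `lunr` is passed in by the caller.

```javascript
// Hypothetical helper (not part of lunr): turn an array of names into one
// space-separated string so lunr's tokenizer can split it into words.
function joinArtists(artists) {
    return Array.isArray(artists) ? artists.join(' ') : '';
}

// Hypothetical helper: build the flat document that gets passed to this.add().
function toLunrDoc(festival, id) {
    return {
        id: id,
        title: festival.title,
        date: festival.date,
        location: festival.location,
        rockArtists: joinArtists(festival['rock-artists']),
        hhArtists: joinArtists(festival['hh-artists']),
        electronicArtists: joinArtists(festival['electronic-artists']),
        otherArtists: joinArtists(festival['other-artists'])
    };
}

// Hypothetical helper: build the index with the same fields and boosts as
// the question. `lunr` is taken as a parameter rather than required here.
function buildIndex(lunr, data) {
    return lunr(function () {
        this.ref('id');
        this.field('title', { boost: 3 });
        this.field('date');
        this.field('location');
        this.field('rockArtists', { boost: 3 });
        this.field('hhArtists', { boost: 3 });
        this.field('electronicArtists', { boost: 3 });
        this.field('otherArtists', { boost: 3 });

        // add festivals to lunr, using the array position as the ref
        data.forEach(function (festival, i) {
            this.add(toLunrDoc(festival, i));
        }, this);
    });
}
```

With this in place, `buildIndex(lunr, data).search('Kendrick')` should match the festival, since "Kendrick" and "Lamar" are now indexed as separate tokens. One trade-off of joining is that lunr no longer knows where one artist name ends and the next begins.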