node.js mongodb mongoose text-search

Mongoose text-search with partial string


Hi, I'm using Mongoose to search for persons in my collection.

/*Person model*/
{
    name: {
       first: String,
       last: String
    }
}

Now I want to search for persons with a query:

let regex = new RegExp(QUERY,'i');

Person.find({
   $or: [
      {'name.first': regex},
      {'name.last': regex}
   ]
}).exec(function(err,persons){
  console.log(persons);
});

If I search for John I get results (even if I search for Jo). But if I search for John Doe I get no results, obviously, since neither field on its own contains that string.

If I change QUERY to John|Doe I get results, but it returns all persons who have either John or Doe in their first or last name.

Next I tried Mongoose text search:

First, add the fields to a text index:

PersonSchema.index({
   name: {
      first: 'text',
      last: 'text'
   }
},{
   name: 'Personsearch index',
   weights: {
      name: {
         first: 10,
         last: 10
      }
   }
});

Then modify the Person query:

Person.find({ 
    $text : { 
        $search : QUERY
    } 
},
{ score:{$meta:'textScore'} })
.sort({ score : { $meta : 'textScore' } })
.exec(function(err,persons){
    console.log(persons);
});

This works just fine! But now it only returns persons that match a whole first or last name:

-> John returns value

-> Jo returns no value

Is there a way to solve this?

Answers without external plugins are preferred, but others are welcome too.


Solution

  • You can do this with an aggregate pipeline that concatenates the first and last names together using $concat and then searches against that:

    let regex = new RegExp(QUERY,'i');
    
    Person.aggregate([
        // Project the concatenated full name along with the original doc
        {$project: {fullname: {$concat: ['$name.first', ' ', '$name.last']}, doc: '$$ROOT'}},
        {$match: {fullname: regex}}
    ], function(err, persons) {
        // Extract the original doc from each item
        persons = persons.map(function(item) { return item.doc; });
        console.log(persons);
    });
    

    Performance is a concern, however: the regex is matched against a computed field, so the query can't use an index and requires a full collection scan.

    You can mitigate that by preceding the $project stage with a $match query that can use an index to reduce the set of docs the rest of the pipeline needs to look at.

    So if you separately index name.first and name.last and then take the first word of your search string as an anchored query (e.g. /^John/i), you could prepend the following to the beginning of your pipeline:

    {$match: {$or: [
      {'name.first': /^John/i},
      {'name.last': /^John/i}
    ]}}
    

    Obviously you'd need to programmatically generate that "first word" regex, but hopefully it gives you the idea.
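
As a rough sketch of how the "first word" regex and the full pipeline could be generated from a raw search string: the helper names `escapeRegex` and `buildNameSearchPipeline` below are hypothetical, not part of Mongoose or MongoDB. The sketch assumes the query should be matched literally, so regex metacharacters in user input are escaped and runs of whitespace are collapsed to a single space. It only builds the pipeline array, so it runs without a database connection.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds the aggregation pipeline for a raw search string.
function buildNameSearchPipeline(query) {
  const trimmed = query.trim();
  if (trimmed === '') {
    throw new Error('query must not be empty');
  }

  // Split on any whitespace so "John   Doe" behaves like "John Doe".
  const words = trimmed.split(/\s+/).map(escapeRegex);

  // Anchored regex on the first word, used by the pre-filter $match.
  const prefix = new RegExp('^' + words[0], 'i');
  // Unanchored regex on the whole query, matched against the full name.
  const full = new RegExp(words.join(' '), 'i');

  return [
    // Stage 1: narrow the candidate set using the separately indexed fields.
    { $match: { $or: [{ 'name.first': prefix }, { 'name.last': prefix }] } },
    // Stage 2: compute the full name and keep the original document.
    { $project: { fullname: { $concat: ['$name.first', ' ', '$name.last'] }, doc: '$$ROOT' } },
    // Stage 3: match the complete query against the full name.
    { $match: { fullname: full } }
  ];
}
```

The result would be passed to `Person.aggregate(buildNameSearchPipeline(QUERY))`, with the original documents extracted from `item.doc` as shown earlier. Two caveats: the pre-filter means the first word of the query must be a prefix of the first or last name, so a query such as `ohn Do` no longer matches; and MongoDB's documentation notes that case-insensitive regexes generally use indexes less effectively than case-sensitive prefix expressions, so the benefit of the first stage is worth measuring on real data.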