
Django-haystack document=True field


As far as I understand from reading about full-text search engines, a document is the entire entity we are searching for, not just a single field within it. So this "document=True field" approach seems somewhat confusing.

As the docs say:

This indicates to both Haystack and the search engine about which field is the primary field for searching within.

Ok, but here is my (quite common, I suppose) use case: there are two fields on the model, a title and some description. The title is definitely the primary one, and I'd like matches on that field to carry more weight than matches on the others.

There is a mechanism called Field Boost that should help achieve that goal, but the example in the documentation on that matter is even more confusing:

class NoteSearchIndex(indexes.SearchIndex, indexes.Indexable): 
    text = indexes.CharField(document=True, use_template=True) 
    title = indexes.CharField(model_attr='title', boost=1.125)

So we see that the 'title' field is boosted, but it doesn't have 'document=True' on it. Yet it is the primary field, and the previous quote said the primary field should have 'document=True'...

And also, what should I place into that 'document=True' field? Should it be some concatenation of all relevant fields on the model or maybe all but the 'title' field since I've already declared it separately?

I would appreciate a more precise explanation of what the 'document=True' field actually is.


Solution

  • Haystack builds its index by examining a “document” of text for each model instance. Commonly, that document needs to be aggregated from multiple model fields; if so, you can use a template.

    Every SearchIndex requires there be one (and only one) field with document=True. This indicates to both Haystack and the search engine about which field is the primary field for searching within.

    Note that it's not referring to a model field there! It is the SearchIndex which has the field with document=True. Conventionally, that field is named text.

    The SearchIndex does not point simply to the Model, as you're aware. You define a Schema that describes what fields will be indexed. This is different from what fields are on the Model!

    If you have this model:

    from django.db import models
    from django.contrib.auth.models import User
    
    
    class Note(models.Model):
        user = models.ForeignKey(User, on_delete=models.CASCADE)
        pub_date = models.DateTimeField()
        title = models.CharField(max_length=200)
        body = models.TextField()
    
        def __str__(self):
            return self.title
    

    You might design this SearchIndex, which specifies the schema of fields to index:

    import datetime
    from haystack import indexes
    from myapp.models import Note
    
    
    class NoteIndex(indexes.SearchIndex, indexes.Indexable):
        text = indexes.CharField(document=True, use_template=True)
        author = indexes.CharField(model_attr='user')
        pub_date = indexes.DateTimeField(model_attr='pub_date')
    
        def get_model(self):
            return Note
    
        def index_queryset(self, using=None):
            """Used when the entire index for model is updated."""
            return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())
    

    what should I place into that 'document=True' field? Should it be some concatenation of all relevant fields on the model or maybe all but the 'title' field since I've already declared it separately?

    Yes, a concatenation of all the relevant fields. Note that the template below includes the title too.

    That use_template=True argument tells Haystack that the field content should be rendered using a template specifically for that SearchIndex field. The template name Haystack looks for is search/indexes/{app_label}/{model_name}_{field_name}.txt.
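
As a rough illustration of that naming pattern, here is a minimal sketch in plain Python. The helper `index_template_name` is hypothetical and is not part of Haystack; it only fills in the pattern quoted above, lowercasing the model name as in the `note_text.txt` example.

```python
def index_template_name(app_label: str, model_name: str, field_name: str) -> str:
    """Fill in the search/indexes/{app_label}/{model_name}_{field_name}.txt pattern.

    Hypothetical helper for illustration only; Haystack resolves the
    template name internally and does not expose this function.
    """
    return f"search/indexes/{app_label}/{model_name.lower()}_{field_name}.txt"


# The app label, model name and index field name from the example in this answer.
path = index_template_name("myapp", "Note", "text")
print(path)
```

With the names used in this answer, the sketch prints search/indexes/myapp/note_text.txt.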

    In this case, where the field is NoteIndex.text, and the corresponding model is myapp.Note, Haystack looks for the template search/indexes/myapp/note_text.txt. Define that template so that it gathers all the relevant text into one document for the model instance:

    {{ object.title }}
    {{ object.user.get_full_name }}
    {{ object.body }}
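
To make the result concrete, the following sketch imitates in plain Python roughly what that template produces for one instance. The `User` and `Note` dataclasses are stand-ins for the Django models, and `render_document` is a hypothetical function; the real rendering is done by Django's template engine, so whitespace may differ slightly.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Stand-in for django.contrib.auth.models.User (assumption, no database)."""
    first_name: str
    last_name: str

    def get_full_name(self) -> str:
        return f"{self.first_name} {self.last_name}".strip()


@dataclass
class Note:
    """Stand-in for the Note model, keeping only the fields the template uses."""
    user: User
    title: str
    body: str


def render_document(note: Note) -> str:
    """Imitate the template: title, author's full name and body, one per line."""
    return "\n".join([note.title, note.user.get_full_name(), note.body])


note = Note(User("Ada", "Lovelace"), "Shopping list", "Milk, eggs, bread")
document = render_document(note)
print(document)
```

The single string in `document` is what ends up in the `text` field, so a search against that field can match words from the title, the author's name, or the body.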
    

    All of these examples come from the Haystack getting started tutorial.