I am working on a django application with sqlite3 database, that has a fixed database content. By fixed I mean the content of the db won't change over time. The model is something like this:
class QScript(models.Model):
ch_no = models.IntegerField()
v_no = models.IntegerField()
v = models.TextField()
There are around 6500 records in the table. Given a text that may have some words missing, or some words misspelled, I need to determine its ch_no
and v_no
. For example, if there is a v
field in db with text "This is an example verse", a given text like "This an egsample verse" should give me the ch_no
and v_no
from db. This can be done using Full text search I believe.
can full-text search do this? My guess from what I have studied, it can, as said in sqlite3 page: full-text searches is "what Google, Yahoo, and Bing do with documents placed on the World Wide Web". Cited in SO, I read this article too, along with many others, but didn't find anything that closely matches my requirements.
How to use FTS in django models? I read this but it didn't help. It seems too outdated. Read here that: "...requires direct manipulation of the database to add the full-text index". Searching gives mostly MySQL related info, but I need to do it in sqlite3. So how to do that direct manipulation in sqlite3?
Is my choice of sticking to sqlite3 correct? Or should I use something different (like haystack+elasticsearch as said by Alex Morozov)? My db will not grow any larger, and I have studied that for small sized db, sqlite is almost always better (my situation matches the fourth in sqlite's when to use checklist).
I think that while sqlite is an amazing piece of software, its full-text search capabilities are quite limited. Instead you could index your database using Haystack Django app with some backend like Elasticsearch. Having this setup (and still being able to access your sqlite database) seems to me the most robust and flexible way in terms of FTS.
Elasticsearch has a fuzzy search based on the Levenshtein distance (in a nutshell, it would handle your "egsample" queries). So all you need is to make a right type of query:
from haystack.forms import SearchForm
from haystack.generic_views import SearchView
from haystack import indexes
class QScriptIndex(indexes.SearchIndex, indexes.Indexable):
v = indexes.CharField(document=True)
def get_model(self):
return QScript
class QScriptSearchForm(SearchForm):
text_fuzzy = forms.CharField(required=False)
def search(self):
sqs = super(QScriptSearchForm, self).search()
if not self.is_valid():
return self.no_query_found()
text_fuzzy = self.cleaned_data.get('text_fuzzy')
if text_fuzzy:
sqs = sqs.filter(text__fuzzy=text_fuzzy)
return sqs
class QScriptSearchView(SearchView):
form_class = QScriptSearchForm
Update: As long as PostgreSQL has the Levenshtein distance function, you could also leverage it as the Haystack backend as well as a standalone search engine. If you choose the second way, you'll have to implement a custom query expression, which is relatively easy if you're using a recent version of Django.