
Django-haystack search static content


My Django 1.10 app provides search functionality using Haystack + Elasticsearch. It works great for model data, but I need to make it work for static content too (basically HTML files).

I was thinking of scraping the content from the HTML files (with BeautifulSoup?) and saving it to the database, so that the templates' content could be indexed.
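The scraping step of that idea might look roughly like the sketch below. It is only an illustration, not the approach taken by haystack-static-pages: to stay dependency-free it swaps BeautifulSoup for the standard library's `html.parser`, and the names `_TextExtractor` and `extract_page_content` are hypothetical. It pulls out the `<title>` and the visible text while skipping `<script>`/`<style>` contents. With BeautifulSoup, `soup.get_text(" ", strip=True)` gives a similar result.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects the <title> and visible text of a page."""

    # Elements whose contents should not end up in the search index.
    SKIP_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in data
        self.title = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script>/<style>: ignore
        if self._in_title:
            self.title += data
        else:
            self.chunks.append(data)


def extract_page_content(html):
    """Return (title, text) for an HTML string, with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join chunks with spaces so adjacent block elements don't fuse words.
    text = " ".join(" ".join(parser.chunks).split())
    return " ".join(parser.title.split()), text
```

The returned `title` and `text` could then be stored in a model (for example a hypothetical `StaticPage` with `url`, `title` and `content` fields) and indexed by a regular Haystack `SearchIndex`, just like the existing model data. Joining chunks with spaces can split a word wrapped in inline markup (`Hel<b>lo</b>`), which is usually an acceptable trade-off for full-text search.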

I found this module, which does exactly what I need, but it seems to be deprecated:

https://github.com/trapeze/haystack-static-pages

So, what's the best way to allow Haystack to find the content included in HTML pages?


Solution

  • I forked the haystack-static-pages module and adapted it to my needs. It is now compatible with Django 1.10 + Haystack 2.5 and supports logging in to scrape pages that require authentication :)

    Updated version: https://github.com/pisapapiros/haystack-static-pages