pdf solr solrnet

Use Solr with PDF files


I want to use Solr with PDF files, but I don't know how to configure solrconfig.xml and schema.xml. What should I write in those files? The aim is full-text search with features such as synonyms or a spell checker. (I use Solr on Windows, and in the future I will use the SolrNet API.) Thank you!


Solution

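  • You would use Tika to extract the text from a PDF file. Solr exposes Tika through its ExtractingRequestHandler (Solr Cell), which is the handler behind the /update/extract path used below.

    Once Tika is configured, you issue an HTTP POST to Solr, specifying the PDF file you wish to index: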

    curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=doc1&commit=true' -F "myfile=@example/exampledocs/solr-word.pdf"

    If you need to map the fields Tika generates (title, author, content) to different fields in your Solr index, you can use the fmap feature:

    fmap.content=text would map Tika's extracted content field to Solr's text field.
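
As a rough sketch of how the two pieces fit together, the request above can carry the fmap parameter in the same query string. The helper name build_extract_url is hypothetical, the host and core come from the example above, and the target field text is assumed to exist in your schema; adjust all of these to your own setup.

```shell
#!/bin/sh
# Sketch only: host, port and core name are copied from the example above.
SOLR_BASE="http://localhost:8983/solr/techproducts"

# Hypothetical helper: builds the extract URL.
#   $1 = value for literal.id (the document id)
#   $2 = Solr field that should receive Tika's extracted "content"
build_extract_url() {
    printf '%s/update/extract?literal.id=%s&fmap.content=%s&commit=true' \
        "$SOLR_BASE" "$1" "$2"
}

# Post a file only when a path is given as the first argument.
if [ -n "${1:-}" ]; then
    curl "$(build_extract_url doc1 text)" -F "myfile=@$1"
fi
```

Saved as, say, index-pdf.sh (a made-up name), it could be run as `sh index-pdf.sh example/exampledocs/solr-word.pdf`. The ids and field names are passed in unescaped, so this only suits simple values without spaces or special characters.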