
Adding fields to PDF files using SolrJ


I am new to Solr. I am having a problem adding fields/metadata to PDF files while indexing them in Solr using ContentStreamUpdateRequest. Since the literal parameter must be used to add fields, I tried the following:

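import java.io.File;
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public static void indexFilesSolrCell(String fileName, String solrId, int i, String name,
                                      String category, String loc, String locat)
        throws IOException, SolrServerException {
    //...
    // Send the PDF to the Solr Cell (ExtractingRequestHandler) endpoint
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    up.addFile(new File(fileName));
    // literal.<field>=<value> sets <field> on the indexed document
    up.setParam("literal.id", solrId);
    up.setParam("literal.name", name);
    up.setParam("literal.url_file", loc);
    up.setParam("literal.location", locat);
    up.setParam("literal.Category", category);
    //..
}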

The PDF file gets indexed in Solr, but not all of the fields passed as literals are created. Only the following fields are created:

  1. id
  2. name
  3. Category

However, it does not create fields such as url_file, path or location, and at times it does not create the Category field either.

From what I have read, any arbitrary field can be added as metadata using a literal parameter. So why are fields like id, name or even blah_s always created, while an arbitrary field like the ones above is not?

Do I have to declare these fields somewhere else as well? Any help is greatly appreciated.

Update: Doesn't calling up.setParam("literal.myField", value) modify schema.xml to create a new field?


Solution

  • That's because you're using the Solr example schema, which doesn't contain the url_file and location fields. You can find schema.xml under example/solr/conf. I suggest cleaning it up a little and keeping only the fields you need, since that schema contains a lot of fields you don't really need.

    The blah_s field gets created because the schema you're using contains the following definition:

    <dynamicField name="*_s" type="string"  indexed="true"  stored="true"/>
    

    It's a dynamic field with the suffix _s, which means every field whose name ends in _s is treated by Solr as a string field that is both indexed and stored.
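    To see why id, name and blah_s are accepted while url_file and location are not, here is a simplified model of the lookup. It is a sketch, not Solr's actual code: a name matches if it is declared as a <field>; otherwise the longest matching <dynamicField> pattern (with a single leading or trailing *) wins. The class name SchemaFieldCheck and the inline schema fragment are hypothetical; only the JDK's built-in XML parser is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical, simplified model of how a field name is matched against schema.xml:
 * explicit <field> names first, then <dynamicField> patterns with a leading or
 * trailing '*', preferring the longest pattern. Not Solr's real implementation.
 */
public class SchemaFieldCheck {
    private final Set<String> fields = new HashSet<>();
    private final List<String> dynamicPatterns = new ArrayList<>();

    public SchemaFieldCheck(String schemaXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
            // getElementsByTagName matches the exact tag, so "field" does not pick up "dynamicField"
            NodeList fieldNodes = doc.getElementsByTagName("field");
            for (int k = 0; k < fieldNodes.getLength(); k++) {
                fields.add(((Element) fieldNodes.item(k)).getAttribute("name"));
            }
            NodeList dynNodes = doc.getElementsByTagName("dynamicField");
            for (int k = 0; k < dynNodes.getLength(); k++) {
                dynamicPatterns.add(((Element) dynNodes.item(k)).getAttribute("name"));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse schema XML", e);
        }
    }

    /** Returns the matching field name or dynamicField pattern, or null if nothing matches. */
    public String resolve(String fieldName) {
        if (fields.contains(fieldName)) {
            return fieldName;
        }
        String best = null;
        for (String p : dynamicPatterns) {
            boolean match = p.startsWith("*")
                    ? fieldName.endsWith(p.substring(1))
                    : p.endsWith("*") && fieldName.startsWith(p.substring(0, p.length() - 1));
            // Keep the longest matching pattern; on a tie, the first one declared wins
            if (match && (best == null || p.length() > best.length())) {
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String schema = "<schema><fields>"
                + "<field name=\"id\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
                + "<field name=\"name\" type=\"text_general\" indexed=\"true\" stored=\"true\"/>"
                + "<dynamicField name=\"*_s\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
                + "</fields></schema>";
        SchemaFieldCheck check = new SchemaFieldCheck(schema);
        for (String f : new String[] {"id", "name", "blah_s", "url_file", "location"}) {
            String rule = check.resolve(f);
            System.out.println(f + " -> " + (rule == null ? "no match (not in schema)" : rule));
        }
    }
}
```

    Running main reports id and name as explicit fields, blah_s as matching *_s, and url_file and location as having no match, which mirrors the behaviour described in the question.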

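    Calling setParam does not touch schema.xml: literal parameters only supply values for fields the schema already defines, either explicitly or through a dynamicField. To add fields, open schema.xml locally, edit the XML, then reload Solr. Remember that after a schema change you need to reindex by re-running the code you pasted in your question.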
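    If you would rather not edit schema.xml, the *_s dynamic field above offers a workaround: send undeclared fields with an _s suffix (for example literal.url_file_s) so Solr stores them as strings. The sketch below assumes the example schema's *_s definition is still present; the LiteralParams helper and the hand-maintained set of declared field names are hypothetical. It only builds the parameter names and values; each entry would then be passed to up.setParam(name, value).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper that builds Solr Cell "literal.*" parameters.
 * Field names not explicitly declared in schema.xml get an "_s" suffix,
 * assuming the schema still contains <dynamicField name="*_s" .../>.
 */
public class LiteralParams {
    public static Map<String, String> build(Map<String, String> values, Set<String> declaredFields) {
        Map<String, String> params = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : values.entrySet()) {
            // Keep declared names as-is; route everything else to the *_s dynamic field
            String field = declaredFields.contains(e.getKey()) ? e.getKey() : e.getKey() + "_s";
            params.put("literal." + field, e.getValue());
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("id", "doc1");
        values.put("name", "report");
        values.put("url_file", "/docs/report.pdf");
        values.put("location", "Berlin");
        Map<String, String> params = build(values, Set.of("id", "name"));
        // Each entry could then be passed to up.setParam(key, value) on the ContentStreamUpdateRequest
        params.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

    Running main prints literal.id=doc1, literal.name=report, literal.url_file_s=/docs/report.pdf and literal.location_s=Berlin, one per line. Keep in mind that string fields are indexed as a single untokenized value, so this suits identifiers and paths better than free text; for searchable text, declaring a proper field in schema.xml is still the better option.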