solr dismax

Solr 4.1 dismax pf not returning expected results


I am using Solr 4.1 with qt=dismax. I also have a similar setup running on Solr 1.4.

When I query Solr 4.1 with a pf parameter, the documents with matching phrases are not ranked at the top of the results. With my previous Solr 1.4 installation I was getting correct results, i.e. documents that contain the phrase ranked higher than documents that do not.

In solrconfig.xml I have this configuration:

    <requestHandler name="dismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="echoParams">explicit</str>
        <float name="tie">1.0</float>
      </lst>
    </requestHandler>

My query looks like this:

qt=dismax&q=product%20manager&qf=summ_svc_descr+skills+past_proj_tag+past_proj_name+past_proj_descr+login_name+business_name+primary_state+primary_country+primary_city+tagline+dtl_svc_descr+keywords+about_us+parent_cat_name+experience+credentials+past_cat_name+groups+company_login_name+company_business_name&fl=dtl_svc_descr+uniq_id,login_name,login_userid,parent_cat_name,parent_cat_id,net_score,business_name,business_name_sort,primary_state,primary_country,primary_city,primary_zip,reviews_positive_12mos,reviews_12mos,feedback_avg_12mos,earnings_12mos,reviews_positive_6mos,reviews_6mos,feedback_avg_6mos,earnings_6mos,earnings_overall,tagline,summ_svc_descr,hourly_rate,is_individual,user_id,score,tier_seller_id,file_upload_id,file_upload_name,new_provider,is_team,team_cnt,skill_ids,skills,portfolio_yn,jobs_accepted_12mos,is_agent,company_userid,company_login_name,company_business_name,available_y&pf=summ_svc_descr^1.2+skills^1.8+past_proj_tag+past_proj_name+past_proj_descr+experience+credentials+tagline^1.8+dtl_svc_descr^1.2+keywords+about_us^1.2&rows=25&start=0&wt=json

When I checked the debug output, I saw that the parsed query does include the phrase clauses too:

parsedquery_toString: "+(((skills:product | about_us:product | keywords:product | past_proj_name:product | past_proj_descr:product | past_cat_name:product | summ_svc_descr:product | past_proj_tag:product | company_login_name:product | parent_cat_name:product | business_name:product | login_name:product | company_business_name:product | credentials:product | experience:product | dtl_svc_descr:product | primary_state:product | primary_country:product | primary_city:product | groups:product | tagline:product)~1.0 (skills:manag | about_us:manag | keywords:manag | past_proj_name:manag | past_proj_descr:manag | past_cat_name:manag | summ_svc_descr:manag | past_proj_tag:manag | company_login_name:manag | parent_cat_name:manag | business_name:manag | login_name:manag | company_business_name:manag | credentials:manag | experience:manag | dtl_svc_descr:manag | primary_state:manager | primary_country:manager | primary_city:manager | groups:manag | tagline:manag)~1.0)~2) (skills:"product manag"~1^1.8 | about_us:"product manag"~1^1.2 | keywords:"product manag"~1 | past_proj_name:"product manag"~1 | past_proj_descr:"product manag"~1 | summ_svc_descr:"product manag"~1^1.2 | past_proj_tag:"product manag"~1 | experience:"product manag"~1 | credentials:"product manag"~1 | dtl_svc_descr:"product manag"~1^1.2 | tagline:"product manag"~1^1.8)~1.0"
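For anyone who wants to reproduce this kind of debug output, here is a minimal Python sketch that builds a comparable request URL with debugQuery=true. The base URL, the shortened qf/pf lists, and the helper name build_debug_url are assumptions made for illustration; they are not taken from the original setup.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed base URL; replace with your own Solr host and core path.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_debug_url(q, qf, pf, rows=25):
    """Build a dismax request URL that also asks Solr for debug output.

    qf and pf are lists of field names, optionally with ^boost suffixes.
    (Hypothetical helper, not part of any Solr client library.)
    """
    params = {
        "qt": "dismax",
        "q": q,
        "qf": " ".join(qf),   # dismax expects whitespace-separated fields
        "pf": " ".join(pf),
        "fl": "uniq_id,score",
        "rows": rows,
        "wt": "json",
        "debugQuery": "true",  # adds parsedquery and per-document explain
    }
    # urlencode percent-encodes '^' and turns spaces into '+'.
    return SOLR_SELECT + "?" + urlencode(params)


url = build_debug_url(
    "product manager",
    qf=["skills", "tagline", "summ_svc_descr"],
    pf=["skills^1.8", "tagline^1.8", "summ_svc_descr^1.2"],
)
print(url)
```

The explain section of the response is the part worth reading here: it breaks each document's score down per clause, including the fieldNorm value used for every matching field.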


Solution

  • I found the issue. Posting the answer for everyone's benefit.

    There is nothing wrong with the pf argument or with the parsed query itself. The ranking problem was just a symptom of a deeper-rooted issue.

    A custom similarity class was defined in the schema file (added by the developer who worked on this before me), and it was causing the fieldNorm to be 0 for a lot of documents. Thanks to the detailed debugQuery output, I was able to find the issue, and I also figured out how to configure per-field similarity. I had tried switching to the default similarity class provided by Solr earlier, but that did not fix the ranking because I had not reindexed the documents. Had I reindexed them, it would have been clear much sooner that the custom similarity was the culprit.

    Solr uses the Similarity class at index time as well as at query time: the length norms that show up as fieldNorm are computed and stored when a document is indexed. So whenever you change the similarity class in the schema, you will most likely need to reindex all your documents for the new Similarity class to take full effect.
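To make the fieldNorm symptom concrete, here is a small Python sketch of the classic Lucene TF-IDF arithmetic as it appears in debugQuery explain output, where a clause's fieldWeight is the product of tf, idf, and fieldNorm. The document counts and norm values are made up for illustration, and the custom similarity class from the question may well have used different formulas. The sketch also simplifies phrase scoring by treating the phrase as a single term.

```python
import math


def tf(freq):
    # Classic TF-IDF: square root of the term (or phrase) frequency.
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    # Classic TF-IDF: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, num_docs, field_norm):
    # fieldWeight as shown in explain output: tf * idf * fieldNorm.
    return tf(freq) * idf(doc_freq, num_docs) * field_norm


# Hypothetical numbers: the same phrase match in two documents,
# one indexed with a sane norm and one whose stored norm is 0.
healthy = field_weight(freq=1, doc_freq=50, num_docs=10000, field_norm=0.25)
broken = field_weight(freq=1, doc_freq=50, num_docs=10000, field_norm=0.0)

print("healthy fieldWeight:", healthy)
print("broken fieldWeight: ", broken)
```

Because fieldNorm is a plain multiplier, a stored value of 0 cancels the match in that field no matter how large the pf boost is. Since the norm is written at index time, swapping the similarity class without reindexing leaves those zeros in place.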