
Explain Apache SOLR boost function


I'm trying to implement logic in Apache Solr so that documents older than 2 years get a penalty based on their age in days or months.

I am using this boost function, which I got after googling a lot.

 recip(ms(NOW,publicationDate),3.16e-11,1,1) // Currently it is set to use 1 year

Can anyone please confirm whether this penalizes old documents?

Thanks


Solution

  • recip(x,m,a,b) is a reciprocal function that computes a/(m*x+b), where m, a and b are constants and x is any numeric field or an arbitrarily complex function.


    With your parameters, the function looks like this:

    f(x) = 1 /(3.16e-11*x + 1)
    

    The ms function returns the difference between its arguments in milliseconds, so ms(a,b) is a minus b.

    Dates are relative to the Unix or POSIX time epoch, midnight, January 1, 1970 UTC.

    Suppose your publication date is September 1st, 2015. With NOW = 1507725936061 and publicationDate = 1441065600000, ms returns 66660336061, so the result is 1/(3.16e-11 * 66660336061 + 1), or about 0.32, and that is the score for this document.

    For a publication date of yesterday we get a score of about 0.99, and for the same day 1 year ago the score is about 0.5. So this formula applies a penalty to every document according to its age, not only to the ones older than 2 years.

    To leave recent documents untouched, one option is to sort by a conditional function like this (starting from Solr 6):

    if(gt(ms(publicationDate,NOW-2YEARS),0),1,recip(ms(NOW,publicationDate),3.16e-11,1,1))
    

    I didn't test it (I'm not sure about the NOW-2YEARS part), but basically I'm doing this:

    if publicationDate - (NOW-2YEARS) is greater than 0
        => the document is less than 2 years old, so the score is 1.0
    else
        => the score is the reciprocal function
    

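    One last remark: there are about 3.16e10 milliseconds in a year, so multiplying by the inverse, 3.16e-11, scales a date difference to fractions of a year. With that constant the score drops to 0.5 at an age of 1 year. If you want it to reach 0.5 at 2 years instead, you could use half of it, about 1.58e-11.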
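
To check the numbers in this answer outside of Solr, here is a minimal Python sketch that mirrors recip and ms and evaluates both the plain boost and the conditional variant. It is only an approximation for experimenting: the helper names (age_boost, thresholded_boost) are made up for illustration, NOW-2YEARS is approximated as 730 days, and Solr's own date math may round slightly differently.

```python
from datetime import datetime, timedelta, timezone

# Milliseconds in an average year (365.25 days), roughly 3.16e10.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def recip(x, m, a, b):
    """Same formula as Solr's recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)


def ms(a, b):
    """Same idea as Solr's ms(a,b): milliseconds of a minus b."""
    return (a - b).total_seconds() * 1000.0


def age_boost(now, publication_date, m=3.16e-11):
    """Plain boost: recip(ms(NOW,publicationDate),m,1,1)."""
    return recip(ms(now, publication_date), m, 1, 1)


def thresholded_boost(now, publication_date, m=3.16e-11, cutoff_days=730):
    """Conditional variant: 1.0 for recent documents, recip otherwise.

    Assumption: NOW-2YEARS is approximated as 730 days here.
    """
    cutoff = now - timedelta(days=cutoff_days)
    if ms(publication_date, cutoff) > 0:
        return 1.0
    return age_boost(now, publication_date, m)


if __name__ == "__main__":
    # NOW taken from the example in the answer (epoch milliseconds).
    now = EPOCH + timedelta(milliseconds=1507725936061)
    samples = {
        "2015-09-01": datetime(2015, 9, 1, tzinfo=timezone.utc),
        "one year ago": now - timedelta(days=365),
        "yesterday": now - timedelta(days=1),
    }
    for label, date in samples.items():
        plain = age_boost(now, date)
        cond = thresholded_boost(now, date)
        print(f"{label}: plain={plain:.3f} conditional={cond:.3f}")

    # Constant that makes the plain boost reach 0.5 at an age of 2 years.
    print(f"m for 2 years: {1 / (2 * MS_PER_YEAR):.3e}")
```

Note that the conditional variant jumps from 1.0 down to roughly a third as soon as a document crosses the 2-year mark, because the reciprocal is still measured from NOW. If a smooth decay is preferred, changing the constant m may be the simpler option.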