lucene hibernate-search query-analyzer

Hibernate Search with AnalyzerDiscriminator - Analyzer called only when creating Entity?


Can you help me?

I am implementing Hibernate Search to provide a global search on a localized website (Portuguese and English content).

To do this, I have followed the steps in the Hibernate Search docs: http://docs.jboss.org/hibernate/search/4.5/reference/en-US/html_single/#d0e4141

Along with the specific configuration in the entity itself, I have implemented a "LanguageDiscriminator" class, following the instructions in this doc.

I am not getting the results I expected (e.g. my entity has the text "Capuchinho" stored, but when I search for "capucho" I get no hits), so I decided to debug the execution to check whether the analyzers I configured are being used at all.

When creating a new record for the entity in the database, I can see that the "getAnalyzerDefinitionName()" method of the "LanguageDiscriminator" gets called. Great. But the same does not happen when I execute a search. Can anyone explain why?

I am posting the key parts of my code below. Thanks a lot for any feedback!

This is one entity I want to index:

@Entity
@Table(name="NEWS_HEADER")
@Indexed
@AnalyzerDefs({
@AnalyzerDef(name = "en",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = SnowballPorterFilterFactory.class, 
                            params = {@Parameter(name="language", value="English")}
            )
        }
),
@AnalyzerDef(name = "pt",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = SnowballPorterFilterFactory.class, 
                            params = {@Parameter(name="language", value="Portuguese")}
            )
        }
)
})
public class NewsHeader implements Serializable {

static final long serialVersionUID = 20140301L;

private int         id;
private String          articleHeader;
private String          language;
private Set<NewsParagraph>  paragraphs = new HashSet<NewsParagraph>();

/**
 * @return the id
 */
@Id
@Column(name="ID")
@GeneratedValue(strategy=GenerationType.AUTO)
@DocumentId
public int getId() {
    return id;
}
/**
 * @param id the id to set
 */
public void setId(int id) {
    this.id = id;
}
/**
 * @return the articleHeader
 */
@Column(name="ARTICLE_HEADER")
@Field(index=Index.YES, store=Store.NO)
public String getArticleHeader() {
    return articleHeader;
}
/**
 * @param articleHeader the articleHeader to set
 */
public void setArticleHeader(String articleHeader) {
    this.articleHeader = articleHeader;
}
/**
 * @return the language
 */
@Column(name="LANGUAGE")
@Field
@AnalyzerDiscriminator(impl=LanguageDiscriminator.class)
public String getLanguage() {
    return language;
}
...
}

This is my LanguageDiscriminator class:

public class LanguageDiscriminator implements Discriminator {

@Override
public String getAnalyzerDefinitionName(Object value, Object entity, String field) {

    String result = null;

    if (value != null) {
        result = (String) value;
    }
    return result;
}

}

This is the search method in my SearchDAO:

public List<NewsHeader> searchParagraph(String patternStr) {

    Session session = null;

    Transaction tx;

    List<NewsHeader> result = null;

    try {
        session = sessionFactory.getCurrentSession();
        FullTextSession fullTextSession = Search.getFullTextSession(session);
        tx = fullTextSession.beginTransaction();

        // Create native Lucene query using the query DSL
        QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
            .buildQueryBuilder().forEntity(NewsHeader.class).get();

        org.apache.lucene.search.Query luceneSearchQuery = queryBuilder
            .keyword()
            .onFields("articleHeader", "paragraphs.content")
            .matching(patternStr)
            .createQuery();

        // Wrap Lucene query in a org.hibernate.Query
        org.hibernate.Query hibernateQuery = 
            fullTextSession.createFullTextQuery(luceneSearchQuery, NewsHeader.class, NewsParagraph.class);

        // Execute search
        result = hibernateQuery.list();

        // Commit the transaction opened above; without this it is never completed
        tx.commit();

    } catch (Exception xcp) {
        logger.error(xcp);
    } finally {

        if ((session != null) && (session.isOpen())) {
            session.close();
        }
    }
    return result;
}

Solution

  • When creating a new record for the entity in the database, I can see that the "getAnalyzerDefinitionName()" method from the "LanguageDiscriminator" gets called. Great. But the same does not happen when I execute a search. Can anyone explain me why?
    

    The selection of the analyzer depends on the state of a given entity, in your case NewsHeader. During indexing you are dealing with entity instances, so the discriminator has something to inspect. When querying you have no entities to start with; you are searching for them. Which analyzer would you expect Hibernate Search to select for your query?

    That said, I think there is a shortcoming in the DSL: it does not allow you to explicitly specify the analyzer for a class. There is ignoreAnalyzer, but that's not what you want. You could create a feature request in the Search issue tracker: https://hibernate.atlassian.net/browse/HSEARCH

    In the meantime you can build the query using the native Lucene query API. However, you will need to know which language you are targeting with your query (for example via the preferred language of the logged-in user). This will depend on your use case. It might be that you are looking at the wrong feature to start with.
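
As a rough illustration of the "know which language you are targeting" part, here is a minimal plain-Java sketch that picks one of the analyzer definition names from the question ("en" or "pt") based on the user's preferred languages. Everything else is an assumption made for the example: the class name QueryLanguageResolver, the method analyzerNameFor, the idea of reading the preference from an HTTP Accept-Language style string, and English as the fallback. It does not touch Hibernate Search itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Hibernate Search or Lucene.
public final class QueryLanguageResolver {

    // Languages for which the question defines analyzers ("en" and "pt").
    private static final List<Locale> SUPPORTED =
            Arrays.asList(Locale.ENGLISH, Locale.forLanguageTag("pt"));

    // Assumption: fall back to the English analyzer when nothing matches.
    private static final String DEFAULT_ANALYZER = "en";

    private QueryLanguageResolver() {
    }

    /**
     * Maps a language preference string such as "pt-PT,pt;q=0.9,en;q=0.5"
     * to the name of the analyzer definition to use at query time.
     */
    public static String analyzerNameFor(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return DEFAULT_ANALYZER;
        }
        List<Locale.LanguageRange> ranges;
        try {
            // Parses the ranges and orders them by descending weight.
            ranges = Locale.LanguageRange.parse(acceptLanguage);
        } catch (IllegalArgumentException e) {
            // Malformed preference string: use the fallback.
            return DEFAULT_ANALYZER;
        }
        // RFC 4647 lookup: "pt-PT" is truncated to "pt" if needed.
        Locale best = Locale.lookup(ranges, SUPPORTED);
        return best == null ? DEFAULT_ANALYZER : best.getLanguage();
    }
}
```

The returned name could then be used to fetch the matching analyzer, for example through SearchFactory.getAnalyzer(String), and that analyzer handed to a Lucene query parser when building the query by hand. How you obtain the preference (request header, user profile, site section) depends on your use case.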