
How to index the word positions of a text file in Lucene


I am new to Lucene and am following a tutorial that uses these classes:

1- Indexer.java indexes the raw data so that it can be searched with the Lucene library.

2- LuceneConstants.java provides the constants used across the sample application.

3- Searcher.java searches the index created by Indexer for the requested content.

4- TextFileFilter.java is a file filter that accepts only .txt files.

5- LuceneTester.java tests the indexing and search capabilities of the Lucene library.

Now I am trying to index a field with word positions (a term vector with positions) in Indexer.java and retrieve those positions through a query in LuceneTester.java. Can anyone help me?


Solution

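  •     // Imports for Lucene 3.6, the version this code targets (Version.LUCENE_36).
    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermFreqVector;
    import org.apache.lucene.index.TermPositionVector;
    import org.apache.lucene.index.TermVectorOffsetInfo;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // Method of the searching class; indexDir is assumed to be a field holding the index path.
    public void doSearch(String querystr) throws IOException, ParseException {

       // 1. analyzer and index directory
       StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);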
    
       Directory index = FSDirectory.open(new File(indexDir));  
    
       // 2. query  
    
       Query q = new QueryParser(Version.LUCENE_36, LuceneConstants.Term_Vector_Position, analyzer).parse(querystr);  
    
    
       // 3. search  
       int hitsPerPage = 10;  
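       // Open one read-only reader and share it with the searcher,
       // so the docIds from the search match the reader used for term vectors.
       IndexReader reader = IndexReader.open(index, true);
       IndexSearcher searcher = new IndexSearcher(reader);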
       searcher.setDefaultFieldSortScoring(true, true);  
       TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);  
       searcher.search(q, collector);  
       ScoreDoc[] hits = collector.topDocs().scoreDocs;  
    
       // 4. display term positions, and term indexes   
       System.out.println("Found " + hits.length + " hits.");  
       for (int i = 0; i < hits.length; ++i) {
    
           int docId = hits[i].doc;  
           System.out.println("docId:" + docId);
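           // "TVP" must be the name of the field that was indexed with term vectors
           // (positions and offsets); getTermFreqVector returns null otherwise.
           TermFreqVector tfvector = reader.getTermFreqVector(docId, "TVP");
           if (!(tfvector instanceof TermPositionVector)) {
               System.out.println("No term positions stored for docId " + docId);
               continue;
           }
           TermPositionVector tpvector = (TermPositionVector) tfvector;
           System.out.println("tfvector " + tfvector + " tpvector" + tpvector);
           // indexOf expects the term exactly as it was indexed (StandardAnalyzer
           // lowercases tokens) and returns -1 if the term is not in this vector.
           int termidx = tfvector.indexOf(querystr);
           System.out.println("termidx " + termidx);
           if (termidx < 0) {
               continue;
           }
           int[] termposx = tpvector.getTermPositions(termidx);
           TermVectorOffsetInfo[] tvoffsetinfo = tpvector.getOffsets(termidx);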
    
         for (int j=0;j<termposx.length;j++) {  
              System.out.println("termpos at j :"+j + ": " +termposx[j]);  
          } 
    
          System.out.println("tvoffsetinfo " + tvoffsetinfo.length);
    
          for (int j=0;j<tvoffsetinfo.length;j++) {  
              int offsetStart = tvoffsetinfo[j].getStartOffset();  
              int offsetEnd = tvoffsetinfo[j].getEndOffset();  
              System.out.println("offsets : "+offsetStart+" "+offsetEnd);  
          }  
    
          Document d = searcher.doc(docId);  
          System.out.println((i + 1) + ". " + d.get("filepath"));
    
       }
    
       // The searcher and reader can only be closed once there
       // is no need to access the documents any more.
       searcher.close();
       reader.close();
    

    }
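
The search code above only finds positions if the field was indexed with a term vector that stores them. In Lucene 3.x that means passing `Field.TermVector.WITH_POSITIONS_OFFSETS` to the `Field` constructor in Indexer.java. The tutorial's Indexer.java is not shown here, so where exactly that goes is an assumption. To help read the output, here is a minimal plain-Java sketch, with no Lucene dependency, of what a term position and a pair of character offsets mean. The class `TermPositionSketch` and its tokenizer are hypothetical and simplified: they split on non-alphanumeric characters and lowercase each token, but do no stop-word handling, so the numbers may differ from what StandardAnalyzer produces on real text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration of term positions and offsets; not part of Lucene. */
public class TermPositionSketch {

    /** One occurrence of a term: its token position and [start, end) character offsets. */
    public static final class Occurrence {
        public final int position;
        public final int startOffset;
        public final int endOffset;

        Occurrence(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Returns every occurrence of term in text, compared case-insensitively. */
    public static List<Occurrence> occurrences(String text, String term) {
        List<Occurrence> result = new ArrayList<>();
        String wanted = term.toLowerCase(Locale.ROOT);
        int position = 0; // index of the current token, counted from 0
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip separators (anything that is not a letter or digit).
            while (i < n && !Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            // Consume one token.
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String token = text.substring(start, i).toLowerCase(Locale.ROOT);
            if (token.equals(wanted)) {
                result.add(new Occurrence(position, start, i));
            }
            position++;
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Lucene indexes text. Lucene stores positions.";
        for (Occurrence o : occurrences(text, "lucene")) {
            System.out.println("position " + o.position
                    + ", offsets " + o.startOffset + "-" + o.endOffset);
        }
    }
}
```

For the sample sentence, "lucene" is token 0 (characters 0 to 6) and token 3 (characters 21 to 27). These correspond to the values the solution prints from `getTermPositions` and from `getStartOffset` / `getEndOffset`.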