Tags: php, regex, search, sphinx

Sphinx doesn't return matches when querying with an ampersand, dot or space


When I search for the text C&A, Sphinx returns 0 results even though C&A is indexed. Searching for the letter 'C' does return C&A, which means C&A is already in the index.

I think the problem is that Sphinx doesn't treat & as a word character, so it's treated as a word separator instead.
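To illustrate the separator behaviour described above, here is a simplified PHP model of charset_table-based tokenizing. It is an assumption for illustration only, not Sphinx's actual tokenizer: any character outside the set of word characters splits the text. The function name `tokenize` is hypothetical.

```php
<?php
// Simplified model (assumption): characters that are not word characters
// act as separators, as with a charset_table that does not list them.
function tokenize(string $text, string $extraWordChars = ''): array
{
    // Word characters: ASCII letters, digits, underscore, plus any extras.
    $pattern = '/[^a-z0-9_' . preg_quote($extraWordChars, '/') . ']+/';
    // Lowercase first, mimicking the A..Z->a..z mapping in charset_table.
    $lower = strtolower($text);
    return preg_split($pattern, $lower, -1, PREG_SPLIT_NO_EMPTY);
}

// & is a separator: yields the two tokens "c" and "a".
print_r(tokenize('C&A'));
// & is a word character: yields the single token "c&a".
print_r(tokenize('C&A', '&'));
```

In this model, C&A only survives as one token when & is listed as a word character, which is why adding U+0026 to charset_table is the first thing to try.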

What I've tried so far

  1. Added & (U+0026) to charset_table:

       charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+0026

  2. Used the API's EscapeString() function: $escaped = $cl->EscapeString("escaping-sample@query/string");

  3. Tried custom code to escape the characters with str_replace($from, $to, $string).

Nothing seems to work. How do I change this behaviour in Sphinx?

Using Sphinx version: 2.0.4


Solution

  • After reading through the Sphinx documentation, I couldn't find any way to solve this, so I worked around it in PHP. Here is what I did:

    1. Used REPLACE() in the SQL index query to replace the special characters with equivalent text:

      SELECT id, REPLACE(REPLACE(REPLACE(name, '&', 'and'), ' ', 'space'), '-', 'hyphen').....

    2. In the user query, I replaced the same characters with the same equivalent text used in the SQL:

      // Decode HTML entities from the input (e.g. "&amp;" becomes "&")
      $text = html_entity_decode($text);
      
      // Replace every occurrence of &, space and hyphen (including one at
      // the very start of the string) with the same text used in the SQL
      // index query, so the query matches the indexed tokens.
      $text = str_replace(
          array('&', ' ', '-'),
          array('and', 'space', 'hyphen'),
          $text
      );
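Because the index query and the user query must apply exactly the same replacements, one option is to keep the mapping in a single place and derive both from it. The sketch below assumes the three replacements shown above; the constant `SPECIAL_CHAR_TOKENS` and the functions `buildSqlReplace` and `normalizeQuery` are hypothetical names, not part of Sphinx or its PHP API.

```php
<?php
// Hypothetical shared map from special character to replacement text,
// so the indexer SQL and the query normalisation cannot drift apart.
// None of the characters or tokens contain quotes, so the SQL built
// below needs no extra escaping.
const SPECIAL_CHAR_TOKENS = [
    '&' => 'and',
    ' ' => 'space',
    '-' => 'hyphen',
];

// Builds a nested SQL REPLACE() expression for the given column.
function buildSqlReplace(string $column): string
{
    $expr = $column;
    foreach (SPECIAL_CHAR_TOKENS as $char => $token) {
        $expr = "REPLACE($expr, '$char', '$token')";
    }
    return $expr;
}

// Applies the same replacements to the user's query string.
function normalizeQuery(string $text): string
{
    $text = html_entity_decode($text);
    // strtr() replaces every occurrence of each key in one pass.
    return strtr($text, SPECIAL_CHAR_TOKENS);
}

echo buildSqlReplace('name'), "\n";
echo normalizeQuery('C&amp;A'), "\n";
```

With the lowercasing mapping in charset_table, the normalised query CandA is then matched as canda, the same token the indexer produced.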
      

    Taking the ampersand example: when a user queries for C&A, the query is rewritten to CandA, Sphinx takes it as canda and returns the C&A match as expected.

    Note: In my case, Sphinx had already indexed all the special characters; the problem only occurred while querying.

    EDIT: Updating Sphinx to the latest version seems to have solved this problem. Use blend_chars in your index configuration.
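The Sphinx documentation describes blended characters as being indexed both as valid characters and as separators, so with & blended, "AT&T" produces the keywords "at&t", "at" and "t". It is set per index, for example `blend_chars = &, U+2E` (U+2E is the dot), followed by a reindex. The PHP function below is a simplified model of that default behaviour for illustration only; the ASCII-only word set and the name `blendedKeywords` are assumptions.

```php
<?php
// Simplified model of blend_chars (assumption, not Sphinx's tokenizer):
// each word is emitted once with blend chars kept, then once per part
// with blend chars treated as separators. Duplicates are removed.
function blendedKeywords(string $text, string $blendChars): array
{
    $lower = strtolower($text);
    if ($blendChars === '') {
        // No blend chars: plain split on non-word characters.
        return preg_split('/[^a-z0-9_]+/', $lower, -1, PREG_SPLIT_NO_EMPTY);
    }
    $quoted = preg_quote($blendChars, '/');
    // Split on anything that is neither a word character nor a blend char.
    $words = preg_split('/[^a-z0-9_' . $quoted . ']+/', $lower, -1, PREG_SPLIT_NO_EMPTY);

    $keywords = [];
    foreach ($words as $word) {
        // Keyword with blend chars treated as valid characters.
        $keywords[] = $word;
        // Keywords with blend chars treated as separators.
        $parts = preg_split('/[' . $quoted . ']+/', $word, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($parts as $part) {
            $keywords[] = $part;
        }
    }
    return array_values(array_unique($keywords));
}

print_r(blendedKeywords('AT&T', '&'));
print_r(blendedKeywords('C&A shop', '&'));
```

In this model, both a query for C&A and a query for C alone can match, without rewriting special characters in SQL or PHP.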