sphinxmanticore-search

SphinxQL match equivalent of MySQL LIKE %searchterm%


In my MySQL database I have this result querying my data:

mysql> select count(*) from emails where email like '%johndoe%';
+----------+
| count(*) |
+----------+
|      102 |
+----------+
1 row in set (15.50 sec)

My data is indexed under Sphinx (Manticore Search actually) with min_word_len = 1. Now, when I search with SphinxQL I get only partial results:

mysql> SELECT count(*) FROM search1 WHERE MATCH('@email johndoe') LIMIT 1000 OPTION max_matches=1000;
+----------+
| count(*) |
+----------+
|       16 |
+----------+
1 row in set (0.00 sec)

Any idea how to match the results MySQL give me? I tried SPH_MATCH_ANY and SPH_MATCH_EXTENDED with the sphinxapi, same results.


Solution

  • I suspect it's mainly due to whole word matching. Sphinx matches whole words. With 'words' defined as per charset_table http://sphinxsearch.com/docs/current/conf-charset-table.html

    ie MATCH('@email johndoe') is only matching addresses with johndoe in them as a word. default charset_table keeps . - and @ (common in emails!) all as separators so would match johndoe@domain.com or email@johndoe.com, but NOT email@myjohndoe.com, the word being indexed is myjohndoe not johndoe

    Whereas mysql 'LIKE' will happy match part words. eg email like '%johndoe%' would johndoesmith@domain.com, johndoes555@domain.com and 555@johndoes.comor whatever. Its a pure substring match.


    In short might want to tweak charset_table. could . - and @ all be word chars, so email would be whole word.


    alternatively might just enable part word matching with min_infix_len. http://sphinxsearch.com/docs/current.html#conf-min-infix-len

    then could do MATCH('@email *johndoe*') which would get much closer results.


    complementary to min_infix_len would be expand_keywords http://sphinxsearch.com/docs/current.html#conf-expand-keywords

    then the * wildcards would be added automatically, so could go back to MATCH('@email johndoe')