php, full-text-search, sphinx, sphinxql, manticore-search

Manticore - sphinxQL GROUP BY duplicated grouped id


When I use the GROUP BY syntax in Manticore, the results contain a duplicated grouped id. We've just migrated from Sphinx 2.x to the latest Manticore, and Sphinx did not have this problem with the same query.

This is the sphinxQL query:

SELECT model_id, model_root, model_name FROM search WHERE model_id != 0 GROUP BY model_root WITHIN GROUP ORDER BY model_level ASC ORDER BY model_level ASC, model_occurrence DESC, model_name ASC LIMIT 0, 13

The results are grouped by model_root, yet the same key appears twice, at indexes 10 and 11 (Canon). This is not what I expected.

This is the result:

array:13 [▼
  0 => array:3 [▼
    "model_id" => "62763"
    "model_root" => "62763"
    "model_name" => "HP"
  ]
  1 => array:3 [▼
    "model_id" => "72771"
    "model_root" => "72771"
    "model_name" => "Sony"
  ]
  2 => array:3 [▼
    "model_id" => "72524"
    "model_root" => "72524"
    "model_name" => "Compaq"
  ]
  3 => array:3 [▼
    "model_id" => "62783"
    "model_root" => "62783"
    "model_name" => "Samsung"
  ]
  4 => array:3 [▼
    "model_id" => "62760"
    "model_root" => "62760"
    "model_name" => "Asus"
  ]
  5 => array:3 [▼
    "model_id" => "62761"
    "model_root" => "62761"
    "model_name" => "Toshiba"
  ]
  6 => array:3 [▼
    "model_id" => "85086"
    "model_root" => "85086"
    "model_name" => "Panasonic"
  ]
  7 => array:3 [▼
    "model_id" => "151763"
    "model_root" => "151763"
    "model_name" => "Acer"
  ]
  8 => array:3 [▼
    "model_id" => "72548"
    "model_root" => "72548"
    "model_name" => "Packard Bell"
  ]
  9 => array:3 [▼
    "model_id" => "62762"
    "model_root" => "62762"
    "model_name" => "Lenovo"
  ]
  10 => array:3 [▼
    "model_id" => "83072"
    "model_root" => "83072"
    "model_name" => "Canon"
  ]
  11 => array:3 [▼
    "model_id" => "83072"
    "model_root" => "83072"
    "model_name" => "Canon"
  ]
  12 => array:3 [▼
    "model_id" => "73476"
    "model_root" => "73476"
    "model_name" => "LG"
  ]
]

What I expected:

array:13 [▼
  0 => array:3 [▼
    "model_id" => "62763"
    "model_root" => "62763"
    "model_name" => "HP"
  ]
  1 => array:3 [▼
    "model_id" => "72771"
    "model_root" => "72771"
    "model_name" => "Sony"
  ]
  2 => array:3 [▼
    "model_id" => "72524"
    "model_root" => "72524"
    "model_name" => "Compaq"
  ]
  3 => array:3 [▼
    "model_id" => "62783"
    "model_root" => "62783"
    "model_name" => "Samsung"
  ]
  4 => array:3 [▼
    "model_id" => "62760"
    "model_root" => "62760"
    "model_name" => "Asus"
  ]
  5 => array:3 [▼
    "model_id" => "62761"
    "model_root" => "62761"
    "model_name" => "Toshiba"
  ]
  6 => array:3 [▼
    "model_id" => "85086"
    "model_root" => "85086"
    "model_name" => "Panasonic"
  ]
  7 => array:3 [▼
    "model_id" => "151763"
    "model_root" => "151763"
    "model_name" => "Acer"
  ]
  8 => array:3 [▼
    "model_id" => "72548"
    "model_root" => "72548"
    "model_name" => "Packard Bell"
  ]
  9 => array:3 [▼
    "model_id" => "62762"
    "model_root" => "62762"
    "model_name" => "Lenovo"
  ]
  10 => array:3 [▼
    "model_id" => "83072"
    "model_root" => "83072"
    "model_name" => "Canon"
  ]
  11 => array:3 [▼
    "model_id" => "73476"
    "model_root" => "73476"
    "model_name" => "LG"
  ]
  12 => array:3 [▼
    "model_id" => "73266"
    "model_root" => "73266"
    "model_name" => "Fujitsu"
  ]
]

This is the index definition:

index search
{
  type = plain
  source = search
  path = /var/lib/manticore/data/search
  min_word_len = 1
  dict = keywords
  min_prefix_len = 1
  index_field_lengths = 1
  charset_table = 0..9,non_cjk,-,.,/,"
}

And these are the relevant fields in the source definition:

sql_attr_uint = model_id
sql_attr_uint  = model_root
sql_field_string = model_name

Any ideas what is wrong with the query or the index definition?


Solution

  • I've reproduced your issue. Yes, Manticore's behaviour differs in this case, and most likely the default max_matches value (1000) is not enough here, unlike in Sphinx 2.x. For the test you provided, max_matches=1025 should be enough (while in Sphinx 2.2 it's 892). For your production case, please experiment to find the optimal value yourself.

    Please read about how max_matches affects grouping results here: https://docs.manticoresearch.com/latest/html/searching/grouping_clustering_search_results.html
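
    As a minimal sketch, the query from the question can raise the limit per query through the OPTION clause. The value 1025 is only the one that was enough for the test data above; a production index will most likely need a different value, found by experiment.

```sql
-- Same query as in the question, with max_matches raised for this query only.
-- 1025 is the value that worked for the test data; treat it as a starting point.
SELECT model_id, model_root, model_name
FROM search
WHERE model_id != 0
GROUP BY model_root
WITHIN GROUP ORDER BY model_level ASC
ORDER BY model_level ASC, model_occurrence DESC, model_name ASC
LIMIT 0, 13
OPTION max_matches=1025;
```

    If the duplicated group is still returned, try increasing the value further and re-running the query.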