mysqlsphinx

sphinx, how correctly create sql_query for many joins


I have DB and related tables:

user

cars_posts userId -> FK for user.id

user_cars userId -> FK for user.id

categories

makes

models

generations

I have this sql_query:

sql_query = \
        SELECT \
            user.id, \
            user.username, \
            user.avatar, \
            cars_posts.id AS carsPostId, \
            cars_posts.title, \
            cars_posts.content, \
            cars_posts.searchContent, \
            cars_posts.coverUrl, \
            cars_posts.categoryId, \
            cars_posts.createdAt, \
            user_cars.id AS userCarId, \
            user_cars.makeId, \
            user_cars.modelId, \
            user_cars.generationId, \
            user_cars.engine, \
            user_cars.transmission, \
            user_cars.drive, \
            makes.name AS makeName, \
            generations.name AS generationName, \
            generations.defaultName as generationDefaultName, \
            models.name as modelName, \
            models.defaultName as modelDefaultName, \
            categories.name as categoryName \
        FROM user \
        LEFT JOIN user_cars ON user_cars.userId = user.id \
        LEFT JOIN cars_posts ON cars_posts.userId = user.id \
        LEFT JOIN makes ON makes.id = user_cars.makeId \
        LEFT JOIN generations on generations.id = user_cars.generationId \
        LEFT JOIN models on models.id = user_cars.modelId \
        LEFT JOIN categories on categories.id = cars_posts.categoryId \
        WHERE user.id>=$start AND user.id<=$end

MySql result for this query ( and other colums from select) : enter image description here

One user can have many posts or cars. But when i try to indexer data i have error like WARNING: 162175 duplicate document id pairs found. I understand why ( id duplicates ) , for sphinx must be unique ids.

How can i correctly create sql_query or what sphinx offers to solve such problems?

Thanks!


Solution

  • You either need to make it an index on the 'most numerous' table, the one that has the multiple rows per user. Possibly user_posts, then its own document per post, possibly using carPostId as the document_id. But as it seems not all users have posts, that might not work.

    ... or make sure that only have one row/document in the resultset. Typically with GROUP BY, so add ... GROUP BY user.id to the end of the sql_query But then need to cover the case when there may be multiple rows from some tables.

    So instead of just getting the title, would do something like GROUP_CONCAT(cars_posts.title) as title, which would man one aggratated field for each user, containing all their post titles. Would do something similar for all the columns from tables that (may) have mutliple rows per user.

    If really dont like using GROUP_CONCAT() there is also https://sphinxsearch.com/docs/manual-2.3.2.html#conf-sql-joined-field and https://sphinxsearch.com/docs/manual-2.3.2.html#mva which can be defined by a seperate sql query.