mysqlsql-serverperformancequery-performanceinverse

SQL How to get categories that don't contain products?


SELECT DISTINCT {c2.name} 
FROM 
    {BrandCategory AS c  
    JOIN CategoryProductRelation AS c2p  
    ON  {c.pk} = {c2p.source}  
    JOIN Product AS p  
    ON  {c2p.target} = {p.pk}  
    JOIN CategoryProductRelation AS c2p2  
    ON  {p.pk} = {c2p2.target}  
    JOIN TaxonomyCategory AS c2 ON {c2.pk} = {c2p2.source} 
    JOIN CatalogVersion AS cat 
    ON {c.catalogVersion} = {cat.PK}} 
WHERE 
    {c.code} = 'brand-MCH' 
    AND{cat.version} = 'Online' 
    AND {c2.code} NOT LIKE'%webFamily%' 
    AND {p.code} IN ('35365','34299')

Above query pulls out Taxonomy Categories that contain products '35365' and '34299'

Result:
Passenger
Touring
All-Season
Truck

All Categories:
All-Season
Performance
Passenger
Winter
Touring
Summer
Truck
All-Terrain
Competition
Lawn

Now requirement is to pull out all other Taxonomy Categories, meaning those categories that dont contain products '35365' and '34299'

Attempt 1:(Fail)(NOT IN)

SELECT  DISTINCT {c2.name}
    FROM  {BrandCategory AS c
    JOIN  CategoryProductRelation AS c2p  ON {c.pk} = {c2p.source}
    JOIN  Product AS p  ON {c2p.target} = {p.pk}
    JOIN  CategoryProductRelation AS c2p2  ON {p.pk} = {c2p2.target}
    JOIN  TaxonomyCategory AS c2  ON {c2.pk} = {c2p2.source}
    JOIN  CatalogVersion AS cat  ON {c.catalogVersion} = {cat.PK}}
    WHERE  {c.code} = 'brand-MCH' AND{cat.version} = 'Online'
      AND  {c2.code} NOT LIKE'%webFamily%'
      AND  {p.code} NOT IN ('35365','34299') 

Attempt 2:(Fail)(NOT EXISTS)

SELECT  DISTINCT {c2.name}
    FROM  {BrandCategory AS c
    JOIN  CategoryProductRelation AS c2p  ON {c.pk} = {c2p.source}
    JOIN  Product AS p  ON {c2p.target} = {p.pk}
    JOIN  CategoryProductRelation AS c2p2  ON {p.pk} = {c2p2.target}
    JOIN  TaxonomyCategory AS c2  ON {c2.pk} = {c2p2.source}
    JOIN  CatalogVersion AS cat  ON {c.catalogVersion} = {cat.PK}}
    WHERE  {c.code} = 'brand-MCH' AND{cat.version} = 'Online'
      AND  {c2.code} NOT LIKE'%webFamily%'
      AND  {c2.code}
      AND  NOT EXISTS ({{
        SELECT  DISTINCT {c3.name}
            FROM  {BrandCategory AS c
            JOIN  CategoryProductRelation AS c2p  ON {c.pk} = {c2p.source}
            JOIN  Product AS p  ON {c2p.target} = {p.pk}
            JOIN  CategoryProductRelation AS c2p2  ON {p.pk} = {c2p2.target}
            JOIN  TaxonomyCategory AS c3  ON {c3.pk} = {c2p2.source}
            JOIN  CatalogVersion AS cat  ON {c.catalogVersion} = {cat.PK}}
            WHERE  {c3.name}={c2.name}
              AND  {c.code} = 'brand-MCH' AND{cat.version} = 'Online'
              AND  {c3.code} NOT LIKE'%webFamily%'
              AND  {p.code} IN ('35365','34299')}}
                      ) 

Attempt 3:(Fail)(NOT IN SUBQUERY)

SELECT  DISTINCT {c2.name}
    FROM  {BrandCategory AS c
    JOIN  CategoryProductRelation AS c2p  ON {c.pk} = {c2p.source}
    JOIN  Product AS p  ON {c2p.target} = {p.pk}
    JOIN  CategoryProductRelation AS c2p2  ON {p.pk} = {c2p2.target}
    JOIN  TaxonomyCategory AS c2  ON {c2.pk} = {c2p2.source}
    JOIN  CatalogVersion AS cat  ON {c.catalogVersion} = {cat.PK}}
    WHERE  {c.code} = 'brand-MCH' AND{cat.version} = 'Online'
      AND  {c2.code} NOT LIKE'%webFamily%'
      AND  {c2.name} NOT IN ({{
        SELECT  DISTINCT {c2.name}
            FROM  {BrandCategory AS c
            JOIN  CategoryProductRelation AS c2p  ON {c.pk} = {c2p.source}
            JOIN  Product AS p  ON {c2p.target} = {p.pk}
            JOIN  CategoryProductRelation AS c2p2  ON {p.pk} = {c2p2.target}
            JOIN  TaxonomyCategory AS c2  ON {c2.pk} = {c2p2.source}
            JOIN  CatalogVersion AS cat  ON {c.catalogVersion} = {cat.PK}}
            WHERE  {c.code} = 'brand-MCH' AND{cat.version} = 'Online'
              AND  {c2.code} NOT LIKE'%webFamily%'
              AND  {p.code} IN ('35365','34299')}}
                          ) 

...

Result:
All-Season
Performance
Passenger
Winter
Touring
Summer
Truck
All-Terrain
Competition
Lawn

Expected:
Performance
Winter
Summer
All-Terrain
Competition
Lawn

Please help out on getting those categories that don't contain with a optimal query keeping performance in check.

Also, if there is a way to get all categories with some flag in result to differentiate what product contains true or not, that would be absolute Gold, because here we are hitting DB twice to get categories that contains and then categories that don't contain through a on-demand call

Note: These are essentially SQL queries, but just slightly modified with those braces to support Flexible Search Query format in Hybris Framework


Solution

  • Queries-
    1. Using NOT IN-
    SELECT {c.code} FROM {Product as p join CategoryProductRelation as c2p on {c2p.target} = {p.pk} join Category as c on {c2p.source} = {c.pk}} WHERE {p.code} NOT IN ('35365','34299')

    1. Using NOT EXISTS-
      SELECT {c.code} FROM {CategoryProductRelation as c2p join Category as c on {c2p.source} = {c.pk}} WHERE NOT EXISTS ({{SELECT {p.code} FROM {Product as p} WHERE {p.code} IN ('35365','34299') and {c2p.target}={p.pk}}})

    2. Using subquery-
      SELECT {c.code} FROM {Product as p join CategoryProductRelation as c2p on {c2p.target} = {p.pk} join Category as c on {c2p.source} = {c.pk}} WHERE {p.pk} IN ({{SELECT {pk} FROM {Product} WHERE {code} IN ('35365','34299')}})