sqlsqliteintersectionset-theory

Haystack needles intersection - find set in haystack with most elements


I need to find out which price is valid for a feature

Sqlfiddle demo with sample records

From the haystack (the order) each feature (here F1) must be matched to a pricerule. The valid price is the one with the most features intersection.

Sample records and tables

This picture shows the mapping between pricerules and different orders. The price that is relevant for Feature F1 is based on the intersection between fetaures in the pricerule table and the features that are part of the order.

pricerules for feature F1 and orders with feature F1

Example

This example has three pricerules for feature F1 and three different orders.

This query is a is my attempt to join the records but many duplicates and wrong join records are present

SELECT op.OrderId, op.Id as OrderPositionId, 
       op.FeatureId, f.Featurename, pr.Price 
FROM OrderPositions op 
INNER JOIN FeatureCombinations fc ON op.FeatureId = fc.FeatureId
INNER JOIN Features f on op.FeatureId = f.Id
INNER JOIN PriceRules pr ON fc.PriceRuleId = pr.Id
ORDER BY op.OrderId, op.Id, f.Id;

Sqlfiddle demo with sample records

From the current result all records with (x) are wrong

OrderId OrderPositionId FeatureId Featurename Price
401 211 1 F1 6
401 211 x 1 F1 4
401 211 x 1 F1 2
402 221 x 1 F1 6
402 221 1 F1 4
402 221 x 1 F1 2
402 222 x 2 F2 4
402 222 x 2 F2 2
403 231 x 1 F1 6
403 231 x 1 F1 4
403 231 1 F1 2
403 232 x 2 F2 4
403 232 x 2 F2 2
403 233 x 3 F3 2

Question

How to determine which pricerule for a feature matches best (elements from order ∩ elements from pricerule)?

SQL Statemens to create the tables with records

Create tables

CREATE TABLE Features(
  Id    INTEGER PRIMARY KEY, 
  Featurename  TEXT
);
CREATE TABLE PriceRules(
  Id     INTEGER PRIMARY KEY, 
  Price  REAL   
);
CREATE TABLE FeaturesPriceRules(
  FeatureId   INTEGER, 
  PriceRuleId INTEGER,
  FOREIGN KEY(FeatureId) REFERENCES Features(Id),
  FOREIGN KEY(PriceRuleId) REFERENCES PriceRules(Id)
);

CREATE TABLE FeatureCombinations(
  PriceRuleId INTEGER,
  FeatureId   INTEGER,   
  FOREIGN KEY(PriceRuleId) REFERENCES PriceRules(Id),
  FOREIGN KEY(FeatureId) REFERENCES Features(Id)  
);

CREATE TABLE OrderPositions(
  Id INTEGER  PRIMARY KEY, 
  OrderId   INTEGER,   
  FeatureId   INTEGER,     
  FOREIGN KEY(FeatureId) REFERENCES Features(Id)  
);

Create records


INSERT INTO Features (Id, Featurename)
VALUES (1, 'F1'), (2, 'F2'), (3, 'F3'),
       (4, 'F6'), (5, 'F7'), (6, 'F9'),
       (7, 'E1'), (8, 'ZA2');

INSERT INTO PriceRules (Id, Price)
VALUES (101, '6.00'), (102, '4.00'), (103, '2.00'),
       (105, '24.00'), (106, '22.00');  
    
INSERT INTO FeaturesPriceRules (FeatureId, PriceRuleId)
VALUES (1, 101), (1, 102), (1, 103),(7, 105),(7, 106);  
               
INSERT INTO FeatureCombinations (PriceRuleId, FeatureId)
VALUES (101, 1), (102, 1), (102, 2),
       (103, 1), (103, 2),(103, 3), 
       (105, 7), (106, 7), (106, 8); 
       
INSERT INTO OrderPositions(Id, OrderId, FeatureId)
VALUES (211, 401, 1), (221, 402, 1),(222, 402, 2),
(231, 403, 1),(232, 403, 2),(233, 403, 3);         
   

See also this Sqlfiddle demo with sample records


Solution

  • You want to show all order positions. For this we join OrderPositions with Features. Now the only thing missing is the price that we can get in a correlated subquery:

    select
      op.orderid,
      op.id as orderpositionid,
      op.featureid,
      f.featurename,
      (...) as price
    from orderpositions op
    join features f on f.id = op.featureid
    order by op.orderid, op.id;
    

    To get the price is a bit tricky, because we must look at all the features in the order to get the price rule that covers most features in the order. This requires an aggregation of the feature set joined with the order features per rule. All features required in the rule must be matched. The matching rules - if there is more than one - must be ranked, so we use an ORDER BY clause: In case of ties, we want the rule that requires most features (order by count of features in descending order), and if we still get a tie, e.g. an order with features F1, F2, F3 and two candidate rules on [F1, F2] and [F1, F3], this algorithm will pick one of them arbitrarily.

    This is the subquery: We aggregate by rule and count the number of matching features in that rule. Then we order by the match count descendingly. For ties we want to get the rule that requires less matches, so we order by the total feature count, too.

    select pr.price
    from featurespricerules fpr
    join PriceRules pr on pr.id = fpr.PriceRuleId
    join featurecombinations fc on fc.priceruleid = pr.id
    left join OrderPositions op2 on op2.orderid = op.orderid
                                and op2.featureid = fc.featureid
    where fpr.featureid = op.featureid
    group by pr.id
    having count(op2.id) = count(*)
    order by count(*) desc
    limit 1
    

    The complete query:

    select
      op.orderid,
      op.id as orderpositionid,
      op.featureid,
      f.featurename,
      (
         select pr.price
         from featurespricerules fpr
         join PriceRules pr on pr.id = fpr.PriceRuleId
         join featurecombinations fc on fc.priceruleid = pr.id
         left join OrderPositions op2 on op2.orderid = op.orderid
                                     and op2.featureid = fc.featureid
         where fpr.featureid = op.featureid
         group by pr.id
         having count(op2.id) = count(*)
         order by count(*) desc
         limit 1
      ) as price
    from orderpositions op
    join features f on f.id = op.featureid
    order by op.orderid, op.id;
    

    For your sample data I get

    OrderId orderpositionid FeatureId Featurename price
    401 211 1 F1 6
    402 221 1 F1 4
    402 222 2 F2 (null)
    403 231 1 F1 2
    403 232 2 F2 (null)
    403 233 3 F3 (null)

    Your sample data doesn't contain price rules for F2 and F3, so we don't know their respective prices.

    Demo: http://sqlfiddle.com/#!5/740cf/113

    Update

    new Fiddle URL with filter NULL out