cypheraggregation

Cypher - Return rows where aggregation(count or sum) is greater than 1


Newbie and I have a graph with 'employees' WORKING AT 'retailers', and 'customers' SHOPPING at 'retailers'. Some of these transactions which the customers do are 'disputed.' I am trying to figure out a list of employees who work with more than 1 retailer where a transaction has been marked as 'disputed'

Here is what I have written which gives me good results, but contains all employees who work at 1 store as well. I want to filter-out these employees who work at 1 store.

MATCH (employees:Employee)-[worklocation:WORKS_AT]->(retailer:Retailer)
WITH employees, retailer,  COUNT(DISTINCT worklocation) AS Count_Worklocations  
  WHERE  (retailer)<-[:SHOPPED_AT {status : "Disputed"}]-(:Customer) 
RETURN employees.name, sum(Count_Worklocations) AS WORKPLACES
order by WORKPLACES DESC

Results:

╒═════════════════╤══════════╕
│employees.name   │WORKPLACES│
╞═════════════════╪══════════╡
│"Irvin Clayton"  │3         │
├─────────────────┼──────────┤
│"Ricky Bond"     │2         │
├─────────────────┼──────────┤
│"Carmen Dixon"   │2         │
├─────────────────┼──────────┤
│"Bryon Ramos"    │2         │
├─────────────────┼──────────┤
│"Seth Snow"      │1         │
├─────────────────┼──────────┤
│"Donny Pollard"  │1         │
├─────────────────┼──────────┤
│"Isaac Mendez"   │1         │
├─────────────────┼──────────┤
│"Sonny Horn"     │1         │
├─────────────────┼──────────┤
│"Roxie Aguilar"  │1         │
├─────────────────┼──────────┤
│"Letha Hardy"    │1         │
├─────────────────┼──────────┤
│"Don Howe"       │1         │
├─────────────────┼──────────┤
│"Kelvin Haney"   │1         │
├─────────────────┼──────────┤
│"Denver Glover"  │1         │
├─────────────────┼──────────┤
│"Steven Carney"  │1         │
├─────────────────┼──────────┤
│"Kraig Hensley"  │1         │
├─────────────────┼──────────┤
│"Andrea Gallegos"│1         │
├─────────────────┼──────────┤
│"Lina Rivers"    │1         │
├─────────────────┼──────────┤
│"Deidre Duke"    │1         │
├─────────────────┼──────────┤
│"Jerold Mccarthy"│1         │
├─────────────────┼──────────┤
│"Malik Copeland" │1         │
└─────────────────┴──────────┘
MATCH (employee:Employee)-[Employee:WORKS_AT]->(retailer) 
WITH employee.name AS `Employee Name`, 
  collect(DISTINCT retailer.name) AS `Retailer Name`, 
  count(retailer.name) as cnt
WHERE cnt >1 
MATCH (employee:Employee)-[Employee:WORKS_AT]->(retailer),
  (customer:Customer)-  [transaction:SHOPPED_AT]->(retailer) 
WHERE transaction.status = "Disputed" 
RETURN DISTINCT `Employee Name`, `Retailer Name`

This gives me -

╒═══════════════╤════════════════════╕
│Employee Name  │Retailer Name       │
╞═══════════════╪════════════════════╡
│"Seth Snow"    │["Gap", "Target"]   │
├───────────────┼────────────────────┤
│"Roxie Aguilar"│["Gap", "BestBuy"]  │
├───────────────┼────────────────────┤
│"Ricky Bond"   │["BestBuy", "Nordstr│
│               │om"]                │
├───────────────┼────────────────────┤
│"Carmen Dixon" │["Coach", "Nordstrom│
│               │"]                  │
├───────────────┼────────────────────┤
│"Bryon Ramos"  │["Coach", "Foot Lock│
│               │er"]                │
├───────────────┼────────────────────┤
│"Irvin Clayton"│["Express", "Kohls",│
│               │ "Nordstrom"]       │
└───────────────┴────────────────────┘

However, this is wrong as no Disputed transactions were reported at retailer 'Gap' so Roxie Aguilar and Seth Snow have worked at only 1 store where 'Disputed' transaction was reported and their names should not be there.


Solution

  • You can get what you need by using an EXISTS subquery in the first MATCH and only include retailers that have disputes:

    MATCH (employee:Employee)-[:WORKS_AT]->(retailer:Retailer)
    WHERE EXISTS { (retailer)<-[:SHOPPED_AT {status: "Disputed"}]-(:Customer) } 
    WITH employee, collect(retailer) AS retailers, count(*) AS count WHERE count > 1
    RETURN employee.name AS `Employee Name`, 
          [retailer IN retailers | retailer.name] AS `Retailer Names`
    

    This return result columns as in your second example.