excelduplicatesms-access-2013calculated-columnsexpressionbuilder

How to create a calculated column in access 2013 to detect duplicates


I'm recreating a tool I made in Excel as it's getting bigger and performance is getting out of hand.

The issue is that I only have MS Access 2013 on my work laptop and I'm fairly new to the Expression Builder in Access 2013, which has a very limited function base to be honest.

My data has duplicates on the [Location] column, meaning that, I have multiple SKUs on that warehouse location. However, some of my calculations need to be done only once per [Location]. My solution to that, in Excel, was to make a formula (see below) putting 1 only on the first appearance of that location, putting 0 on next appearances. Doing that works like a charm because summing over that [Duplicate] column while imposing multiple criteria returns the number of occurrences of the multiple criteria counting locations only once.

Now, MS Access 2013 Expression Builder has no SUM nor COUNT functions to create a calculated column emulating my [Duplicate] column from Excel. Preferably, I would just input the raw data and let Access populate the calculated fields vs also inputting the calculated fields as well, since that would defeat my original purpose of reducing the computational cost of creating my dashboard.

The question is, how would you create a calculated column, in MS Access 2013 Expression Builder to recreate the below Excel function:

= IF($D$2:$D3=$D4,0,1)

In the sake of reducing the file size (over 100K rows) I even replace the 0 by a blank character "".

Thanks in advance for your help

Y


Solution

  • First and foremost, understand MS Access' Expression Builder is a convenience tool to build an SQL expression. Everything in Query Design ultimately is to build an SQL query. For this reason, you have to use a set-based mentality to see data in whole sets of related tables and not cell-by-cell mindset.

    Specifically, to achieve:

    putting 1 only on the first appearance of that location, putting 0 on next appearances

    Consider a whole set-based approach by joining on a separate, aggregate query to identify the first value of your needed grouping, then calculate needed IIF expression. Below assumes you have an autonumber or primary key field in table (a standard in relational databases):

    Aggregate Query (save as a separate query, adjust columns as needed)

    SELECT ColumnD, MIN(AutoNumberID) As MinID
    FROM myTable
    GROUP BY ColumnD
    

    Final Query (join to original table and build final IIF expression)

    SELECT m.*, IIF(agg.MinID = AutoNumberID, 1, 0) As Dup_Indicator
    FROM myTable m
    INNER JOIN myAggregateQuery agg
       ON m.[ColumnD] = agg.ColumnD
    

    To demonstrate with random data:

    Original

    | ID | GROUP  | INT | NUM          | CHAR | BOOL  | DATE       |
    |----|--------|-----|--------------|------|-------|------------|
    | 1  | r      | 9   | 1.424490258  | B6z  | TRUE  | 7/4/1994   |
    | 2  | stata  | 10  | 2.591235683  | h7J  | FALSE | 10/5/1971  |
    | 3  | spss   | 6   | 0.560461966  | Hrn  | TRUE  | 11/27/1990 |
    | 4  | stata  | 10  | -1.499272175 | eXL  | FALSE | 4/17/2010  |
    | 5  | stata  | 15  | 1.470269177  | Vas  | TRUE  | 6/13/2010  |
    | 6  | r      | 14  | -0.072238898 | puP  | TRUE  | 4/1/1994   |
    | 7  | julia  | 2   | -1.370405263 | S2l  | FALSE | 12/11/1999 |
    | 8  | spss   | 6   | -0.153684675 | mAw  | FALSE | 7/28/1977  |
    | 9  | spss   | 10  | -0.861482674 | cxC  | FALSE | 7/17/1994  |
    | 10 | spss   | 2   | -0.817222582 | GRn  | FALSE | 10/19/2012 |
    | 11 | stata  | 2   | 0.949287754  | xgc  | TRUE  | 1/18/2003  |
    | 12 | stata  | 5   | -1.580841322 | Y1D  | TRUE  | 6/3/2011   |
    | 13 | r      | 14  | -1.671303816 | JCP  | FALSE | 5/15/1981  |
    | 14 | r      | 7   | 0.904181025  | Rct  | TRUE  | 7/24/1977  |
    | 15 | stata  | 10  | -1.198211174 | qJY  | FALSE | 5/6/1982   |
    | 16 | julia  | 10  | -0.265808162 | 10s  | FALSE | 3/18/1975  |
    | 17 | r      | 13  | -0.264955027 | 8Md  | TRUE  | 6/11/1974  |
    | 18 | r      | 4   | 0.518302149  | 4KW  | FALSE | 9/12/1980  |
    | 19 | r      | 5   | -0.053620183 | 8An  | FALSE | 4/17/2004  |
    | 20 | r      | 14  | -0.359197116 | F8Q  | TRUE  | 6/14/2005  |
    | 21 | spss   | 11  | -2.211875193 | AgS  | TRUE  | 4/11/1973  |
    | 22 | stata  | 4   | -1.718749471 | Zqr  | FALSE | 2/20/1999  |
    | 23 | python | 10  | 1.207878576  | tcC  | FALSE | 4/18/2008  |
    | 24 | stata  | 11  | 0.548902226  | PFJ  | TRUE  | 9/20/1994  |
    | 25 | stata  | 6   | 1.479125922  | 7a7  | FALSE | 3/2/1989   |
    | 26 | python | 10  | -0.437245299 | r32  | TRUE  | 6/7/1997   |
    | 27 | sas    | 14  | 0.404746106  | 6NJ  | TRUE  | 9/23/2013  |
    | 28 | stata  | 8   | 2.206741458  | Ive  | TRUE  | 5/26/2008  |
    | 29 | spss   | 12  | -0.470694096 | dPS  | TRUE  | 5/4/1983   |
    | 30 | sas    | 15  | -0.57169507  | yle  | TRUE  | 6/20/1979  |
    

    SQL (uses aggregate in subquery but can be a stored query)

    SELECT r.*, IIF(sub.MinID = r.ID,1, 0) AS Dup
    FROM Random_Data r
    LEFT JOIN 
       (
          SELECT r.GROUP, MIN(r.ID) As MinID
          FROM Random_Data r
          GROUP BY r.Group
       )  sub
    
    ON r.[Group] = sub.[GROUP] 
    

    Output (notice the first GROUP value is tagged 1, all else 0)

    | ID | GROUP  | INT | NUM          | CHAR | BOOL  | DATE       | Dup |
    |----|--------|-----|--------------|------|-------|------------|-----|
    | 1  | r      | 9   | 1.424490258  | B6z  | TRUE  | 7/4/1994   | 1   |
    | 2  | stata  | 10  | 2.591235683  | h7J  | FALSE | 10/5/1971  | 1   |
    | 3  | spss   | 6   | 0.560461966  | Hrn  | TRUE  | 11/27/1990 | 1   |
    | 4  | stata  | 10  | -1.499272175 | eXL  | FALSE | 4/17/2010  | 0   |
    | 5  | stata  | 15  | 1.470269177  | Vas  | TRUE  | 6/13/2010  | 0   |
    | 6  | r      | 14  | -0.072238898 | puP  | TRUE  | 4/1/1994   | 0   |
    | 7  | julia  | 2   | -1.370405263 | S2l  | FALSE | 12/11/1999 | 1   |
    | 8  | spss   | 6   | -0.153684675 | mAw  | FALSE | 7/28/1977  | 0   |
    | 9  | spss   | 10  | -0.861482674 | cxC  | FALSE | 7/17/1994  | 0   |
    | 10 | spss   | 2   | -0.817222582 | GRn  | FALSE | 10/19/2012 | 0   |
    | 11 | stata  | 2   | 0.949287754  | xgc  | TRUE  | 1/18/2003  | 0   |
    | 12 | stata  | 5   | -1.580841322 | Y1D  | TRUE  | 6/3/2011   | 0   |
    | 13 | r      | 14  | -1.671303816 | JCP  | FALSE | 5/15/1981  | 0   |
    | 14 | r      | 7   | 0.904181025  | Rct  | TRUE  | 7/24/1977  | 0   |
    | 15 | stata  | 10  | -1.198211174 | qJY  | FALSE | 5/6/1982   | 0   |
    | 16 | julia  | 10  | -0.265808162 | 10s  | FALSE | 3/18/1975  | 0   |
    | 17 | r      | 13  | -0.264955027 | 8Md  | TRUE  | 6/11/1974  | 0   |
    | 18 | r      | 4   | 0.518302149  | 4KW  | FALSE | 9/12/1980  | 0   |
    | 19 | r      | 5   | -0.053620183 | 8An  | FALSE | 4/17/2004  | 0   |
    | 20 | r      | 14  | -0.359197116 | F8Q  | TRUE  | 6/14/2005  | 0   |
    | 21 | spss   | 11  | -2.211875193 | AgS  | TRUE  | 4/11/1973  | 0   |
    | 22 | stata  | 4   | -1.718749471 | Zqr  | FALSE | 2/20/1999  | 0   |
    | 23 | python | 10  | 1.207878576  | tcC  | FALSE | 4/18/2008  | 1   |
    | 24 | stata  | 11  | 0.548902226  | PFJ  | TRUE  | 9/20/1994  | 0   |
    | 25 | stata  | 6   | 1.479125922  | 7a7  | FALSE | 3/2/1989   | 0   |
    | 26 | python | 10  | -0.437245299 | r32  | TRUE  | 6/7/1997   | 0   |
    | 27 | sas    | 14  | 0.404746106  | 6NJ  | TRUE  | 9/23/2013  | 1   |
    | 28 | stata  | 8   | 2.206741458  | Ive  | TRUE  | 5/26/2008  | 0   |
    | 29 | spss   | 12  | -0.470694096 | dPS  | TRUE  | 5/4/1983   | 0   |
    | 30 | sas    | 15  | -0.57169507  | yle  | TRUE  | 6/20/1979  | 0   |