excelexcel-formuladuplicatesdistinctworksheet

Excel formula to find duplicates from multiple sheets and list down on a new sheet or master sheet


I have an excel workbook which stores Employee data. It has 3 sheet which has data coming from 3 different sources. All the 3 sheet has the Employee ID column as common which is ideally the first column.

Now I am looking for a formula which could find Employee ID which are present on all 3 sheets and list down in the master sheet in the same workbook preferably in Column1. I am not looking for Unique or Distinct Employee ID's but one's which are present on all these 3 sheets.

Compulsorily the Employee ID has to be on all the 3 sheets then only should appear in the Master sheet. How can i achieve that?

Sample

Every time I manually copy paste the employee ids in Master sheet and then do a vlookup on all 3 sheet and then i extract the one's which appear on all the 3 sheets and delete the rest. This take sometime. Now I am looking for formula which can do that for me. Compare, search and then list down.


Solution

  • Try using the following formula:

    enter image description here


    • Formula used in cell A1 --> Method 1

    =LET(
         _Stores, TOCOL('*'!A:A,1),
         _Uniq, UNIQUE(_Stores),
         _Counts, MMULT(N(_Uniq=TOROW(_Stores)),SEQUENCE(ROWS(_Stores))^0),
         SORT(FILTER(_Uniq,_Counts=3)))
    

    Or, If applicable can use GROUPBY() --> Exclusively Applicable MS365 Office Insiders Version. Method 2:

    =LET(
         _Stores, TOCOL('*'!A:A,1),
         _Group, GROUPBY(_Stores,_Stores,ROWS,,0,1),
         TAKE(FILTER(_Group, TAKE(_Group,,-1)=3),,1))
    

    Note: When placing the formula in the cell the * in TOCOL() function will automatically change to TOCOL(Sheet1:Sheet3!A:A,1)


    Or, Can use BYROW() function - Method 3:

    =LET(
         _Stores, TOCOL(Sheet1:Sheet3!A:A,1),
         _Uniq, SORT(UNIQUE(_Stores)),
         FILTER(_Uniq, BYROW(_Uniq, LAMBDA(x, SUM(--(x=_Stores))))>=3))
    

    Explanations about the formula logic:

    The algorithm remains same for the other two following formulas provided, the second uses a GROUPBY() function which is Beta/Office Insider Version, one needs to enable it while using MS365 and writing the above, it is bit shorter, and the last method uses a LAMBDA() helper function called BYROW() which does custom logical calculations per row in an array and returns one result per row as an array.

    The basic logic remains same, stack all the data into one, then get the uniques, do the counts and keep only those which are all present in the three stores.


    This can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac)

    enter image description here


    To use Power Query follow the steps:




    let
        Source = Excel.CurrentWorkbook(),
        #"Filtered Rows" = Table.SelectRows(Source, each not Text.StartsWith([Name], "Query")),
        #"Expanded Content" = Table.ExpandTableColumn(#"Filtered Rows", "Content", {"Column1"}, {"Column1"}),
        #"Removed Other Columns" = Table.SelectColumns(#"Expanded Content",{"Column1"}),
        #"Grouped Rows" = Table.Group(#"Removed Other Columns", {"Column1"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
        #"Filtered Rows1" = Table.SelectRows(#"Grouped Rows", each [Count] = 3),
        #"Sorted Rows" = Table.Sort(#"Filtered Rows1",{{"Column1", Order.Ascending}}),
        #"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Count"})
    in
        #"Removed Columns"
    

    enter image description here



    Notes: The second step in the Power Query, I am only taking the tables from the three sheets and not the table which will be an output in the Master Sheet, hence excluding it or filtering it out. So, when you name the Tables and while pasting the M-Code do exclude output query, otherwise whenever you add new data and refresh the imported table in the Master Sheet it will keep duplicating the desired output.

    #"Filtered Rows" = Table.SelectRows(Source, each not Text.StartsWith([Name], "Query"))
    

    However, in the above formulas if you don't have access to TOCOL() function then you can replace that part with the following, but it has a drawback, since it uses FILTERXML() and TEXTJOIN(), the former will not work on web version of Excel if you happen to use it, and the later has character limitations -->32767 characters (cell limit) which when reached will return #VALUE! error, but with the existing example it works without any issue.


    =FILTERXML("<a><b>"&SUBSTITUTE(TEXTJOIN(",",,Sheet1:Sheet3!A:A),",","</b><b>")&"</b></a>","//b")