powershell powershell-5.0 pscustomobject

pscustomobject: delete unique, keep duplicates


I have written a script that extracts all the duplicate emails in a Windows Server 2012 Active Directory.

The script collects all the users with duplicate emails in an array of pscustomobjects.

Now what I want to achieve is to drop the first entry for each email and keep all the later duplicate entries, so I can build dummy emails for the duplicates. The first user with each email keeps the original address, so every user in the AD ends up with a unique email, even if it is a dummy one.

This is an example of what the pscustomobject exported into CSV looks like:

Name         Telephone  Email              Department

Max Smith    12345      max@gmail.com      Billing
Max Jones    6789       max@gmail.com      Facility
James Adams  52585      james@outlook.com  Import
James Jones  46844      james@outlook.com  Service
James Bones  68315      james@outlook.com  Management

What I need to build out of the above is:

Name         Telephone  Email              Department

Max Jones    6789       max@gmail.com      Facility
James Jones  46844      james@outlook.com  Service
James Bones  68315      james@outlook.com  Management

The first entry for each email is gone; all the duplicates are still there.

The dummy email would be telephoneNumber@company.com, for example 46844@microsoft.com for James Jones.

I keep failing to build a pscustomobject collection that omits the first entry for each email and contains only the duplicates.

I hope the Wizards of Stack Overflow can help me.

Thank You and Best Regards.


Solution

  • In order to only "keep the duplicates", you need to keep track of email addresses you've already seen before.

    For this, I'd recommend using a HashSet<string> - a set only contains distinct values, and is very fast at determining whether a given value is already a member of the set in the first place - ideal for this use case.

    In the following, I assume that $data contains an array of pscustomobjects as described in the question:

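    # Tracks every email address encountered so far
    $alreadySeen = [System.Collections.Generic.HashSet[string]]::new()
    
    # Add() returns $false for a repeated value, so only second and later occurrences pass
    $duplicatesOnly = $data |Where-Object { -not $alreadySeen.Add($_.Email) }
    
    # -NoTypeInformation omits the "#TYPE" header line that Windows PowerShell 5.x writes by default
    $duplicatesOnly |Export-Csv path\to\output.csv -NoTypeInformation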
    

    The first time you add a value to the set, Add() returns $true; subsequent attempts to add the same value return $false. Our Where-Object filter therefore only lets through objects whose Email value has already been seen at least once before.
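
    To see this behaviour in isolation, here is a minimal sketch that uses the sample rows from the question as a hard-coded stand-in for $data (only the Name and Email columns are reproduced, and the real script would get its objects from AD instead):

    ```powershell
    # Sample rows from the question, hard-coded as a stand-in for $data
    $data = @(
        [pscustomobject]@{ Name = 'Max Smith';   Email = 'max@gmail.com' }
        [pscustomobject]@{ Name = 'Max Jones';   Email = 'max@gmail.com' }
        [pscustomobject]@{ Name = 'James Adams'; Email = 'james@outlook.com' }
        [pscustomobject]@{ Name = 'James Jones'; Email = 'james@outlook.com' }
        [pscustomobject]@{ Name = 'James Bones'; Email = 'james@outlook.com' }
    )

    $alreadySeen = [System.Collections.Generic.HashSet[string]]::new()

    # Add() returns $true for a new value and $false for a repeat,
    # so -not keeps only the second and later occurrences of each email
    $duplicatesOnly = @($data | Where-Object { -not $alreadySeen.Add($_.Email) })

    $duplicatesOnly.Name   # Max Jones, James Jones, James Bones
    ```

    Max Smith and James Adams are dropped because they are the first rows to carry their respective addresses. Note that "first" depends entirely on the order of the input, so sort $data beforehand if a particular user should keep the original address.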

    If the emails are not uniformly cased, supply a case-insensitive string comparer when creating the hashset:

    $alreadySeen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::InvariantCultureIgnoreCase)
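
    The question also asks for a dummy address of the form telephoneNumber@company.com for each remaining duplicate. One possible sketch follows, assuming the property is called Telephone as in the CSV sample. The $dummyDomain variable, the company.com placeholder, and the hard-coded rows are illustrative and not part of the original script:

    ```powershell
    # Hypothetical stand-in for the $duplicatesOnly result produced by the filter above
    $duplicatesOnly = @(
        [pscustomobject]@{ Name = 'Max Jones';   Telephone = '6789';  Email = 'max@gmail.com';     Department = 'Facility' }
        [pscustomobject]@{ Name = 'James Jones'; Telephone = '46844'; Email = 'james@outlook.com'; Department = 'Service' }
        [pscustomobject]@{ Name = 'James Bones'; Telephone = '68315'; Email = 'james@outlook.com'; Department = 'Management' }
    )

    # Placeholder domain (assumption); replace with the real one
    $dummyDomain = 'company.com'

    foreach ($user in $duplicatesOnly) {
        # Overwrite the duplicate address with <telephone>@<domain>
        $user.Email = '{0}@{1}' -f $user.Telephone, $dummyDomain
    }

    $duplicatesOnly | Format-Table Name, Telephone, Email, Department
    ```

    This only yields unique addresses if the telephone numbers are themselves unique and non-empty. If two users could share a number, it may be worth running the generated addresses through the same HashSet check before writing anything back to AD.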