I'm trying to find files in a folder whose file names look like the following:
```
C:\XMLFiles\
in.blahblah.xml
out.blahblah.xml
in.blah.xml
out.blah.xml
```
I need to return only the files that do not have their "counterpart". This folder contains thousands of files with randomized "center" portions of the file names; what they have in common is the `in`/`out` prefix and the `.xml` extension.
Is there a way to do this in PowerShell? It's an odd ask.
Thanks.
Your question is a little vague. I hope I got it right. Here is how I would do it.
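```powershell
$dir = 'my_dir'
# Case-insensitive set, because Windows file names are case-insensitive
$singleFiles = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)

# -File skips any directories whose names happen to end in .xml
Get-ChildItem $dir -Filter '*.xml' -File | ForEach-Object {
    # BaseName drops the extension: 'in.blahblah.xml' -> 'in.blahblah'
    if ($_.BaseName -match '^(?<prefix>in|out)(?<rest>\..+)') {
        # Swap the prefix to get the counterpart's name
        $oppositeFileName = if ($Matches.prefix -eq 'in') {
            'out'
        }
        else {
            'in'
        }
        $oppositeFileName += $Matches.rest + $_.Extension
        $oppositeFileFullName = Join-Path $_.DirectoryName -ChildPath $oppositeFileName

        if ($singleFiles.Contains($oppositeFileFullName)) {
            # Counterpart was seen earlier: the pair is complete, drop it
            $singleFiles.Remove($oppositeFileFullName) | Out-Null
        }
        else {
            # No counterpart seen yet: remember this file for now
            $singleFiles.Add($_.FullName) | Out-Null
        }
    }
}

$singleFiles
```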
I get all the XML files from the directory and iterate over the results. For each file I check whether its base name (the file name without the directory path and the extension) matches a regex. The regex says: match if the name starts with `in` or `out`, followed by a dot and at least one more character.
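A quick way to see what the pattern accepts is to run it against a few base names. This is a small sketch; `inbound` and `out` are made-up near-misses, not names from the question.

```powershell
# Same pattern as in the script above
$pattern = '^(?<prefix>in|out)(?<rest>\..+)'

# Sample base names, i.e. file names with the extension already removed
$samples = 'in.blahblah', 'out.blah', 'inbound', 'out'

$results = foreach ($name in $samples) {
    if ($name -match $pattern) {
        # After a successful -match, $Matches holds the named groups
        '{0}: prefix={1}, rest={2}' -f $name, $Matches.prefix, $Matches.rest
    }
    else {
        '{0}: no match' -f $name
    }
}

$results
```

This prints `prefix=in, rest=.blahblah` for `in.blahblah` and `no match` for the two near-misses, since neither has a dot right after the prefix.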
The `$Matches` automatic variable contains the matched groups. Based on these groups I build the name of the counterpart file: if I'm currently on `in.abc`, I build `out.abc`.
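The prefix swap can also be pulled out into a small function, which makes it easy to try on a single name. `Get-CounterpartName` is a hypothetical helper, not part of the script above, and it assumes the same `in.`/`out.` naming convention.

```powershell
# Hypothetical helper: return the counterpart of an in./out. file name,
# or $null if the name does not follow that convention.
function Get-CounterpartName {
    param([string]$FileName)

    $base = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
    $ext = [System.IO.Path]::GetExtension($FileName)

    if ($base -match '^(?<prefix>in|out)(?<rest>\..+)') {
        # Swap the prefix; keep the randomized middle part and the extension
        $newPrefix = if ($Matches.prefix -eq 'in') { 'out' } else { 'in' }
        return $newPrefix + $Matches.rest + $ext
    }

    return $null
}

Get-CounterpartName 'in.abc.xml'    # out.abc.xml
Get-CounterpartName 'out.abc.xml'   # in.abc.xml
```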
After that, I build the absolute path of the counterpart file and check whether it is already in the HashSet. If it is, I remove it, because that means I already visited the counterpart earlier and the pair is complete. Otherwise, I add the current file.
The resulting HashSet contains the files that do not have a counterpart.
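Here is a sketch of the same add/remove bookkeeping run on plain file names instead of the disk, so you can follow how pairs cancel out. The names are made up, and it assumes a case-insensitive comparer (`[System.StringComparer]::OrdinalIgnoreCase`), since Windows treats `in.a.xml` and `IN.A.XML` as the same file name.

```powershell
# Made-up sample names; 'OUT.a.xml' deliberately uses an upper-case prefix
$names = 'in.a.xml', 'out.b.xml', 'OUT.a.xml', 'in.c.xml'

# Case-insensitive set, so 'in.a.xml' and 'IN.A.XML' are treated as equal
$single = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)

foreach ($name in $names) {
    $base = [System.IO.Path]::GetFileNameWithoutExtension($name)
    if ($base -match '^(?<prefix>in|out)(?<rest>\..+)') {
        $newPrefix = if ($Matches.prefix -eq 'in') { 'out' } else { 'in' }
        $other = $newPrefix + $Matches.rest + [System.IO.Path]::GetExtension($name)

        # Remove() returns $true if the counterpart was already in the set
        # (pair complete); otherwise remember the current name
        if (-not $single.Remove($other)) {
            [void]$single.Add($name)
        }
    }
}

$single
```

The set ends up with `out.b.xml` and `in.c.xml`: `in.a.xml` is removed when `OUT.a.xml` comes along. Note that a HashSet does not guarantee any particular enumeration order.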
Tell me if you need a more detailed explanation and I'll go through it line by line. It could be refactored a bit, but it does the job.
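If you want something shorter, one possible refactor uses a different technique: group the names with the `in`/`out` prefix stripped off and keep only the groups that contain a single file. `Get-UnpairedName` is a hypothetical function name, and this sketch assumes bare file names (not full paths) as input.

```powershell
# Hypothetical helper: return the names that have no in/out counterpart.
# Steps: keep only names shaped like in.<something>.xml / out.<something>.xml,
# group them by the name without its prefix (-replace and Group-Object are
# both case-insensitive by default), and keep the groups of one, since a
# group of one means the counterpart is missing.
function Get-UnpairedName {
    param([string[]]$Name)

    $Name |
        Where-Object { $_ -match '^(in|out)\..+\.xml$' } |
        Group-Object -Property { $_ -replace '^(in|out)', '' } |
        Where-Object Count -eq 1 |
        ForEach-Object { $_.Group }
}

# Example against the folder from the question:
# Get-UnpairedName -Name (Get-ChildItem 'C:\XMLFiles' -Filter '*.xml' -File).Name
```

Unlike the HashSet loop, this builds every group before filtering, so it holds all the names in memory at once; for a few thousand files that should not matter much.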