I'm parsing a file using powershell 5.1 and I almost get my goal but I need help in this part. The input file is like this:
CODE=MMH4
description=beg somedata - dduik
CODE=PPH2
description=beg Area1 - ABC=704&&DEF=03||ABC=706&&DEF=04
END
CODE=LGT6
description=beg somedata - yyu
END
CODE=KK7
description=beg Area4 - ABC=334&&DEF=030
END
This is my current script
(Get-Content file.txt) `
-replace '\|\|', "`r`n" `
-replace '.*description=.*- ', "" `
-replace '\&\&', "`t"
And I'm getting this output
CODE=MMH4
dduik
CODE=PPH2
ABC=704 DEF=03
ABC=706 DEF=04
END
CODE=LGT6
yyu
END
CODE=KK7
ABC=334 DEF=030
END
I'm new to powershell and I'd like to get the CODE value for each block and put that code next to each internal line of the block like this:
CODE=MMH4
dduik MMH4
CODE=PPH2
ABC=704 DEF=03 PPH2
ABC=706 DEF=04 PPH2
END
CODE=LGT6
yyu LGT6
END
CODE=KK7
ABC=334 DEF=030 KK7
END
To filter the lines that contains "ABC" like this
ABC=704 DEF=03 PPH2
ABC=706 DEF=04 PPH2
ABC=334 DEF=030 KK7
and finally get the output with ABC= and DEF= removed to have only the numbers, like this
ABC DEF CODE
704 03 PPH2
706 04 PPH2
334 030 KK7
I know that to filter desired lines I need to pipe | Select-String -Pattern "ABC" but I don't know how to get previous step that is the code for internal lines of each block. I hope make sense.
Thanks in advance
Perhaps the following can help you get your desired final output, it is a more manual and classic approach, it doesn't intent to solve everything with regex but it does help with the logical conditions:
Get-Content file.txt | ForEach-Object {
# if the line starts with CODE= followed by a code
if ($_ -match '^CODE=(.+)') {
# capture the code (e.g. MMH4, PPH2, etc)
$code = $Matches[1]
# and go to next line
return
}
# if we have captured a code in previous lines
if ($code) {
# loop thru each match of a line containing a combination of `ABC=...&&DEF=...`
# we use [Regex]::Matches(..) here as there could be multiple `||` conditions
foreach ($match in [regex]::Matches($_, '(?:ABC=\d+&&DEF=\d+)+')) {
# replace `&&` with a TAB concatenated with TAB and the captured code
"$($match.Value.Replace('&&', "`t"))`t$code"
}
# then we clear this variable to look for a new line starting with CODE=
# NOTE: See the end of the answer to understand the usage of this
$code = $null
}
}
The above, using your sample text would produce the following output:
ABC=704 DEF=03 PPH2
ABC=706 DEF=04 PPH2
ABC=334 DEF=030 KK7
You might notice the use of $code = $null, this is a tiny optimization to not enter the if ($code) condition more than once for each appearance of a line starting with CODE=, it could be removed and removing it could have a different effect depending on the input, taking this as a small sample:
CODE=KK7
description=beg Area4 - ABC=334&&DEF=030
description=beg Area4 - ABC=335&&DEF=031
END
If left as-is, the result for the sample would be:
ABC=334 DEF=030 KK7
Whereas if that line of code is removed, the logic would match both inner lines:
ABC=334 DEF=030 KK7
ABC=335 DEF=031 KK7
Looking at the latest edit on your question, you're seemingly wanting to convert your text into a TSV, for this I'd recommend to use a function where you can output the headers in its begin block; you could technically use ForEach-Object -Begin {...} too if you like.
So the code doesn't change that much from what was provided above, the regex pattern will use 2 capturing groups to get the value of ABC= and DEF=... see here for details: https://regex101.com/r/qvGBZy/1.
function ConvertTo-Tsv {
param(
[Parameter(ValueFromPipeline, Mandatory)]
[string] $Line,
# using a parameter instead of hardcoding them in `begin {..}`
# gives more flexibility, you can provide new headers when needed
[Parameter(Position = 0)]
[string[]] $Header = ('ABC', 'DEF', 'CODE')
)
begin {
# output the header joined by TAB
$Header -join "`t"
$re = [regex]::new('(?:ABC=(?<abc>\d+)&&DEF=(?<def>\d+))+')
}
process {
if ($Line -match '^CODE=(.+)') {
$code = $Matches[1]
return
}
if ($code) {
foreach ($match in $re.Matches($Line)) {
# output values of ABC= and DEF= with the value of CODE=
"$($match.Groups['abc'])`t$($match.Groups['def'])`t$code"
}
$code = $null
}
}
}
Then the usage is pretty straight forward, read the file content, pipe to our function and lastly pipe to file storage:
Get-Content myFile.txt | ConvertTo-Tsv | Set-Content myOutput.tsv
The expected output using the function given the sample input text would be:
ABC DEF CODE
704 03 PPH2
706 04 PPH2
334 030 KK7
And we can tell this is a valid TSV given that a conversion from it produces the following objects:
PS> Get-Content sample.txt | ConvertTo-Tsv | ConvertFrom-Csv -Delimiter "`t"
ABC DEF CODE
--- --- ----
704 03 PPH2
706 04 PPH2
334 030 KK7