I want to edit plain text files (MT940 standard).
Here is an example file with dummy data:
-
:20:296535/00000010
:21:ABNADK2AXXX
:25:ABNADK2AXXX/DK88ABNA0496434500
:28C:42/00002
:60M:C230228EUR124792,65
:61:2302280228C1750,88NTRFC1165-23-00120//656
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK47ABNA0243508514/BIC/ABNADK2A/NAME/
LOOP BV/REMI/AV-RUN 24022023/202301918/EREF/C1165-23-00120
:61:2302280228C4695,98NTRF6381310605374038//656
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK14ABNA0456766324/BIC/ABNADK2A/NAME/
DEV BV/REMI/ID16145 DEB. 1657139 FACT. 202303668 20
2303685 202303689/EREF/638131060537403857-311-2
:61:2302280228C1349,25NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK46ABNA0513892443/BIC/ABNADK2A/NAME/
EXAMPLE COM/REMI/202303656/EREF/NOTPROVIDED
:61:2302280228C55845,96NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK35ABNA0442867689/BIC/ABNADK2A/NAME/
BATH COMPANY DK/REMI/INV. 202228255-8426, OUR REF 2022611
73-79/EREF/NOTPROVIDED
:61:2302280228D105000,NTRFNOTPROVIDED//658
:86:/TRTP/SEPA OVERBOEKING/IBAN/DK98INGB0657624985/BIC/INGBDK2A/NAME/
TEST/REMI/OVERBOEKING/EREF/NOTPROVIDED
:62F:C230228EUR83434,72
:64:C230228EUR83434,72
:86:/ACSI/ABNADK2AXXX
-
:20:STARTUMS TA FW
:25:28020050/0521322890
:28C:017/01
:60F:C230228GBP1473111,27
:61:2302280228D1919,29N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20COMPANY?21S LT
D?22TRN AZV2023022800746?23URSP.-BETR.1.900,00 GBP?24KURS 0,87716
0 EUR ZU GBP?25GEGENWERT 2.00,08 EUR?26PROVISION FIX 7
,50 EUR?27SWIFT-/TELE-SPESEN 2,00 EUR?28FREMDE GEB. 12,50 E
UR?2917.02 413337?3028020050?310537246190?32HOMETESTEXAMPLE?33S
LTD?34003
:61:2302280228D16988,81N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20BODO GU COM?21NOT A TEST?22TRN
AZV2023022800749?23URSP.-BETR.16.980,48 GBP?24
KURS 0,877160 EUR ZU GBP?25GEGENWERT 19.358,48 EUR?26PROVISIO
N FIX 7,50 EUR?27SWIFT-/TELE-SPESEN 2,00 EUR?2830.01 INV-278
0?29*LOREM*?3028020050?310537246190?32GOLL
?33GOL COM?34003?60INFO 0800-1234
*GEB-FREI*
:61:2302280228D867,06N020NONREF
:86:206?00AUSL-ZAHLUNG?100004649?20NOTACOMPANY?21LTD?2
2TRN AZV2023022800752?23URSP.-BETR.858,73 GBP?24KURS 0,877160 EUR
ZU GBP?25GEGENWERT 978,99 EUR?26PROVISION FIX 7,50 E
UR?27SWIFT-/TELE-SPESEN 2,00 EUR?2828.01 A221322?3028020050?31053
7246190?32KOLL?33LTD?34003
:62F:C230228GBP1453336,11
-
The script should search for lines that start with :86: where :86: is not immediately followed by a slash, four characters, and another slash.
The regex for this is: ^:86:(?!/..../)
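To check which :86: lines the pattern flags, it can be tried on three :86: lines copied from the sample above. This is only an illustrative snippet; $samples is a name introduced here.

```powershell
# Three :86: lines copied from the example file above.
$samples = @(
    ':86:/TRTP/SEPA OVERBOEKING/IBAN/DK47ABNA0243508514/BIC/ABNADK2A/NAME/',
    ':86:/ACSI/ABNADK2AXXX',
    ':86:206?00AUSL-ZAHLUNG?100004649?20COMPANY?21S LT'
)

foreach ($s in $samples) {
    # -match is $true only when :86: is NOT followed by "/", four chars and "/".
    '{0,-5} {1}' -f ($s -match '^:86:(?!/..../)'), $s
}
```

The first two lines print False because /TRTP/ and /ACSI/ are a slash, four characters and a slash; the third line prints True, so its block is the one to erase.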
From each matching line, the script should go up to the nearest preceding line that contains just a "-" and mark it as the start of the section to be erased. From the same matching line it should also go down to the next line that contains only a "-" and use that line (including the "-") as the end marker of the section to be erased.
This should be repeated for the whole file.
I have this script, and it works almost perfectly. BUT it does not use the "-" before the matched line as the start of the section to be erased. Instead, it uses the matched line itself.
Can someone tell me what the problem is?
# Specify the path to the input file
$inputFilePath = "V:\Temp\finance\TestKopie.A01"
# Specify the path to the output file
$outputFilePath = "V:\Temp\finance\Hacked.A01"
# Function to remove sections based on pattern and "-"
function RemoveSections($content) {
    $outputContent = @()
    $eraseMode = $false
    $previousLine = ""
    for ($i = 0; $i -lt $content.Length; $i++) {
        $line = $content[$i]
        if ($line -match "^:86:(?!/..../)") {
            $eraseMode = $true
            # Find the previous "-" line
            $previousLineIndex = $i - 1
            while ($previousLineIndex -ge 0 -and $content[$previousLineIndex] -ne "-") {
                $previousLineIndex--
            }
            if ($previousLineIndex -ge 0) {
                $outputContent += $content[$previousLineIndex]
            }
        }
        if ($eraseMode -and $line -eq "-") {
            $eraseMode = $false
            # Find the next "-" line
            $nextLineIndex = $i + 1
            while ($nextLineIndex -lt $content.Length -and $content[$nextLineIndex] -ne "-") {
                $nextLineIndex++
            }
            if ($nextLineIndex -lt $content.Length) {
                $i = $nextLineIndex + 1 # Skip the section between "-" lines, including the next "-"
                continue
            }
        }
        if (!$eraseMode) {
            $outputContent += $line
        }
    }
    return $outputContent
}
# Read the input file content
$inputContent = Get-Content $inputFilePath
# Initialize variables
$iteration = 0
$linesRemoved = 0
# Remove sections based on pattern and "-" until no more changes occur
do {
    $iteration++
    Write-Host "Iteration: $iteration"
    Write-Host "Lines removed: $linesRemoved"
    $linesRemoved = 0
    # Remove sections and count the lines removed
    $outputContent = RemoveSections $inputContent
    $linesRemoved = ($inputContent.Length - $outputContent.Length)
    # Output progress
    Write-Host "Lines removed in this iteration: $linesRemoved"
    Write-Host "----------------------------"
    # Update the input content for the next iteration
    $inputContent = $outputContent
} while ($linesRemoved -gt 0)
# Save the modified content to the output file
$outputContent | Out-File $outputFilePath -Force
Write-Host "Process complete. Modified content saved to $outputFilePath"
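The likely cause is that the loop appends each line to $outputContent as soon as it reads it. By the time it reaches the :86: line, the "-" line and the :20: ... :61: lines above it are already in the output, and nothing removes them again; `$outputContent += $content[$previousLineIndex]` even re-appends the "-" line for every matching :86: line. Also, if another "-" follows, the `$i = $nextLineIndex + 1` jump (plus the loop's own `$i++`) skips the whole next block without checking it. One way around this is to buffer each "-"-delimited block and decide whether to keep it only once the block is complete. The sketch below is an assumption, not a drop-in replacement: Remove-FlaggedBlocks is a hypothetical name, it assumes the file starts with a "-" line as in the sample, it drops the block's leading "-" (keeping the final "-" line), and `List[string]::new()` needs PowerShell 5 or later.

```powershell
# Hypothetical helper: keep or drop whole "-"-delimited blocks.
function Remove-FlaggedBlocks {
    param([string[]]$Lines)

    $output = [System.Collections.Generic.List[string]]::new()
    $block  = [System.Collections.Generic.List[string]]::new()
    $drop   = $false

    foreach ($line in $Lines) {
        if ($line -eq '-') {
            # A "-" line starts a new block, so decide about the buffered one now.
            if (-not $drop) { $output.AddRange($block) }
            $block.Clear()
            $drop = $false
        }
        elseif ($line -match '^:86:(?!/..../)') {
            # Flag the whole block, including the lines already buffered above.
            $drop = $true
        }
        $block.Add($line)
    }

    # Flush the last buffer (e.g. the closing "-" line of the file).
    if (-not $drop) { $output.AddRange($block) }
    return $output
}
```

It could be called as `Remove-FlaggedBlocks -Lines (Get-Content $inputFilePath) | Set-Content $outputFilePath`. Because every block is decided in a single pass, the repeat-until-no-change loop is not needed.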
EDIT: The working script, based on the regex pattern of @wiktor-stribiżew, is the PowerShell snippet in the answer below :-)
You can use
(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/..../)(?:(?!^-\r?$).)*
to simply remove the whole block of text from the entire text contents once you load it into memory as a single string.
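As an illustration, here is the same replacement applied to a small made-up string with CRLF line endings, as a Windows file read with Get-Content -Raw would typically have. The sample data and variable names are invented for this example; only the pattern comes from the answer.

```powershell
# Made-up sample: two "-"-delimited blocks, joined with CRLF like a Windows file.
$lines = '-', ':20:A', ':86:/TRTP/X', '-', ':20:B', ':86:206?00FOO', 'CONTINUED', '-', ''
$text  = $lines -join "`r`n"

# The pattern from the answer; -replace without a replacement string deletes each match.
$pattern = '(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/..../)(?:(?!^-\r?$).)*'
$result  = $text -replace $pattern

# Only the second block (its "-", :20:B, the :86: line and its continuation) is gone.
$result -split "`r`n"
```

The first block survives because its :86: line is followed by /TRTP/, and the tempered part cannot extend past the "-" line that closes it.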
See the regex demo. Details:
(?sm) - inline regex options: m makes ^ and $ match at the start/end of any line, and s makes . match newlines, too
^ - start of a line
- - a literal - char
(?:(?!^-\r?$).)*? - any char, zero or more times but as few as possible, that is not the start of a line consisting of a single - (the \r? allows for CRLF line endings)
^:86: - start of a line, then :86:
(?!/..../) - a negative lookahead: immediately to the right there must be no / + any four chars + /
(?:(?!^-\r?$).)* - any char, zero or more times but as many as possible, that is not the start of a line consisting of a single -. The match therefore ends right before the next - line, which stays in place as the separator of the following block.

In PowerShell, you can use
# Specify the path to the input file
$inputFilePath = "V:\Temp\finance\Test.A01"
# Specify the path to the output file
$outputFilePath = "V:\Temp\finance\Hacked.A01"
# Read the input file content
$inputContent = Get-Content $inputFilePath -Raw
# Perform the replacement
$modifiedContent = $inputContent -replace '(?sm)^-(?:(?!^-\r?$).)*?^:86:(?!/..../)(?:(?!^-\r?$).)*'
# Save the modified content to the output file
$modifiedContent | Set-Content $outputFilePath -Force
NOTE: Since the s flag is in use, you probably want to replace /..../ with /[^\r\n]{4}/ so that it only matches four chars that are neither carriage returns nor line feeds; otherwise the lookahead can look across a line break.
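To see why this matters, consider a hypothetical :86: line that is wrapped right after three characters of the field code (the sample text below is invented). With the s flag, . also matches the line break, so /..../ still finds "a slash, four characters and a slash" across the two lines and the block is wrongly kept; [^\r\n]{4} cannot cross the line break, so the block is flagged.

```powershell
# Hypothetical wrapped :86: line: the /xxxx/ code is split by a line break.
$text = ":86:/AB`nC/REST OF FIELD"

# With (?s), each . can match the newline, so the lookahead sees "/AB<LF>C/".
$loose  = $text -match '(?s)^:86:(?!/..../)'
# [^\r\n]{4} stops at the line break, so the lookahead fails and the line is flagged.
$strict = $text -match '(?s)^:86:(?!/[^\r\n]{4}/)'

"loose: $loose, strict: $strict"   # loose: False, strict: True
```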