I have 50 files in a folder that have to be loaded into SQL. In most of the files (but not in all), the very first character is a hashtag that I would like to delete, because it causes problems with the column mapping.
I do not want to loop through all the lines of each file; I only want to delete the very first character if it is a hashtag, and then save the file.
So I also need to check whether the first character is a hashtag. If it is, it should be deleted; if it is not, I should simply leave the file as it is.
What is the fastest method for this?
I tried these:
$Path = "D:\folder\myfolder"
$OldText = "#"
$NewText = ""
--> This loops through the whole file looking for hashtags.
(Get-Content 'D:\folder\myfoldermyfile.txt' -raw) -replace '^.' | Set-Content 'D:\folder\myfoldermyfile.txt'
--> This removed the first character of each line, so it's not good for me.
To complement Theo's helpful answer with an eye toward performance:
The only immediate problem with your code was the use of regex '^.', because the . metacharacter matches any character (except a newline, by default); therefore, using '^#' instead of '^.' fixes your problem by only matching files whose first character is # (and by not specifying a substitution operand to the -replace operator, the matched # is replaced with the empty string and therefore effectively removed).
Therefore, applied to all *.txt files in D:\folder\myfolder:
# Note:
#  * '^#' instead of '^.'
#  * -NoNewLine to prevent Set-Content from writing an extra newline.
#  * CAVEATS:
#     * UPDATES THE FILES IN-PLACE - to be safe, keep backup copies.
#     * The *character encoding may change*, because Set-Content applies
#       its default encoding, irrespective of the input encoding.
#       USE -Encoding AS NEEDED.
Get-ChildItem 'D:\folder\myfolder\*.txt' |
  ForEach-Object {
    ($_ | Get-Content -Raw) -replace '^#' |
      Set-Content -NoNewLine -LiteralPath $_.FullName
  }
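To illustrate the difference between the two patterns, here is a small sketch that works on in-memory strings rather than files. The sample values and variable names are made up for illustration; the multiline string stands in for what Get-Content -Raw would return.

```powershell
# A multiline string standing in for a file read with Get-Content -Raw.
$sample = "#id,name`n#1,foo`n2,bar"

# '^.' removes the first character whatever it is, hashtag or not.
$anyChar = 'id,name' -replace '^.'

# '^#' removes the first character only if it is '#'. Later lines that
# start with '#' are left alone, because ^ anchors at the start of the
# whole string unless the (?m) multiline option is used.
$fixed = $sample -replace '^#'

# With no match, -replace passes the input through unchanged.
$unchanged = 'id,name' -replace '^#'
```

Here $anyChar loses its leading i, $fixed loses only the very first #, and $unchanged is identical to its input.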
Performance-wise, there are two problems with the above:
Even though using -Raw with Get-Content is the most efficient way to read a file in full, as a single, multiline string (and analogously, so is writing the file content in full as a single string), you don't need to do that if all you want to do is to examine the first character.
By using the results of the -replace operation unconditionally, you end up rewriting even the files that don't need it, i.e. those that do not start with # (if -replace cannot find a match, it passes the input string through).
You can address these issues as follows, building on mclayton's suggestion to (initially) only read the first line of each file:
# Same CAVEATS as above apply.
Get-ChildItem 'D:\folder\myfolder\*.txt' |
  ForEach-Object {
    if (($_ | Get-Content -First 1) -match '^#') {
      ($_ | Get-Content -Raw).Substring(1) |
        Set-Content -NoNewLine -LiteralPath $_.FullName
    }
  }
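If the first line of a file can be very long, a possible variation is to look at only the first character through a .NET StreamReader instead of reading the whole first line. The following is a minimal sketch, not something the answers above proposed: the function name Test-StartsWithHash is hypothetical, it assumes PowerShell 5 or later (for the ::new() syntax), and it assumes files that are UTF-8 or carry a byte-order mark, since that is what StreamReader detects by default.

```powershell
# Hypothetical helper: returns $true if the file's first character is '#'.
function Test-StartsWithHash {
    param(
        # Pass a full path: .NET resolves relative paths against the process
        # working directory, which may differ from PowerShell's location.
        [Parameter(Mandatory)]
        [string] $LiteralPath
    )

    $reader = [System.IO.StreamReader]::new($LiteralPath)
    try {
        # Peek() returns -1 for an empty file, otherwise the code of the
        # first character, without consuming it.
        return $reader.Peek() -eq [int][char]'#'
    }
    finally {
        # Release the file handle so the file can be rewritten afterwards.
        $reader.Dispose()
    }
}
```

Under these assumptions, the condition `($_ | Get-Content -First 1) -match '^#'` could be swapped for `Test-StartsWithHash -LiteralPath $_.FullName`; the rewrite step and its caveats stay the same.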
Note: