I'm curious if there's a way to extract email:password
from a big list.
Each entry appears in the text in that format, but with a few other unusable parts in front (such as first name and last name).
The format is mostly:
xx:Mxx:Support:xx:support@xx.com:x19000
But sometimes can be like this as well:
xxxx::gexrge@xxnt.com:111111
I have tried with EmEditor and if I search for
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]).*$
it does find the match. I then have to replace it with \1;
however, this takes ages and eventually crashes (the file is 17 GB).
Knowing that PowerShell could do this too, I'm looking for the right command.
The switch statement allows efficient line-by-line processing of files (via the -File parameter), optionally combined with regex matching (via the -Regex option):
& {
  switch -regex -file in.txt {
    '(?<=:)[^@:]+@[^:]+:.*' { $Matches[0] }
  }
} | Set-Content -Encoding utf8 out.txt
Adjust the -Encoding argument as needed; note that in Windows PowerShell utf8 creates a file with a BOM, whereas PowerShell [Core] v6+ creates one without a BOM. By default, Set-Content uses the system's active ANSI code page in Windows PowerShell, whereas PowerShell [Core] v6+ consistently defaults to BOM-less UTF-8, across all cmdlets.
The above extracts the email:password pairs from file in.txt and writes them as individual lines to file out.txt.
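To see what the regex keeps, here is a minimal sketch that applies the same pattern to the two sample lines from the question. The variable names $samples and $results are only illustrative. The lookbehind (?<=:) requires the match to start right after a colon, [^@:]+ then takes the local part of the address, and the trailing :.* takes the rest of the line as the password.

```powershell
# The two sample lines from the question.
$samples = @(
  'xx:Mxx:Support:xx:support@xx.com:x19000'
  'xxxx::gexrge@xxnt.com:111111'
)

# Collect the part of each line that the pattern matches.
$results = foreach ($line in $samples) {
  if ($line -match '(?<=:)[^@:]+@[^:]+:.*') { $Matches[0] }
}

$results
# Output:
# support@xx.com:x19000
# gexrge@xxnt.com:111111
```

Lines that contain no colon-preceded address followed by a colon are skipped.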
Note: Even though the above performs line-by-line processing, an out-of-memory exception can apparently still occur in Set-Content
with very large input files; the .NET-based solution in the next section should fix that, while also significantly speeding up the operation.
Performance caveat: While the above is memory-efficient, it will be slow with large files; to address that, you must make direct use of the .NET framework, via a System.IO.StreamWriter instance:
# Create the output file.
# Note:
#  * Be sure to use a *full* path, because .NET's current dir. usually differs
#    from PowerShell's
#  * UTF-8 *without a BOM* is used as the character encoding by default,
#    but you may pass a [System.Text.Encoding] instance as needed.
$sw = [System.IO.StreamWriter]::new("$PWD/out.txt")
switch -regex -file in.txt {
  '(?<=:)[^@:]+@[^:]+:.*' { $sw.WriteLine($Matches[0]) }
}
$sw.Close()
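As a variation on the above, the following sketch passes an explicit encoding to the StreamWriter constructor and closes the writer in a finally block, so the file handle is released even if processing is interrupted. The temp-file paths and the three-line sample input are assumptions made only so the snippet can run on its own; substitute your own full paths. UTF-8 with a BOM is chosen purely as an example of a non-default encoding.

```powershell
# Hypothetical paths for illustration; substitute your own full paths.
$inFile  = Join-Path ([System.IO.Path]::GetTempPath()) 'in-sample.txt'
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'out-sample.txt'

# Small stand-in for the real input file.
Set-Content -LiteralPath $inFile -Value @(
  'xx:Mxx:Support:xx:support@xx.com:x19000'
  'xxxx::gexrge@xxnt.com:111111'
  'no-credentials-here'
)

# $false = overwrite rather than append; UTF8Encoding($true) = write a BOM.
$sw = [System.IO.StreamWriter]::new($outFile, $false, [System.Text.UTF8Encoding]::new($true))
try {
  switch -Regex -File $inFile {
    '(?<=:)[^@:]+@[^:]+:.*' { $sw.WriteLine($Matches[0]) }
  }
}
finally {
  # Flushes buffered output and releases the file handle.
  $sw.Close()
}
```

After this runs, the output file should contain only the two email:password lines; the third sample line has no match and is skipped.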