How to untar multiple files with the extensions .tar.gz.aa, .tar.gz.ab, ... through .tar.gz.an, each around 10 GB, on Windows?
I've tried the following commands in PowerShell (with admin rights):
```
cat <name>.tar.gz.aa | tar xzvf -

cat : Exception of type 'System.OutOfMemoryException' was thrown.
At line:1 char:1
+ cat <name>.tar.gz.aa | tar xzvf -
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Get-Content], OutOfMemoryException
+ FullyQualifiedErrorId : System.OutOfMemoryException,Microsoft.PowerShell.Commands.GetContentCommand
```

```
cat *.tar.gz.* | zcat | tar xvf -

zcat : The term 'zcat' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:18
+ cat *.tar.gz.* | zcat | tar xvf -
+ ~~~~
+ CategoryInfo : ObjectNotFound: (zcat:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
```
Thanks in advance! I'd also be happy to hear about solutions for Linux, in case anyone else runs into the same difficulty.
You are calling `cat` (an alias for `Get-Content`) to enumerate the contents of a single file and then attempting to pass the parsed file content to `tar`. That is why you get the `OutOfMemoryException`. `Get-Content` is not designed to read binary files. It is designed to read ASCII and Unicode text files, and certainly not 10 GB of them.

Even with enough memory, `Get-Content` hands the file over as lines of text. PowerShell then re-encodes that text when piping it to a native program such as `tar`, so the binary data would arrive corrupted anyway.
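The question also asks about Linux. There, `cat` is the real byte-oriented program rather than an alias for `Get-Content`, so the original idea of concatenating the parts and piping them into `tar` works.

This is a minimal sketch under one assumption the question does not confirm: the `.aa` ... `.an` files are consecutive chunks of a single gzip-compressed tarball, as produced by `split`. The demo names (`demo.tar.gz.*`, `src`, `out`) are hypothetical.

```shell
#!/bin/sh
# Sketch: stream split parts straight into tar.
# Assumption: demo.tar.gz.aa, .ab, ... are consecutive chunks of ONE gzip'd
# tarball (as `split` produces), so they must be joined in order.
set -eu

workdir=$(mktemp -d)                          # scratch area for the demo
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Demo setup only: build a hypothetical archive and cut it into 1 KiB parts.
mkdir src
head -c 5000 /dev/urandom > src/payload.bin   # random bytes barely compress
tar -czf demo.tar.gz src
split -b 1024 demo.tar.gz demo.tar.gz.        # demo.tar.gz.aa, .ab, ...
rm demo.tar.gz

# The technique: cat copies raw bytes through the pipe a buffer at a time,
# so memory use stays flat even for 10 GB parts. The glob expands in sorted
# order (aa, ab, ...), which is the order split wrote the parts in.
mkdir out
cat demo.tar.gz.a? | tar -xzf - -C out
```

With real files, only the last pipeline matters, e.g. `cat <name>.tar.gz.a? | tar -xzf -`. On Linux, where `zcat` exists, the question's `cat *.tar.gz.* | zcat | tar xvf -` does the same job.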
Just pass the file path to `tar` like this, adding any additional arguments you need, such as ones controlling the output directory:

```powershell
tar xvzf "$name.tar.gz.aa"
```
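Here is a minimal sketch of the same idea in a POSIX shell. Like the command above, it assumes that `demo.tar.gz.aa` is a complete archive on its own. If the parts are instead chunks of one split archive, a single part will not extract cleanly by itself. `demo` is a hypothetical stand-in for `$name`.

```shell
#!/bin/sh
# Sketch: give tar the archive path instead of piping file contents to it.
# Assumption: demo.tar.gz.aa is a complete gzip'd tarball on its own;
# the names are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Demo setup: one small, self-contained archive with the question's suffix.
mkdir src
echo "hello" > src/hello.txt
tar -czf demo.tar.gz.aa src
rm -r src

# tar opens and reads the file itself, so the shell never buffers its bytes.
# -z selects gzip regardless of the extension; -v lists each extracted member.
name=demo
tar -xzvf "$name.tar.gz.aa"
```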
You can extract all of the archives in one go with a loop, which also gives some helpful output and checks each result. This code also runs as-is in PowerShell Core, so it should work on Linux too:
```powershell
Push-Location $pathToFolderWithGzips
try {
    ( Get-ChildItem -File *.tar.gz.a[a-n] ).FullName | ForEach-Object {
        Write-Host "Extracting $_"
        tar xzf $_
        if( $LASTEXITCODE -ne 0 ) {
            Write-Warning "tar returned $LASTEXITCODE"
        }
    }
} finally {
    Pop-Location
}
```
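For comparison, here is a rough POSIX-shell counterpart of this loop. It makes the same assumption as the PowerShell version: every `*.tar.gz.a[a-n]` file is a standalone archive. The folder and file names are hypothetical. Instead of `Push-Location`/`Pop-Location`, it runs the loop in a subshell.

```shell
#!/bin/sh
# Rough POSIX-shell counterpart of the PowerShell loop.
# Assumption (same as the PowerShell version): every *.tar.gz.a[a-n] file is
# a complete archive by itself. Folder and file names are hypothetical.

pathToFolderWithGzips=$(mktemp -d)
trap 'rm -rf "$pathToFolderWithGzips"' EXIT

# Demo setup: three small standalone archives named like the question's files.
for suffix in aa ab ac; do
    mkdir "$pathToFolderWithGzips/src_$suffix"
    echo "$suffix" > "$pathToFolderWithGzips/src_$suffix/file.txt"
    tar -czf "$pathToFolderWithGzips/data.tar.gz.$suffix" \
        -C "$pathToFolderWithGzips" "src_$suffix"
    rm -r "$pathToFolderWithGzips/src_$suffix"
done

startDir=$PWD
# The parentheses run the loop in a subshell: its cd cannot leak out, so the
# caller's directory is unchanged however the loop ends (the job that
# Push-Location, try/finally and Pop-Location do in PowerShell).
(
    cd "$pathToFolderWithGzips" || exit 1
    for archive in *.tar.gz.a[a-n]; do
        [ -e "$archive" ] || continue      # pattern matched nothing
        echo "Extracting $archive"
        tar -xzf "$archive"
        status=$?                          # plays the role of $LASTEXITCODE
        if [ "$status" -ne 0 ]; then
            echo "WARNING: tar returned $status" >&2
        fi
    done
)
```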
Let's break this down:
- `$pathToFolderWithGzips` should be set to the full path of the directory containing your tarballs.
- `Push-Location` works like `cd` but uses the location stack, so you can return to the previous directory with `Pop-Location`. We change to the directory we want to extract the files to.
  - PowerShell 6.2 and later also support `cd -` and `cd +` to step back and forward through location history.
  - We call `Push-Location` before the `try` block so we can go back to the previous folder location after the `try` completes.
- `( Get-ChildItem -File *.tar.gz.a[a-n] ).FullName` enumerates all files in the current directory matching the wildcard pattern, while making sure the last letter is one of `a` through `n`. Accessing the `FullName` property gives us only the fully qualified path of each file, which is all we need to pass down the pipeline.
- `| ForEach-Object { ... }` receives the paths from the `FullName` values of the previous expression and iterates over each fully qualified path.
- `Write-Host` writes to the console via the information stream. That text does not go to the success (output) stream, so it is not captured by assignment or passed down the pipeline unless you redirect it explicitly (for example with `6>&1`). `Write-Warning`, used further on, has a similar effect but is visually distinct.
  - Use `Write-Output` instead if you do want to process the text later in the same session. Usually, though, we prefer to operate on objects rather than strings when we can.
- `$_` is an alias for `$PSItem`, the automatic variable that holds the current object in the pipeline. Every file path iterated over in the `ForEach-Object` loop is referenced as `$PSItem`, and we pass the archive path to `tar` with this variable.
- `$LASTEXITCODE` is set when the last native executable finishes running. It works similarly to `$?` in bash. Don't confuse it with PowerShell's own `$?`, which is only a True/False success flag.
- `-ne` is the "not equals" comparison operator.
- `finally` follows the closing brace of the `try` block and calls `Pop-Location` to return to the previous directory. The `finally` block always runs, regardless of whether the `try` code succeeds or fails.
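Two of the points above, the bracket range in the wildcard and exit-status checking, can be tried out in a POSIX shell. Its globs use the same `[a-n]` range syntax, and it keeps the last exit status in `$?`. This small sketch uses hypothetical placeholder files:

```shell
#!/bin/sh
# Sketch of two details in POSIX-shell terms:
#  1. The bracket class in *.tar.gz.a[a-n] matches one character from a to n.
#  2. $? holds the last command's exit status, much like $LASTEXITCODE.
# The file names are hypothetical placeholders.

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

touch "$demo/x.tar.gz.aa" "$demo/x.tar.gz.an" "$demo/x.tar.gz.ao"

matched=""
for f in "$demo"/*.tar.gz.a[a-n]; do
    matched="${matched:+$matched }$(basename "$f")"
done
echo "matched: $matched"      # x.tar.gz.ao is skipped: o is outside a-n

# Text that is not gzip data makes tar fail, leaving a non-zero status in $?.
echo "not gzip" > "$demo/bad.tar.gz.ab"
tar -xzf "$demo/bad.tar.gz.ab" -C "$demo" 2> /dev/null
status=$?
echo "tar exit status: $status"
```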
You can also have the `tar` executable write to a given output folder without changing into that folder first. If you know how to do that, you can omit the `Push-Location`, `Pop-Location`, `try`, and `finally` bits. Just run what is inside the current `try` block, modifying the `tar` command appropriately. In that case you also need to prefix `*.tar.gz.a[a-n]` with `$pathToFolderWithGzips` (e.g. `$pathToFolderWithGzips\*.tar.gz.a[a-n]`).
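For the variant without a directory change, both GNU tar and the bsdtar-based `tar.exe` included with recent versions of Windows accept `-C <directory>` to choose where files are extracted.

Below is a POSIX-shell sketch of the no-`cd` loop. The folder names are hypothetical, and, as before, it assumes each part is a complete archive:

```shell
#!/bin/sh
# Sketch of the "stay where you are" variant: prefix the glob with the folder
# and let tar's -C option pick the output directory (it must already exist).
# Folder names are hypothetical; each part is assumed to be a full archive.

pathToFolderWithGzips=$(mktemp -d)
outputDir=$(mktemp -d)
srcDir=$(mktemp -d)
trap 'rm -rf "$pathToFolderWithGzips" "$outputDir" "$srcDir"' EXIT

# Demo setup: two standalone archives, each holding one text file.
for suffix in aa ab; do
    echo "part $suffix" > "$srcDir/$suffix.txt"
    tar -czf "$pathToFolderWithGzips/data.tar.gz.$suffix" -C "$srcDir" "$suffix.txt"
done

startDir=$PWD
for archive in "$pathToFolderWithGzips"/*.tar.gz.a[a-n]; do
    [ -e "$archive" ] || continue          # pattern matched nothing
    echo "Extracting $archive"
    if ! tar -xzf "$archive" -C "$outputDir"; then
        echo "WARNING: tar failed on $archive" >&2
    fi
done
```

In the PowerShell loop, the corresponding change would be along the lines of `tar xzf $_ -C $outputDir`, where `$outputDir` is a hypothetical variable holding an existing directory.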