I am trying to reduce the size of a 700 MB JSON file. It's a slightly smaller version of this: https://kaikki.org/dictionary/All%20languages%20combined/by-pos-name/kaikki_dot_org-dictionary-all-by-pos-name.json
I'm doing that by removing unnecessary information. The keys I don't need are: hypernyms,pos,categories,alt_of,inflection_templates,hyponyms,meronyms,source,wikipedia,holonyms,proverbs,head_templates,etymology_text,lang_code,hyphenation,forms,synonyms,antonyms.
I've tried
$Obj =
[System.IO.File]::ReadLines((Convert-Path -LiteralPath namesonly.json)) |
ConvertFrom-Json
$foo = Select-Object $Obj -ExcludeProperty hypernyms,pos,categories,alt_of,inflection_templates,hyponyms,meronyms,source,wikipedia,holonyms,proverbs,head_templates,etymology_text,lang_code,hyphenation,forms,synonyms,antonyms
$foo | ConvertTo-Json -Depth 100 > namesonlycleaned.json
But this results in an empty file. How do I fix it so I'll get a new JSON without those unnecessary fields?
Edit: It was suggested in the comments to add an asterisk. If I understood that correctly, then
$Obj =
[System.IO.File]::ReadLines((Convert-Path -LiteralPath namesonly.json)) |
ConvertFrom-Json
$foo = Select-Object $Obj * -ExcludeProperty hypernyms,pos,categories,alt_of,inflection_templates,hyponyms,meronyms,source,wikipedia,holonyms,proverbs,head_templates,etymology_text,lang_code,hyphenation,forms,synonyms,antonyms
$foo | ConvertTo-Json -Depth 100 > namesonlycleaned.json
returns the error
A positional parameter cannot be found that accepts argument '*'.
Your immediate problems are the ones pointed out by Mathias R. Jessen:
Unfortunately, in Windows PowerShell, using Select-Object's -ExcludeProperty alone does not work as intended (it outputs empty objects); it must be combined with -Property *. This problem has been fixed in PowerShell (Core) 7+.
Input objects must be provided to Select-Object via the pipeline:
$Obj | Select-Object -Property * -ExcludeProperty hypernyms,pos,categories,alt_of,inflection_templates,hyponyms,meronyms,source,wikipedia,holonyms,proverbs,head_templates,etymology_text,lang_code,hyphenation,forms,synonyms,antonyms
However, this alone will not solve your problem:
Judging by the linked data source and the array of properties you're trying to exclude, some of those properties are those of nested objects, i.e. you're looking to remove properties from each object's object graph.
Select-Object doesn't support this, but the custom Remove-Property function (source code in the bottom section) does.
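To see the limitation for yourself, here is a minimal sketch using a made-up sample object (the property values are placeholders, not taken from the real file). Select-Object drops the top-level pos property, but the categories property of the nested object survives:

```powershell
# Made-up sample, shaped loosely like one dictionary entry (an assumption for illustration).
$entry = '{"word":"Ann","pos":"name","sense":{"gloss":"a given name","categories":["names"]}}' |
  ConvertFrom-Json

# -Property * is what makes -ExcludeProperty take effect in Windows PowerShell.
$trimmed = $entry | Select-Object -Property * -ExcludeProperty pos, categories

# The top-level 'pos' is gone, but the nested 'categories' is still present in the output.
$trimmed | ConvertTo-Json -Compress -Depth 100
```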
Use the following (make sure you've defined the Remove-Property function from the bottom section first):
[System.IO.File]::ReadLines((Convert-Path -LiteralPath large.json)) |
  ConvertFrom-Json |
  Remove-Property -Recurse -Property hypernyms,pos,categories,alt_of,inflection_templates,hyponyms,meronyms,source,wikipedia,holonyms,proverbs,head_templates,etymology_text,lang_code,hyphenation,forms,synonyms,antonyms |
  ConvertTo-Json -Compress -Depth 100 > namesonlycleaned.json
Note:
This will run for quite some time, but by using a single pipeline it avoids unnecessary memory use due to intermediate storage of results.
ConvertFrom-Json reads all input up front before producing output; in terms of runtime performance, however, this part finishes fairly quickly.
For troubleshooting, say to limit output to the first 10 objects, you can insert Select-Object -First 10 as a pipeline segment before the ConvertTo-Json segment.
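As a rough illustration of that troubleshooting step, the following sketch runs the same pipeline shape against a small temporary file of made-up line-delimited JSON. The file name and its contents are assumptions for demonstration only, and Remove-Property is left out so that the snippet runs on its own:

```powershell
# Write three made-up JSON lines to a temporary file (a stand-in for the real input).
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-lines.json'
@(
  '{"word":"Ann","pos":"name"}'
  '{"word":"Bob","pos":"name"}'
  '{"word":"Cy","pos":"name"}'
) | Set-Content -LiteralPath $tmp -Encoding utf8

# Select-Object -First 2 sits right before ConvertTo-Json, so only two objects get serialized.
$result = [System.IO.File]::ReadLines($tmp) |
  ConvertFrom-Json |
  Select-Object -First 2 |
  ConvertTo-Json -Compress -Depth 100

$result

Remove-Item -LiteralPath $tmp
```

In the real pipeline, the Select-Object -First 10 segment would go between Remove-Property and ConvertTo-Json.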
Remove-Property source code:
function Remove-Property {
  <#
  .SYNOPSIS
  Removes properties from [pscustomobject] or dictionary objects (hashtables)
  and outputs the resulting objects.
  .DESCRIPTION
  Use -Recurse to remove the specified properties / entries from
  the entire object *graph* of each input object, i.e. also from any *nested*
  [pscustomobject]s or dictionaries.
  Useful for removing unwanted properties / entries from object graphs parsed
  from JSON via ConvertFrom-Json.
  Attempts to remove non-existent properties / entries are quietly ignored.
  .EXAMPLE
  [pscustomobject] @{ foo=1; bar=2 } | Remove-Property foo
  Removes the 'foo' property from the given custom object and outputs the result.
  .EXAMPLE
  @{ foo=1; bar=@{foo=10; baz=2} } | Remove-Property foo -Recurse
  Removes 'foo' properties (entries) from the entire object graph, i.e. from
  the top-level hashtable as well as from any nested hashtables.
  #>
  param(
    [Parameter(Mandatory, Position = 0)] [string[]] $Property,
    [switch] $Recurse,
    [Parameter(Mandatory, ValueFromPipeline)] [object] $InputObject
  )
  process {
    if (-not (($isPsCustObj = $InputObject -is [System.Management.Automation.PSCustomObject]) -or $InputObject -is [System.Collections.IDictionary])) {
      Write-Error "Neither a [pscustomobject] nor an [IDictionary] instance: $InputObject"
      return
    }
    # Remove the requested properties from the input object itself.
    foreach ($propName in $Property) {
      # Note: In both cases, if a property / entry by a given name doesn't exist, the .Remove() call is a quiet no-op.
      if ($isPsCustObj) {
        $InputObject.psobject.Properties.Remove($propName)
      }
      else {
        # IDictionary
        $InputObject.Remove($propName)
      }
    }
    # Recurse, if requested.
    if ($Recurse) {
      if ($isPsCustObj) {
        foreach ($prop in $InputObject.psobject.Properties) {
          if ($prop.Value -is [System.Management.Automation.PSCustomObject] -or $prop.Value -is [System.Collections.IDictionary]) {
            $prop.Value = Remove-Property -InputObject $prop.Value -Recurse -Property $Property
          }
        }
      }
      else {
        # IDictionary
        foreach ($entry in $InputObject.GetEnumerator()) {
          if ($entry.Value -is [System.Management.Automation.PSCustomObject] -or $entry.Value -is [System.Collections.IDictionary]) {
            $entry.Value = Remove-Property -InputObject $entry.Value -Recurse -Property $Property
          }
        }
      }
    }
    $InputObject # Output the potentially modified input object.
  }
}
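One thing worth checking against your data: as written above, -Recurse descends into property values that are themselves objects or dictionaries, but not into arrays. If the unwanted keys also occur inside arrays of objects, a variant along the following lines might help. This is only a sketch: Remove-PropertyDeep is a hypothetical name, not part of the function above, and the sample JSON is made up.

```powershell
# Hypothetical helper (an assumption, not part of the original function):
# always recurses, and also walks arrays / lists of nested objects.
function Remove-PropertyDeep {
  param(
    [Parameter(Mandatory, Position = 0)] [string[]] $Property,
    # Deliberately not mandatory, so that $null and empty values bind during recursion.
    [Parameter(ValueFromPipeline)] [object] $InputObject
  )
  process {
    if ($InputObject -is [System.Management.Automation.PSCustomObject]) {
      # Removing a non-existent property is a quiet no-op.
      foreach ($name in $Property) { $InputObject.psobject.Properties.Remove($name) }
      foreach ($prop in $InputObject.psobject.Properties) {
        $null = Remove-PropertyDeep -Property $Property -InputObject $prop.Value
      }
    }
    elseif ($InputObject -is [System.Collections.IDictionary]) {
      foreach ($name in $Property) { $InputObject.Remove($name) }
      # Snapshot the values before walking them.
      foreach ($value in @($InputObject.Values)) {
        $null = Remove-PropertyDeep -Property $Property -InputObject $value
      }
    }
    elseif ($InputObject -is [System.Collections.IList]) {
      # Arrays and lists: visit each element; scalars such as strings fall through untouched.
      foreach ($element in $InputObject) {
        $null = Remove-PropertyDeep -Property $Property -InputObject $element
      }
    }
    $InputObject # Output the (modified in place) input object.
  }
}

# Made-up sample in which 'categories' sits inside an array of nested objects.
$sample = '{"word":"Ann","pos":"name","senses":[{"glosses":["a given name"],"categories":["names"]}]}'
$cleaned = $sample | ConvertFrom-Json | Remove-PropertyDeep pos, categories
$cleaned | ConvertTo-Json -Compress -Depth 100
```

Nested objects are modified in place, so the recursive calls discard their output and only the top-level call emits the object.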