json powershell large-files

Using PowerShell, how do I filter a JSON file to exclude certain key names?


I am trying to reduce the size of a 700 MB JSON file. It's a slightly smaller version of this: https://kaikki.org/dictionary/All%20languages%20combined/by-pos-name/kaikki_dot_org-dictionary-all-by-pos-name.json

I'm doing that by removing unnecessary information. The keys I don't need are: hypernyms,pos,categories,alt_of,inflection_templates,hyponyms,meronyms,source,wikipedia,holonyms,proverbs,head_templates,etymology_text,lang_code,hyphenation,forms,synonyms,antonyms.

I've tried:

$Obj = 
  [System.IO.File]::ReadLines((Convert-Path -LiteralPath namesonly.json)) | 
  ConvertFrom-Json
$foo = Select-Object $Obj -ExcludeProperty hypernyms,pos,categories,alt_of,inflection_templates,hyponyms,meronyms,source,wikipedia,holonyms,proverbs,head_templates,etymology_text,lang_code,hyphenation,forms,synonyms,antonyms
$foo | ConvertTo-Json -Depth 100 > namesonlycleaned.json

But this results in an empty file. How do I fix it so that I get a new JSON file without those unnecessary fields?

Edit: It was suggested in the comments to add an asterisk. If I understood that correctly, then

$Obj = 
  [System.IO.File]::ReadLines((Convert-Path -LiteralPath namesonly.json)) | 
  ConvertFrom-Json
$foo = Select-Object $Obj * -ExcludeProperty hypernyms,pos,categories,alt_of,inflection_templates,hyponyms,meronyms,source,wikipedia,holonyms,proverbs,head_templates,etymology_text,lang_code,hyphenation,forms,synonyms,antonyms
$foo | ConvertTo-Json -Depth 100 > namesonlycleaned.json

returns the error:

A positional parameter cannot be found that accepts argument '*'.

Solution


  • Use the following (make sure you've first defined the Remove-Property function, whose source code is shown further down):

    [System.IO.File]::ReadLines((Convert-Path -LiteralPath large.json)) | 
      ConvertFrom-Json |
      Remove-Property -Recurse -Property hypernyms,pos,categories,alt_of,inflection_templates,hyponyms,meronyms,source,wikipedia,holonyms,proverbs,head_templates,etymology_text,lang_code,hyphenation,forms,synonyms,antonyms |
      ConvertTo-Json -Compress -Depth 100 > namesonlycleaned.json
    

    Note:
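Both attempts in the question fail because of how Select-Object binds its arguments: its only positional parameter is -Property, so `Select-Object $Obj ...` treats `$Obj` as a property list rather than as input, and a second positional argument such as `*` has nowhere to go. Even when the input is passed correctly, -ExcludeProperty only affects top-level properties. The snippet below is a small sketch of this behavior; the sample object and its values are made up for illustration.

```powershell
# Hypothetical sample object standing in for one parsed entry of the file.
$obj = [pscustomobject] @{
  word   = 'Ann'
  pos    = 'name'
  senses = [pscustomobject] @{ glosses = 'a given name'; synonyms = 'Anne' }
}

# As in the question: $obj binds positionally to -Property, so there is
# no input object at all and the command outputs nothing.
$noOutput = @(Select-Object $obj -ExcludeProperty pos)

# Passing the object via the pipeline, with an explicit -Property *,
# does remove 'pos', but only at the top level: the nested 'synonyms'
# property inside 'senses' is left untouched.
$topLevelOnly = $obj | Select-Object -Property * -ExcludeProperty pos, synonyms
```

This is why a recursive helper such as Remove-Property is needed when the unwanted keys also occur in nested objects.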


    Remove-Property source code:

    function Remove-Property {
      <#
      .SYNOPSIS
      Removes properties from [pscustomobject] or dictionary objects (hashtables)
      and outputs the resulting objects.
      
      .DESCRIPTION
      Use -Recurse to remove the specified properties / entries from 
      the entire object *graph* of each input object, i.e. also from any *nested* 
      [pscustomobject]s or dictionaries.
    
      Useful for removing unwanted properties / entries from object graphs parsed
      from JSON via ConvertFrom-Json.
    
      Attempts to remove non-existent properties / entries are quietly ignored.
      
      .EXAMPLE
      [pscustomobject] @{ foo=1; bar=2 } | Remove-Property foo
    
      Removes the 'foo' property from the given custom object and outputs the result.
    
      .EXAMPLE
      @{ foo=1; bar=@{foo=10; baz=2} } | Remove-Property foo -Recurse
    
      Removes 'foo' properties (entries) from the entire object graph, i.e. from
      the top-level hashtable as well as from any nested hashtables.
      #>
      param(
        [Parameter(Mandatory, Position = 0)] [string[]] $Property,
        [switch] $Recurse,
        [Parameter(Mandatory, ValueFromPipeline)] [object] $InputObject
      )
      process {
        # Only [pscustomobject] and dictionary instances are supported.
        $isPsCustObj = $InputObject -is [System.Management.Automation.PSCustomObject]
        if (-not ($isPsCustObj -or $InputObject -is [System.Collections.IDictionary])) {
          Write-Error "Neither a [pscustomobject] nor an [IDictionary] instance: $InputObject"
          return
        }
        # Remove the requested properties from the input object itself.
        foreach ($propName in $Property) {
          # Note: In both cases, if a property / entry by a given name doesn't exist, the .Remove() call is a quiet no-op.
          if ($isPsCustObj) {        
            $InputObject.psobject.Properties.Remove($propName)
          }
          else {
            # IDictionary
            $InputObject.Remove($propName)
          }
        }
        # Recurse, if requested.
        if ($Recurse) {
          if ($isPsCustObj) {
            foreach ($prop in $InputObject.psobject.Properties) {
              if ($prop.Value -is [System.Management.Automation.PSCustomObject] -or $prop.Value -is [System.Collections.IDictionary]) {
                $prop.Value = Remove-Property -InputObject $prop.Value -Recurse -Property $Property
              }
            }
          }
          else {
            # IDictionary
            foreach ($entry in $InputObject.GetEnumerator()) {
              if ($entry.Value -is [System.Management.Automation.PSCustomObject] -or $entry.Value -is [System.Collections.IDictionary]) {
                $entry.Value = Remove-Property -InputObject $entry.Value -Recurse -Property $Property
              }
            }
          }
        }
        $InputObject # Output the potentially modified input object.
      }
    }
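One caveat with -Recurse as written above: the recursion only follows values that are themselves [pscustomobject]s or dictionaries, so objects stored inside arrays (which is what ConvertFrom-Json produces for JSON arrays) are not visited. The following is a minimal sketch of a variant that also descends into arrays. The name Remove-PropertyDeep and the sample JSON are hypothetical, it always processes the whole graph, and it has not been tuned for very large inputs.

```powershell
function Remove-PropertyDeep {
  # Hypothetical variant of Remove-Property: removes the named properties /
  # entries from the whole object graph, including objects inside arrays.
  param(
    [Parameter(Mandatory, Position = 0)] [string[]] $Property,
    [Parameter(Mandatory, ValueFromPipeline)] [object] $InputObject
  )
  process {
    # Walk the graph with an explicit stack instead of recursive calls.
    $stack = [System.Collections.Stack]::new()
    $stack.Push($InputObject)
    while ($stack.Count -gt 0) {
      $node = $stack.Pop()
      if ($node -is [System.Management.Automation.PSCustomObject]) {
        foreach ($name in $Property) { $node.psobject.Properties.Remove($name) }
        foreach ($prop in $node.psobject.Properties) {
          if ($null -ne $prop.Value) { $stack.Push($prop.Value) }
        }
      }
      elseif ($node -is [System.Collections.IDictionary]) {
        foreach ($name in $Property) { $node.Remove($name) }
        foreach ($entry in $node.GetEnumerator()) {
          if ($null -ne $entry.Value) { $stack.Push($entry.Value) }
        }
      }
      elseif ($node -is [System.Collections.IEnumerable] -and $node -isnot [string]) {
        # Arrays and other collections: queue each element for inspection.
        foreach ($item in $node) {
          if ($null -ne $item) { $stack.Push($item) }
        }
      }
    }
    $InputObject # Output the (modified in place) input object.
  }
}

# Usage with a made-up entry: 'synonyms' sits inside the 'senses' array
# and is removed along with the top-level 'pos'.
$json = '{"word":"Ann","pos":"name","senses":[{"glosses":["a name"],"synonyms":[{"word":"Anne"}]}]}'
$out = $json | ConvertFrom-Json | Remove-PropertyDeep pos, synonyms |
  ConvertTo-Json -Compress -Depth 100
```

The explicit stack is a design choice made here to keep deeply nested input from growing the call depth; a recursive version in the style of the function above would work equally well for typical JSON.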