bash sed pii

Bash: Need to replace different email addresses within a file


I'm trying to mask PII in a file (.json).

The file contains several different email addresses, and I would like to replace each one with a different mock email address.

For example:

"results":

[{ "email1@domain1.com",

"email2@domain2.com",

"email3@domain3.com",

"email4@domain4.com",

"email5@domain5.com" }]

I need to change them to:

"results":

[{ "mockemail1@mockdomain1.com",

"mockemail2@mockdomain2.com",

"mockemail3@mockdomain3.com",

"mockemail4@mockdomain4.com",

"mockemail5@mockdomain5.com" }]

Using sed and regex I have been able to change the addresses to one of the mock email addresses, but I would like to change each email to a different mock email.

The mock email addresses are stored in a file. To get a random address I use:

RandomEmail=$(shuf -n 1 Mock_data.csv | cut -d "|" -f 3)

Any ideas? Thanks!


Solution

  • input.json: your JSON file. Make sure it ends with a newline (not visible in this example); otherwise bash's read returns non-zero on the final line and the loop skips it.

    "results":
    
    [{ "email1@domain1.com",
    
    "email2@domain2.com",
    
    "email3@domain3.com",
    
    "email4@domain4.com",
    
    "email5@domain5.com" }]
    

    substitutions.txt: one search;replace pair per line. This file must also end with a newline, or read skips the last pair.

    domain1.com;mockdomain1.com
    domain2.com;mockdomain2.com
    domain3.com;mockdomain3.com
    domain4.com;mockdomain4.com
    domain5.com;mockdomain5.com
    

    script.sh

      #!/bin/bash
      # Copy input.json to output.json, swapping each real domain for its mock.
      : > output.json    # start with an empty file so reruns do not append twice
    
      while IFS= read -r _line; do
        # Try every "search;replace" pair against the current line.
        while IFS=';' read -r _strSearch _strReplace; do
          # Literal replacement anchored on "@", so "." is not a regex wildcard.
          _line=${_line//"@$_strSearch"/"@$_strReplace"}
        done < substitutions.txt
    
        # Write each line exactly once, whether or not anything matched.
        printf '%s\n' "$_line" >> output.json
      done < input.json
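
The nested loop above rereads substitutions.txt for every input line. As an alternative, here is a minimal sketch that loads the pairs once and rewrites the file in a single awk pass. The function names apply_substitutions and replace_all are made up for this example, and it assumes the same search;replace layout as substitutions.txt.

```shell
#!/bin/bash
# Hypothetical helper: print INPUT with every "@search" replaced by "@replace",
# using the "search;replace" pairs found in SUBS.
apply_substitutions() {
  local input=$1 subs=$2
  awk -F';' '
    # Literal (non-regex) replace of every occurrence of "from" in "text".
    function replace_all(text, from, to,    out, p) {
      out = ""
      while ((p = index(text, from)) > 0) {
        out = out substr(text, 1, p - 1) to
        text = substr(text, p + length(from))
      }
      return out text
    }
    # First file: remember each pair, anchored on "@".
    FILENAME == ARGV[1] { if (NF == 2) map["@" $1] = "@" $2; next }
    # Second file: apply every pair to the line, then print it.
    { for (from in map) $0 = replace_all($0, from, map[from]); print }
  ' "$subs" "$input"
}

# Usage: apply_substitutions input.json substitutions.txt > output.json
```

Because the match is a plain substring test, "@domain1.com" would also match the start of "@domain1.com.au"; add the closing quote to both columns of substitutions.txt if that matters for your data.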
    

    output.json

    "results":
    
    [{ "email1@mockdomain1.com",
    
    "email2@mockdomain2.com",
    
    "email3@mockdomain3.com",
    
    "email4@mockdomain4.com",
    
    "email5@mockdomain5.com" }]