python json pandas dataframe serialization

How to avoid pandas to_json escaping forward slashes in URLs


I am trying to load JSON data from a file into a dataframe, filter out a few records, and write the result back to a file. My file contains one JSON record per line, and each record has a URL in it. This is the sample data in the input file:

{"site_code":"111","site_url":"https://www.site111.com"}
{"site_code":"222","site_url":"https://www.site333.com"}
{"site_code":"333","site_url":"https://www.site333.com"}

Sample code I used

import pandas as pd
sites = pd.read_json('sites.json', lines=True)
modified_sites = sites[sites['site_code']!=222]
modified_sites.to_json('modified_sites.json',orient='records',lines=True)

But the generated file contains escaped forward slashes:

{"site_code":111,"site_url":"https:\/\/www.site111.com"}
{"site_code":333,"site_url":"https:\/\/www.site333.com"}

How can I avoid it and get the following data in the generated file?

{"site_code":111,"site_url":"https://www.site111.com"}
{"site_code":333,"site_url":"https://www.site333.com"}

Note: I looked at the following question, but it did not help in my case:

  1. pandas to_json() redundant backslashes

Solution

  • You can replace the escaped slashes in the serialized string yourself and then write the result to the file:

    import pandas as pd

    sites = pd.read_json('sites.json', lines=True)
    modified_sites = sites[sites['site_code'] != 222]

    # Without a path, to_json() returns the JSON as a string; undo the "\/" escaping.
    formatted_json = modified_sites.to_json(orient='records', lines=True).replace('\\/', '/')

    # Write the cleaned string ourselves instead of letting to_json() write the file.
    with open('modified_sites.json', 'w') as f:
        f.write(formatted_json)
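  • A possible alternative, as a minimal sketch: the string replacement above could also alter a value that legitimately contains a backslash followed by a slash. Python's standard json module does not escape forward slashes, so one option is to let pandas do the type conversion and then re-serialize each record with json.dumps. The sample data is inlined here through io.StringIO only so that the sketch runs without an input file; the names raw, records and json_lines are illustrative, not part of the original code.

```python
import io
import json

import pandas as pd

# Sample input from the question, inlined so the sketch runs without sites.json.
raw = (
    '{"site_code":"111","site_url":"https://www.site111.com"}\n'
    '{"site_code":"222","site_url":"https://www.site333.com"}\n'
    '{"site_code":"333","site_url":"https://www.site333.com"}\n'
)

sites = pd.read_json(io.StringIO(raw), lines=True)
modified_sites = sites[sites['site_code'] != 222]

# Let pandas convert its own types to JSON, then parse that back into
# plain Python dicts (one per row).
records = json.loads(modified_sites.to_json(orient='records'))

# json.dumps leaves "/" unescaped; the separators argument removes the
# spaces json.dumps would otherwise put after "," and ":".
json_lines = '\n'.join(json.dumps(r, separators=(',', ':')) for r in records)

# One JSON record per line, as in the input file.
with open('modified_sites.json', 'w') as f:
    f.write(json_lines + '\n')
```

    This serializes the data twice, so it is slower than the replace approach on large frames, but it never touches the contents of the string values.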