jsonbashbioinformaticsdata-extractiontext-manipulation

Data extraction for specific string


I have a long list of JSON data, with repeats of contents similar to followings.

Due to the original JSON file is too long, I will just shared the hyperlinks here. This is a result generated from a database called RegulomeDB.

Direct link to the JSON file

I would like to extract specific data (eQTLs) from "method": "eQTLs" and "value": "xxxx", and put them into 2 columns (tab delimited) exactly like below. Note: "value":"xxxx" is extracted right after "method": "eQTLs"is detected.

eQTLs   firstResult, secondResult, thirdResult, ...

In this example, the desired output is:

eQTLs   EIF3S8, EIF3CL

I've tried using a python script but was unsuccessful.

import json
with open('file.json') as f:
    f_json = json.load(f)
    print 'f_json[0]['"method": "eQTLs"'] + "\t" + f_json[0]["value"]

Thank you for your kind help.


Solution

  • Try this:

    cat file.json | grep -iE '"method":\s*"eQTLs"[^}]*' -o | cut -d ',' -f 1,5 | sed -r 's/"|:|method|value//gi' | sed 's/\s*eqtls,\s*//gi' | tr '\n' ',' | sed 's/,$/\n/g' | sed 's/,/, /g' | xargs echo -e 'eQTLs\x09'