I'm trying to send a curl request, parse the output, and store the result in a file called res.txt.
Here is my bash cmd line:
curl --request POST --url 'https://www.virustotal.com/vtapi/v2/url/scan' --data 'apikey=XXXXXXXXXXXXXXX' --data 'url=abcde.xyz' >> grep -Po '"scan_id":.*?[^\\]",' res.txt
The output is something like this:
{"permalink": "https://www.virustotal.com/gui/url/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/detection/u-17f485d68047604e61b4067310ab716ae6fddc774bb46ffab06d081613b28e49-1595992331", "resource": "http://abcde.xyz/", "url": "http://abcde.xyz/", "response_code": 1, "scan_date": "2020-07-29 03:12:11", "scan_id": "000000000000000000000000000000000000000", "verbose_msg": "Scan request successfully queued, come back later for the report"}
I want to store the scan_id value in res.txt, but it is not working, and there are no errors. I also do not know if my regex is correct.
Can you help me?
The core of the question is about extracting values from JSON data (created by curl, in this specific case).
While it is possible to parse specific JSON data using regular expressions (assuming a particular layout of whitespace and line breaks), it is very hard, if not impossible, to write a regular expression that covers all possible formatting. This is similar to parsing XML data: some specific formats can be parsed with a regex, but a generic parser is extremely hard to write.
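To illustrate how formatting alone can defeat a regex, here is a small sketch. The two sample documents and the variable names are made up for illustration; they are not taken from the VirusTotal response.

```shell
# Two hypothetical JSON documents with identical content but different layout.
compact='{"scan_id": "abc-123", "response_code": 1}'
pretty='{
  "scan_id":
    "abc-123",
  "response_code": 1
}'

# A line-oriented regex that expects the key and the value on the same line.
pattern='"scan_id": *"[^"]*"'

# "|| true" keeps the script going when grep finds nothing (exit status 1).
from_compact=$(printf '%s\n' "$compact" | grep -o "$pattern" || true)
from_pretty=$(printf '%s\n' "$pretty" | grep -o "$pattern" || true)

printf 'compact: [%s]\n' "$from_compact"
printf 'pretty:  [%s]\n' "$from_pretty"
```

The compact form matches, while the pretty-printed form yields nothing, even though both are valid JSON carrying the same data.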
Instead of a regex, consider using a JSON-specific tool, e.g., jq.
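Also, the pipe from curl to the next command should be built with '|', not '>>'. In the original command, '>> grep' appends curl's output to a file named grep, so grep never runs as a command. The '>' redirection should then be used to name the result file. See below: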
curl --request POST --url 'https://www.virustotal.com/vtapi/v2/url/scan' --data 'apikey=XXXXXXXXXXXXXXX' --data 'url=abcde.xyz' |
jq .scan_id > res.txt
To remove the quotes from the value in res.txt, use jq's raw-output option (jq -r .scan_id).
If it is not possible to use jq for any reason, consider the following modification. It uses sed (instead of grep) to extract the scan_id value (0000... in this case). It assumes that the "scan_id" key and its value are on the same line.
curl --request POST --url 'https://www.virustotal.com/vtapi/v2/url/scan' --data 'apikey=XXXXXXXXXXXXXXX' --data 'url=abcde.xyz' |
sed -n -e 's/.*"scan_id": *"\([^"]*\)".*/\1/p' > res.txt
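The sed expression can be checked offline before calling the API. This is a minimal sketch, assuming a shortened, made-up response stored in a shell variable; the variable names and the sample values are illustrative only.

```shell
# Hypothetical, shortened sample response (not real API output).
response='{"response_code": 1, "scan_date": "2020-07-29 03:12:11", "scan_id": "0000-1595992331", "verbose_msg": "Scan request successfully queued"}'

# Same sed expression as above, fed from the variable instead of curl.
# -n suppresses default output; the trailing "p" prints only lines where the substitution succeeded.
scan_id=$(printf '%s\n' "$response" | sed -n -e 's/.*"scan_id": *"\([^"]*\)".*/\1/p')

printf '%s\n' "$scan_id"
```

If the key is missing or sits on a different line from its value, the substitution does not match and nothing is printed, so an empty res.txt is a sign that the input did not have the assumed layout.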