jq

How to extract a list of URLs from JSON using jq


Microsoft Clarity returns JSON. I am able to extract the following portion of that JSON using jq:

$ echo "$JSON" | jq '.[] | select(.metricName == "PopularPages") | .[]'
"PopularPages"
[
  {
    "url": "https://www.mslinn.com/blog/2023/09/14/boost.html",
    "visitsCount": "8"
  },
  {
    "url": "https://www.mslinn.com/blog/2021/02/11/javascript-named-arguments.html",
    "visitsCount": "8"
  },
  {
    "url": "https://www.mslinn.com/blog/2023/08/20/pytest.html",
    "visitsCount": "7"
  },
  {
    "url": "https://www.mslinn.com/blog/2022/01/10/wsl-backup.html",
    "visitsCount": "7"
  },
  {
    "url": "https://www.mslinn.com/av_studio/640-music21.html",
    "visitsCount": "5"
  },
  {
    "url": "https://www.mslinn.com/llm/4000-wsl.html",
    "visitsCount": "4"
  },
  {
    "url": "https://www.mslinn.com/blog/2021/04/28/buildah-podman.html",
    "visitsCount": "4"
  },
  {
    "url": "https://www.mslinn.com/blog/2020/10/27/installing-a-new-ssh-key-on-awc-ec2-with-user-data.html",
    "visitsCount": "3"
  },
  {
    "url": "https://www.mslinn.com/av_studio/570-ableton-push-standalone.html",
    "visitsCount": "3"
  },
  {
    "url": "https://www.mslinn.com/git/600-partial-clone.html",
    "visitsCount": "3"
  }
]

However, I cannot figure out how to reduce the above JSON to a list of url values, one per line, like this:

https://www.mslinn.com/blog/2023/09/14/boost.html
https://www.mslinn.com/blog/2021/02/11/javascript-named-arguments.html
https://www.mslinn.com/blog/2023/08/20/pytest.html
https://www.mslinn.com/blog/2022/01/10/wsl-backup.html
https://www.mslinn.com/av_studio/640-music21.html
https://www.mslinn.com/llm/4000-wsl.html
https://www.mslinn.com/blog/2021/04/28/buildah-podman.html
https://www.mslinn.com/blog/2020/10/27/installing-a-new-ssh-key-on-awc-ec2-with-user-data.html
https://www.mslinn.com/av_studio/570-ableton-push-standalone.html
https://www.mslinn.com/git/600-partial-clone.html

What incantation might yield the desired result?

The JSON comes back in a structure like this:

[
    ...
    {
        "metricName": "PopularPages",
        "information": [
            {
                "url": "https://www.mslinn.com/blog/2023/09/14/boost.html",
                "visitsCount": "8"
            },
            {
                "url": "https://www.mslinn.com/blog/2021/02/11/javascript-named-arguments.html",
                "visitsCount": "8"
            },
            {
                "url": "https://www.mslinn.com/blog/2023/08/20/pytest.html",
                "visitsCount": "7"
            },
            {
                "url": "https://www.mslinn.com/blog/2022/01/10/wsl-backup.html",
                "visitsCount": "7"
            },
            {
                "url": "https://www.mslinn.com/av_studio/640-music21.html",
                "visitsCount": "5"
            },
            {
                "url": "https://www.mslinn.com/llm/4000-wsl.html",
                "visitsCount": "4"
            },
            {
                "url": "https://www.mslinn.com/blog/2021/04/28/buildah-podman.html",
                "visitsCount": "4"
            },
            {
                "url": "https://www.mslinn.com/blog/2020/10/27/installing-a-new-ssh-key-on-awc-ec2-with-user-data.html",
                "visitsCount": "3"
            },
            {
                "url": "https://www.mslinn.com/av_studio/570-ableton-push-standalone.html",
                "visitsCount": "3"
            },
            {
                "url": "https://www.mslinn.com/git/600-partial-clone.html",
                "visitsCount": "3"
            }
        ]
    }
]

Solution

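  • This part of your filter already selects the object whose metricName is "PopularPages":

    .[] | select(.metricName == "PopularPages")

    The trailing | .[] in your attempt then iterates over every value of that object, which is why it prints "PopularPages" followed by the whole array.

    Instead, drill down into the information array on that object and take the url of each entry:

    .[] | select(.metricName == "PopularPages").information[].url

    Run jq with the -r (raw output) option so that each URL is printed on its own line without surrounding quotes.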
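The filter above can be tried end to end on a small sample. This is only a sketch: it assumes jq is installed and that the response is held in a shell variable named JSON, as in the question. The sample is shortened to two of the URLs, and the "OtherMetric" entry is a made-up placeholder for the elided entries, not real Clarity output. The -r option makes jq print raw strings instead of quoted JSON strings.

```shell
# Shortened sample in the same shape as the response shown in the question.
# "OtherMetric" is a hypothetical placeholder for the entries elided by "...".
JSON='[
  {"metricName": "OtherMetric", "information": []},
  {"metricName": "PopularPages", "information": [
    {"url": "https://www.mslinn.com/blog/2023/09/14/boost.html", "visitsCount": "8"},
    {"url": "https://www.mslinn.com/blog/2023/08/20/pytest.html", "visitsCount": "7"}
  ]}
]'

# -r prints each url as a raw string, one per line, without quotes.
urls=$(printf '%s\n' "$JSON" |
  jq -r '.[] | select(.metricName == "PopularPages").information[].url')

printf '%s\n' "$urls"
```

Quoting "$JSON" keeps the shell from word-splitting or glob-expanding the JSON text before jq reads it.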