bash web-scraping

Extract two values from website source code


I'm writing a bash script to extract two values, "vendor" and "product", from the source of a CVE Details page and store each one in its own bash variable, i.e. vendor=$(requested code) and product=$(requested code).

The part of the page source that contains the information I need is:

                    <tr>
                        <th>
                            Vendor
                        </th>
                        <th>
                            Product
                        </th>
                        <th>
                            Vulnerable Versions
                        </th>
                    </tr>
                                                <tr>
                                <td>
                                    <a href="/vendor/45/Apache.html" title="Details for Apache">Apache</a>                              </td>
                                <td><a href="/product/66/Apache-Http-Server.html?vendor_id=45" title="Product Details Apache Http Server">Http Server</a></td>
                                <td class="num">
                                     34                             </td>
                            </tr>

                                        </table>

From this, the information I need is Vendor=Apache and Product=HTTP Server, but the closest I have managed on my own is:

wget https://www.cvedetails.com/cve/CVE-2017-3169 &>/dev/null; grep -C 6 "Vulnerable Versions" CVE-2017-3169

Any idea how to get this information? Thanks in advance!


Solution

  • Here is an example of how simple this becomes when you use an API and an appropriate parser instead of scraping the HTML:

    #!/usr/bin/env bash
    
    API_URL='https://cve.circl.lu/api'
    
    cve_id='CVE-2017-3169'
    
    # Split the returned CPE string on ':' (cpe:2.3:part:vendor:product:...)
    # and keep only the vendor and product fields
    IFS=: read -r _ _ _ vendor product _ < <(
      # Perform API request
      curl -s "$API_URL/cve/$cve_id" |
    
      # Parse JSON data returned by the API to get only what we need
      jq -r '.vulnerable_product[0]'
    )
    
    # Demo what we got
    printf 'CVE ID: %s\n' "$cve_id"
    printf 'Vendor: %s\n' "${vendor^}"
    printf 'Product: %s\n' "${product}"
    

    Sample output:

    CVE ID: CVE-2017-3169
    Vendor: Apache
    Product: http_server
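
The field splitting used above can be tried offline, without calling the API. This is only a minimal sketch: the `cpe` value is a hypothetical sample shaped like a CPE 2.3 string, which is what the `read` in the script appears to expect, and the underscore-to-space substitution is an optional extra that the original script does not perform. It needs bash 4 or later for `${vendor^}`.

```bash
#!/usr/bin/env bash

# Hypothetical sample value, shaped like a CPE 2.3 string
# (cpe:2.3:part:vendor:product:version:...); not fetched from the API
cpe='cpe:2.3:a:apache:http_server:2.2.0:*:*:*:*:*:*:*'

# Split on ':'; discard the first three fields, keep vendor and product,
# and let the trailing '_' swallow everything after the product
IFS=: read -r _ _ _ vendor product _ <<< "$cpe"

# Capitalize the vendor's first letter and (optionally) show the
# product with spaces instead of underscores
printf 'Vendor: %s\n' "${vendor^}"
printf 'Product: %s\n' "${product//_/ }"
```

Note that the product name comes back in the API's own form (`http_server`), not as the "Http Server" label shown on the CVE Details page, so any further prettifying is up to the script.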