google-sheetsweb-scrapinggoogle-sheets-formula

Using IMPORTXML to scrape dropdown data from webpage


I am not well versed in using advanced datascraping and coding. However, there is a webpage ("https://www.ksestocks.com/HistoryHighLow") which carries a dropdown menu with various options. I can extract the full list of option values using the following on Google Sheets:

=IMPORTXML("https://www.ksestocks.com/HistoryHighLow", "//select[@id='selscrip']/option/@value")

However, I am unable to scrape the table data for each option in Google Sheets. For example, I need data for the following xpath for each option value:

 /html/body/div[2]/div/table/tbody/tr/td/div[2]/table/tbody/tr[3]/td/table/tbody/tr[3]/td[2]

Can someone help me scrape data into Google Sheets for specific dropdown option values?

I have tried to use the following to generate URLs for each dropdown option value:

 ="https://www.ksestocks.com/HistoryHighLow?selscrip=" & A2

where column A has the values from the following: =IMPORTXML("https://www.ksestocks.com/HistoryHighLow", "//select[@id='selscrip']/option/@value")

For certain webpages, this sort of thing helps you get the data for each option value but I think this might only be when there are separate webpages for each option value. I do not think that is the case with the above situation.


Solution

  • When I saw the URL and your expected table, I noticed that the table was retrieved using multipart/form-data with the POST method. Unfortunately, in the current stage, MPORTXML can use only the GET method. So, in this case, it is required to use Google Apps Script.

    When this is reflected in Google Apps Script, it becomes as follows.

    Sample script:

    Please copy and paste the following script into the script editor of Google Spreadsheet and save the script. When you use this script, please put a custom function =sample("Abbot Laboatories (Pakistan) Limited") into a cell. By this, the script is run and the expected table is returned. Here, please input the title of the dropdown list correctly. It's like Abbot Laboatories (Pakistan) Limited. Please be careful about this.

    function sample(select, url = "https://www.ksestocks.com/HistoryHighLow") {
      const res1 = UrlFetchApp.fetch(url);
      const options = [...res1.getContentText().matchAll(/<option.*?>/g)];
      if (options.length == 0) {
        throw new Error(`No value of "${select}".`);
      }
      const option = options.find(([e]) => e.includes(select));
      if (!option) {
        throw new Error(`No value of "${select}".`);
      }
      const value = option[0].match(/value\="(.*?)"/)[1].trim();
      const res2 = UrlFetchApp.fetch(url, {
        payload: { selscrip: Utilities.newBlob(value) },
      });
      const tables = [...res2.getContentText().matchAll(/<table[\w\s\S]*?<\/table>/g)];
      if (tables.length == 0) {
        throw new Error(`No value of "${select}".`);
      }
      const table = tables.pop();
      const [, ...data] = XmlService.parse(table[0]).getRootElement().getChildren("tr").map(r => r.getChildren("td").map(c => c.getValue()));
      return data;
    }
    

    Testing:

    When this is used, the following results are obtained.

    =sample("Abbot Laboatories (Pakistan) Limited")

    enter image description here

    =sample("Abdullah Shah Ghazi Sugar Mills Limited")

    enter image description here

    Note:

    If you want to directly put the table values without using the custom function, you can also use the following script. When you use this script, please copy and paste the following script into the script editor and enable Sheets API at Advanced Google services. And, set the sheet name and the value of the dropdown list. Please directly run the function sample2 with the script editor. By this, the table is put into the sheet.

    function sample2() {
      const sheetName = "Sheet1"; // Please set your sheet name.
      const select = "Abbot Laboatories (Pakistan) Limited"; // Please set the value of dropdown list.
      const url = "https://www.ksestocks.com/HistoryHighLow";
    
      const res1 = UrlFetchApp.fetch(url);
      const options = [...res1.getContentText().matchAll(/<option.*?>/g)];
      if (options.length == 0) {
        throw new Error(`No value of "${select}".`);
      }
      const option = options.find(([e]) => e.includes(select));
      if (!option) {
        throw new Error(`No value of "${select}".`);
      }
      const value = option[0].match(/value\="(.*?)"/)[1].trim();
      const res2 = UrlFetchApp.fetch(url, {
        payload: { selscrip: Utilities.newBlob(value) },
      });
      const tables = [...res2.getContentText().matchAll(/<table[\w\s\S]*?<\/table>/g)];
      if (tables.length == 0) {
        throw new Error(`No value of "${select}".`);
      }
      const table = tables.pop();
      const ss = SpreadsheetApp.getActiveSpreadsheet();
      const requests = { requests: [{ pasteData: { html: true, data: table[0], coordinate: { sheetId: ss.getSheetByName(sheetName).getSheetId() } } }] };
      Sheets.Spreadsheets.batchUpdate(requests, ss.getId());
    }
    

    I can confirm that this approach can be used for the current site (March 31, 2025). But, if the specification for retrieving the table data is changed in the future, this approach might not be able to be used. Please be careful about this.

    References: