I've seen this asked many times here, but this is very different. The other requests for help using IMPORTXML, the op was asking about pulling standard HTML tags like a/href/img/src/ul/li
, etc. What I'm looking to extract is embedded in the style tag, and it isn't extracting as expected.
On this page https://www.northtexasgivingday.org/organization/Salvationarmynorthtexas (and a series of similar pages) I am trying to extract the URL of the logo of various nonprofits.
I highlight the logo in Chrome dev tools and copy the Xpath, or the full Xpath - which does not return the expected logo url using IMPORTXML.
The xapth is //\*\[@id="org-logo"\]/
span
and the full Xpath is /html/body/app-root/mat-sidenav-container/mat-sidenav-content/div\[2\]/app-organization/div/app-org-page/div/div/div\[1\]/app-banner/div/div\[3\]/span
The image is stored in this element:
<span _ngcontent-ng-c332306571="" role="img" applazyload="" class="image set-background ng-star-inserted" aria-label="The Salvation Army North Texaslogo" style="opacity: 0; animation: 0.7s ease-out 0s 1 normal forwards running lazy-load-anim1; background-repeat: no-repeat; background-size: cover; width: 100%; height: 100%; background-position: center center; background-image: url("https://imagecdn.mightycause.com/36be848e-03bb-47c3-8616-6d51b811e38d/-/format/auto/-/progressive/yes/-/stretch/off/-/preview/");"></span>
I can see the image url in that element found in the style section under
background-image:
I've tried both Xapth options in these formulas:
IMPORTXML("https://www.northtexasgivingday.org/organization/Salvationarmynorthtexas", "//\*\[@id='org-logo'\]/span" )
or
IMPORTXML("https://www.northtexasgivingday.org/organization/Salvationarmynorthtexas", "/html/body/app-root/mat-sidenav-container/mat-sidenav-content/div\[2\]/app-organization/div/app-org-page/div/div/div\[1\]/app-banner/div/div\[3\]/span" )
These are not working. Can you guys help me figure this out?
When I saw the HTML of your URL, I noticed that the URL https://imagecdn.mightycause.com/36be848e-03bb-47c3-8616-6d51b811e38d/-/format/auto/-/progressive/yes/-/stretch/off/-/preview/
is created by Javascript. In this case, unfortunately, IMPORTXML cannot be directly used. But, fortunately, I found the image URL from JSON data in the HTML as https://imagecdn.mightycause.com/36be848e-03bb-47c3-8616-6d51b811e38d/
. It seems that this image is the same as your expected URL. So, in this answer, I would like to propose to retrieve the URL.
In this case, it is required to use Google Apps Script. So, please open the script editor of your Spreadsheet, copy and paste the following script, and save the script.
When you use this script, please put a custom function =SAMPLE("https://www.northtexasgivingday.org/organization/Salvationarmynorthtexas")
into a cell. By this, the image URL is returned.
function SAMPLE(url) {
const res = UrlFetchApp.fetch(url);
const str = res.getContentText().match(/<script id\="ng-state" type\="application\/json">(.*?)<\/script>/);
const obj = JSON.parse(str[1]);
const ar = Object.values(obj).flatMap(v => v?.body?.org_info?.circle_pic || []);
return ar.length > 0 ? ar[0] : "";
}
When this script is tested, the following result is obtained. In this sample, =SAMPLE("https://www.northtexasgivingday.org/organization/Salvationarmynorthtexas")
is put into cell "A1". And, =IMAGE(A1)
is put into cell "B1".
This sample script is for your URL of https://www.northtexasgivingday.org/organization/Salvationarmynorthtexas
. And, I confirmed that this script worked now. But, when you change the URL, this script might not be able to be used. Also, when the specification of the site is changed, this script might not be able to be used. Please be careful about this.
If you are required to retrieve the URL like https://imagecdn.mightycause.com/36be848e-03bb-47c3-8616-6d51b811e38d/-/format/auto/-/progressive/yes/-/stretch/off/-/preview/
, please test the following script.
function SAMPLE(url) {
const res = UrlFetchApp.fetch(url);
const str = res.getContentText().match(/<script id\="ng-state" type\="application\/json">(.*?)<\/script>/);
const obj = JSON.parse(str[1]);
const ar = Object.values(obj).flatMap(v => v?.body?.org_info?.circle_pic || []);
if (ar.length > 0) {
return `https://imagecdn.mightycause.com/${ar[0].split("/")[3]}/-/format/auto/-/progressive/yes/-/stretch/off/-/preview/`;
}
return "";
}