I am trying to write an XPath expression to extract the URLs in @href or @src attributes that are relative (URLs that don't start with http:// or https://).
I have used the below but it's not working:
//*[not(starts-with(@src, 'https:')) and not(starts-with(@href, 'https:'))]
Example node:
<script async="" src="//d.impactradius-event.com/A2421746-f56c-44ad-9e09-bcf28112e9951.js"></script>
I wish to pull the src URL. Can someone please help? Thanks.
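You can try the following XPath 1.0 expression. For each attribute it excludes values that start with either prefix, then merges the two result sets with the | operator. Your original predicate also matched elements that have neither attribute (starts-with() on a missing attribute is false, so not(...) is true), and it selected the elements themselves rather than the attribute values.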
//*[not(starts-with(@src, 'https:')) and not(starts-with(@src, 'http:'))]/@src | //*[not(starts-with(@href, 'https:')) and not(starts-with(@href, 'http:'))]/@href
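To check the expression against real markup, here is a minimal sketch in Python. It assumes the third-party lxml library is installed. Apart from the script node taken from the question, the sample page and the example.com URLs are made up for illustration.

```python
from lxml import html

# Hypothetical sample page; only the first <script> node comes from the question.
SAMPLE = """
<html><head>
<script async="" src="//d.impactradius-event.com/A2421746-f56c-44ad-9e09-bcf28112e9951.js"></script>
<script src="https://example.com/app.js"></script>
<link href="/static/site.css" rel="stylesheet">
</head><body>
<a href="http://example.com/page">absolute</a>
<a href="about.html">relative</a>
<img src="images/logo.png">
<p>no URL attributes here</p>
</body></html>
"""

# The XPath 1.0 expression from the answer, split over lines for readability.
EXPR = (
    "//*[not(starts-with(@src, 'https:')) and not(starts-with(@src, 'http:'))]/@src"
    " | "
    "//*[not(starts-with(@href, 'https:')) and not(starts-with(@href, 'http:'))]/@href"
)

tree = html.fromstring(SAMPLE)

# xpath() returns the matching attribute values; convert them to plain strings.
relative_urls = [str(u) for u in tree.xpath(EXPR)]
print(relative_urls)
```

The protocol-relative URL from the question (the one starting with //) is returned along with the path-relative ones, because it does not start with http: or https:. Elements without the attribute still pass the predicate, but the trailing /@src or /@href step yields nothing for them.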
This expression could be simplified with regular expressions, but XPath 1.0 doesn't support them; the matches() function was only added in XPath 2.0.
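Some XPath 1.0 engines offer regular expressions as an extension. As a rough sketch, assuming Python with the third-party lxml library, its EXSLT regular-expressions namespace provides re:test(). The sample markup and example.com URLs below are invented for illustration.

```python
from lxml import html

# Namespace URI of the EXSLT regular-expressions extension that lxml supports.
EXSLT_RE = "http://exslt.org/regular-expressions"

# Hypothetical fragment with absolute and relative URLs in both attributes.
doc = html.fromstring(
    '<div><a href="https://example.com/">abs</a>'
    '<a href="docs/index.html">rel</a>'
    '<img src="HTTP://example.com/x.png"><img src="/img/y.png"></div>'
)

# Select src/href attributes whose value does not match ^https?: (case-insensitive).
expr = (
    "//@*[(name() = 'src' or name() = 'href')"
    " and not(re:test(., '^https?:', 'i'))]"
)

urls = [str(u) for u in doc.xpath(expr, namespaces={"re": EXSLT_RE})]
print(urls)
```

The 'i' flag also excludes an upper-case scheme such as HTTP://, which the case-sensitive starts-with() test would treat as relative. This relies on an lxml-specific extension, so it is not portable to other XPath 1.0 processors.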