python web-scraping css-selectors lxml

Parsing meta tags efficiently with lxml?


I'm parsing HTML pages with lxml. The pages have meta tags as follows:

<meta property="og:locality" content="Detroit" />
<meta property="og:country-name" content="USA" />

How can I use lxml to find the value of the og:locality meta tag on each page, efficiently?

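My current code loops over every meta tag and checks its property by hand:

import lxml.html

doc3 = lxml.html.fromstring(html)  # html holds the page source
for meta in doc3.cssselect('meta'):
    # keep the content of the first og:locality tag
    if meta.get('property') == 'og:locality':
        locality = meta.get('content')
        break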

But iterating over every meta tag in Python doesn't feel very efficient.


Solution

  • lxml's cssselect() (backed by the cssselect package) supports CSS attribute selectors, so you can select the tag directly instead of filtering in a Python loop:

    doc3.cssselect('meta[property="og:locality"]')[0].get('content')