I have output of API request in given below. From each atom:entry I need to extract
<c:series href="http://company.com/series/product/123"/>
<c:series-order>2020-09-17T00:00:00Z</c:series-order>
<f:assessment-low precision="0">980</f:assessment-low>
I tried to extract them to different list with BeautifulSoup, but that wasn't successful because in some entries there are dates but there isn't price (I've shown example below). How could I conditionally extract it? at least put N/A for entries where price is ommited.
soup = BeautifulSoup(request.text, "html.parser")
date = soup.find_all('c:series-order')
value = soup.find_all('f:assessment-low')
quot=soup.find_all('c:series')
p_day = []
p_val = []
q_val=[]
for i in date:
p_day.append(i.text)
for j in value:
p_val.append(j.text)
for j in quot:
q_val.append(j.get('href'))
d2={'date': p_day,
'price': p_val,
'quote': q_val
}
and
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom" xmlns:a="http://company.com/ns/assets" xmlns:c="http://company.com/ns/core" xmlns:f="http://company.com/ns/fields" xmlns:s="http://company.com/ns/search">
<atom:id>http://company.com/search</atom:id>
<atom:title> COMPANYSearch Results</atom:title>
<atom:updated>2022-11-24T19:36:19.104414Z</atom:updated>
<atom:author>COMPANY atom:author>
<atom:generator> COMPANY/search Endpoint</atom:generator>
<atom:link href="/search" rel="self" type="application/atom"/>
<s:first-result>1</s:first-result>
<s:max-results>15500</s:max-results>
<s:selected-count>212</s:selected-count>
<s:returned-count>212</s:returned-count>
<s:query-time>PT0.036179S</s:query-time>
<s:request version="1.0">
<s:scope>
<s:series>http://company.com/series/product/123</s:series>
</s:scope>
<s:constraints>
<s:compare field="c:series-order" op="ge" value="2018-10-01"/>
<s:compare field="c:series-order" op="le" value="2022-11-18"/>
</s:constraints>
<s:options>
<s:first-result>1</s:first-result>
<s:max-results>15500</s:max-results>
<s:order-by key="commodity-name" direction="ascending" xml:lang="en"/>
<s:no-currency-rate-scheme>no-element</s:no-currency-rate-scheme>
<s:precision>embed</s:precision>
<s:include-last-commit-time>false</s:include-last-commit-time>
<s:include-result-types>live</s:include-result-types>
<s:relevance-score algorithm="score-logtfidf"/>
<s:lang-data-missing-scheme>show-available-language-content</s:lang-data-missing-scheme>
</s:options>
</s:request>
<s:facets/>
<atom:entry>
<atom:title>http://company.com/series-item/product/123-pricehistory-20200917000000</atom:title>
<atom:id>http://company.com/series-item/product/123-pricehistory-20200917000000</atom:id>
<atom:updated>2020-09-17T17:09:43.55243Z</atom:updated>
<atom:relevance-score>60800</atom:relevance-score>
<atom:content type="application/vnd.icis.iddn.entity+xml"><a:price-range>
<c:id>http://company.com/series-item/product/123-pricehistory-20200917000000</c:id>
<c:version>1</c:version>
<c:type>series-item</c:type>
<c:created-on>2020-09-17T17:09:43.55243Z</c:created-on>
<c:descriptor href="http://company.com/descriptor/price-range"/>
<c:domain href="http://company.com/domain/product"/>
<c:released-on>2020-09-17T21:30:00Z</c:released-on>
<c:series href="http://company.com/series/product/123"/>
<c:series-order>2020-09-17T00:00:00Z</c:series-order>
<f:assessment-low precision="0">980</f:assessment-low>
<f:assessment-high precision="0">1020</f:assessment-high>
<f:mid precision="1">1000</f:mid>
<f:assessment-low-delta>0</f:assessment-low-delta>
<f:assessment-high-delta>+20</f:assessment-high-delta>
<f:delta-type href="http://company.com/ref-data/delta-type/regular"/>
</a:price-range></atom:content>
</atom:entry>
<atom:entry>
<atom:title>http://company.com/series-item/product/123-pricehistory-20200910000000</atom:title>
<atom:id>http://company.com/series-item/product/123-pricehistory-20200910000000</atom:id>
<atom:updated>2020-09-10T18:57:55.128308Z</atom:updated>
<atom:relevance-score>60800</atom:relevance-score>
<atom:content type="application/vnd.icis.iddn.entity+xml"><a:price-range>
<c:id>http://company.com/series-item/product/123-pricehistory-20200910000000</c:id>
<c:version>1</c:version>
<c:type>series-item</c:type>
<c:created-on>2020-09-10T18:57:55.128308Z</c:created-on>
<c:descriptor href="http://company.com/descriptor/price-range"/>
<c:domain href="http://company.com/domain/product"/>
<c:released-on>2020-09-10T21:30:00Z</c:released-on>
<c:series href="http://company.com/series/product/123"/>
<c:series-order>2020-09-10T00:00:00Z</c:series-order>
for example here is no price
<f:delta-type href="http://company.com/ref-data/delta-type/regular"/>
</a:price-range></atom:content>
</atom:entry>
May try to iterate per entry, use xml
parser to get a propper result and check if element exists:
soup = BeautifulSoup(request.text,'xml')
data = []
for i in soup.select('entry'):
data.append({
'date':i.find('series-order').text,
'value': i.find('assessment-low').text if i.find('assessment-low') else None,
'quot': i.find('series').get('href')
})
data
or with html.parser
:
soup = BeautifulSoup(xml,'html.parser')
data = []
for i in soup.find_all('atom:entry'):
data.append({
'date':i.find('c:series-order').text,
'value': i.find('f:assessment-low').text if i.find('assessment-low') else None,
'quot': i.find('c:series').get('href')
})
data
Output:
[{'date': '2020-09-17T00:00:00Z',
'value': '980',
'quot': 'http://company.com/series/product/123'},
{'date': '2020-09-10T00:00:00Z',
'value': None,
'quot': 'http://company.com/series/product/123'}]