I am unable to remove the tab spaces from the web data that I want to enter into an Excel sheet.
import requests as r
from bs4 import BeautifulSoup

url = 'https://www.screener.in/screens/41109/all-stocks/?limit=100&page=1'
response = r.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find_all('table')
table_len = len(table)  # number of <table> elements on the page

# Header cells of every table on the page
scrnr_table = soup.find_all('th')
header_tags = [header.text.strip('\n') for header in scrnr_table]

# Every table row, as one string per row
data_rows = soup.find_all('tr')
row_values = [dr.text.strip() for dr in data_rows]

for header in header_tags:
    h_values = header.strip('\n')
    print(h_values)
Output:
S.No.
Name
CMP
Rs.
P/E
Mar Cap
Rs.Cr.
Div Yld
%
NP Qtr
Rs.Cr.
Qtr Profit Var
%
Sales Qtr
Rs.Cr.
Qtr Sales Var
%
ROCE
%
S.No.
Name
CMP
Rs.
P/E
Mar Cap
Rs.Cr.
Div Yld
%
NP Qtr
Rs.Cr.
Qtr Profit Var
%
Sales Qtr
Rs.Cr.
Qtr Sales Var
%
ROCE
%
The blank space that appears in the output doesn't seem to be affected by the str.strip('\n') or str.strip('\t') methods.
Kindly help me out with this.
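A small stdlib-only illustration of why those calls seem to do nothing: str.strip removes the given characters only from the two ends of a string, so whitespace sitting between the words of a header cell survives. The sample string is made up to imitate one header cell; it is not taken from the page.

```python
# Made-up text imitating a multi-line header cell such as "CMP Rs.".
raw = "\n CMP\n\t Rs.\n"

# strip("\n") only trims newlines at the two ends; the leading space and
# the inner "\n\t " are kept.
trimmed_newlines = raw.strip("\n")

# strip() with no argument trims any whitespace at the ends, but still not inside.
trimmed_whitespace = raw.strip()

# Splitting on whitespace and re-joining collapses the inner runs as well.
collapsed = " ".join(raw.split())

print(repr(trimmed_newlines))
print(repr(trimmed_whitespace))
print(repr(collapsed))
```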
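str.strip only trims the ends of the whole string, so the whitespace between the nested text pieces of a header cell is left in place. Instead, use the bs4.BeautifulSoup.get_text method (available on every tag, including the th elements), whose arguments let you choose the separator placed between text pieces and whether each piece is stripped. Here is how to edit the corresponding line: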
header_tags = [header.get_text('\n', strip=True) for header in scrnr_table]
Here are the first few lines of the output:
S.No.
Name
CMP
Rs.
P/E
Mar Cap
Rs.Cr.
Div Yld
%
NP Qtr
#[...]
Notice that there are no empty lines.
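Since the end goal is an Excel sheet, the same get_text call can be applied cell by cell to build clean rows. The sketch below assumes a tiny hand-written table (hypothetical values, not the real screener.in markup) and writes CSV with the standard csv module, which Excel can open; table_to_rows is an illustrative helper name, not part of bs4.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical table: one header cell spans several lines, as on the real page.
html = """
<table>
  <tr>
    <th>S.No.</th>
    <th>Name</th>
    <th>
      CMP
      <span>Rs.</span>
    </th>
  </tr>
  <tr>
    <td>1.</td>
    <td>
      Example Co.
    </td>
    <td>123.45</td>
  </tr>
</table>
"""


def table_to_rows(table):
    """Return one list of cleaned cell strings per <tr> in the table."""
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all(["th", "td"])
        # Strip every text piece and join the non-empty ones with a space,
        # so "CMP" and "Rs." end up in a single clean label.
        rows.append([cell.get_text(" ", strip=True) for cell in cells])
    return rows


soup = BeautifulSoup(html, "html.parser")
rows = table_to_rows(soup.find("table"))

# Write CSV into an in-memory buffer; a real script would use a file
# opened with newline="" instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

For a native .xlsx file, pandas provides DataFrame.to_excel (it needs an engine such as openpyxl installed), for example pd.DataFrame(rows[1:], columns=rows[0]).to_excel('stocks.xlsx', index=False), where the file name is only a placeholder. The question's output lists the headers twice, so the page may repeat its header row; rows equal to rows[0] could be filtered out before writing.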