python html beautifulsoup strip

Removing blank spaces and lines is not working for me


I am unable to remove the tabs, spaces and blank lines from the scraped web data, which I want to write to an Excel sheet.

import requests as r
from bs4 import BeautifulSoup

url = 'https://www.screener.in/screens/41109/all-stocks/?limit=100&page=1'

response = r.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find_all('table')
table_len = len(table)  # number of <table> elements on the page

# Header cells and table rows
scrnr_table = soup.find_all('th')
header_tags = [header.text.strip('\n') for header in scrnr_table]
data_rows = soup.find_all('tr')
row_values = [dr.text.strip() for dr in data_rows]

for header in header_tags:
    h_values = header.strip('\n')
    print(header)

Output:

  S.No.
Name
                    CMP
                    Rs.
                    P/E
                    
                    Mar Cap
                    Rs.Cr.
                    Div Yld
                    %
                    NP Qtr
                    Rs.Cr.
                    Qtr Profit Var
                    %
                    Sales Qtr
                    Rs.Cr.
                    Qtr Sales Var
                    %
                    ROCE
                    %
S.No.
Name
                    CMP
                    Rs.
                    P/E
                    
                    Mar Cap
                    Rs.Cr.
                    Div Yld
                    %
                    NP Qtr
                    Rs.Cr.
                    Qtr Profit Var
                    %
                    Sales Qtr
                    Rs.Cr.
                    Qtr Sales Var
                    %
                    ROCE
                    %

The whitespace in the output doesn't seem to be affected by str.strip('\n') or str.strip('\t').

Kindly help me out on this.


Solution

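  • Use the get_text method (bs4.BeautifulSoup.get_text) to control the separator and the strip flag. str.strip('\n') only removes newline characters from the two ends of a string, so the indentation spaces and the whitespace in the middle of each header text are left in place. Here is how to edit the corresponding line: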

    header_tags = [header.get_text('\n', strip=True) for header in scrnr_table]
    

    Here are the first few lines of the output:

    S.No.
    Name
    CMP
    Rs.
    P/E
    Mar Cap
    Rs.Cr.
    Div Yld
    %
    NP Qtr
    #[...]
    

    Notice that no empty line is present.
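
To see the difference without hitting the site, here is a minimal sketch on a small inline table. The HTML string is a made-up imitation of the header cells in the question, not the real page markup, and the single-space separator is my own choice for keeping each header on one line (handy for a spreadsheet cell); the answer above uses '\n' instead.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: header cells with nested tags and indentation,
# loosely imitating the table from the question.
html = """
<table>
  <tr>
    <th>S.No.</th>
    <th>
      CMP
      <span>Rs.</span>
    </th>
    <th>
      P/E
      <span></span>
    </th>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
headers = soup.find_all("th")

# str.strip('\n') only trims newlines at the two ends of the string,
# so the spaces and the inner newline between "CMP" and "Rs." remain.
raw = [th.text.strip("\n") for th in headers]

# get_text strips every text fragment, skips the empty ones,
# and joins what is left with the given separator.
clean = [th.get_text(" ", strip=True) for th in headers]

print(clean)  # ['S.No.', 'CMP Rs.', 'P/E']
```

Printing `raw` instead shows the leftover indentation and newlines inside the second and third entries, which is the effect seen in the question's output.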