python html web-scraping text-extraction information-extraction

Need help extracting date from text in Python


I have data that comes in every day via Python code, like this:

id="ContentPlaceHolder1_cph_main_cph_main_SummaryGrid">\r\n\t\t<tr class="tr-header">\r\n\t\t\t<th scope="col">&nbsp;</th><th class="right-align" scope="col">Share<br>Price</th><th class="right-align" scope="col">NAV</th><th class="right-align" scope="col">Premium/<br>Discount</th>\r\n\t\t</tr><tr>\r\n\t\t\t<td>Current</td><td class="right-align">$19.14</td><td class="right-align">$21.82</td><td class="right-align">-12.28%</td>\r\n\t\t</tr>

I need to extract the two prices and the percentage value, in this example "$19.14", "$21.82", and "-12.28%", but I am having trouble figuring out how to parse the text and pull them out. Is there a way to do this by looping through and searching for the text before and after each value?

The text before and after is always the same, but the date changes. If that is not possible, is there another way? Thank you very much!


Solution

  • Here is one way to get the date with BeautifulSoup:

    from bs4 import BeautifulSoup
    
    markup = """
    <div class="row-fluid">
     <div class="span6">
      <p class="as-of-date">
       <span id="ContentPlaceHolder1_cph_main_cph_main_AsOfLabel">
        As of 9/24/2021
       </span>
      </p>
      <div class="table-wrapper">
       <div>
        &lt;table class="cefconnect-table-1 table table-striped" cellspacing="0" cellpadding="5" 
    Border="0
       </div>
      </div>
     </div>
    </div>
    
    """
    
    soup = BeautifulSoup(markup, 'html.parser')
    #print(soup.prettify())
    
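    # strip=True drops the newlines and spaces around the label text
    tags = soup.select_one('#ContentPlaceHolder1_cph_main_cph_main_AsOfLabel').get_text(strip=True)
    # Remove the fixed "As of " prefix so only the date remains
    print(tags.replace('As of ', ''))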
    

    Output:

    9/24/2021
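
The question also asks for the two prices and the percentage. Since the text before and after each value is said to stay the same, a minimal sketch using the standard-library `re` module might look like the following. It assumes the cells keep the exact `<td class="right-align">` markup shown in the question; the variable names are illustrative only, and a regular expression like this is brittle if the page layout changes.

```python
import re

# Row copied from the question's sample (the \r\n\t escapes are omitted here).
html = (
    '<tr><td>Current</td>'
    '<td class="right-align">$19.14</td>'
    '<td class="right-align">$21.82</td>'
    '<td class="right-align">-12.28%</td></tr>'
)

# Capture whatever sits between the fixed opening and closing tags.
# The header cells are <th>, so they are not matched by this pattern.
values = re.findall(r'<td class="right-align">([^<]*)</td>', html)

# Assumes exactly three matching cells, in the order shown in the question.
share_price, nav, premium_discount = values
print(share_price, nav, premium_discount)
```

Output:

    $19.14 $21.82 -12.28%

If the markup is less regular than the sample, selecting the `td` cells with BeautifulSoup, as in the date example, is likely to be more robust than pattern matching.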