
Word count in Python


I want to calculate the word count of text taken from a website. I am trying the following code:

import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen

def get_text(url):
  page = urlopen(url)
  soup = BeautifulSoup(page, "lxml")
  text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
  return soup.title.text, text

number_of_words = 0

url = input('Enter URL - ')
text = get_text(url)

I want to calculate the word count of this text variable.

With https://www.ibm.com/in-en/cloud/learn/what-is-artificial-intelligence as the URL, everything works except getting the word count of the text variable.

P.S. The word_count value passed in as a parameter differs from the word count of the generated summary.

Also, I have managed to get the character length of the original text retrieved from the URL using the following code:

print('Text character length - ', len(str(text)))

Solution

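  • len(str(text)) counts characters, not words. To count words, split the string on whitespace and take the length of the resulting list: len(str(text).split()). Note that get_text returns a (title, text) tuple, so str(text) also includes the words of the page title: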

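    from bs4 import BeautifulSoup
    from urllib.request import urlopen
    
    
    def get_text(url):
        # Download the page and parse it with the lxml parser.
        page = urlopen(url)
        soup = BeautifulSoup(page, "lxml")
        # Join the text of every <p> element into one string.
        text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
        return soup.title.text, text
    
    
    url = input('Enter URL - ')
    
    # get_text returns a (title, text) tuple, so str(text) contains both.
    text = get_text(url)
    # split() with no arguments splits on any run of whitespace.
    number_of_words = len(str(text).split())
    print(number_of_words)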
    

    output:

    1080
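
If you only want the words of the paragraph text, and not the title, one option is to unpack the tuple and call split() on the body string directly. The sketch below is not part of the original answer: it uses a made-up (title, body) pair in place of a real get_text(url) call so that it runs offline, and the helper names count_words and count_words_strict are hypothetical. It also shows that plain split() counts a stray "-" as a word, while a simple regular expression does not.

```python
import re


def count_words(text):
    """Count whitespace-separated tokens in a string."""
    return len(text.split())


def count_words_strict(text):
    """Count runs of letters, digits and apostrophes, skipping stray punctuation."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


# Hypothetical stand-in for the (title, text) tuple returned by get_text(url).
title, body = ("What is AI?", "Artificial intelligence lets machines learn - and act.")

# Words in the paragraph text only.
print(count_words(body))

# Same text, but the lone "-" is no longer counted as a word.
print(count_words_strict(body))

# Stringifying the whole tuple also counts the three title words.
print(len(str((title, body)).split()))
```

With a real page you would write title, body = get_text(url) and pass body to whichever counter matches your definition of a word. The regular expression here only matches ASCII letters and digits, so it is an assumption that the text is English.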