
How to pull the video links embedded in a web page along with the video names


I am trying to pull all the video links from a web page along with the video names. I have tried the code below.

#!/usr/bin/python3
from bs4 import BeautifulSoup
import urllib.request  # "import urllib" alone does not reliably load the request submodule
url = urllib.request.urlopen('https://www.ansible.com/resources/videos').read()
acc_link = BeautifulSoup(url, features="lxml")
for line in acc_link.find_all('a'):
    print(line.get('href'))

Output:

https://www.ansible.com/?hsLang=en-us
https://www.ansible.com/overview/it-automation?hsLang=en-us
https://www.ansible.com/overview/it-automation?hsLang=en-us
https://www.ansible.com/overview/how-ansible-works?hsLang=en-us
https://www.ansible.com/products/automation-platform?hsLang=en-us
https://www.ansible.com/use-cases?hsLang=en-us
https://www.ansible.com/use-cases/provisioning?hsLang=en-us
https://www.ansible.com/use-cases/configuration-management?hsLang=en-us
https://www.ansible.com/use-cases/application-deployment?hsLang=en-us
https://www.ansible.com/use-cases/continuous-delivery?hsLang=en-us
https://www.ansible.com/use-cases/security-automation?hsLang=en-us
https://www.ansible.com/use-cases/orchestration?hsLang=en-us
https://www.ansible.com/integrations?hsLang=en-us

Example of the HTML source:

<h4><a href="https://www.ansible.com/resources/webinars-training/ansible-network-automation-with-arista-cloudvision-and-arista?hsLang=en-us">Ansible Network Automation with Arista CloudVision and Arista Validated Designs</a></h4>

The above is just one example from the HTML source of https://www.ansible.com/resources/videos. I want the link as https://www.ansible.com/resources/webinars-training/ansible-network-automation-with-arista-cloudvision-and-arista and the video name as Ansible Network Automation with Arista CloudVision and Arista Validated Designs.

Below is another example, where I want the part of the href before the ? and the anchor text, i.e. Scale-out Clustering with Tower 3.1.

<h4><a href="https://www.ansible.com/scale-out-clustering-tower?hsLang=en-us">Scale-out Clustering with Tower 3.1</a></h4>

Desired output:

Video Name: Ansible Network Automation with Arista CloudVision and Arista Validated Designs

Video Link: https://www.ansible.com/resources/webinars-training/ansible-network-automation-with-arista-cloudvision-and-arista

Thanks in advance for the help.


Solution

  • If you want the href from all anchors, you can use the CSS selector 'a[href]', which matches only anchor tags that have an href attribute.

    You only need to tweak your code a little, as follows:

    #!/usr/bin/python3
    from bs4 import BeautifulSoup
    import urllib.request

    # Download the page and parse it
    html = urllib.request.urlopen('https://www.ansible.com/resources/videos').read()
    acc_link = BeautifulSoup(html, features="lxml")

    # Each video sits in its own <div class="card-body">
    for article in acc_link.find_all('div', class_='card-body'):
        # The video name is the text of the <a> inside the card's <h4>
        name = article.h4.a.text
        # The video link is the first href in the card, without the query string
        link = article.select_one('a[href]')['href'].split('?')[0]
        print(name)
        # A few links are relative and lack the https://www.ansible.com prefix,
        # so add it where it is missing
        if 'www' in link:
            print(link)
        else:
            print('https://www.ansible.com' + link)
        print()
    

    Result:

    Automating Monitoring with the Sensu Go Ansible Collection
    https://www.ansible.com/resources/webinars-training/automating-monitoring-with-the-sensu-go-ansible-collection
    
    How to load balance a hybrid cloud using Red Hat Insights,  Red Hat Ansible, and Red Hat AMQ Interconnect
    https://www.redhat.com/en/about/videos/road-to-open-hybrid-cloud-part-2
    
    British Army speeds service delivery with Red Hat
    https://www.redhat.com/en/about/videos/british-army-speeds-service-delivery-red-hat
    
    Zero To 100 - Rapid deployment with Ansible Tower
    https://www.ansible.com/zero-to-100
    
    Scale-out Clustering with Tower 3.1
    https://www.ansible.com/scale-out-clustering-tower
    
    What's New In Tower 3.1
    https://www.ansible.com/whats-new-tower-3-1
    
    Amelco - Continuous Delivery with Ansible Tower
    https://www.ansible.com/success-stories/amelco
    
    Runnable - Getting Started with Ansible
    https://www.ansible.com/success-stories/runnable
    
    Fatmap - App Deployment with Ansible
    https://www.ansible.com/success-stories/fatmap
    
    Splunk and Ansible Tower
    https://www.ansible.com/success-stories/splunk
    
    Siemens - Delivering Automation to the Cloud
    https://www.ansible.com/success-stories/siemens
    
    Ansible Tower 10 min demo
    https://www.ansible.com/products/tower/demo
    
    Ansible Tower 3.1
    https://www.ansible.com/tower-workflows-demo
    
    Ansible Tower 2-min Overview
    https://www.ansible.com/tower-overview
    
    Ansible Quick Start
    https://www.ansible.com/resources/videos/quick-start-video
    
    Ansible + AWS - Serverless Deploys
    https://www.ansible.com/resources/videos/ansible-aws-automate-serverless-application-deploys-with-ansible
    
    Ansible + AWS - EC2 Provisionling
    https://www.ansible.com/resources/videos/ansible-aws-automate-ec2-provisioning-with-red-hat-ansible-engine-and-red-hat-ansible-tower
    
    Network Automation For Beginners
    https://www.ansible.com/resources/videos/network-automation-with-red-hat-ansible-engine-for-beginners
    
    Agnostic Network Automation Examples with Ansible and Juniper NRE Labs
    https://www.ansible.com/blog/agnostic-network-automation-examples-with-ansible-and-juniper-nre-labs
    
     How useful is Ansible in a cloud-native Kubernetes environment
    https://www.ansible.com/blog/how-useful-is-ansible-in-a-cloud-native-kubernetes-environment
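
As a possible refinement, the 'www' check in the loop above could be replaced by the standard library's urllib.parse, which resolves relative links against the page URL and drops the query string in one place. This is only a sketch: the helper name clean_link and the constant BASE_URL are made up for illustration, and the relative href in the first call is an assumed example of what the page might contain.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Page the links were scraped from; relative hrefs are resolved against it.
BASE_URL = "https://www.ansible.com/resources/videos"


def clean_link(href, base=BASE_URL):
    """Return an absolute URL without its query string or fragment."""
    # urljoin leaves absolute URLs untouched and completes relative ones.
    absolute = urljoin(base, href)
    parts = urlsplit(absolute)
    # Rebuild the URL from scheme, host and path only.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Assumed relative link, as it might appear in an href on the page.
print(clean_link("/scale-out-clustering-tower?hsLang=en-us"))
# Absolute link to another site is kept, minus any query string.
print(clean_link("https://www.redhat.com/en/about/videos/road-to-open-hybrid-cloud-part-2"))
```

Inside the loop, this would mean printing clean_link(href) instead of branching on whether the string contains 'www'.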
    

    I hope this helps.
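
If installing BeautifulSoup and lxml is not an option, roughly the same extraction can be sketched with html.parser from the standard library. This is a different technique from the answer above, not a drop-in replacement: it looks for <h4><a href="...">name</a></h4> markup like the samples in the question rather than for card-body divs. The class name VideoLinkParser and the SAMPLE_HTML string are made up for illustration, and the second <h4> (an anchor without href) is an invented case to show that such anchors are skipped, much like the 'a[href]' selector does.

```python
from html.parser import HTMLParser


class VideoLinkParser(HTMLParser):
    """Collect (name, link) pairs from <h4><a href="...">name</a></h4> markup."""

    def __init__(self):
        super().__init__()
        self.videos = []      # finished (name, link) pairs
        self._in_h4 = False   # True while inside an <h4> element
        self._href = None     # href of the <a> currently being read
        self._text = []       # text fragments of that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self._in_h4 = True
        elif tag == "a" and self._in_h4:
            href = dict(attrs).get("href")
            if href:  # skip anchors that have no href attribute
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that belongs to an anchor we are tracking.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            link = self._href.split("?")[0]  # drop the ?hsLang=... query string
            self.videos.append((name, link))
            self._href = None
        elif tag == "h4":
            self._in_h4 = False


# Sample markup: the first line is taken from the question, the second is invented.
SAMPLE_HTML = """
<h4><a href="https://www.ansible.com/scale-out-clustering-tower?hsLang=en-us">Scale-out Clustering with Tower 3.1</a></h4>
<h4><a>No link here</a></h4>
"""

parser = VideoLinkParser()
parser.feed(SAMPLE_HTML)
for name, link in parser.videos:
    print("Video Name:", name)
    print("Video Link:", link)
    print()
```

To run it against the live page, the downloaded HTML would need to be decoded to a string before being passed to feed(); whether every video on the page sits inside an <h4> is an assumption based on the two samples in the question.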