python, pandas, web-scraping, beautifulsoup

How Can I Scrape Event Links and Contact Information from a Website with Python?


I am trying to scrape event links and contact information from the RaceRoster website (https://raceroster.com/search?q=5k&t=upcoming) using Python, requests, Pandas, and BeautifulSoup. The goal is to extract the Event Name, Event URL, Contact Name, and Email Address for each event and save the data into an Excel file so we can reach out to these events for business development purposes.

However, the script consistently reports that no event links are found on the search results page, despite the links being visible when inspecting the HTML in the browser. Here’s the relevant HTML for the event links from the search results page:

<a href="https://raceroster.com/events/2025/98542/13th-annual-delaware-tech-chocolate-run-5k" 
   target="_blank" 
   rel="noopener noreferrer" 
   class="search-results__card-event-name">
    13th Annual Delaware Tech Chocolate Run 5k
</a>

Steps Taken:

  1. Verified the correct selector for event links:

     soup.select("a.search-results__card-event-name")

  2. Checked the response content from the requests.get() call using soup.prettify(). The HTML returned by requests lacks the event links that are visible in the browser, suggesting the content is loaded dynamically via JavaScript.

  3. Attempted to scrape the data with BeautifulSoup, but consistently get:

Found 0 events on the page.
Scraped 0 events.
No contacts were scraped.
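To rule out a selector mistake, I also ran the selector against the anchor markup quoted above (taken from the browser's inspector). It matches as soon as that HTML is actually present, which points at the page being rendered client-side rather than a bad selector:

```python
from bs4 import BeautifulSoup

# The anchor markup copied from the browser's inspector.
html = """
<a href="https://raceroster.com/events/2025/98542/13th-annual-delaware-tech-chocolate-run-5k"
   target="_blank"
   rel="noopener noreferrer"
   class="search-results__card-event-name">
    13th Annual Delaware Tech Chocolate Run 5k
</a>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.select("a.search-results__card-event-name")
print(len(links))             # 1 -- the selector is correct
print(links[0].text.strip())  # 13th Annual Delaware Tech Chocolate Run 5k
```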

What I Need Help With:

How can I retrieve the event links when the search results appear to be rendered client-side by JavaScript, so that the rest of the pipeline (contact scraping and the Excel export) can run?

Current Script:

import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_event_contacts(base_url, search_url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    event_contacts = []

    # Fetch the main search page
    print(f"Scraping page: {search_url}")
    response = requests.get(search_url, headers=headers)

    if response.status_code != 200:
        print(f"Failed to fetch page: {search_url}, status code: {response.status_code}")
        return event_contacts

    soup = BeautifulSoup(response.content, "html.parser")
    # Select event links
    event_links = soup.select("a.search-results__card-event-name")


    print(f"Found {len(event_links)} events on the page.")

    for link in event_links:
        event_url = link['href']
        event_name = link.text.strip()  # Extract Event Name

        try:
            print(f"Scraping event: {event_url}")
            event_response = requests.get(event_url, headers=headers)
            if event_response.status_code != 200:
                print(f"Failed to fetch event page: {event_url}, status code: {event_response.status_code}")
                continue

            event_soup = BeautifulSoup(event_response.content, "html.parser")

            # Extract contact name and email
            contact_name = event_soup.find("dd", class_="event-details__contact-list-definition")
            email = event_soup.find("a", href=lambda href: href and "mailto:" in href)

            contact_name_text = contact_name.text.strip() if contact_name else "N/A"
            email_address = email['href'].split("mailto:")[1].split("?")[0] if email else "N/A"

            if contact_name or email:
                print(f"Found contact: {contact_name_text}, email: {email_address}")
                event_contacts.append({
                    "Event Name": event_name,
                    "Event URL": event_url,
                    "Event Contact": contact_name_text,
                    "Email": email_address
                })
            else:
                print(f"No contact information found for {event_url}")
        except Exception as e:
            print(f"Error scraping event {event_url}: {e}")

    print(f"Scraped {len(event_contacts)} events.")
    return event_contacts

def save_to_spreadsheet(data, output_file):
    if not data:
        print("No data to save.")
        return
    df = pd.DataFrame(data)
    df.to_excel(output_file, index=False)
    print(f"Data saved to {output_file}")

if __name__ == "__main__":
    base_url = "https://raceroster.com"
    search_url = "https://raceroster.com/search?q=5k&t=upcoming"
    output_file = "/Users/my_name/Documents/event_contacts.xlsx"

    contact_data = scrape_event_contacts(base_url, search_url)
    if contact_data:
        save_to_spreadsheet(contact_data, output_file)
    else:
        print("No contacts were scraped.")

Expected Outcome:

An Excel file (event_contacts.xlsx) listing the Event Name, Event URL, Contact Name, and Email Address for each upcoming 5k event in the search results.


Solution

  • Use the API endpoint to get the data on upcoming events.

    Here's how:

    import requests
    from tabulate import tabulate
    import pandas as pd
    
    url = 'https://search.raceroster.com/search?q=5k&t=upcoming'
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    }
    
    events = requests.get(url, headers=headers).json()['data']
    
    loc_keys = ["address", "city", "country"]
    
    table = [
        [
            event["name"],
            event["url"],
            " ".join([event["location"][key] for key in loc_keys if key in event["location"]])
        ] for event in events
    ]
    
    columns = ["Name", "URL", "Location"]
    print(tabulate(table, headers=columns))
    
    df = pd.DataFrame(table, columns=columns)
    df.to_csv('5k_events.csv', index=False, header=True)
    

    This should print:

    Name                                         URL                                                                                         Location
    -------------------------------------------  ------------------------------------------------------------------------------------------  ----------------------------------------------------------------------------------------------------------------------------
    Credit Union Cherry Blossom                  https://raceroster.com/events/2025/72646/credit-union-cherry-blossom                        Washington, D.C. Washington United States
    Big Cork Wine Run 5k                         https://raceroster.com/events/2025/98998/big-cork-wine-run-5k                               Big Cork Vineyards, 4236 Main Street, Rohrersville, MD 21779, U.S. Rohrersville United States
    3rd Annual #OptOutside Black Friday Fun Run  https://raceroster.com/events/2025/98146/3rd-annual-number-optoutside-black-friday-fun-run  Grain H2O, Summit Harbour Place, Bear, DE, USA Bear United States
    Ryan's Race 5K walk Run                      https://raceroster.com/events/2025/97852/ryans-race-5k-walk-run                             Odessa High School, Tony Marchio Drive, Townsend, DE Townsend United States
    13th Annual Delaware  Tech Chocolate Run 5k  https://raceroster.com/events/2025/98542/13th-annual-delaware-tech-chocolate-run-5k         Delaware Technical Community College - Charles L. Terry Jr. Campus - Dover, Campus Drive, Dover, DE, USA Dover United States
    Builders Dash 5k                             https://raceroster.com/events/2025/99146/builders-dash-5k                                   Rail Haus - Beer Garden, North West Street, Dover, DE Dover United States
    The Ivy Scholarship 5k                       https://raceroster.com/events/2025/96874/the-ivy-scholarship-5k                             Hare Pavilion, River Place, Wilmington, DE Wilmington United States
    39th Firecracker 5k Run Walk                 https://raceroster.com/events/2025/96907/39th-firecracker-5k-run-walk                       Rockford Tower, Lookout Drive, Wilmington, DE Wilmington United States
    24th Annual John D Kelly Logan House 5k      https://raceroster.com/events/2025/97364/24th-annual-john-d-kelly-logan-house-5k            Kelly's Logan House, Delaware Avenue, Wilmington, DE, USA Wilmington United States
    2nd Annual Scott Trot 5K                     https://raceroster.com/events/2025/96904/2nd-annual-scott-trot-5k                           American Legion Post 17, American Legion Road, Lewes, DE Lewes United States
    

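    Since the original goal was an Excel file rather than CSV, the same DataFrame can be written with DataFrame.to_excel instead (this assumes the openpyxl engine is installed):

```python
import pandas as pd

# A couple of rows in the same shape the script builds from the API response.
table = [
    ["Big Cork Wine Run 5k",
     "https://raceroster.com/events/2025/98998/big-cork-wine-run-5k",
     "Rohrersville United States"],
]
df = pd.DataFrame(table, columns=["Name", "URL", "Location"])
df.to_excel("5k_events.xlsx", index=False)  # requires openpyxl
```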
    Bonus:

    To get more event data, paginate the API with the l (page size) and p (page number) parameters, e.g. https://search.raceroster.com/search?q=5k&l=10&p=1&t=upcoming. Also note that the meta -> hits field in the response holds the number of matching events; for your query that's 1465.
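    A sketch of walking all pages, assuming the l and p query parameters and the meta -> hits field behave as described above (the exact field names are taken from the response shown earlier, not from official API documentation):

```python
import math
import requests

SEARCH_URL = "https://search.raceroster.com/search"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/103.0.0.0 Safari/537.36",
}

def page_count(total_hits, per_page):
    # Pages needed to cover every hit, e.g. 1465 hits at 10 per page -> 147.
    return math.ceil(total_hits / per_page)

def fetch_page(query, page, per_page=10):
    # One page of search results; returns (events, total_hits).
    params = {"q": query, "t": "upcoming", "l": per_page, "p": page}
    payload = requests.get(SEARCH_URL, params=params, headers=HEADERS).json()
    return payload["data"], payload["meta"]["hits"]

def fetch_all(query, per_page=10, max_pages=None):
    # Walk the paginated API and collect every event.
    events, hits = fetch_page(query, 1, per_page)
    last = page_count(hits, per_page)
    if max_pages is not None:
        last = min(last, max_pages)
    for page in range(2, last + 1):
        events += fetch_page(query, page, per_page)[0]
    return events
```

    Pass max_pages while testing so a typo in the query doesn't trigger 147 requests in a row.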