python ibm-cloud watson-assistant

Watson Assistant: How can I check that all referenced URLs still work?


I have a large skill which has URL references in the context variables and responses to the end user.

I would like to check all of these URLs to see whether they still work, so that if one fails we can fix it as quickly as possible. Is there a way to do this?


Solution

  • The following code snippet checks every URL found in a skill. Replace SKILL_FILE_NAME_HERE with the path to the JSON file you downloaded for the skill.

    It should work with both dialog-based and action-based skills.

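    import os
    import re

    import pandas as pd
    import requests
    from requests.exceptions import RequestException
    from tqdm import tqdm

    # Path to the skill JSON file downloaded from Watson Assistant.
    file_name = 'SKILL_FILE_NAME_HERE'

    with open(file_name, 'r', encoding='utf-8') as file:
        data = file.read()

    # Find every http(s) URL in the raw JSON text.
    url_pattern = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
    # The pattern can also capture a trailing JSON escape backslash or sentence
    # punctuation, so strip those; the set removes duplicate URLs.
    urls = sorted({url.rstrip('\\.,') for url in re.findall(url_pattern, data)})

    records = []
    print('Checking URLs')
    for url in tqdm(urls):
        try:
            # The timeout stops a single unresponsive host from stalling the run.
            response = requests.get(url, timeout=10)
            status_code = response.status_code
        except RequestException:
            # Covers connection failures, timeouts, and malformed URLs.
            status_code = 'Error'

        records.append({
            'url': url,
            'status': status_code
        })

    df = pd.DataFrame(records)

    # Write the report next to the skill file, e.g. skill.json -> skill.csv.
    # splitext avoids overwriting the input if it does not end in ".json".
    df.to_csv(os.path.splitext(file_name)[0] + '.csv', index=False)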
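
Since the goal in the question is to fix failing links quickly, it may help to reduce the results to only the entries that need attention. The following is a minimal sketch, not part of the original answer: `needs_attention` is a hypothetical helper name, the sample URLs are made up, and treating any status of 400 or higher as broken is an assumption you may want to adjust.

```python
def needs_attention(records):
    """Return the records whose request failed or returned an HTTP error status."""
    broken = []
    for record in records:
        status = record['status']
        # The script stores the string 'Error' when the request itself failed.
        # Assumption: any numeric status of 400 or above counts as broken.
        if status == 'Error' or (isinstance(status, int) and status >= 400):
            broken.append(record)
    return broken


# Sample data in the same shape as the script's `records` list (made-up URLs).
sample = [
    {'url': 'https://example.com/ok', 'status': 200},
    {'url': 'https://example.com/missing', 'status': 404},
    {'url': 'https://example.invalid/', 'status': 'Error'},
]

for record in needs_attention(sample):
    print(record['url'], record['status'])
```

The helper only reads the results; the checking itself is still done by the script above.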
    

    The script does the following: