databricks kaggle

Import dataset from Kaggle to Databricks with Kaggle's API


I am trying to import a dataset from **Kaggle** into **Databricks** (Community Edition) using Kaggle's API, but I keep failing and have lost three days on it. Could a kind soul please help me?

Attempt 1:

!pip install kaggle

import os
import kaggle

os.environ['KAGGLE_USERNAME'] = 'xxxxxx'
os.environ['KAGGLE_KEY'] = 'xxxxx'

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()

api.dataset_download_files('taricov/mobile-wallets-in-egypt-2020')

Attempt 1 error:

[Screenshot: error from attempt 1]

Attempt 2:

import os
import kaggle

from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api_token_path = '/FileStore/tables/Kaggle_token/kaggle-2.json'
os.environ['KAGGLE_CONFIG_DIR'] = os.path.dirname(api_token_path)

api.dataset_download_files('taricov/mobile-wallets-in-egypt-2020', path='/FileStore/mobile-wallets-in-egypt-2020',unzip=True)

Attempt 2 error:

[Screenshot: error from attempt 2]

My kaggle.json credential in Databricks:

[Screenshot: kaggle.json credential file]

I tried two ways of connecting, but either something is missing or my credentials are wrong, because the error is:

"Reason: Unauthorized".


Solution

  • Try the following:

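    import os

    # Set the config directory before importing the client, so it is already
    # in place whenever the client reads it.
    # Note: the client expects the token file in that directory to be named kaggle.json.
    api_token_path = '/FileStore/tables/Kaggle_token/kaggle-2.json'
    os.environ['KAGGLE_CONFIG_DIR'] = os.path.dirname(api_token_path)

    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()  # loads the credentials; this call was missing in the second attempt

    api.dataset_download_files('taricov/mobile-wallets-in-egypt-2020', path='/FileStore/mobile-wallets-in-egypt-2020', unzip=True)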

    It seems that in your first snippet the environment variables are not being picked up, and in the second one api.authenticate() is never called. That method calls read_config_environment, which is responsible for reading those keys.
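For the environment-variable approach from the first attempt, some versions of the kaggle package appear to authenticate as soon as the package is imported, so it may be safer to export the credentials before the first import. The sketch below assumes that behaviour; `configure_kaggle_env` and `download_dataset` are hypothetical helper names, not part of the Kaggle API.

```python
import os


def configure_kaggle_env(username: str, key: str) -> None:
    """Export the credentials that the Kaggle client reads from the environment."""
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = key


def download_dataset(dataset: str, target_dir: str) -> None:
    """Download and unzip a dataset into target_dir."""
    # Imported here (not at module top) so the environment variables
    # set by configure_kaggle_env are already in place.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    api.dataset_download_files(dataset, path=target_dir, unzip=True)
```

You would call `configure_kaggle_env('xxxxxx', 'xxxxx')` first and then `download_dataset('taricov/mobile-wallets-in-egypt-2020', '/FileStore/mobile-wallets-in-egypt-2020')`, using your own username and key in place of the placeholders.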

    Authenticate method in the api
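As a rough illustration of the lookup order that authenticate seems to follow (environment variables first, then the kaggle.json file), here is a simplified stand-alone sketch. It is not the library's code; `resolve_credentials` is a hypothetical function written only to show the idea.

```python
import json
import os


def resolve_credentials(env, config_path):
    """Return (username, key), preferring environment variables over the file."""
    username = env.get("KAGGLE_USERNAME")
    key = env.get("KAGGLE_KEY")
    if username and key:
        # Both values were found in the environment; the file is not consulted.
        return username, key
    if not os.path.exists(config_path):
        raise OSError(f"Could not find {config_path}")
    # Fall back to the JSON token file, which holds "username" and "key".
    with open(config_path) as fh:
        data = json.load(fh)
    return data["username"], data["key"]
```

Passing `os.environ` and the path to your token file shows which source would supply the credentials, which can help when tracking down an "Unauthorized" response.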
