Configuring Kaggle Python API and downloading data from a competition
The Ubiquitous Titanic Competition makes an appearance
Opening Spiel
Many an article has been written using datasets from Kaggle and many a fledgling data scientist has used the Titanic datasets as a first project.
I will show you how to configure the Kaggle public API in Python and download these datasets from the comfort of your IDE or command line.
Installing the Kaggle API
Simply install through pip as follows:
pip install kaggle
Authentication
You will need to generate an API token through your Kaggle account.
Log in to Kaggle and click on account (in the top right menu)
Navigate to the ‘API’ section and click ‘Create New API Token’
This will download a file called ‘kaggle.json’. Depending on your operating system, you’ll need to put it in one of two places.
Windows
On Windows, go to C:\Users\<username>, create a folder called .kaggle (the dot is important) and copy the file into this new directory.
Linux
For Linux you’ll need to add a new directory in /home/<username> called .kaggle and copy the file into the new directory.
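The copy step can also be scripted. The following is a minimal sketch rather than part of the Kaggle client: `install_kaggle_token` is a hypothetical helper name, and it assumes the token was downloaded to a path you pass in. On Linux it also restricts the file to your user (the equivalent of `chmod 600`), since the Kaggle client warns when the key is readable by other users.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(downloaded_token, home=None):
    """Copy a downloaded kaggle.json into <home>/.kaggle and return the new path.

    `home` defaults to the current user's home directory; it is a parameter
    only so the helper can be tried out against a scratch folder.
    """
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"  # the leading dot is important
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copyfile(downloaded_token, target)
    if os.name != "nt":
        # Linux/macOS only: owner read/write, same effect as chmod 600
        target.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return target
```

For example, `install_kaggle_token(Path.home() / "Downloads" / "kaggle.json")` would work if your browser saved the token to a Downloads folder, which is an assumption about your setup.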
Listing Competitions
You can list all competitions using the following code in Python:
import kaggle
api = kaggle.api
api.competitions_list()
Out[8]:
[contradictory-my-dear-watson,
gan-getting-started,
store-sales-time-series-forecasting,
tpu-getting-started,
digit-recognizer,
titanic,
house-prices-advanced-regression-techniques,
connectx,
nlp-getting-started,
spaceship-titanic,
otto-recommender-system,
nfl-big-data-bowl-2023,
g2net-detecting-continuous-gravitational-waves,
novozymes-enzyme-stability-prediction,
competitive-data-science-predict-future-sales,
dfl-bundesliga-data-shootout,
scrabble-player-rating,
lux-ai-2022-beta,
tabular-playground-series-nov-2022,
feedback-prize-english-language-learning]
This prints a list of competitions. The results are paginated, and the default page is 1.
You can also search for competitions using a search term:
api.competitions_list(search='titanic')
Out[10]: [spaceship-titanic, titanic]
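Because the results come back one page at a time, you may want to walk through several pages in one go. Here is a rough sketch, assuming that `competitions_list` accepts `page` and `search` keyword arguments and returns a list that is empty once you run past the last page; this can differ between versions of the client. `collect_competitions` and `max_pages` are names made up for this example.

```python
def collect_competitions(api, search=None, max_pages=3):
    """Gather competitions from page 1 up to max_pages.

    Stops early at the first empty page. Assumes api.competitions_list
    takes `page` and `search` keywords and returns a list-like result.
    """
    results = []
    for page in range(1, max_pages + 1):
        batch = api.competitions_list(page=page, search=search)
        if not batch:
            break  # no more results
        results.extend(batch)
    return results
```

With the `api` object created earlier, `collect_competitions(api, search='titanic')` would return the matching competitions from the first few pages.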
Listing Datasets
You may want to have a look at the datasets within a competition before downloading.
api.competitions_data_list_files('titanic')
Out[11]:
[{'nameNullable': 'test.csv',
'descriptionNullable': 'test data to check the accuracy of the model created\n',
'urlNullable': 'https://www.kaggle.com/',
'ref': 'test.csv',
'name': 'test.csv',
'hasName': True,
'description': 'test data to check the accuracy of the model created\n',
'hasDescription': True,
'totalBytes': 28629,
'url': 'https://www.kaggle.com/',
'hasUrl': True,
'creationDate': '2018-04-09T05:33:22.3963227Z'},
{'nameNullable': 'train.csv',
'descriptionNullable': 'contains data ',
'urlNullable': 'https://www.kaggle.com/',
'ref': 'train.csv',
'name': 'train.csv',
'hasName': True,
'description': 'contains data ',
'hasDescription': True,
'totalBytes': 61194,
'url': 'https://www.kaggle.com/',
'hasUrl': True,
'creationDate': '2018-04-09T05:33:22.3963227Z'},
{'nameNullable': 'gender_submission.csv',
'descriptionNullable': 'An example of what a submission file should look like. \n\n*These predictions assume only female passengers survive.*',
'urlNullable': 'https://www.kaggle.com/',
'ref': 'gender_submission.csv',
'name': 'gender_submission.csv',
'hasName': True,
'description': 'An example of what a submission file should look like. \n\n*These predictions assume only female passengers survive.*',
'hasDescription': True,
'totalBytes': 3258,
'url': 'https://www.kaggle.com/',
'hasUrl': True,
'creationDate': '2018-04-09T05:33:22.3963227Z'}]
This returns details for every file in the competition, including its name, description, size in bytes and creation date.
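The raw output is verbose, so it can help to boil it down to file names and sizes. The sketch below assumes each entry is a dictionary with 'name' and 'totalBytes' keys, as in the output shown above; other versions of the client may return objects instead. `summarize_files` is a hypothetical helper, and the sample data is copied from that output.

```python
def summarize_files(files):
    """Return (name, size in KiB) pairs, largest file first.

    Assumes each entry is a dict with 'name' and 'totalBytes' keys.
    """
    rows = [(f["name"], round(f["totalBytes"] / 1024, 1)) for f in files]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Sample entries trimmed down from the listing for the titanic competition
sample = [
    {"name": "test.csv", "totalBytes": 28629},
    {"name": "train.csv", "totalBytes": 61194},
    {"name": "gender_submission.csv", "totalBytes": 3258},
]

for name, size_kib in summarize_files(sample):
    print(f"{name}: {size_kib} KiB")
```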
Downloading All Datasets
Use the following command to download all datasets for a competition:
api.competition_download_files('titanic')
This downloads a .zip file into the current working directory. You will need to extract the archive before you can use the files. You can extract .zip files in Python using the zipfile module from the standard library.
import zipfile
with zipfile.ZipFile('titanic.zip', 'r') as zip_ref:
    zip_ref.extractall('C:/Users/Jam/Documents/Python Scripts/Medium/medium_articles/kaggle_api')
If you check the working directory now, you will see that the datasets have been extracted and you now have three .csv files:
import os
os.listdir()
Out[16]:
['gender_submission.csv',
'kaggle_api.py',
'test.csv',
'titanic.zip',
'train.csv']
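The extraction above uses an absolute path that only exists on my machine. A more portable variation is sketched below: it downloads the archive into a folder of your choosing and unpacks it there. It assumes that `competition_download_files` accepts a `path` keyword argument and that the archive is named after the competition (as with titanic.zip above). `download_and_extract` and the default folder name `data` are made up for this example.

```python
import zipfile
from pathlib import Path


def download_and_extract(api, competition, dest_dir="data"):
    """Download a competition archive into dest_dir, unpack it there,
    and return the sorted names of the extracted files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Assumption: the client accepts a `path` keyword for the download folder
    api.competition_download_files(competition, path=str(dest))
    # Assumption: the archive is saved as <competition>.zip
    archive = dest / f"{competition}.zip"
    with zipfile.ZipFile(archive, "r") as zip_ref:
        zip_ref.extractall(dest)
        return sorted(zip_ref.namelist())
```

With the `api` object from earlier, `download_and_extract(api, 'titanic')` would leave the .csv files in a `data` folder next to your script.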
Conclusion
Now you can configure the Kaggle API for Python on both Windows and Linux, search for competitions and download the datasets within a competition, with the added bonus of being able to extract a .zip archive.
The code for this article is available at https://github.com/jammage13/medium_articles/tree/main/kaggle_api