Configuring Kaggle Python API and downloading data from a competition

Nov 2, 2022

The Ubiquitous Titanic Competition makes an appearance

Opening Spiel

Many an article has been written using datasets from Kaggle and many a fledgling data scientist has used the Titanic datasets as a first project.

I will show you how to configure the Kaggle public API in Python and download these datasets, all from the comfort of your IDE or command line.

Installing the Kaggle API

Simply install it with pip as follows:

pip install kaggle
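If you want to confirm the install worked before setting up credentials, one option is to ask the standard library which version is installed, since importing kaggle itself will complain until the token described in the next section is in place. This is only a minimal sketch; the helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if kaggle is installed, otherwise None
print(installed_version("kaggle"))
```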

Authentication

You will need to generate an API token through your Kaggle account.

In your account settings, navigate to the ‘API’ section and click ‘Create New API Token’.

This will download a file called ‘kaggle.json’. Depending on your operating system, you’ll need to put this file in one of two places.

Windows

On Windows, go to C:\Users\<username>, create a folder called .kaggle (the leading dot is important) and copy the file into this new directory.

Linux

On Linux, create a new directory called .kaggle in /home/<username> and copy the file into it.
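Both sets of steps boil down to "copy kaggle.json into a .kaggle folder in your home directory", so they can be sketched once in Python. The function name install_kaggle_token and its home argument are hypothetical, and the Downloads location in the usage note is an assumption about where your browser saved the file. On Linux the sketch also restricts the file to your own user, because the Kaggle client warns when the token is readable by others.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(downloaded_file, home=None):
    """Copy kaggle.json into <home>/.kaggle and return the new path."""
    # Path.home() resolves to C:\Users\<username> or /home/<username>
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copy(downloaded_file, target)
    if os.name == "posix":
        # Read/write for the owner only, so other users cannot see the key
        target.chmod(0o600)
    return target
```

For example, install_kaggle_token(Path.home() / "Downloads" / "kaggle.json") would place the token where the API looks for it.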

Listing Competitions

You can list all competitions using the following code in Python:

import kaggle
api = kaggle.api
api.competitions_list()
Out[8]: 
[contradictory-my-dear-watson,
 gan-getting-started,
 store-sales-time-series-forecasting,
 tpu-getting-started,
 digit-recognizer,
 titanic,
 house-prices-advanced-regression-techniques,
 connectx,
 nlp-getting-started,
 spaceship-titanic,
 otto-recommender-system,
 nfl-big-data-bowl-2023,
 g2net-detecting-continuous-gravitational-waves,
 novozymes-enzyme-stability-prediction,
 competitive-data-science-predict-future-sales,
 dfl-bundesliga-data-shootout,
 scrabble-player-rating,
 lux-ai-2022-beta,
 tabular-playground-series-nov-2022,
 feedback-prize-english-language-learning]

This returns a list of competitions. Results are paginated, and the first page is returned by default.

You can also search for competitions using a search term:

api.competitions_list(search='titanic')
Out[10]: [spaceship-titanic, titanic]
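Since the results come back one page at a time, you may want to gather a few pages in one go. The sketch below assumes competitions_list accepts page and search keyword arguments and returns a plain list, as it appears to in the version used here. The helper name list_competition_pages and the max_pages cap are my own additions.

```python
def list_competition_pages(api, search=None, max_pages=3):
    """Collect competitions from successive pages, stopping at an empty page."""
    results = []
    for page in range(1, max_pages + 1):
        batch = api.competitions_list(page=page, search=search)
        if not batch:
            # An empty page means there is nothing further to fetch
            break
        results.extend(batch)
    return results
```

With the api object from earlier, list_competition_pages(api, search='titanic') should return the same matches as the single call above, because the search fits on one page.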

Listing Datasets

You may want to have a look at the datasets within a competition before downloading.

api.competitions_data_list_files('titanic')
Out[11]: 
[{'nameNullable': 'test.csv',
  'descriptionNullable': 'test data to check the accuracy of the model created\n',
  'urlNullable': 'https://www.kaggle.com/',
  'ref': 'test.csv',
  'name': 'test.csv',
  'hasName': True,
  'description': 'test data to check the accuracy of the model created\n',
  'hasDescription': True,
  'totalBytes': 28629,
  'url': 'https://www.kaggle.com/',
  'hasUrl': True,
  'creationDate': '2018-04-09T05:33:22.3963227Z'},
 {'nameNullable': 'train.csv',
  'descriptionNullable': 'contains data ',
  'urlNullable': 'https://www.kaggle.com/',
  'ref': 'train.csv',
  'name': 'train.csv',
  'hasName': True,
  'description': 'contains data ',
  'hasDescription': True,
  'totalBytes': 61194,
  'url': 'https://www.kaggle.com/',
  'hasUrl': True,
  'creationDate': '2018-04-09T05:33:22.3963227Z'},
 {'nameNullable': 'gender_submission.csv',
  'descriptionNullable': 'An example of what a submission file should look like.  \n\n*These predictions assume only female passengers survive.*',
  'urlNullable': 'https://www.kaggle.com/',
  'ref': 'gender_submission.csv',
  'name': 'gender_submission.csv',
  'hasName': True,
  'description': 'An example of what a submission file should look like.  \n\n*These predictions assume only female passengers survive.*',
  'hasDescription': True,
  'totalBytes': 3258,
  'url': 'https://www.kaggle.com/',
  'hasUrl': True,
  'creationDate': '2018-04-09T05:33:22.3963227Z'}]

This returns one dictionary per file in the competition, including its name, description, size in bytes (totalBytes) and creation date.
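That output is a bit noisy if all you want is the file names and sizes. Here is a small sketch that boils it down, assuming the call returns a list of dictionaries with 'name' and 'totalBytes' keys as in the output above. The helper summarise_files is hypothetical, and the sample list is a trimmed copy of that output so the snippet runs without credentials.

```python
def summarise_files(files):
    """Return (name, size in KB) pairs, largest file first."""
    rows = [(f["name"], round(f["totalBytes"] / 1024, 1)) for f in files]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Trimmed copy of the output shown above
files = [
    {"name": "test.csv", "totalBytes": 28629},
    {"name": "train.csv", "totalBytes": 61194},
    {"name": "gender_submission.csv", "totalBytes": 3258},
]

for name, size_kb in summarise_files(files):
    print(f"{name}: {size_kb} KB")
```

In a live session you would pass the result of api.competitions_data_list_files('titanic') instead of the sample list.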

Downloading All Datasets

Use the following command to download all datasets for a competition:

api.competition_download_files('titanic')

This downloads a .zip file into the current working directory. You will need to extract the archive before you can use the files. You can extract .zip files in Python using the zipfile module from the standard library.

import zipfile
with zipfile.ZipFile('titanic.zip', 'r') as zip_ref:
    zip_ref.extractall('C:/Users/Jam/Documents/Python Scripts/Medium/medium_articles/kaggle_api')
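The absolute path above only exists on my machine, so here is a sketch that wraps the download and the extraction in one function and takes the destination as an argument. It assumes competition_download_files accepts a path keyword and saves the archive as <competition>.zip inside it, which matches the titanic.zip seen here. The function name download_and_extract is made up for illustration.

```python
import zipfile
from pathlib import Path


def download_and_extract(api, competition, dest_dir="."):
    """Download a competition archive into dest_dir, unpack it and return the file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Assumption: the archive is saved as <competition>.zip inside `path`
    api.competition_download_files(competition, path=str(dest))
    archive = dest / f"{competition}.zip"
    with zipfile.ZipFile(archive, "r") as zip_ref:
        zip_ref.extractall(dest)
        return zip_ref.namelist()
```

Calling download_and_extract(api, 'titanic') with the default dest_dir keeps everything in the current working directory, as in the steps above.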

If you check the working directory now, you will see that the datasets have been extracted and you now have three .csv files:

import os
os.listdir()
Out[16]: 
['gender_submission.csv',
 'kaggle_api.py',
 'test.csv',
 'titanic.zip',
 'train.csv']

Conclusion

You can now configure the Kaggle API for Python on both Windows and Linux, search for competitions and download the datasets within a competition, with the added bonus of being able to extract a .zip archive.

The code for this article is available at https://github.com/jammage13/medium_articles/tree/main/kaggle_api