Saturday 2 March 2019

LESSONS FROM KAGGLE (3)

The good thing about Kaggle from a data journalism perspective is that it holds an interesting collection of datasets, ranging from the Olympics to economic freedom and sovereign debt.
The formats of the datasets vary from CSV to SQLite and JSON. The datasets can be downloaded and used for your own analysis. Moreover, once you have selected a dataset you can browse its kernels; a kernel is an analysis of the data.

Let's take an example: search for data using the tag 'economics' and select the 'Economic Freedom Report' (2018). Scroll down to get an idea of the data. Next, go to the top left and select 'open in' to open the data in either Google Sheets or Google Data Studio, or use 'copy API command' to download it to your own machine.
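If you prefer to script the download, the copied API command can be wrapped in a few lines of Python. This is only a minimal sketch: it assumes the kaggle command-line tool is installed (pip install kaggle) and that your API token is stored in ~/.kaggle/kaggle.json. The function names and the 'data' folder are my own choices for illustration, and the dataset slug has to be taken from the command Kaggle copies for you.

```python
import shutil
import subprocess


def build_download_command(dataset_slug, target_dir="data"):
    """Return the Kaggle CLI call for one dataset as an argument list."""
    # Same shape as the command Kaggle puts on the clipboard:
    #   kaggle datasets download -d <owner>/<dataset>
    # -p sets the download folder, --unzip unpacks the archive.
    return [
        "kaggle", "datasets", "download",
        "-d", dataset_slug,
        "-p", target_dir,
        "--unzip",
    ]


def download_dataset(dataset_slug, target_dir="data"):
    """Run the download; needs the kaggle CLI and a valid API token."""
    if shutil.which("kaggle") is None:
        raise RuntimeError("kaggle CLI not found; try: pip install kaggle")
    # check=True raises an error if the kaggle command fails.
    subprocess.run(build_download_command(dataset_slug, target_dir), check=True)
```

Calling download_dataset("owner/dataset-name") would then fetch and unzip the files into the 'data' folder; "owner/dataset-name" is a placeholder, not a real dataset.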

But before you start working, check what others have done with the data. Click on 'kernels'. There are 9 kernels available, in R and in Python. Open the first one, which focuses on 'IS AN ECONOMICALLY FREE COUNTRY A BETTER PLACE TO LIVE?' Perhaps it is useful for your writing; perhaps it is not and you start analyzing the data yourself. Either way, looking at these examples gives you a nice introduction to how data analysis is done.

Once you have chosen a dataset you can start a new kernel and do your own analysis. This is interesting because all the software you use is in the cloud: you work on a virtual machine with R and Python installed, and it runs the code you enter, all for free. You can save your work, make it public, share it with others and ask for comments. I think this is a very strong point of Kaggle.
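To give an idea of what a first cell in such a kernel might look like, here is a minimal Python sketch that reads a CSV and summarizes one numeric column. The sample rows are invented purely to make the example runnable, and the column names 'country' and 'score' are assumptions; a real dataset will have its own columns.

```python
import csv
import io
import statistics

# Invented sample rows, only here to make the sketch runnable.
# The column names 'country' and 'score' are assumptions.
SAMPLE_CSV = """country,score
A,7.5
B,6.0
C,
D,8.1
"""


def summarize_column(csv_file, column):
    """Count, mean, min and max of one numeric column, skipping blanks."""
    values = []
    for row in csv.DictReader(csv_file):
        cell = (row.get(column) or "").strip()
        if cell:  # empty cells are treated as missing values
            values.append(float(cell))
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


summary = summarize_column(io.StringIO(SAMPLE_CSV), "score")
print(summary)
```

With a real file you would pass an opened file instead of the sample text, for instance open("yourfile.csv", newline=""), and the name of a column that actually exists in it.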

Here is my example. I uploaded some old data about Dutch municipalities and used R for some analysis. Have a look at the results:
Dataset in CSV: https://www.kaggle.com/peterverweij/data-about-dutch-municipalitiestest
Kernel in R: https://www.kaggle.com/peterverweij/gemeente-test
