The answer is yes: machine learning is being used in data journalism. For example, there were training sessions on machine learning at IRE 2019. Here are a few other links relating data journalism and machine learning:
Is there any use of machine learning in the newsroom?
- Reuters: https://www.wired.co.uk/article/reuters-artificial-intelligence-journalism-newsroom-ai-lynx-insight
- Paul Bradshaw: https://onlinejournalismblog.com/2017/12/14/data-journalisms-ai-opportunity-the-3-different-types-of-machine-learning-how-they-have-already-been-used/
- Example 1: https://medium.com/journalism-innovation/exploring-machine-learning-in-newsrooms-7ec0b320c994
What could be the output for a data journalist working with machine learning?
I have done some experiments in R on Kaggle using a data set about Dutch municipalities.
1. Clustering using kmeans: this makes it possible to generate meaningful clusters of municipalities based on income, population, house value, unemployment, etc. Although the results are not very impressive, it is possible to find various centers or clusters in the data: large municipalities, high-income municipalities, and high-unemployment municipalities. Here is the kernel with the code and the results: https://www.kaggle.com/peterverweij/clustering-gemeentedata-using-kmeans
2. Predicting using a simple linear model: this uses two variables in the data set, income and house value. A plot of the data shows a strong relationship, which the regression line confirms. Creating a linear model for these variables makes it possible to predict, for example, income from the value of the house.
3. Predicting using randomForest: the linear model is simple and works for interval variables. But predicting nominal values, for example the gender or political party of the mayor, requires a different approach. randomForest provided interesting outcomes.
Here is the link to both prediction models: https://www.kaggle.com/peterverweij/prediction-simple-machine-learning
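To give an idea of what the three experiments above look like in R, here is a minimal sketch. It does not use the Kaggle data: the data frame `gem` and all of its columns and values are invented stand-ins, and the actual kernels may be set up differently. The `randomForest` package is assumed to be installed; if it is not, that step is skipped.

```r
# Synthetic stand-in for the municipality data. Every column name and value
# here is invented for illustration; this is NOT the Kaggle data set.
set.seed(42)
n <- 200
house_value  <- runif(n, min = 150, max = 500)              # thousands of euros
income       <- 10 + 0.05 * house_value + rnorm(n, sd = 2)  # thousands of euros
population   <- round(exp(rnorm(n, mean = 10, sd = 1)))
unemployment <- pmax(0, 8 - 0.01 * house_value + rnorm(n, sd = 1))
party        <- factor(ifelse(income + rnorm(n, sd = 2) > 26, "A", "B"))
gem <- data.frame(house_value, income, population, unemployment, party)

# 1. Clustering with kmeans. The variables are scaled first, otherwise
#    population (the largest numbers) would dominate the distances.
num_cols <- c("house_value", "income", "population", "unemployment")
km <- kmeans(scale(gem[, num_cols]), centers = 3, nstart = 25)
gem$cluster <- factor(km$cluster)
# Average of each variable per cluster, to see what characterises a cluster.
print(aggregate(gem[, num_cols], by = list(cluster = gem$cluster), FUN = mean))

# 2. Simple linear model: predict income from house value.
fit <- lm(income ~ house_value, data = gem)
print(coef(fit))
pred_income <- predict(fit, newdata = data.frame(house_value = 300))
print(pred_income)

# 3. Random forest for a nominal outcome (here the invented party label).
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(
    party ~ house_value + income + population + unemployment,
    data = gem, ntree = 200
  )
  print(rf$confusion)  # out-of-bag confusion matrix
}
```

The number of clusters (3) and the number of trees (200) are arbitrary choices for the sketch; with real data it is worth trying several values and comparing the results.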
Data journalists have to dig deep into statistics, but these examples show that there is added value for reporting. These examples are of course limited; there is a whole set of different machine learning algorithms available in R, and I have only tried two. Here are two overviews of the available packages:
- https://www.kdnuggets.com/2015/06/top-20-r-machine-learning-packages.html
- https://www.r-bloggers.com/what-are-the-best-machine-learning-packages-in-r/