Friday 8 October 2021

Off the Charts: The best of our data journalism (The Economist data journalism newsletter)

Why “R” is the best coding language for data journalism

By James Fransham, data journalist
From The Economist's data journalism newsletter

Data journalism is a pursuit whose success relies on being able to crunch numbers. Lots of them. For many years data journalism (a term popularised from about 2010) relied mostly on the power of spreadsheets alone. But pivot tables, vlookups and other spreadsheet functions only get you so far. With more and more data available, being able to perform more powerful and flexible operations is now a vital part of the data journalist’s toolkit. Two programming languages, Python and R, vie for data journalism supremacy. But which is better? Here is my case for the latter.

R is an open-source language: it is free for anyone to use. It was spun out of another programming language called “S” in the early 1990s by two academics working at the University of Auckland in New Zealand. It was developed to tackle statistical problems, so it handles data naturally.

Although R has lots of built-in functions, what gives R its versatility is its packages. These nifty extensions bundle up code, data and documentation, and they can be imported into R with a single line of code. And there are lots of them. The number of packages has increased from 2,700 in 2010 to 18,335 today. Want to know the weather in New York City yesterday? There are packages for that. Or Tesla’s share price? There are packages for that. Want to perform some esoteric statistical function? There are packages for those too. R’s packages are organised into task views, so if there’s a new subject you are taking on, that’s a good place to start.
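As a rough illustration of that single line, the sketch below attaches a package and calls one of its functions. It uses tools, a package that ships with R but is not attached by default, so nothing has to be downloaded. The commented-out install.packages() call shows how a CRAN package would typically be installed first. The choice of package and the file name are assumptions made purely for illustration.

```r
# Installing from CRAN is a one-off step and needs an internet connection,
# so it is left commented out here.
# install.packages("ggplot2")

# Attaching an installed package takes a single line.
# "tools" ships with R itself but is not attached by default.
library(tools)

# Functions from the package can now be called by name.
# The file name is a made-up example.
ext <- file_ext("poll_results.csv")
print(ext)
```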

All versions of R and the packages that have been developed for it are available on the Comprehensive R Archive Network, more commonly known as CRAN, a network of servers hosted by academic institutions around the world. The basic interface of the R software is byzantine. But the RStudio application, built by a for-profit organisation in America (although the basic version is also open-source), provides a sleek “integrated development environment” (IDE) that sits atop R. RStudio helps you learn R’s syntax quickly and has good debugging tools, which make programming almost(!) headache-free.

The popularity of R owes a lot to the pioneering work of a handful of individuals. Perhaps the most celebrated is Hadley Wickham, a Kiwi academic and an employee of the company that owns RStudio. Mr Wickham has developed a number of packages, known as the “Tidyverse”, that make handling data and number crunching in R more intuitive for new users.
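To give a flavour of what this looks like in practice, here is a minimal sketch using dplyr, one of the Tidyverse packages, on the mtcars dataset that ships with R. The dataset, the columns and the horsepower cut-off are assumptions picked for illustration, not something taken from the newsletter, and dplyr has to be installed for the code to run.

```r
library(dplyr)

# mtcars is a small dataset bundled with R: one row per car model.
# Each step reads as a verb: keep some rows, group them, summarise, sort.
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                            # keep cars above 100 horsepower
  group_by(cyl) %>%                               # one group per cylinder count
  summarise(cars = n(), mean_mpg = mean(mpg)) %>% # count and average per group
  arrange(desc(mean_mpg))                         # most fuel-efficient group first

print(mpg_by_cyl)
```

The same result is possible in base R, but the chain of named verbs is arguably easier for a newcomer to read from top to bottom.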

Perhaps what really sets R apart is its visualisation library, ggplot2. This library, which was also developed by Mr Wickham, allows you to create different charts and visualisations with just a few lines of code. The “gg” stands for “grammar of graphics”: visualisations can be changed or augmented with additional simple blocks of code, much like constructing a Lego model. On our data team, ggplot2 allows us to iterate through visual ideas and quickly see what might be most suitable.
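The Lego analogy can be sketched in a few lines. The example below builds a scatter plot from the built-in mtcars dataset one block at a time, with each addition joined by ggplot2's + operator. The dataset, the chart type and the labels are assumptions chosen for illustration and are not taken from The Economist's own charts; ggplot2 has to be installed.

```r
library(ggplot2)

# First brick: the data and the mapping of columns to axes.
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Second brick: draw each car as a point.
p <- p + geom_point()

# Third brick: overlay a straight-line fit without touching the earlier code.
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Finishing touches: labels and a plainer theme.
p <- p +
  labs(
    title = "Heavier cars travel fewer miles per gallon",
    x = "Weight (1,000 lbs)",
    y = "Miles per gallon"
  ) +
  theme_minimal()

# print(p) draws the chart; ggsave("chart.png", p) would write it to a file.
```

Swapping geom_point() for another geom, or removing the geom_smooth() line, changes the chart without rewriting the rest, which is what makes quick iteration practical.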

R makes coding fun and flexible. You can become competent in just a few months with no previous programming experience. And there is a whole community of like-minded programmers who can help you along the way, such as those found on Stack Overflow, R-bloggers, R Weekly and R-Ladies. So if you ever get stuck, there are plenty of places to turn.
