Wednesday 21 July 2021

Is Windows transforming into Linux?

My first machine ran MS-DOS as its operating system (OS). On top of it you ran a simple word processor, a spreadsheet and database software. It worked, but a graphical user interface (GUI) was still under development. Things got slightly better with the introduction of Windows; Windows 95 was the big take-off. And of course, with the Office software added to the OS, Windows became the tool of preference. Although Windows was installed on 90% of PCs, it was generally no fun.

For example, you could get a blue screen or a completely frozen screen. You had to restart, and if you had not saved your work you could start all over again. Then there was the continuous updating by Microsoft, sometimes in the middle of your work. In the middle of a training session Bill Gates stepped in to make things better. And finally, after all the updates, you had to find out where you could change settings: the designers had moved them and placed them under new headings.

Ballmer

In the beginning, finding an alternative OS was not easy and demanded skill and knowledge of Linux. But Linux adapted fast and developed into a complete OS with different tools. There was a cold war between Microsoft and Linux. Microsoft CEO Steve Ballmer described Linux at that time as a “cancer”. Linux was developing into a serious alternative to Microsoft, and at no cost: Linux is open source and therefore free. Yet it was never accepted. Installing and running Ubuntu or Mint is not rocket science, and working on these machines is only slightly different from working on a Windows machine. It never convinced the Microsoft die-hards.

Microsoft may have won the battle for desktops and office software, but servers and network software are almost all Linux. So the problem is to enable Windows to talk to all the Linux machines. Finally Microsoft saw the light and started embracing Linux.

Azure, Microsoft's successful cloud computing platform, is running Linux. So Microsoft needed its own Linux version. Secondly, this Linux version should also work on Windows machines. Let’s have a look at the various options.

Linux on Windows

1. Installing Linux on Windows is done with WSL, the Windows Subsystem for Linux. WSL came in three steps. Microsoft first introduced a Linux terminal in a distribution of choice, for example Ubuntu; later it introduced a GUI, so you can run Linux apps with a GUI. WSL1 offers only a Linux terminal. Here is the documentation of Microsoft or read my blog posting from 2016.

2. Later Microsoft introduced WSL2 with a GUI using an X server. If you want to give it a try, here is a nice video showing how to install it. Running Linux inside Windows is one thing, but it also works the other way: from Linux you have access to Windows files and programs. Type for example explorer.exe at the Ubuntu prompt and the file explorer shows the directory you are in.



3. The latest is WSLg, the GUI for Linux in Windows, which makes it possible to run graphical Linux apps directly. But you need to register for the Insider program and install a Windows developer version. It is to be expected that WSLg will be rolled out in the fall with a new Windows update.

Microsoft Linux

Microsoft's own Linux version is CBL-Mariner (CBL stands for Common Base Linux). To install Microsoft Linux you can download the distribution and build it into an ISO. To get the feeling, run it in a virtual machine. It does not come with a desktop or GUI. CBL-Mariner is the Linux kernel used for running other Linux distributions.


What is next?

Yes, the cold war is over and peace has broken out between Microsoft and Linux. Microsoft is definitely embracing Linux and incorporating it in Windows. Microsoft has also been heavily involved in the development of open source software since it acquired GitHub. But it is not to be expected that Windows will be replaced by Linux: there is too much invested in other related Windows software to start a transition like this. But as a user you can run Linux apps smoothly.

So now I have a Microsoft Linux distribution (CBL-Mariner), I can run graphical Linux apps in Windows, and I can start Windows apps from the Linux prompt. Nice, but so what? I am not a developer. And as a user I prefer to run either Linux apps or Windows apps, to avoid getting mixed up. At the end of the day it is still running Linux in Windows in a container. For the moment I stick to Linux (Ubuntu or Mint), which both give ultimate control over the machine and the installed software.

                                                         

Thursday 15 July 2021

DATA JOURNALISM 2.0

Version 1.0

Analyzing and visualizing a table of figures for a newspaper article is not exceptional anymore. Take for example local taxes per municipality. Once you have downloaded the figures into a spreadsheet, it is not so difficult to notice which of the local taxes generates the highest income for municipalities. A simple bar graph will do; or plotting the tax income per municipality on a map will draw attention to the most expensive municipalities. Here are two examples made with Flourish about taxation for garbage collection in municipalities in the province of Gelderland. A bar graph: https://public.flourish.studio/visualisation/5152134/ and a map: https://public.flourish.studio/visualisation/5152371/ . Here is a dashboard made in Tableau: https://public.tableau.com/app/profile/verweijpjc/viz/reiniging/Dashboard1

Sometimes the data provider makes downloading data easy (http://d3-media.blogspot.com/2021/04/r-data-journalism-helpers.html ), but often it also provides analysis and visualisation (https://www.cbs.nl/nl-nl/nieuws/2021/04/gemeenten-begroten-11-3-miljard-euro-aan-heffingen-in-2021 ).

Here is the reporting of a local/regional newspaper about the taxes: https://www.gelderlander.nl/arnhem-e-o/kaart-deze-gemeenten-zijn-het-duurst-om-in-te-wonen~a217736c/

This is data journalism 1.0, and it is no longer the rocket science it looked like 25 years ago. Data journalism 1.0 is almost 25 years old and the original beta version .0 is more than 50 years old (http://d3-media.blogspot.com/2018/07/new-steps-in-data-journalism.html ). A lot has changed; most important, the tools are easier to handle and require less skill and training. Data from a web page, for example, can be imported directly into Excel, and visualizing it with a template from Flourish is almost standard procedure for journalists.

Version 2.0

On the other hand, data software is developing fast. Data science is creating more software for analyzing data. Interesting for application in journalism is software generally known as Artificial Intelligence (AI) or Machine Learning (ML). You don’t need special machines to run this software, nor do you have to pay huge amounts to use it. A good laptop with a fast processor, enough memory and a decent graphics card will do to run the software, which comes in two flavors: R or Python. Skipping the discussion about which is best, the general picture is that both are capable of analyzing data; Python is a full programming language and R is more focused on statistics. I work mostly with R (http://d3-media.blogspot.com/2014/04/vijf-redenen-om-r-te-gebruiken-in-data.html ), running on a Linux operating system (OS) (http://d3-media.blogspot.com/2011/09/linux-voor-journalisten.html ). But Windows or Apple will also work.

R has a steep learning curve. Here is a howto to get started: http://d3-media.blogspot.com/2019/03/learning-r-for-data-journalists.html There is no graphical user interface (GUI), so you work from a terminal, typing in commands or merging a set of commands into a small program. R has libraries for special jobs, and one set of libraries is dedicated to ML. AI and/or ML have lots of applications.

Here is an example analyzing a data set of municipalities in NL. I will also use this data set for explaining machine learning. This example shows a standard analysis of the data in R:

https://www.kaggle.com/peterverweij/gemeente-test

This example is shown in Kaggle; more about this interface: http://d3-media.blogspot.com/2019/02/data-journalists-what-do-you-know-about.html

Machine Learning

ML has various areas of application in data journalism:

Automated content production or robot journalism (https://memeburn.com/2014/03/what-a-californian-earthquake-can-teach-us-about-the-future-of-journalism/ ) is one of them, and it is drawing much attention at the moment. Another area is content optimization: optimizing the content for a specific user.

For data journalism, however, the area of data mining is the most interesting. The following chart gives an overview of the possibilities:




(chart from: https://nl.mathworks.com/discovery/machine-learning.html )


I will not discuss all these methods in detail.

First I will discuss the basic idea of machine learning, and second I will show some examples. I will use one data set about Dutch municipalities to show linear regression, decision trees and neural networks. Another data set, based on Twitter, I will use to show k-means.

The core of AI or ML is a black box: you give the box data as input, next the box performs complicated statistical operations (the algorithm), which you fine-tune with various options, and finally you have the output. Take for example the Titanic: there is a complete data set about the passengers. With ML it is possible to calculate your chances of surviving the shipwreck. Or to predict whether the mayor of a Dutch municipality will be male or female.

To use ML you don’t need exact knowledge of the black box, the algorithm itself. That is for the coders or data scientists who design this software. The basic question for your research is: what do you want to do?

From the chart we see that there are two entries: supervised and unsupervised learning. Supervised learning means that the model (algorithm) which predicts the gender of the mayor or the chances of survival must first be trained on a known data set. When the model is trained, and you know the margins of error, it can be applied to the complete data set to make the predictions.
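The train-then-apply idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is invented (random incomes with a "low" or "high" label), the "model" is a single threshold, and the names make_data, train, predict and accuracy are made up for this sketch. It is not the R code from the Kaggle notebooks.

```python
import random

# Hypothetical labelled data: (average income in thousands of euros, label).
# The numbers are invented for illustration only.
def make_data(seed=1):
    rng = random.Random(seed)
    rows = []
    for _ in range(100):
        rows.append((rng.gauss(25, 3), "low"))
        rows.append((rng.gauss(40, 3), "high"))
    rng.shuffle(rows)
    return rows

def train(rows):
    """Learn one threshold: the midpoint between the two class means."""
    low = [x for x, label in rows if label == "low"]
    high = [x for x, label in rows if label == "high"]
    return (sum(low) / len(low) + sum(high) / len(high)) / 2

def predict(threshold, x):
    return "high" if x > threshold else "low"

def accuracy(threshold, rows):
    """Share of rows where the prediction matches the known label."""
    hits = sum(predict(threshold, x) == label for x, label in rows)
    return hits / len(rows)

data = make_data()
split = int(0.7 * len(data))                 # 70% for training, 30% held out
train_rows, test_rows = data[:split], data[split:]

threshold = train(train_rows)                # the model only sees the training rows
test_accuracy = accuracy(threshold, test_rows)
print(f"threshold={threshold:.1f}, accuracy on held-out rows={test_accuracy:.2f}")
```

The accuracy on the held-out rows plays the role of the margin of error: it tells you how far to trust the model before you apply it to rows without a known label.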

In unsupervised learning the data is read directly into an algorithm, which makes the best of it. For example, analyzing tweets from two populist members of the Dutch House of Representatives (Wilders and Baudet) shows that their tweets fall into different clusters, meaning they are both right-wing populists but focus on different issues. Creating the clusters is a mathematical operation with no control data.


Under supervised learning we have two areas: classification and regression.

Unsupervised learning focuses on clustering.


Examples of machine learning


0. Kaggle

I will use Google's Kaggle interface to show the code of the various algorithms. Here is a general intro to Kaggle.

- machine learning in kaggle: http://d3-media.blogspot.com/2019/03/kaggle-is-there-data-journalism-in.html


1. Regression means that, in a simple case with two variables, the variation of one variable is related to the variation of the other variable. Average income in a city, for example, will relate to the average price of houses: in relatively rich cities houses will be more expensive. This relationship between the variables, based on the variation they have in common, can be expressed in a number: the correlation. Or by a line in a scatter diagram, that is a linear model, commonly called the trend line.
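To make the correlation and the trend line concrete, here is a small Python sketch. The five income and house price figures are invented for illustration, and the helper names correlation and trend_line are hypothetical; the calculation itself is the standard Pearson correlation and least-squares line.

```python
from math import sqrt

# Invented figures for five cities: average income and average house price,
# both in thousands of euros.
income = [22.0, 25.0, 28.0, 31.0, 34.0]
house_price = [180.0, 210.0, 230.0, 275.0, 290.0]

def mean(values):
    return sum(values) / len(values)

def correlation(xs, ys):
    """Pearson correlation: shared variation divided by total variation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def trend_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

r = correlation(income, house_price)
slope, intercept = trend_line(income, house_price)
print(f"correlation={r:.2f}, house_price = {slope:.1f} * income + {intercept:.1f}")

# Use the line to predict the house price for a city with an income of 30
predicted = slope * 30 + intercept
```

A correlation close to 1 means the points lie almost on the line; the slope says how much the house price rises for every extra thousand euros of income.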


When the number of variables increases, predicting the outcome of one variable becomes more complicated, and then we have to use another algorithm or model for the prediction. With decision trees it is possible to predict the gender of a mayor based on population, income, house price and unemployment; or to predict the political party of the mayor based on the same variables. Here the predicted outcome is a category, at the nominal level of measurement.
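A decision tree is built from repeated yes/no splits on one variable at a time. The Python sketch below shows only the first step, finding the single best split, using Gini impurity as the measure (an assumption; other measures exist, and packages such as rpart handle the full tree). The rows and the labels "A" and "B" are invented, and best_split is a hypothetical helper name.

```python
# Invented rows: (population in thousands, average income in thousands of euros, label)
rows = [
    (12, 24, "A"), (18, 26, "A"), (25, 23, "A"), (30, 27, "A"),
    (90, 25, "B"), (120, 28, "B"), (150, 24, "B"), (200, 26, "B"),
]
FEATURES = ["population", "income"]

def gini(labels):
    """Gini impurity: 0 means all labels in the group are the same."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def best_split(rows):
    """Try every feature and threshold; keep the split with the lowest weighted impurity."""
    best = None
    for f in range(len(FEATURES)):
        for threshold in sorted({row[f] for row in rows}):
            left = [row[-1] for row in rows if row[f] <= threshold]
            right = [row[-1] for row in rows if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, FEATURES[f], threshold)
    return best

score, feature, threshold = best_split(rows)
print(f"split on {feature} <= {threshold} (weighted Gini {score:.2f})")
```

A full tree repeats this search inside each of the two groups until the groups are pure enough; a random forest builds many such trees on random samples and lets them vote.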


Neural networks make it possible to estimate the value, the quantity, of a variable at the ratio level of measurement. For example, an estimate of the income of a city based on the other variables.
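The following Python sketch is not a real neural network but its smallest building block: a single linear neuron with one weight and one bias, trained by gradient descent to estimate a quantity. The data, the learning rate and the function name train_neuron are all assumptions made for this illustration; a library such as TensorFlow stacks many of these neurons in layers and adds non-linear activations.

```python
# Invented training data that follows value = 2 * x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def train_neuron(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit output = weight * x + bias by repeatedly nudging both against the error."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Error of the current guess for every training row
        errors = [(weight * x + bias) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to weight and bias
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias

weight, bias = train_neuron(xs, ys)
estimate = weight * 4.0 + bias   # estimate for an x the neuron has not seen
print(f"weight={weight:.3f}, bias={bias:.3f}, estimate for x=4: {estimate:.2f}")
```

The point of the sketch is the training loop: guess, measure the error, adjust, repeat. That loop is the same in a large network; only the number of weights grows.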


- regression: decision trees - random forest: https://www.kaggle.com/peterverweij/prediction-simple-machine-learning


- regression: decision trees with rpart: https://www.kaggle.com/peterverweij/gender-rpart-predict-gem


- regression: neural networks with TensorFlow: https://www.kaggle.com/peterverweij/kernel-tensorflow-woz


2. Clustering

The goal of clustering is to group or cluster observations that have similar characteristics. This is an example of unsupervised learning, so we have no control or check; inspection of the output is the test. In the example below we regroup municipalities.

- clustering: https://www.kaggle.com/peterverweij/clustering-gemeentedata-using-kmeans
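For readers who want to see what k-means does under the hood, here is a minimal Python sketch. The six points are invented, the starting centroids are fixed by hand so the result is reproducible (real implementations usually pick them at random), and kmeans is a hypothetical function written for this post, not the code from the Kaggle notebook.

```python
from math import dist

# Invented points: (population in thousands, average income in thousands of euros)
points = [(10, 22), (12, 24), (15, 23), (80, 35), (85, 38), (90, 36)]

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and moving the centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = average of the points in the cluster (unchanged if the cluster is empty)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Note that nobody told the algorithm which municipality belongs where: the groups follow only from the distances between the points. With real data the variables are usually rescaled first, otherwise the variable with the largest numbers dominates the distance.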


3. Classification

- classification with Voyant using nearest neighbor:

http://d3-media.blogspot.com/2020/03/two-faces-of-twitter-populism-in.html
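The nearest neighbor idea itself fits in a few lines of Python. This sketch does not reproduce what Voyant does internally; it only shows the principle. The training rows (shares of tweets on two topics) and the labels "author A" and "author B" are invented, and classify is a hypothetical helper.

```python
from collections import Counter
from math import dist

# Invented training rows: (share of tweets on topic 1, share on topic 2) and a known label.
training = [
    ((0.8, 0.1), "author A"), ((0.7, 0.2), "author A"), ((0.9, 0.2), "author A"),
    ((0.2, 0.7), "author B"), ((0.1, 0.9), "author B"), ((0.3, 0.8), "author B"),
]

def classify(point, training, k=3):
    """Label a new point by majority vote among its k closest training rows."""
    neighbours = sorted(training, key=lambda row: dist(point, row[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(classify((0.75, 0.15), training))
print(classify((0.2, 0.85), training))
```

Unlike clustering, this is supervised: the labels of the training rows are known in advance, and a new observation simply gets the label of the rows it resembles most.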




Literature

More background and detail on machine learning for data journalists:

https://towardsdatascience.com/10-machine-learning-methods-that-every-data-scientist-should-know-3cc96e0eeee9

and

https://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/

and

http://d3-media.blogspot.com/2019/07/journalism-as-algorithm.html : review of Automating the News: How Algorithms Are Rewriting the Media, by Nicholas Diakopoulos. Harvard University Press, 2019. ISBN 9780674976986