Data

Tim Harford, writing in The Financial Times, published an interesting article about the use of spreadsheets in business and science. I just had to save this quote:

Or say you’re a genetics researcher typing in the name of a gene such as “Membrane Associated Ring-CH-Type Finger 1”, or March1 for short, or perhaps the Sept1 gene. You can imagine what Excel does next. It turns those gene names into dates. One study estimated that 20 per cent of all genetics papers had errors caused by Excel’s autocorrect. Microsoft’s defence is simple enough: the default settings are intended to work in everyday scenarios. Which is the polite way of saying: Guys, Excel wasn’t designed for genetics researchers. It was designed for accountants (…) And yet when the genetics research community were wrestling with the autocorrecting genes issue, they resigned themselves to the hard truth that they would never wean people off Excel. Instead, the folks in charge — the Hugo Gene Nomenclature Committee — decided to change the names of the genes in question.
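To make the gene-name problem concrete, here is a minimal Python sketch that flags symbols a spreadsheet might read as dates before they are pasted in. The prefix list, the regular expression, and the function names are my own assumptions for illustration; this is not Excel's actual detection rule, only a rough approximation of "month name followed by a day number".

```python
import re

# Month-like prefixes that date autodetection is ASSUMED to catch.
# Illustrative only: this is not Excel's real rule set.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN", "JUL",
    "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month-like prefix, an optional separator, then one or two digits.
DATE_LIKE = re.compile(
    r"^(?:%s)[-/ ]?\d{1,2}$" % "|".join(MONTH_PREFIXES),
    re.IGNORECASE,
)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol resembles a 'month + day' date."""
    return bool(DATE_LIKE.match(symbol.strip()))


def flag_risky_symbols(symbols):
    """Return the symbols that should be checked before import."""
    return [s for s in symbols if looks_like_date(s)]


if __name__ == "__main__":
    print(flag_risky_symbols(["March1", "Sept1", "TP53", "BRCA1"]))
```

A check like this only catches the month-style names from the quote; formatting the column as text before pasting is the more general precaution.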

Apparently the U.K. was undercounting COVID-19 cases due to issues with the Excel file format it was using.

According to the BBC, the error was caused by the fact that Public Health England developers stored the test results in the file format known as .XLS. These .XLS files were then sent to the NHS after uploading to a central system. .XLS is an outdated file format, however, and each spreadsheet can have only 65,000 rows. By contrast, the .XLSX file format, which was first released in 2007, allows for more than 1 million rows. Because of the limited number of rows, each spreadsheet could contain about 1,400 cases, leaving excess cases off the file altogether. Although the issue was reportedly fixed by splitting the files into smaller batches, many are slamming Public Health England. “Why are critical databases in a national pandemic posted on Excel spreadsheets?” Jonathan Ashworth, the Labour Party’s shadow health secretary, said. “Why aren’t they using specialist data-based software?”
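The row limit described in the quote is easy to guard against in code. The sketch below is a minimal Python illustration, not the pipeline Public Health England used: the function names are hypothetical, and it ignores header rows. The exact per-sheet limits are 65,536 rows for .XLS (the quote rounds this to 65,000) and 1,048,576 rows for .XLSX.

```python
XLS_MAX_ROWS = 65_536       # legacy .XLS worksheet row limit (2**16)
XLSX_MAX_ROWS = 1_048_576   # .XLSX worksheet row limit (2**20)


def rows_dropped(total_rows: int, max_rows: int = XLS_MAX_ROWS) -> int:
    """Number of rows that would not fit on a single sheet."""
    return max(0, total_rows - max_rows)


def split_into_batches(rows, max_rows: int = XLS_MAX_ROWS):
    """Split rows into consecutive chunks that each fit on one sheet."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


if __name__ == "__main__":
    rows = list(range(150_000))  # made-up row count, for illustration only
    print(rows_dropped(len(rows)))
    print([len(batch) for batch in split_into_batches(rows)])
```

The important part is the explicit check: a silent truncation is what caused the undercount, so a loader should either fail loudly or split the data, never drop the excess.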

Read the full article from Slate here. The article also gives some more examples of problems caused by using the spreadsheet software.

Vox made this important video where they “deconstruct one particularly popular chart of covid-19 cases around the world which uses a logarithmic scale, and explain how to avoid being misled by it.”

Al Jazeera made a nice short documentary about the work of data journalist and artist Mona Chalabi: Journalism Through Art. Her hand-drawn style is really interesting because it can challenge dominant narratives and styles of representing data. The style itself also shows that there is always some uncertainty in the data:

She believes that drawing can make data more accessible to people - and more transparent. “Part of the purpose of creating hand-drawn illustrations is that I want people to look at it and question the illustration that they see in front of them because the truth is that there is a high degree of imprecision in data,” she says. “I think there’s something to be said for an understanding that for every statistic that you see the truth lies somewhere in the parameters around that number.”