/ #data quality 

Spreadsheets of mass destruction

The first writings of human beings were not stories, they were spreadsheets. The oldest marks of writing we can find concern accounting. And soon enough, spreadsheets became the main way of representing it.

The tablet Plimpton 322, from -1800 (source wikipedia)

Humankind created at the same time its most disastrous invention: spreadsheet errors!

The tablet above already contained mathematical errors that survived to our time).

Tools evolved since these old ages. Today Microsoft Excel or Google Sheets prevent us to making errors… or not.

The European Spreadsheet Risk Interest Group mission is to analyse Spreadsheet errors risks. Their website references horror stories on how a single copy and paste can make you lose millions.

A 2016 study analysed 35175 excel files containing data on genetic research. They discovered that almost 20% of these spreadsheets contained gene name errors. A 2008 litterature review estimates that over 90% of spreadsheets contain errors.

Strategic decision with a spreadsheet

Today, financial services, Executives use spreadsheets to make strategic decisions. Spreadsheet exports is most of the time a mandatory feature our customer ask to work with us. This format is now the B2B data communication standard, not for the best.

This popularity grew because spreadsheets are easy to use to manipulate data. Today, this argument doesn’t hold, many programming languages are easier to use. For example, Python and Pandas are great alternatives.

As an example, let us say you have a spreadsheet with the quantity of elements purchased and their price. If you want to know your total spending, you’ll write:

SUMPRODUCT(F1:F18,H1:H18)

In Pandas, you can write:

total_spending = ( data['quantity'] * data['price'] ).sum()

All these tools are available in the browser without any installation thanks to Google Colab.

Data Reliability

Changing the tool you are using to take strategic decisions won’t solve everything. Errors while implementing math are not new. Good practices emerged thanks of decades of programming mistakes. A Spreadsheet user is most of the time respecting none of them:

  • One excel file to rule them all! Let’s have this enormous spreadsheet with all our data and computation.
  • Copy and paste. Let’s copy and paste formulas and the errors they contain.
  • Manual operation. To take action on the data, let’s always apply the same manual steps, betting on no human errors.
  • No testing. Why testing, it’s an average, how an average can be wrong?

Today modern coding patterns and methodologies apply best practices to limit these problems:

We still allow our clients to export data as a spreadsheet. But we interpret this usage as a missing feature in our product. It means they need more analysis to take action. As data specialist, it’s our role to deliver this analysis while ensuring no disastrous error.

Cornis can help you to convert your Microsoft Excel based data anaysis to modern data pipeline, contact us!