Spreadsheets of mass destruction
The first writings of human beings were not stories, they were spreadsheets. The oldest marks of writing we can find concern accounting. And soon enough, spreadsheets became the main way of representing it.
Humankind created at the same time its most disastrous invention: spreadsheet errors!
The tablet above already contained mathematical errors that survived to our time).
Tools evolved since these old ages. Today Microsoft Excel or Google Sheets prevent us to making errors… or not.
The European Spreadsheet Risk Interest Group mission is to analyse Spreadsheet errors risks. Their website references horror stories on how a single copy and paste can make you lose millions.
A 2016 study analysed 35175 excel files containing data on genetic research. They discovered that almost 20% of these spreadsheets contained gene name errors. A 2008 litterature review estimates that over 90% of spreadsheets contain errors.
Strategic decision with a spreadsheet
Today, financial services, Executives use spreadsheets to make strategic decisions. Spreadsheet exports is most of the time a mandatory feature our customer ask to work with us. This format is now the B2B data communication standard, not for the best.
This popularity grew because spreadsheets are easy to use to manipulate data. Today, this argument doesn’t hold, many programming languages are easier to use. For example, Python and Pandas are great alternatives.
As an example, let us say you have a spreadsheet with the quantity of elements purchased and their price. If you want to know your total spending, you’ll write:
SUMPRODUCT(F1:F18,H1:H18)
In Pandas, you can write:
total_spending = ( data['quantity'] * data['price'] ).sum()
All these tools are available in the browser without any installation thanks to Google Colab.
Data Reliability
Changing the tool you are using to take strategic decisions won’t solve everything. Errors while implementing math are not new. Good practices emerged thanks of decades of programming mistakes. A Spreadsheet user is most of the time respecting none of them:
- One excel file to rule them all! Let’s have this enormous spreadsheet with all our data and computation.
- Copy and paste. Let’s copy and paste formulas and the errors they contain.
- Manual operation. To take action on the data, let’s always apply the same manual steps, betting on no human errors.
- No testing. Why testing, it’s an average, how an average can be wrong?
Today modern coding patterns and methodologies apply best practices to limit these problems:
- Single Responsability Principle. A function should take care of only one part of the analysis.
- Don’t Repeat Yourself. Don’t copy-paste, use functions so if you spot an error you can correct it everywhere.
- Continous integration. Automatise the delivery process so that it’s faster and more reliable.
- Automatic testing. Test everything after every modification.
We still allow our clients to export data as a spreadsheet. But we interpret this usage as a missing feature in our product. It means they need more analysis to take action. As data specialist, it’s our role to deliver this analysis while ensuring no disastrous error.