We at Faraday are data science experts. We have a collective century of experience in the mystical arts of data whispering; we’ve transformed oceans of rows and columns, built and discarded thousands of databases, and molded far-seeing algorithms in a dozen programming languages.
But we mostly use SQL.
SQL is the old-as-the-hills workhorse that is interoperable with nearly everything, vaguely-consistent across ecosystems, and hella-flexible. We use it to count people over the age of 65, and we also use it to find local spatial autocorrelation across live-aggregated geographies.
SQL does it all, which is why we usually preference it over Excel or Google Sheets - and frequently over python and R - for our direct data work.
So what, you may ask, are the advanced actions that we find ourselves most-commonly performing in SQL rather than excel? What arcane rituals do we wizened elders enact by the light of the binary moon?
Gather close . . .
-
Reading dates
-
Recognizing ZIP codes as not-integers
-
Literally looking at tables with the gall to have more than 1GB of data.
You might think that these seem like pretty basic issues that should be fixed at the product level, and you’d be right! For better and worse, Excel is the de-facto engine of data processing worldwide, and on the “better” side of the equation, it’s been a great force for data democratization and literacy. Brian Timoney has those deets.
Anyone can do data analysis in excel, up to a point guarded by some now-pretty-trivial technical limitations, and therein lies the “worse” side:
You shouldn’t need a PhD and a decade of devops experience to get past that point.
You shouldn’t need SQL (at least not visibly) just because you have a large amount of data (which, increasingly, everybody has). Your spreadsheet should just carry on doing what you need it to do, like a professional. It’s therefore with joy that we see Google has announced something of a holy grail addressing one of our bugbears: BigQuery-SQL-backed spreadsheets are here, and they can handle a billion rows.
At Faraday we’ve relished using BigQuery for some time now, and we look forward to extra accessibility in Google Sheets for both our team and our clients. Our mission is to put the tools of predictive modeling in the hands of a lay audience - to be the Excel of machine learning - and Google’s move has just made our task a little bit easier.
Returning to the original point:
Data science belongs to everyone, not just the experts.