(Our consultancy turns four years old this February. To celebrate, we’ve written a blog series on lessons learned from four years of data science consulting. This is the first.)
It’s a question that inevitably comes up, usually from a CEO or a board member. How does our company’s use of data compare to others? Are we behind comparable companies, or are we ahead of the curve?
As a consulting firm, we get to work with companies across a lot of sectors, including media, education, healthcare, finance, government, e-commerce, non-profits, and consumer tech. We’ve had a chance to build up a sense for how companies actually use data.
Data is all about feedback and coordination, much like the feedback and coordination it takes to get around on your own two feet. Moving faster takes more coordination, more feedback, and more training to get there and stay there.
To keep this simple, we’re going to group all companies into four types based on how they use data: crawling, walking, running, and bicycling.
Crawling companies are at the most basic level of data usage. They collect data in spreadsheets or across multiple, separated systems that are not connected. Most tasks are done by people and most data is collected manually. Data analysis is done in Excel, if it’s done at all.
Most companies in the U.S. are Crawling, and are fine being there. If you’re a local construction firm or a corner store, your monthly and yearly P&L statement is enough analytics to keep you running for a long time.
Don’t let the simplicity of this data collection make you think it’s easy. Lots of work goes into collecting and maintaining accounting analytics. There are over six hundred years of theory and practice behind the basic metrics of running a business. Crawling companies definitely aren’t lying down.
Many Crawling companies, especially ones with annual revenues over $10 million or so, suffer from their lack of data integration and question-answering ability. They find themselves with tantalizing bits of advanced tech like Google Analytics or other automated dashboard systems, and recognize that they could have much more than they currently do.
Walking companies get feedback at a steady pace. They have collected their data together into integrated systems. Most data is collected automatically as customers use the product or service. Business intelligence tools become useful as centralized data decouples collection from analysis. Definitions of “customer,” “sale,” “click,” and other atomic units are standardized and used the same way everywhere in the company.
Most technology companies are somewhere between Crawling and Walking. Their data comes in clean, for the most part, since it is collected by machine in the first place. Unless integration is baked in very early on (which can be a mistake, since early-stage companies have so much else to worry about), it takes a few years of focused effort to get all the way to Walking.
Lots of 1990s innovations in data (like schemas, data warehouses, and BI tools) were about getting companies to Walking. They’re still necessary. We think the data science community gives these kinds of interventions a bum rap, since they’re not nearly as “cool” or “cutting edge” as predictive analytics.
If you believe, as we do, that getting feedback is crucial to being able to do experiments and make effective decisions, then getting to Walking is crucial for a modern business that wants to compete at scale. Data engineers and business analysts are important players in a Walking business.
“Big data” and NoSQL tools are often misused in an attempt to short-circuit getting to Walking. Technologists sometimes figure that as long as they can write Spark jobs across giant directories of denormalized log files, there’s no need to design consistent definitions or make querying easy for decision makers. It would be nice, but it ain’t so.
Getting all the way to Walking is an achievement. It means that someone making decisions can reliably count and take ratios of counts in a timely manner and with understandable results. For many businesses, this is as far as they’ll need to go, at least for the time being. Carl Anderson’s recent post makes a similar point as he argues that companies in this stage don’t need to hire people with advanced statistics skills.
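To make “count and take ratios of counts” concrete, here is a minimal sketch of what a single shared metric definition might look like. The event log, its field names, and the conversion_rate function are all hypothetical and exist only for illustration; a real Walking company would more likely express this as a query against its warehouse.

```python
# Hypothetical event log. The field names and values are made up
# for illustration and do not come from any real system.
EVENTS = [
    {"customer_id": "a", "type": "visit"},
    {"customer_id": "b", "type": "visit"},
    {"customer_id": "c", "type": "visit"},
    {"customer_id": "d", "type": "visit"},
    {"customer_id": "a", "type": "sale"},
    {"customer_id": "a", "type": "sale"},  # a repeat purchase
    {"customer_id": "b", "type": "sale"},
]


def conversion_rate(events):
    """Share of distinct visiting customers with at least one sale.

    Writing the definition down once means every team that asks
    "what is our conversion rate?" gets the same answer.
    """
    visitors = {e["customer_id"] for e in events if e["type"] == "visit"}
    buyers = {e["customer_id"] for e in events if e["type"] == "sale"}
    if not visitors:
        return 0.0
    # Count customers, not sale events, so repeat purchases do not
    # inflate the ratio.
    return len(buyers & visitors) / len(visitors)
```

The code is trivial. The choice it encodes is the point: whether a “sale” is counted per customer or per transaction is exactly the kind of definition that has to be settled once and used the same way everywhere.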
Running companies are elite. They’re able to use data with sophistication. They regularly conduct experiments, and have the statistical expertise to analyze the results. They have a solid grasp of correlation and causation.
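As a rough illustration of what analyzing an experiment can involve, the sketch below runs a two-proportion z-test on the conversion counts from a simple A/B test. This is one common textbook approach, not necessarily what any particular company uses, and the function name and inputs are our own assumptions.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a and conv_b are conversion counts; n_a and n_b are the
    number of users shown each variant. Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B are the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, two_proportion_z_test(100, 1000, 150, 1000) compares a 10% conversion rate against a 15% one with a thousand users in each group. The normal approximation behind this test is only reasonable when both groups are fairly large.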
Running companies build predictive models and incorporate them into their decision making, often using workhorse techniques like linear regression and simulation. Here is where having a data scientist on staff really starts to pay off.
A company doesn’t have to be Running to make use of predictive modeling. Any company can have isolated predictive models that serve pieces of the product, or help with really important decisions. Our clients are most often Walking (or nearly Walking) companies that are looking to solve a few well-defined problems they know will add to their bottom line.
Running companies can reliably produce features for their products that are based on recommendation or prediction. Prediction and recommendation are often core parts of their product. Many of the big software companies are Running. LinkedIn is a good example: lots of modeling, many product features that incorporate prediction, and internal decision making that makes use of advanced analytics.
At the far end, companies go from being entirely human to some kind of human-machine hybrid. Bicycling companies have turned over large parts of their business to automated decision making. Their models are sophisticated enough that they’re willing to trust the machines to run the experiments and make critical calls.
Very few companies are Bicycling.
The most prominent examples are probably Google and parts of Microsoft. BuzzFeed is somewhere between Running and Bicycling. Our understanding is that products like Google Search, Bing, AdWords, and so on are for the most part adaptive: the product is tuned over time based on a closed feedback loop. Algorithms propose slightly different versions of the product, and boost versions that do better.
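We don’t know how any of these products are actually implemented, but the general idea of a closed feedback loop can be sketched with a textbook epsilon-greedy bandit: mostly show the version that has done best so far, and occasionally try another one to keep learning. Everything below (the class, the variant names, the success rates) is hypothetical.

```python
import random


class EpsilonGreedy:
    """Textbook epsilon-greedy bandit over a fixed set of variants."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.shows = {v: 0 for v in variants}
        self.wins = {v: 0 for v in variants}
        self.rng = random.Random(seed)

    def rate(self, variant):
        """Observed success rate, or 0.0 if the variant is untried."""
        shown = self.shows[variant]
        return self.wins[variant] / shown if shown else 0.0

    def choose(self):
        # Explore: with probability epsilon, try a random variant.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.shows))
        # Exploit: otherwise boost the variant doing best so far.
        return max(self.shows, key=self.rate)

    def record(self, variant, success):
        self.shows[variant] += 1
        self.wins[variant] += int(success)


def simulate(true_rates, rounds=10_000, seed=0):
    """Run the loop against made-up success rates; return show counts."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(list(true_rates), epsilon=0.1, seed=seed)
    for _ in range(rounds):
        variant = bandit.choose()
        bandit.record(variant, rng.random() < true_rates[variant])
    return bandit.shows
```

Calling simulate({"A": 0.05, "B": 0.10}) runs the loop against two invented variants and reports how often each was shown. Production systems are far more elaborate, but the shape is the same: propose, measure, boost, repeat.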
Humans are never out of the loop entirely, of course, just as it takes a rider to make a bicycle go. Their decision-making role moves to a higher level. Bicycling companies can appear almost magical to most of us, since they seem to get better faster than other companies. That’s not an accident, nor is it cheap.
Maybe in ten or twenty years there will be lots of companies that are Bicycling. Someone will come along and invent the data science equivalent of cheap mountain bikes. Today even Bicycling companies aren’t that far ahead of penny farthings. Perhaps the tools will get easier and easier to deploy, and lots of businesses will be run this way. Or maybe there will be massive consolidation into companies that already are comfortable operating this way, and the next generation of executives will be experts at handing over more of the reins to the computers.
Every business uses data to some extent, even if it’s just to know if it’s making money. Some go further and make analytics easy and reliable. Others go further and master statistics and predictive models. And still others turn over large portions of their business directly to computers to make decisions. No matter what, you need to learn to walk before you can learn to run. And learning to run isn’t a good use of time or effort for every business.
At Polynumeral, we’ve had projects with clients from each of these types. Businesses that are Crawling need direction; businesses that are Walking and Running need to better understand their customers and improve their products; businesses that are Bicycling need outside perspectives to make sure they’re doing everything they can. If you’d like to talk about your next steps, drop us a line.