The rise of data science platforms – notes from Predict 2017
Written by Robert Coyle
Note: This article is taken from the yet unreleased Predict Book – notes from Predict Conference 2017.
At this year’s Predict conference, Creme Global Founder and CEO, Cronan McNamara, talked about the democratisation of data – finding ways to make predictive analytics available to a wide range of users inside organisations, not just data scientists. This year he delved into how that can happen by looking at the fast-growing market for data science platforms, and at his company’s own Expert Models platform.
Wikibon forecasts that the Big Data market will double from $43 billion today to just over $84 billion by 2026. Six years ago it was under $8 billion, highlighting how spectacular the growth has been and will continue to be. The impact hasn’t been lost on business leaders – an Accenture survey reveals that 89 per cent believe Big Data will revolutionise business operations in the same way the internet did.
McNamara identified data science platforms as the enabler of this change. He used Gartner’s definition to summarise what they are: “A cohesive software application that offers a mixture of basic building blocks essential for creating all kinds of data science solutions, and for incorporating those solutions into business processes, surrounding infrastructure and products.”
Big names in the space include tech multinationals like IBM, SAS, SAP and Oracle. A signal of growing maturity in the sector is Gartner’s first Magic Quadrant for Advanced Analytics, where newer entrants like RapidMiner, Domino Data Lab, H2O and Dataiku make an appearance. Gartner assesses the quadrant companies in terms of their vision as well as their ability to execute.
Another significant change has been the growth of open source tools. The large multinationals use their own proprietary code, but there is a growing number of platforms and solutions written in Python, now more popular than R as the programming language for data analysis. “It’s taken the lead over R for the first time and it’s accelerating faster. We’ve been using it a lot in Creme Global,” said McNamara.
His Expert Models platform evolved out of many years designing, building and deploying data science models. The goal was to develop user-friendly environments where non-programmers can access all the power of data. McNamara describes the platform as an R&D project that’s evolving all the time, but it’s very much open for business. Creme sources, acquires and curates global datasets – some are open, complex and hard to work with; others proprietary – and uses them in data crunching projects that serve the needs of multiple industries.
One example is looking at the risk of human exposure to fragrances from everyday products like shampoos, shower gels and face creams. The platform aggregates data from disparate market sources and presents it back to stakeholders for their own analysis. “We work very closely with the industry to deploy all of it on the web in a user-friendly environment,” he explained. “All the regulatory affairs people and toxicologists have the data at their fingertips.”
Another project for the US pesticide industry evaluates people’s risks of exposure. Like a lot of Creme’s work it’s highly collaborative and involves providing tools that can be used by multiple stakeholders – food and pesticide companies in this case, as well as the Environmental Protection Agency – enabling formulations to be uploaded onto a shared system for regulatory approval.
What McNamara believes Expert Models does well, and what lets other data science platforms down, is the way it successfully combines two workflows: model creation and delivery. It does the R&D and data acquisition and then puts the data science into production.
“We’ve looked at many of the data science platforms out there to help us deliver our projects. They try to do everything and they do neither one nor the other very well,” he said. “They want to be able to help the R&D team do experimental work and explore data and collaboration, but they also think they can help put models into production. That’s a big stretch that involves two different cultures and two different teams.”
Creme Global addressed the challenge by cultivating a working dynamic between the data scientists that build the models and the engineers who put them into production. “It’s a fully functional platform that encompasses everything you’ll need to productionise analytics,” he said.
The endgame is to get models into deployment more easily, according to McNamara, replacing the “data wrangling” and constant tweaking needed to generate monthly reports. With Expert Models it becomes a real time process, so when data is refreshed or new data becomes available it’s automatically encoded in the system and you can rerun the models.
The huge benefit is, firstly, that it quickly gets the models into the hands of the people who need them, and secondly, that it eliminates the risk of bias. There is always the temptation for data scientists to make changes on a monthly basis if they see something they don’t like.
With Expert Models there’s a rigorous version control system that ensures traceability and transparency. It means important decisions are based on the data sets that were used when the model was created, as opposed to going off course because someone got their hands on an Excel sheet from a shared drive and edited it without anyone knowing.
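The idea of pinning a model run to the exact dataset it was built on can be illustrated with a minimal Python sketch. This is a hypothetical toy, not the Expert Models API: all names here (`DatasetRegistry`, `commit`, `run_model`) are invented for illustration, and the content-addressed snapshot approach is just one way such traceability could work.

```python
import hashlib
import json


class DatasetRegistry:
    """Toy version-control layer for datasets: each snapshot is
    content-addressed by hash, so a model run can record exactly
    which version of the data it was built from. Hypothetical
    sketch only, not the Expert Models implementation."""

    def __init__(self):
        self._versions = {}  # version hash -> serialised snapshot

    def commit(self, dataset):
        """Store an immutable snapshot and return its content hash."""
        blob = json.dumps(dataset, sort_keys=True).encode()
        version = hashlib.sha256(blob).hexdigest()[:12]
        self._versions[version] = blob
        return version

    def fetch(self, version):
        """Reproduce the exact dataset a past model run used.
        Deserialising returns a fresh copy, so edits by a consumer
        cannot silently mutate the stored snapshot."""
        return json.loads(self._versions[version])


def run_model(registry, version):
    """Rerun a (trivial) model against a pinned dataset version,
    logging the provenance alongside the result."""
    data = registry.fetch(version)
    mean_exposure = sum(row["exposure"] for row in data) / len(data)
    return {"dataset_version": version, "mean_exposure": mean_exposure}


registry = DatasetRegistry()
v1 = registry.commit([{"exposure": 0.2}, {"exposure": 0.4}])
report = run_model(registry, v1)
# The report carries both the answer and the dataset version it
# came from, so the result is traceable and reproducible later.
```

When new data arrives, it would be committed as a new version and the model rerun against it, while earlier reports still point at the snapshots they were actually computed from – the property that guards against the quietly edited spreadsheet described above.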
The biggest benefit, however, is that a fully functioning data science platform eliminates bottlenecks. To get to insights from data, there’s a laborious workflow organisations have to go through after they’ve come up with a model, from the design, build and infrastructure phase to deployment and monitoring. “There is probably 18 months of work,” he said. “These are what we call production bottlenecks.”
Expert Models cuts to the chase, eliminating system design, build and test and most of the infrastructure requirements, enabling organisations to get to the insights they’re looking for more quickly and cost effectively. McNamara summed up the benefits of a platform approach: “If you can go from ad hoc data science reporting to a production environment where everybody can access your models in a user-friendly way… the winner will be your business.”
In summary, here are six reasons why you need a data science platform:
- Productionise your analytics – real-time, not ad hoc reports
- Customer centric – it’s always-on and collaborative
- Low barrier to entry – zero capital costs
- Eliminate operational risk – traceability and reproducibility
- Data pipeline process – version control and documentation
- Make better decisions – organisations act on evidence-based insights