There has been tremendous buzz around “Big Data,” fueled by the exploding volume, variety and velocity of data from mobile devices and social networks on one hand, and the ability to crunch this data using high-density compute on the other. Analyzing the data firehose could be very rewarding in many verticals. McKinsey estimates that the US healthcare system could extract over $300B in value using big data, more than double the total annual healthcare spending of Spain, and that $600B in consumer surplus could be generated by using unstructured location data wisely.
But has big data delivered on that promise yet? Some early examples have emerged, but they are few and far between. It has now been half a decade since talk of big data’s promise first emerged. The technology building blocks needed seem to be finally maturing, but we still need to address the talent gap via self-service analytics, and to converge the analytics and transaction tiers, to fully realize the data-driven business value.
Technology Building Blocks Are Maturing
Oracle, Teradata and others built industry-leading franchises over the last 20 years, generating over $50B in annual revenues managing “structured,” relational data such as customer purchases, inventory and user profiles. However, the vast troves of unstructured data generated from location data, log data from mobile app usage, and conversations in social networks are often 10-50X the structured data volumes, and need an architecturally different layer built from the ground up. Web-scale companies laid much of the groundwork: Hadoop grew up at Yahoo on ideas Google published, and Cassandra originated at Facebook. Together with other open source projects such as MongoDB and Spark, these have established the building blocks for data processing. Intel’s $740M investment in Cloudera last year was a major milestone in endorsing Hadoop as enterprise ready.
Self-Service Analytics Key to Addressing the Massive Talent Gap
There is still a major talent gap in the big data industry: demand for data scientists and data-savvy business analysts who can uncover the golden nuggets in these vast troves of data outweighs the supply of professionals with these skills. LinkedIn, for instance, has over 300M professionals connected in an intricate graph, yet we are only scratching the surface with recruiting and ads-based solutions. The ability to find indicators of US economic growth, or to optimize career paths for aspiring graduates, exists within the LinkedIn data goldmine, but the data science talent gap could be an inhibitor. Data scientists also seem to be spending up to two-thirds of their time cleansing and preparing data for analysis, which is a major drag on their time. Solutions that automate data collection, cleansing and the handling of schema drift could save significant resources and free data scientists to focus on finding those golden nuggets. Also, as we move from data at rest in Hadoop to streaming data, a new class of tools to cleanse and prepare the data streams could empower analysts to detect new insights in near real time.
Convergence of Transactional and Analytical Stack Unlocks Significant Value
A major shift in attitudes, and perhaps in organizational structures, may be necessary for data practitioners to collaborate with business decision makers and exploit the full value of big data analytics. Organizations have been trained to run business workflows and transactions separately and to periodically collect the data in a separate analytical tier for tasks such as customer segmentation or discovering fraudulent claims. However, the promise of big data lies in the convergence of the transactional and analytical stack. For instance, I sit on the board of Reflektion, a company founded by the ex-Google AdSense team. Reflektion analyzes 125M users in real time, while processing their purchase transactions, to personalize their entire journey through the site in the same way Google personalizes ads based on real-time click patterns. But that requires collaboration between the marketing and e-commerce teams, business analysts, and data scientists, so that personalization can be applied while a shopper is on the company’s website, not offline a week later. Similarly, data science is the new frontier in cyber-security, where user activity streams are analyzed in real time to detect anomalous behavior, rather than matched against known “signatures” after the fact.
Digitization in the Indian Market Represents a Massive Untapped Market for Big Data
In data science lingo, the higher the volume and frequency of user data, the deeper the machine learning capabilities (training models that work without prior knowledge) and the richer the analytics. By that measure, the 650M mobile phone users expected to come online by 2020, and to drive online commerce penetration from 4% to 25% and online spend to $220B by 2030 (Goldman Sachs 2015 report), represent a goldmine of fresh data. Analyzing this data using a maturing tech stack and emerging self-service tools will unleash a major opportunity for data scientists.
Ultimately, as the next billion online users in emerging economies like India engage in e-commerce, travel, payments, eHealth and finance, big data analytics represent potentially groundbreaking services in optimizing and personalizing the user journey throughout their online experience. At Battery Ventures, we strive to find and support these emerging vertical data-driven apps as well as the horizontal self-service building blocks that power the apps.
*This post originally appeared in the July 2015 issue of Silicon India.