Our Take on the Data Deluge, and What’s Next

Infrastructure Software

Dharmesh Thakker, Jason Mendel | November 9, 2021

There’s been a stunning rise lately in the number—and type—of private companies worldwide offering technology in various corners of the data sector. As Matthew Scullion, the CEO of our portfolio company Matillion*, put it in TechCrunch recently as his firm raised another $150 million in financing, data is “the new currency” for business.

Matillion, founded in the U.K., focuses on helping organizations extract, transform and migrate critical data from the disparate corporate silos in which it often resides. But that’s just one of roughly six data sub-sectors we’ve identified in this market, all of which hold the potential for rapid growth. Others include storing and managing data once it’s extracted; analyzing and modeling data to glean useful insights that could help a business make better decisions; and taking that process even further by visualizing and dashboarding that data in an easily understandable way.

Today, our company Collibra*, which focuses on data intelligence—particularly around areas like compliance—also hit a corporate milestone when it announced its latest $250 million financing. It all underscores just how detailed and granular the data market has become, and how much market value is up for grabs as companies both 1) increasingly seek out better data to make more-informed decisions, and 2) use data to improve customers’ experiences.

So what’s driving this data deluge? And how long can it continue? Our research and discussions with hundreds of companies over the last five or more years have highlighted six key factors driving the creation and growth of data and business-intelligence (BI) companies. They’ve also given us insights around how the market may shift in the coming years, so we’re sharing some predictions here too.

Literally, zettabytes of data

The first factor driving the growth of new, data-focused technology is simply the unbelievable volume of data being produced today, data that needs to go somewhere to be useful. Data is produced all around us whenever we interact with mobile applications, shop online or even contact customer support. If technology is being used, data is being created. Research firm IDC predicts that the global datasphere will grow to 143 zettabytes (for context, each zettabyte is 1 trillion gigabytes) by 2024, a compound annual growth rate of roughly 26% from the 45 zettabytes of data that were around in 2019.
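As a quick check on these figures: growing from 45 to 143 zettabytes over five years works out to roughly 26% per year when compounded, which is a bit more than a tripling overall. Below is a minimal sketch of that arithmetic; the function and variable names are chosen here for illustration and do not come from IDC.

```python
def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the text (zettabytes).
start_zb = 45.0    # 2019
end_zb = 143.0     # 2024 forecast
years = 2024 - 2019

annual_growth = compound_annual_growth_rate(start_zb, end_zb, years)
total_increase = end_zb / start_zb - 1

# One zettabyte is 10**21 bytes and one gigabyte is 10**9 bytes.
gigabytes_per_zettabyte = 10**21 // 10**9

print(f"Annual growth rate: {annual_growth:.1%}")
print(f"Total increase over {years} years: {total_increase:.0%}")
print(f"Gigabytes per zettabyte: {gigabytes_per_zettabyte:,}")
```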

It’s obvious, but important we say it anyway. The shift to the cloud is real!

We are still very early in the public-cloud adoption journey, as the majority of data still resides in legacy, on-premise data centers. By 2025, IDC estimates that approximately 46% of the world’s stored data will reside in public-cloud environments. This is a direct driver of the massive increase in data, and new data technologies, as the cost of compute and storage in the public cloud is much lower: there are no upfront capital-expenditure requirements, and access to data is often governed by reasonable, pay-as-you-go or consumption-based pricing. In addition, the automation that comes with the cloud allows companies to free up system engineers from worrying about customizing on-premise systems, and instead focus on other data-management priorities. The migration to cloud promotes flexibility, scalability, and cost efficiency in a way not previously possible with on-premise deployments.

Consumers need information, and they need it now.

Old-style, batch data sets historically have been used for many analytics needs; in this method, data is gathered over time prior to being analyzed. There are and will continue to be great use-cases for batch analytics, including managing payroll or customer billing. But with the advent of mobile computing and the Internet of Things, among other trends, there has been a pressing, new need for analyzing data in real time. Use cases here include fraud detection, tracking real-time ETAs on ridesharing applications, managing the temperature of your home as the day progresses, and many more. Per IDC, the market for real-time or continuous analytics is expected to grow to $4.4 billion by 2024. Aside from enabling a different set of applications, real-time analytics contributes heavily to the growth of data given the constant need for up-to-date data.
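To make the distinction concrete, here is a minimal sketch contrasting the two styles. It is an illustration under stated assumptions rather than how any particular product works: the transaction amounts, the 1,000 threshold and the function names are all invented for this example.

```python
from typing import Iterable, Iterator

# Hypothetical transaction amounts, invented for illustration.
transactions = [20.0, 35.5, 1200.0, 18.25]


def batch_total(events: Iterable[float]) -> float:
    """Batch style: gather all events first, then analyze them in one pass."""
    return sum(events)


def flag_in_real_time(events: Iterable[float], threshold: float) -> Iterator[float]:
    """Streaming style: inspect each event as it arrives and emit an alert
    immediately, without waiting for the rest of the data."""
    for amount in events:
        if amount > threshold:
            yield amount


# Batch use case (e.g., billing): one answer after the data is collected.
billing_total = batch_total(transactions)

# Real-time use case (e.g., fraud detection): an alert per suspicious event.
alerts = list(flag_in_real_time(iter(transactions), threshold=1000.0))

print(billing_total)
print(alerts)
```

The batch function needs the whole data set before it can answer, while the generator can raise an alert the moment a single suspicious event shows up, which is why the two approaches suit different use cases.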

Data is messy, and it’s everywhere.

It’s clear that data is exploding in many different forms. The second-order problem is that the data lives everywhere. For example, an enterprise’s customer data may live in Salesforce, its employee data in Workday, log data in Sumo Logic*, event data in Segment; the list of potential repositories goes on and on. As a result of this massive data sprawl, a modern data-technology stack has emerged to effectively integrate and join these disparate data silos, unify them in a central system of record and prepare the data living in them for batch or real-time analytics. These broad step-functions have created massive market opportunities, as shown below, for data companies as customers seek measures of unification.

The rise of the chief data officer (CDO)

The CDO was a position that barely existed a decade ago. Today, many organizations have one, wagering that data is a key asset that must be protected and mined correctly. The rise of the CDO has shaped an attractive go-to-market opportunity for data-infrastructure companies, as the CDO normally has a well-defined budget to buy new technologies. People in this position usually have teams staffed with data analysts, data scientists, data engineers, data architects and business analysts, all of whom also can advocate for new data technology. The rise of the CDO is a key reason many new data companies have flourished.

Removing sales friction with open-source software

Open-source software has provided an attractive way for data-infrastructure companies to get their products into the hands of engineers faster, driving the bottom-up adoption of many data technologies. As we’ve written previously, including when we introduced our Battery Open-Source Software (BOSS) Index way back in 2017, open-source software helps many organizations struggling to manage huge volumes of structured and unstructured data, as they can download and modify source code from relevant open-source projects and tailor it specifically to their own needs. This has resonated across the modern data stack through open-source platforms including Databricks*, Confluent*, dbt, Prefect and others, and it continues to be a preferred mode of consumption.

We don’t expect the data deluge to slow down anytime soon. In our recent, annual State of the OpenCloud presentation, we noted that there remains a 75% disruption potential for cloud, which serves as a leading indicator of the market opportunity for data management. The implication of the explosion of data is the explosion in the market opportunity for platforms sitting across the data toolkit, and we’re excited to continue investing heavily across this thesis.

This material is provided for informational purposes, and it is not, and may not be relied on in any manner as, legal, tax or investment advice or as an offer to sell or a solicitation of an offer to buy an interest in any fund or investment vehicle managed by Battery Ventures or any other Battery entity.

The information and data are as of the publication date unless otherwise noted.

Content obtained from third-party sources, although believed to be reliable, has not been independently verified as to its accuracy or completeness and cannot be guaranteed. Battery Ventures has no obligation to update, modify or amend the content of this post nor notify its readers in the event that any information, opinion, projection, forecast or estimate included, changes or subsequently becomes inaccurate.

The information above may contain projections or other forward-looking statements regarding future events or expectations. Predictions, opinions and other information discussed in this video are subject to change continually and without notice of any kind and may no longer be true after the date indicated. Battery Ventures assumes no duty to and does not undertake to update forward-looking statements.

*Denotes a Battery portfolio company. For a full list of all Battery investments, please click here.

ARTICLE BY

Dharmesh Thakker

Dharmesh Thakker is a general partner at Battery Ventures, where he invests in early- and growth-stage companies in the cloud infrastructure, big data, security and next-generation enterprise applications markets.

Jason Mendel

Jason is a vice president who currently focuses on early-stage and growth-equity investments in areas including cloud infrastructure, big data, security and next-generation enterprise applications.
