Today Databricks*, a high-profile provider of technology fueling artificial-intelligence and data-analysis breakthroughs at big companies, announced it has raised $140 million from a group of investors led by Andreessen Horowitz, and including Battery Ventures. Powered by Battery sat down with Databricks Co-Founder and CEO Ali Ghodsi to talk about how the company got started, how it actually helps customers and what its plans are for the future.
Powered by Battery: So why don’t we start by talking about how this company got started. The core technology grew out of a research project at Berkeley, correct? How did it all come about?
Ali Ghodsi: Yes. Around 2009, all the cofounders of Databricks were at U.C. Berkeley, and we were academic researchers. We had this big insight: We realized computers are not going to get any faster. We’ve hit something called Moore’s Wall: Moore’s Law no longer applies. Basically this means computers are not going to get faster every 18 months anymore.
So that meant you can’t buy supercomputers anymore to keep up with your data-analysis demands. There’s a new computer—it’s the datacenter in the cloud. We thought this was a green-field opportunity and we were very excited to pursue it. We needed to figure out, how can we now use these hundreds of thousands of machines in the cloud to process all this data and get more insights out of it, and to do predictions on it using techniques like machine learning and other artificial intelligence approaches.
Four or five years in, around 2013, we created an Apache project called “Spark” to solve this. Spark had had some traction. I mean, for an academic project it had great success, I would say. But we decided if we wanted this technology to really take off, and if enterprises and the rest of the world were really going to adopt it, there needed to be a company behind it. And that’s when we decided to start Databricks. There were six of us and we founded the company in the summer of 2013.
PBB: And what were you doing before this work at Berkeley?
AG: I grew up in Sweden. I was one of these geeks who started programming as a kid. I think I started at the age of seven. I went and did a computer science degree and, after that, a Ph.D. I got an assistant professorship in Sweden, at the university. It was around that time that I got the opportunity to work with the Berkeley team over here.
I said at first, I’ll come and visit U.C. Berkeley for one year and then go back to my professorship in Sweden. I was here for one year, and said, hey, this is really, really interesting. We’ve hit Moore’s Wall, there’s this great opportunity, and I’m not going to get another opportunity like this in my life again. So let me give it 12 more months. So I stayed one more year.
Then two years had passed, and I said, this opportunity is so great I should give it one more year. I stayed three years. So you see where this story is going.
PBB: So there were six co-founders, including you—and you never went back to Sweden.
PBB: How fast did the company grow?
AG: We’re about 220 employees now so it’s been fast growth over four years. I think we’ve grown fast partly because of our mission: At Databricks, we really want to simplify this big data problem and bring artificial intelligence to the rest of the Fortune 2000.
After four years, the Spark project now has over 300,000 meetup members worldwide. And in terms of number of people who have contributed to it, it’s the biggest (open-source) project in the big-data space.
Databricks itself has over 500 customers. Really we GA’ed our product, which is a SaaS offering in the cloud, about two-and-a-half years ago. So it’s quite a few customers in a short timeframe.
PBB: So tell me about how Databricks actually helps customers. I’ve heard people say that your technology finally makes data science real, or at least more accessible. Talk to us about that.
AG: So, it’s pretty simple. You can open any newspaper and everybody is talking about artificial intelligence breakthroughs. Everyone’s talking about these success stories. And it’s true that (AI) has great, fantastic potential. But what they don’t tell you is that there are about five to 10 companies that are really reaping those benefits. All those success stories are essentially with programs at those five to 10 companies. The rest of the world, the rest of the Fortune 2000 is essentially struggling, and they’re not seeing the same successes. So our mission is to bring that kind of technology, that kind of artificial intelligence, to the rest of the Fortune 2000. That’s really our mission.
PBB: Why are these companies struggling with that?
AG: They’re hitting basically three problems. One is it’s hard to get the different people that are involved in these projects actually working together and collaborating. So many AI problems you want to solve today require the involvement of different, distinct personas. I’ll give you an example. If you want to determine from an X-ray whether someone has a tumor, and you want to do that automatically with artificial intelligence, you probably need doctors involved to help you build that application. It can’t just be computer scientists with Ph.D.s. So it involves medical doctors, but also the computer scientists that are building the software, and the data scientists who can do machine learning and AI. But you also need data engineers who can get the data into the systems. And what we’re seeing in many of these big companies is that it’s just hard to get these different teams to collaborate, work together, and share results. Politics often gets in the way. I call this the people problem.
The second problem we’re seeing is the process problem. This essentially means that there are a lot of things you have to get going to get machine learning or AI to work end-to-end. You have to get the data into the system, you have to clean the data, then you have to build predictive models–all the way to doing the predictions. And today, you have to stitch together lots of different software to make this work. There is no one, single piece of software you can use.
The third problem that you have to solve is the infrastructure problem. So how do you get the software loaded on those thousands of machines you’re using, and manage them, and make sure they’re secure? Companies have to hire lots of DevOps people to do that.
These are three distinct challenges that are creating hurdles. Because of this, there’s essentially a 1% problem. Meaning, only about 1% of companies are succeeding with AI. There’s this wide gap where the rest, the 99%, are struggling with these three problems.
PBB: So just so I understand . . . the five to 10 companies you talked about that are doing well with AI, these are mainly the big tech companies, correct?
AG: Yes, the big tech companies with armies of data scientists and vast amounts of data from the Internet are the only ones doing all these great things.
PBB: Can you talk a little more about the specific types of AI projects people are using Databricks’ software for? Or how specific industry sectors are using it?
AG: So, for instance, medical is one. The medical space is rich and I could talk endlessly about it. But that’s just one sector. If you go into the financial sector, there are a lot of problems too. Often those challenges tend to be centered around various types of anomalies you want to automatically detect. So for instance, a credit card just got swiped–was that a fraudulent charge or not? An attack just came through the network, or someone tried to enter the building, or the bank system. Is it a hacker? Or, here are billions and billions of transactions on some stock exchange. Is there any insider trading or collusion going on?
Then there’s of course industrial-IT. These are companies that have a lot of industrial equipment. And it turns out that in the last decade or so, they’ve been putting a lot of sensors out there on their equipment, and they’re collecting massive amounts of data. The equipment could be anything from jet turbines to drilling equipment, you name it. Now these sensors are reading all this data, and companies want to be able to make predictions based on that data. Like, is this wind turbine going to fail? If it is, I’d like to know, both for safety reasons and because I could replace some parts in advance, and we could avoid a failure.
These data sets are massive. They’re always in the petabyte scale.
PBB: Going back to the three main problems you outlined that companies generally face when trying to analyze large data sets, and implement AI–how does Databricks’ solution solve these problems?
AG: To address the first problem, the people problem, we have provided a unified collaborative workspace that’s part of our cloud platform. This enables different personas in the organization to share results with each other. They can collaborate, come and look at the predictions, the data sets, the insights, and, at the same time, do that in a secure way, so the wrong data or the wrong results don’t get out to the wrong people in the organization. This is the first thing we built in the product, and when we talk to our customers, this is one of their favorite things that usually comes up: how much easier and simpler this makes it to share these results and to get the insights from the platform. The key word here is collaboration.
To address the process problem, this is really Spark itself and the platform we built around it. It unifies the different aspects of AI you would like to do. So instead of having lots of different tools, the platform we’ve built, using one single API and one single framework, allows you to do anything from getting access to different data sources, to ETL’ing the data (so extracting, transforming and loading that data), to building models around it, to even doing the predictions in production real-time for you. So this is really the key technical innovation that Databricks started.
This unification just makes it much simpler to do this end-to-end. Rather than having to say, OK, we’re going to stitch together software that comes from this vendor, and we’re going to use this other open-source thing that comes from over there, and try to glue them all together.
The final challenge is the infrastructure problem. To address this, we’ve automated all of this in the cloud. So rather than having people figure out what hardware they need, and get the software running on a particular hardware, and managing that, we say, you don’t need to do that. You don’t need to hire lots and lots of DevOps people to do this for you. We’ve automated this for you in the cloud. And because it’s automated for you in the cloud, you can just use it as software as a service.
Together, our solution is called the Unified Analytics Platform. It unifies end-to-end the analytics you need in your organization. It unifies the different people that need to work together. It unifies the different aspects of the process you need to get AI working. And it unifies the infrastructure with the software and the solutions.
PBB: Who, specifically, within your customers’ organizations typically uses the product? Are these data scientists, or other people in other types of functional roles?
AG: The primary person is the data scientist. But recently we’ve also begun seeing more and more data engineers; I would say there’s maybe a 60/40 split for us, 60 percent data scientists, 40 percent data engineers.
Then a lot of the stuff they create is shared and collaborated on with other people in the organization. Those could be the MDs, or the other people who are looking at the results and insights, and commenting on them and asking questions. It might be an engineer at an IoT company, it might be a doctor at a healthcare organization. These are the domain experts.
PBB: Can you talk about any specific customers who are using Databricks in interesting ways? Names we might know?
AG: One of the big ones is Shell. They have a lot of equipment, and they have a lot of sensor data. Another one I would say is Salesforce. As you know they’re building Salesforce Einstein. Alexis Roos from Salesforce gave a talk about this at the recent Spark Summit, where he showed how Databricks is used to build Salesforce Inbox, which uses state-of-the-art AI techniques to figure out, based on your mailbox, information about meetings, customers, deals, etc. They’re really innovating with all the massive data everyone already has in their inbox.
There’s also another use case that applies to every company that uses Databricks in almost any industry. Everyone has lots and lots of customer data, and they would love to use predictive mechanisms and AI to figure out which of those customers are potentially churning and leaving them. If you could figure this out in advance, this information is extremely valuable for companies because they can reach out to those customers and show them extra love, and possibly keep them. Gaming company Riot Games is an interesting, special case of that. They can actually track your behavior in their games when you’re playing. Based on how you’re playing the game in the first 30 seconds, they can predict, with Databricks, if things are not going well and a user is about to leave the game. Then they can do things to fix that.
PBB: Fascinating. I wanted to take a step back now and ask more about the underlying technology of Databricks. More broadly, in terms of helping people realize the power of data and AI, wasn’t Hadoop, another big-data technology, supposed to do this too?
AG: So Hadoop is first-generation technology. The Databricks co-founders were actually researchers working on the Hadoop project at U.C. Berkeley. So we have fond memories of those years. Databricks is the evolution of that. It’s the next generation technology. It can actually happily co-exist with Hadoop and, in many ways, actually be synergistic. But it also improves on it, in the sense that it can be orders of magnitude faster. It’s better at doing predictions with AI in particular, because Hadoop wasn’t really predictive technology that you could do AI with. Finally it’s much, much easier to use. It’s much more accessible to broader audiences than Hadoop. Hadoop is pretty complex to use.
PBB: How does being an open-source technology make your product better?
AG: That’s a great question. I think open source is key for large enterprises who are tired of being locked into proprietary software. I think open APIs are going to be a necessity in the future if you want to have developer traction. What I mean by that is, you can’t come up with a new API and then hope millions of developers around the world are going to build on your API when that software itself is proprietary. You’re not going to get that type of traction. That’s my conviction. For that reason, it’s crucial that the APIs are all open source.
However, Databricks’ business model is open core in the sense that while our APIs are open, and all our libraries are open, there are a lot of things that are proprietary that have to do with performance, reliability and security. But those don’t lock a customer in. So in the future, if a customer doesn’t like the performance we’re providing, or the security we’re providing, or the reliability we’re providing, they can run on another Spark-based open-source platform because all those APIs are open. That’s great for customers because they know they’re not locking themselves into Databricks.
PBB: Congratulations on your funding round being announced today. $140 million is a lot of money! What will you use it for?
AG: We’re going to use this funding to expand internationally, but also pursue new product innovations and solutions for new industries. There’s so much demand for the product that we’re seeing, we want to accelerate that and put it in the hands of more customers in more markets.
PBB: What’s next for Databricks? Where do you see the company in five years?
AG: I see us moving more and more up the stack, and enabling even more collaboration, and more democratization of AI for the enterprise. So that companies can more easily take this journey of becoming more data-driven and doing more and more predictions around data. There is a lot of product development that is happening at Databricks around that—things that make the product even easier to use and more accessible to more people, and enable the results to be shared even more widely.
We also want to help particular verticals be more successful. Because as you really accelerate these companies’ adoption of AI, you have to in some sense get closer to the domain they’re actually active in. So for us, that means focusing on helping companies be even more successful in healthcare and life sciences, financial services, media and entertainment, government and probably a few others over time.
PBB: Sounds great. Is there anything else we haven’t talked about?
AG: We’re just super excited to bring the product to more markets and focus on these verticals, and put the Databricks Unified Analytics Platform in the hands of as many companies as possible. And of course we’re very happy to be partnering with Battery Ventures on this journey.
*For a full list of all Battery investments and exits, please click here.