Decentralised.co

Understanding Blockchain Data: A Dialogue with Ganesh Swami

October 19, 2023 | Season 1, Episode 1 | Guest: Ganesh Swami

Ever wondered how the data on apps like Etherscan or MetaMask gets there? There is a complex process behind parsing, interpreting and relaying that data in an API for third-party apps to be able to visualise it. Ganesh Swami from Covalent joins us in this episode to explain what that process looks like. Listen closely as Ganesh deciphers the intricate world of indexing and its foundational role in working with public blockchains.

Ganesh has been scaling Covalent, an indexing provider, to over 100 blockchains since 2019. He has seen the industry through multiple boom and bust cycles, and his firm has been a crucial player in enabling some of the most commonly used apps in the industry. He joins us in this episode to explain how on-chain data firms have evolved, indexers' GTM strategies, and what lies ahead for the sector.


Speaker 1:

Hi, this is the DCO podcast and I am your host, Saurabh. I am joined by my co-host, Siddharth Jain, who is the co-founder of DCO and is also an angel investor. Today we are thrilled to host Ganesh Swami, the co-founder of Covalent. Ganesh brings a wealth of experience in crypto, data and infrastructure.

Speaker 1:

We often take things like data and infrastructure for granted, but a lot happens behind the scenes to ensure that Web3 applications work smoothly. To give you an example, we know that composability between different applications is one of the promises of blockchains like Ethereum. It necessarily means that these applications need to constantly read data from each other. It is tempting to think that most blockchains, at least the ones that are important right now, are public, so reading the data should be trivial, right? But the reality is far from it. Extracting data from blockchains, normalizing it and delivering it requires sophisticated machinery. Ganesh will help us understand why specialized data providers are needed, how the data provider landscape has evolved over the years and where we are headed. Before moving on, a caveat for all of you: DCO members or guests appearing on the show may own assets that are discussed during the show. Nothing we say here is financial advice. With that, I'll just try and shut up and let smarter people talk. Let's start with a brief intro of Ganesh and Covalent. Over to you, Ganesh.

Speaker 2:

Thank you, sarab, sarab and Sid. What a great pleasure to be here amongst legends like yourself, very glad to share my perspective on this space. So, by the way of a quick intro, I'm Ganesh Swamy. I'm one of the co-founders and currently the CEO of Covalent. Covalent is a data infrastructure indexer, so what we do is we scrape on chain data and offer that data in a normalized, unified manner to a variety of different use cases. Today we indexed over 100 blockchains by far the biggest in the blockchain space and have over 3,000 applications that rely on the Covalent API for their data needs. So we're mission critical, you could say, for a lot of these applications, because without the on-chain data, the application is not usable. So that's a quick intro into Covalent. The word Covalent comes from Covalent bonds in chemistry, and I spent the first half of my career working on cancer research, and so that's a homage to my prior life.

Speaker 3:

I think both Saurabh and I, being engineers, can totally relate to that. Right, we spent the better half of our lives preparing for engineering entrance examinations, and chemistry was a big part of it, though you took it all the way and leveraged it toward important milestones like cancer research. But yeah, happy to have you for the first edition of the DCO podcast, Ganesh. It's a pleasure to be associated with this, and I can't wait to deep dive into the learnings you've had while running Covalent for the past three to four years. I still remember meeting you in person at ETHDenver. I think back then Matic was just taking off and Covalent was getting started as well. It's been a long time since then. I would love to take a stab at how your journey has evolved since, and, for those of our listeners who are probably not aware of what indexed blockchains are actually useful for, I think we could start from there to shed some light.

Speaker 2:

Absolutely. So why do you need a data middleware, or why do you need an indexed data service provider? Fundamentally, blockchains are public data, which means they're a source of truth, a source of settlement and so on. But these blockchains are what are known as operational machines, which means they're tuned and optimized for execution and settlement of transactions. They're not really meant for pulling the data out and analyzing it. They're not even usable for historic data lookups. For that you need another kind of system before the data is usable. So that's really where Covalent comes in.

Speaker 2:

And in the traditional space, in the Web2 space, what you have is transaction databases that execute transactions, known as OLTP (online transaction processing) databases, and then you have analytical databases, known as OLAP (online analytical processing) databases, which are meant for compliance and reporting and auditing purposes.

Speaker 2:

So they're completely different functions, completely different departments, completely different budgets within the enterprise. It's the same kind of deal that's happening in the Web3 space, where these blockchains are more and more optimized for write throughput, which is how many transactions per second they can settle. That's why all of the layer 2s, the sidechains, the rollups are optimized along that dimension. But one thing that is often forgotten is that the data is only useful if it can be read back out. So Covalent is optimizing for something known as read scalability, which is how effectively and how efficiently you can read the data out of the blockchain with minimal loss of trust and minimal added latency. So that's Covalent in a nutshell, and there's a whole category of middleware applications, centralized, decentralized, every flavor you can imagine in this space, serving a variety of different use cases and customer target personas. I think you guys are muted, hey.

Speaker 1:

Yeah, yeah. I think there's a lot to unpack here. Can we just go back and start with what indexing really is, and why we need an indexer in the first place as far as public blockchains are concerned?

Speaker 2:

Yeah, it's a great place to start. So these blockchains are optimized for writing data, which is the execution, the consensus, things of that nature, and this is why, for every single layer 1, layer 2, sidechain and rollup, the claim to fame is the number of transactions per second they can process. That's really what their benchmark is. What is not often envisioned or brought to the surface is the read scalability of these blockchains, because that's not been the focus of the industry. So what Covalent solves is the read scalability bottleneck of these blockchains, which is how effectively, how fast and how transparently you can extract data from the blockchain and read it for any downstream application. This could be for compliance reasons, for tax reasons, for building another system of record, any downstream application. So the two things go hand in hand. In an enterprise setting, they will never deploy an application unless you can read the data out, and so that's really the target persona or the target use cases that we're trying to address with indexed data.
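To make the write-versus-read split concrete, here is a minimal TypeScript sketch, not from the episode, of what a raw node natively offers: point lookups over current state via JSON-RPC. The RPC URL is a placeholder.

```typescript
// A raw EVM node answers point lookups over current state via JSON-RPC.
// RPC_URL is a placeholder; any endpoint exposing eth_getBalance works.
const RPC_URL = "https://rpc.example.com";

async function latestBalance(address: string): Promise<bigint> {
  const res = await fetch(RPC_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "eth_getBalance",
      params: [address, "latest"], // current state only: no history, no joins
    }),
  });
  const { result } = await res.json();
  return BigInt(result); // balance in wei
}

// Fine for "what is this balance now?", but questions like "what was this
// wallet's balance every day last year?" have no single RPC call: answering
// them means replaying history into a separate, read-optimized (OLAP-style)
// store first, which is the read-scalability gap described above.
```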

Speaker 1:

Hmm, can you give us a short example of a dataset and how it is currently being used?

Speaker 2:

So one advantage of blockchain data is that it's very diverse in its use cases. It could be DeFi data, it could be NFT data, it could be governance data. It's just a generalized data store that you can write transactions to. A good example here would be something like the Gnosis Safe multisig. A multisig is a multi-signature wallet: you need multiple people to sign before a transaction proceeds, commonly two out of three or three out of five. You're not going to have your entire treasury's assets under one wallet, for example, because that wallet could be stolen, that wallet could be hacked, and it's just too much rug-pull risk. So generally you want multiple signatories before a transaction proceeds. Now, if you go and look on a block explorer like Etherscan, it has no context on what such a transaction really means, because it's just a bunch of hex numbers, a bunch of contract addresses. It's very opaque; the blockchains are not really meant for synthesizing or reading this data. So this is an example where indexed blockchain data can provide semantic meaning to that Gnosis Safe transaction. It can tell you that this person initiated a Safe transaction and that for this transaction to be successful it requires two out of three signatures. That's one example.
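As a hedged illustration of what "adding semantic meaning" can look like, here is a sketch that turns an opaque log into a readable Safe event using ethers v6. The event signature matches Gnosis Safe v1.3 contracts; the sample log payload is a made-up placeholder, and this is not Covalent's actual decoder.

```typescript
// Turn a raw log (hex topics + data) into a human-readable Safe event.
import { Interface } from "ethers";

const safeAbi = new Interface([
  "event ExecutionSuccess(bytes32 txHash, uint256 payment)",
]);

// A raw log as an indexer sees it on-chain: just hex.
const rawLog = {
  topics: [safeAbi.getEvent("ExecutionSuccess")!.topicHash],
  data: "0x" + "11".repeat(32) + "00".repeat(32), // placeholder payload
};

const parsed = safeAbi.parseLog(rawLog);
if (parsed) {
  // The indexer now knows this was a successful Safe execution,
  // which Safe transaction hash was executed, and the refund payment.
  console.log(parsed.name);         // "ExecutionSuccess"
  console.log(parsed.args.txHash);  // decoded bytes32
  console.log(parsed.args.payment); // decoded uint256 (bigint)
}
```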

Speaker 2:

Another example could be an NFT marketplace. There are hundreds of these marketplaces across dozens and dozens of blockchains, but the data is very diverse across all of the different marketplaces.

Speaker 2:

There's no unified schema. Typically what you want from an NFT sale transaction is: this collection, this token, was sold for this price, and this asset was used to make the purchase, because in a lot of the marketplaces you can pay with a stablecoin, you can pay with ETH, you can pay with any other kind of native gas token, and then this is the historical fiat value of this transaction at that time. If you had that data in a unified, structured format, then you can do all kinds of analysis: figuring out the total trades in a marketplace, which market has the biggest liquidity, which is actually distorted by things like wash trades, all kinds of use cases. But the first blocker to doing any useful downstream analysis is getting that data. So that's just two examples. Within our customer base we have thousands of different applications with thousands of different use cases, so it's just very diverse when it comes to data on the blockchain.
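A sketch of the kind of unified, structured record being described, with one shape for a sale regardless of venue or payment asset. The field names here are illustrative, not Covalent's actual schema.

```typescript
// One normalized record per NFT sale, whatever the marketplace or asset.
interface NftSale {
  chainName: string;      // e.g. "eth-mainnet"
  marketplace: string;    // venue the sale happened on
  collection: string;     // NFT contract address
  tokenId: string;
  paymentAsset: string;   // ETH, a stablecoin, or another gas/ERC-20 token
  paymentAmount: bigint;  // in the asset's smallest unit
  fiatValueUsd: number;   // historical fiat value at the time of sale
  txHash: string;
  blockTimestamp: string; // ISO 8601
}

// With every venue normalized into this shape, "total trades per
// marketplace" or "which market has wash-trade-inflated volume?" become
// simple aggregations instead of per-marketplace integrations.
```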

Speaker 1:

That's amazing. So for context, I'm just going to summarize what you said here. Blockchains are optimized for writing data, not reading it. So essentially you need infrastructure that optimizes for reading data on blockchains as well: something that matches the rate at which data is getting stored into these blocks, stores that data in tables and puts context around it, so that downstream users, such as traders or investors or even DeFi applications, can make intelligent use of the data. Is that an okay summary?

Speaker 2:

That is correct, and the one thing I would add is that how that data gets into that table is where all the trade-offs come in for the different use cases, whether you're trading off security, latency or speed, and that's really where the different use cases and the different vendors come in.

Speaker 1:

That's super interesting. So can you give me an example of what Covalent is trying to maximize, versus some other data provider focusing on a vertical that you are not?

Speaker 2:

Yeah, so outside of crypto and Covalent, in the broader data ecosystem, which is a multi-hundred-billion-dollar industry in the Web2 world, there are typically three dimensions that these solutions cater to or optimize for, and you always pick two of the three. That's generally how it goes; it's a trilemma. The three dimensions are breadth, depth and latency. Take the case of crypto data and Covalent. Breadth is how many blockchains you can support. In the case of Covalent, we support a hundred-plus blockchains: all the major ones you can think of, all the rollups, all the sidechains, all the old L1s, all of the L2s and L3s. So that's one of the dimensions. The second dimension is depth, which is that not only are you getting surface-level blockchain data, but you're also annotating it with a lot of external semantic information. This could be things like NFT assets. NFT assets are actually stored off-chain; it's too expensive to store them on the blockchain, so the transaction data goes on-chain while the actual NFT asset, like the image or a video, is stored off-chain. So extracting that data out; or getting things like prices, since most price discovery of crypto assets happens on centralized exchanges, getting the price feed from the centralized exchange and matching it with the trades that happen on-chain: that's an example of depth. You want to go deep into decoding these tables, not just show superficial data like transactions and blocks. And the third is latency. Latency is how close to the tip of the blockchain your data feed is. This again segments the market. If you're a trader and you're looking for arb opportunities, you're looking at things like Flashbots, you're looking for access to the mempool, then you need super low latency. You need a complete view of transactions even before they hit the blockchain. So there are vendors that optimize for that use case.

Speaker 2:

Covalent does not optimize for that use case. Covalent optimizes for record keeping and historical data access patterns. It's meant for machine learning, for AI, for taxation, those kinds of use cases. So in the case of Covalent, we're always two blocks behind the tip of the blockchain. That's one example.

Speaker 2:

And then the fourth dimension would be something like security. Covalent today is the only data provider, to my knowledge, that provides a cryptographic proof of all of the transformations the raw blockchain data has undergone. Everything from the extraction to the cleansing to the normalization, every step along the way, has a cryptographic proof. So if there's any kind of corruption, either malicious or accidental, somewhere in your data pipeline, suppose in your Excel sheet or something, then you have complete data provenance and you can trace back to the source of truth. You're not relying on attestations like an ISO attestation or a SOC 2 kind of compliance test; you're actually relying on cryptography to ensure that the data is secure. So that's generally how the data vendors are segmented in the market. Different vendors pick different axes to optimize for, and therefore they have a different set of customers. Covalent has picked this set of optimizations, and we have a different kind of view of the market.
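A minimal sketch of cryptographic data provenance in the spirit of what Ganesh describes. This is a generic hash chain, not Covalent's actual proof system: each pipeline stage commits to its output plus the previous stage's digest, so later corruption is detectable.

```typescript
// Chain each pipeline stage's digest to the previous one.
import { createHash } from "node:crypto";

function stageDigest(prevDigest: string, stageName: string, output: unknown): string {
  return createHash("sha256")
    .update(prevDigest)
    .update(stageName)
    .update(JSON.stringify(output))
    .digest("hex");
}

// Extraction -> cleansing -> normalization, each step chained to the last.
const extracted = stageDigest("genesis", "extract", { rawBlocks: "..." });
const cleansed = stageDigest(extracted, "cleanse", { rows: "..." });
const normalized = stageDigest(cleansed, "normalize", { tables: "..." });

// Re-running the pipeline and comparing digests pinpoints exactly which
// transformation diverged from the recorded source of truth.
console.log({ extracted, cleansed, normalized });
```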

Speaker 3:

Ganesh, the security bit is very interesting. You mentioned you keep cryptographic proofs of all the data transformations, and I expect that to be very intensive in terms of storage. Is that understanding correct? Or, to put it another way, does the cost of this record keeping go up, or does latency increase, because of these design choices?

Speaker 2:

It's a great question, and it's always a trade-off. For things like getting balances within your game, or showing you an NFT trade, for example, those do not require the high, intense cost of these cryptographic proofs, so there you probably want something lightweight. Something like taxes, something like a system of record for an audit, requires, let's say, a high proof count and a high level of assurance, so there it's a lot more expensive to go through that system. So Covalent helps you pick the number of confirmations, essentially, before you pull the data. Not all reads of blockchain data require that, and this mirrors how you write to the blockchain.

Speaker 2:

If you have a game like Candy Crush that is storing state on-chain, you're not going to store that on an L1 like Ethereum; it's just too expensive. I would say even something like Polygon, which is a sidechain, is too expensive. So you need something that's almost not a blockchain; rollups are basically databases at this point, and you just roll up that state and then settle it on an L2 or a sidechain. So you're exactly right: there's a wide spectrum of use cases that require different security parameters.
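A generic sketch of the confirmations trade-off being described here (not Covalent's actual API): the reader of the data chooses how far behind the tip a block must be before its data counts as safe to serve.

```typescript
// Decide whether a block's data is final enough for a given use case.
function isSafeToServe(
  blockNumber: number,
  chainHead: number,
  requiredConfirmations: number, // e.g. 2 for record keeping, 0 for trading
): boolean {
  return chainHead - blockNumber >= requiredConfirmations;
}

// A tax or audit pipeline might demand deep confirmation; a game UI showing
// an NFT trade can accept the tip and tolerate the odd reorg.
console.log(isSafeToServe(19_000_000, 19_000_002, 2)); // true
```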

Speaker 1:

Amazing. I'm just curious: if you look at the normal retail user and you talk of crypto data, a couple of names that pop up in their head are probably Chainlink and The Graph. What would you say is the difference between something like The Graph protocol and Covalent?

Speaker 2:

Yeah, so there are two primary ways of extracting data from the blockchain, and the two philosophical approaches, I would say, are Covalent's and The Graph's. These are competing, directionally opposite approaches to the same problem, and I think, you know, in the solution space there are multiple exploratory paths, and it so happens that one is the opposite of the other. So let me dig into what these two approaches mean. The underlying paradigm here is an acronym called ETL, which stands for extract, transform, load, and how Covalent does it and how The Graph does it are diametrically opposite. The Graph follows the standard approach, ETL: extract, transform, load. Covalent addresses this via ELT: extract, load and transform. So the acronym is jumbled up. What this means in reality is that in the case of The Graph, the transformation is written as a subgraph, so you need to recruit a developer to go out there and write a subgraph, which I believe today is written in AssemblyScript, a bespoke language for writing these subgraphs. Once you have that subgraph, what happens is that when the subgraph is executed, it extracts data from the blockchain, transforms it (the transformation is what the developer is writing) and then loads it into a database that you can then query. So that's The Graph's approach.

Speaker 2:

Covalent's approach is actually the opposite. What we do is extract all data from the blockchain, from the genesis block: every wallet balance, every transaction, every block, every transaction receipt. Everything is scraped out, and this is hundreds of billions of rows, and then loaded directly into a data store. So that's E and then L, and then the transformation, the T in ELT, is done by the developer at query time. Think of this as an in situ, ad hoc subgraph that is written by the developer when they query. If they want token balances, they have to craft the query in a particular way. If they want to figure out the cost basis, the NFT floor price, the historical owners of an NFT, all of that ingenuity is crafted into the query at query time. So there's no need to write a subgraph for any reason. Where this actually manifests itself in practice is that in the case of The Graph, if there's any kind of downstream or upstream change, the developer has to go out there, rewrite the subgraph and re-index the data before it's usable. In the case of Uniswap, for example, it takes about seven days for the subgraph to re-index all the data. So even for small tweaks, you have to wait for it to re-index.
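To illustrate the ELT idea in the spirit of what Ganesh describes, here is a hedged sketch of a query-time "T": deriving token balances purely from raw, already-loaded transfer events. The table and column names are invented, not Covalent's actual schema.

```typescript
// The "T" in ELT is just a query the developer iterates on, since all raw
// logs already live in the store. Hypothetical table/column names.
const erc20BalancesQuery = `
  -- Token balances for one wallet, derived at query time from raw
  -- Transfer events: credits minus debits, grouped by token.
  SELECT token_address,
         SUM(CASE WHEN to_address   = $wallet THEN value ELSE 0 END)
       - SUM(CASE WHEN from_address = $wallet THEN value ELSE 0 END)
         AS balance
  FROM   erc20_transfer_events
  WHERE  to_address = $wallet OR from_address = $wallet
  GROUP  BY token_address;
`;

// Tweaking this (cost basis, floor prices, historical owners) means editing
// and re-running the query, not rewriting and re-indexing a subgraph.
console.log(erc20BalancesQuery);
```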

Speaker 2:

In the case of Covalent, because all the data is already there in your data store, it's just a matter of crafting the query and iterating on it until you get the results that you want. You don't have to change the indexing code. In that aspect, Covalent is a no-code kind of solution, because you don't have to write any indexing code; everything is at query time. So that's the fundamental difference. This has a lot of downstream artifacts. One example is that Covalent goes through the tedious effort of normalizing all of the blockchains, which means all 100-plus blockchains that we've indexed today have the exact same schema. The request and the response are exactly the same. So if a developer integrates with Polygon, for example, and then they want to move to, let's say, Avalanche or Fantom or Optimism or one of the OP Stacks or Arbitrum or even the Ethereum L1, they only have to change one thing, which is the chain name, and everything works as is. That's fundamentally the biggest benefit of Covalent's approach.
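A sketch of the "change the chain name" claim. The endpoint path mirrors Covalent's public REST API as commonly documented, but treat the path, chain names and key format as illustrative and check the current docs before relying on them.

```typescript
// With a unified schema, the same request shape works on every chain.
const API_KEY = "ckey_..."; // placeholder key

async function getBalances(chainName: string, wallet: string) {
  const url =
    `https://api.covalenthq.com/v1/${chainName}/address/${wallet}` +
    `/balances_v2/?key=${API_KEY}`;
  const res = await fetch(url);
  return res.json(); // same response schema on every supported chain
}

// Same code, different chain: only the chain name changes.
await getBalances("matic-mainnet", "0xabc..."); // Polygon (placeholder wallet)
await getBalances("eth-mainnet", "0xabc...");   // Ethereum
```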

Speaker 2:

In the case of the subgraph, what happens is that there's a subgraph for Aave, for example, on Avalanche. There's a subgraph for Aave on Ethereum. There's a subgraph for Aave on, let's say, Fantom, and on Optimism and Polygon. These are all completely different subgraphs. They all have their own schema, their own data standards, because the underlying blockchains are very different. So for every chain that the developer starts to integrate, they have to go back and understand the nuances of that blockchain: how gas fees work, how finality works, all of those nuances, which in the case of Covalent are quite transparent. This is fundamentally the key difference between Covalent and The Graph. In the Covalent approach, you just have this giant database with all the on-chain data, the entire world's blockchain data, indexed and normalized, ready for any downstream application. In fact, there are a lot of use cases we don't even know of today, because we just automatically index all data on the blockchain.

Speaker 1:

And how is the end user querying this data? I mean, do they need to know SQL, or is it available in dashboards, or are these APIs? How does that work?

Speaker 2:

So in the case of both The Graph and Covalent, we started out as an API service. In the case of The Graph, you have a GraphQL API; in the case of Covalent, you have a REST API. But the underlying data model is so useful that we've also added a SQL kind of interface, so you can type SQL and get charts. So we offer that as well. And data is data, whether you want to consume it in spreadsheets, as charts, through a developer dashboard or through REST. It's really often up to the developer.

Speaker 1:

Amazing. And while answering the previous question, you mentioned that different blockchains have their nuances, so I'm quite interested in learning about that. Which blockchains are very difficult to index data from, and what sort of machinery goes into normalizing this data into tables that look similar? Because every blockchain follows a different standard, right? I mean, even aside from EVM-compatible blockchains, if we are talking of Solana versus Ethereum, I guess the way the two blockchains store data is just very different, right? So how are you ensuring that the end format remains the same?

Speaker 2:

So one caveat is that out of the 110 blockchains we've indexed, probably 108 are EVM-compatible blockchains, so the world is overwhelmingly EVM today. Those are the blockchains that have any kind of traction today, and the only exceptions would be Bitcoin and Solana, to my knowledge (I know there's always a long tail); primarily, most of the blockchains we index are EVM-based smart contract platforms. So that's one overarching theme we have at Covalent. When it comes to actual EVM versus non-EVM, the execution models are completely different. They're not even slightly compatible. So it's not really possible to have a normalized schema or a normalized approach across things like Solana transactions versus EVM transactions. Fundamentally, that's not a thing; there's no vendor out there, including Covalent, that can normalize both of these data sources. We made some attempts to do that, and we had some success there, but these abstractions are always leaky, and it leads to disappointment on the end user's side. We always end up missing something, so we don't do that anymore. On the EVM side there are lots of nuances. I'll give you a couple of examples off the top of my head. The first example would be something like gas. Take an optimistic rollup like Boba. Boba is a rollup that is attached to Ethereum and Avalanche and Moonbeam and BNB Chain, because a lot of these chains, like Avalanche, Moonbeam and BNB Chain, do not have a rollup ecosystem, so what Boba does is attach this rollup to them. There you can pay gas in both the native token, which would be something like AVAX on Avalanche or BNB on BNB Chain, as well as BOBA, so you can pay in either. And the tokens paid as gas, the transaction fees, are tax deductible, so you want to sum up both of the values, at both of their historical fiat prices, to find out the actual cost of the transaction. That's one example. Another example is that we're seeing a lot of EVM chains on non-EVM bases. A good example would be the Cosmos ecosystem: you have Evmos, you have Cronos, you have Berachain, which are all EVM chains that settle on Cosmos. There are some nuances with how transactions work and with how the data is propagated across, so we normalize all of those differences away.
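The Boba example above amounts to simple fee arithmetic. Here is a hedged sketch of summing both gas legs at their historical fiat prices; the numbers are made up and the field names are illustrative.

```typescript
// When gas can be paid in the native token and/or BOBA, the true
// transaction cost is the sum of both legs at historical fiat prices.
interface FeeLeg {
  token: string;          // "ETH", "AVAX", "BNB", or "BOBA"
  amount: number;         // fee paid in that token
  fiatPriceAtTx: number;  // historical USD price at transaction time
}

function totalFeeUsd(legs: FeeLeg[]): number {
  return legs.reduce((sum, leg) => sum + leg.amount * leg.fiatPriceAtTx, 0);
}

// e.g. a transaction that paid part of its gas in ETH and part in BOBA:
const cost = totalFeeUsd([
  { token: "ETH", amount: 0.001, fiatPriceAtTx: 1800 },
  { token: "BOBA", amount: 2, fiatPriceAtTx: 0.25 },
]);
console.log(cost.toFixed(2)); // combined, tax-deductible fee in USD
```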

Speaker 2:

We also do things like map stablecoins, the stablecoin deployments. Let's say you have USDC. There are two ways to get a stablecoin onto a blockchain. The most natural way is to do a native deployment, but these companies have a pretty long pipeline, so it's going to take six months to a year before you get a stablecoin onto your blockchain. So, as a short-term fix, what you do is a wrapped stablecoin through some kind of bridge. We go out there and map out all of the wrapped assets, so that you know that certain kinds of contracts are stablecoins or stablecoin-like assets. That's another example.

Speaker 2:

Another example would be masking the differences in the RPC. One of the key differences behind these blockchains is that the nodes are all very different; the node implementations are very diverse. In the case of something like Ronin, which is the Axie Infinity chain, the node is not even open source, so they ship it as a Docker container and you don't have access to the internal guts of the blockchain. We work with that kind of system. In the case of something like Arbitrum, the node is in Node.js. In the case of something like Avalanche, the node is a geth fork with a completely different consensus engine. So fundamentally, all of these blockchains are very, very different, and it takes a lot of work to go out there, understand the nuances and then mask over these differences so that it makes the developer's life that much easier.

Speaker 1:

I'm just wondering: once you decide that there is some demand among users for a chain and that you want to index it, how long does the process typically take? I understand this would be a range and you can't give an absolute number, but I'm still curious as to how much time it takes.

Speaker 2:

Yeah, so there are two kinds of chains to consider. One is these bespoke L1s, L2s and rollups, and the other is app chains. With the bespoke chains, it typically takes us anywhere from a day to maybe a week to index a chain. This is to understand all the differences. We have a whole pre-flight checklist, because we've done this hundreds of times, so we know more or less how things break, how the blockchain or the node behaves, where the data is, whether it's on-chain or off-chain, where the data availability is, and so on. So typically it's less than a week; it's not that heavy of a lift.

Speaker 2:

Now, what is interesting is that you have these app chain ecosystems. For example, Polygon has the supernets, Avalanche has the subnets and Optimism has the OP Stack. What's happening there is that you just take an off-the-shelf component and deploy a brand new chain; think of it like launching a Postgres database instance on Amazon or Google Cloud or Azure. So this is a new architecture that's coming out. If you know how to work with the OP Stack, or with a supernet or a subnet, then any new instance of that application is exactly the same.

Speaker 2:

There's no heavy integration work in that case. The fastest we've done is integrate with a new app chain, essentially, in less than an hour, because you just have to point to a new set of endpoints and then the data starts flowing, and so we've really optimized this to go to market really, really fast. My big hunch is that by the end of next year, the end of 2024, there'll be over 1,000 chains live out there for all kinds of use cases, and we want to scale to that world. That's why we've brought it down to almost a turnkey solution, where we just give it a new set of endpoints and then it's ready to go.
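As a hedged illustration of that turnkey flow: if a chain is a stock OP Stack or subnet deployment, onboarding it can reduce to registering new endpoints. The config shape, URLs and registration hook below are all hypothetical, not Covalent's actual internals.

```typescript
// Onboarding a stock app chain: reuse the stack-aware pipeline, register
// only the new endpoints. All names here are hypothetical.
interface ChainConfig {
  name: string;
  stack: "op-stack" | "subnet" | "supernet" | "bespoke";
  rpcUrl: string;
  startBlock: number; // usually 0: index from genesis
}

// Hypothetical registration hook: in a real pipeline this would enqueue
// the chain for the existing, stack-aware indexing workers.
function registerChain(config: ChainConfig): void {
  console.log(`indexing ${config.name} (${config.stack}) via ${config.rpcUrl}`);
}

registerChain({
  name: "example-appchain-mainnet",
  stack: "op-stack",
  rpcUrl: "https://rpc.example-appchain.xyz", // placeholder endpoint
  startBlock: 0,
});
```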

Speaker 3:

You mentioned the go-to-market bit, right. One thing that I've been constantly impressed with is how the community became an integral part of adopting the Covalent APIs and the whole product suite. One of the things I really wanted to learn more about was the Alchemist program that you guys launched back in the day and how it helped you gain traction, because this is something a lot of early-stage startups struggle with: getting the community excited about using their products, or just showing them the value of the product in the first place. A lot of builders can build great products, but it goes a long way to show value and make sure they actually get used.

Speaker 2:

It's a great question, sid, so let me expand upon this. Fundamentally, what entrepreneurs and builders need to think is a go-to-market that is, alongside product dev, because if you don't take your product to market, you know the start of Graveyard is with hundreds of projects out there that are super exciting, well-designed, well-functioning, but were never able to get to in the hands of customers or users or investors or whatever it is. So GTM is a critical part of building an organization. So, with that context, there's this phrase that first-time entrepreneurs focus on the product and second-time entrepreneurs focus on distribution, and so we've always had distribution in our DNA, from day one while building.

Speaker 2:

So when we started out, we started as a self-serve kind of product. You go in, put in your credit card, sign up for the product for 50 bucks and you're ready to go. And this is way back; this is actually how Joel got in touch with us, a long time ago. Now, the problem with that is the market was so small. I think in our first six months we made $200, like $200 a month, so it wasn't even enough to pay for our infra costs, forget salaries and other kinds of costs, and the biggest thing is the opportunity cost. So what was clear is that bottoms-up adoption didn't work. So what we did is basically 1,000x the price, literally just 1,000x the price, just added three zeros to all the prices, and then went top-down: go out and hunt for prospects. That's really how we made Covalent profitable. Exact same product, just a changed go-to-market approach. Our first customer was actually ConsenSys, which is the world's biggest blockchain company, so that approach really worked for us. Now, what is fundamental about these crypto Web3 products is the community, because of the token angle, because of a variety of aspects, and it's this shared ownership, this shared future that we're all building together, that's super exciting.

Speaker 2:

So the Alchemist program is an investment into the community. The initial version of the Alchemist program, which was run by my colleague Jackie, was a way to basically educate and level up community members and help them become data literate. They're not very familiar with data products, so we teach them not just Covalent versus The Graph versus Chainlink; we would also teach them things like how to do pivot tables, how to create models and so on. So that was the first Data Alchemist program, and we've done a couple of iterations of it.

Speaker 2:

So last year we ran the Data Alchemist program, a purpose-built program where we teach you SQL, and what is also crazy about this community program is that we pay you money to go through it. The Data Alchemist, for example, paid you $2,000 to complete the course, and there were actually people who took a month off from work. We're going to be running the next Data Alchemist in probably early to mid June. The course work is pretty intense: there are usually four or five modules, you have homework assignments, and it requires 20 to 25 hours a week of commitment.

Speaker 2:

I think with the Data Alchemist program, over 1,000 people signed up, 250 actually entered the program, 125 graduated, and about half of them entered the Web3 space.

Speaker 2:

So it's incredible to see how effective these programs are. In a roundabout way, what ends up happening is that because we train all of these folks on the Covalent approach and the Covalent stack, whatever company they go to, they take Covalent with them. So it's almost like a deferred customer acquisition channel, which, if your time span is three to five years, ends up being incredible, because you can run this program every six months. And every six months you have a hundred or so people graduating with your stack who are in love with your brand, because they all want to move to Web3; they just don't know how to move into Web3, and you've given them the crucial skills. And, you know, it's crazy, but a lot of these folks actually went to work for our competitors too. Today they go and find a job with a competitor, and I'm OK with that, because at this point it levels up the entire space; it's an investment into the space. So that's really one aspect of our go-to-market approach with the community.

Speaker 3:

I think I can totally relate to that, because we followed a pretty similar approach at Matic when we were trying to get a lot of early-stage developers to come and start building on top of the early beta testnet. One of the strategies was to go to these hackathons, give away bounties, even help some of these developers come up with ideas, and then get them to build on top of Matic. I think we never taught them coding, though, so in this case you guys have gone the extra mile; you're teaching them SQL. I can attest that that can be really difficult, because in a past life I used to be a data analyst at a company called Mu Sigma, so I know how that goes. But as this community has evolved, what would your key learnings have been? In terms of community, you can't have all of the cards falling in the right places. So if you were to go back and relaunch Covalent, is there anything you would do differently?

Speaker 2:

I think market timing is something that I got wrong. Quite frankly, because we were so early to the market, we were more or less floundering for the first two years, and it required a trip to Mount Everest, where I had to introspect and rethink my life and how I was spending my time, to shake me out of that funk. So I would say market timing is extremely hard; if I had started Covalent two years later... I think it's difficult. Hindsight is 20/20, but I would say market timing. Now, if I were to start any kind of project or product, the first thing I would do is market research: understand what the TAM is, go talk to prospective customers, talk to competitors. So now we have a very open kind of view; we talk to all of our competitors on a quarterly basis, a very tight relationship with all of them. So I would say I would do more research.

Speaker 2:

The way Covalent started is that I built it at a hackathon to scratch my own itch, ended up winning that hackathon, and just started a company right out of that. The company right now is in a very good place.

Speaker 2:

But if I were to go back and change one thing, it would be the timing of entering the market, and I would not enter the market unless I had a clear path to either a million dollars in revenue, some kind of traction, or some other metric for product-market fit, because at the end of the day, the entrepreneur's time and opportunity cost is what is limited. So I think that's the only thing that comes to mind. You know, everyone makes mistakes. We made lots of mistakes, and you just learn; if you don't make any mistakes at all, you're not taking enough risks. So I'm comfortable with that aspect. 50% of the programs we launched at Covalent failed, and I think that's a very positive thing. There's a big churn element in the approach, but that's how you keep the ideas fresh. If there's one thing I would change, it would be market timing.

Speaker 1:

That's a good segue. You mentioned competition, and I wanted to ask you about that. There are two worlds of data for crypto: one is on-chain and one is off-chain. An example of off-chain would be what happened on Binance, what the liquidations were during the day and so on: something an investor or a trader would be keen on getting their hands on. Providers like Kaiko and Amberdata started off with this off-chain data, and now they are gradually getting into on-chain data as well. I just wanted to understand: will there be providers who stick to only off-chain or only on-chain data, or do you think the boundaries will blur and everyone will try to find a combination that works for them?

Speaker 2:

So there's a short-term timeframe and a long-term timeframe, and I'm an on-chain maximalist; I think everything will move on-chain in the next 10 years. In that regard, I don't think Covalent will ever add an off-chain component like market prices and things like that. That means you're leaving some money on the table, and I think that's fine. We partner with Kaiko and Amberdata, and there are other vendors as well. But fundamentally, my issue with off-chain data is that you can never trust that centralized API. You see, a lot of the market volumes are fake today; on a lot of these tier 2, tier 3 exchanges there's no validity to the market volume. So there are just a lot of data provenance issues with these off-chain data sources. So we would never add off-chain market data. Of course we add NFT assets and NFT metadata, which are off-chain, and we add other kinds of prices, but Covalent will never enter the off-chain market data business. I think in the next 10 years everything will move on-chain, and we'll be perfectly positioned for that. With the off-chain data vendors, this is the challenge.

Speaker 2:

So most use cases of off-chain data are actually hedge fund and trading use cases, and what typically happens is that every time there's a market downturn, these customers go out of business. You see big churn rates every time there's a market downturn. The market was pretty bad over the latter half of last year; now it seems to have bounced back. But I think your customer base is a little volatile when you're focused on the short-term trader approach.

Speaker 2:

With the Covalent use cases, we actually don't even focus on the trading use case. We're actually two blocks behind the tip of the blockchain; we don't even have fresh data. The use cases in our case are more stable. We have some massive enterprises, like one of the Big Four in the US, whose US crypto tax data comes from Covalent. That kind of use case will not really disappear, because taxes will never disappear, and as long as the data on-chain is increasing, which it is (it's growing by leaps and bounds), that use case becomes stronger and stronger. So fundamentally, that's how I see the divide. When it comes to market data, I have to admit that off-chain or centralized market data is actually a bigger market than on-chain data, because most price discovery still happens on centralized exchanges. But my hunch and my bet is that a lot of the price discovery and a lot of this activity will move on-chain as these blockchains become cheaper and cheaper over the coming years.

Speaker 1:

But isn't that some time out in the future? My next question is: you see the likes of Amberdata or Kaiko. I mean, Amberdata, I think, acquired Gvol, which is an on-chain options data marketplace, and so they seem to be moving towards this on-chain future as well, while preserving their off-chain market data capabilities. So what are Covalent's moats given this scenario?

Speaker 2:

So fundamentally, I think, with any of the centralized providers, they don't really have a flywheel effect going on, which means they can never compete on the security aspect, because with Covalent, on the backend, the data extraction is all decentralized. They don't really have that scale. They cannot really scale to 100-plus blockchains because they're what I would call linear businesses, which means you have costs on one side and profits on the other side. Covalent doesn't really work as a linear business, not like a SaaS business model, and so when we go and index a chain, we're not really looking at profits per se; we're looking at it as a feeder into the flywheel. That's really how we think about it. As we onboard more and more developers, the developers come in and unlock different use cases. The different use cases come and pull in more blockchains. Basically, the last, I would say, maybe 30 or 40 blockchains that we've indexed have all been inbound. We never go and do prospecting, for example, because all these blockchains want to bootstrap their developer ecosystem, and then the flywheel spins faster and faster.

Speaker 2:

So I would say that is the moat that we have. In Q1, we indexed 30 new blockchains. We were the launch partner for Coinbase's chain, for Polygon zkEVM, for Taiko, for Scroll, for the ConsenSys zkEVM. So I think it's a different kind of game that we're playing, and also, because we have certain other aspects that generate the profits, we can actually subsidize this side of the business, because it doesn't work as a linear business. Fundamentally, that's how we're different from any of the centralized providers. So kudos to Amberdata and Kaiko; I think they've done great. My bet is that in the next couple of years, all of the on-chain stuff will dominate, and at that point it just comes down to who can be better, faster, cheaper on the on-chain stuff, and today Covalent is a leader by a long shot.

Speaker 1:

I think that was an excellent point, Ganesh. I mean, we are all dreaming of that time when things move on-chain, right? With FTX and whatnot, we especially don't want the off-chain world to dominate for long. This was super interesting, and I think I learned a lot; I hope the listeners will too. What are a couple of exciting things coming up for Covalent, and where can people read about them?

Speaker 2:

Yeah, so we have a stacked roadmap. You know, we're a product of the previous bear market, and there are some people in this industry... I actually met Sandeep, who's one of the founders of Polygon, and since the previous bust he's never scaled back his ambition. With more success, his ambition just scales bigger and bigger, and I think Covalent is in that vein. So we'll never stop building; in fact, we picked up our pace this year. With all the market downturn, we see a huge opportunity. There are primarily three aspects that you should keep an eye on. First is the decentralized network. Phase one of the decentralized network launched last year with the cryptographic data proofs. Phase two of the network is setting up the system where you can extract data without access to RPC blockchain nodes; that phase is launching in the next, I would say, 90 days. That's one aspect. The second aspect is the blockchains themselves. We're getting deep into the supernets, the subnets, the OP Stack. We have crazy amounts of inbound on the OP Stack; everyone is launching on an OP Stack. My team just came back from the Avalanche Summit last week, so there's tons of stuff going on there, and we're very tight with the Polygon team on the supernets.

Speaker 2:

So the whole app chain thesis is getting scaled out, and I think we'll reach 1,000 chains by the end of next year, so keep an eye out for that. It's the same Covalent treatment for all of these chains. And then, on the actual demand or usage side, there's the API: better, faster, cheaper endpoints and more products. We have a second product called Increment, which is a Dune-like product, and that's coming to market. We're thinking of building a third product and a fourth product. We'll never stop building that demand side or usage side of things, because the blockchains are the supply side, and blockchain data is only useful if you have actual products that use that data. So those are primarily the three dimensions that we're pushing hard on, and we have different teams that focus on those areas.

Speaker 1:

Your excitement when you speak about building new products is palpable. Ganesh, thanks a lot for spending time with us. Where can people learn more about you and follow what you and Covalent are doing?

Speaker 2:

The best place to go is @Covalent_HQ, which is our Twitter handle. You'll find a lot of information there. The website is covalenthq.com. Go sign up for our newsletter: you'll hear about the Data Alchemist program, you'll hear about our data competitions, everything gets spread out through those channels.

Speaker 3:

Ganesh, it was great having you. Excited about you making a move downstream as well, with respect to launching applications that will help users query and interact with data in a faster manner. We're analysts at heart, so we're always ready to try out some of these new products. And, yeah, rooting for you guys to make sure that these thousand blockchains scale to two thousand when the time comes, and that developers have a real easy time while doing it.

Chapter Markers

Specialized Data Providers in Crypto Space
Comparing Covalent and Graph Data Approaches
Differences Between Covalent and The Graph
Understanding EVM Chains and Go-to-Market
Crypto Market Timing and Data Acquisition
Future of On-Chain Data and the Covalent Approach
Building New Products and Launching Applications