Snowflake IPO – a first look

Reading Time: 7 minutes

Introduction

Snowflake is a data warehouse-as-a-service provider in the United States. Its stated mission is “to enable every organization to be data driven.”

Based in California, it was founded in 2012 by Benoit Dageville and Thierry Cruanes (both former architects at Oracle) and Marcin Zukowski. Today, the company is led by Frank Slootman, who was recruited in 2019 having previously served as CEO and later Chairman of ServiceNow, the enterprise IT company.

Description

True to its name (snowflakes are supposedly born in the cloud), the company provides a ‘pure’ SaaS offering: there is no hardware or software for customers to install or configure. It also, unapologetically, does not provide an on-premise solution (unlike many ‘lift-and-shift’ offerings) and only works in the public cloud. It is largely agnostic in terms of cloud provider – it can run on AWS, Azure and Google Cloud.

In some senses, it’s more of a ‘solution as a service’, since the ongoing maintenance, management and tuning is all handled by Snowflake. When a customer adopts Snowflake, the company effectively becomes the customer’s data warehousing IT team, responsible for solving problems (like system errors in data loading) and even answering general questions on how to formulate queries. This has important consequences for headcount and financials; more on that later.

Its key differentiator is its architecture: a hybrid between a traditional shared-disk architecture and a shared-nothing architecture, which are essentially different approaches to clustering:

Source: http://www.benstopford.com/2009/11/24/understanding-the-shared-nothing-architecture/

In a shared nothing architecture, data is partitioned and spread across nodes. Each node is made up of a processor, main memory and disk. And each node has total autonomy over its subset of data. Nodes don’t interfere with each other because they don’t share memory or storage. The result is an architecture that is highly scalable – since you can keep adding nodes without interfering with existing ones (useful for businesses that need to scale rapidly).

In a shared-disk architecture, all data is available to all cluster nodes (the data is not partitioned). While nodes can have their own private memory, they can access all of the disk devices in the network. The result is an architecture that is highly available. The drawback is that it doesn’t scale well since the nodes are not independent. Also, two or more nodes must be restricted from updating data at the same time, which can limit performance.

Snowflake has taken a hybrid approach. It has achieved this by separating storage and compute in its architecture:

It uses a centralised data storage repository, which is similar to a shared-disk architecture since it is accessible to all nodes in the data warehouse.

However, it also uses MPP (massively parallel processing) compute clusters (which it calls Virtual Warehouses) to process queries. Each node in a cluster stores a portion of the data set locally. Each Virtual Warehouse is an independent compute cluster that has no impact on the performance of other Virtual Warehouses.

The result is the availability of a shared-disk setup, with the performance and scalability of a shared-nothing architecture. This is its most important feature and is at the heart of what it does.
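To make this concrete, below is a minimal sketch of how this separation looks in Snowflake SQL (the warehouse and table names are illustrative, not taken from the S-1). Each warehouse is an independent compute cluster reading from the same central storage, so a heavy load job and an analyst query do not compete for resources, and an idle warehouse can suspend itself so no compute is billed:

    -- Two independent compute clusters over the same shared storage (names are hypothetical)
    CREATE WAREHOUSE IF NOT EXISTS etl_wh
      WITH WAREHOUSE_SIZE = 'LARGE'   -- more horsepower for data loading
           AUTO_SUSPEND   = 60        -- suspend after 60 seconds idle, so no compute is billed
           AUTO_RESUME    = TRUE;

    CREATE WAREHOUSE IF NOT EXISTS bi_wh
      WITH WAREHOUSE_SIZE = 'XSMALL'  -- small cluster for ad-hoc analyst queries
           AUTO_SUSPEND   = 60
           AUTO_RESUME    = TRUE;

    -- Queries on bi_wh are unaffected by loads running on etl_wh,
    -- because each Virtual Warehouse is its own MPP cluster over the central storage.
    USE WAREHOUSE bi_wh;
    SELECT COUNT(*) FROM sales;       -- 'sales' is a hypothetical table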

Other important features include:

    1. Elasticity. Under traditional data warehouse models, companies have to forecast and provision capacity against expected peak demand. Too little capacity, and they have to add new nodes, which may involve shutting down the environment until data is repartitioned across the new nodes. Too much capacity, and there might not be a way to scale back (an unnecessary cost). 

      Since Snowflake has separated storage and compute, the company is able to offer customers elasticity. Through a pay-per-use pricing model, customers only pay for the storage and compute that is actually used. A typical customer is not running queries 24/7, so under a fixed-capacity model they would be paying for compute they are not actually using. Also, since the cost of storage and the cost of CPU are drastically different, this unbundling can provide real value to customers.

    2. Ability to handle semi-structured data. Generally, companies that work with semi-structured data (like JSON and XML) either have to use complex Big Data technologies like Hadoop, MapReduce and Hive, or they have to spend significant development effort implementing transformation rules to get the semi-structured data into a relational database. 

      However, Snowflake’s ‘Variant’ datatype allows users to load semi-structured data directly into a relational database, without any data manipulation. The data can then be queried dynamically using the self-describing internal tags of the semi-structured data set (see the sketch after this list). This is important, since an increasing amount of data is semi-structured, and the ability to query it directly and rapidly for insight can become a source of competitive advantage (e.g. test-learn-improve processes in the gaming industry, where data is often in JSON format). 

    3. Concurrent load and queries. With traditional data warehouses, a big challenge is getting new data in without getting in the way of existing queries and jobs. A typical Extract-Transform-Load (ETL) process can severely impact the performance of a warehouse during its execution, rendering the warehouse, at times, unusable to other users. That’s why these processes are often scheduled overnight to avoid disrupting business users. 

      But this limits data loads to a daily frequency, and with the velocity of data increasing, this is often not enough. Again, since the company has separated storage and compute, each Virtual Warehouse can run its workloads independently (including ETL) with no impact on other users. 

    4. Fast Cloning. This feature allows users to clone data (including entire databases) without creating additional copies or replicating the data itself. This is incredibly powerful since it allows users to clone data with no cost or time penalty. (Normally copying a large dataset can take a long time and storing it isn’t cheap). 

      The major benefit is the speed with which data can be ported between production and test environments (see the sketch after this list), which dramatically shortens the time needed to bring product changes to market. 

    5. Automatic Query Optimization. Snowflake collects metadata on executed queries from all customers and automatically tunes queries for performance. As a result, users do not need to worry about partitioning or indexing, which results in an experience the company calls ‘zero management.’
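As a minimal sketch of point 2 (the table and field names are hypothetical), semi-structured JSON can be loaded into a ‘Variant’ column as-is and then queried with ordinary SQL path notation, with no upfront transformation into a fixed schema:

    -- Hypothetical table holding raw JSON game events in a VARIANT column
    CREATE TABLE game_events (event VARIANT);

    -- Query the JSON directly using its self-describing tags, casting values as needed
    SELECT
        event:player_id::STRING AS player_id,
        event:score::NUMBER     AS score
    FROM game_events
    WHERE event:level::STRING = 'boss_fight';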
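And for point 4, cloning is a single statement; the clone initially shares its underlying storage with the source, which is why there is effectively no copy cost or time penalty (the database names below are illustrative):

    -- Zero-copy clone of an entire production database into a test environment
    CREATE DATABASE analytics_test CLONE analytics_prod;
    -- Changes made in analytics_test are stored separately and never touch analytics_prod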

It’s also worth noting that since Snowflake uses a relational database structure, SQL is used to query the underlying data. This skill set is far more plentiful than, say, MapReduce skills for Big Data technologies like Hadoop. This really helps customer adoption and makes for an easier sell to businesses that don’t have armies of tech-savvy data scientists.

Business Model

As mentioned above, the company operates a pay-per-use policy, where it recognizes revenue based on platform consumption of storage, compute and data transfer resources. This is an important difference compared to other SaaS companies, which recognize revenue ratably over the contract life. In other words, its revenue is more variable and less knowable/predictable than that of other SaaS companies. But only slightly so, because unused capacity can be rolled over into future periods, as the company explains:

“Customers have the flexibility to consume more than their contracted capacity during the contract term and may have the ability to roll over unused capacity to future periods, generally on the purchase of additional capacity at renewal.”

Snowflake’s S-1

So compared to subscription SaaS models, it is more the timing of revenue, rather than the actual amount, that is less knowable/predictable.

For storage, the company calculates consumption based on the average number of terabytes per month of all of a customer’s data stored on the platform. For compute, consumption is based on the type of compute resource used (greater horsepower, greater cost) and, for some features, the volume of data processed. For data transfer, it is based on the number of terabytes of data transferred, the cloud provider and the region.

All three consumption metrics (storage, compute and data transfer) are treated as a single performance obligation since they are consumed as an integrated offering.

Financials

Taking a look at how its product and business model play out in the financials, we have the following:

    • Revenue in Q2 2020 was $133 million, compared to $60 million in the year-ago period. That translates to 121% revenue growth (which is remarkable). Comparing the six months ended in 2020 with the year-ago period leads to an even higher revenue growth rate of ~132%. 
    • Gross margins have also improved, from ~49% last year to ~61.5% in the most recent period. Remember, cost of revenue includes the cost of supporting the ‘solution as a service’, since the company is essentially replacing the data warehousing team for each of its customers. There are scale benefits to this approach through automation, but there is also a limit to how many customers a single support engineer can realistically service. As the company grows, headcount will have to increase, and gross margins will probably reflect some blend of a pure SaaS company’s margins and a managed service company’s margins.  
    • The company also reports a Net Revenue Retention rate of 158%. This is even higher than nCino’s reported rate of 147%. It implies that even if the company were to halt all sales & marketing, it would still grow by ~58% just by selling new services to existing clients – very strong momentum that may be difficult for competitors to catch up with. As a side note, revenue retention was 223% in the first six months of 2019, implying that the average customer was more than doubling their spend. This is a very strong vote of confidence in the company’s product offering. 
    • Its total customer count as of Q2 2020 was 3,117 customers, up from 1,547 in the year-ago period. Additionally, the company currently estimates its total addressable market (TAM) at $81 billion.

Conclusion

This is clearly a very strong offering (perhaps the best I’ve covered), and it will likely command an eye-watering multiple. In a future post, I will take a closer look at the pricing details of the IPO (especially the lockup periods) and at valuation.

Stay tuned. 

Source: https://ipohawk.com/snowflake-ipo-a-first-look/?utm_source=rss&utm_medium=rss&utm_campaign=snowflake-ipo-a-first-look
