Founded by three data warehousing professionals in 2012, Snowflake is a cloud-based data warehouse that offers analytics and cloud-based data storage services. Corporate users rely on Snowflake to store and analyze their data. In 2018, the company raised a $450 million venture capital round at a valuation of $3.5 billion.
This article covers the main Snowflake features and the benefits businesses gain from using this cloud-based data warehouse.
Snowflake is a cloud data warehouse that serves as a platform for many data solutions and is offered to clients as Database-as-a-Service (DaaS). It supports workloads such as data engineering, data lakes, data warehousing, data science, data applications, and data sharing, connecting many data sources to many data consumers. The platform gives you access to a world of data and services, provides modern data governance and security, and helps you build and drive your business with data.
The platform stands out among its competitors for its simplicity: it is designed in a highly abstracted, optimized way, and it delivers an effective data warehouse that requires almost zero management.
In traditional data warehouses, cloning data can be a real pain. To clone an existing database, a whole new, separate environment has to be created and the data loaded into it. This is expensive in both money and additional storage, because teams frequently need cloned data for testing changes, creating test environments, and running ad hoc analysis.
Snowflake's zero-copy cloning feature, by contrast, lets you copy a database without creating a wholly new environment. This is possible because of Snowflake's architecture: data is stored as immutable files in cloud object storage such as S3, and changes are recorded as metadata. The feature also lets you create multiple independent copies of the same data without incurring any extra storage cost.
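A minimal sketch of how this looks in practice, assuming a production database named `prod_db` and a table named `orders` (both names are illustrative):

```sql
-- Clone an entire database without duplicating storage; the clone
-- initially points at the same immutable files, and only subsequent
-- changes consume new storage.
CREATE DATABASE dev_db CLONE prod_db;

-- Cloning also works at the schema and table level:
CREATE TABLE orders_test CLONE orders;
```

Because the clone shares the original's storage until either side changes, creating one is nearly instantaneous even for very large databases.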
Time Travel is one of Snowflake's unique features: it lets you track all the changes a table's data goes through over time and query the table's historical state. The feature is available on all accounts. Standard accounts get a default Time Travel retention period of one day, while Enterprise accounts can extend the retention period for a table up to 90 days.
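A brief sketch of Time Travel in use, with an illustrative `orders` table:

```sql
-- Query the table as it existed one hour ago (offset is in seconds):
SELECT * FROM orders AT (OFFSET => -3600);

-- Or as of a specific point in time:
SELECT * FROM orders AT (TIMESTAMP => '2023-01-15 12:00:00'::TIMESTAMP_LTZ);

-- Extend the retention period (up to 90 days on Enterprise edition):
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 90;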
Snowflake also removes the worry of optimizing a table. It uses two principles to maintain table structures for best performance:
- Micro-partitions: Tables are automatically divided by grouping rows into micro-partitions of roughly 50–500 MB of uncompressed data. This produces more uniformly sized partitions, which limits data skew, and it allows a query to scan only the partitions that are relevant to it.
- Clustered Tables: Queries perform poorly if the data in a table is not well ordered. A subset of columns, known as the clustering key, is used to co-locate related data so that the rows stored in the table stay ordered. Snowflake collects clustering metadata for each micro-partition created during data loading and uses it to avoid any unnecessary scanning of micro-partitions.
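The two mechanisms above can be sketched with an illustrative `events` table; the column names are assumptions:

```sql
-- Define a clustering key so related rows are co-located in micro-partitions:
ALTER TABLE events CLUSTER BY (event_date, customer_id);

-- Inspect how well the table is currently clustered on those columns:
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date, customer_id)');
```

Micro-partitioning itself needs no statement at all: it happens automatically as data is loaded.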
This feature lets you recover even from mistakes like dropping the wrong table. In a traditional warehouse, restoring a mistakenly dropped table takes a lot of time, but with Snowflake you can recover it immediately, as long as you are still within the recovery window.
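A quick sketch of the recovery path, again using an illustrative `orders` table:

```sql
DROP TABLE orders;    -- oops, wrong table

-- Instantly restored, provided the Time Travel
-- retention window has not yet expired:
UNDROP TABLE orders;

-- UNDROP also works for schemas and whole databases:
UNDROP DATABASE prod_db;
```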
Fail-safe is another important Snowflake feature, ensuring that historical data stays protected even in the event of disk or other hardware failures. The platform provides seven days of Fail-safe protection, which begins when the Time Travel retention period ends. This means a table with a 90-day Time Travel period has a total of 97 days of recoverability.
Continuous data pipelines automate many of the manual steps involved in loading data into a table and then transforming it for analysis. Snowflake also provides features for continuous data ingestion, for setting up recurring tasks that build a continuous workflow in the pipeline, and for change tracking.
For continuous ingestion, Snowflake uses Snowpipe, a data ingestion service that loads data into micro-partitions as soon as files arrive in an external stage, making the data queryable within minutes.
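A minimal Snowpipe definition might look like the sketch below; the stage, table, and pipe names are assumptions, and `AUTO_INGEST = TRUE` additionally requires event notifications to be configured on the cloud storage side:

```sql
-- A pipe that continuously loads new files from an external stage
-- into a landing table as they arrive.
CREATE PIPE raw.orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO raw.orders
  FROM @raw.orders_stage
  FILE_FORMAT = (TYPE = 'JSON');
```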
Snowflake can combine structured and semi-structured data without requiring any advanced or complex technologies. Data can come from sources such as mobile devices, sensors, and other machine-generated feeds, and the platform supports ingesting this semi-structured data in formats such as JSON, Avro, and XML.
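Semi-structured data is typically landed in a VARIANT column and queried with path notation, no upfront schema required. A sketch, with illustrative table and field names:

```sql
-- Land raw JSON as-is in a VARIANT column:
CREATE TABLE device_readings (payload VARIANT);

INSERT INTO device_readings
  SELECT PARSE_JSON('{"device": "sensor-7", "temp": 21.4}');

-- Query nested fields directly, casting to structured types:
SELECT payload:device::STRING AS device,
       payload:temp::FLOAT    AS temperature
FROM device_readings;
```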
Secure Data Sharing is an innovative Snowflake feature that lets you share objects such as tables from a database in your account with another Snowflake account without creating copies of the data. Shared data therefore takes up no extra storage space and incurs no storage cost for the data consumer. Because the sharing happens in the metadata store, setup is quick and data consumers can access the data almost instantaneously.
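On the provider side, a share is an object to which you grant access; a sketch, with illustrative database, table, and account names:

```sql
-- Create a share and grant it access to specific objects:
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

-- Make the share visible to a consumer account:
ALTER SHARE sales_share ADD ACCOUNTS = consumer_account;
```

The consumer then creates a read-only database from the share; no data is copied at any point.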
Snowflake's architecture caches at several levels, which speeds up queries and reduces cost. When you execute a query, Snowflake holds the result for 24 hours; if the same query is run again from the same account and the underlying data hasn't changed, the result is served straight from the cache. This is particularly handy during analysis work, where you often need to compare results before and after a change without re-running complex queries.
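One practical consequence: when benchmarking actual query execution time, the result cache can be switched off for the session:

```sql
-- Disable the 24-hour result cache for this session so each run
-- reflects real execution time rather than a cached result:
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
```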
- Availability and Security – The platform is distributed across multiple availability zones and is designed to tolerate network failures without significantly impacting customers. Snowflake is SOC 2 Type II certified, offers additional layers of security, and provides encryption at every level of communication.
- Seamless Sharing of Data – Snowflake's architecture lets anyone share data seamlessly with data consumers through reader accounts, whereby the service provider creates and manages a consumer's Snowflake account.
- Accessibility and Concurrency – A traditional data warehouse with many users and too many queries can suffer concurrency issues such as delays and failures. Snowflake's multi-cluster architecture prevents queries in one virtual warehouse from affecting queries in another.
- Performance and Speed – Snowflake is elastic: if you need to run a large number of queries at once or load a large amount of data, you can scale your virtual warehouse up and then scale it back down as your requirements change.
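Scaling a warehouse is a single statement in each direction; a sketch with an illustrative warehouse name:

```sql
-- Scale up for a heavy load, then back down afterwards:
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';
-- ... run the heavy workload ...
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'SMALL';

-- Auto-suspend keeps an idle warehouse from accruing cost:
ALTER WAREHOUSE analytics_wh SET AUTO_SUSPEND = 60;  -- seconds
```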
As a cloud data warehouse, Snowflake is among the strongest contenders, optimized so that it requires almost zero management. It offers some genuinely unique and innovative features and is simple to set up and operate. In the long run, it can make building and maintaining your analytics use cases considerably easier.