
How to pass Snowflake Snowpro Core exam?

23 February 2023 at 15:17

Introduction

The Snowpro Core Certification exam is Snowflake’s entry-level certification exam. I recently sat and passed the exam and wanted to share my notes for those of you looking to do the same. I found the scope of the exam to be very broad, covering a lot of topics, while some questions could also get very specific. As an example, there were questions where you had to pick the right SQL statement to return a specified result. This is all to say it is a tough exam and worth preparing for.

Exam guide

As always, it is good to start with the official documentation, found here. The guide includes a comprehensive set of links to materials covering the different topics in the exam.

The exam guide is broken down into the domains below. The estimated percentages indicate how much of the exam’s questions each domain makes up.

  1. Snowflake Cloud Data Platform Features and Architecture (20–25%)
  2. Account Access and Security (20–25%)
  3. Performance Concepts (10–15%)
  4. Data Loading and Unloading (5–10%)
  5. Data Transformations (20–25%)
  6. Data Protection and Data Sharing (5–10%)

Exam duration

My exam had 100 questions with a passing score of 750 out of 1000, and I had 2 hours to complete it. I found that this was plenty of time as the questions were straightforward: you either knew them or you didn’t.

Fundamentals

To pass the exam, you do need a good understanding of the fundamental architecture underpinning Snowflake. How Snowflake has implemented micro-partitioning, together with its approach to decoupling storage and compute, lays the foundation for most of its other features. Understanding how the 3 primary layers of the architecture support these is key to passing the exam.

Three primary layers

  • Cloud services layer — the brain of your account. If it’s not storage or compute, the cloud services layer is responsible for it. When you log into Snowflake, it is the cloud services layer that authenticates your login. It stores metadata about all the micro-partitions in the account and uses that information to generate query plans to hand off to the query processing layer. It spins up compute as and when you need it, based on your scaling parameters, and takes care of serverless and background services like Snowpipe and automatic clustering.
  • Query processing / Compute layer — The compute engines that provide the RAM and CPU to run queries. Snowflake virtual warehouses comprise the compute layer.
  • Data storage layer — data stored in Snowflake is kept in a proprietary, compressed, columnar format. The data storage layer takes care of organising data into that format so it works efficiently with the other Snowflake layers. The storage layer is only accessible through Snowflake and can’t be accessed directly by any other means.

Micro-partitioning

A good place to start learning about micro-partitioning is the Snowflake documentation here.

Snowflake maintains clustering metadata in the cloud services layer for the micro-partitions in a table, including:

  • The total number of micro-partitions that comprise the table.
  • The number of micro-partitions containing values that overlap with each other (in a specified subset of table columns).
  • The depth of the overlapping micro-partitions.

Snowflake stores metadata about all rows stored in a micro-partition, including:

  • The range of values for each of the columns in the micro-partition.
  • The number of distinct values.
  • Additional properties used for both optimization and efficient query processing.

Other things to note are that:

  • Micro-partitions are immutable.
  • Each micro-partition contains between 50 MB and 500 MB of uncompressed data.
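
Much of this clustering metadata can be inspected directly. As a rough illustration (the table and column names here are hypothetical), the SYSTEM$CLUSTERING_INFORMATION function returns a JSON summary that includes the total micro-partition count, average overlaps and average depth described above:

    -- Clustering metadata for a hypothetical ORDERS table, evaluated on ORDER_DATE
    SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');

    -- Just the average clustering depth for the same column
    SELECT SYSTEM$CLUSTERING_DEPTH('orders', '(order_date)');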

Snowflake Editions

There were a number of questions in the exam about which features are supported in which edition of Snowflake. As a reminder, the four editions are:

Standard — cheapest level, provides an introductory level to most features.

Enterprise — provides all the features of Standard with additional features aimed at enterprises with larger scale and workloads.

Business Critical — provides all the features of Enterprise, with additional features for data protection and compliance standards, as well as for business continuity and disaster recovery.

Virtual Private Snowflake (VPS) — A Snowflake installation where underlying resources are dedicated to a single account. Provides all the features in business critical edition. Still cloud based, not on-premise.

Snowflake Editions - Snowflake Documentation

For the exam, it is good to know that Standard supports most features except those listed below. One point to note is that Standard supports Time Travel but only up to 1 day. All higher editions of Snowflake support it up to 90 days.

In addition to the features supported by Standard edition, Enterprise provides the following extra features:

  • Extended Time Travel up to 90 days.
  • Dynamic Data Masking and External Tokenization for column-level security.
  • Row Access Policies.
  • Object Tagging.
  • Data Classification.
  • Access to the Account Usage ACCESS_HISTORY view.
  • Multi-cluster warehouses.
  • Search Optimization.
  • Materialized Views.
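
To make the column-level security items in this list more concrete, here is a minimal sketch of a Dynamic Data Masking policy; the table, column and role names are hypothetical:

    -- Mask email addresses for everyone except a privileged role
    CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val ELSE '***MASKED***' END;

    -- Attach the policy to a column
    ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;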

Business Critical provides these features above what you get with Enterprise:

  • The ability to use your own encryption keys with Tri-Secret Secure.
  • Support for Private Connectivity to the Snowflake Service using AWS PrivateLink, Azure Private Link, or Google Cloud Private Service Connect.
  • Failover and failback between Snowflake accounts.
  • Redirecting client connections between Snowflake accounts.
  • Support for a number of compliance regulations, such as PHI data (HIPAA), PCI DSS, FedRAMP, and IRAP Protected data.

And finally, Virtual Private Snowflake gives all of these features but in a completely dedicated environment, isolated from all other Snowflake accounts. You don’t lose any functionality by choosing VPS; it even includes an option to enable data sharing.

Individual Features

Clustering

Without a clustering key specified on a table, Snowflake will choose its own natural clustering and organise the data as it sees fit. However, there is also the option to manually specify your clustering key. Only one clustering key can be defined per table (although it can include multiple columns). Once the key is specified, a cloud services feature called Automatic Clustering takes care of keeping the data clustered on the specified columns. This consumes credits, so it will cost you money to run.
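
For example (hypothetical table and columns), a clustering key can be defined at creation time or added later:

    -- Define a clustering key when the table is created
    CREATE TABLE orders (order_id NUMBER, order_date DATE, region STRING)
      CLUSTER BY (order_date, region);

    -- Or add / change the clustering key on an existing table
    ALTER TABLE orders CLUSTER BY (order_date);

Automatic Clustering then re-clusters micro-partitions in the background as new data arrives, which is where the extra credits go.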

Account and billing

Virtual warehouses are charged per second, with a 1 minute minimum each time they start or resume. Data in the ACCOUNT_USAGE and ORGANIZATION_USAGE schemas is not real time; it can lag by up to a few hours depending on the view.
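
For example, credit consumption per warehouse can be pulled from the ACCOUNT_USAGE share; a sketch with a hypothetical time window (remember this data lags behind real time):

    -- Credits consumed per warehouse over the last 7 days
    SELECT warehouse_name, SUM(credits_used) AS credits
    FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
    WHERE start_time > DATEADD(day, -7, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
    ORDER BY credits DESC;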

Resource monitoring

You can only have one resource monitor per virtual warehouse.

You cannot set a resource monitor on cloud services such as Snowpipe, serverless compute, auto-clustering.

You can use a resource monitor to suspend a warehouse at a set time and date, regardless of resource consumption.

The default interval for a resource monitor is monthly but can be set to a custom interval such as daily, weekly or annually.

Resource monitors don’t shut down virtual warehouses immediately and you may incur additional charges after the threshold has been reached and while the warehouse is being suspended.
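
A minimal sketch of a resource monitor, with hypothetical quota, monitor and warehouse names:

    -- Monitor resets monthly; notify at 75% of quota, suspend at 100%
    CREATE RESOURCE MONITOR monthly_monitor
      WITH CREDIT_QUOTA = 100
      FREQUENCY = MONTHLY
      START_TIMESTAMP = IMMEDIATELY
      TRIGGERS ON 75 PERCENT DO NOTIFY
               ON 100 PERCENT DO SUSPEND
               ON 110 PERCENT DO SUSPEND_IMMEDIATE;

    -- A warehouse can only be assigned one monitor
    ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = monthly_monitor;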

Cloning

To clone a table, you need the SELECT privilege on the source table; for all other objects that can be cloned, you need the USAGE privilege. When you clone a database or a schema, the privileges on all child objects in the clone match those of the source child objects. However, the privileges on the cloned database or schema itself are not copied over.
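
Cloning is zero-copy and can be combined with Time Travel; a short sketch with hypothetical object names:

    -- Clone a table (requires SELECT on the source table)
    CREATE TABLE orders_dev CLONE orders;

    -- Clone an entire database as it was one hour ago
    CREATE DATABASE analytics_dev CLONE analytics AT (OFFSET => -3600);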

Caching

I have always found this article to be a great resource to explain caching in Snowflake.

Snowflake Community

Metadata cache — the metadata cache is the fastest way to return information from Snowflake. As it uses only data stored in the Cloud Services layer, there is no need for Snowflake to spin up compute or access data storage. If the user is querying for information like row counts in a table or information about clustering, Snowflake can use the data it has cached in the cloud services layer to fulfil that query.
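
For instance, queries like the ones below (hypothetical table) can typically be answered from cloud services metadata without a running warehouse:

    -- Row count comes straight from micro-partition metadata
    SELECT COUNT(*) FROM orders;

    -- Min/max column values are also held in metadata
    SELECT MIN(order_date), MAX(order_date) FROM orders;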

Results cache — Snowflake uses the query result cache if the following conditions are met.

  • A new query matches an old query, and the underlying data contributing to the query results remains unchanged.
  • The table micro-partitions have not changed as a result of clustering or consolidation.
  • The query does not use user-defined functions, external functions, or functions evaluated at runtime. CURRENT_DATE is an exception: queries that use it are still eligible for the query result cache.

A cached result is reused for up to 24 hours after it was last accessed (each reuse resets the 24-hour clock), but results are purged 31 days after the query was first run.
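
When experimenting with this for the exam, it can be handy to turn the result cache off for your session so you can see the other caches at work:

    -- Disable reuse of persisted query results for the current session
    ALTER SESSION SET USE_CACHED_RESULT = FALSE;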

Virtual warehouse cache — Every time a virtual warehouse accesses data from a table, it caches that data locally. This data can be re-used for subsequent queries without having to make the round trip to the data storage layer.

Query & Results History

I found this video to be extremely helpful in understanding how query and results history works. It walks the viewer through querying Snowflake in the older web application and overlaps a lot with the caching topics above.

Snowflake Community

A few things to note:

  • A user can never view another user’s query results, but they can re-run the same query.
  • The query history page only shows queries from the last 14 days.
  • Query results are held for 24 hours (presumably re-using the query results cache).
  • However, the ACCOUNT_USAGE QUERY_HISTORY view stores data for 365 days.
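
A sketch of querying that 365-day view (the columns are from the ACCOUNT_USAGE schema; the time window is just an example):

    -- Slowest queries over the last 30 days
    SELECT query_id, user_name, query_text, total_elapsed_time
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE start_time > DATEADD(day, -30, CURRENT_TIMESTAMP())
    ORDER BY total_elapsed_time DESC
    LIMIT 10;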

Fail-safe and Time-travel

One point to note is that Standard supports Time Travel, but only up to 1 day. All higher editions of Snowflake support it up to 90 days.

A time-travel period of 1 day can be specified for temporary and transient tables but they are not included in fail-safe.

To support Time Travel queries, Snowflake provides special SQL extensions: the AT and BEFORE clauses, which can be used with SELECT statements or while cloning tables, schemas, and databases. Snowflake also supports the UNDROP statement, which can be used to recover tables, schemas, or even complete databases after they have been dropped.
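
A few sketches of those extensions in use (the table name and query ID are hypothetical):

    -- Query a table as it was one hour ago
    SELECT * FROM orders AT (OFFSET => -3600);

    -- Query it as it was just before a specific statement ran
    SELECT * FROM orders BEFORE (STATEMENT => '01a2b3c4-0000-1234-0000-000000000001');

    -- Extend the Time Travel retention on a permanent table (more than 1 day needs Enterprise or higher)
    ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 90;

    -- Recover a dropped table while it is still within its retention period
    UNDROP TABLE orders;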

Fail-safe is a fixed 7-day period for permanent tables in all Snowflake editions. Only Snowflake support can retrieve data from fail-safe storage.

Metadata retention periods

  • Load metadata for a table (which files have been loaded by COPY INTO) is stored for 64 days.

Working with external data

For loading and unloading data into and out of Snowflake, you can use a feature called stages. A stage specifies where data files are stored (i.e. “staged”) so that the data in the files can be loaded into a table. There are external and internal stages.

An external stage is a pointer to an external location in cloud storage. It is composed of the URL of the location, the credentials or integration used to connect to it, and optional file format parameters. You can use the COPY INTO statement to load data from an external stage into a Snowflake table, and also to unload data in the reverse direction.
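
A sketch of an external stage and COPY INTO in both directions; the bucket, credentials and table names are placeholders:

    -- External stage pointing at an S3 location
    CREATE STAGE my_ext_stage
      URL = 's3://my-bucket/data/'
      CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>')
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

    -- Load staged files into a table
    COPY INTO my_table FROM @my_ext_stage;

    -- Unload query results back to the stage
    COPY INTO @my_ext_stage/unload/ FROM (SELECT * FROM my_table);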

An internal stage lets you stage files inside Snowflake itself before loading them. There are 3 types of internal stage (user, table and named). This documentation does a good job of explaining the difference between the three types and where you should use one over the other.

Choosing an Internal Stage for Local Files | Snowflake Documentation

In general, it all comes down to access. User Stages are only accessible to the individual user. A Table Stage can be accessed by multiple users but can only be loaded into one table. A Named Stage is the most flexible and can be accessed by multiple users and loaded into multiple tables. A Named Stage is a database object and permissions can be granted to it.
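
The three internal stage types each have their own reference syntax, which is worth memorising for the exam:

    LIST @~;                  -- user stage (one per user, only you can access it)
    LIST @%my_table;          -- table stage (tied to a single table)
    CREATE STAGE my_named_stage;
    LIST @my_named_stage;     -- named stage (a database object you can grant privileges on)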

PUT and GET

You can use the PUT and GET statements to work with stages to put and get data into and out of Snowflake.

  • GET = gets data out of Snowflake
  • PUT = puts data into Snowflake

The GET command is used to download files from an internal stage to your local file system. The PUT command is used to upload files from your local file system to an internal stage.

The GET command does not support downloading files from external stages. To download files from external stages, use the utilities provided by the cloud service.
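
A sketch of the round trip with hypothetical file paths; note that PUT and GET are run from a client such as SnowSQL, not from the web interface:

    -- Upload a local file to a named internal stage (compressed to .gz by default)
    PUT file:///tmp/orders.csv @my_named_stage;

    -- Load it into a table
    COPY INTO orders FROM @my_named_stage FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

    -- Download staged files back to a local directory
    GET @my_named_stage file:///tmp/downloads/;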

Directory Tables

A directory table is similar to an external table in that it does not hold the file data itself. Instead, it stores a catalog of metadata about the files on a stage (relative path, size, last modified timestamp and a file URL), not the actual data in the files. Both internal and external stages support directory tables.

Directory Tables | Snowflake Documentation
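
A minimal sketch of enabling and querying a directory table on a named stage (hypothetical stage name):

    -- Create a stage with a directory table enabled
    CREATE STAGE docs_stage DIRECTORY = (ENABLE = TRUE);

    -- Refresh the catalog after files are added, then query it
    ALTER STAGE docs_stage REFRESH;
    SELECT relative_path, size, last_modified, file_url
    FROM DIRECTORY(@docs_stage);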

Unstructured Data Support

For the exam, I would suggest you have a good understanding of this feature and study the documentation linked below very closely.

Introduction to Unstructured Data Support - Snowflake Documentation

The documentation covers how Snowflake works with unstructured data, which Snowflake defines as data that does not fit into any of its existing data types. You can access and share this type of data using stages, either internal or external. External stages allow data in cloud storage like S3 to be exposed in Snowflake. Once the stage has been created, you can access individual files via a URL.

Scoped URL — Encoded URL that permits temporary access to a staged file without granting privileges to the stage. The URL expires when the persisted query result period ends (i.e. the results cache expires), which is currently 24 hours.

File URL — URL that identifies the database, schema, stage, and file path to a set of files. A role that has sufficient privileges on the stage can access the files.

Pre-signed URL — Simple HTTPS URL used to access a file via a web browser. A file is temporarily accessible to users via this URL using a pre-signed access token. The expiration time for the access token is configurable.
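
Each of the three URL types maps to a SQL function, sketched here against a hypothetical stage and file path:

    -- Scoped URL: expires with the persisted query result (24 hours)
    SELECT BUILD_SCOPED_FILE_URL(@docs_stage, 'reports/q1.pdf');

    -- File URL: permanent reference, access controlled by stage privileges
    SELECT BUILD_STAGE_FILE_URL(@docs_stage, 'reports/q1.pdf');

    -- Pre-signed URL: usable directly in a browser, expiry given in seconds
    SELECT GET_PRESIGNED_URL(@docs_stage, 'reports/q1.pdf', 3600);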

Conclusion

I hope you found this useful. If you’re interested in how Tenable develops on Snowflake, check out our previous blog here. While you’re there, check out our careers page.


