Snowflake Horizon Catalog
Snowflake Horizon Catalog lets organizations discover and govern data, apps, and models with a built-in set of compliance, security, privacy, discovery, and collaboration capabilities. It’s a unified solution that solves problems across the enterprise, meeting the unique needs of different users who work with the organization’s content.
Who benefits from Snowflake Horizon Catalog?
Snowflake Horizon Catalog provides a solution for everyone with a stake in governing, discovering, or taking action on an organization’s content. These stakeholders include the following:
- Data stewards:
Data stewards want to provide access to data, apps, and models while ensuring that only the right people can reach that content. They want to identify sensitive data and protect it appropriately. It’s their job to determine who is using which data and to understand the quality of that data.
Horizon Catalog lets data stewards govern the organization’s content with a built-in solution. They can protect content at a granular level so that it can safely be made available to a wider audience; use tools that monitor security, data quality, and the flow of sensitive data; and continually audit who has accessed data and whether that access was secure.
- Data teams:
Data teams of analysts, data scientists, and data engineers often struggle to find the right data, app, or model for their task. After they find an object, it’s hard to tell whether the data is up to date and trustworthy, what the columns mean, and who owns it. Even when they’ve determined it’s the right data, getting access to it can take days or weeks.
Horizon Catalog helps data teams find and collaborate on relevant content faster. These teams extract more value from content because it’s easier to find the right data, understand it well enough to trust that it meets requirements, and take action on it.
Scope of an organization’s content
Horizon Catalog governs and makes discoverable more than just Snowflake tables and views in the internal storage of an account. It covers a range of content, including the following:
Data, apps, and models in accounts across your entire organization.
Data from Apache Iceberg™ tables and external tables.
Data shared through private listings by trusted partners.
Publicly available data and every Snowflake Native App from the Snowflake Marketplace.
Data from third-party applications and data systems brought into Snowflake using connectors.
Governing content
Horizon Catalog provides the tools a data steward needs to govern an organization’s data, apps, and models.
- Compliance:
Horizon Catalog lets you do the following:
Audit the access history and object dependencies of content.
Monitor data quality using built-in and custom data metric functions so that you can troubleshoot and visualize data quality issues. You can configure an alert based on the centralized table that stores the metric results to enable near-real-time data quality notifications.
View data lineage in Snowsight [1] to understand the table and column lineage from a source table to a target table, and set tags on columns that appear in either a downstream or upstream table.
View object insights [1] using a user interface that lets you learn information about tables and views without writing SQL. You can determine who is accessing the data, the queries that access the data most frequently, whether someone has been modifying the governance posture of the data, whether there are downstream or upstream dependencies on the data, and whether the data has been classified as sensitive.
Track data by monitoring tags, which can be user-defined tags implemented with object tagging or classification tags (system-defined or custom) that have been automatically assigned to columns based on the content of the column.
[1] Currently in private preview.
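As a rough illustration of the metric-plus-alert pattern described under Compliance, the following Python sketch computes a null count over in-memory rows and reports an alert when the count crosses a threshold. It is plain Python, not Snowflake syntax; `null_count`, `check_metric`, the sample `orders` rows, and the thresholds are hypothetical names and values chosen for this example.

```python
from typing import Callable, Optional

Row = dict  # one record, keyed by column name


def null_count(rows: list[Row], column: str) -> int:
    """Hypothetical metric: number of rows where `column` is missing (None)."""
    return sum(1 for row in rows if row.get(column) is None)


def check_metric(
    rows: list[Row],
    column: str,
    metric: Callable[[list[Row], str], int],
    threshold: int,
) -> Optional[str]:
    """Return an alert message when the metric exceeds `threshold`, else None."""
    value = metric(rows, column)
    if value > threshold:
        return f"ALERT: {metric.__name__}({column}) = {value} exceeds {threshold}"
    return None


# Assumed sample data, for illustration only.
orders = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": None},
]

print(check_metric(orders, "email", null_count, threshold=1))
print(check_metric(orders, "id", null_count, threshold=0))
```

In practice the metric would run on a schedule and its results would be stored centrally; the sketch only shows the shape of the check itself.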
- Security:
Horizon Catalog lets you do the following:
Use the Trust Center to determine the current security posture of an account, including whether it meets the benchmarks established by the Center for Internet Security (CIS).
Use end-to-end encryption to minimize the attack surface and prevent third parties from reading data at rest or in transit to and from Snowflake.
Choose your preferred authentication method such as OAuth or federated authentication.
Use granular authorization controls to control access to objects.
Define and apply data access policies to provide column-level and row-level protections.
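To make the idea of column-level and row-level protections concrete, here is a minimal Python sketch of the two behaviors applied to in-memory rows. It does not use Snowflake syntax or reflect how Snowflake evaluates policies; the role names, the `REGION_BY_ROLE` mapping, and the functions `mask_email`, `row_visible`, and `query` are all hypothetical.

```python
Row = dict  # one record, keyed by column name

# Assumed mapping of role -> region that the role may see (hypothetical).
REGION_BY_ROLE = {"EMEA_ANALYST": "EMEA", "APAC_ANALYST": "APAC"}


def mask_email(value: str, role: str) -> str:
    """Column-level protection: only DATA_STEWARD sees the raw value."""
    return value if role == "DATA_STEWARD" else "***MASKED***"


def row_visible(row: Row, role: str) -> bool:
    """Row-level protection: stewards see every row, others only their region."""
    if role == "DATA_STEWARD":
        return True
    return row["region"] == REGION_BY_ROLE.get(role)


def query(rows: list[Row], role: str) -> list[Row]:
    """Apply the row filter first, then mask the protected column."""
    return [
        {**row, "email": mask_email(row["email"], role)}
        for row in rows
        if row_visible(row, role)
    ]


# Assumed sample data, for illustration only.
customers = [
    {"name": "Ana", "email": "ana@example.com", "region": "EMEA"},
    {"name": "Kenji", "email": "kenji@example.com", "region": "APAC"},
]

print(query(customers, "EMEA_ANALYST"))
print(query(customers, "DATA_STEWARD"))
```

The same query returns different results depending on the caller's role, which is the essential property of a data access policy: the protection travels with the data rather than with each query.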
- Privacy:
Horizon Catalog lets you do the following:
Define and assign aggregation policies and projection policies to control which types of queries can be run against shared data. Aggregation policies require analysts to run queries that aggregate data rather than retrieving individual rows. Projection policies control whether an analyst can use a SELECT statement to project a particular column.
Use differential privacy to open up highly sensitive data to analysts while protecting the identity of individuals. Differential privacy uses rigorous mathematics to protect against sophisticated privacy attacks on your data.
Facilitate collaboration while preserving privacy using a Snowflake Data Clean Room.
Expand who can learn insights from sensitive data by synthetically generating data [2] with similar characteristics that they can work with directly.
[2] Currently in private preview.
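The aggregation policy behavior described under Privacy can be sketched in a few lines of Python: row-level reads are rejected, and only groups of a minimum size are returned. This is a conceptual sketch, not Snowflake's implementation. The function `count_by`, the exception `AggregationRequired`, the default `min_group_size`, and the choice to drop undersized groups are assumptions made for this example.

```python
from collections import Counter
from typing import Optional

Row = dict  # one record, keyed by column name


class AggregationRequired(Exception):
    """Raised when a query tries to read individual rows."""


def count_by(
    rows: list[Row], group_by: Optional[str], min_group_size: int = 3
) -> dict:
    """Count rows per group, refusing queries that do not aggregate."""
    if group_by is None:
        raise AggregationRequired("queries must aggregate; row-level access is blocked")
    counts = Counter(row[group_by] for row in rows)
    # Simplification for this sketch: undersized groups are dropped so that
    # a very small group cannot be used to single out an individual.
    return {key: n for key, n in counts.items() if n >= min_group_size}


# Assumed sample data, for illustration only.
patients = [
    {"city": "Lyon"},
    {"city": "Lyon"},
    {"city": "Lyon"},
    {"city": "Oslo"},
]

print(count_by(patients, "city"))
```

A real system may treat undersized groups differently (for example, by merging them into a remainder group); the point of the sketch is only that the analyst never receives individual rows.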
- Discovery:
Horizon Catalog lets you understand your data more quickly using AI-powered Object Descriptions [3].
[3] Currently in private preview.
Discovering and taking action on content
Data teams rely on an organization’s data, apps, and models to do their job. Horizon Catalog provides these teams with the tools they need to discover content for their task, evaluate that content to ensure it’s relevant and trustworthy, and take action on the content.
- Discovery:
Horizon Catalog lets you do the following:
Search for data, apps, and models using Universal Search, which is a user interface that lets you find content inside and outside your organization using natural language.
Browse Snowflake content within an organization using the Internal Marketplace to find organizational listings [4].
Browse publicly available data on the Snowflake Marketplace.
Evaluate the relevancy of data by using object insights in Snowsight [4] to look at the popularity, access, quality, and dependencies of content.
Take action on a listing by referencing its data with a Uniform Listing Locator [4], which lets you write queries against the data of a listing without the overhead of creating a database or needing administrative privileges.
[4] Currently in private preview.
- Collaboration:
Horizon Catalog lets you do the following:
Share data within your organization in the Internal Marketplace [5] and privately with external business partners using private listings.
Buy and sell data products on the Snowflake Marketplace.
Manage your listings with a user interface or programmatically using SQL commands.
[5] Currently in private preview.
Use case: Seeing Horizon Catalog in action
Suppose BazFin, a large financial services firm, needs to ensure the compliance, data quality, and usability of its content, which consists of 10 PB of data. BazFin uses Horizon Catalog to govern and discover content.
- Govern content
The chief data officer (CDO) of BazFin needs to assure company stakeholders that business decisions are based on high-quality data. The CDO instructs the data steward to use system-defined and custom data metric functions to monitor data quality on a regular schedule. On any given day, the CDO can view a dashboard built on the event table to report on data quality.
Returning to her work for the day, the data steward opens the Trust Center to check the overall security posture of a Snowflake account that was recently created for a new division. From a built-in interface, she identifies that someone forgot to define a network policy to protect the account from unknown network traffic.
- Discover and take action on content
A BazFin analyst wants to build a new dashboard to show top-performing products. The analyst goes to the Internal Marketplace [6] and finds just the right organizational listing [6] with performance data published by the finance team. The analyst browses through a Data Dictionary to preview the data, then starts querying the data right away using the listing’s Uniform Listing Locator [6].
The analyst also wants to enrich BazFin data with third-party data. Turning to Universal Search, the analyst uses the natural language search term “income bands for zipcodes”, which returns a data product from the Snowflake Marketplace that they can join with the BazFin product performance data.
[6] Currently in private preview.