The promise of the open lakehouse envisions a single, governed data copy that’s accessible to any engine, but this idea has long been haunted by “proprietary gravity.” And while Apache Iceberg™ emerged as the community’s first answer to data interoperability, an open format alone is no longer enough.
In the age of AI, data silos, governance and semantic fragmentation are taxes on innovation. When teams cannot act on data where it lives, they are forced to move it, leading to ballooning costs and “noisy” data that is missing the rich, semantic context that AI needs. AI initiatives are undermined before they even get started.
At Snowflake, we are building toward a future where full interoperability is a reality. By working with the community across data, governance and semantic interoperability, we are enabling our customers to overcome data silos and multilayer fragmentation once and for all.
The result is users who have agency over their data. Users decide how and from where to securely act on a single logical data copy for any operation without impacting governance controls and semantic context.
Agency over data can’t be accomplished by a single vendor or with just data interoperability, however. It requires interoperability at each layer of an architecture. Delivering on this vision means solutions must be grounded in widely accepted open and community-driven initiatives that prioritize vendor-neutral interoperability.
Getting to a place where users have agency over their data, regardless of engine, starts with a common table format. With its widespread native support across platforms and active community, Iceberg is that format. Most recently, the community reached a critical milestone: Iceberg v3. Iceberg v3 builds on existing capabilities to expand data interoperability to critical use cases, including semi-structured data, change data capture (CDC) and more.
Today, as we gather for Iceberg Summit in San Francisco, we are excited to announce that broader support for v3 capabilities will soon be generally available.
With support for a broad set of v3 capabilities, more of our customers’ data becomes accessible from more engines than ever before. Customers can power these use cases with Apache Iceberg tables in Snowflake, managed by Snowflake’s Horizon Catalog or any other catalog:
pg_lake

Not every data set starts in an analytical lake. Much of a company’s most valuable information lives inside transactional databases such as Postgres. Historically, the transactional and analytical worlds were silos. To get them to talk to each other, teams had to glue them together with data pipelines that moved data downstream.
To bridge this gap, Snowflake developed and open sourced pg_lake. This extension transforms Postgres from a standard database into a functional part of a data lakehouse: Postgres can read and write Iceberg tables in object storage and query open file formats in the lake directly. Now transactional and analytical data can share the same open language.
Governance controls and secure access must follow the data. This is why, two years ago, we open sourced and donated an Iceberg catalog, now Apache Polaris, and have partnered with the community to help the open source catalog become a Top-Level Project under the Apache Software Foundation. Our aim is to deliver a future where Snowflake’s fine-grained access controls, or those of any other platform, are enforced consistently and performantly across any engine, on any compute, without forcing customers to choose between security and the flexibility of an interoperable lakehouse.
Historically, authorization has been hard-coded into database engines, which has locked customers in at two levels: policy definition and policy execution. However, the issue isn’t that customers don’t trust these engines to enforce rules — they do, and they always have — but rather that fine-grained access control (FGAC) requires compute to understand and execute those rules.
We are breaking this cycle with Apache Polaris. By developing standards for Policy Exchange, Governance Federation and Read Restriction APIs, we’re creating a standardized way to interchange policies and a trust mechanism to manage enforcement across platforms. By using Read Restriction APIs, one platform can share pre-evaluated access rules that a downstream engine can enforce directly. This ensures that governance truly travels with the data, removing the heavy “compute tax” of data materialization and allowing for consistent enforcement regardless of which engine is accessing the information.
The goal is simple. Fine-grained security and governance controls — whether on Snowflake’s Horizon or any other supported catalog — should be enforced consistently across any engine, without server-side materialization or performance penalties.
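Conceptually, the Read Restriction pattern works like this: the catalog evaluates policy once and hands the engine a compact, pre-evaluated set of rules (which columns may be read, which rows pass), and the engine applies them during the scan. The sketch below illustrates that flow; the names (`ReadRestriction`, `read_restrictions_for`, `scan`) are illustrative assumptions, not the actual Polaris API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ReadRestriction:
    """Pre-evaluated access rules a catalog can hand to any engine."""
    allowed_columns: list[str]          # column mask the caller may see
    row_filter: Callable[[dict], bool]  # row-level predicate, already resolved

class Catalog:
    """Stand-in for a governance catalog such as Apache Polaris (hypothetical API)."""
    def read_restrictions_for(self, principal: str) -> ReadRestriction:
        if principal == "analyst":
            # The catalog resolves its policy language into simple rules.
            return ReadRestriction(
                allowed_columns=["region", "revenue"],
                row_filter=lambda row: row["region"] == "EMEA",
            )
        return ReadRestriction(
            allowed_columns=["region", "revenue", "customer"],
            row_filter=lambda row: True,
        )

def scan(table: list[dict], restriction: ReadRestriction) -> list[dict]:
    """A downstream engine enforces the shared rules without re-deriving policy."""
    return [
        {col: row[col] for col in restriction.allowed_columns}
        for row in table
        if restriction.row_filter(row)
    ]

table = [
    {"region": "EMEA", "revenue": 100, "customer": "acme"},
    {"region": "APAC", "revenue": 250, "customer": "initech"},
]
restricted = scan(table, Catalog().read_restrictions_for("analyst"))
```

Because the rules arrive pre-evaluated, no copy of the data has to be materialized server-side: each engine filters at read time against the same single logical copy.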
AI agents waste tokens and “guess” meanings when business logic is locked in proprietary silos. To address this, we’re building Open Semantic Interchange (OSI), a vendor-neutral specification for metrics, dimensions and relationships that makes semantic context as open and interoperable as Iceberg itself. The first OSI spec is live under an Apache 2 license, backed by a coalition of more than 35 industry leaders, including Salesforce, dbt Labs and Databricks, with a commitment to transition to foundation-led neutral governance.
Snowflake customers can start today with semantic views in the Horizon Catalog, giving Snowflake Cortex AI and agentic applications the governed “map of truth” they need to reason accurately, while building on the same foundational constructs that OSI is standardizing across the industry.
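To make the idea concrete, here is a minimal sketch of a vendor-neutral metric definition that any consuming engine could evaluate the same way. The field names below are an assumption for illustration only, not the published OSI schema.

```python
# A shared, engine-agnostic metric definition (illustrative structure).
metric_spec = {
    "name": "gross_margin",
    "description": "Revenue minus cost, as a share of revenue",
    "expression": "(revenue - cost) / revenue",
    "dimensions": ["region"],
}

def evaluate(metric: dict, row: dict) -> float:
    """A consuming engine evaluates the shared expression against its own data.

    Builtins are stripped so only the row's named fields are visible.
    """
    return eval(metric["expression"], {"__builtins__": {}}, dict(row))

row = {"revenue": 200.0, "cost": 150.0}
margin = evaluate(metric_spec, row)  # (200 - 150) / 200 = 0.25
```

The point is not the toy evaluator but the contract: because the metric’s definition travels with the data rather than living inside one BI tool, every engine and every agent computes the same number.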
Our commitment to unlocking agency over a user’s data represents a fundamental shift in our engineering culture. Snowflake is no longer just a consumer of open source; we are building with the community. We are proud of how this change has enabled us to work with the community to make agency over data a reality for all.
True open data interoperability is a collective responsibility: for it to become a reality, we all must play our parts in moving beyond “proprietary gravity,” because that is what the age of AI demands.
No single vendor can solve data silos and fragmentation alone. It requires a diverse community of users, vendors and organizations working toward this common goal. Only then can we help data teams everywhere realize the promise of open source: the ability to have agency over their data.
If you are at the Iceberg Summit, come find the Snowflake engineers writing the PRs and reviewing the spec proposals. The work is public, the doors are open, and a future where users have agency over their data belongs to everyone.