Make Your Lakehouse AI-Ready

Data engineering teams are under unprecedented pressure. They’ve been tasked with building the data foundation for generative AI and advanced analytics, but studies show a staggering 75% of AI projects fail to reach production. Why?

The problem isn’t the AI models; it’s the fragmented data foundations they rely on.

Today’s data engineer is often forced into the role of firefighter, spending valuable time putting out fires and performing routine infrastructure maintenance rather than innovating. They are constantly stitching together rigid, complex data pipelines, fixing broken dependencies, and managing fragmented infrastructure across silos.

We are excited to announce the general availability (GA) of advanced data engineering capabilities for open table formats on Snowflake, empowering any organization to build a unified, governed and high-performance lakehouse for the AI era.  

With these improvements, we are removing yesterday’s forced choices between flexibility and simplicity, openness and security, and interoperability and lock-in. Here’s how:

  • Use catalog-linked databases (GA): Federate to any Iceberg REST catalog — including AWS Glue, Databricks Unity Catalog and Microsoft OneLake — all from a single Snowflake development environment to automatically discover and access fresh data. This delivers on the lakehouse’s zero-ETL promise, while providing broad interoperability backed by Snowflake’s world-class performance engine (see the sketch after this list).

  • Write to any Apache Iceberg™ table (GA): Full data engineering is now supported for any Iceberg table, regardless of which Iceberg REST catalog manages it. You can centralize not only discovery but also ingestion, transformation and modeling on Snowflake’s unified, fully managed platform. As a result, you spend more time innovating and less time managing infrastructure.

  • Take advantage of automatic Iceberg optimization: Get the flexibility of open formats without the operational overhead. With Snowflake, you can now tune file sizes and partitions (now in GA) across your entire Iceberg ecosystem for better performance, regardless of catalog or engine. Additionally, easily automate table maintenance operations (now in private preview) — such as expiring snapshots, compacting files and rewriting manifests — for superior query performance and simplified management across your entire lakehouse.

  • Share data assets on open formats (GA): Snowflake’s secure zero-ETL data sharing now supports both Iceberg and Delta Lake tables, regardless of catalog. This means you can easily and securely share open table formats across regions and clouds, with security and governance policies persisting for your data consumers.
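To make the first two capabilities concrete, here is a minimal sketch of federating to an external Iceberg REST catalog and then writing to one of its tables. All names, endpoints and authentication parameters are hypothetical, and the exact REST_CONFIG and REST_AUTHENTICATION options vary by catalog provider; treat this as an assumption-laden illustration rather than copy-paste syntax.

    -- Hypothetical example: federate Snowflake to an external Iceberg REST catalog.
    CREATE CATALOG INTEGRATION glue_rest_int
      CATALOG_SOURCE = ICEBERG_REST
      TABLE_FORMAT = ICEBERG
      REST_CONFIG = (
        CATALOG_URI = 'https://glue.us-east-1.amazonaws.com/iceberg',  -- assumed endpoint
        CATALOG_NAME = 'analytics'                                     -- assumed catalog name
      )
      REST_AUTHENTICATION = (
        TYPE = SIGV4,                                                  -- assumed auth type for AWS Glue
        SIGV4_IAM_ROLE = 'arn:aws:iam::123456789012:role/snowflake-access'
      )
      ENABLED = TRUE;

    -- A catalog-linked database automatically discovers the remote catalog's namespaces and tables.
    CREATE DATABASE glue_lakehouse
      LINKED_CATALOG = (CATALOG = 'glue_rest_int');

    -- With write support now GA, standard DML runs against the federated Iceberg table.
    INSERT INTO glue_lakehouse.sales.orders (order_id, amount)
      VALUES (1001, 49.99);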

These capabilities fully unlock Snowflake’s suite of data engineering and collaboration solutions, from ingestion to business impact, to help more organizations conquer data complexity and realize their AI potential.

Conquer data complexity: The new data engineering paradigm

The shift to a true AI-ready data lakehouse requires removing three major friction points that plague modern data teams: spending time stitching together rigid and fragmented data architectures, fixing broken and complex pipelines and managing inconsistent governance across silos.

Here’s how Snowflake’s new GA and existing capabilities address these problems to empower data engineering teams to focus on delivering trusted data for AI.

Securely connect data, wherever it lives

The promise of the lakehouse is in its open, multiformat flexibility, but that vision is often undermined by the complexity of managing metadata and catalogs across different teams, regions and clouds. To overcome this challenge, we are delivering on the vision for a unified, connected and governed lakehouse. 

Overcome fragmentation for existing data with catalog-linked databases and connect new data with superior economics: 

  • Connect new data with improved economics: Enjoy a simpler, more predictable pricing model based on data volume, which has resulted in a 50%+ ingestion cost reduction for Business Critical/Virtual Private Snowflake edition customers (with full rollout expected to be completed soon). With Snowpipe and the Snowpipe Streaming API, you can ingest data at the latency of your choice, or simply connect multimodal data from anywhere using Snowflake Openflow, a low-code managed integration service.

  • Expand your Iceberg ecosystem: Access data in Delta tables with Delta Direct and Parquet files with simple metadata transformations for a truly unified view of your entire data estate (see the sketch after this list).

  • Unlock AI-ready data: Make your data AI-ready by keeping it connected, continuous, curated and contextual. Automate unstructured data preparation or simply add ready-to-query data from trusted third-party sources with Snowflake Cortex AI, using Document AI, Cortex AISQL and Cortex Knowledge Base.
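As one concrete illustration of Delta Direct, the sketch below surfaces an existing Delta Lake table in object storage as an Iceberg table in Snowflake. The external volume, integration and path names are hypothetical, and the parameters are assumptions based on Snowflake’s documented object-store pattern; check the documentation for your storage provider.

    -- Hypothetical example: read an existing Delta Lake table via Delta Direct.
    CREATE CATALOG INTEGRATION delta_direct_int
      CATALOG_SOURCE = OBJECT_STORE
      TABLE_FORMAT = DELTA
      ENABLED = TRUE;

    CREATE ICEBERG TABLE customer_events
      EXTERNAL_VOLUME = 'lakehouse_vol'     -- assumed external volume pointing at your bucket
      CATALOG = 'delta_direct_int'
      BASE_LOCATION = 'events/customer/';   -- assumed path to the Delta table root

    -- The table is now queryable alongside native and federated Iceberg tables.
    SELECT COUNT(*) FROM customer_events;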

This means you can unify your fragmented data estate into a single, governed pane of glass, regardless of location or catalog, while maintaining the flexibility and choice that Iceberg’s broad ecosystem support offers.

Streamline pipelines with a fully managed infrastructure

The most significant drain on a data engineer’s time is manually managing dependency graphs and debugging procedural data transformation code. You deserve a better way to build low-latency data pipelines.

We are bringing the power of the Snowflake AI Data Cloud to your open-format data with features designed to remove the complexity of managing pipelines:

  • Use Dynamic Tables for Iceberg: With a declarative SQL framework, simply define the desired outcome of your data transformation, and Snowflake automatically handles orchestration, dependency management, scheduling and incremental refresh (see the sketch after this list). The result is fully managed pipelines that free up development hours and deliver efficient, stable data.

  • Accelerate existing pipelines: For teams running extensive Spark codebases, Snowpark Connect for Apache Spark™ allows you to execute Spark workloads directly on Snowflake’s high-performance engine, often resulting in substantial price-for-performance improvements. Customers see 5.6x faster performance and 41% cost savings with Snowpark over their traditional Spark environment.1

  • Work your way: Maintain developer flexibility by using your language of choice, with support for SQL, Python or Java. Automate object management in a CI/CD pipeline with Snowflake CLI, dbt projects, Git integration and other tools that help your team build production pipelines with optimal efficiency.
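To show how declarative this is in practice, here is a minimal Dynamic Table sketch for a Snowflake-managed Iceberg table. The warehouse, external volume, source table and lag target are all hypothetical placeholders.

    -- Hypothetical example: a declarative, incrementally refreshed Iceberg pipeline.
    CREATE DYNAMIC ICEBERG TABLE daily_revenue
      TARGET_LAG = '15 minutes'        -- Snowflake schedules refreshes to meet this freshness target
      WAREHOUSE = transform_wh
      EXTERNAL_VOLUME = 'lakehouse_vol'
      CATALOG = 'SNOWFLAKE'            -- Snowflake-managed Iceberg catalog
      BASE_LOCATION = 'daily_revenue/'
    AS
      SELECT order_date, SUM(amount) AS revenue
      FROM raw_orders
      GROUP BY order_date;

There is no task graph to maintain: when raw_orders changes, Snowflake computes the incremental refresh needed to keep daily_revenue within the declared lag.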

Govern for AI: Delivering trusted data products

AI/ML models rely on governed, high-quality data to avoid bias and generate reliable outputs. This means governance, data quality and discovery capabilities should be built in. This is particularly difficult in lakehouse architectures where data lives in multiple regions, clouds and tools. Snowflake Horizon Catalog centralizes governance for AI by providing unified manageability regardless of where your data lives. 

Horizon Catalog helps you build a data foundation that is auditable, secure and ready for your most critical AI/ML initiatives:

  • Centralized, intelligent governance with Horizon Catalog: The Snowflake Horizon Catalog provides a single, intelligent governance layer that applies policies across regions, clouds and all data objects — including your Iceberg tables, regardless of catalog.

  • Isolated data access: Implement out-of-the-box security features such as role-based access controls that separate function from identity, fine-grained access controls (FGAC) and attribute-based access controls (ABAC) to create precise, real-time access policies. Isolate sensitive data and ensure that only authorized users or ML models can access specific fields, regardless of the source (see the sketch after this list).

  • Data quality as a nonnegotiable: Leverage customizable data quality controls and proactive alerts (currently in private preview) that isolate bad records for remediation. You gain confidence that every data product you deliver — whether it feeds a dashboard, an application or a gen AI model — is consistent and trusted.
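To ground the access-control and data-quality points, here is a minimal sketch pairing a masking policy with a built-in data metric function. The table, column, role and schedule are hypothetical, and this assumes standard Snowflake governance primitives rather than any feature still in preview.

    -- Hypothetical example: mask a sensitive column for unauthorized roles.
    CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val ELSE '*** MASKED ***' END;

    ALTER TABLE customers MODIFY COLUMN email
      SET MASKING POLICY email_mask;

    -- Hypothetical example: monitor quality with a built-in data metric function.
    ALTER TABLE customers SET DATA_METRIC_SCHEDULE = '60 MINUTE';

    ALTER TABLE customers
      ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (email);

Because the policy travels with the data, the same masking applies whether the column is read by an analyst's dashboard or an ML feature pipeline.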

The Snowflake AI Data Cloud: Build for innovation

The goal of modern data engineering is to provide the shortest path from raw data to business impact. This GA release marks a huge leap forward in making that path simple, open and scalable.

Customers such as Affirm now have both the sovereignty over their data and the operational simplicity they need to scale their AI-ready data foundation. Affirm has seen a 6x reduction in monthly costs for replication pipelines and up to 66% improvement in critical SLAs. Watch their presentation.

It’s time for data engineers to shed the burden of the reactive firefighter and step into the role of skilled data artisan. Stop managing complex infrastructure and dependencies. Start delivering innovation.

Ready to conquer data complexity?

  1. See the solution: Watch “Data Engineer Connect: Architecting for AI” for demos.

  2. Dive deeper: Access the solutions page for detailed instructions for each of the use cases.

  3. Start building.

Forward Looking Statements
This article contains forward-looking statements, including about our future product offerings; these statements are not commitments to deliver any product offerings. Actual results and offerings may differ and are subject to known and unknown risks and uncertainties. See our latest 10-Q for more information.


1  Based on customer production use cases and proof-of-concept exercises comparing the speed and cost for Snowpark versus managed Spark services between November 2022 and May 2025. All findings summarize actual customer outcomes with real data and do not represent fabricated data sets used for benchmarks.
