At Snowflake BUILD, we are introducing powerful new features designed to accelerate building and deploying generative AI applications on enterprise data, while helping you ensure trust and safety. These new tools streamline workflows, deliver insights at scale, and get AI apps into production quickly. Customers such as Skai have used these capabilities to bring their generative AI solution into production in just two days instead of months.
Here’s how Snowflake Cortex AI and Snowflake ML are accelerating the delivery of trusted AI solutions for the most critical generative AI applications:
Natural language processing (NLP) for data pipelines: Large language models (LLMs) have transformative potential, but integrating batch LLM inference into data pipelines has often been cumbersome. Now, with a simple query, developers can infer sentiment or categorize customer reviews across millions of records, improving efficiency and saving time. For instance, TS Imagine implemented generative AI at scale with Snowflake Cortex AI, cutting costs by 30% and saving 4,000 hours previously spent on manual tasks.
Conversational apps: Creating reliable, engaging responses for user questions is now simpler, opening the door to powerful use cases such as self-service analytics and document search via chatbots.
GPU-based model development and deployment: Build advanced ML models with your preferred Python packages on GPUs or CPUs, then serve them for inference in containers, all within the same platform as your governed data. Customers such as Avios, CHG Healthcare and Keysight Technologies are already developing container-based models in Snowflake ML.
With rapidly evolving technology, models are available in various sizes, context windows and capabilities, making it essential to select the right one for your specific use case. For instance, if your documents are in multiple languages, an LLM with strong multilingual capabilities is key. However, for simpler NLP tasks such as classification, an advanced model might be excessive, and a smaller LLM may be more effective. Cortex LLM functions offer models optimized for specific use cases, such as translation, summarization and classification. These scalable models can handle millions of records, enabling you to efficiently build high-performing NLP data pipelines. Scaling LLM data processing to millions of records can pose data transfer and orchestration challenges, but the user-friendly SQL functions in Snowflake Cortex address these by running inference directly where the data lives, as shown in the sketch below.
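To make this concrete, here is a minimal sketch of a batch NLP pipeline using the task-specific Cortex functions through the snowflake.cortex Python API (part of snowflake-ml-python). The table and column names (customer_reviews, review_text) and the German-to-English translation are illustrative assumptions:

```python
# Batch NLP over a governed table with Cortex task-specific functions.
from snowflake.snowpark import Session
from snowflake.cortex import Sentiment, Summarize, Translate

connection_parameters = {
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

reviews = session.table("customer_reviews")  # hypothetical table

scored = (
    reviews
    .with_column("sentiment", Sentiment(reviews["review_text"]))
    .with_column("summary", Summarize(reviews["review_text"]))
    .with_column("review_en", Translate(reviews["review_text"], "de", "en"))
)

# Inference is pushed down as SQL and runs where the data lives.
scored.write.save_as_table("customer_reviews_enriched", mode="overwrite")
```

Because each call compiles to a SQL function, the same pipeline scales from a handful of rows to millions without extra orchestration code.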
Cortex AI provides easy access to industry-leading models via LLM functions or REST APIs, enabling you to focus on driving generative AI innovations. We offer a broad selection of models in various sizes, context window lengths and levels of language support. Recent additions include the multilingual embedding model from Voyage, the Llama 3.1 and 3.2 models from Meta and the Jamba-Instruct model from AI21.
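As a quick illustration of model choice through the same interface, here is a hedged sketch of calling the COMPLETE function from Python inside a Snowflake Notebook; the llama3.1-70b model name is just one of the options above, and availability varies by region:

```python
# Call a specific model through the COMPLETE function from Python.
from snowflake.snowpark.context import get_active_session
from snowflake.cortex import Complete

session = get_active_session()  # available inside a Snowflake Notebook

answer = Complete(
    "llama3.1-70b",  # swap for any model available in your region
    "Summarize the key risks mentioned in this contract clause: ...",
)
print(answer)
```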
With Cortex Playground (public preview soon), you can try out models directly in Snowsight. This no-code interface allows you to quickly experiment with, compare and evaluate models as they become available.
Model customization techniques enable you to optimize models for your specific use case. Snowflake is introducing serverless fine-tuning (generally available soon), allowing developers to fine-tune models for enhanced cost-performance benefits. This fully managed service eliminates the need for developers to build or manage their own infrastructure for training and inference.
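As a rough sketch of what kicking off a serverless fine-tuning job looks like, the FINETUNE function below follows the public docs, but the database, table and base-model names are assumptions; the validation query is optional:

```python
# Start a serverless fine-tuning job; returns a job ID you can poll
# later with SNOWFLAKE.CORTEX.FINETUNE('DESCRIBE', <job_id>).
# `session` is the Snowpark session from the earlier sketch.
job = session.sql("""
    SELECT SNOWFLAKE.CORTEX.FINETUNE(
        'CREATE',
        'my_db.my_schema.support_assistant',  -- name for the tuned model
        'llama3.1-8b',                        -- base model to tune
        'SELECT prompt, completion FROM my_db.my_schema.train_examples',
        'SELECT prompt, completion FROM my_db.my_schema.val_examples'
    )
""").collect()
print(job)
```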
Enhance AI apps and pipelines with multimodal support for richer responses. With new generative AI capabilities, developers can now process multimodal data, using the most relevant information in their applications. We are enabling multimodal LLM inference (private preview soon) as part of the Cortex COMPLETE function for image inputs using the Llama 3.2 models available in Snowflake Cortex AI. Support for audio, video and image embeddings will follow soon. Expanded multimodal support enriches responses for diverse tasks such as summarization, classification and entity extraction across various media types.
Database queries are the underlying force behind insights across organizations, powering data-driven experiences for users. Traditionally, SQL has been limited to structured data neatly organized in tables. Snowflake will be introducing new multimodal SQL functions (private preview soon) that enable data teams to run analytical workflows on unstructured data, such as images. With these functions, teams can run tasks such as semantic filters and joins across unstructured data sets using familiar SQL syntax.
A consistent end-user experience is often a gating factor as developers move beyond proofs of concept. With Provisioned Throughput (public preview soon on AWS), customers can reserve dedicated throughput, ensuring consistent and predictable performance for their workloads. Additionally, we launched cross-region inference, allowing you to access preferred LLMs even if they aren’t available in your primary region.
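Opting in to cross-region inference is a one-line account setting; a minimal sketch, assuming the CORTEX_ENABLED_CROSS_REGION account parameter as documented at the time of writing (verify the allowed values for your account):

```python
# Allow Cortex to route inference to another region when a model
# is not available in the primary region.
# `session` is the Snowpark session from the earlier sketch.
session.sql(
    "ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION'"
).collect()
```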
Snowflake now offers new tools to simplify developing and deploying conversational AI applications.
Earlier this year, we launched Cortex Search to help customers unlock insights from unstructured data and turn vast document collections into AI-ready resources without complex coding. The fully managed retrieval solution enables developers to build scalable AI apps that extract insights from unstructured data within Snowflake’s secure environment. This capability is especially powerful when paired with layout-aware text extraction and chunking functions, which optimize documents for retrieval by streamlining preprocessing through short SQL functions.
Now you can make documents AI-ready faster with two new SQL preprocessing functions. We’re introducing streamlined solutions for processing documents from blob storage (e.g., Amazon S3) into text representations for use in retrieval-augmented generation (RAG) applications. SQL users can now replace complex document processing pipelines with simple SQL functions from Cortex AI, such as PARSE_DOCUMENT (public preview) and SPLIT_TEXT_RECURSIVE_CHARACTER (private preview). The parsing function takes care of extracting text and layout from documents, so developers do not have to move the raw data from its original storage location. The text splitting function chunks the extracted text into segments that are better optimized for indexing and retrieval. Learn more.
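Strung together, the pieces above form a short document-to-retrieval pipeline. In this sketch, PARSE_DOCUMENT extracts layout-aware text from files on a stage, SPLIT_TEXT_RECURSIVE_CHARACTER chunks it, and a Cortex Search service indexes the chunks; the stage, table, warehouse and chunking parameters are assumptions:

```python
# Parse and chunk staged documents into a table of text chunks.
session.sql("""
    CREATE OR REPLACE TABLE doc_chunks AS
    SELECT
        relative_path,
        c.value::string AS chunk
    FROM directory(@doc_stage),
         LATERAL FLATTEN(input =>
             SNOWFLAKE.CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER(
                 SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
                     @doc_stage, relative_path, {'mode': 'LAYOUT'}
                 ):content::string,
                 'markdown',  -- format hint for the splitter
                 512,         -- target chunk size in characters
                 64           -- overlap between adjacent chunks
             )
         ) c
""").collect()

# Index the chunks with a managed Cortex Search service.
session.sql("""
    CREATE OR REPLACE CORTEX SEARCH SERVICE doc_search
        ON chunk
        WAREHOUSE = my_wh
        TARGET_LAG = '1 hour'
        AS SELECT chunk, relative_path FROM doc_chunks
""").collect()
```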
Expand the scope of accurate, self-service analytics in natural language with Cortex Analyst. Snowflake Cortex Analyst continues to evolve as a fully managed service, providing conversational, self-serve analytics that allow users to seamlessly interact with structured data in Snowflake. Recent updates enhance user experience and analytical depth: support for SQL Joins across Star and Snowflake schemas (public preview) enables more complex data explorations and richer insights while maintaining high quality. Additionally, multiturn conversations (public preview) allow users to ask follow-up questions for more fluid interactions. Integration with Cortex Search (public preview) improves the accuracy of generated SQL queries by dynamically retrieving exact or similar literal values for complex, high-cardinality data fields, while API-level role-based access controls strengthen security and governance.
Together, these updates empower enterprises to securely derive accurate, timely insights from their data, reducing the overall cost of data-driven decision-making. To learn more about these new features and related updates, check out our Cortex Analyst blog post.
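For developers calling Cortex Analyst directly, here is a hedged sketch of the REST interface; the endpoint path and payload shape follow the public docs, while the account URL, token and semantic model file are placeholders you must supply:

```python
# Ask Cortex Analyst a question about structured data via REST.
import requests

ACCOUNT_URL = "https://<account_identifier>.snowflakecomputing.com"
jwt_token = "<key-pair JWT for the calling role>"

resp = requests.post(
    f"{ACCOUNT_URL}/api/v2/cortex/analyst/message",
    headers={
        "Authorization": f"Bearer {jwt_token}",
        "X-Snowflake-Authorization-Token-Type": "KEYPAIR_JWT",
        "Content-Type": "application/json",
    },
    json={
        "messages": [{
            "role": "user",
            "content": [{
                "type": "text",
                "text": "What was revenue by region last quarter?",
            }],
        }],
        "semantic_model_file": "@my_db.my_schema.models/revenue.yaml",
    },
)
print(resp.json())  # generated SQL plus a natural-language explanation
```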
Reduce manual integration and orchestration in chat applications with the Cortex Chat API (public preview soon), which simplifies building interactive applications in Snowflake. By combining retrieval and generation into a single API call, you can now build agentic chat apps that talk to both structured and unstructured data. The optimized prompt enables high-quality responses along with citations that reduce hallucinations and increase trust. A single integration endpoint simplifies the application architecture.
Increase AI app trustworthiness with built-in evaluation and monitoring through the new integrated AI Observability for LLM Apps (private preview). This observability suite provides essential tools to enhance evaluation and trust in LLM applications, supporting customers’ AI compliance efforts. These features let app developers assess quality metrics, such as relevance, groundedness and bias, alongside traditional performance metrics such as latency, all during the development process. They also support thorough monitoring of application logs, so organizations can keep a close eye on their AI applications.
AI developers can now seamlessly track and evaluate app performance metrics, helping them choose optimized models, prompts and retrieval services for their specific use cases. Additionally, developers can manage logs and leverage prebuilt monitoring for apps within Snowflake or for external apps using the TruLens open source library, which Snowflake oversees as part of the TruEra acquisition.
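As a taste of the open source path, here is a minimal sketch that instruments a toy app with the classic trulens_eval API; import paths and class names vary across TruLens releases, and the OpenAI-backed feedback provider, app name and toy answer function are assumptions, not the managed observability suite itself:

```python
# Record an app's calls and score answer relevance with TruLens.
from trulens_eval import Tru, TruBasicApp, Feedback
from trulens_eval.feedback.provider import OpenAI

provider = OpenAI()  # any supported provider can score feedbacks
f_relevance = Feedback(provider.relevance).on_input_output()

def answer(question: str) -> str:
    # Stand-in for a real retrieval + generation pipeline.
    return "Our refund window is 30 days."

tru = Tru()
app = TruBasicApp(answer, app_id="support-bot", feedbacks=[f_relevance])

with app as recording:
    app.app("What is the refund policy?")

print(tru.get_leaderboard())  # aggregate quality metrics per app
```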
Bring AI to more of your data with new data source integrations. The Snowflake Connector for SharePoint (public preview) enables data teams to build AI applications on top of SharePoint data in Snowflake, without needing to manually set up pipelines or preprocess data, while adhering to existing access policies.
Additionally, now you can enhance chatbot capabilities with Cortex Knowledge Extensions (private preview) on Snowflake Marketplace. These extensions allow data teams to enrich enterprise AI chatbots with recent and proprietary content from third-party providers, such as research or newspaper publications. For publishers and content providers, this opens a new revenue stream while also protecting intellectual property from unauthorized usage, such as for LLM training. And for consumers, it provides faster access to high-quality AI responses, free from concerns over quality or commercial compliance.
As organizations accumulate more data in a wide variety of formats, and as modeling techniques continue to get more sophisticated, the tasks of a data scientist and ML engineer are becoming increasingly complex. Snowflake ML provides the building blocks that data science and ML teams need to quickly go from prototype to production for features and models on the same platform they use to govern and manage their data. Organizations such as CHG Healthcare, Stride, IGS Energy and Cooke Aquaculture are building sophisticated, end-to-end ML models directly in Snowflake. We recently announced new innovations for developing and serving ML models with distributed GPUs for advanced use cases such as recommendation systems, computer vision, custom embeddings and decision tree models.
GPUs offer powerful computing that speeds up resource-intensive ML tasks such as model training. This accelerated compute significantly improves how quickly teams can iterate and deploy models, especially when working with large data sets or using advanced deep learning frameworks such as PyTorch. To support resource-intensive workflows without having to move large volumes of data, and without limitations on the code or libraries they can use, Snowflake ML now supports Container Runtime (public preview on AWS and public preview soon on Azure), accessible through Snowflake Notebooks (generally available), a unified cell-based, interactive development surface that blends Python, SQL and Markdown.
Following internal testing, we found that Snowflake ML APIs in Container Runtime can efficiently execute ML training jobs on GPUs with 3-7x execution speed improvement compared to running the same workload with your Snowflake data using open source libraries outside of the runtime. This fully managed, container-based runtime comes preconfigured with the most popular Python libraries and frameworks, with the flexibility to extend from open source hubs such as PyPI and Hugging Face.
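In practice, a GPU training job in a Snowflake Notebook on Container Runtime can be as plain as the sketch below: pull governed data with Snowpark, then train with the open source XGBoost package on the attached GPU (the table and column names are assumptions):

```python
# GPU-accelerated training inside a Container Runtime notebook.
from snowflake.snowpark.context import get_active_session
import xgboost as xgb

session = get_active_session()
df = session.table("transactions_features").to_pandas()

X = df.drop(columns=["IS_FRAUD"])
y = df["IS_FRAUD"]

# XGBoost >= 2.0 routes training to the GPU via device="cuda"
# (older versions use tree_method="gpu_hist" instead).
model = xgb.XGBClassifier(tree_method="hist", device="cuda", n_estimators=500)
model.fit(X, y)
```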
After development, you can serve models for production from the Snowflake Model Registry for any ML, LLM or embedding model using distributed CPUs or GPUs in Snowpark Container Services (generally available in Azure and AWS). Model Serving in Containers (public preview in AWS) allows for faster, more powerful inference using on-demand GPU instances without the need to manually optimize for resource utilization.
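Continuing the sketch above, logging that model and serving it on GPUs might look like the following; the create_service argument names reflect the preview docs, and the compute pool and image repository are placeholders:

```python
# Log the trained model, then serve it in Snowpark Container Services.
from snowflake.ml.registry import Registry

reg = Registry(session=session)
mv = reg.log_model(model, model_name="fraud_model", version_name="v1")

mv.create_service(
    service_name="fraud_model_service",
    service_compute_pool="my_gpu_pool",    # GPU compute pool you created
    image_repo="my_db.my_schema.my_repo",  # image repository for the build
    ingress_enabled=True,                  # expose an HTTPS endpoint
    gpu_requests="1",
)
```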
To keep ML models running inference in production, you can use Snowflake’s expanded set of natively integrated machine learning operations (MLOps) features, including Observability for ML Models (public preview). Teams can now quickly track, set alerts on and address degradation, drift and other model metrics directly from built-in dashboards tied to the Snowflake Model Registry. Also built into the platform is ML Explainability (public preview), which allows users to easily compute Shapley values for models logged in the Snowflake Model Registry — whether trained internally or externally.
These new ML monitoring capabilities join the set of MLOps capabilities available in Snowflake ML, including Model Registry, ML Lineage (public preview) and Feature Store (generally available).
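Building on the registry sketch above, a loosely hedged illustration of the explainability path: for supported model types, a logged model version exposes an explain function that returns per-feature Shapley values (the function name and call shape follow the preview docs and may change):

```python
# Compute Shapley values for a sample of rows via the Model Registry.
shap_values = mv.run(X.head(100), function_name="explain")
print(shap_values.head())  # one Shapley value column per input feature
```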
Snowflake Intelligence (private preview soon) is a platform to create data agents that empower business users to analyze, summarize and take action from structured and unstructured data, all in one unified, conversational interface. Snowflake Intelligence lets users connect seamlessly to enterprise data — like sales transactions, documents in knowledge bases such as SharePoint, and productivity tools such as Jira and Google Workspace — so business users can generate data-driven insights and take action all in natural language with no technical skills or coding knowledge required.
With the latest enhancements in Cortex AI and Snowflake ML, developers can confidently productionize generative AI apps within Snowflake’s secure environment.
Start building AI apps and custom models today by using these resources:
NLP solutions for data pipelines: Try our quickstart for customer reviews analytics using Snowflake Cortex.
Conversational apps: Build a RAG-based chatbot using our quickstart, or join the RAG ’n’ Roll Hackathon for a chance to win $10,000 in prizes.
GPU-based ML: Start training an XGBoost model with GPUs or scaling custom embeddings from Snowflake Notebooks.
Note: This article contains forward-looking statements, including about our future product offerings; these statements are not commitments to deliver any product offerings. Actual results and offerings may differ and are subject to known and unknown risks and uncertainties. See our latest 10-Q for more information.