Structuring the Unstructured Data: Powered by Snowflake Cortex AI Functions

For years, the multilayered data architecture of raw → transformed → curated has been the gold standard for transforming structured data into analytics-ready assets, bringing discipline, clarity and trust to the data lifecycle. In this established pipeline, the raw layer ingests data and preserves its original state; the transformed layer cleanses, enriches and integrates that data into a conformed view; and the curated layer delivers optimized data for direct business consumption.

But what about the vast universe of unstructured data that organizations generate daily? Valuable information is locked within call transcripts, support tickets and legal contracts, as well as in images and videos. Despite its immense potential, this data often languishes in fragmented silos, managed by ad hoc scripts. This disjointed approach leads to inconsistent insights, slower decision-making and a significant missed opportunity to unlock its true value.

It’s time to apply the same rigor to unstructured data.

We are introducing a new way to structure unstructured data, powered by Snowflake Cortex AI Functions: a repeatable workflow that brings unstructured data directly into your data warehouse and transforms it into structured, actionable insights. At its heart is a reimagined transformed stage, which uses Cortex AI Functions to turn raw unstructured data into extracted entities, sentiment scores, summaries and more, directly in SQL. From there, these enriched outputs flow into the curated layer, ready to power business intelligence (BI) dashboards, machine learning (ML) pipelines and natural-language exploration with Snowflake Cortex Analyst.

Introducing the new transformed layer for unstructured data 

In this framework, the transformed layer is the critical link between messy, unstructured text and structured, measurable analytics. It’s where raw text becomes something the business can trend, measure and act on.

Key principles of this layer include:

  • Stay native: Process all unstructured data directly in Snowflake using Cortex AI Functions. There’s no need to incur data transfer taxes for natural language processing, which simplifies your architecture and improves governance.

  • Align with business: Focus on extracting concepts that are meaningful to the business, such as identifying a call’s escalation reason, a contract’s key terms or a customer’s buying stage.

  • Create reusable assets: Create structured data that can feed multiple downstream applications, from BI dashboards to ML models and operational systems, maintaining a single source of truth.

The transformed layer is all about transforming the text data itself, enriching it with meaningful context before it’s ever queried.

The workflow for unstructured data

The workflow follows a familiar pattern, but with a new layer of intelligence:

  • Raw layer: This initial layer uses Snowflake Openflow to connect to source systems and ingest raw unstructured data. It holds the full, unedited text along with any metadata, providing a foundation for traceability and auditing.

  • Transformed layer: This is where the value is generated. Cortex AI Functions can turn raw text, audio and image data into an easily consumable structured format.  

  • Curated layer: This layer integrates the newly structured data with other enterprise data sets. Here, you create curated tables with key performance indicators (KPIs) and business-critical metrics.

  • Consumption layer: The final destination for your insights. The data is now ready to be consumed by BI tools, ML pipelines and Cortex Analyst for natural-language queries.
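As a rough illustration of how these layers might map onto Snowflake objects, the sketch below uses one schema per layer. Every schema, table, view and column name here is a hypothetical chosen for the call center scenario, not something prescribed by Snowflake; adapt the names to your own conventions.

```sql
-- Hypothetical layout: one schema per layer. All object names are assumptions.
CREATE SCHEMA IF NOT EXISTS raw;
CREATE SCHEMA IF NOT EXISTS transformed;
CREATE SCHEMA IF NOT EXISTS curated;

-- Raw layer: the full, unedited text plus source metadata for traceability.
CREATE TABLE IF NOT EXISTS raw.call_transcripts (
    call_id         VARCHAR,
    customer_id     VARCHAR,
    call_started_at TIMESTAMP_NTZ,
    source_file     VARCHAR,
    transcript_text VARCHAR
);

-- Transformed layer: one structured row per call, to be filled by Cortex AI Functions.
CREATE TABLE IF NOT EXISTS transformed.call_insights (
    call_id                   VARCHAR,
    call_reason               VARCHAR,
    is_escalation             BOOLEAN,
    sentiment                 VARCHAR,
    similarity_to_known_issue FLOAT,
    summary                   VARCHAR,
    processed_at              TIMESTAMP_LTZ
);

-- Curated layer: KPIs that join the structured output back to the source metadata.
CREATE OR REPLACE VIEW curated.daily_call_kpis AS
SELECT
    DATE_TRUNC('day', r.call_started_at) AS call_date,
    i.call_reason,
    COUNT(*)                             AS call_count,
    COUNT_IF(i.is_escalation)            AS escalation_count
FROM transformed.call_insights AS i
JOIN raw.call_transcripts AS r
  ON r.call_id = i.call_id
GROUP BY 1, 2;
```

Keeping call_id on every layer is the design choice that lets a curated metric be traced back to the exact transcript it came from.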

Powering the transformed layer with Cortex AI Functions

Snowflake’s Cortex AI Functions are the engine of the transformed layer for unstructured data, designed to unlock insights from text, audio and images directly in your data warehouse. Here are some examples of Cortex AI Functions. For more information, refer to this blog post and the Snowflake documentation.

  • AI_COMPLETE: Use this general-purpose function for extracting key information or generating a concise summary from a single text or image record.

  • AI_CLASSIFY: Categorize content into a predefined business taxonomy, for example sorting customer calls into “billing_issue,” “technical_support” or “cancellation.”

  • AI_FILTER: Quickly identify rows that meet specific, business-defined conditions. This is well suited to filtering out nonessential data or flagging important events, such as detecting whether a support ticket is a complaint.

  • AI_SIMILARITY: Find similar cases or documents, which is ideal for matching new issues to known problems for faster resolution.

  • AI_AGG / AI_SUMMARIZE_AGG: Summarize insights across a large number of records to generate high-level summaries for executive reporting.

  • AI_EMBED: Generate vector embeddings for text or images, enabling advanced semantic search and similarity comparisons.

  • AI_TRANSCRIBE: Convert spoken language from audio files into text, making audio data searchable and analyzable within Snowflake.

These functions enable you to move beyond simple keyword searches and perform sophisticated, business-aligned analysis on your text data in a consistent and governed way.
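To make a few of the row-level functions above concrete, here is a minimal sketch that combines AI_FILTER, AI_CLASSIFY and AI_SIMILARITY in one query. The table raw.support_tickets, its columns and the known-issue sentence are hypothetical. The sketch also assumes that AI_CLASSIFY returns an object whose labels field holds the chosen categories.

```sql
-- Hypothetical table: raw.support_tickets(ticket_id, ticket_text).
SELECT
    ticket_id,
    -- AI_CLASSIFY returns an object; the chosen category is read from its labels array.
    AI_CLASSIFY(
        ticket_text,
        ['billing_issue', 'technical_support', 'cancellation']
    ):labels[0]::VARCHAR AS category,
    -- Similarity score between each ticket and an illustrative known-issue description.
    AI_SIMILARITY(
        ticket_text,
        'Customer cannot log in after resetting their password'
    ) AS similarity_to_known_issue
FROM raw.support_tickets
-- Keep only the rows the model judges to be complaints.
WHERE AI_FILTER(PROMPT('This support ticket is a complaint: {0}', ticket_text));
```

Placing AI_FILTER in the WHERE clause expresses the business condition as a row filter, with the classification and similarity columns describing only the rows that pass it.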

Real-world example: Call center analytics

Imagine a customer service organization with thousands of call transcripts, but managers can’t easily get answers to critical questions such as:

  • Why are customers calling?

  • Which cases are escalations?

  • How is customer sentiment trending?

  • Which known issues are recurring?

With the transformed layer and Cortex AI Functions, you can turn these questions into a repeatable workflow. The first step is to transform individual call transcripts into structured, row-level data.

When the original file is audio, Snowflake’s AI_TRANSCRIBE can be used to directly transcribe text from the audio file.
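A minimal sketch of that transcription step is shown below. It assumes a hypothetical stage named call_audio with a directory table enabled. It also assumes that AI_TRANSCRIBE returns a JSON object with the transcript under a text field; check the function reference for your account before relying on the exact output shape.

```sql
-- Hypothetical stage: @call_audio, with a directory table enabled.
SELECT
    relative_path AS source_file,
    -- TO_FILE builds a FILE reference from the stage name and the path within it.
    -- The :text path assumes the result object exposes the transcript as "text".
    AI_TRANSCRIBE(TO_FILE('@call_audio', relative_path)):text::VARCHAR AS transcript_text
FROM DIRECTORY(@call_audio);
```

The result can be written into the raw layer so that the transcript text sits alongside the name of the audio file it came from.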

Here is a single SQL query that demonstrates how to use multiple Cortex AI Functions to transform raw transcript text, after it has been transcribed from audio with AI_TRANSCRIBE, into a structured record.
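A minimal sketch of such a query follows. It is an illustration under stated assumptions, not the exact query behind any published results: the tables raw.call_transcripts and transformed.call_insights, the category lists, the known-issue sentence and the model name 'llama3.1-70b' are all hypothetical choices, and model availability varies by region and account.

```sql
-- Hypothetical source: raw.call_transcripts(call_id, transcript_text, ...).
CREATE OR REPLACE TABLE transformed.call_insights AS
SELECT
    call_id,
    -- Why are customers calling?
    AI_CLASSIFY(
        transcript_text,
        ['billing_issue', 'technical_support', 'cancellation']
    ):labels[0]::VARCHAR AS call_reason,
    -- Which cases are escalations? AI_FILTER returns a BOOLEAN per row.
    AI_FILTER(PROMPT(
        'The customer asked to escalate the issue or speak to a supervisor: {0}',
        transcript_text
    )) AS is_escalation,
    -- How is sentiment trending? Modeled here as a three-way classification.
    AI_CLASSIFY(
        transcript_text,
        ['positive', 'neutral', 'negative']
    ):labels[0]::VARCHAR AS sentiment,
    -- Which known issues are recurring? Compared against one illustrative issue.
    AI_SIMILARITY(
        transcript_text,
        'Customer was charged twice for the same invoice'
    ) AS similarity_to_known_issue,
    -- One-sentence summary; the model name is an assumption.
    AI_COMPLETE(
        'llama3.1-70b',
        CONCAT('Summarize this call in one sentence: ', transcript_text)
    ) AS summary,
    CURRENT_TIMESTAMP() AS processed_at
FROM raw.call_transcripts;
```

Each output column corresponds to one of the managers' questions listed earlier, which keeps the extraction aligned with the business rather than with whatever the model happens to return.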

Example output:

Using AI_AGG to create executive summaries

While functions such as AI_CLASSIFY and AI_FILTER work on a row-by-row basis, AI_AGG is an aggregate function that consolidates insights across many records. That makes it a natural fit for the curated layer of your framework, where you create high-level summaries for executive consumption.

Here is a simple example showing how AI_AGG can take a set of call transcripts and summarize the key issues into a single, cohesive statement.
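One way such an aggregation could look is sketched below, grouping by week so that each row of output is a single summary. The table raw.call_transcripts, its columns and the instruction text are hypothetical.

```sql
-- Hypothetical source: raw.call_transcripts(call_started_at, transcript_text, ...).
SELECT
    DATE_TRUNC('week', call_started_at) AS call_week,
    -- AI_AGG reduces all transcripts in the group according to the instruction.
    AI_AGG(
        transcript_text,
        'Summarize the main issues customers raised in these call transcripts in two or three sentences.'
    ) AS weekly_issue_summary
FROM raw.call_transcripts
GROUP BY call_week;
```

When a general-purpose summary is enough and no custom instruction is needed, AI_SUMMARIZE_AGG(transcript_text) can take the place of the AI_AGG call.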

Benefits of the structured framework for unstructured data

By applying the structured multilayered framework to your unstructured data, you’ll gain:

  • Governance and lineage: Keep all unstructured processing within Snowflake, maintaining a full audit trail and lineage from raw text to structured insight.

  • Consistency and reusability: Build one enrichment pipeline that can serve multiple business teams, eliminating data silos and inconsistent definitions.

  • Scalability and trust: Scale the framework to any domain, from support transcripts to legal contracts, and trace every structured fact back to its source text, building confidence in the data.

Conclusion

Ultimately, this structured approach to unstructured data, powered by Snowflake Cortex AI Functions, is transformative. It empowers you to finally treat your unstructured data — your most valuable untapped asset — with the same level of discipline, governance and rigor that you apply to the rest of your data ecosystem.

Ready to get started?

  1. Identify a high-value unstructured source, such as customer support tickets or sales calls.

  2. Define the specific values you want to extract from that text.

  3. Implement your transformed layer for unstructured data in Snowflake using Cortex AI Functions.

By bringing unstructured content into a structured multilayered framework, you can stop treating it as a difficult afterthought and start turning it into a trusted driver of strategic business decisions.
