Now that large language models (LLMs) make human-machine interaction in natural language possible, more data teams and developers can bring AI into their daily workflows. To do this efficiently and securely, teams must decide how to combine the knowledge of pre-trained LLMs with their organization’s private enterprise data. Getting this right helps reduce hallucinations (that is, incorrect responses), which LLMs can generate because they were trained only on data available up to a certain date.
To reduce these AI hallucinations, LLMs can be combined with private data sets through approaches that either do not require LLM customization (such as prompt engineering or retrieval augmented generation) or do require it (such as fine-tuning or retraining). To decide where to start, teams need to weigh the resources and time it takes to customize AI models against the timelines required to show ROI on generative AI investments.
While every organization should keep both options on the table, the key to delivering value quickly is to identify and deploy use cases that work with prompt engineering and retrieval augmented generation (RAG), as these can be fast and cost-effective ways to get value from enterprise data with LLMs.
To empower organizations to deliver fast wins with generative AI while keeping data secure when using LLMs, we are excited to announce that Snowflake Cortex LLM functions are now available in public preview in select AWS and Azure regions. With Snowflake Cortex, a fully managed service that runs on NVIDIA GPU-accelerated compute, there is no need to set up integrations, manage infrastructure or move data outside of the Snowflake governance boundary to use the power of industry-leading LLMs from Mistral AI, Meta and more.
So how does Snowflake Cortex make AI easy, whether you are doing prompt engineering or RAG? Let’s dive into the details and check out some code along the way.
Snowflake Cortex includes task-specific functions that work out of the box without the need to define a prompt. With them, teams can quickly and cost-effectively execute tasks such as translation, sentiment analysis and summarization. All that an analyst or any other user familiar with SQL needs to do is point the relevant function, as in the example below, at a column of a table containing text data and voilà! Snowflake Cortex functions take care of the rest, with no manual orchestration, data formatting or infrastructure to manage. This is particularly useful for teams constantly working with product reviews, surveys, call transcripts and other long-text data sources traditionally underutilized within marketing, sales and customer support teams.
SELECT SNOWFLAKE.CORTEX.SUMMARIZE(review_text) FROM reviews_table LIMIT 10;
Of course, there are going to be many use cases where customization via prompts becomes useful. For example:
All of these and more can quickly be accomplished with industry-leading foundation models from Mistral AI (Mistral Large, Mixtral 8x7B, Mistral 7B), Google (Gemma 7B) and Meta (Llama 2 70B). All of these foundation LLMs are accessible via the COMPLETE function, which, just like any other Snowflake Cortex function, can run on a table with multiple rows without any manual orchestration or LLM throughput management.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
  'mistral-large',
  CONCAT('Summarize this product review in less than 100 words. Put the product name, defect and summary in JSON format: <review>', content, '</review>')
) FROM reviews LIMIT 10;
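To make the per-row behavior concrete, here is a minimal Python sketch of the prompt text that the CONCAT expression above produces for each row. The helper name `build_prompt` and the sample reviews are hypothetical and exist only for illustration; the sketch assembles the prompt string and does not call any model.

```python
# Fixed instruction text, copied from the CONCAT(...) expression in the query above.
INSTRUCTION = (
    "Summarize this product review in less than 100 words. "
    "Put the product name, defect and summary in JSON format: "
)


def build_prompt(content: str) -> str:
    """Wrap one review in <review> tags after the fixed instruction (hypothetical helper)."""
    return f"{INSTRUCTION}<review>{content}</review>"


# Made-up rows standing in for the `content` column of the reviews table.
reviews = [
    "The kettle leaks at the base.",
    "Great blender, but the lid cracked.",
]

# One prompt per row, which is what the query sends to the model for each review.
prompts = [build_prompt(review) for review in reviews]
for prompt in prompts:
    print(prompt)
```

Wrapping the row text in explicit tags keeps the instruction and the review clearly separated, which can make it easier for the model to tell which part is data.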
For use cases such as chatbots on top of documents, it may be costly to put all the documents in the prompt as context. In such a scenario, a different approach that minimizes the volume of tokens going into the LLM may be more cost-effective (as a general rule of thumb, 75 words equal approximately 100 tokens). A popular framework for solving this problem without changing the LLM is RAG, which is easy to do in Snowflake.
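The rule of thumb above can be turned into a rough back-of-the-envelope estimate. The sketch below is only an approximation based on word counts, not a real tokenizer, and the corpus sizes (200 documents of 1,500 words, 3 retrieved chunks of 300 words) are made-up numbers chosen to illustrate the gap between sending everything and sending only a few relevant passages.

```python
WORDS_PER_100_TOKENS = 75  # rule of thumb: 75 words is roughly 100 tokens


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the whitespace-separated word count."""
    words = len(text.split())
    # Integer ceiling of words * 100 / 75, so the estimate never rounds down.
    return (words * 100 + WORDS_PER_100_TOKENS - 1) // WORDS_PER_100_TOKENS


# Hypothetical scenario: every document in the prompt vs. a few retrieved chunks.
document = "word " * 1500  # stand-in for a 1,500-word document
chunk = "word " * 300      # stand-in for a 300-word retrieved chunk

full_corpus_tokens = 200 * estimate_tokens(document)
retrieved_tokens = 3 * estimate_tokens(chunk)

print(f"All documents as context: ~{full_corpus_tokens} tokens")
print(f"Three retrieved chunks:   ~{retrieved_tokens} tokens")
```

Under these assumed sizes the full corpus comes to roughly 400,000 tokens per request, while three retrieved chunks come to roughly 1,200.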
Let’s go over the basics of RAG before jumping into how to do this in Snowflake.
RAG is a popular framework in which an LLM gets access to a specific knowledge base with the most up-to-date, accurate information available before generating a response. Because there is no need to retrain the model, this extends the capability of any LLM to specific domains in a cost-effective way.
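As a rough illustration of the retrieve, augment and generate flow, here is a minimal, self-contained Python sketch. Everything in it is an assumption made for illustration: the "embedding" is a toy word-count vector rather than a real embedding model, the knowledge base is three made-up sentences, and the function names are hypothetical. The generation step is not implemented; in practice the final prompt would be sent to an LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real system would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, knowledge_base: list[str], k: int = 1) -> list[str]:
    """Retrieval: return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(knowledge_base, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def augment(question: str, passages: list[str]) -> str:
    """Augmentation: place the retrieved passages in the prompt as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\nQuestion: {question}"


# Made-up knowledge base standing in for an organization's documents.
knowledge_base = [
    "Returns are accepted within 30 days of delivery.",
    "Standard shipping takes 5 business days.",
    "Gift cards never expire.",
]

question = "How many days does shipping take?"
passages = retrieve(question, knowledge_base, k=1)
prompt = augment(question, passages)

# Generation would happen here by sending `prompt` to an LLM (not done in this sketch).
print(prompt)
```

Only the retrieved passage reaches the model, which is why the approach keeps token volume low while still grounding the answer in the knowledge base.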
To deploy this retrieval, augmentation and generation framework, teams need a combination of:
Now that we understand how RAG works in general, how can we apply it in Snowflake? The Snowflake platform’s rich foundation for data governance and management, which includes the vector data type (in private preview), makes it possible to develop and deploy an end-to-end AI app using RAG without integrations, infrastructure management or data movement, using three key features:
Here is how these features map to the key architecture components of a RAG framework:
Ready to try Snowflake Cortex and its tightly integrated ecosystem of features that enable fast prototyping and agile deployment of AI apps in Snowflake? Get started with one of these resources:
To watch live demos and ask questions of Snowflake Cortex experts, sign up for one of these events:
Want to network with peers and learn from other industry and Snowflake experts about how to use the latest generative AI features? Make sure to join us at Snowflake Data Cloud Summit in San Francisco this June!
The post Easy and Secure LLM Inference and Retrieval Augmented Generation (RAG) Using Snowflake Cortex appeared first on Snowflake.