Think back just a few years, to when most enterprises were either planning or just getting started on their cloud journeys. The pandemic hit and, virtually overnight, the need to radically change ways of working pushed those cloud journeys into overdrive. Cost-effective adaptability was essential, and the companies that could scale up or down quickly were the ones that navigated the pandemic successfully. Migrating to the cloud made that possible.
Today, the game-changing benefits of generative AI are creating a renewed impetus to act just as fast and decisively. This time it’s all about ensuring that the data, and the platform where it’s processed, are ready for the new AI models.
But there’s still a long way to go in an environment where the volume, velocity and complexity of data and data types are constantly increasing. By 2025 it’s estimated that there will be 7 petabytes of data generated every day compared with “just” 2.3 petabytes daily in 2021. And it’s not just any type of data. The majority of it (80%) is now estimated to be unstructured data such as images, videos, and documents, a resource from which enterprises are still not getting much value.
In this data-rich world, organizations understand that their ability to compete from now on will rest on the availability, veracity and accessibility of the data they need. At present, however, while 83% of Accenture’s clients say that real-time data is going to be crucial for competitive advantage over the next two years, just 31% say that they’re managing that data effectively.
In other words, there’s a big gap between aspiration and reality. And as the need to securely share data — both within and beyond the enterprise — becomes mission critical, the ability to manage and create robust and trusted data pipelines is key. Yet today, 55% of enterprises say they can’t trace the lineage of their data from source to endpoint. And with structured and unstructured data held across multiple silos in many different cloud-based and on-premises locations, it’s a huge challenge. But it’s one enterprises have to solve to remain competitive.
Our research supports this. We’ve found that the highest-performing companies are 2.4x more likely to store their data in a specialized, modern data platform in the cloud. Key actions that set them apart? Breaking down data silos, removing duplication, creating trusted data products, reducing the cost of data rework, ensuring more timely insights and cross-functional use cases, and improving user adoption.
The greatest value from large-scale machine learning (ML) and generative AI will be realized when companies can rely on their own data to deliver the unique insights and recommendations that will fundamentally move the performance needle. Then they’ll be able to go from interacting with a generic internet-trained chatbot to generating highly relevant content that leverages up-to-date and potentially confidential enterprise information.
Companies that have real control over their data can put the technology to much more targeted and valuable use. Think, for example, about a life sciences business using a model narrowly trained on its proprietary trial and product data to predict the likelihood of a drug’s success much more accurately, efficiently and quickly than its competitors.
Many modern enterprises have far-flung operations, products and value chains that generate data globally and in a federated way. In order to build more targeted, discrete models like the one in the example above, they need to find a way for teams to share and access data stored on multiple clouds in secure and governed environments.
The ideal solution is to enable usage of the primary, most up-to-date data, without having to copy it from one place to another, all while meeting relevant regulatory requirements, which will continue to evolve with AI.
This approach can avoid significant and unnecessary data storage costs, of course, as well as prevent the creation of yet more data silos. But it’s also the vital means through which to enable strong governance and security by preserving, for example, fine-grained data-access controls. Finally, seamless access — via a trusted virtual “clean room” — to valuable data sets controlled by third parties opens up entirely new opportunities for value creation.
How can companies do all this, moving fast while staying safe? A comprehensive data foundation, with security and governance baked in at the digital core, is non-negotiable. This foundation must allow every team to trust all the data they use, whether it is proprietary to the enterprise or from other sources, including ecosystem partners.
And this foundation has to control access to data in more complex configurations than ever before. One of the many exciting things about gen AI is its power to democratize access to insights that were only ever previously available to AI specialists and data scientists. But lowering the barriers also raises the risks. Security and governance gain even more prominence.
Many enterprises, but by no means all, have successfully tackled phase one of the data challenge: making structured data shareable across corporate lines and with third parties. The second phase, being able to trust the explosion of unstructured, streaming, high-velocity information, is still a work in progress for the majority. The third phase, harnessing bespoke large language models (LLMs) and larger-scale ML models tuned or trained with this data, is only now emerging.
Particularly crucial to the second phase is engendering trust in data. This requires a data platform that can bring all the necessary pieces of compute to the data and make them available within the same governance boundary. With our partners at Snowflake, that’s something we help clients to achieve. By providing controls at the data layer and across clouds, Snowflake’s platform enables processing to happen next to the data. This means people enterprise-wide know their AI models are using trusted data every time. Without that assurance, there’s always the risk that models will provide faulty insights.
And for phase three, democratizing and extending the benefits of industry-leading AI and LLMs, what’s needed is a way for everyone (not just AI specialists) to be able to access and use these cutting-edge technologies and apply all their trusted data to train and prompt both custom-built and open source LLMs.
Whatever stage your organization has reached or is aiming for, a modern data platform for your digital core is a “no regrets” investment to make today. Identify areas of the business with the highest value potential and invest in optimizing how you manage and secure the data pipelines that feed them.
We are increasingly seeing our clients invest in this as a top priority. Generative AI and ML capabilities are rapidly becoming the crucial differentiator for companies across industries. In this world, every business needs to democratize access to these capabilities and ensure the data they use is trusted.
Provided they can do this, they’ll secure the competitive edge by standing out in three key ways:
The post Why a Solid Data Foundation Is the Key to Successful Gen AI appeared first on Snowflake.