In the age of AI, enterprises increasingly want to extract value from their data at scale, but many struggle to establish a data engineering foundation that can process the large volumes of data needed to build or improve models. Spark, designed for processing large data sets, has been a popular solution, yet it can be challenging to manage, especially for users who are new to big data processing or distributed systems.
To help organizations build the secure and scalable data foundation required for AI without the operational complexity, Snowflake launched Snowpark. With familiar DataFrame-style programming and custom code execution, Snowpark lets teams process their data in Snowflake using Python and other programming languages, while scaling and performance tuning are handled automatically. Snowflake customers see an average of 4.6x faster performance and 35% cost savings with Snowpark over managed Spark.
To help teams with existing Spark codebases get up and running with Snowpark, we are excited to launch the Snowpark Migration Accelerator, a free, self-service code assessment and conversion tool from Snowflake that helps developers move to Snowpark faster and more efficiently. The tool serves two primary functions: assessment and conversion.
The assessment scans any codebase written in Python or Scala and outputs a readiness score for conversion to Snowpark. Based on that assessment, the accelerator can automatically convert references from the Spark API to the Snowpark API.
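As a rough illustration of what a readiness score could express, the sketch below assumes the score is simply the share of detected Spark API references that have a known Snowpark equivalent. That definition, the `readiness_score` function, the `ApiReference` record and the sample support flags are all assumptions made for illustration, not the accelerator's actual scoring method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiReference:
    """One detected use of a Spark API element (hypothetical record shape)."""
    name: str
    supported: bool  # assumed flag: True if a Snowpark equivalent is known


def readiness_score(references: list[ApiReference]) -> float:
    """Return the percentage of references marked as supported.

    Assumption: an empty codebase has nothing to convert, so it scores 100.
    """
    if not references:
        return 100.0
    supported = sum(1 for ref in references if ref.supported)
    return 100.0 * supported / len(references)


# Illustrative input only; the support flags below are made up.
sample = [
    ApiReference("pyspark.sql.DataFrame.select", True),
    ApiReference("pyspark.sql.DataFrame.filter", True),
    ApiReference("pyspark.sql.DataFrame.groupBy", True),
    ApiReference("pyspark.rdd.RDD.mapPartitions", False),
]
print(f"Readiness: {readiness_score(sample):.1f}%")
```

A score computed this way is only as good as the inventory behind it, which is why the reports the tool generates matter as much as the headline number.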
“The Snowpark Migration Accelerator really helped us make the decision on whether to move to Snowflake. It provided us insights as to code compatibility and allowed us to better estimate our migration time.” —Alan Feuerlein, CTO of Travelpass
The Snowpark Migration Accelerator builds an internal model representing the functionality present in the codebase. This model is an Abstract Syntax Tree (AST) that is not dependent on a specific source language. As a result, the tool can take in both code files and notebooks with multiple languages (such as Scala, Python and SQL) at the same time. No source data is ever analyzed by the tool (code is the only input), and it does not connect to any source platform.
Once the model is built, the assessment generates a series of reports designed to explain what is present in the source code. Some of these are high-level summaries that report on how “ready” a codebase is for Snowpark. Others are complete inventories showing where each reference to a given API or SQL statement or internal dependency can be found.
From these inventories, the Snowpark Migration Accelerator will identify exactly what can be converted.
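To make the idea of an inventory concrete, here is a minimal sketch that uses Python's standard `ast` module to list where a snippet imports anything from `pyspark`. It is a toy: the accelerator's own model is language-independent and covers far more than imports, and the function name `inventory_spark_imports` and the sample snippet are hypothetical.

```python
import ast
from collections import defaultdict

SPARK_ROOT = "pyspark"  # top-level package treated as "Spark" in this sketch


def inventory_spark_imports(source: str) -> dict[str, list[int]]:
    """Map each imported pyspark name to the line numbers where it is imported."""
    found = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. "import pyspark" or "import pyspark.sql"
            for alias in node.names:
                if alias.name.split(".")[0] == SPARK_ROOT:
                    found[alias.name].append(node.lineno)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # e.g. "from pyspark.sql import SparkSession"
            if node.module.split(".")[0] == SPARK_ROOT:
                for alias in node.names:
                    found[f"{node.module}.{alias.name}"].append(node.lineno)
    return dict(found)


# Sample code to scan; it is parsed only, never executed.
example = '''\
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit
import pyspark
'''

for name, lines in sorted(inventory_spark_imports(example).items()):
    print(name, lines)
```

Because the source is only parsed and never run, the sketch needs neither Spark nor any data, which mirrors the point above that code is the tool's only input.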
For each element of the Spark API that is identified, the conversion engine in the Snowpark Migration Accelerator will usually perform one of the following:
Note that the tool's conversion capability is not a silver bullet. It does not execute a complete migration; rather, it identifies and converts what it can into functionally equivalent output that is compatible with Snowflake.
The tool has been designed to optimize data processing pipelines, specifically those built on Spark and Hive codebases, including:
While the Snowpark Migration Accelerator has been purpose-built to accelerate the migration of Spark and Hive codebases, Snowpark supports Python code in general. Although no acceleration is provided for non-Spark code, the tool can still offer valuable information about code size, technologies used, supported libraries and pandas usage. This information can guide efforts to orchestrate workloads within Snowflake and to optimize their performance and efficiency.
Here are a few helpful ideas to keep in mind when using the Accelerator:
Following these best practices will help you get the most out of your experience with the Snowpark Migration Accelerator. Try it today to see how smooth the on-ramp to Snowpark can be.
The Snowpark Migration Accelerator is available now for free: download the installer onto your local machine or container. You can find more information in the following resources:
The post Modern Data Engineering: Free Spark to Snowpark Migration Accelerator for Faster, Cheaper Pipelines in Snowflake appeared first on Snowflake.