July 8, 2022

Data Engineering simplified with Snowpark for Python on Snowflake

Snowpark brings deeply integrated, DataFrame-style programming to the languages developers like to use, along with functions that help them cover more data use cases easily. It enables data engineers, data scientists, and developers coding in languages like Python to take advantage of Snowflake’s powerful platform without having to first move data out of [...]

July 8, 2022

The Oracle of Lake: Predicting Big Data Processing Time at Goldman Sachs

Data Lake is the firm’s big data platform. It consists of a proprietary data store, services, and infrastructure components that are used to house and process petabytes of data every day. The Data Lake platform scales to run hundreds of thousands of ingestions and more than a million exports per day (and we are constantly [...]

July 8, 2022

Large Volume Data Replication from MySQL to Snowflake

In this talk, we will cover our journey from first introducing StreamSets into our architecture to where we are today. We will walk through the challenges of syncing and replicating large volumes of data from MySQL to Snowflake to support near real-time use cases. Additionally, we will explore how StreamSets, MySQL, [...]

July 8, 2022

Ingest at Scale & CDC Data for High Velocity Source System Infusing Data Science & Advanced Analytics

July 8, 2022

How Ad Astra Increased Agility & Customer Value with StreamSets

Moving from direct database-to-database E-T-L data flow to streaming E-L-T data flow has allowed Ad Astra to increase customer value in many ways. Shorter implementations mean less demand on short-staffed IT teams as well as quicker time-to-value. It has also freed up Ad Astra resources because we only maintain a few pipelines instead of dozens [...]