September 13-16, 2022
Dublin, Ireland + Virtual
Note: The schedule is subject to change.

All times are shown in Irish Standard Time (UTC +1).

Tuesday, September 13 • 15:05 - 15:45
Automating Cloud-native Spark Jobs with Argo Workflows - Caelan Urquhart & Darko Janjić, Pipekit

Companies with large computational workloads often use Apache Spark combined with numerous Python packages such as PySpark, NumPy, MLlib, XGBoost, and more. Unfortunately, as teams increase the number of jobs running on a single Spark cluster, managing dependencies becomes a nightmare. Kubernetes makes it easy to use numerous packages for large data jobs in distributed environments, and Argo Workflows is the best way to run pipelines on Kubernetes. This talk demonstrates how to orchestrate common Spark jobs with Argo Workflows, from the architecture to resource and workflow definitions. We'll show how to provision Spark and Argo Workflows on Kubernetes to process large data jobs. By running some example jobs, we'll also show how Argo Workflows and Kubernetes provide distinct scaling and stability advantages for Spark users. We hope that attendees will learn the pros and cons of orchestrating their Spark jobs on Kubernetes with Argo Workflows instead of in traditional local or cloud environments.
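
As a rough illustration of the "resource and workflow definitions" mentioned in the abstract, the Python sketch below builds an Argo Workflow manifest whose single step creates a Spark job as a Kubernetes resource. This is not taken from the talk: it assumes the Kubernetes Spark operator and its SparkApplication custom resource are installed in the cluster, and the function name build_spark_workflow, the image, the file path, the "spark" service account, and the resource sizes are all hypothetical placeholders.

```python
import json


def build_spark_workflow(name, image, main_file, executors=2):
    """Return an Argo Workflow manifest (as a dict) that creates one Spark job."""
    # Assumption: the Spark operator's SparkApplication CRD exists in the cluster.
    # All values below (version, memory, service account) are placeholders.
    spark_app = {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"generateName": f"{name}-"},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.3.0",
            "driver": {"cores": 1, "memory": "1g", "serviceAccount": "spark"},
            "executor": {"instances": executors, "cores": 1, "memory": "2g"},
        },
    }
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": f"{name}-"},
        "spec": {
            "entrypoint": "run-spark",
            "templates": [
                {
                    "name": "run-spark",
                    # A resource template asks Argo to create the embedded manifest
                    # and then watch its status until one condition matches.
                    "resource": {
                        "action": "create",
                        "successCondition": "status.applicationState.state == COMPLETED",
                        "failureCondition": "status.applicationState.state == FAILED",
                        # The embedded manifest is a string; JSON is also valid YAML.
                        "manifest": json.dumps(spark_app, indent=2),
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    workflow = build_spark_workflow(
        "wordcount", "example/spark-py:latest", "local:///app/wordcount.py"
    )
    print(json.dumps(workflow, indent=2))
```

Because each job names its own container image, its Python dependencies travel with the job rather than living on a shared cluster, which is the dependency problem the abstract describes. The printed manifest could be saved to a file and passed to `argo submit`, though the details would depend on how Argo and the operator are installed.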

Caelan Urquhart

Co-founder, CEO, Pipekit
Caelan is the Co-founder and CEO of Pipekit, a control plane for Argo Workflows that enables massive data pipelines in minutes, saving engineering time and cloud spend. He's passionate about using distributed systems to solve data engineering challenges, and is a contributor to the...

Darko Janjić

Senior Software Engineer, Pipekit
Darko is a Senior Software Engineer at Pipekit, a control plane for Argo Workflows that enables massive data pipelines in minutes. He has extensive experience with distributed systems, virtualization, and cloud engineering across a variety of industries. Besides engineering, Darko...

Tuesday September 13, 2022 15:05 - 15:45 IST
Liffey Meeting Room 3 (Level 1)