September 13-16, 2022
Dublin, Ireland + Virtual
Note: The schedule is subject to change.

All times are shown in Irish Standard Time (UTC+1).

Friday, September 16 • 15:55 - 16:35
Training AI To Code Using The Largest Code Dataset (Project CodeNet) - Tommy Li & Animesh Singh, IBM

Project CodeNet is a large dataset of 14 million code samples totaling 500 million lines of code in 55 programming languages. It enables machine learning on code, including finding code similarity, extracting semantic context, and even translating between programming languages. Using the Machine Learning Exchange (MLX), a Linux Foundation for AI & Data Sandbox Project, we demonstrate how Project CodeNet can be used to classify code and analyze code complexity in three steps. First, using DataShim, we turn domain-specific subsets of the data into Kubernetes Custom Resources. Second, running Jupyter notebooks on Kubernetes, we use those datasets to train deep learning models. Third, the models are containerized and served for inferencing on Kubernetes. For each of these steps, MLX generates Kubeflow Pipelines on Tekton, so data scientists do not have to write Kubernetes-specific code. With the curated datasets, example notebooks, and pre-trained models, teams of data scientists can use the Machine Learning Exchange to bring machine learning and AI into the world of code.

Tommy Li

Senior Software Developer, IBM
Tommy Li is a senior software developer at IBM focusing on cloud, Kubernetes, and machine learning. He is a Kubeflow committer and has worked on various open-source projects related to Kubernetes, microservices, and deep learning applications to provide advanced use cases on...

Animesh Singh

Distinguished Engineer and CTO - Watson Data and AI OSS Platform, IBM
Animesh Singh is CTO and Director for IBM Watson Data and AI Open Technology, responsible for Data and AI Open Technology strategy. Creating, designing and implementing IBM's Data and AI engine for AI and ML platform, leading IBM's Trusted AI efforts, driving the strategy and execution...

Friday September 16, 2022 15:55 - 16:35 IST
Wicklow Meeting Room 2 (Level 2)