📆 Thursday, August 29, 10:00 AM - 1:00 PM
📍 Centre de Convencions Internacional de Barcelona
KDD 2024:
Privacy-Preserving Federated Learning using Flower Framework
AI projects often face the challenge of limited access to meaningful amounts of training data. In traditional approaches, collecting data in a central location can be problematic, especially in industry settings with sensitive and distributed data. However, there is a solution: moving the computation to the data through Federated Learning.
Federated Learning, a distributed machine learning approach, offers a promising solution by enabling model training across devices. It is a data minimization approach in which direct access to data is not required. Furthermore, federated learning can be combined with techniques like differential privacy, secure aggregation, and homomorphic encryption to further enhance privacy protection. In this hands-on tutorial, we delve into privacy-preserving machine learning using federated learning, with the Flower framework, which is specifically designed to simplify the process of building federated learning systems, as our primary tool.
We present the foundations of federated learning, explore how different techniques can enhance its privacy properties, show how it is being used in real-world settings today, and walk through a series of practical, hands-on code examples that showcase how you can federate any AI project with Flower, an open-source framework for all things federated.
Target Audience and Prerequisites:
This tutorial is suitable for researchers, machine learning practitioners, data scientists, and developers interested in privacy-preserving machine learning techniques. Basic knowledge of machine learning concepts and Python programming is recommended. No prior experience with federated learning or the Flower framework is required.
Meet the tutors
Mohammad Naseri
Research Scientist
mohammad@flower.ai
Mohammad focuses on the privacy and security aspects of the Flower framework. He recently completed his Ph.D. at University College London (UCL). His research primarily revolves around security and privacy in machine learning, with a particular focus on federated learning. During his Ph.D. journey, Mohammad completed research internships at Microsoft Research and Telefonica. His work has been published in venues such as IEEE S&P, CCS, NDSS, ICML, and PETS.
Javier Fernandez
Lead Research Scientist
javier@flower.ai
Javier works on the core framework and develops the Flower Simulation Engine, which allows Federated Learning workloads to run in a resource-aware manner and scale to thousands of active clients. Javier's interests lie at the intersection of Machine Learning and Systems, more concretely in running on-device ML workloads, a key component of Federated Learning. Javier received his PhD in Computer Science from the University of Oxford in 2021. Before joining Flower Labs, he worked as a research scientist at Samsung AI (Cambridge, UK).
Heng Pan
Research Scientist
pan@flower.ai
Pan specializes in federated learning and the integration of secure aggregation functionality into the Flower framework. He holds a master's degree from the University of Cambridge, where he collaborated with Prof. Nic Lane. He was part of the University of Cambridge team that won first prize in the UK-US PETs Prize Challenge for creating a privacy-centric solution to detect anomalies in the SWIFT network. His expertise lies at the intersection of machine learning, federated learning, and data security.
Yan Gao
Research Scientist
yan@flower.ai
Yan is at the forefront of federated learning innovation across different types of models, including XGBoost, LLMs, and more. Prior to this role, he completed his PhD at the University of Cambridge within the Machine Learning Systems Lab. His research interests include machine learning, federated learning, self-supervised learning, and optimisation techniques. Throughout his doctoral studies, he focused on pioneering research in federated self-supervised learning, specifically targeting the challenge of working with unlabelled data across diverse domains such as audio, image, and video. This work has been recognised and published in several top-tier international conferences and journals, including ICCV, ECCV, ICLR, INTERSPEECH, ICASSP, and JMLR, marking significant contributions to the field of federated learning and its applications.
Tutorial outline
- Introduction to Federated Learning (30 mins)
- Challenges of centralized learning
- Introduce the limitations of traditional centralized machine learning approaches, such as data privacy concerns, data silos, and scalability issues.
- Define federated learning and its key components, including client devices, server aggregator, and global model.
- Explain the federated learning workflow, including model initialization, client updates, aggregation, and model updating.
- Explain and compare common aggregation strategies
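The workflow and aggregation step above can be illustrated with a minimal sketch of FedAvg, the most common aggregation strategy: the server combines client updates weighted by the number of local training examples. This is a plain-NumPy illustration with made-up names and values, not Flower API code:

```python
import numpy as np

def fed_avg(client_updates):
    """Aggregate client model parameters, weighted by local dataset size (FedAvg).

    client_updates: list of (parameters, num_examples) pairs, where
    parameters is a list of NumPy arrays (one per model layer).
    """
    total_examples = sum(n for _, n in client_updates)
    num_layers = len(client_updates[0][0])
    return [
        sum(params[layer] * (n / total_examples) for params, n in client_updates)
        for layer in range(num_layers)
    ]

# One simulated round: two clients with different amounts of local data.
updates = [
    ([np.array([1.0, 1.0, 1.0])], 10),  # client A trained on 10 examples
    ([np.array([4.0, 4.0, 4.0])], 30),  # client B trained on 30 examples
]
new_global = fed_avg(updates)  # weighted mean: 0.25 * 1.0 + 0.75 * 4.0 = 3.25
```

Weighting by example count means clients with more data pull the global model further, which is the defining choice of FedAvg compared to a plain unweighted mean.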
- Implementing Federated Learning with Flower (30 mins)
- Step-by-step setup of a federated learning environment using the Flower framework
- Live demo: Implementing a simple task for image classification.
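In Flower, a client implements three methods (get_parameters, fit, evaluate) that exchange model parameters as lists of NumPy arrays. The sketch below mirrors that interface in plain NumPy without importing Flower itself (in a real Flower app you would subclass `flwr.client.NumPyClient`); the linear-regression task and all names are illustrative assumptions, not the tutorial's demo code:

```python
import numpy as np

class LinearRegressionClient:
    """Sketch of a client exposing a NumPyClient-style interface:
    get_parameters / fit / evaluate, all exchanging lists of NumPy arrays."""

    def __init__(self, X, y):
        self.X, self.y = X, y
        self.w = np.zeros(X.shape[1])

    def get_parameters(self, config):
        return [self.w]

    def fit(self, parameters, config):
        self.w = parameters[0].copy()
        # A few steps of local gradient descent on mean squared error.
        for _ in range(config.get("epochs", 50)):
            grad = 2 * self.X.T @ (self.X @ self.w - self.y) / len(self.y)
            self.w -= 0.1 * grad
        return [self.w], len(self.y), {}

    def evaluate(self, parameters, config):
        residual = self.X @ parameters[0] - self.y
        return float(np.mean(residual ** 2)), len(self.y), {}

# Simulate one server round with two clients (no networking involved).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = X @ np.array([2.0, -1.0])          # ground-truth weights to recover
clients = [LinearRegressionClient(X[:20], y[:20]),
           LinearRegressionClient(X[20:], y[20:])]
global_w = [np.zeros(2)]
results = [c.fit(global_w, {"epochs": 100}) for c in clients]
total = sum(n for _, n, _ in results)
global_w = [sum(p[0] * n / total for p, n, _ in results)]  # FedAvg step
loss, _, _ = clients[0].evaluate(global_w, {})
```

The server never sees `X` or `y`, only the parameter lists returned by `fit`, which is the data-minimization property the tutorial builds on.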
- Flower Datasets (25 mins)
- Introduce Flower Datasets library to create datasets for federated learning
- Live demo: Present different approaches for partitioning.
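Two partitioning approaches come up repeatedly in federated datasets: uniform (IID) splits and label-skewed (non-IID) splits, commonly driven by a Dirichlet distribution. The sketch below implements both in plain NumPy as an illustration of the idea; it is not the Flower Datasets API, whose partitioners the demo covers:

```python
import numpy as np

def iid_partition(num_examples, num_partitions, seed=0):
    """Shuffle all indices, then split them evenly across clients (IID)."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(num_examples), num_partitions)

def dirichlet_partition(labels, num_partitions, alpha=0.5, seed=0):
    """Label-skewed split: for each class, distribute its examples across
    clients according to proportions drawn from Dirichlet(alpha).
    Smaller alpha -> more heterogeneous (non-IID) partitions."""
    rng = np.random.default_rng(seed)
    partitions = [[] for _ in range(num_partitions)]
    for cls in np.unique(labels):
        cls_idx = rng.permutation(np.where(labels == cls)[0])
        proportions = rng.dirichlet([alpha] * num_partitions)
        cut_points = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for pid, chunk in enumerate(np.split(cls_idx, cut_points)):
            partitions[pid].extend(chunk.tolist())
    return [np.array(p) for p in partitions]

labels = np.repeat([0, 1, 2], 100)  # toy dataset: 3 classes, 300 examples
parts = dirichlet_partition(labels, num_partitions=5, alpha=0.3)
```

With a small `alpha`, some clients end up holding mostly one class, which is exactly the heterogeneity that makes federated optimization harder than centralized training.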
- Privacy and Security Aspects of Federated Learning (45 mins)
- Differential Privacy (DP) Introduction
- Introduce the concept of differential privacy and its relevance to federated learning.
- Discuss mechanisms for incorporating differential privacy into federated learning.
- Explain the importance of secure aggregation in federated learning
- Explore cryptographic techniques for secure aggregation, including homomorphic encryption and secure multi-party computation.
- Live demo: integrate DP and SecAgg using Flower
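The two mechanisms in this section compose naturally: each client clips and noises its update (differential privacy), then adds pairwise random masks that cancel in the server-side sum (the core idea behind secure aggregation). The following is a conceptual NumPy sketch under simplifying assumptions (no dropout handling, masks drawn directly rather than from shared secrets), not the Flower implementation shown in the demo:

```python
import numpy as np

def clip_and_noise(update, clip_norm, noise_mult, rng):
    """DP step: clip the update's L2 norm to clip_norm, then add Gaussian
    noise calibrated to the clipping bound (sigma = noise_mult * clip_norm)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / norm)
    return clipped + rng.normal(0.0, noise_mult * clip_norm, size=update.shape)

def pairwise_masks(num_clients, dim, seed=0):
    """SecAgg-style masking: each pair (i, j), i < j, shares a random mask;
    client i adds it and client j subtracts it, so all masks cancel
    exactly when the server sums the updates."""
    rng = np.random.default_rng(seed)
    masks = [np.zeros(dim) for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            m = rng.normal(size=dim)
            masks[i] += m
            masks[j] -= m
    return masks

rng = np.random.default_rng(1)
updates = [rng.normal(size=4) for _ in range(3)]
private = [clip_and_noise(u, clip_norm=1.0, noise_mult=0.1, rng=rng)
           for u in updates]
masks = pairwise_masks(num_clients=3, dim=4)
masked = [p + m for p, m in zip(private, masks)]
# The server sees only masked updates, yet their sum equals the sum of the
# clipped-and-noised updates, because the pairwise masks cancel.
aggregate = sum(masked)
```

Production secure aggregation derives the pairwise masks from key agreement and handles client dropout; the sketch keeps only the cancellation property that makes individual updates unreadable to the server.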
- LLM training using FL in Flower (20 mins)
- Overview of Language Model Training
- Live demo: Hands-on Session with Flower
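Federating an LLM usually means exchanging only a small set of trainable parameters, such as LoRA-style low-rank adapters, rather than the full model, to keep per-round communication manageable. The sketch below illustrates that idea in NumPy with invented shapes and values; it is a conceptual illustration, not the Flower demo code:

```python
import numpy as np

def lora_delta(A, B):
    """LoRA-style low-rank update: the effective weight change is B @ A,
    but clients only exchange the small factors A (r x d_in), B (d_out x r)."""
    return B @ A

# Frozen base weight shared by all clients (e.g. one attention projection).
d_in, d_out, rank = 8, 8, 2
base_W = np.eye(d_out, d_in)

# Each client trains only its adapter factors locally (values illustrative).
rng = np.random.default_rng(0)
client_adapters = [(rng.normal(size=(rank, d_in)) * 0.01,
                    rng.normal(size=(d_out, rank)) * 0.01) for _ in range(4)]

# The server averages the small factors instead of full model weights,
# shrinking per-round communication from d_out*d_in to rank*(d_in+d_out).
avg_A = np.mean([A for A, _ in client_adapters], axis=0)
avg_B = np.mean([B for _, B in client_adapters], axis=0)
effective_W = base_W + lora_delta(avg_A, avg_B)
```

One subtlety worth noting: averaging the factors separately is an approximation, since the mean of the products B_i @ A_i does not in general equal the product of the means.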
- Other advanced topics in Federated Learning (15 mins)
- Heterogeneous clients, underlying data distributions, communication overheads, high degree of parallelism
- Q&A and Wrap-up (15 mins)