Federated Learning on Non-IID Data Silos: An Experimental Study


Paper: arxiv.org/abs/2102.02079

Authors: Qinbin Li, Yiqun Diao, Quan Chen, Bingsheng He

Abstract: Due to the increasing privacy concerns and data regulations, training data have been increasingly fragmented, forming distributed databases of multiple “data silos” (e.g., within different organizations and countries). To develop effective machine learning services, there is a must to exploit data from such distributed databases without exchanging the raw data. Recently, federated learning (FL) has been a solution with growing interests, which enables multiple parties to collaboratively train a machine learning model without exchanging their local data. A key and common challenge on distributed databases is the heterogeneity of the data distribution among the parties. The data of different parties are usually non-independently and identically distributed (i.e., non-IID). There have been many FL algorithms to address the learning effectiveness under non-IID data settings. However, there lacks an experimental study on systematically understanding their advantages and disadvantages, as previous studies have very rigid data partitioning strategies among parties, which are hardly representative and thorough. In this paper, to help researchers better understand and study the non-IID data setting in federated learning, we propose comprehensive data partitioning strategies to cover the typical non-IID data cases. Moreover, we conduct extensive experiments to evaluate state-of-the-art FL algorithms. We find that non-IID does bring significant challenges in learning accuracy of FL algorithms, and none of the existing state-of-the-art FL algorithms outperforms others in all cases. Our experiments provide insights for future studies of addressing the challenges in “data silos”.

About this baseline

What’s implemented: The code in this directory replicates many experiments from the aforementioned paper. Specifically, it contains implementations of four FL protocols: FedAvg (McMahan et al., 2017), SCAFFOLD (Karimireddy et al., 2019), FedProx (Li et al., 2018), and FedNova (Wang et al., 2020). The protocols are evaluated under various non-IID data partitioning strategies across clients on three image classification datasets: MNIST, CIFAR10, and Fashion-MNIST.

Datasets: MNIST, CIFAR10, and Fashion-MNIST from PyTorch’s Torchvision

Hardware Setup: These experiments were run on a Linux server with 56 CPU threads and 250 GB of RAM. There are 105 configurations to run per seed, and at any time 7 configurations were run in parallel. The experiments took close to 12 hours to finish for one seed. Nevertheless, a subset of configurations, such as a single FL protocol across all datasets and splits, can be run in reasonable time on a machine with 4-8 threads and 16 GB of memory.

Contributors: Aashish Kolluri, PhD Candidate, National University of Singapore

Experimental Setup

Task: Image classification

Model: This directory implements CNNs as mentioned in the paper (Section V, paragraph 1). Specifically, the CNNs have two 2D convolutional layers with 6 and 16 output channels, kernel size 5, and stride 1.
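As a sanity check on these dimensions, the flattened feature size entering the fully connected layers can be computed by hand. This sketch assumes a 2×2 max pool after each convolution (a LeNet-style layout; the pooling is our assumption, not stated in the paragraph above):

```python
def conv_out(size, kernel=5, stride=1):
    """Spatial size after a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    """Spatial size after a 2x2 max pool."""
    return size // pool

def flattened_features(input_size, channels_out=16):
    """Flattened feature count after conv(6) -> pool -> conv(16) -> pool."""
    s = pool_out(conv_out(input_size))  # first conv + pool
    s = pool_out(conv_out(s))           # second conv + pool
    return channels_out * s * s

print(flattened_features(32))  # CIFAR10 (32x32): 16 * 5 * 5 = 400
print(flattened_features(28))  # MNIST (28x28): 16 * 4 * 4 = 256
```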

Dataset: This directory has the three image classification datasets used in the baseline: MNIST, CIFAR10, and Fashion-MNIST. Five data-splitting strategies are used: IID and four non-IID strategies based on label skew. In the first non-IID strategy, the data for each label is split across clients according to proportions sampled from a Dirichlet distribution (with parameter 0.5). In the remaining three strategies, each client receives data from #C randomly chosen labels, where #C is 1, 2, or 3; clients assigned the same label split that label’s data equally among themselves. The baseline uses 10 clients. The following table shows all dataset and data-splitting configurations.

| Datasets | #classes | #partitions | partitioning method | partition settings |
| --- | --- | --- | --- | --- |
| CIFAR10, MNIST, Fashion-MNIST | 10 | 10 | IID | NA |
| | | | Dirichlet | distribution parameter 0.5 |
| | | | sort and partition | 1 label per client |
| | | | sort and partition | 2 labels per client |
| | | | sort and partition | 3 labels per client |
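The Dirichlet strategy above can be sketched as a stand-alone function (an illustrative implementation, not the code in this directory; we sample Dirichlet proportions via normalized Gamma draws to stay within the standard library):

```python
import random
from collections import defaultdict

def dirichlet_proportions(alpha, n, rng):
    """Sample one draw from Dirichlet(alpha, ..., alpha) via normalized Gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def dirichlet_partition(labels, num_clients=10, alpha=0.5, seed=0):
    """Split sample indices across clients, drawing per-label client
    proportions from a Dirichlet(alpha) distribution."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    clients = [[] for _ in range(num_clients)]
    for idxs in by_label.values():
        rng.shuffle(idxs)
        props = dirichlet_proportions(alpha, num_clients, rng)
        start, cum = 0, 0.0
        for c, p in enumerate(props):
            cum += p
            # last client takes the remainder so nothing is dropped
            end = len(idxs) if c == num_clients - 1 else int(round(cum * len(idxs)))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

labels = [i % 10 for i in range(1000)]  # toy labels: 100 samples per class
parts = dirichlet_partition(labels)
assert sum(len(p) for p in parts) == len(labels)
```

With a small `alpha` such as 0.5, the sampled proportions are highly skewed, so each client ends up with very different amounts of each label.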

Training Hyperparameters: The four FL algorithms share many hyperparameters and differ in a few algorithm-specific ones. The following table shows the common hyperparameters and their default values.

| Description | Default Value |
| --- | --- |
| total clients | 10 |
| clients per round | 10 |
| number of rounds | 50 |
| number of local epochs | 10 |
| client resources | {'num_cpus': 4.0, 'num_gpus': 0.0} |
| dataset name | cifar10 |
| data partition | Dirichlet (0.5) |
| batch size | 64 |
| momentum for SGD | 0.9 |
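For reference, the aggregation step that FedAvg performs each round with these defaults is a dataset-size-weighted average of the client parameters. This is an illustrative sketch (treating each model as a flat list of floats), not the code in this directory:

```python
def fedavg_aggregate(client_params, client_sizes):
    """Weighted average of client parameter vectors by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

# The second client holds 3x the data, so its weights dominate the average.
agg = fedavg_aggregate([[1.0, 1.0], [3.0, 3.0]], [100, 300])
print(agg)  # [2.5, 2.5]
```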

For the FedProx algorithm, the proximal parameter is tuned over the values {0.001, 0.01, 0.1, 1.0} in the experiments. The default value is 0.01.
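The proximal parameter being tuned is the coefficient of the penalty FedProx adds to each client's local objective, discouraging the local model from drifting far from the global model. A minimal sketch (treating parameters as flat lists of floats; the function name is ours):

```python
def fedprox_loss(task_loss, local_w, global_w, mu=0.01):
    """FedProx local objective: task loss + (mu / 2) * ||w_local - w_global||^2."""
    proximal = 0.5 * mu * sum((w - g) ** 2 for w, g in zip(local_w, global_w))
    return task_loss + proximal

# With mu = 0.1 and a squared distance of 4, the penalty adds 0.05 * 4 = 0.2.
print(fedprox_loss(0.5, [1.0, 2.0], [1.0, 0.0], mu=0.1))
```

A larger `mu` keeps clients closer to the global model (helpful under strong non-IID skew), while `mu = 0` recovers plain FedAvg local training.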

Environment Setup

```bash
# Set up the base poetry environment from the niid_bench directory
# Set the python version
pyenv local 3.10.6
# Tell poetry to use python 3.10
poetry env use 3.10.6
# Now install the environment
poetry install
# Start the shell
poetry shell
```

Running the Experiments

You can run any of the four algorithms: FedAvg, SCAFFOLD, FedProx, and FedNova. To run one of them, use its corresponding config file. For instance, the following commands run with the defaults provided in the corresponding configuration files.

```bash
# Run with the default config; this runs FedAvg in CPU-only mode
python -m niid_bench.main
# Enable GPU use by the server and the clients
python -m niid_bench.main server_device=cuda client_resources.num_gpus=0.2
```

To change the configuration, such as the dataset or hyperparameters, specify the overrides as command-line arguments.

```bash
python -m niid_bench.main --config-name scaffold_base dataset_name=mnist partitioning=iid # iid
python -m niid_bench.main --config-name fedprox_base dataset_name=mnist partitioning=dirichlet # dirichlet
python -m niid_bench.main --config-name fednova_base dataset_name=mnist partitioning=label_quantity labels_per_client=3 # sort and partition
```

Expected Results

We provide the Python script run_exp.py that can be used to run all configurations. For instance, the following command runs all of them, with 4 configurations running at a time. Consider lowering --num-processes if your machine is slow. With --num-processes 1, one experiment is run at a time.

```bash
python run_exp.py --seed 42 --num-processes 4
```

The above command generates results that can be parsed to obtain the best accuracy across all rounds for each configuration, which can then be presented in a table (similar to Table 3 in the paper).
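The summarization just described, best accuracy across rounds per seed, then mean ± std across seeds, can be sketched as follows (a hypothetical helper, not part of run_exp.py):

```python
import statistics

def summarize(acc_per_round_per_seed):
    """Best accuracy across rounds for each seed, then mean +/- std across seeds."""
    best = [max(rounds) for rounds in acc_per_round_per_seed]
    mean = statistics.mean(best)
    std = statistics.stdev(best) if len(best) > 1 else 0.0
    return f"{mean:.2f} ± {std:.2f}"

# One accuracy list per seed (toy numbers, three rounds each)
runs = [[55.1, 70.2, 71.3], [54.0, 69.8, 71.7], [56.2, 70.5, 70.9]]
print(summarize(runs))  # 71.30 ± 0.40
```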

| Dataset | partitioning method | FedAvg | SCAFFOLD | FedProx | FedNova |
| --- | --- | --- | --- | --- | --- |
| MNIST | IID | 99.09 ± 0.05 | 99.06 ± 0.15 | 99.16 ± 0.04 | 99.05 ± 0.06 |
| MNIST | Dirichlet (0.5) | 98.89 ± 0.07 | 99.07 ± 0.06 | 99.02 ± 0.02 | 98.03 ± 0.06 |
| MNIST | Sort and Partition (1) | 19.33 ± 11.82 | 9.93 ± 0.12 | 51.79 ± 26.75 | 52.58 ± 14.08 |
| MNIST | Sort and Partition (2) | 96.86 ± 0.30 | 96.92 ± 0.52 | 96.85 ± 0.15 | 96.65 ± 0.39 |
| MNIST | Sort and Partition (3) | 97.86 ± 0.34 | 97.91 ± 0.10 | 97.85 ± 0.06 | 97.62 ± 0.07 |
| FMNIST | IID | 89.23 ± 0.45 | 89.33 ± 0.27 | 89.42 ± 0.09 | 89.36 ± 0.09 |
| FMNIST | Dirichlet (0.5) | 88.09 ± 0.29 | 88.44 ± 0.25 | 88.15 ± 0.42 | 88.22 ± 0.12 |
| FMNIST | Sort and Partition (1) | 28.39 ± 17.09 | 10.00 ± 0.00 | 32.65 ± 6.68 | 16.86 ± 9.30 |
| FMNIST | Sort and Partition (2) | 78.10 ± 2.51 | 33.80 ± 41.22 | 78.05 ± 0.99 | 71.67 ± 2.34 |
| FMNIST | Sort and Partition (3) | 82.43 ± 1.52 | 80.32 ± 5.03 | 82.99 ± 0.48 | 81.97 ± 1.34 |
| CIFAR10 | IID | 71.32 ± 0.33 | 71.66 ± 1.13 | 71.26 ± 1.18 | 70.69 ± 1.14 |
| CIFAR10 | Dirichlet (0.5) | 62.47 ± 0.43 | 68.08 ± 0.96 | 65.63 ± 0.08 | 63.89 ± 1.40 |
| CIFAR10 | Sort and Partition (1) | 10.00 ± 0.00 | 10.00 ± 0.00 | 12.71 ± 0.96 | 10.00 ± 0.00 |
| CIFAR10 | Sort and Partition (2) | 51.17 ± 1.09 | 49.42 ± 2.18 | 50.44 ± 0.79 | 46.9 ± 0.66 |
| CIFAR10 | Sort and Partition (3) | 59.11 ± 0.87 | 61.00 ± 0.91 | 59.20 ± 1.18 | 57.83 ± 0.42 |