---
title: Protecting Federated Learning from Extreme Model Poisoning Attacks via Multidimensional Time Series Anomaly Detection
url: https://arxiv.org/abs/2303.16668
labels: [robustness, model poisoning, anomaly detection, autoregressive model, regression, classification]
dataset: [MNIST, FashionMNIST]
---

# FLANDERS: Protecting Federated Learning from Extreme Model Poisoning Attacks via Multidimensional Time Series Anomaly Detection

[View on GitHub](https://github.com/adap/flower/blob/main/baselines/flanders)

> Note: If you use this baseline in your work, please remember to cite the original authors of the paper as well as the Flower paper.

**Paper:** [arxiv.org/abs/2303.16668](https://arxiv.org/abs/2303.16668)

**Authors:** Edoardo Gabrielli, Gabriele Tolomei, Dimitri Belli, Vittorio Miori

**Abstract:** Current defense mechanisms against model poisoning attacks in federated learning (FL) systems have proven effective up to a certain threshold of malicious clients. In this work, we introduce FLANDERS, a novel pre-aggregation filter for FL resilient to large-scale model poisoning attacks, i.e., when malicious clients far exceed legitimate participants. FLANDERS treats the sequence of local models sent by clients in each FL round as a matrix-valued time series. Then, it identifies malicious client updates as outliers in this time series by comparing actual observations with estimates generated by a matrix autoregressive forecasting model maintained by the server. Experiments conducted in several non-iid FL setups show that FLANDERS significantly improves robustness across a wide spectrum of attacks when paired with standard and robust existing aggregation methods.

## About this baseline

**What’s implemented:** The code in this directory replicates the results of FLANDERS+\[baseline\] on MNIST and Fashion-MNIST under all attack settings (Gaussian, LIE, OPT, and AGR-MM) with $r=[0.2,0.6,0.8]$ (i.e., the fraction of malicious clients). Specifically, it reproduces Tables 1, 3, 10, 11, 15, 17, 19, 20 and Figure 3.

**Datasets:** MNIST, FMNIST

**Hardware Setup:** AMD Ryzen 9, 64 GB RAM, and an NVIDIA 4090 GPU with 24 GB VRAM.

**Estimated time to run:** On the setup above, experiments without attacks take about 2 minutes on *MNIST* and 3 minutes on *Fashion-MNIST*. On an Apple M2 Pro with 16 GB RAM, each MNIST experiment with 10 clients runs in about 24 minutes. Note that experiments with OPT (fang) and AGR-MM (minmax) can be up to 5x slower.

**Contributors:** Edoardo Gabrielli, Sapienza University of Rome ([GitHub](https://github.com/edogab33), [Scholar](https://scholar.google.com/citations?user=b3bePdYAAAAJ))

## Experimental Setup

Please check out Appendices F and G of the paper for a comprehensive overview of the hyperparameter setup; a summary follows.

**Task:** Image classification

**Models:**

MNIST (multiclass classification, fully connected, feed-forward NN):

- Multilayer Perceptron (MLP)
- minimizing multiclass cross-entropy loss using the Adam optimizer
- input: 784
- hidden layer 1: 128
- hidden layer 2: 256

Fashion-MNIST (multiclass classification, fully connected, feed-forward NN):

- Multilayer Perceptron (MLP)
- minimizing multiclass cross-entropy loss using the Adam optimizer
- input: 784
- hidden layer 1: 256
- hidden layer 2: 128
- hidden layer 3: 64
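The lists above describe the architectures only at the layer level. As an illustration, here is a minimal PyTorch sketch of the MNIST MLP under those assumptions; the class name, activation functions, and dropout placement are not specified in the paper summary and are hypothetical (dropout 0.2 and the Adam learning rate come from the hyperparameter table below), so this is not the repository's exact implementation.

```python
# Minimal sketch of the MNIST MLP described above: 784 -> 128 -> 256 -> 10,
# multiclass cross-entropy loss, Adam optimizer. Activation choice and dropout
# placement are assumptions; see the repository code for the actual model.
import torch
import torch.nn as nn


class MnistMLP(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, dropout: float = 0.2):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),        # 28x28 image -> 784-dim vector
            nn.Linear(784, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 256),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(256, 10),  # one logit per MNIST class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


model = MnistMLP()
criterion = nn.CrossEntropyLoss()  # multiclass cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

The Fashion-MNIST model follows the same pattern with hidden layers of size 256, 128, and 64.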
**Dataset:** Every dataset is partitioned into two disjoint sets: 80% for training and 20% for testing. The training set is distributed across all 100 clients using a Dirichlet distribution with $\alpha=0.5$, simulating a highly non-i.i.d. scenario, while the testing set is uniform and held by the server to evaluate the global model.

| Description | Default Value |
| ----------- | ------------- |
| Partitions | 100 |
| Evaluation | centralized |
| Training set | 80% |
| Testing set | 20% |
| Distribution | Dirichlet |
| $\alpha$ | 0.5 |

**Training Hyperparameters:**

| Dataset | # of clients | Clients per round | # of rounds | Batch size | Learning rate | Optimizer | Dropout | Alpha | Beta | # of clients to keep | Sampling |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| MNIST | 100 | 100 | 50 | 32 | $10^{-3}$ | Adam | 0.2 | 0.0 | 0.0 | $m - b$ | 500 |
| FMNIST | 100 | 100 | 50 | 32 | $10^{-3}$ | Adam | 0.2 | 0.0 | 0.0 | $m - b$ | 500 |

Here $m$ is the number of clients participating in the $n$-th round and $b$ is the number of malicious clients. The *Sampling* column indicates how many model parameters the MAR forecasting model analyzes.

## Environment Setup

```bash
# Use a version of Python >=3.9 and <3.12.0.
pyenv local 3.10.12
poetry env use 3.10.12

# Install everything from the toml
poetry install

# Activate the env
poetry shell
```

## Running the Experiments

Ensure that the environment is properly set up, then run:

```bash
python -m flanders.main
```

This executes a single experiment with the default values in `conf/base.yaml`.

To run custom experiments, you can override the default values like this:

```bash
python -m flanders.main dataset=mnist server.attack_fn=lie server.num_malicious=1
```

To run multiple custom experiments:

```bash
python -m flanders.main --multirun dataset=mnist,fmnist server.attack_fn=gaussian,lie,fang,minmax server.num_malicious=0,1,2,3,4,5
```

## Expected Results

To run all the experiments of the paper (for MNIST and Fashion-MNIST), I've set up a script:

```bash
sh run.sh
```

This will write the results to `outputs/all_results.csv`. To generate the plots and tables displayed below, you can use the notebook in the `plotting/` directory (a minimal plotting sketch is also shown at the end of this page).

### Accuracy over multiple rounds

**(left) MNIST, FLANDERS+FedAvg with 80% of malicious clients (b = 80); (right) Vanilla FedAvg in the same setting:**

![acc_over_rounds](_static/screenshot-8.png)

### Precision and Recall of FLANDERS

**b = 20:**

![alt text](_static/screenshot-4.png)

---

**b = 60:**

![alt text](_static/screenshot-5.png)

---

**b = 80:**

![alt text](_static/screenshot-6.png)

### Accuracy w.r.t. number of attackers

**b = 0:**

![alt text](_static/screenshot.png)

---

**b = 20:**

![alt text](_static/screenshot-1.png)

---

**b = 60:**

![alt text](_static/screenshot-2.png)

---

**b = 80:**

![alt text](_static/screenshot-3.png)
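For a quick look at the results without opening the `plotting/` notebook, the sketch below reads `outputs/all_results.csv` with pandas and plots accuracy over rounds for one configuration. The column names (`dataset`, `attack_fn`, `num_malicious`, `round`, `accuracy`) are hypothetical and must be adjusted to match the actual CSV header produced by `run.sh`.

```python
# Minimal sketch for inspecting outputs/all_results.csv; column names are
# hypothetical -- check the actual header of the CSV produced by run.sh.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("outputs/all_results.csv")

# Example: accuracy over rounds for MNIST under the LIE attack,
# one curve per number of malicious clients b.
subset = df[(df["dataset"] == "mnist") & (df["attack_fn"] == "lie")]
for num_malicious, run in subset.groupby("num_malicious"):
    plt.plot(run["round"], run["accuracy"], label=f"b = {num_malicious}")

plt.xlabel("Round")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```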