StatAvg: Mitigating Data Heterogeneity in Federated Learning for Intrusion Detection Systems


Authors: Pavlos S. Bouzinis, Panagiotis Radoglou-Grammatikis, Ioannis Makris, Thomas Lagkas, Vasileios Argyriou, Georgios Th. Papadopoulos, Panagiotis Sarigiannidis, George K. Karagiannidis

Abstract: Federated learning (FL) is a decentralized learning technique that enables participating devices to collaboratively build a shared Machine Learning (ML) or Deep Learning (DL) model without revealing their raw data to a third party. Due to its privacy-preserving nature, FL has sparked widespread attention for building Intrusion Detection Systems (IDS) within the realm of cybersecurity. However, the data heterogeneity across participating domains and entities presents significant challenges for the reliable implementation of an FL-based IDS. In this paper, we propose an effective method called Statistical Averaging (StatAvg) to alleviate non-independently and identically (non-iid) distributed features across local clients’ data in FL. In particular, StatAvg allows the FL clients to share their individual data statistics with the server, which then aggregates this information to produce global statistics. The latter are shared with the clients and used for universal data normalisation. It is worth mentioning that StatAvg can seamlessly integrate with any FL aggregation strategy, as it occurs before the actual FL training process. The proposed method is evaluated against baseline approaches using datasets for network and host Artificial Intelligence (AI)-powered IDS. The experimental results demonstrate the efficiency of StatAvg in mitigating non-iid feature distributions across the FL clients compared to the baseline methods.
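The statistics exchange described in the abstract can be sketched in a few lines. This is a minimal illustration and not the baseline's actual code: it assumes each client reports its sample count together with per-feature means and population variances, and that the server combines them with the law of total variance so that the global statistics match those of the pooled data. The names `local_stats`, `aggregate_stats` and `normalise` are hypothetical.

```python
import math
import random


def local_stats(rows):
    """Client side: sample count, per-feature means and population variances."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    variances = [
        sum((value - mean) ** 2 for value in col) / n
        for col, mean in zip(columns, means)
    ]
    return n, means, variances


def aggregate_stats(stats):
    """Server side: combine client statistics (assumed rule: law of total variance)."""
    total = sum(n for n, _, _ in stats)
    num_features = len(stats[0][1])
    global_mean = [
        sum(n * means[j] for n, means, _ in stats) / total
        for j in range(num_features)
    ]
    global_var = [
        sum(n * (variances[j] + (means[j] - global_mean[j]) ** 2)
            for n, means, variances in stats) / total
        for j in range(num_features)
    ]
    return global_mean, global_var


def normalise(rows, mean, var, eps=1e-12):
    """Client side: standardise local data with the shared global statistics."""
    return [
        [(x - m) / math.sqrt(v + eps) for x, m, v in zip(row, mean, var)]
        for row in rows
    ]


# Three toy clients whose features follow different distributions (non-iid features).
rng = random.Random(0)
clients = [
    [[rng.gauss(mu, sigma) for _ in range(3)] for _ in range(n)]
    for mu, sigma, n in [(0.0, 1.0, 50), (5.0, 2.0, 80), (-3.0, 0.5, 30)]
]
global_mean, global_var = aggregate_stats([local_stats(rows) for rows in clients])
normalised_clients = [normalise(rows, global_mean, global_var) for rows in clients]
```

Because only counts, means and variances leave the clients, the raw samples stay local, and every client ends up applying the same normalisation regardless of how skewed its own features are.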

About this baseline

What’s implemented: The code in this directory replicates the experiments on the TON IoT dataset from the above paper, which proposed the StatAvg algorithm. Specifically, it reproduces Figure 3 of the paper.

Datasets: TON IoT dataset (Linux memory logs), available online (see Dataset Preparation).

Hardware Setup: These experiments were run on a desktop machine with 16 CPU threads. Any machine with 4 or more CPU cores should be able to run them in a reasonable amount of time.

Contributors: Pavlos Bouzinis (Metamind Innovations)

Experimental Setup

Task: Classification of cyberattacks.

Model: A simple MLP with three hidden layers.

Dataset: This baseline only includes the TON IoT dataset. By default, it will be partitioned into 5 clients following a stratified split based on the labels. The settings are as follows:

| partitioning method | partition settings |
| --- | --- |
| stratified based on labels | 6 classes per client |
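A label-stratified split like the one above can be sketched as follows. This is an illustrative assumption about how such a partition may be built, not the baseline's own partitioning code; the function `stratified_partition` and the toy labels are hypothetical. The indices of each class are shuffled and dealt round-robin to the clients, so every client receives samples from all classes.

```python
import random
from collections import defaultdict


def stratified_partition(labels, num_clients, seed=0):
    """Split sample indices across clients while preserving label proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    partitions = [[] for _ in range(num_clients)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal the samples of this class to the clients in turn.
        for position, idx in enumerate(indices):
            partitions[position % num_clients].append(idx)
    return partitions


# Toy data: 600 samples, 6 classes, 5 clients (the default client count above).
labels = [i % 6 for i in range(600)]
partitions = stratified_partition(labels, num_clients=5)
classes_per_client = [len({labels[i] for i in part}) for part in partitions]
```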

Training Hyperparameters: The following table shows the main hyperparameters for this baseline with their default values. Defaults left blank below can be found in conf/base.yaml.

| Hyperparameter | Default Value |
| --- | --- |
| total clients | 5 |
| clients per round | |
| number of rounds | |
| local epochs | |
| client resources | {'num_cpus': 2.0, 'num_gpus': 0.0} |
| data partition | stratified based on labels (6 classes per client) |



Environment Setup

To construct the Python environment, simply run:

# Set directory to use python 3.10 (install with `pyenv install <version>` if you don't have it)
pyenv local 3.10.13

# Tell poetry to use python 3.10
poetry env use 3.10.13

# Install
poetry install

Dataset Preparation

You can download the TON_IoT dataset from its download page. There, navigate to TON_IoT datasets/Train_Test_datasets/Train_Test_Linux_dataset and download the file Train_test_linux_memory.csv. Then rename the downloaded file to dataset.csv and place it in the dataset/ directory. If you want to run the experiments with your own data, ensure your dataset is also named dataset.csv and located in the same directory. The dataset is preprocessed before training; you can modify the preprocessing code if you wish to add custom preprocessing steps.

Running the Experiments

To run the StatAvg baseline on the TON IoT dataset, first activate your Poetry environment (execute poetry shell from this directory), then run:

python -m statavg.main # this will run using the default settings in the `conf/base.yaml`

# you can override settings directly from the command line
python -m statavg.main num_rounds=20 # will set number of rounds to 20

An auxiliary directory is created at runtime to persist client state, which includes each client's data scalers/normalizers. By default, this directory is deleted when execution terminates. To prevent automatic deletion (in which case you should manually delete the directory left over from previous runs), execute the following:

# disable automated removal of the auxiliary directory of scalers
python -m statavg.main delete_scaler_dir=False
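The idea of persisting per-client scalers in an auxiliary directory and optionally removing it at the end of a run can be sketched as below. This is a hypothetical illustration under stated assumptions: the file naming, the pickle format and the helper names (`save_client_state`, `load_client_state`, `run`) are invented for the example and may differ from what the baseline actually does. Only the `delete_scaler_dir` flag name comes from the command above.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


def save_client_state(state_dir, client_id, scaler):
    """Write one client's scaler to its own file inside the auxiliary directory."""
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / f"client_{client_id}.pkl"  # hypothetical naming scheme
    with path.open("wb") as f:
        pickle.dump(scaler, f)
    return path


def load_client_state(state_dir, client_id):
    """Read a client's scaler back, e.g. at the start of the next round."""
    with (state_dir / f"client_{client_id}.pkl").open("rb") as f:
        return pickle.load(f)


def run(delete_scaler_dir=True):
    """Toy run: persist a scaler, restore it, then clean up if requested."""
    state_dir = Path(tempfile.mkdtemp()) / "scalers"
    try:
        save_client_state(state_dir, 0, {"mean": [0.1, 0.2], "var": [1.0, 4.0]})
        restored = load_client_state(state_dir, 0)
    finally:
        if delete_scaler_dir:
            shutil.rmtree(state_dir, ignore_errors=True)
    return state_dir, restored


removed_dir, restored = run(delete_scaler_dir=True)
kept_dir, _ = run(delete_scaler_dir=False)
```

Keeping the directory is mainly useful for inspecting the scalers after a run; stale files from an earlier run could otherwise be picked up by a later one, which is why manual cleanup is recommended in that case.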

Expected Results

To reproduce the results of the paper (Fig. 3, StatAvg), simply run:

python -m statavg.main   # default settings

You can also reproduce the results with FedAvg as a baseline by running:

python -m statavg.main --config-name fedavg    # run with FedAvg
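For reference, the aggregation step of FedAvg is an average of the client model parameters weighted by the number of local training examples. The sketch below shows that rule on flat parameter lists; the function name `fedavg` and the toy values are illustrative only and are not taken from the baseline's code.

```python
def fedavg(client_updates):
    """Weighted average of client parameters.

    client_updates: list of (num_examples, parameters) pairs, where
    parameters is a flat list of floats of equal length for every client.
    """
    total = sum(n for n, _ in client_updates)
    num_params = len(client_updates[0][1])
    return [
        sum(n * params[j] for n, params in client_updates) / total
        for j in range(num_params)
    ]


# Two toy clients: the second holds three times as much data as the first.
aggregated = fedavg([(1, [1.0, 2.0]), (3, [3.0, 4.0])])
```

StatAvg leaves this step unchanged: it only alters how each client normalises its data before local training starts.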

The expected results should look similar to the following figure:

Figure: Testing accuracy vs. rounds for StatAvg.

The results are saved to a pickle file in the outputs/ directory, which is created automatically when the experiments are run. The paper does not use server-side evaluation, since the server is assumed not to own any data. However, it can be enabled by executing:

# enable server-side evaluation with the data ratio of your preference. Default settings do not include this option.
python -m statavg.main include_testset.flag=true include_testset.ratio=0.15
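One plausible reading of the ratio setting is that a random fraction of the data is held out for the server before the remainder is partitioned across clients. The sketch below illustrates that interpretation only; how the baseline actually builds the server test set is not described here, and `split_server_testset` is a hypothetical helper.

```python
import random


def split_server_testset(indices, ratio, seed=0):
    """Hold out a random fraction of sample indices for server-side evaluation."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be strictly between 0 and 1")
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    num_server = int(round(len(shuffled) * ratio))
    # First slice goes to the server, the rest stays available for the clients.
    return shuffled[num_server:], shuffled[:num_server]


client_pool, server_test = split_server_testset(range(1000), ratio=0.15)
```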

Disclaimer: Please note that the results presented above differ from those in the paper. Since the experiments for the paper were conducted, the dataset authors have made slight modifications to the dataset. Although these changes result in a decrease in accuracy (approximately a 10% drop), StatAvg is still expected to consistently outperform FedAvg, as demonstrated in the paper.