Federated Learning with XGBoost and Flower (Comprehensive Example)¶

This example demonstrates a comprehensive federated learning setup using Flower with XGBoost. We use the HIGGS dataset to perform a binary classification task. This example uses Flower Datasets to retrieve, partition, and preprocess the data for each Flower client. It differs from the xgboost-quickstart example in the following ways:

  • Customised FL settings.

  • Customised partitioner type (uniform, linear, square, exponential); see the sketch after this list.

  • Centralised/distributed evaluation.

  • Bagging/cyclic training methods.

  • Support of scaled learning rate.
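
Below is a hedged sketch of how these partitioner types might map onto Flower Datasets classes. The mapping, the num_partitions value, and the jxie/higgs dataset name are illustrative assumptions here; the exact wiring lives in task.py.

from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import (
    ExponentialPartitioner,
    IidPartitioner,
    LinearPartitioner,
    SquarePartitioner,
)

# Assumed correspondence between this example's partitioner-type
# names and Flower Datasets partitioners (see task.py for the
# authoritative mapping)
PARTITIONERS = {
    "uniform": IidPartitioner,
    "linear": LinearPartitioner,
    "square": SquarePartitioner,
    "exponential": ExponentialPartitioner,
}

# Partition the HIGGS training split across 10 clients
partitioner = PARTITIONERS["linear"](num_partitions=10)
fds = FederatedDataset(dataset="jxie/higgs", partitioners={"train": partitioner})
partition = fds.load_partition(partition_id=0)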

Training Strategies¶

This example provides two training strategies: bagging aggregation (docs) and cyclic training (docs).

Bagging Aggregation¶

Bagging (bootstrap aggregation) is an ensemble meta-algorithm designed to improve the stability and accuracy of machine learning models. Here, we apply it to XGBoost trees.

Specifically, each client is treated as a bootstrap sample obtained by random subsampling (data partitioning in FL). At each FL round, every client boosts a number of trees (in this example, one tree) on its local bootstrap sample. The clients' trees are then aggregated on the server and concatenated to the global model from the previous round. The aggregated tree ensemble becomes the new global model.

To illustrate, consider a scenario with M clients: after R FL rounds, the bagging model consists of (M * R) trees, since every client contributes one tree per round.
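
The following is a minimal, self-contained sketch of this growth, not the example's actual aggregation code: Flower concatenates the clients' trees into a single global booster, which we approximate here by collecting one-tree boosters and averaging their predicted probabilities. All data and parameters are illustrative.

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=2000) > 0).astype(int)

M, R = 4, 5                                   # M clients, R FL rounds
parts = np.array_split(rng.permutation(len(X)), M)
params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1}

ensemble = []                                 # the growing "global model"
for rnd in range(R):
    for cid in range(M):                      # every client boosts one tree
        dtrain = xgb.DMatrix(X[parts[cid]], label=y[parts[cid]])
        ensemble.append(xgb.train(params, dtrain, num_boost_round=1))

print(f"Trees after {R} rounds: {len(ensemble)}")  # M * R = 20

# Bagging prediction: average the probabilities of all trees
dtest = xgb.DMatrix(X)
avg_pred = np.mean([b.predict(dtest) for b in ensemble], axis=0)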

Cyclic Training¶

Cyclic XGBoost training performs FL in a client-by-client fashion. Instead of aggregating updates from multiple clients, only a single client participates in training per round in the cyclic scenario. The locally trained XGBoost trees are passed to the next client as the initialised model for the next round's boosting.
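
Here is a minimal sketch of that hand-over, again illustrative rather than the example's actual client/server code: one client per round boosts a single tree on top of the model it receives, using xgboost's xgb_model continuation.

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

M, R = 4, 8                                   # M clients, R FL rounds
parts = np.array_split(rng.permutation(len(X)), M)
params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1}

global_model = None
for rnd in range(R):
    cid = rnd % M                             # clients take turns, one per round
    dtrain = xgb.DMatrix(X[parts[cid]], label=y[parts[cid]])
    # xgb_model continues boosting from the model handed over
    global_model = xgb.train(params, dtrain, num_boost_round=1,
                             xgb_model=global_model)

print(f"Trees after {R} rounds: {global_model.num_boosted_rounds()}")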

Set up the project¶

Clone the project¶

Start by cloning the example project:

git clone --depth=1 https://github.com/adap/flower.git _tmp \
        && mv _tmp/examples/xgboost-comprehensive . \
        && rm -rf _tmp \
        && cd xgboost-comprehensive

This will create a new directory called xgboost-comprehensive with the following structure:

xgboost-comprehensive
├── xgboost_comprehensive
│   ├── __init__.py
│   ├── client_app.py   # Defines your ClientApp
│   ├── server_app.py   # Defines your ServerApp
│   └── task.py         # Defines your model, training and data loading
├── pyproject.toml      # Project metadata like dependencies and configs
└── README.md

Install dependencies and project¶

Install the dependencies defined in pyproject.toml as well as the xgboost_comprehensive package.

pip install -e .

Run the project¶

You can run your Flower project in both simulation and deployment mode without making changes to the code. If you are new to Flower, we recommend starting with simulation mode, as it requires fewer components to be launched manually. By default, flwr run will make use of the Simulation Engine.

Run with the Simulation Engine¶

flwr run .

You can also override some of the settings for your ClientApp and ServerApp defined in pyproject.toml. For example:

# To run bagging aggregation for 5 rounds evaluated on centralised test set
flwr run . --run-config "train-method='bagging' num-server-rounds=5 centralised-eval=true"

# To run cyclic training with linear partitioner type evaluated on centralised test set
flwr run . --run-config "train-method='cyclic' partitioner-type='linear' centralised-eval-client=true"

[!TIP] For a more detailed walk-through, check out our XGBoost tutorial. To extend the aggregation strategy with saving, logging, or other functionality, please refer to our advanced-pytorch example.

Run with the Deployment Engine¶

[!NOTE] An update to this example will show how to run this Flower application with the Deployment Engine and TLS certificates, or with Docker.

Expected Experimental Results¶

Bagging aggregation experiment¶

XGBoost with Flower and Bagging strategy

The figure above shows the centralised AUC performance over FL rounds with the bagging aggregation strategy across 4 experimental settings. All settings obtain a stable performance boost over FL rounds (especially noticeable at the start of training). As expected, the uniform client distribution achieves higher AUC values than the square/exponential setups.
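
For reference, a minimal sketch of such a centralised evaluation, assuming an aggregated xgboost Booster and a held-out test set (the function name is hypothetical; roc_auc_score comes from scikit-learn):

import xgboost as xgb
from sklearn.metrics import roc_auc_score

def centralised_auc(booster: xgb.Booster, X_test, y_test) -> float:
    # Score the aggregated global model on the centralised test set
    preds = booster.predict(xgb.DMatrix(X_test))
    return roc_auc_score(y_test, preds)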

Cyclic training experiment¶

XGBoost with Flower and Cyclic strategy

This figure shows the cyclic training results on the centralised test set. Models trained cyclically require more rounds to converge because only a single client participates in training per round.

Feel free to explore more interesting experiments by yourself!