Federated Learning with CatBoost and Flower (Quickstart Example)
This example demonstrates how to train CatBoost models in a federated setting with Flower, using the catboost package.
We use the adult-census-income dataset to perform a binary classification task.
A tree-based bagging method is used for aggregation on the server.
Tree-based bagging aggregation
Bagging (bootstrap aggregating) is an ensemble meta-algorithm used to improve the stability and accuracy of machine learning models. Here, we apply it to learn CatBoost trees in a federated learning environment.
Specifically, each client is treated as a bootstrap sample obtained by random sub-sampling (data partitioning in FL). In each FL round, every client boosts a number of trees (in this example, one tree) on its local bootstrap sample. The server then aggregates the clients' trees and appends them to the global model from the previous round. The resulting tree ensemble becomes the new global model.
For instance, with M clients each contributing one tree per round, the global model after federation round R consists of M*R trees in total.
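The bookkeeping behind this aggregation scheme can be sketched in a few lines of plain Python. This is only an illustration, not the example's actual server code: trees are represented by placeholder strings rather than real CatBoost trees, and the helper names `aggregate_round` and `simulate` are hypothetical.

```python
from typing import List


def aggregate_round(global_trees: List[str], client_trees: List[List[str]]) -> List[str]:
    """Append every client's newly boosted trees to the global ensemble."""
    new_global = list(global_trees)  # copy, so the previous global model is untouched
    for trees in client_trees:
        new_global.extend(trees)
    return new_global


def simulate(num_clients: int, num_rounds: int, trees_per_round: int = 1) -> List[str]:
    """Run the bagging bookkeeping with placeholder strings standing in for trees."""
    global_trees: List[str] = []
    for rnd in range(1, num_rounds + 1):
        # Each client "boosts" trees_per_round trees on its local partition.
        client_trees = [
            [f"round{rnd}-client{cid}-tree{t}" for t in range(trees_per_round)]
            for cid in range(num_clients)
        ]
        global_trees = aggregate_round(global_trees, client_trees)
    return global_trees


ensemble = simulate(num_clients=3, num_rounds=4)
print(len(ensemble))  # 3 clients * 4 rounds * 1 tree = 12
```

With one tree per client per round, the ensemble size grows linearly with both the number of clients and the number of rounds, which matches the M*R count given above.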
Set up the project
Clone the project
Start by cloning the example project:
git clone --depth=1 https://github.com/adap/flower.git _tmp \
&& mv _tmp/examples/quickstart-catboost . \
&& rm -rf _tmp \
&& cd quickstart-catboost
This will create a new directory called quickstart-catboost with the following structure:
quickstart-catboost
├── quickstart_catboost
│   ├── __init__.py
│   ├── client_app.py   # Defines your ClientApp
│   ├── server_app.py   # Defines your ServerApp
│   └── task.py         # Defines your utilities and data loading
├── pyproject.toml      # Project metadata like dependencies and configs
└── README.md
Install dependencies and project
Install the dependencies defined in pyproject.toml as well as the quickstart_catboost package.
pip install -e .
Run the project
You can run your Flower project in both simulation and deployment mode without making changes to the code. If you are new to Flower, we recommend using the simulation mode, as it requires fewer components to be launched manually. By default, flwr run uses the Simulation Engine.
Run with the Simulation Engine
Note: Check the Simulation Engine documentation to learn more about Flower simulations and how to optimize them.
flwr run .
You can also override some of the settings for your ClientApp and ServerApp defined in pyproject.toml. For example:
flwr run . --run-config "num-server-rounds=3 depth=5"
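Conceptually, the override string is a space-separated list of key=value pairs layered on top of the defaults from pyproject.toml. The sketch below illustrates that merge in plain Python. It is not Flower's actual parser (which also handles other value types), the helper `apply_overrides` is hypothetical, and the default values shown are assumptions made for illustration, not the ones shipped with this example.

```python
from typing import Dict, Union

Value = Union[int, str]


def apply_overrides(defaults: Dict[str, Value], overrides: str) -> Dict[str, Value]:
    """Merge space-separated key=value overrides onto a dict of defaults."""
    merged = dict(defaults)
    for pair in overrides.split():
        key, _, raw = pair.partition("=")
        # Convert plain integers; keep everything else as a string.
        merged[key] = int(raw) if raw.lstrip("-").isdigit() else raw
    return merged


# Hypothetical defaults, standing in for the values defined in pyproject.toml.
defaults = {"num-server-rounds": 5, "depth": 6}

config = apply_overrides(defaults, "num-server-rounds=3 depth=5")
print(config)  # {'num-server-rounds': 3, 'depth': 5}
```

Keys that are not mentioned in the override string keep their default values.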
Run with the Deployment Engine
Follow this how-to guide to run the app from this example with Flower's Deployment Engine. After that, you might be interested in setting up secure TLS-enabled communications and SuperNode authentication in your federation.
If you are already familiar with how the Deployment Engine works, you may want to learn how to run it using Docker. Check out the Flower with Docker documentation.
