Introduction to XGBoost
XGBoost - or eXtreme Gradient Boosting - is an open-source library that provides a regularizing gradient boosting framework for Python, C++, Java, R, Julia, Perl, and Scala. It implements machine learning algorithms based on the gradient boosting concept, where a single model is built from an ensemble of weak learners (decision trees). This approach is commonly referred to as Gradient Boosted Decision Trees (GBDT), a decision tree ensemble learning algorithm.
GBDTs are commonly compared with the random forest algorithm. The two are similar in that both build multiple decision trees, but they differ in how the trees are built and combined. Random forest builds full decision trees in parallel from bootstrap samples of the dataset and produces the final prediction by averaging the predictions of all the trees. In contrast, GBDT trains decision trees iteratively, with each new tree fitted to reduce the residual errors of the model built so far - this is the concept of boosting. The final GBDT prediction is a weighted sum of all the tree predictions. While the bootstrap aggregation (bagging) used by random forest primarily reduces variance and overfitting, the boosting used by GBDT primarily reduces bias and underfitting.
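To make the contrast concrete, here is a minimal sketch of the two approaches using scikit-learn (used here only as a familiar reference implementation; the synthetic dataset and hyperparameter values are arbitrary placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: independent full trees grown on bootstrap samples,
# with predictions averaged (majority-voted for classification).
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Boosting: shallow trees added one at a time, each fitted to the
# residual errors of the ensemble built so far.
gbdt = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, learning_rate=0.1, random_state=0
).fit(X_train, y_train)

print("Random forest accuracy:", rf.score(X_test, y_test))
print("GBDT accuracy:", gbdt.score(X_test, y_test))
```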
XGBoost includes many features that optimize the GBDT implementation, including parallelized tree construction and integration with distributed processing frameworks such as Apache Spark and Dask. These performance improvements have historically made XGBoost a framework of choice for supervised learning on structured (tabular) data, where it has seen widespread success in Kaggle competitions.
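As a quick illustration of the library itself, the following is a minimal sketch of training a binary classifier with XGBoost's native Python API; the synthetic dataset and hyperparameter values are placeholders, not recommendations:

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# DMatrix is XGBoost's internal data structure; it also handles missing values.
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    "objective": "binary:logistic",  # binary classification
    "eta": 0.1,                      # learning rate (shrinkage per boosting round)
    "max_depth": 3,                  # depth of each weak learner
    "tree_method": "hist",           # histogram-based, parallelized split finding
    "eval_metric": "auc",
}

# Each boosting round adds one tree fitted to the current residual errors.
bst = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtest, "test")])
predictions = bst.predict(dtest)
```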
Use cases in Federated Learning
While there is no way to know beforehand which model will perform best in federated learning, XGBoost is appealing for several reasons:
- Compared to neural network-based models, XGBoost typically requires significantly less hyperparameter tuning to train a first model.
- XGBoost is known to produce models that often outperform neural networks on tabular datasets, which are common in real-world federated learning systems such as healthcare or IoT applications.
- Feature scaling is unnecessary when training XGBoost models. This not only facilitates fine-tuning on new data distributions, but also supports cross-device and cross-silo federated learning, where the data distributions of participating clients are not known a priori.
XGBoost in Flower
In Flower, we provide two strategies for performing federated learning with XGBoost: FedXgbBagging and FedXgbCyclic, inspired by the work in NVIDIA NVFlare. These implementations let Flower users choose between different aggregation strategies for the XGBoost model: FedXgbBagging aggregates trees from all participating clients every round, whereas FedXgbCyclic aggregates clients' trees sequentially in a round-robin manner. With these strategies, Flower users can quickly and easily run and compare federated learning systems on distributed tabular datasets using state-of-the-art XGBoost aggregation strategies, without needing to implement them from scratch.
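As a rough sketch of what the server side of such a setup might look like, the snippet below uses the legacy start_server entry point with placeholder values; the exact constructor arguments and recommended entry points vary across Flower versions, so treat this as an assumption-laden outline rather than a reference:

```python
import flwr as fl
from flwr.server.strategy import FedXgbBagging

# FedXgbBagging reuses the standard client-selection arguments of FedAvg;
# each round, the trees returned by the selected clients are aggregated
# into the global XGBoost model.
strategy = FedXgbBagging(
    fraction_fit=1.0,        # sample all available clients for training
    fraction_evaluate=1.0,   # sample all available clients for evaluation
    min_fit_clients=2,
    min_available_clients=2,
)

# Start the Flower server for a few rounds of federated XGBoost training.
# Clients must run a matching XGBoost client implementation
# (see Flower's xgboost quickstart example).
fl.server.start_server(
    server_address="0.0.0.0:8080",
    config=fl.server.ServerConfig(num_rounds=5),
    strategy=strategy,
)
```

Swapping FedXgbBagging for FedXgbCyclic would switch the aggregation to the round-robin scheme described above.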