

Federated Financial Fraud Detection with XGBoost and Flower

This example demonstrates a federated fraud detection system for real-world financial environments. Each participating institution (bank, exchange, payment provider) trains a local XGBoost model on its private transaction data. The server aggregates all local models into a FedXGBBagging ensemble without any raw data ever leaving the institution.

Architecture

┌─────────────────────────────────────────────────────────┐
│                        Server                           │
│   Collects client models → FedXGBBagging ensemble       │
│   Voting: soft | hard | weighted_soft                   │
└────────────┬──────────────────────┬────────────────────┘
             │ serialised booster   │ serialised booster
    ┌────────▼────────┐    ┌────────▼────────┐
    │   Client 0      │    │   Client N      │
    │ (Institution A) │    │ (Institution N) │
    │ XGBoost (local) │    │ XGBoost (local) │
    └─────────────────┘    └─────────────────┘

Federation strategy — FedXGBBagging:

  • Every round each client trains a fresh XGBoost booster on its local partition.
  • The booster is serialised to JSON bytes and sent to the server.
  • The server accumulates all boosters across all rounds and wraps them in a FedXGBBagging ensemble for final inference.
  • No raw transaction data is shared between participants.

Project Structure

example-app/
├── frauddetection/
│   ├── __init__.py
│   ├── client_app.py      # Flower ClientApp — local XGBoost train/evaluate
│   ├── server_app.py      # Flower ServerApp — FedXGBBagging ensemble builder
│   ├── task.py            # Data loading, XGBoost training, model serialization
│   └── fed_xgb_bagging.py # FedXGBBagging / FedXGBSimilarity / FedEnsembleLevelSimXGBBagging
├── data/
│   └── preprocessed_Ethereum_cleaned_v2.csv   # example transaction data
├── pyproject.toml
└── README.md

Dataset

You can download the example dataset to your local data/ directory via the link below: https://drive.google.com/drive/folders/1xYMPfKyCv-f4UWiGVe3WoHHAJ4xIeDUR?usp=drive_link

The bundled example dataset is based on Ethereum transaction records with the following structure:

Column                        Description
Fraud_Label                   Binary label — 1 fraud, 0 normal
Avg min between sent tnx      Average time between outgoing transactions
total Ether sent / received   Volume statistics
ERC20 most sent token type    Categorical — most-sent ERC20 token
ERC20_most_rec_token_type     Categorical — most-received ERC20 token
… (47 features total)         Transaction behaviour statistics

Note: The bundled CSV is provided as a structural example. Replace it with your own labeled dataset (same column schema, label column Fraud_Label) before running experiments. In deployment mode each SuperNode loads its own local CSV independently.
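Since task.py handles data loading, here is a minimal sketch of reading a CSV with this schema and separating the Fraud_Label column (tiny inline data, a subset of column names; not the app's actual loader):

```python
import io
import pandas as pd

# Inline stand-in for a transactions CSV with the expected label column.
csv = io.StringIO(
    "Fraud_Label,Avg min between sent tnx,total Ether sent\n"
    "1,3.5,12.0\n"
    "0,140.2,0.7\n"
)
df = pd.read_csv(csv)

# Split the binary label from the behavioural features.
y = df["Fraud_Label"]
X = df.drop(columns=["Fraud_Label"])
```

Any replacement dataset only needs to keep the same column schema and the Fraud_Label column for this split to work unchanged.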

Fetch the App

Install Flower:

pip install flwr

Fetch the app from Flower Hub:

flwr new @mainthread/federated-fraud-detection

Install Dependencies

cd example-app && pip install -e .

This installs flwr[simulation], xgboost, scikit-learn, pandas, and numpy.

Run the App

The app runs in both simulation and deployment modes without any code changes.

Simulation Engine (recommended for development)

Run with default settings (5 virtual clients, 3 rounds, 50 boosting rounds per client):

flwr run .

Override settings at runtime:

flwr run . --run-config "num-server-rounds=5 local-epochs=100"

Override the data path:

flwr run . --run-config "data-csv=path/to/your_transactions.csv"

TIP

Check the Simulation Engine documentation to learn more about controlling CPU/GPU resources and the number of virtual SuperNodes.

Deployment Engine (production)

In deployment mode, each SuperNode loads its own pre-split CSV from a local path specified via node_config.

  1. Prepare one CSV file per institution (same column schema as the example dataset).

  2. Launch each SuperNode pointing to its local data:

flower-supernode \
    --insecure \
    --superlink <SUPERLINK-FLEET-API> \
    --node-config="data-path=/path/to/institution_A_transactions.csv"

  3. Launch the run:

flwr run . <SUPERLINK-CONNECTION> --stream

TIP

Follow the Deployment Engine guide to set up secure TLS communications and SuperNode authentication for production federations.
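The precedence between the per-node `data-path` (deployment) and the run-level `data-csv` (simulation) can be sketched like this. Plain dicts stand in for Flower's node_config and run_config objects, so this is illustrative, not the app's actual code:

```python
# Stand-ins for Flower's config mappings (names taken from this README).
run_config = {"data-csv": "data/preprocessed_Ethereum_cleaned_v2.csv"}    # simulation default
node_config = {"data-path": "/path/to/institution_A_transactions.csv"}    # set per SuperNode

# Deployment: the node-local path wins; simulation: fall back to run config.
data_path = node_config.get("data-path") or run_config["data-csv"]
```

With an empty node_config (simulation mode), the same expression falls through to the bundled CSV path.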

Configuration Reference

All settings live in pyproject.toml under [tool.flwr.app.config]:

Key                 Default                                     Description
num-server-rounds   3                                           Number of federated rounds
fraction-evaluate   1.0                                         Fraction of clients used for distributed evaluation
local-epochs        50                                          XGBoost boosting rounds per client per FL round
data-csv            data/preprocessed_Ethereum_cleaned_v2.csv   Path to CSV (simulation mode)
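For reference, the corresponding fragment of pyproject.toml with these defaults might look as follows (the exact surrounding layout may differ):

```toml
[tool.flwr.app.config]
num-server-rounds = 3
fraction-evaluate = 1.0
local-epochs = 50
data-csv = "data/preprocessed_Ethereum_cleaned_v2.csv"
```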

Using the Final Ensemble

After training completes, the server saves all client models to ./final_ensemble/. Load the ensemble for inference:

import glob
from frauddetection.fed_xgb_bagging import FedXGBBagging

ensemble = FedXGBBagging(
    model_paths=glob.glob("final_ensemble/*.json"),
    voting="soft",           # "soft" | "hard" | "weighted_soft"
    config={"bank_name_round_number": "production_v1"},
)

y_pred, y_prob = ensemble.predict(X_test)
metrics = ensemble.evaluate_predictions(y_test, y_pred, y_prob)
print(metrics)  # accuracy, precision, recall, f1, roc_auc, pr_auc
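For intuition on the voting modes: soft voting averages per-model fraud probabilities and thresholds once, while hard voting thresholds each model and takes the majority label. A minimal numpy sketch with assumed probability values (not the FedXGBBagging implementation):

```python
import numpy as np

# Per-model predicted fraud probabilities for 4 transactions (assumed values).
probs = np.array([
    [0.9, 0.2, 0.6, 0.1],   # model from institution A
    [0.8, 0.4, 0.4, 0.2],   # model from institution B
    [0.7, 0.1, 0.7, 0.3],   # model from institution C
])

# Soft voting: average the probabilities, then threshold once.
soft = (probs.mean(axis=0) >= 0.5).astype(int)

# Hard voting: threshold each model, then take the majority label.
hard = (np.round(probs).sum(axis=0) >= 2).astype(int)
```

Weighted soft voting would follow the same shape with a weighted average in place of the plain mean.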

Advanced: Similarity-based Federation

The fed_xgb_bagging.py module also provides two advanced federation classes:

  • FedXGBSimilarity — merges trees from multiple clients at the tree level using structural, threshold, signal, and data-proxy similarity metrics. Use this when you want a single merged model rather than a voting ensemble.

  • FedEnsembleLevelSimXGBBagging — selects the most similar subset of client models before bagging, reducing ensemble size while preserving diversity.

Both classes share the same predict / evaluate_predictions interface as FedXGBBagging.
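To illustrate the subset-selection idea behind FedEnsembleLevelSimXGBBagging (only the idea; the class's actual similarity metrics may differ), one simple notion of "most similar subset" is cosine similarity between each model's predictions and the ensemble mean:

```python
import numpy as np

# Per-model predicted probabilities on a shared proxy set (assumed values).
preds = np.array([
    [0.9, 0.1, 0.8, 0.2],
    [0.8, 0.2, 0.7, 0.3],
    [0.1, 0.9, 0.2, 0.8],   # an outlier model
])
mean = preds.mean(axis=0)

# Cosine similarity of each model's predictions to the ensemble mean.
sims = preds @ mean / (np.linalg.norm(preds, axis=1) * np.linalg.norm(mean))

# Keep the 2 most similar models before bagging.
keep = np.argsort(sims)[-2:]
```

The outlier model gets the lowest similarity score and is dropped, shrinking the ensemble while the retained models still vote via the usual predict interface.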