Federated Financial Fraud Detection with XGBoost and Flower
This example demonstrates a federated fraud detection system for real-world financial environments. Each participating institution (bank, exchange, payment provider) trains a local XGBoost model on its private transaction data. The server aggregates all local models into a FedXGBBagging ensemble without any raw data ever leaving the institution.
Architecture
┌─────────────────────────────────────────────────────────┐
│ Server │
│ Collects client models → FedXGBBagging ensemble │
│ Voting: soft | hard | weighted_soft │
└────────────┬──────────────────────┬────────────────────┘
│ serialised booster │ serialised booster
┌────────▼────────┐ ┌────────▼────────┐
│ Client 0 │ │ Client N │
│ (Institution A) │ │ (Institution N) │
│ XGBoost (local) │ │ XGBoost (local) │
└─────────────────┘ └─────────────────┘
Federation strategy — FedXGBBagging:
- Every round each client trains a fresh XGBoost booster on its local partition.
- The booster is serialised to JSON bytes and sent to the server.
- The server accumulates all boosters across all rounds and wraps them in a FedXGBBagging ensemble for final inference.
- No raw transaction data is shared between participants.
Project Structure
example-app/
├── frauddetection/
│ ├── __init__.py
│ ├── client_app.py # Flower ClientApp — local XGBoost train/evaluate
│ ├── server_app.py # Flower ServerApp — FedXGBBagging ensemble builder
│ ├── task.py # Data loading, XGBoost training, model serialization
│ └── fed_xgb_bagging.py # FedXGBBagging / FedXGBSimilarity / FedEnsembleLevelSimXGBBagging
├── data/
│   └── preprocessed_Ethereum_cleaned_v2.csv   # bundled example dataset
├── pyproject.toml
└── README.md
Dataset
You can download the example dataset to a local directory via the link below: https://drive.google.com/drive/folders/1xYMPfKyCv-f4UWiGVe3WoHHAJ4xIeDUR?usp=drive_link
The bundled example dataset is based on Ethereum transaction records with the following structure:
| Column | Description |
|---|---|
| Fraud_Label | Binary label — 1 fraud, 0 normal |
| Avg min between sent tnx | Average time between outgoing transactions |
| total Ether sent / received | Volume statistics |
| ERC20 most sent token type | Categorical — most-sent ERC20 token |
| ERC20_most_rec_token_type | Categorical — most-received ERC20 token |
| … (47 features total) | Transaction behaviour statistics |
Note: The bundled CSV is provided as a structural example. Replace it with your own labeled dataset (same column schema, label column Fraud_Label) before running experiments. In deployment mode each SuperNode loads its own local CSV independently.
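A per-client loading helper might look like the following. This is a hypothetical sketch — the real logic lives in frauddetection/task.py, and the function name and split parameters here are illustrative, not the app's actual API:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def load_data(csv_path: str):
    """Load one institution's labeled transactions and split train/test.

    Assumes the schema described above: a binary Fraud_Label column
    plus numeric/categorical behaviour features.
    """
    df = pd.read_csv(csv_path)
    y = df["Fraud_Label"]
    X = df.drop(columns=["Fraud_Label"])
    # Stratify so fraud prevalence is preserved in both splits
    return train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
```

In deployment mode each SuperNode would call a helper like this on the path it receives through its node config, so partitions never mix.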
Fetch the App
Install Flower:
pip install flwr
Fetch the app from Flower Hub:
flwr new @frauddetection/federated-fraud-detection
Install Dependencies
cd example-app && pip install -e .
This installs flwr[simulation], xgboost, scikit-learn, pandas, and numpy.
Run the App
The app runs in both simulation and deployment mode without any code changes.
Simulation Engine (recommended for development)
Run with default settings (5 virtual clients, 3 rounds, 50 boosting rounds per client):
flwr run .
Override settings at runtime:
flwr run . --run-config "num-server-rounds=5 local-epochs=100"
Override the data path:
flwr run . --run-config "data-csv=path/to/your_transactions.csv"
TIP
Check the Simulation Engine documentation to learn more about controlling CPU/GPU resources and the number of virtual SuperNodes.
Deployment Engine (production)
In deployment mode, each SuperNode loads its own pre-split CSV from a local path specified via node_config.
- Prepare one CSV file per institution (same column schema as the example dataset).
- Launch each SuperNode pointing to its local data:
flower-supernode \
    --insecure \
    --superlink <SUPERLINK-FLEET-API> \
    --node-config="data-path=/path/to/institution_A_transactions.csv"
- Launch the run:
flwr run . <SUPERLINK-CONNECTION> --stream
TIP
Follow the Deployment Engine guide to set up secure TLS communications and SuperNode authentication for production federations.
Configuration Reference
All settings live in pyproject.toml under [tool.flwr.app.config]:
| Key | Default | Description |
|---|---|---|
| num-server-rounds | 3 | Number of federated rounds |
| fraction-evaluate | 1.0 | Fraction of clients used for distributed evaluation |
| local-epochs | 50 | XGBoost boosting rounds per client per FL round |
| data-csv | data/preprocessed_Ethereum_cleaned_v2.csv | Path to CSV (simulation mode) |
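For reference, the defaults in the table correspond to a pyproject.toml fragment like:

```toml
[tool.flwr.app.config]
num-server-rounds = 3
fraction-evaluate = 1.0
local-epochs = 50
data-csv = "data/preprocessed_Ethereum_cleaned_v2.csv"
```

Any of these keys can be overridden at runtime with flwr run . --run-config, as shown in the Run the App section.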
Using the Final Ensemble
After training completes, the server saves all client models to ./final_ensemble/. Load the ensemble for inference:
import glob

from frauddetection.fed_xgb_bagging import FedXGBBagging

ensemble = FedXGBBagging(
    model_paths=glob.glob("final_ensemble/*.json"),
    voting="soft",  # "soft" | "hard" | "weighted_soft"
    config={"bank_name_round_number": "production_v1"},
)
y_pred, y_prob = ensemble.predict(X_test)
metrics = ensemble.evaluate_predictions(y_test, y_pred, y_prob)
print(metrics)  # accuracy, precision, recall, f1, roc_auc, pr_auc
Advanced: Similarity-based Federation
The fed_xgb_bagging.py module also provides two advanced federation classes:
- FedXGBSimilarity — merges trees from multiple clients at the tree level using structural, threshold, signal, and data-proxy similarity metrics. Use this when you want a single merged model rather than a voting ensemble.
- FedEnsembleLevelSimXGBBagging — selects the most similar subset of client models before bagging, reducing ensemble size while preserving diversity.
Both classes share the same predict / evaluate_predictions interface as FedXGBBagging.