Federated Financial Fraud Detection with PyTorch and Flower
This example demonstrates federated learning for financial fraud detection using Flower and PyTorch on a tabular transaction dataset.
It uses a federated version of the PaySim dataset (flwrlabs/fed-fraud-paysim-banks) and supports both:
- IID partitioning
- Natural partitioning by bank (BankID)
The project includes:
- A multi-layer perceptron (MLP) model with LayerNorm and dropout
- Feature preprocessing (normalization + feature engineering + one-hot encoding)
- Handling of class imbalance via weighted sampling and loss weighting
- Evaluation across multiple classification thresholds
Fetch the App
Install Flower:
pip install flwr
Fetch the app:
flwr new @flwrlabs/fed-fin-fraud
Then, install dependencies:
cd fed-fin-fraud && pip install -e .
Project structure:
fed-fin-fraud
├── fed_fraud
│   ├── __init__.py
│   ├── client_app.py   # Client-side training logic
│   ├── server_app.py   # Server-side orchestration and evaluation
│   └── task.py         # Model, preprocessing, training, evaluation
├── pyproject.toml      # Dependencies and configuration
└── README.md
Run the App
You can run this Flower App in both simulation and deployment mode.
Run with the Simulation Engine
In simulation mode:
- Dataset is automatically loaded from Hugging Face
- Training data is partitioned across clients:
  - iid → random split
  - natural → grouped by BankID
Run with default configuration:
flwr run .
Override configuration (example):
flwr run . --run-config "num-server-rounds=5 batch-size=512"
Key configuration options (from pyproject.toml):
- num-server-rounds: number of FL rounds
- local-epochs: local training epochs
- batch-size: training batch size
- hidden-dim-1, hidden-dim-2: model size
- dropout: dropout rate
- use-class-weights: handle class imbalance
- partitioner: iid or natural
- learning-rate-max/min: cosine annealing schedule
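These options live in the app's config table in `pyproject.toml` and can be overridden per run with `--run-config`. A fragment might look like the following (the table name follows Flower's standard app-config convention; the values shown are illustrative, not the app's actual defaults):

```toml
[tool.flwr.app.config]
# Illustrative values only — check pyproject.toml for the real defaults.
num-server-rounds = 3
local-epochs = 1
batch-size = 256
hidden-dim-1 = 128
hidden-dim-2 = 64
dropout = 0.2
use-class-weights = true
partitioner = "natural"
learning-rate-max = 1e-3
learning-rate-min = 1e-5
```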
Model
The model is a fully connected neural network (MLP):
- Input: engineered tabular features
- Two hidden layers with:
  - LayerNorm (optional)
  - ReLU activation
  - Dropout
- Output: single logit for binary classification
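The architecture above can be sketched in PyTorch as follows. This is a minimal sketch: the class and argument names are illustrative, not the app's actual identifiers in `task.py`.

```python
import torch
import torch.nn as nn


class FraudMLP(nn.Module):
    """Sketch of the described MLP: two hidden layers, each with an
    optional LayerNorm, ReLU, and Dropout, ending in a single logit."""

    def __init__(self, input_dim: int, hidden_dim_1: int = 128,
                 hidden_dim_2: int = 64, dropout: float = 0.2,
                 use_layernorm: bool = True):
        super().__init__()

        def block(in_f: int, out_f: int) -> list[nn.Module]:
            layers: list[nn.Module] = [nn.Linear(in_f, out_f)]
            if use_layernorm:
                layers.append(nn.LayerNorm(out_f))
            layers += [nn.ReLU(), nn.Dropout(dropout)]
            return layers

        self.net = nn.Sequential(
            *block(input_dim, hidden_dim_1),
            *block(hidden_dim_1, hidden_dim_2),
            nn.Linear(hidden_dim_2, 1),  # single logit for binary classification
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```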
Data Pipeline
Dataset:
- flwrlabs/fed-fraud-paysim-banks
Processing steps:
- Standardization of numeric features
- Feature engineering:
  - balance deltas
  - transaction inconsistencies
- One-hot encoding of transaction type
- Construction of the final feature vector
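The engineering step can be sketched in plain Python, assuming standard PaySim column names (`amount`, `oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, `newbalanceDest`, `type`); the app's actual feature set may differ:

```python
# PaySim transaction types used for the one-hot encoding (assumed order).
TX_TYPES = ["CASH_IN", "CASH_OUT", "DEBIT", "PAYMENT", "TRANSFER"]


def engineer_features(tx: dict) -> list[float]:
    """Sketch of the feature-engineering step described above."""
    # Balance deltas on both sides of the transaction.
    delta_orig = tx["newbalanceOrig"] - tx["oldbalanceOrg"]
    delta_dest = tx["newbalanceDest"] - tx["oldbalanceDest"]
    # Inconsistencies: a balance change that does not match the amount.
    err_orig = delta_orig + tx["amount"]  # nonzero => inconsistent sender side
    err_dest = delta_dest - tx["amount"]  # nonzero => inconsistent receiver side
    # One-hot encoding of the transaction type.
    one_hot = [1.0 if tx["type"] == t else 0.0 for t in TX_TYPES]
    return [tx["amount"], delta_orig, delta_dest, err_orig, err_dest, *one_hot]
```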
Class imbalance handling:
- Weighted sampling (WeightedRandomSampler)
- Optional pos_weight in loss function
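The two imbalance mechanisms can be sketched together as follows. The label tensor here is illustrative; in the app the labels come from the partition's fraud column.

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Illustrative binary labels; fraud (1) is the rare class.
labels = torch.tensor([0, 0, 0, 1, 0, 1])

# Inverse-frequency weight per sample, so minority rows are drawn more often.
class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].float()
sampler = WeightedRandomSampler(
    sample_weights, num_samples=len(labels), replacement=True
)

# Optional pos_weight for BCEWithLogitsLoss: ratio of negatives to positives.
pos_weight = class_counts[0].float() / class_counts[1].float()
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```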
Supports:
- Simulation mode via FederatedDataset
- Deployment mode via load_from_disk
Training
Each client:
- Receives the global model weights
- Trains locally using:
  - BCEWithLogitsLoss
  - Optional class weighting
  - Gradient clipping
- Uses a cosine annealing learning rate schedule
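A sketch of one client's local training loop combining these pieces (the function name, clipping norm, and scheduler granularity are assumptions, not the app's exact implementation):

```python
import torch


def local_train(model, loader, epochs, lr_max, lr_min,
                pos_weight=None, device="cpu"):
    """Sketch: local epochs with BCEWithLogitsLoss, gradient clipping,
    and a cosine annealing LR schedule from lr_max down to lr_min."""
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr_max)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(
        opt, T_max=epochs, eta_min=lr_min
    )
    criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            logits = model(x.to(device)).squeeze(-1)
            loss = criterion(logits, y.to(device).float())
            loss.backward()
            # Clip gradients to stabilize training (max norm is illustrative).
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            opt.step()
        sched.step()  # anneal once per local epoch (granularity assumed)
    return model
```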
Evaluation
Server-side evaluation:
- Uses a centralized test split
- Computes metrics at multiple thresholds:
  - Accuracy
  - Precision
  - Recall
  - F1-score
  - PR-AUC (average precision)
Thresholds evaluated:
0.05, 0.1, 0.2, 0.5, 0.8, 0.9, 0.95, 0.99
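The multi-threshold evaluation can be sketched in plain Python (a sketch only; the app may compute these metrics with library helpers instead):

```python
def threshold_metrics(probs, labels,
                      thresholds=(0.05, 0.1, 0.2, 0.5, 0.8, 0.9, 0.95, 0.99)):
    """Accuracy, precision, recall, and F1 at each decision threshold."""
    out = {}
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probs]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
        tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        out[t] = {
            "accuracy": (tp + tn) / len(labels),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return out
```

Low thresholds trade precision for recall, which is why a highly imbalanced fraud task is evaluated across the whole range rather than at 0.5 alone.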
Run with the Deployment Engine
To run in deployment mode, prepare local dataset partitions.
Step 1: Prepare data
Partition and store the dataset locally (e.g., using Flower Datasets or custom pipeline).
Step 2: Start SuperNodes
flower-supernode \
    --insecure \
    --superlink <SUPERLINK-FLEET-API> \
    --node-config="data-path=/path/to/local_partition"
Step 3: Run federation
flwr run . <SUPERLINK-CONNECTION> --stream
Benchmarking and System Metrics
This app writes a benchmark summary next to the standard Flower result pickle:
result_<run-name>_communication.json
The summary includes per-round and total communication volume:
- total_comm_bytes
- comm_bytes_total per training round
Enable system metric tracking with:
flwr run . <SUPERLINK-CONNECTION> --stream --run-config "benchmark-system-metrics=true"
When enabled, the benchmark summary also includes:
- client_train_time_sec
- server_aggregation_time_sec
- round_wall_clock_sec
- client_peak_cpu_memory_mb
- client_peak_gpu_memory_mb
Server-side centralized evaluation can be disabled for benchmark-only runs:
flwr run . <SUPERLINK-CONNECTION> --stream --run-config "benchmark-run-server-eval=false"
Dataset Fingerprint Verification
FedFinFraud supports a preflight dataset fingerprint check before training. Enable it with:
flwr run . <SUPERLINK-CONNECTION> --stream --run-config "benchmark-verify-dataset=true"
The server asks each connected client for its partition metadata, then verifies:
- expected client count
- partition IDs
- dataset version
- number of examples
- dataset fingerprint
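The fingerprint could, for instance, be a deterministic content hash over the partition's rows. The app's actual algorithm is not specified here; this SHA-256 sketch only illustrates the idea of a stable digest the server can compare against a manifest:

```python
import hashlib
import json


def dataset_fingerprint(rows) -> str:
    """Hypothetical sketch: deterministic SHA-256 digest over rows.
    Rows are serialized with sorted keys so the hash is order-stable
    within each row; the real fingerprint scheme may differ."""
    h = hashlib.sha256()
    for row in rows:
        h.update(json.dumps(row, sort_keys=True).encode("utf-8"))
    return h.hexdigest()
```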
The verification result is written into result_<run-name>_communication.json under verification. If any partition does not match the benchmark manifest, the run fails before training.
Expected deployment fingerprints for flwrlabs/fed-fraud-paysim-banks with natural partitioning:
| Client | Partition ID | Examples | Dataset fingerprint |
|---|---|---|---|
| 0 | 0 | 1374326 | c35c860d5f75655c0b53312ba8a5b555eeb97b20ac3610b8aabec724e69ef1f7 |
| 1 | 1 | 1202535 | b3cc17127b78219d7a8d18c60bb88fc59f041012acef3ef5efa37a64ba42cd3f |
| 2 | 2 | 1145272 | d311265f241964508546973919ee804d1f19615d464cf5819f40e6a1daf06d06 |
| 3 | 3 | 1030745 | 3c22c4e9d5ffebed3076b86bbfa330f9425431237a4c74a8ce06511346c821bc |
| 4 | 4 | 973480 | cf41d0946d13478a69785086a252ac6a4d0e1a1ba77943eb587b77abdba647f9 |
Notes
- Designed for highly imbalanced fraud detection
- Uses PR-AUC as a key evaluation metric
- Supports both research (simulation) and real-world deployment
- Automatically uses GPU if available