# Sequential Subgroup Aggregation in Federated Learning

A Flower app published as `@bahaaelden/subgroup-seq-agg`. Create a local copy with:

```shell
flwr new @bahaaelden/subgroup-seq-agg
```
A Flower Federated Learning application where clients train in sequential phases, allowing overlapping clients to carry learned knowledge between groups.
Author: Bahaa-Elden Ali Abdelghany
## 💡 The Core Concept
In standard Federated Learning (FL), clients typically train together in a single global aggregation process where updates from many clients are combined into one global model.
This project introduces **Sequential Grouping**: instead of training all clients simultaneously, we divide them into phased groups that train one after another.
- Phase 1 → Group 1 trains
- Phase 2 → Group 2 trains
- Phase 3 → Group 3 trains
Each group may use a different federated aggregation strategy.
Example:
- Group 1 → FedAvg
- Group 2 → FedProx
- Group 3 → QFedAvg
The model produced by one group becomes the starting baseline for the next group, forming a training pipeline across client groups.
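The pipeline above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual implementation: `train_group` is a hypothetical stand-in for one federated phase, and a scalar stands in for model parameters.

```python
# Minimal sketch of sequential subgroup aggregation: each group starts
# training from the model produced by the previous group.

def train_group(model, nodes, strategy, rounds):
    """Pretend to run `rounds` of federated training and return a new model."""
    for _ in range(rounds):
        model = model + 1  # placeholder for an aggregation step (e.g. FedAvg)
    return model

groups = [
    {"nodes": [1, 2, 3, 4], "strategy": "FedAvg"},
    {"nodes": [5, 6],       "strategy": "FedProx"},
    {"nodes": [4, 7],       "strategy": "QFedAvg"},
]

model = 0  # initial global model
for group in groups:
    model = train_group(model, group["nodes"], group["strategy"], rounds=2)
# After all phases, `model` reflects every group's training in sequence.
```

The key property is that the loop carries `model` across iterations, so later groups never start from scratch.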
## Bridge Nodes (Knowledge Carry-Over)
The most important feature of this project is how it handles clients that belong to multiple groups.
These clients act as bridges, transferring knowledge between training phases.
### Example

Suppose Node 4 belongs to Group 1 and Group 3.

**Phase 1**

- Group 1 → Nodes: 1, 2, 3, 4
- Result → Model M1

Node 4 participates in generating M1.

**Phase 2**

- Group 2 → Nodes: 5, 6
- Result → Model M2

Node 4 does not participate in this phase.

**Phase 3**

- Group 3 → Nodes: 4, 7

Since Node 4 trained in Phase 1, it carries its learned parameters forward. Node 4 therefore initializes the new group with its previously trained model, effectively bridging knowledge between phases. Node 7, which is new to the pipeline, begins training from Node 4's parameters.
If multiple bridge nodes exist, the system uses the model from the most recently trained bridge node as the initialization seed for the new group.
In this way, overlapping nodes propagate knowledge across sequential training phases.
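The selection rule described above can be sketched as follows. This is an illustrative sketch, not the project's code: `last_trained_phase` and `node_models` are hypothetical bookkeeping maps that record when each node last trained and which model it holds.

```python
# Sketch of bridge-node selection: when a new group starts, pick the
# overlapping node that trained most recently and use its model as the seed.

def pick_seed(new_group_nodes, last_trained_phase, node_models, global_model):
    bridges = [n for n in new_group_nodes if n in last_trained_phase]
    if not bridges:
        # No overlap with earlier groups: fall back to the latest global model.
        return global_model
    most_recent = max(bridges, key=lambda n: last_trained_phase[n])
    return node_models[most_recent]

# Node 4 trained in phase 1; nodes 5 and 6 trained in phase 2.
last_trained = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}
models = {n: f"model_from_node_{n}" for n in last_trained}

seed = pick_seed([4, 7], last_trained, models, "global_model")
# Group 3 = {4, 7}: node 4 is the only bridge, so its model seeds the group.
```

Note the fallback branch: a group with no bridge nodes simply inherits the latest global model, which matches the behavior described in the Limitations section.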
## 🎯 Use Cases
Sequential subgroup aggregation is useful when clients or data sources are heterogeneous and cannot all participate in the same training phase.
### Multi-Organization Federated Learning

- Phase 1 → Hospital Network A
- Phase 2 → Hospital Network B
- Phase 3 → Hospital Network C
Bridge nodes transfer knowledge between organizations without sharing raw data.
### Multimodal Federated Learning

- Group 1 → Vision clients
- Group 2 → Audio clients
- Group 3 → Multimodal clients

Clients that support multiple modalities naturally act as bridges between training stages.
### Edge–Cloud Hierarchical Training

- Phase 1 → Edge devices
- Phase 2 → Regional aggregators
- Phase 3 → Global aggregation

Bridge nodes propagate learned representations between levels.
### Resource-Constrained Federated Systems
In very large deployments, not all devices can participate simultaneously.
Sequential grouping allows:
- controlled scheduling of clients
- reduced communication load
- scalable training phases
### Strategy Experimentation

Researchers can evaluate different aggregation strategies sequentially.

- Group 1 → FedAvg
- Group 2 → FedProx
- Group 3 → QFedAvg
This makes it possible to observe how earlier strategies influence downstream training.
## Quick Start

1. Install dependencies:

```shell
pip install "flwr[simulation]" "flwr-datasets[vision]" torch torchvision numpy
```

Or install locally:

```shell
pip install .
```

2. Run the simulation:

```shell
flwr run .
```
## ⚙️ Configuration (No Python Required)
The entire training pipeline is controlled through pyproject.toml.
No Python code needs to be modified.
Under the [tool.flwr.app.config] section you define your sequential training groups.
### Example: Two-Stage Training Pipeline

```toml
[tool.flwr.app.config]
dataset = "mnist"
num-groups = 2
rounds-per-group = 2

# --- Group 1 (Trains First) ---
group-1-name = "Group 1"
group-1-nodes = "1,2,3,4"
group-1-strategy = "QFedAvg"
group-1-carry-over = true

# --- Group 2 (Trains Second) ---
group-2-name = "Group 2"
group-2-nodes = "4,5,6"
group-2-strategy = "FedProx"
group-2-strategy-params = '{"proximal_mu": 0.1}'
group-2-carry-over = true
```
⚠️ For the bridge mechanism to work, carry-over must be enabled.
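To make the flat key scheme concrete, here is a sketch of how the `group-N-*` entries might be collected into per-group dicts. The key names come from the README above; the parsing function itself is illustrative, not the project's code.

```python
# Sketch: turn flat "group-N-*" config keys into a list of group dicts.
import json

config = {
    "num-groups": 2,
    "group-1-nodes": "1,2,3,4",
    "group-1-strategy": "QFedAvg",
    "group-1-carry-over": True,
    "group-2-nodes": "4,5,6",
    "group-2-strategy": "FedProx",
    "group-2-strategy-params": '{"proximal_mu": 0.1}',
    "group-2-carry-over": True,
}

def parse_groups(cfg):
    groups = []
    for i in range(1, cfg["num-groups"] + 1):
        prefix = f"group-{i}-"
        groups.append({
            # Node lists are comma-separated strings in the config.
            "nodes": [int(n) for n in cfg[prefix + "nodes"].split(",")],
            "strategy": cfg[prefix + "strategy"],
            # Strategy params are a JSON string; default to no params.
            "strategy_params": json.loads(cfg.get(prefix + "strategy-params", "{}")),
            "carry_over": cfg.get(prefix + "carry-over", False),
        })
    return groups

groups = parse_groups(config)
```

Encoding `strategy-params` as a JSON string keeps the TOML schema flat while still allowing arbitrary per-strategy keyword arguments such as `proximal_mu`.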
## 🧠 Supported Aggregation Strategies
The following strategies are tested and supported:
- FedAvg
- FedMedian
- FedAdam
- FedAdagrad
- FedYogi
Some strategies require patched implementations to ensure compatibility with NumPy-based parameter operations.
## Conditionally Supported Strategies
The following strategies may be available depending on the Flower version and installed dependencies:
- FedProx
- FedAvgM
- FedTrimmedAvg
- Krum
- MultiKrum
- Bulyan
- QFedAvg
- FedXgbBagging
- FedXgbCyclic
These strategies are loaded using a try/except compatibility mechanism. If a strategy is unavailable in the current Flower environment, it will not be registered.
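A try/except compatibility mechanism of this kind can be sketched as below. The registry and helper are illustrative (not the project's actual code), and the demo uses a stdlib name plus a deliberately missing attribute so the example is self-contained.

```python
# Sketch: register a strategy only if it can be imported in the current
# environment; otherwise skip it silently.
import importlib

AVAILABLE = {}

def try_register(name, module_path, attr):
    try:
        module = importlib.import_module(module_path)
        AVAILABLE[name] = getattr(module, attr)
    except (ImportError, AttributeError):
        pass  # strategy unavailable in this environment: not registered

# Exists in every Python installation -> registered:
try_register("median", "statistics", "median")
# Hypothetical missing class -> silently skipped, whether or not
# the flwr package itself is installed:
try_register("QFedAvg", "flwr.server.strategy", "DoesNotExist")
```

Catching both `ImportError` (module absent) and `AttributeError` (module present but the class was removed or renamed in this Flower version) is what lets the same code run across Flower releases.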
## 🏗️ Project Structure

```
subgroup_sequential_aggregation/
│
├── server_app.py
├── client_app.py
├── hierarchical_strategy.py
├── group.py
├── patched_strategies.py
├── task.py
└── pyproject.toml
```
### File Roles

- pyproject.toml → Defines sequential groups and aggregation strategies.
- server_app.py → Runs the Flower server and registers the sequential aggregation pipeline.
- client_app.py → Defines the training logic executed by each simulated client node.
- task.py → Contains the PyTorch model and dataset loading logic.
- hierarchical_strategy.py → Core engine managing sequential phases and bridge node carry-over.
- group.py → Defines group layers and assigns nodes to strategies.
- patched_strategies.py → Provides compatibility fixes for certain Flower strategies.
## Outputs

After training completes, results are saved in the `results/` directory.

Example:

```
results/
├── metrics_history.json
├── final_model.npz
├── model_group_1.npz
└── model_group_2.npz
```
### Output Files

- metrics_history.json → Complete training metrics across all rounds and groups.
- final_model.npz → The final global model produced after the last group.
- model_group_X.npz → Snapshots of the model at the end of each group phase.
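As a small usage sketch, the metrics file can be inspected with the standard library alone. The nested group/round schema shown here is an assumption for illustration only; check the actual `metrics_history.json` produced by a run for the real layout.

```python
# Sketch: read per-group, per-round metrics from a JSON history.
# The schema below (group -> round -> metrics) is assumed, not guaranteed.
import json

example = """
{
  "Group 1": {"round_1": {"loss": 0.9, "accuracy": 0.62},
              "round_2": {"loss": 0.7, "accuracy": 0.71}},
  "Group 2": {"round_1": {"loss": 0.6, "accuracy": 0.75}}
}
"""

history = json.loads(example)

def final_accuracy(history):
    """Accuracy of the last round of the last group (insertion order)."""
    last_group = list(history.values())[-1]
    last_round = list(last_group.values())[-1]
    return last_round["accuracy"]

acc = final_accuracy(history)
```

The model snapshots (`*.npz`) can likewise be loaded with `numpy.load` to compare parameters across group phases.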
## ⚠️ Limitations
While Sequential Subgroup Aggregation enables flexible phased training, several limitations exist.
### Sequential Execution
Groups train strictly one after another, which may increase total training time compared to fully parallel federated learning.
### Bridge Node Dependency
Knowledge transfer between groups requires overlapping clients. If no bridge nodes exist between two groups, the next group starts from the latest global model.
### Strategy Compatibility
Not all Flower strategies are fully supported. Some require patched implementations or depend on the Flower version.
### Simulation-Focused Design
The current project is primarily designed for Flower simulation environments and may require additional infrastructure for large-scale production deployments.
## Citation
If you use this project in your research, please cite:
```bibtex
@INPROCEEDINGS{10206255,
  author={Abdelghany, Bahaa-Elden A. and Fernández-Veiga, M. and Fernández-Vilas, A. and Hassan, Ammar M. and Abdelmoez, Walid M. and El-Bendary, Nashwa},
  booktitle={2022 32nd International Conference on Computer Theory and Applications (ICCTA)},
  title={Scheduling and Communication Schemes for Decentralized Federated Learning},
  year={2022},
  doi={10.1109/ICCTA58027.2022.10206255}
}
```