# Sequential Subgroup Aggregation in Federated Learning

A Flower app published as `@bahaaelden/subgroup-seq-agg`. Create a local copy with:

```shell
flwr new @bahaaelden/subgroup-seq-agg
```
A Flower Federated Learning application where clients train in sequential phases, allowing overlapping clients to carry learned knowledge between groups.
Author: Bahaa-Elden Ali Abdelghany
## 💡 The Core Concept
In standard Federated Learning (FL), clients typically train together in a single global aggregation process where updates from many clients are combined into one global model.
This project introduces **Sequential Grouping**: instead of training all clients simultaneously, we divide them into phased groups that train one after another.
- Phase 1 → Group 1 trains
- Phase 2 → Group 2 trains
- Phase 3 → Group 3 trains
Each group may use a different federated aggregation strategy.
Example:
- Group 1 → FedAvg
- Group 2 → FedProx
- Group 3 → QFedAvg
The model produced by one group becomes the starting baseline for the next group, forming a training pipeline across client groups.
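The pipeline above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual implementation: `train_group` is a hypothetical stand-in for one federated phase, and a scalar stands in for model parameters.

```python
# Minimal sketch of sequential subgroup aggregation: each group starts
# training from the model produced by the previous group.

def train_group(model, nodes, strategy, rounds):
    """Pretend to run `rounds` of federated training and return a new model."""
    for _ in range(rounds):
        model = model + 1  # placeholder for an aggregation step (e.g. FedAvg)
    return model

groups = [
    {"nodes": [1, 2, 3, 4], "strategy": "FedAvg"},
    {"nodes": [5, 6],       "strategy": "FedProx"},
    {"nodes": [4, 7],       "strategy": "QFedAvg"},
]

model = 0  # initial global model
for group in groups:
    model = train_group(model, group["nodes"], group["strategy"], rounds=2)
# After all phases, `model` reflects every group's training in sequence.
```

The key property is that the loop carries `model` across iterations, so later groups never start from scratch.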
## Bridge Nodes (Knowledge Carry-Over)
The most important feature of this project is how it handles clients that belong to multiple groups.
These clients act as bridges, transferring knowledge between training phases.
### Example

Suppose Node 4 belongs to Group 1 and Group 3.

**Phase 1**

- Group 1 → Nodes: 1, 2, 3, 4
- Result → Model M1

Node 4 participates in generating M1.

**Phase 2**

- Group 2 → Nodes: 5, 6
- Result → Model M2

Node 4 does not participate in this phase.

**Phase 3**

- Group 3 → Nodes: 4, 7

Since Node 4 trained in Phase 1, it carries its learned parameters forward. Node 4 therefore initializes the new group with its previously trained model, effectively bridging knowledge between phases. Node 7, which is new to the pipeline, begins training from Node 4's parameters.
If multiple bridge nodes exist, the system uses the model from the most recently trained bridge node as the initialization seed for the new group.
In this way, overlapping nodes propagate knowledge across sequential training phases.
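The selection rule described above can be sketched as follows. This is an illustrative sketch, not the project's code: `last_trained_phase` and `node_models` are hypothetical bookkeeping maps that record when each node last trained and which model it holds.

```python
# Sketch of bridge-node selection: when a new group starts, pick the
# overlapping node that trained most recently and use its model as the seed.

def pick_seed(new_group_nodes, last_trained_phase, node_models, global_model):
    bridges = [n for n in new_group_nodes if n in last_trained_phase]
    if not bridges:
        # No overlap with earlier groups: fall back to the latest global model.
        return global_model
    most_recent = max(bridges, key=lambda n: last_trained_phase[n])
    return node_models[most_recent]

# Node 4 trained in phase 1; nodes 5 and 6 trained in phase 2.
last_trained = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}
models = {n: f"model_from_node_{n}" for n in last_trained}

seed = pick_seed([4, 7], last_trained, models, "global_model")
# Group 3 = {4, 7}: node 4 is the only bridge, so its model seeds the group.
```

Note the fallback branch: a group with no bridge nodes simply inherits the latest global model, which matches the behavior described in the Limitations section.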
## 🎯 Use Cases
Sequential subgroup aggregation is useful when clients or data sources are heterogeneous and cannot all participate in the same training phase.
### Multi-Organization Federated Learning

- Phase 1 → Hospital Network A
- Phase 2 → Hospital Network B
- Phase 3 → Hospital Network C
Bridge nodes transfer knowledge between organizations without sharing raw data.
### Multimodal Federated Learning

- Group 1 → Vision clients
- Group 2 → Audio clients
- Group 3 → Multimodal clients

Clients that support multiple modalities naturally act as bridges between training stages.
### Edge–Cloud Hierarchical Training

- Phase 1 → Edge devices
- Phase 2 → Regional aggregators
- Phase 3 → Global aggregation

Bridge nodes propagate learned representations between levels.
### Resource-Constrained Federated Systems
In very large deployments, not all devices can participate simultaneously.
Sequential grouping allows:
- controlled scheduling of clients
- reduced communication load
- scalable training phases
### Strategy Experimentation

Researchers can evaluate different aggregation strategies sequentially.

- Group 1 → FedAvg
- Group 2 → FedProx
- Group 3 → QFedAvg
This makes it possible to observe how earlier strategies influence downstream training.
## Quick Start

1. Install dependencies:

```shell
pip install "flwr[simulation]" "flwr-datasets[vision]" torch torchvision numpy
```

Or install locally:

```shell
pip install .
```

2. Run the simulation:

```shell
flwr run .
```
## ⚙️ Configuration (No Python Required)
The entire training pipeline is controlled through pyproject.toml.
No Python code needs to be modified.
Under the [tool.flwr.app.config] section you define your sequential training groups.
### Example: Two-Stage Training Pipeline

```toml
[tool.flwr.app.config]
dataset = "mnist"
num-groups = 2
rounds-per-group = 2

# --- Group 1 (Trains First) ---
group-1-name = "Group 1"
group-1-nodes = "1,2,3,4"
group-1-strategy = "QFedAvg"
group-1-carry-over = true

# --- Group 2 (Trains Second) ---
group-2-name = "Group 2"
group-2-nodes = "4,5,6"
group-2-strategy = "FedProx"
group-2-strategy-params = '{"proximal_mu": 0.1}'
group-2-carry-over = true
```
⚠️ For the bridge mechanism to work, carry-over must be enabled.
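To make the flat key scheme concrete, here is a sketch of how the `group-N-*` entries might be collected into per-group dicts. The key names come from the README above; the parsing function itself is illustrative, not the project's code.

```python
# Sketch: turn flat "group-N-*" config keys into a list of group dicts.
import json

config = {
    "num-groups": 2,
    "group-1-nodes": "1,2,3,4",
    "group-1-strategy": "QFedAvg",
    "group-1-carry-over": True,
    "group-2-nodes": "4,5,6",
    "group-2-strategy": "FedProx",
    "group-2-strategy-params": '{"proximal_mu": 0.1}',
    "group-2-carry-over": True,
}

def parse_groups(cfg):
    groups = []
    for i in range(1, cfg["num-groups"] + 1):
        prefix = f"group-{i}-"
        groups.append({
            # Node lists are comma-separated strings in the config.
            "nodes": [int(n) for n in cfg[prefix + "nodes"].split(",")],
            "strategy": cfg[prefix + "strategy"],
            # Strategy params are a JSON string; default to no params.
            "strategy_params": json.loads(cfg.get(prefix + "strategy-params", "{}")),
            "carry_over": cfg.get(prefix + "carry-over", False),
        })
    return groups

groups = parse_groups(config)
```

Encoding `strategy-params` as a JSON string keeps the TOML schema flat while still allowing arbitrary per-strategy keyword arguments such as `proximal_mu`.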
## 🧠 Supported Aggregation Strategies
The following strategies are tested and supported:
- FedAvg
- FedMedian
- FedAdam
- FedAdagrad
- FedYogi
Some strategies require patched implementations to ensure compatibility with NumPy-based parameter operations.
## Conditionally Supported Strategies
The following strategies may be available depending on the Flower version and installed dependencies:
- FedProx
- FedAvgM
- FedTrimmedAvg
- Krum
- MultiKrum
- Bulyan
- QFedAvg
- FedXgbBagging
- FedXgbCyclic
These strategies are loaded using a try/except compatibility mechanism. If a strategy is unavailable in the current Flower environment, it will not be registered.
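A try/except compatibility mechanism of this kind can be sketched as below. The registry and helper are illustrative (not the project's actual code), and the demo uses a stdlib name plus a deliberately missing attribute so the example is self-contained.

```python
# Sketch: register a strategy only if it can be imported in the current
# environment; otherwise skip it silently.
import importlib

AVAILABLE = {}

def try_register(name, module_path, attr):
    try:
        module = importlib.import_module(module_path)
        AVAILABLE[name] = getattr(module, attr)
    except (ImportError, AttributeError):
        pass  # strategy unavailable in this environment: not registered

# Exists in every Python installation -> registered:
try_register("median", "statistics", "median")
# Hypothetical missing class -> silently skipped, whether or not
# the flwr package itself is installed:
try_register("QFedAvg", "flwr.server.strategy", "DoesNotExist")
```

Catching both `ImportError` (module absent) and `AttributeError` (module present but the class was removed or renamed in this Flower version) is what lets the same code run across Flower releases.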
## 🏗️ Project Structure

```
subgroup_sequential_aggregation/
│
├── server_app.py
├── client_app.py
├── hierarchical_strategy.py
├── group.py
├── patched_strategies.py
├── task.py
└── pyproject.toml
```
### File Roles

- pyproject.toml → Defines sequential groups and aggregation strategies.
- server_app.py → Runs the Flower server and registers the sequential aggregation pipeline.
- client_app.py → Defines the training logic executed by each simulated client node.
- task.py → Contains the PyTorch model and dataset loading logic.
- hierarchical_strategy.py → Core engine managing sequential phases and bridge node carry-over.
- group.py → Defines group layers and assigns nodes to strategies.
- patched_strategies.py → Provides compatibility fixes for certain Flower strategies.
## Outputs

After training completes, results are saved in the `results/` directory.

Example:

```
results/
├── metrics_history.json
├── final_model.npz
├── model_group_1.npz
└── model_group_2.npz
```
### Output Files

- metrics_history.json → Complete training metrics across all rounds and groups.
- final_model.npz → The final global model produced after the last group.
- model_group_X.npz → Snapshots of the model at the end of each group phase.
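As a small usage sketch, the metrics file can be inspected with the standard library alone. The nested group/round schema shown here is an assumption for illustration only; check the actual `metrics_history.json` produced by a run for the real layout.

```python
# Sketch: read per-group, per-round metrics from a JSON history.
# The schema below (group -> round -> metrics) is assumed, not guaranteed.
import json

example = """
{
  "Group 1": {"round_1": {"loss": 0.9, "accuracy": 0.62},
              "round_2": {"loss": 0.7, "accuracy": 0.71}},
  "Group 2": {"round_1": {"loss": 0.6, "accuracy": 0.75}}
}
"""

history = json.loads(example)

def final_accuracy(history):
    """Accuracy of the last round of the last group (insertion order)."""
    last_group = list(history.values())[-1]
    last_round = list(last_group.values())[-1]
    return last_round["accuracy"]

acc = final_accuracy(history)
```

The model snapshots (`*.npz`) can likewise be loaded with `numpy.load` to compare parameters across group phases.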
## ⚠️ Limitations
While Sequential Subgroup Aggregation enables flexible phased training, several limitations exist.
### Sequential Execution
Groups train strictly one after another, which may increase total training time compared to fully parallel federated learning.
### Bridge Node Dependency
Knowledge transfer between groups requires overlapping clients. If no bridge nodes exist between two groups, the next group starts from the latest global model.
### Strategy Compatibility
Not all Flower strategies are fully supported. Some require patched implementations or depend on the Flower version.
### Simulation-Focused Design
The current project is primarily designed for Flower simulation environments and may require additional infrastructure for large-scale production deployments.
## Citation
If you use this project in your research, please cite:
```bibtex
@INPROCEEDINGS{10206255,
  author={Abdelghany, Bahaa-Elden A. and Fernández-Veiga, M. and Fernández-Vilas, A. and Hassan, Ammar M. and Abdelmoez, Walid M. and El-Bendary, Nashwa},
  booktitle={2022 32nd International Conference on Computer Theory and Applications (ICCTA)},
  title={Scheduling and Communication Schemes for Decentralized Federated Learning},
  year={2022},
  doi={10.1109/ICCTA58027.2022.10206255}
}
```