@flwrlabs/fed-audio-tagging
Federated Audio Tagging with Flower and PyTorch
This project implements federated learning for environmental sound classification using Flower, PyTorch, and torchaudio.
It uses a federated version of the UrbanSound8K dataset and supports:
- IID partitioning
- Natural partitioning by client (clientID)
The system performs audio tagging by converting raw audio into log-mel spectrograms and training a compact CNN.
Fetch the App
Install Flower:
pip install flwr
Fetch the app:
flwr new @flwrlabs/fed-audio-tagging
Then, install dependencies:
cd fed-audio-tagging && pip install -e .
Project structure:
fed-audio-tagging
├── fedaudio
│   ├── __init__.py
│   ├── client_app.py   # Client-side training
│   ├── server_app.py   # Server-side orchestration
│   └── task.py         # Model, data processing, training, evaluation
└── pyproject.toml
Run the App
You can run this Flower App in both simulation and deployment modes.
Run with the Simulation Engine
In simulation mode:
- Dataset is automatically downloaded (flwrlabs/fed-urbansound8K)
- Data is partitioned across clients:
  - iid → random split
  - natural → grouped by clientID
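The two partitioning modes can be illustrated with a toy partitioner. This is a sketch of the idea only, not the app's implementation (in practice the partitioning comes from the federated dataset tooling); the `partition` helper and its signature are illustrative:

```python
import random
from collections import defaultdict

def partition(samples, mode, num_partitions, seed=0):
    """Toy sketch of the two partitioning modes.

    - "iid": shuffle all samples and deal them round-robin into partitions.
    - "natural": group samples by the client that originally recorded them
      (num_partitions is ignored; the grouping determines the count).
    """
    if mode == "iid":
        rng = random.Random(seed)
        shuffled = samples[:]
        rng.shuffle(shuffled)
        return [shuffled[i::num_partitions] for i in range(num_partitions)]
    # "natural": one partition per original recording client
    groups = defaultdict(list)
    for s in samples:
        groups[s["clientID"]].append(s)
    return list(groups.values())
```

With natural partitioning, each client's data keeps the acoustic characteristics of a single recording source, which makes the federation statistically heterogeneous (non-IID).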
Run with default settings:
flwr run .
Override configuration:
flwr run . --run-config "num-server-rounds=20 batch-size=16"
Key configuration options (from pyproject.toml):
- num-server-rounds: number of FL rounds
- local-epochs: local training epochs
- batch-size: batch size
- fraction-train: fraction of participating clients
- learning-rate-max/min: cosine annealing schedule
- partitioner: iid or natural
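Inside the app, these keys arrive as a dict-like run config (Flower exposes it on the app context). The following helper is a hypothetical sketch of how they might be parsed and validated; the default values shown are assumptions, not the values from this app's pyproject.toml:

```python
# Assumed defaults for illustration only; check pyproject.toml for the real ones.
DEFAULTS = {
    "num-server-rounds": 10,
    "local-epochs": 1,
    "batch-size": 32,
    "fraction-train": 1.0,
    "learning-rate-max": 1e-3,
    "learning-rate-min": 1e-5,
    "partitioner": "iid",
}

def read_run_config(run_config: dict) -> dict:
    """Merge CLI overrides (e.g. from --run-config) over the defaults."""
    cfg = {**DEFAULTS, **run_config}
    if cfg["partitioner"] not in ("iid", "natural"):
        raise ValueError("partitioner must be 'iid' or 'natural'")
    return cfg
```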
Model
The model is a compact CNN for spectrogram classification:
- Input: log-mel spectrograms
- 3 convolutional blocks with:
  - BatchNorm
  - ReLU activation
  - Max pooling
- Dropout regularization
- Global average pooling
- Fully connected classifier
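An architecture matching the description above can be sketched in PyTorch as follows. The channel widths, kernel size, and dropout rate are assumptions for illustration; the app's actual model in task.py may differ:

```python
import torch
import torch.nn as nn

class AudioCNN(nn.Module):
    """Illustrative compact CNN for log-mel spectrogram classification."""

    def __init__(self, n_classes: int = 10):
        super().__init__()

        def block(c_in, c_out):
            # Conv -> BatchNorm -> ReLU -> MaxPool, as described above
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(block(1, 16), block(16, 32), block(32, 64))
        self.dropout = nn.Dropout(0.3)
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (batch, 1, n_mels, time)
        x = self.features(x)
        x = self.dropout(x)
        x = self.pool(x).flatten(1)
        return self.classifier(x)
```

Global average pooling makes the classifier independent of the spectrogram's time dimension, which keeps the parameter count small for edge devices.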
Data Pipeline
Dataset:
- Hugging Face: flwrlabs/fed-urbansound8K
Processing steps:
- Load raw audio (bytes or file path)
- Resample to 16 kHz
- Pad or trim to fixed length (4 seconds)
- Convert to mel spectrogram
- Convert to log scale (dB)
- Normalize features
This produces input tensors of shape:
(batch, 1, n_mels, time)
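The fixed-length and normalization steps can be sketched in plain PyTorch (the mel-spectrogram and dB-conversion steps would typically use torchaudio's MelSpectrogram and AmplitudeToDB transforms; the constants below restate the pipeline's 16 kHz / 4 s settings):

```python
import torch

SAMPLE_RATE = 16_000
CLIP_SECONDS = 4
TARGET_LEN = SAMPLE_RATE * CLIP_SECONDS  # fixed clip length: 4 s at 16 kHz

def pad_or_trim(waveform: torch.Tensor, target_len: int = TARGET_LEN) -> torch.Tensor:
    """Zero-pad or trim so every clip has exactly target_len samples."""
    n = waveform.shape[-1]
    if n < target_len:
        pad = torch.zeros(*waveform.shape[:-1], target_len - n)
        return torch.cat([waveform, pad], dim=-1)
    return waveform[..., :target_len]

def normalize(spec: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Per-example standardization of the log-mel spectrogram."""
    return (spec - spec.mean()) / (spec.std() + eps)
```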
Supports:
- Simulation mode via FederatedDataset
- Deployment mode via load_from_disk
Training
Each client:
- Receives the global model
- Trains locally using:
  - CrossEntropyLoss
  - Adam optimizer
- Applies a cosine annealing learning-rate schedule
Clients return:
- Updated model weights
- Training loss and dataset size
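The client-side steps above can be sketched as a local training loop. This mirrors the description (CrossEntropyLoss, Adam, cosine annealing), but the function name, signature, and scheduling granularity are assumptions rather than the app's exact code:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

def train_local(model, loader, epochs, lr_max, lr_min, device="cpu"):
    """Illustrative local training loop for one federated round."""
    model.to(device).train()
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr_max)
    # Anneal the learning rate from lr_max down toward lr_min over the local epochs
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs, eta_min=lr_min)
    total_loss, n_examples = 0.0, 0
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item() * y.size(0)
            n_examples += y.size(0)
        scheduler.step()
    # The average loss and example count are what the client reports back
    return total_loss / max(n_examples, 1), n_examples
```

The dataset size returned alongside the loss lets the server weight each client's update during aggregation.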
Evaluation
Server-side evaluation:
- Uses centralized test split
- Reports:
  - Loss
  - Accuracy
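A minimal sketch of such a centralized evaluation pass, assuming a standard classification setup (the function name and signature are illustrative):

```python
import torch

@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    """Compute average loss and accuracy on the server's test split."""
    model.to(device).eval()
    criterion = torch.nn.CrossEntropyLoss(reduction="sum")
    loss_sum, correct, total = 0.0, 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        logits = model(x)
        loss_sum += criterion(logits, y).item()
        correct += (logits.argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return loss_sum / total, correct / total
```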
Run with the Deployment Engine
To run in deployment mode:
Step 1: Prepare local datasets
Prepare audio datasets in Hugging Face format and store locally.
Step 2: Start SuperNodes
flower-supernode \
    --insecure \
    --superlink <SUPERLINK-FLEET-API> \
    --node-config="data-path=/path/to/local_dataset"
Step 3: Run federation
flwr run . <SUPERLINK-CONNECTION> --stream
Benchmarking and System Metrics
This app writes a benchmark summary next to the standard Flower result pickle:
result_<run-name>_communication.json
The summary includes per-round and total communication volume:
- total_comm_bytes
- comm_bytes_total per training round
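As a rough model of how such numbers can be computed: the payload of one model exchange is the sum of the parameter arrays' byte sizes, and a round involves both the server-to-client broadcast and the client-to-server updates. The helpers below are illustrative; the app's actual accounting may differ (e.g. serialization overhead is ignored here):

```python
import numpy as np

def model_size_bytes(ndarrays):
    """Approximate payload of one model update as the sum of array buffers."""
    return sum(a.nbytes for a in ndarrays)

def round_comm_bytes(ndarrays, num_clients):
    """Per-round volume: server->clients broadcast plus client->server updates."""
    return 2 * num_clients * model_size_bytes(ndarrays)
```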
Enable system metric tracking with:
flwr run . <SUPERLINK-CONNECTION> --stream --run-config "benchmark-system-metrics=true"
When enabled, the benchmark summary also includes:
- client_train_time_sec
- server_aggregation_time_sec
- round_wall_clock_sec
- client_peak_cpu_memory_mb
- client_peak_gpu_memory_mb
Server-side centralized evaluation can be disabled for benchmark-only runs:
flwr run . <SUPERLINK-CONNECTION> --stream --run-config "benchmark-run-server-eval=false"
Notes
- Designed for audio classification tasks
- Uses log-mel spectrograms as features
- Handles variable-length audio via padding/trimming
- Efficient CNN architecture for edge devices
- Automatically uses GPU if available