# Federated UMAP with Flower

Create a local copy of this app with:

```shell
flwr new @uncledecart/federated-umap
```
Federated UMAP is a privacy-preserving dimensionality reduction algorithm that produces a 2D UMAP embedding across multiple clients without any client ever sharing raw data.
Each client optimises a shared set of landmark points Y against its local data using Maximum Mean Discrepancy (MMD) gradient descent. The server aggregates landmark updates via FedAvg, then reconstructs the full pairwise distance matrix using the Nyström approximation and runs UMAP on the result.
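As a concrete illustration of the local step, one MMD gradient-descent update of the landmarks can be sketched in NumPy. This is a simplified sketch with a Gaussian kernel and an analytic gradient; the function names, signatures, and hyperparameters below are illustrative assumptions, not the actual `task.py` API.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise squared distances, then Gaussian kernel values
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    # Squared Maximum Mean Discrepancy between local data X and landmarks Y
    return (gaussian_kernel(X, X, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean()
            - 2 * gaussian_kernel(X, Y, sigma).mean())

def local_mmd_step(X, Y, lr=0.1, sigma=1.0):
    # One analytic gradient-descent step on the landmarks Y (X stays local)
    n, m = len(X), len(Y)
    K_YY = gaussian_kernel(Y, Y, sigma)   # (m, m)
    K_XY = gaussian_kernel(X, Y, sigma)   # (n, m)
    # Gradient of the mean K_YY term: landmarks repel collapsing onto each other
    g_yy = (K_YY[:, :, None] * (Y[None] - Y[:, None])).sum(1) * 2 / (m**2 * sigma**2)
    # Gradient of the -2 * mean K_XY term: data points attract the landmarks
    g_xy = (K_XY[:, :, None] * (X[:, None] - Y[None])).sum(0) * -2 / (n * m * sigma**2)
    return Y - lr * (g_yy + g_xy)
```

In the federated setting, each client runs a few such steps per round and sends only the updated `Y` (never `X`) to the server for FedAvg aggregation.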
Two federated embeddings are logged to Weights & Biases (W&B) for comparison:
| Embedding | Description |
|---|---|
| Nyström D̂ | UMAP on the Nyström-reconstructed global distance matrix (Algorithm 2) |
| K_XY features | UMAP on Gaussian kernel similarities to the landmarks |
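Both server-side constructions can be sketched in NumPy. For simplicity this sketch applies the Nyström idea to a linear-kernel Gram matrix (the app works with Gaussian-kernel quantities); the function names are illustrative assumptions.

```python
import numpy as np

def nystrom_distances(X, Y, eps=1e-8):
    # Nystrom sketch: approximate the full (n, n) Gram matrix of X from its
    # (n, m) block against the m landmarks Y, then convert to distances.
    C = X @ Y.T                                   # (n, m) cross block
    W = Y @ Y.T                                   # (m, m) landmark block
    G = C @ np.linalg.pinv(W, rcond=eps) @ C.T    # (n, n) approximate Gram
    g = np.diag(G)
    D2 = np.maximum(g[:, None] + g[None, :] - 2 * G, 0.0)
    return np.sqrt(D2)                            # feed this "D-hat" to UMAP

def kxy_features(X, Y, sigma=1.0):
    # "K_XY features": represent each point by its Gaussian kernel
    # similarity to every landmark, then run UMAP on these m-dim vectors.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))
```

When the landmarks span the data space, the linear-kernel Nyström reconstruction above recovers the exact pairwise Euclidean distances; in general it is an approximation whose quality grows with the number of landmarks `n-y`.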
## Project Structure

```shell
federated-umap
├── federated_umap/
│   ├── __init__.py
│   ├── client_app.py   # Defines ClientApp
│   ├── server_app.py   # Defines ServerApp
│   └── task.py         # FedMMDClient, data loading
├── pyproject.toml      # Project metadata and configuration
└── README.md
```
## W&B Setup

Before running, authenticate with W&B:

```shell
wandb login
```

Results are logged to the project defined by `wandb-project` in `pyproject.toml` (default: `federated-umap`). Override at runtime:

```shell
flwr run . --run-config "wandb-project=my-project"
```
## Running the App

### Run with the Simulation Engine

Install the dependencies defined in `pyproject.toml`, along with the `federated_umap` package itself:

```shell
cd federated-umap && pip install -e .
```

Run with default settings:

```shell
flwr run .
```

You can also override some of the settings for your `ClientApp` and `ServerApp` defined in `pyproject.toml`. For example:

```shell
flwr run . --run-config "num-server-rounds=5 local-epochs=4"
```
For more information, refer to the [Configuration](#configuration) section.
### Run with the Deployment Engine

To run this app using Flower's Deployment Engine, we recommend first creating some demo data using Flower Datasets. For example:

```shell
# Install Flower Datasets
pip install "flwr-datasets[vision]"

# Create dataset partitions and save them to disk
flwr-datasets create ylecun/mnist --num-partitions 2 --out-dir demo_data
```

The above command creates two IID partitions of the MNIST dataset and saves them in a `demo_data` directory. Next, you can pass one partition to each of your SuperNodes like this:
```shell
flower-supernode \
    --insecure \
    --superlink <SUPERLINK-FLEET-API> \
    --node-config="data-path=/path/to/demo_data/partition_0"
```
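For intuition, the IID partitioning performed by the `flwr-datasets` command above can be pictured as a plain-Python sketch (a simplified illustration, not the actual `flwr-datasets` implementation; the 70,000-example count is MNIST's train + test total):

```python
import numpy as np

def iid_partitions(num_examples, num_partitions, seed=0):
    # Shuffle all example indices, then split them into equally sized,
    # label-agnostic (IID) chunks -- one per SuperNode.
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_examples)
    return np.array_split(indices, num_partitions)

parts = iid_partitions(num_examples=70_000, num_partitions=2)
```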
Finally, ensure the environment of each SuperNode has all dependencies installed. Then launch the run via `flwr run`, pointing it at the SuperLink your SuperNodes are connected to:

```shell
flwr run . <SUPERLINK-CONNECTION> --stream
```
## Configuration

All keys live under `[tool.flwr.app.config]` in `pyproject.toml` and can be overridden with `--run-config`.
| Key | Default | Description |
|---|---|---|
| num-server-rounds | 100 | Number of federated rounds |
| local-epochs | 2 | Local gradient descent steps per round |
| n-y | 500 | Number of UMAP landmark points |
| dataset | ylecun/mnist | HuggingFace dataset identifier (Simulation only) |
| feature-column | image | Dataset column containing images |
| label-column | label | Dataset column containing class labels |
| feature-dim | 784 | Flattened feature dimension (28×28 for MNIST) |
| umap-max-samples | 10000 | Max points fed into UMAP at the final round |
| wandb-project | federated-umap | W&B project name |
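Taken together, the defaults in the table correspond to a `pyproject.toml` block along these lines (a sketch reconstructed from the table, not copied from the actual file):

```toml
[tool.flwr.app.config]
num-server-rounds = 100
local-epochs = 2
n-y = 500
dataset = "ylecun/mnist"
feature-column = "image"
label-column = "label"
feature-dim = 784
umap-max-samples = 10000
wandb-project = "federated-umap"
```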
Specify the number of virtual SuperNodes and their resources in `~/.flwr/config.toml`:

```toml
[superlink.local]
options.num-supernodes = 10
options.backend.client-resources.num-cpus = 2
options.backend.client-resources.num-gpus = 1.0
```