Quickstart PyTorch Lightning
In this federated learning tutorial we will learn how to train an AutoEncoder model on MNIST using Flower and PyTorch Lightning. We recommend creating a virtual environment and running everything inside it.
Then, clone the code example directly from GitHub:
git clone --depth=1 https://github.com/adap/flower.git _tmp \
&& mv _tmp/examples/quickstart-pytorch-lightning . \
&& rm -rf _tmp && cd quickstart-pytorch-lightning
This will create a new directory called quickstart-pytorch-lightning containing the following files:
quickstart-pytorch-lightning
├── pytorchlightning_example
│   ├── client_app.py   # Defines your ClientApp
│   ├── server_app.py   # Defines your ServerApp
│   └── task.py         # Defines your model, training and data loading
├── pyproject.toml      # Project metadata like dependencies and configs
└── README.md
Next, activate your environment, then run:
# Navigate to the example directory
$ cd path/to/quickstart-pytorch-lightning
# Install project and dependencies
$ pip install -e .
By default, the Flower Simulation Engine will be started and it will create a federation of 4 nodes using FedAvg as the aggregation strategy. The dataset will be partitioned using Flower Datasets' IidPartitioner. To run the project, do:
# Run with default arguments
$ flwr run .
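To make the IID partitioning mentioned above concrete, here is a minimal pure-Python sketch of the idea: shuffle the sample indices and deal them into equally sized partitions, one per node. This is an illustration only, not the code of Flower Datasets' IidPartitioner. The helper name `iid_partition`, the seed, and the choice of the 60,000-sample MNIST training split are assumptions made for this example.

```python
import random


def iid_partition(num_samples: int, num_partitions: int, seed: int = 42) -> list[list[int]]:
    """Hypothetical helper: shuffle indices, then deal them into near-equal partitions."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Strided slicing gives every partition a uniformly random subset of the data
    return [indices[i::num_partitions] for i in range(num_partitions)]


# Assumption: 60,000 samples (MNIST training split) spread over the 4 default nodes
partitions = iid_partition(num_samples=60_000, num_partitions=4)
sizes = [len(p) for p in partitions]
print(sizes)  # [15000, 15000, 15000, 15000]
```

Because every sample is assigned at random, each node sees roughly the same data distribution, which is what "IID" refers to here.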
With default arguments you will see an output like this one:
Loading project configuration...
Success
INFO : Starting FedAvg strategy:
INFO : ├── Number of rounds: 3
INFO : ├── ArrayRecord (0.39 MB)
INFO : ├── ConfigRecord (train): (empty!)
INFO : ├── ConfigRecord (evaluate): (empty!)
INFO : ├──> Sampling:
INFO : │       ├──Fraction: train (0.50) | evaluate ( 0.50)
INFO : │       ├──Minimum nodes: train (2) | evaluate (2)
INFO : │       └──Minimum available nodes: 2
INFO : └──> Keys in records:
INFO :         ├── Weighted by: 'num-examples'
INFO :         ├── ArrayRecord key: 'arrays'
INFO :         └── ConfigRecord key: 'config'
INFO :
INFO :
INFO : [ROUND 1/3]
INFO : configure_train: Sampled 2 nodes (out of 4)
INFO : aggregate_train: Received 2 results and 0 failures
INFO :     └──> Aggregated MetricRecord: {'train_loss': 0.0487}
INFO : configure_evaluate: Sampled 2 nodes (out of 4)
INFO : aggregate_evaluate: Received 2 results and 0 failures
INFO :     └──> Aggregated MetricRecord: {'eval_loss': 0.0495}
INFO :
INFO : [ROUND 2/3]
INFO : configure_train: Sampled 2 nodes (out of 4)
INFO : aggregate_train: Received 2 results and 0 failures
INFO :     └──> Aggregated MetricRecord: {'train_loss': 0.0420}
INFO : configure_evaluate: Sampled 2 nodes (out of 4)
INFO : aggregate_evaluate: Received 2 results and 0 failures
INFO :     └──> Aggregated MetricRecord: {'eval_loss': 0.0455}
INFO :
INFO : [ROUND 3/3]
INFO : configure_train: Sampled 2 nodes (out of 4)
INFO : aggregate_train: Received 2 results and 0 failures
INFO :     └──> Aggregated MetricRecord: {'train_loss': 0.05082}
INFO : configure_evaluate: Sampled 2 nodes (out of 4)
INFO : aggregate_evaluate: Received 2 results and 0 failures
INFO :     └──> Aggregated MetricRecord: {'eval_loss': 0.0441}
INFO :
INFO : Strategy execution finished in 159.24s
INFO :
INFO : Final results:
INFO :
INFO : Global Arrays:
INFO : ArrayRecord (0.389 MB)
INFO :
INFO : Aggregated ClientApp-side Train Metrics:
INFO : { 1: {'train_loss': '4.8696e-02'},
INFO : 2: {'train_loss': '4.1957e-02'},
INFO : 3: {'train_loss': '5.0818e-02'}}
INFO :
INFO : Aggregated ClientApp-side Evaluate Metrics:
INFO : { 1: {'eval_loss': '4.9516e-02'},
INFO : 2: {'eval_loss': '4.5510e-02'},
INFO : 3: {'eval_loss': '4.4052e-02'}}
INFO :
INFO : ServerApp-side Evaluate Metrics:
INFO : {}
INFO :
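The log reports that 2 of the 4 nodes are sampled each round and that results are weighted by 'num-examples'. The following sketch shows one way those numbers can arise: a sampling rule that takes a fraction of the federation but never fewer than the configured minimum, and a weighted average in which clients with more examples contribute more. It is a simplified illustration under stated assumptions, not Flower's implementation. The function names, the client losses, the parameter vectors, and the example counts are all hypothetical.

```python
def num_sampled(num_nodes: int, fraction: float, min_nodes: int) -> int:
    """Assumed sampling rule: a fraction of all nodes, but at least min_nodes."""
    return max(int(num_nodes * fraction), min_nodes)


def weighted_average(values: list[float], num_examples: list[int]) -> float:
    """Average scalar metrics, weighting each client by its number of examples."""
    total = sum(num_examples)
    return sum(v * n for v, n in zip(values, num_examples)) / total


def fedavg(client_params: list[list[float]], num_examples: list[int]) -> list[float]:
    """Element-wise weighted average of the clients' parameter vectors."""
    total = sum(num_examples)
    return [
        sum(params[i] * n for params, n in zip(client_params, num_examples)) / total
        for i in range(len(client_params[0]))
    ]


# Values from the log: 4 nodes, fraction 0.50, minimum 2 nodes
sampled = num_sampled(num_nodes=4, fraction=0.5, min_nodes=2)
print(sampled)  # 2

# Hypothetical results from two clients holding 100 and 300 examples
avg_loss = weighted_average([0.05, 0.04], [100, 300])
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [100, 300])
print(global_params)  # [2.5, 3.5]
```

In this toy example the client with 300 examples pulls both the averaged loss and the averaged parameters toward its own values.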
Each simulated ClientApp (two per round) will also log a summary of its local training process. Expect this output to be similar to:
# The left part indicates the process ID running the `ClientApp`
(ClientAppActor pid=38155) ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
(ClientAppActor pid=38155) ┃        Test metric        ┃       DataLoader 0        ┃
(ClientAppActor pid=38155) ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
(ClientAppActor pid=38155) │         test_loss         │   0.045175597071647644    │
(ClientAppActor pid=38155) └───────────────────────────┴───────────────────────────┘
You can also override the parameters defined in the [tool.flwr.app.config] section in pyproject.toml like this:
# Override some arguments
$ flwr run . --run-config num-server-rounds=5
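Conceptually, an override replaces a default value from the project configuration before the run starts. The sketch below illustrates that merge in plain Python. It is not Flower's parsing code: the helper `apply_overrides`, the rejection of unknown keys, and the simple type casting are assumptions made for this illustration. The default of 3 rounds is taken from the "Number of rounds: 3" line in the log above.

```python
def apply_overrides(defaults: dict, overrides: str) -> dict:
    """Hypothetical helper: merge space-separated key=value pairs into a copy of defaults."""
    merged = dict(defaults)
    for pair in overrides.split():
        key, _, raw = pair.partition("=")
        if key not in merged:
            # Design choice of this sketch: reject keys that have no default
            raise KeyError(f"unknown config key: {key}")
        # Cast to the type of the default so that "5" becomes the int 5
        merged[key] = type(merged[key])(raw)
    return merged


defaults = {"num-server-rounds": 3}
run_config = apply_overrides(defaults, "num-server-rounds=5")
print(run_config)  # {'num-server-rounds': 5}
```

The original defaults dictionary is left unchanged, so the same defaults can be reused across runs with different overrides.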
Note
Check the source code of this tutorial in examples/quickstart-pytorch-lightning in the Flower GitHub repository.