---
tags: [tabular, federated analytics, omop]
dataset: [artificial]
framework: [pandas]
---

# Federated Analytics with OMOP CDM using Flower

[View on GitHub](https://github.com/adap/flower/blob/main/examples/federated-analytics)

This example shows how you can use Flower to run federated analytics workloads on distributed SQL databases, a setting that applies to many biostatistics and healthcare use cases. You will use an artificial dataset generated in adherence to the [OMOP Common Data Model](https://www.ohdsi.org/data-standardization/), which uses the OHDSI standardized vocabularies widely adopted in clinical domains.

You will also run this example with Flower's Deployment Engine to demonstrate how each SuperNode can be configured to connect to its own PostgreSQL database. The database connection is handled using SQLAlchemy and `psycopg`, the latest implementation of the PostgreSQL adapter for Python.

## Set up the project

### Prerequisites

This example makes use of PostgreSQL databases running in containers. Make sure you have Docker and Docker Compose v2 installed.

### Clone the project

Clone the example project. This creates a new directory called `federated-analytics` containing the following files:

```shell
federated-analytics
├── federated_analytics
│   ├── client_app.py    # Defines your ClientApp
│   ├── server_app.py    # Defines your ServerApp
│   └── task.py          # Defines your database connection and data loading
├── pyproject.toml       # Project metadata like dependencies and configs
└── README.md
```

### Install dependencies and project

Install the dependencies defined in `pyproject.toml` as well as the `federated_analytics` package.

```shell
# From a new python environment, run:
pip install -e .
```

### Start PostgreSQL databases

First, create a database initialization script that defines the OMOP CDM table structure and generates sample data. Create a file named `db_init.sh`:

```bash
#!/usr/bin/env bash
SEED=${DB_SEED:-0.42}

psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
    CREATE TABLE person_measurements (
        person_id INTEGER PRIMARY KEY,
        age INTEGER,
        bmi FLOAT,
        systolic_bp INTEGER,
        diastolic_bp INTEGER,
        ldl_cholesterol FLOAT,
        hba1c FLOAT
    );
    SELECT setseed($SEED);
    INSERT INTO person_measurements
    SELECT
        gs AS person_id,
        20 + (random() * 60)::INT AS age,
        18 + (random() * 15) AS bmi,
        100 + (random() * 40)::INT AS systolic_bp,
        60 + (random() * 25)::INT AS diastolic_bp,
        70 + (random() * 120) AS ldl_cholesterol,
        4.5 + (random() * 4) AS hba1c
    FROM generate_series(1, 100) gs;
EOSQL
```

Make it executable:

```shell
chmod +x db_init.sh
```

Next, create a script that generates a Docker Compose file and starts one PostgreSQL container per database. The service definition below is one way to set this up: it uses `flwrlabs` as the user, password, and database name (matching the SuperNode connection strings used later in this guide), maps host ports starting at 5433 to the containers, and mounts `db_init.sh` so each database is initialized with sample data on first start. Create a file named `db_start.sh`:

```bash
#!/usr/bin/env bash
set -e

N=${1:-2}  # number of PostgreSQL databases (default = 2)
BASE_PORT=5433
{
    echo "services:"
    for i in $(seq 1 "$N"); do
        PORT=$((BASE_PORT + i - 1))
        # Set a different seed for each database so it generates different random data
        SEED=$(echo "scale=2; $i / 100" | bc)
        # Emit one Compose service per database (adjust image and credentials to taste)
        cat <<EOF
  db$i:
    image: postgres:16
    environment:
      POSTGRES_USER: flwrlabs
      POSTGRES_PASSWORD: flwrlabs
      POSTGRES_DB: flwrlabs
      DB_SEED: "$SEED"
    ports:
      - "$PORT:5432"
    volumes:
      - ./db_init.sh:/docker-entrypoint-initdb.d/db_init.sh
EOF
    done
} > compose.yml
docker compose -f compose.yml up -d
```

Make the script executable with `chmod +x db_start.sh`, then run `./db_start.sh` to generate `compose.yml` and start the containers.

> [!NOTE]
> To start more than two databases, pass the desired number as the first argument to the script, e.g. `./db_start.sh 3`.
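If you want to confirm that the containers are up and the sample data was generated, you can query one of the databases directly. The snippet below is an optional sanity check, not part of the example's code; it uses pandas and SQLAlchemy (installed as project dependencies) and the same connection string that the first SuperNode will use later:

```python
import pandas as pd
from sqlalchemy import create_engine

# Connection string for the first database (port 5433); the second one listens on 5434.
engine = create_engine("postgresql+psycopg://flwrlabs:flwrlabs@localhost:5433/flwrlabs")

# Should report 100 rows, the number inserted by db_init.sh.
print(pd.read_sql("SELECT count(*) AS n FROM person_measurements", engine))
```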
### Update Flower Config

SuperLink connections are defined in the [Flower Configuration](https://flower.ai/docs/framework/main/en/ref-flower-configuration.html) file. If it doesn't exist, this file is created automatically for you the first time you use the Flower CLI.

Open the `config.toml` file (usually located at `$HOME/.flwr`) and add a new SuperLink connection at the end:

```toml
[superlink.local-deployment]
address = "127.0.0.1:9093"
insecure = true
```

### Run with the Deployment Engine

For a basic execution of this federated analytics app, activate your environment and start the SuperLink process in insecure mode:

```shell
flower-superlink --insecure
```

Next, start two SuperNodes (each in its own terminal) and connect them to the SuperLink. You need to specify different `--node-config` values so that each SuperNode connects to a different PostgreSQL database. The `db-url` parameter accepts a standard PostgreSQL connection string in the format `postgresql+psycopg://username:password@host:port/database`. Each SuperNode should point to a different database port (5433 and 5434), corresponding to the containers started by `db_start.sh`. In this example, the database URLs are set individually and passed via the `--node-config` parameter:

```shell
flower-supernode \
    --insecure \
    --superlink 127.0.0.1:9092 \
    --clientappio-api-address 127.0.0.1:9094 \
    --node-config="db-url='postgresql+psycopg://flwrlabs:flwrlabs@localhost:5433/flwrlabs'"
```

```shell
flower-supernode \
    --insecure \
    --superlink 127.0.0.1:9092 \
    --clientappio-api-address 127.0.0.1:9095 \
    --node-config="db-url='postgresql+psycopg://flwrlabs:flwrlabs@localhost:5434/flwrlabs'"
```

Finally, run the Flower App and follow the `ServerApp` logs to track the execution of the run:

```shell
flwr run . local-deployment --stream
```

You can also override some of the settings for your `ClientApp` and `ServerApp` defined in `pyproject.toml`. For example:

```shell
flwr run . local-deployment --run-config "selected-features='age,bmi'" --stream
```

The steps above are adapted from this [how-to guide](https://flower.ai/docs/framework/how-to-run-flower-with-deployment-engine.html). After that, you might be interested in setting up [secure TLS-enabled communications](https://flower.ai/docs/framework/how-to-enable-tls-connections.html) and [SuperNode authentication](https://flower.ai/docs/framework/how-to-authenticate-supernodes.html) in your federation.

If you are already familiar with how the Deployment Engine works, you may want to learn how to run it using Docker. Check out the [Flower with Docker](https://flower.ai/docs/framework/docker/index.html) documentation.
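As a closing illustration, the sketch below shows how a `ClientApp` could read the `db-url` node config and the `selected-features` run config from its Flower `Context` and load the corresponding columns with pandas. It is an illustrative sketch rather than the actual implementation in `client_app.py` or `task.py`; the function name and query are hypothetical.

```python
import pandas as pd
from flwr.common import Context
from sqlalchemy import create_engine


def load_selected_features(context: Context) -> pd.DataFrame:
    """Illustrative helper: load the configured columns from this SuperNode's database."""
    # `db-url` is set per SuperNode via --node-config (see the flower-supernode commands above).
    engine = create_engine(str(context.node_config["db-url"]))
    # `selected-features` comes from pyproject.toml or a --run-config override.
    columns = [c.strip() for c in str(context.run_config["selected-features"]).split(",")]
    # Building SQL from config is fine for this demo with known column names.
    query = f"SELECT {', '.join(columns)} FROM person_measurements"
    return pd.read_sql(query, engine)
```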