compute_frequencies¶
- compute_frequencies(partitioner: Partitioner, column_name: str, verbose_names: bool = False, max_num_partitions: int | None = None) DataFrame [source]¶
Compute the frequencies of unique values in a given column in the partitions.
The frequencies sum up to 1 for a given partition id. This function takes into account all possible labels in the dataset when computing the count for each partition (assign 0 as the size when there are no values for a label in the partition).
- Parameters:
partitioner (Partitioner) – Partitioner with an assigned dataset.
column_name (str) – Column name identifying label based on which the count will be calculated.
verbose_names (bool) – Whether to use verbose versions of the values in the column specified by column_name. The verbose value are possible to extract if the column is a feature of type ClassLabel.
max_num_partitions (Optional[int]) – The maximum number of partitions that will be used. If greater than the total number of partitions in a partitioner, it won’t have an effect. If left as None, then all partitions will be used.
- Returns:
dataframe – DataFrame where the row index represent the partition id and the column index represent the unique values found in column specified by column_name (e.g. represeting the labels). The value of the dataframe.loc[i, j] represnt the ratio of the label j to the total number of sample of in partition i.
- Return type:
pd.DataFrame
Examples
Generate DataFrame with label counts resulting from DirichletPartitioner on cifar10
>>> from flwr_datasets import FederatedDataset >>> from flwr_datasets.partitioner import DirichletPartitioner >>> from flwr_datasets.metrics import compute_frequencies >>> >>> fds = FederatedDataset( >>> dataset="cifar10", >>> partitioners={ >>> "train": DirichletPartitioner( >>> num_partitions=20, >>> partition_by="label", >>> alpha=0.3, >>> min_partition_size=0, >>> ), >>> }, >>> ) >>> partitioner = fds.partitioners["train"] >>> counts_dataframe = compute_frequencies( >>> partitioner=partitioner, >>> column_name="label" >>> )