Merger
- class Merger(merge_config: dict[str, tuple[str, ...]])
Bases: object
Merge existing splits of the dataset and assign them custom names.
Create a new DatasetDict whose splits (e.g. “train”, “valid” and “test”) are each formed by merging one or more existing splits.
- Parameters:
merge_config (Dict[str, Tuple[str, ...]]) – Dictionary mapping each desired split name to a tuple of the current split names that will be merged together to form it.
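Each value in merge_config must be a tuple, even when it names a single split. In Python, `("test")` is just the string `"test"`, and only `("test",)` is a one-element tuple. The helper below is a hypothetical sketch (`validate_merge_config` and its `existing_splits` argument are not part of Merger) showing how the expected shape could be checked before constructing a Merger. Whether Merger performs any such validation itself is not stated here.

```python
from typing import Dict, Iterable, Tuple


def validate_merge_config(
    merge_config: Dict[str, Tuple[str, ...]],
    existing_splits: Iterable[str],
) -> None:
    """Hypothetical check: raise ValueError if merge_config is malformed."""
    known = set(existing_splits)
    for new_name, sources in merge_config.items():
        # ("test") is the str "test"; a one-element tuple needs a trailing comma.
        if not isinstance(sources, tuple):
            raise ValueError(
                f"{new_name!r}: expected a tuple of split names, "
                f"got {type(sources).__name__}"
            )
        if not sources:
            raise ValueError(f"{new_name!r}: the tuple of split names is empty")
        # Every source split must exist in the dataset being merged.
        missing = [s for s in sources if s not in known]
        if missing:
            raise ValueError(f"{new_name!r}: unknown split(s) {missing}")


# Passes silently: both values are tuples of known split names.
validate_merge_config(
    {"new_train": ("train", "valid"), "test": ("test",)},
    ["train", "valid", "test"],
)
```

Calling it with `{"test": ("test")}` raises a ValueError, because the value is a string rather than a tuple.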
Examples
Create a new DatasetDict with a split named “new_train” that merges the “train” and “valid” splits, and keep the “test” split.
>>> # Assuming there is a dataset_dict of type `DatasetDict`
>>> # dataset_dict is {"train": train-data, "valid": valid-data, "test": test-data}
>>> merger = Merger(
...     merge_config={
...         "new_train": ("train", "valid"),
...         "test": ("test", )
...     }
... )
>>> new_dataset_dict = merger(dataset_dict)
>>> # new_dataset_dict is
>>> # {"new_train": concatenation of train-data and valid-data, "test": test-data}
Methods
resplit(dataset) – Resplit the dataset according to the merge_config.
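To make the merge semantics concrete without depending on any dataset library, the sketch below models each split as a plain Python list and builds the new splits by concatenating the source splits in the order listed in merge_config. `resplit_sketch` is a hypothetical stand-in, not the actual implementation of `resplit`. Two points are assumptions: that rows keep the order in which the source splits are listed, and that a split absent from merge_config is dropped. The example above keeps “test” by listing it explicitly, which is consistent with the second assumption but does not confirm it.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def resplit_sketch(
    splits: Dict[str, Sequence[T]],
    merge_config: Dict[str, Tuple[str, ...]],
) -> Dict[str, List[T]]:
    """Hypothetical stand-in for Merger.resplit, operating on plain lists."""
    new_splits: Dict[str, List[T]] = {}
    for new_name, sources in merge_config.items():
        merged: List[T] = []
        for source in sources:
            # Concatenate source splits in the order they appear in the tuple.
            merged.extend(splits[source])
        new_splits[new_name] = merged
    # Splits not named in any tuple are simply not copied (assumed behavior).
    return new_splits


splits = {"train": [1, 2], "valid": [3], "test": [4, 5]}
print(resplit_sketch(splits, {"new_train": ("train", "valid"), "test": ("test",)}))
# {'new_train': [1, 2, 3], 'test': [4, 5]}
```

The input dictionary is left unchanged; each new split is a fresh list.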