fiftyone.utils.random
Random sampling utilities.
Functions:
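random_split | Generates a random partition of the samples in the collection according to the specified split fractions.
weighted_sample | Generates a random sample of size k from the given collection such that the probability of selecting each sample is proportional to the given per-sample weights.
balanced_sample | Generates a random sample of size k from the given collection such that the expected histogram of path values in the sample is uniform.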
fiftyone.utils.random.random_split(sample_collection, split_fracs, seed=None)
Generates a random partition of the samples in the collection according to the specified split fractions.
Example:
    import fiftyone as fo
    import fiftyone.utils.random as four
    import fiftyone.zoo as foz

    # A dataset with `ground_truth` detections and no tags
    dataset = (
        foz.load_zoo_dataset("quickstart")
        .select_fields("ground_truth")
        .set_field("tags", [])
    ).clone()

    #
    # Generate a random sample and encode results via tags
    #

    four.random_split(dataset, {"train": 0.7, "test": 0.2, "val": 0.1})

    print(dataset.count_sample_tags())
    # {'train': 140, 'test': 40, 'val': 20}

    #
    # Generate a random sample in-memory
    #

    view1, view2 = four.random_split(dataset, [0.5, 0.5])

    assert len(view1) + len(view2) == len(dataset)
    assert set(view1.values("id")).isdisjoint(set(view2.values("id")))
- Parameters
sample_collection – a fiftyone.core.collections.SampleCollection
split_fracs – can be either of the following:
- a dict mapping tag strings to split fractions in [0, 1]. In this case, the partition is denoted by tagging each sample with its assigned split
- a list of split fractions in [0, 1]. In this case, a corresponding list of fiftyone.core.view.DatasetView instances containing the partition is returned
In either case, the split fractions are normalized so that they sum to 1, if necessary
seed (None) – an optional random seed
- Returns
one of the following:
- None, if split_fracs is a dict
- a tuple of fiftyone.core.view.DatasetView instances, if split_fracs is a list
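The normalization and partitioning behavior described for random_split can be illustrated without FiftyOne. The following is a minimal sketch in plain Python, not the library's implementation: the helper `split_indices` is a hypothetical name, and it partitions integer indices rather than samples. It normalizes the fractions so they sum to 1, shuffles the indices with an optional seed, and cuts the shuffled order at the cumulative boundaries.

```python
import random


def split_indices(n, split_fracs, seed=None):
    """Partition range(n) into len(split_fracs) disjoint random groups.

    Hypothetical helper for illustration only (not part of FiftyOne).
    """
    # Normalize the fractions so that they sum to 1
    total = sum(split_fracs)
    fracs = [f / total for f in split_fracs]

    # Shuffle the indices reproducibly when a seed is given
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)

    # Cumulative cut points; the last one is forced to n so that
    # every index lands in exactly one group
    bounds = []
    acc = 0.0
    for f in fracs:
        acc += f
        bounds.append(round(acc * n))
    bounds[-1] = n

    groups = []
    start = 0
    for end in bounds:
        groups.append(order[start:end])
        start = end

    return groups


# Fractions that do not sum to 1 are normalized to 0.7 / 0.2 / 0.1
train, test, val = split_indices(200, [7, 2, 1], seed=51)
print(len(train), len(test), len(val))  # 140 40 20
```

Passing the same seed twice yields the same partition, which is the usual reason to expose a seed parameter in a splitting utility.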
fiftyone.utils.random.weighted_sample(sample_collection, k, weights, tag=None, exact=True, seed=None)
Generates a random sample of size k from the given collection such that the probability of selecting each sample is proportional to the given per-sample weights.
Example:
    import fiftyone as fo
    import fiftyone.utils.random as four
    import fiftyone.zoo as foz
    from fiftyone import ViewField as F

    dataset = foz.load_zoo_dataset("cifar10", split="train")

    # Sample proportional to label length
    weights = dataset.values(F("ground_truth.label").strlen())
    sample = four.weighted_sample(dataset, 10000, weights)

    # Plot results
    plot = fo.CategoricalHistogram(
        "ground_truth.label",
        order=lambda kv: -len(kv[0]),  # order by label length
        init_view=sample,
    )
    plot.show()
- Parameters
sample_collection – a fiftyone.core.collections.SampleCollection
k – the number of samples to select
weights – an array of per-sample weights
tag (None) – an optional sample tag to use to encode the results
exact (True) – whether to tag exactly k samples (True) or sample so that the expected number of samples is k (False)
seed (None) – an optional random seed to use
- Returns
a fiftyone.core.view.DatasetView containing the sample
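The difference between the two `exact` modes can be sketched in plain Python. This is an illustration under stated assumptions, not FiftyOne's implementation: `weighted_sample_indices` is a hypothetical helper that returns indices, the exact branch uses the Efraimidis-Spirakis key method for weighted sampling without replacement, and the inexact branch uses independent draws with probability `k * w / sum(weights)` capped at 1. In the exact branch, selection probabilities follow the weights only approximately once `k` is a large share of the collection.

```python
import random


def weighted_sample_indices(weights, k, exact=True, seed=None):
    """Select indices with probability driven by per-item weights.

    Hypothetical helper for illustration only (not part of FiftyOne).
    """
    rng = random.Random(seed)

    if exact:
        # Efraimidis-Spirakis: give each item the key u ** (1 / w) and
        # keep the k largest keys. Items with zero weight are skipped.
        keyed = [
            (rng.random() ** (1.0 / w), i)
            for i, w in enumerate(weights)
            if w > 0
        ]
        keyed.sort(reverse=True)
        return sorted(i for _, i in keyed[:k])

    # Independent draws: the number of selected items is random, with
    # expected value k as long as no probability is capped at 1
    total = sum(weights)
    return [
        i
        for i, w in enumerate(weights)
        if rng.random() < min(1.0, k * w / total)
    ]


weights = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Exact mode: always four distinct indices, never index 0 (weight 0)
picked = weighted_sample_indices(weights, 4, seed=0)
print(picked)

# Inexact mode: the count varies from seed to seed around 4
approx = weighted_sample_indices(weights, 4, exact=False, seed=0)
print(len(approx))
```

The inexact mode trades a fixed sample size for draws that are independent across items, which is why only the expected size equals `k`.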
fiftyone.utils.random.balanced_sample(sample_collection, k, path, tag=None, exact=True, seed=None)
Generates a random sample of size k from the given collection such that the expected histogram of path values in the sample is uniform.
Example:
    import fiftyone as fo
    import fiftyone.utils.random as four
    import fiftyone.zoo as foz
    from fiftyone import ViewField as F

    dataset = foz.load_zoo_dataset("cifar10", split="train")

    # Sample proportional to label length
    weights = dataset.values(F("ground_truth.label").strlen())
    view1 = four.weighted_sample(dataset, 10000, weights)

    # Now take a balanced sample from this unbalanced sample
    view2 = four.balanced_sample(view1, 2000, "ground_truth.label")

    # Plot results
    plot1 = fo.CategoricalHistogram("ground_truth.label", init_view=dataset)
    plot2 = fo.CategoricalHistogram(
        "ground_truth.label",
        order=lambda kv: -len(kv[0]),  # order by label length
        init_view=view1,
    )
    plot3 = fo.CategoricalHistogram("ground_truth.label", init_view=view2)
    plot = fo.ViewGrid([plot1, plot2, plot3])
    plot.show()
- Parameters
sample_collection – a fiftyone.core.collections.SampleCollection
k – the number of samples to select
path – the categorical field against which to sample, e.g., "ground_truth.label"
tag (None) – an optional sample tag to use to encode the results
exact (True) – whether to tag exactly k samples (True) or sample so that the expected number of samples is k (False)
seed (None) – an optional random seed to use
- Returns
a fiftyone.core.view.DatasetView containing the sample
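One way to obtain a uniform expected histogram is to weight each item by the inverse of its class frequency. The sketch below shows that arithmetic in plain Python. It is an assumption about how such balancing can work, not a description of FiftyOne's implementation, and `balanced_weights` is a hypothetical helper.

```python
from collections import Counter


def balanced_weights(labels):
    """Weight each item by 1 / (number of items sharing its label).

    Hypothetical helper for illustration only (not part of FiftyOne).
    """
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# A heavily unbalanced label list
labels = ["cat"] * 80 + ["dog"] * 15 + ["bird"] * 5
weights = balanced_weights(labels)

# The weights within each class sum to 1, so the total is the number
# of distinct labels (3 here, up to floating point error)
total = sum(weights)

# If item i is selected with probability k * w_i / total, the expected
# number of selected items per class is k / 3 for every class
k = 12
expected = Counter()
for label, w in zip(labels, weights):
    expected[label] += k * w / total

for label in ("cat", "dog", "bird"):
    print(label, round(expected[label], 6))
```

Each class ends up with an expected count close to `k / 3 = 4` despite the 80/15/5 imbalance. The per-item probability must stay at or below 1, so a very small class limits how large `k` can be before the histogram stops being uniform.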