The Something-Something V2 dataset currently contains 108, 499 videos across 174 labels, with duration ranging from 2 to 6 seconds. Labels are
textual descriptions based on templates, such as “Dropping
[something] into [something]” containing slots (“[something]”) that serve as placeholders for objects. Crowdworkers provide videos where they act out the templates.
They choose the objects to perform the actions on and enter
the noun-phrase describing the objects when uploading the
videos.
The dataset is split into train, validation and test-sets in
the ratio of 8:1:1. The splits were created so as to ensure
that all videos provided by the same worker occur only in
one split (train, validation, or test).In its current version, the dataset was generated by 1133
crowd workers with an average of 127.32 workers per class.
More details can be found
here.