SSv2

The Something-Something V2 dataset currently contains 108, 499 videos across 174 labels, with duration ranging from 2 to 6 seconds. Labels are textual descriptions based on templates, such as “Dropping [something] into [something]” containing slots (“[something]”) that serve as placeholders for objects. Crowdworkers provide videos where they act out the templates. They choose the objects to perform the actions on and enter the noun-phrase describing the objects when uploading the videos. The dataset is split into train, validation and test-sets in the ratio of 8:1:1. The splits were created so as to ensure that all videos provided by the same worker occur only in one split (train, validation, or test).In its current version, the dataset was generated by 1133 crowd workers with an average of 127.32 workers per class.

More details can be found here.

Sample Frames