HMDB-51

HMDB dataset has been collected from various sources, mostly from movies, and a small proportion from public databases such as the Prelinger archive, YouTube and Google videos. The dataset contains 6849 clips divided into 51 action categories, each containing a minimum of 101 clips. The actions categories can be grouped in five types: General facial actions smile, laugh, chew, talk. Facial actions with object manipulation: smoke, eat, drink. General body movements: cartwheel, clap hands, climb, climb stairs, dive, fall on the floor, backhand flip, handstand, jump, pull up, push up, run, sit down, sit up, somersault, stand up, turn, walk, wave. Body movements with object interaction: brush hair, catch, draw sword, dribble, golf, hit something, kick ball, pick, pour, push something, ride bike, ride horse, shoot ball, shoot bow, shoot gun, swing baseball bat, sword exercise, throw. Body movements for human interaction: fencing, hug, kick someone, kiss, punch, shake hands, sword fight.

In addition to the label of the action category, each clip is annotated with an action label as well as a meta-label describing the property of the clip. Because HMDB51 video sequences are extracted from commercial movies as well as YouTube, it represents a fine multifariousness of light conditions, situations and surroundings in which the action can appear, captured with different camera types and recording techniques such as points of view. The point of view is another criterion of subdivision the HMDB supports. For an all-around coverage the perspectives frontal, lateral (right and left) and backwards view of motions are distinguishable. In addition we have two distinct categories namely “no motion” and “camera motion”. The later is the result of zooming, traveling shots and camera shaking, etc.

More details can be found here.

Sample Frames