Mona Lee

DEVELOPMENT

To implement our project, we first did some initial data collection of popular TikTok dances and then set up a 3-stage pipeline, consisting of a pre-processing stage, normalization stage, and finally our machine learning stage. In the pre-processing stage, we used OpenPose to output the keypoints of each frame for each video of a specific dance and parsed those keypoints. The associated labels, which represent how accurate a dance was (ranging from 1 to 10) as compared to the "model" dance (10), were parsed in a .csv file.

For the normalization stage, we utilized OpenPose’s normalization feature to convert the keypoint coordinates from their original scale to a [0, 1] scale. This is helpful for videos taken with different frame dimensions and for dancers at varying distances from the camera. Additionally, we centered the frame around the nose keypoint by setting its coordinate values to [0.5, 0.5] and translated all other keypoint coordinates accordingly.

Finally, we built a neural network using PyTorch for the purpose of evaluating how accurate interpretations of particular TikTok dances were as compared to a “model” rendition. The network will output a number in the range 0 to 10 that describes how accurate a video of the dance is as compared to this “model” rendition. We used a 16/4/5 (training/validation/test) split of the data on the “All About Cake” dance, and 11/2/2 on the rest.

Reference Videos — List of dances used as well as the "best" reference video for each.

Best Reference Videos — Pose estimation examples on the "best" reference video for selected dances.