Recognizing self-stimulatory behavior using spatio-temporal convolutional neural network

Date of Publication


Document Type

Master's Thesis

Degree Name

Master of Science in Computer Science


College of Computer Studies


Computer Science

Thesis Adviser

Merlin Teodosia C. Suarez

Defense Panel Chair

Joel P. Ilao

Defense Panel Member

Merlin Teodosia C. Suarez
Rafael A. Cabredo
Ryan Samuel M. Dimaunahan


Autism Spectrum Disorder (ASD) is a neuro-developmental disability that affects cognitive and motor skills, social communication, and interaction. One of the most visible and quantifiable indicators of autism is a behavioral cue called self-stimulatory behavior, or repetitive movements. In recent years, there have been significant efforts to research the use of technology to help diagnose and monitor ASD. This is in response to the alarming increase in the rate of children affected by ASD: an increase of over 200% was recorded from 2010 to 2014. One of the main research focuses is applying technology to observe and monitor self-stimulatory behaviors, since manual observation requires an impractical amount of time. An automated system that can recognize self-stimulatory behavior can help not only the medical professional but also the child, the caregiver, and the parents.

Currently, researchers use sensory data or videos along with traditional machine learning techniques to recognize self-stimulatory behavior. However, the application of deep learning, a state-of-the-art machine learning technique, to this task is still subject to further study. Deep learning has surpassed traditional machine learning techniques in several domains. The convolutional neural network, a popular deep learning technique, has shown strong results in image processing and is being extended to handle video clips in order to take advantage of temporal features. Despite still being in their infancy, spatio-temporal convolutional neural networks have already shown results that are competitive with or better than those of traditional machine learning techniques that use hand-crafted features. This research proposes the use of a spatio-temporal convolutional neural network to recognize self-stimulatory behavior. Using the SSBD dataset as the basis, this research introduces a new self-stimulatory behavior dataset, the YTstimming dataset. Furthermore, this research introduces a different data splitting scheme for benchmarking purposes. The best performing spatio-temporal convolutional neural network has a low validation accuracy of 44.37% on a 5-fold cross-validation test. However, the model was able to generalize well, with a test accuracy of 68.60%. The best performing model was obtained by using the SSBD and YTstimming datasets to fine-tune a C3D model pre-trained on the Sports1M dataset. Lastly, this research creates a prototype that identifies the time frames in which self-stimulatory behavior occurs in a video.
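
The abstract does not describe how the prototype turns clip-level predictions into time frames. The following is only a minimal sketch of one plausible approach, assuming the video is cut into consecutive, non-overlapping fixed-length clips and that a classifier has already assigned one label to each clip. The function name `clips_to_intervals`, the `background` label, the class names, and the default clip length and frame rate are all hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple


def clips_to_intervals(
    clip_labels: List[str],
    clip_len: int = 16,         # frames per clip (assumed value)
    fps: float = 30.0,          # video frame rate (assumed value)
    background: str = "none",   # hypothetical label for "no behavior"
) -> List[Tuple[str, float, float]]:
    """Merge runs of consecutive clips with the same label into
    (label, start_seconds, end_seconds) intervals, skipping background."""
    intervals = []
    run_label = None
    run_start = 0
    # A trailing None sentinel flushes the final run with the same logic.
    for i, label in enumerate(clip_labels + [None]):
        if label != run_label:
            if run_label is not None and run_label != background:
                start_sec = run_start * clip_len / fps
                end_sec = i * clip_len / fps
                intervals.append((run_label, start_sec, end_sec))
            run_label = label
            run_start = i
    return intervals


# Illustrative per-clip predictions (made up, not results from the thesis).
labels = ["none", "armflapping", "armflapping", "none", "spinning"]
result = clips_to_intervals(labels, clip_len=16, fps=16.0)
print(result)
```

With 16-frame clips at 16 frames per second, each clip covers one second, so the two arm-flapping clips merge into a single interval and the trailing spinning clip forms its own.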

Abstract Format






Accession Number


Shelf Location

Archives, The Learning Commons, 12F Henry Sy Sr. Hall

Physical Description

1 computer disc ; 4 3/4 in.


Machine learning; Neural networks (Computer science)
