A model for age and gender profiling of social media accounts based on post contents


College of Computer Studies


Software Technology

Document Type

Conference Proceeding

Source Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)


11302 LNCS

First Page


Last Page


Publication Date



The growth of social networking platforms such as Facebook and Twitter has opened communication channels through which people share their thoughts and sentiments. However, the rapid growth of the Internet has also brought anonymity, under which user identities are easily falsified or hidden. This makes it difficult for businesses to target advertisements accurately at specific account demographics. This study therefore searched for the best model for identifying the gender and age group of Filipino social media accounts by analyzing post contents. Two model structures for the classifier were tested: the stacked (combined) structure and the parallel structure. Different types of features were considered, including those based on socio-linguistics, grammar, characters, and words. The results show that age classification and gender classification call for different model structures, features, feature reduction methods, and classification algorithms. For age classification, the best model on both platforms was a Support Vector Classifier (SVC) with least absolute shrinkage and selection operator (Lasso); a parallel model structure worked best for Facebook, while a combined model structure worked best for Twitter. For gender classification, the best model for Facebook used a Ridge Classifier (RC), while the best model for Twitter used SVC, both with Lasso on a parallel model structure. The dominant features in age classification for both Facebook and Twitter were word-based features, socio-linguistic features, and post time, while socio-linguistic features, specifically netspeak, were important in gender classification on both platforms. Because different features drive model performance on each platform, Facebook and Twitter data must be analyzed separately, as posts on the two platforms differ significantly. © 2018, Springer Nature Switzerland AG.
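To make the feature families named in the abstract more concrete, the following is a minimal sketch of how character-based, word-based, socio-linguistic (netspeak), and post-time features might be computed for a single post. It is not the authors' implementation: the function name `extract_features`, the individual feature names, and the small `NETSPEAK` word list are all assumptions made for illustration, and the grammar-based features mentioned in the abstract are left out because they would need a part-of-speech tagger.

```python
import re
from datetime import datetime

# Hypothetical netspeak lexicon; the word list used in the paper is not given here.
NETSPEAK = {"lol", "omg", "brb", "idk", "haha", "hehe"}


def extract_features(text, posted_at):
    """Return a flat dict of illustrative features for one post."""
    # Lowercased alphabetic tokens (apostrophes kept so "don't" stays whole).
    tokens = re.findall(r"[a-z']+", text.lower())
    n_chars = len(text)
    n_tokens = len(tokens)
    return {
        # Character-based features
        "char_count": n_chars,
        "upper_ratio": sum(c.isupper() for c in text) / n_chars if n_chars else 0.0,
        "exclaim_count": text.count("!"),
        # Word-based features
        "word_count": n_tokens,
        "avg_word_len": sum(map(len, tokens)) / n_tokens if n_tokens else 0.0,
        # Socio-linguistic feature: share of tokens found in the netspeak list
        "netspeak_ratio": sum(t in NETSPEAK for t in tokens) / n_tokens if n_tokens else 0.0,
        # Post-time feature: hour of day (0-23) at which the post was made
        "post_hour": posted_at.hour,
    }


# Example post with a made-up timestamp.
feats = extract_features("OMG lol that was great!!", datetime(2018, 5, 1, 23, 15))
for name, value in feats.items():
    print(f"{name}: {value}")
```

A feature dictionary of this kind would typically be turned into a numeric matrix and passed through a feature-reduction step such as Lasso before reaching a classifier such as SVC or a Ridge Classifier, as the abstract describes.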


Digital Object Identifier (DOI)



Consumer profiling; Computational linguistics; Machine learning
