A model for age and gender profiling of social media accounts based on post contents

College

College of Computer Studies

Department/Unit

Software Technology

Document Type

Conference Proceeding

Source Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume

11302 LNCS

First Page

113

Last Page

123

Publication Date

1-1-2018

Abstract

The growth of social networking platforms such as Facebook and Twitter has opened communication channels through which people share their thoughts and sentiments. However, with the rapid growth of the Internet, anonymity has also become common: user identities are easily falsified or hidden, which makes it difficult for businesses to target advertisements at specific account demographics. This study therefore searched for the best model to identify the gender and age group of Filipino social media accounts by analyzing post contents. Two classifier structures were tested: a stacked (combined) structure and a parallel structure. Different types of features were considered, including those based on socio-linguistics, grammar, characters, and words. The results show that age classification and gender classification call for different model structures, features, feature reduction methods, and classification algorithms. For age classification, the best model on both platforms was a Support Vector Classifier (SVC) with least absolute shrinkage and selection operator (Lasso) feature reduction, on a parallel model structure for Facebook and a combined model structure for Twitter. For gender classification, the best model for Facebook used a Ridge Classifier (RC), while the best model for Twitter used SVC, both with Lasso on a parallel model structure. The dominant features in age classification for both Facebook and Twitter were word-based features, socio-linguistic features, and post time, while socio-linguistic features, specifically netspeak, were important in gender classification for both platforms. Because the features that drive model performance differ between the two platforms, Facebook and Twitter data must be analyzed separately, as posts on the two platforms differ significantly. © 2018, Springer Nature Switzerland AG.
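
The feature types named in the abstract (character-based, word-based, socio-linguistic netspeak, and post time) can be illustrated with a minimal sketch. This is not the authors' feature set or code: the function name `extract_features`, the specific features, and the `NETSPEAK` word list are all assumptions chosen for illustration, and the abstract's grammar-based features are omitted.

```python
import re
from datetime import datetime

# Hypothetical netspeak lexicon; the word list used in the paper is not given here.
NETSPEAK = {"lol", "omg", "brb", "idk", "btw"}

def extract_features(text: str, posted_at: datetime) -> dict:
    """Turn one post into a small dictionary of numeric features (illustrative only)."""
    words = re.findall(r"[a-z']+", text.lower())
    n_words = len(words)
    n_chars = len(text)
    return {
        # Character-based features
        "char_count": n_chars,
        "upper_ratio": sum(c.isupper() for c in text) / n_chars if n_chars else 0.0,
        "exclamations": text.count("!"),
        # Word-based features
        "word_count": n_words,
        "avg_word_len": sum(map(len, words)) / n_words if n_words else 0.0,
        # Socio-linguistic feature: share of words found in the netspeak lexicon
        "netspeak_ratio": sum(w in NETSPEAK for w in words) / n_words if n_words else 0.0,
        # Post time feature: hour of day the post was made
        "post_hour": posted_at.hour,
    }

features = extract_features("OMG lol that was great!!", datetime(2018, 1, 1, 23, 15))
```

In a pipeline of the kind the abstract describes, dictionaries like this would be computed per post, aggregated per account, reduced (for example with Lasso), and then passed to a classifier such as SVC or a Ridge Classifier.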

Digital Object Identifier (DOI)

10.1007/978-3-030-04179-3_10

Keywords

Consumer profiling; Computational linguistics; Machine learning
