Analyzing the Influence of Language Use on Sentiment Analyzers Using Reviews Gathered From Internet Platforms
Document Types
Paper Presentation
School Name
De La Salle University, Manila
Track or Strand
Science, Technology, Engineering, and Mathematics (STEM)
Research Advisor (Last Name, First Name, Middle Initial)
Cheng, Charibeth, K.
Start Date
June 23, 2025, 1:30 PM
End Date
June 23, 2025, 3:00 PM
Zoom Link/Room Assignment
Y304
Abstract/Executive Summary
In the Philippines, where Filipino-English code-switching thrives in online reviews, sentiment analysis struggles to capture nuanced customer feedback on platforms like Shopee and Google Maps. This study examines how code-switching and code-mixing affect six sentiment analysis tools, namely VADER, a Filipino Sentiment Analyzer (FSA), and four large language models (ChatGPT-4o, DeepSeek-R1, Gemini 2.0 Flash, and Perplexity Auto), and compares sentence-level against review-level performance. Using a Shopee Philippines and Google Maps review dataset compiled by Cosme and De Leon (2024), 1,000 reviews were sampled in two stages: 2,500 reviews were first selected at random, and from these, 1,000 were chosen to balance the sentiment distribution. These reviews were split into 2,589 sentences and annotated for sentiment (positive, negative, neutral, or mixed) and linguistic attributes (English, Filipino, or both; monolingual or multilingual; and code-switching or code-mixing). Model performance was evaluated using macro-averaged F1-scores, with accuracy also considered. Results show that code-switching and code-mixing degrade performance, with F1-scores dropping by 35% for VADER, 8% for FSA, and 9% for the LLMs across both review and sentence levels. VADER’s English bias limits its efficacy, with an average F1-score of about 0.30. FSA is more resilient but still performs modestly, with a score around 0.40. The LLMs were the most robust, maintaining scores around 0.82. Sentence-level analysis outperformed review-level analysis, aided by localized sentiment cues. Positive and negative sentiments were classified accurately, while neutral and mixed sentiments remained challenging. These findings highlight the need for tools tailored to multilingual, low-resource settings, guiding improvements in sentiment analysis for global e-commerce.
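To make the evaluation setup concrete, below is a minimal illustrative sketch (not the authors' pipeline) of how a lexicon-based tool such as VADER can be scored with the macro-averaged F1 metric the study reports. The sample reviews, gold labels, and compound-score thresholds are assumptions for demonstration; VADER's conventional ±0.05 cutoffs map its compound score to a three-way label, so the study's additional "mixed" class, which a simple threshold scheme cannot produce, is omitted here for simplicity.

    # Hedged sketch: VADER predictions scored with macro-averaged F1.
    # Requires: pip install vaderSentiment scikit-learn
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    from sklearn.metrics import f1_score

    analyzer = SentimentIntensityAnalyzer()

    # Hypothetical code-switched reviews paired with assumed gold labels.
    reviews = [
        ("Ang bilis ng delivery, very satisfied ako!", "positive"),
        ("Sira yung item pagdating, ang pangit.", "negative"),
        ("Received the package today.", "neutral"),
    ]

    def vader_label(text, pos=0.05, neg=-0.05):
        """Map VADER's compound score to a 3-way label (common thresholds)."""
        compound = analyzer.polarity_scores(text)["compound"]
        if compound >= pos:
            return "positive"
        if compound <= neg:
            return "negative"
        return "neutral"

    gold = [label for _, label in reviews]
    pred = [vader_label(text) for text, _ in reviews]

    # Macro-averaging weights each class equally, so rare classes count as
    # much as frequent ones -- relevant when neutral reviews are scarce.
    print(f1_score(gold, pred, average="macro",
                   labels=["positive", "negative", "neutral"]))

Because VADER's lexicon is English-only, the Filipino words contribute nothing to the compound score: the first review is caught via "very satisfied", while the second, whose negative cues are entirely in Filipino, would likely fall through to neutral, consistent with the English bias the abstract describes.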
Keywords
code-switching, natural language processing, sentiment analysis, large language models, internet platform reviews
Research Theme (for Paper Presentation and Poster Presentation submissions only)
Computer and Software Technology, and Robotics (CSR)
Initial Consent for Publication
no
Statement of Originality
yes
https://animorepository.dlsu.edu.ph/conf_shsrescon/2025/paper_csr/5