Analyzing the Influence of Language Use on Sentiment Analyzers Using Reviews Gathered From Internet Platforms

Document Types

Paper Presentation

School Name

De La Salle University, Manila

Track or Strand

Science, Technology, Engineering, and Mathematics (STEM)

Research Advisor (Last Name, First Name, Middle Initial)

Cheng, Charibeth, K.

Start Date

June 23, 2025, 1:30 PM

End Date

June 23, 2025, 3:00 PM

Zoom Link/Room Assignment

Y304

Abstract/Executive Summary

In the Philippines, where Filipino-English code-switching thrives in online reviews, sentiment analysis struggles to capture nuanced customer feedback on platforms like Shopee and Google Maps. This study examines how code-switching and code-mixing affect six sentiment analysis tools, namely VADER, a Filipino Sentiment Analyzer (FSA), and four large language models (ChatGPT-4o, DeepSeek-R1, Gemini 2.0 Flash, and Perplexity Auto), and compares sentence-level with review-level performance. Using a Shopee Philippines and Google Maps review dataset compiled by Cosme and De Leon (2024), 1,000 reviews were sampled in two stages: 2,500 reviews were first selected at random, and from these, 1,000 were chosen to balance the sentiment distribution. These reviews were split into 2,589 sentences and annotated for sentiment (positive, negative, neutral, or mixed) and linguistic attributes (English, Filipino, or both; monolingual or multilingual; and code-switching or code-mixing). Model performance was evaluated using macro-averaged F1-scores, with accuracy also considered. Results show that code-switching and code-mixing degrade performance, with F1-scores dropping by 35% for VADER, 8% for FSA, and 9% for the LLMs across both review and sentence levels. VADER’s English bias limits its efficacy, averaging around 0.30 F1; FSA is more resilient but still performs modestly, averaging around 0.40. The LLMs were the most robust, maintaining scores around 0.82. Sentence-level analysis outperformed review-level analysis, aided by localized sentiment cues. Positive and negative sentiments were classified accurately, while neutral and mixed sentiments remained challenging. These findings highlight the need for tools tailored to multilingual, low-resource settings, guiding improvements in sentiment analysis for global e-commerce.
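
Illustrative Code Sketches

As a minimal sketch of the two-stage sampling described above (not the study's actual pipeline), the Python snippet below randomly draws 2,500 reviews and then subsamples 1,000 of them to flatten the sentiment distribution. The column names ("review", "sentiment"), the even per-class quota, and the random seed are all assumptions made for illustration.

import pandas as pd

def two_stage_sample(df: pd.DataFrame, first_n: int = 2500,
                     final_n: int = 1000, seed: int = 42) -> pd.DataFrame:
    """Stage 1: simple random sample; Stage 2: balanced subsample."""
    # Stage 1: draw 2,500 reviews uniformly at random from the corpus.
    stage1 = df.sample(n=first_n, random_state=seed)
    # Stage 2: draw (up to) an even quota per sentiment class so the
    # final 1,000 reviews have a more balanced label distribution.
    per_class = final_n // stage1["sentiment"].nunique()
    return (stage1.groupby("sentiment", group_keys=False)
                  .apply(lambda g: g.sample(n=min(per_class, len(g)),
                                            random_state=seed)))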
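
The evaluation step can be sketched the same way. The snippet below scores reviews with VADER, maps its compound score to a three-way label using VADER's documented +/-0.05 thresholds, and computes the macro-averaged F1-score alongside accuracy. The example reviews, gold labels, and three-class label set are invented for illustration (the study itself also annotates a "mixed" class).

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.metrics import f1_score, accuracy_score

analyzer = SentimentIntensityAnalyzer()

def vader_label(text: str) -> str:
    # Map VADER's compound score in [-1, 1] to a sentiment label.
    compound = analyzer.polarity_scores(text)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

# Hypothetical annotated reviews; real Shopee/Google Maps reviews mix
# Filipino and English, which is exactly what trips up VADER's lexicon.
reviews = [
    ("Sobrang ganda ng product, fast delivery pa!", "positive"),
    ("Hindi gumagana, sayang ang pera ko.", "negative"),
    ("Dumating ang parcel kahapon.", "neutral"),
]
gold = [label for _, label in reviews]
pred = [vader_label(text) for text, _ in reviews]

# Macro-averaging weights every class equally, so scarce classes such
# as "neutral" count as much as the dominant positive/negative ones.
print("macro F1:", f1_score(gold, pred, average="macro"))
print("accuracy:", accuracy_score(gold, pred))

Macro-averaging is the natural choice here because a micro or weighted average would let the easier, more frequent positive and negative classes mask failures on the neutral and mixed sentiments that the abstract reports as the hardest cases.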

Keywords

code-switching, natural language processing, sentiment analysis, large language models, internet platform reviews

Research Theme (for Paper Presentation and Poster Presentation submissions only)

Computer and Software Technology, and Robotics (CSR)

Statement of Originality

Yes


https://animorepository.dlsu.edu.ph/conf_shsrescon/2025/paper_csr/5