Authentic or Artificial?: A Stylometric Analysis on the Characteristics of Filipino AI-Generated Texts
Document Types
Paper Presentation
Research Theme (for Paper Presentation and Poster Presentation submissions only)
Media and Philippine Studies (MPS)
School Name
De La Salle University
Track or Strand
Science, Technology, Engineering, and Mathematics (STEM)
Research Advisor (Last Name, First Name, Middle Initial)
Cheng, Charibeth
Start Date
25-6-2026 10:30 AM
End Date
25-6-2026 12:00 PM
Zoom Link / Room Assignment
Online - https://zoom.us/j/94569671692?pwd=Fj3c3ELOebE6QbqbJOOH9wMuildoEc.1 Meeting ID: 945 6967 1692 | Passcode: research
Abstract/Executive Summary
Artificial intelligence (AI) is rapidly advancing in its ability to produce text, images, and audio with human-like fluency. This capability, however, makes it increasingly difficult to distinguish AI-generated content from human-written text (HWT), which may contribute to the spread of misinformation and undermine academic integrity. Although stylometric research on AI-generated text is growing, existing studies have focused almost exclusively on English-language corpora. To the best of our knowledge, this study is the first to conduct a stylometric analysis of AI-generated academic texts written in Filipino, addressing a significant gap in the literature on AI detection in non-English languages. The study analyzes the lexical and syntactic differences between Filipino AI-generated texts (FAIGT) and human-written academic articles using stylometric methods, including word frequency analysis and part-of-speech (POS) n-gram patterning. Beyond comparing FAIGT and HWT, the study examines whether prompt specificity influences the stylometric profile of AI-generated text by systematically varying the level of detail in the prompts used to generate FAIGT samples. FAIGT samples were generated using Gemini [version], while HWT samples were collected from the Dalumat Journal of De La Salle University, an open-access Filipino academic publication. The results showed that HWT exhibited greater lexical diversity and variability, whereas FAIGT showed lower lexical diversity but higher consistency across all prompt conditions, a pattern this study terms synthetic consistency. Syntactic analysis further revealed that although both corpora employed similar grammatical structures, FAIGT exhibited a higher frequency of repeated part-of-speech sequences.
These findings provide linguistic insights that can inform the development of AI detection frameworks for Filipino academic texts and support efforts to preserve academic integrity in Philippine educational institutions.
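The abstract names lexical diversity and repeated part-of-speech sequences as the main stylometric signals but does not state exactly how they were computed. The Python sketch below is a minimal illustration under stated assumptions, not the study's method: it uses the type-token ratio (TTR) as the lexical diversity measure and counts contiguous POS trigrams to estimate how much of a text's syntax is repeated. The function names, the sample Filipino sentence, and its simplified POS tags are hypothetical and hand-assigned for illustration; they are not output from the study's tagger or drawn from its corpora.

```python
from collections import Counter


def type_token_ratio(tokens):
    """Lexical diversity: distinct (lowercased) word types divided by total tokens."""
    if not tokens:
        return 0.0
    return len({t.lower() for t in tokens}) / len(tokens)


def pos_ngram_counts(tags, n=3):
    """Count contiguous POS n-grams in a tag sequence."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def repeated_ngram_share(tags, n=3):
    """Fraction of all n-gram occurrences whose n-gram appears more than once."""
    counts = pos_ngram_counts(tags, n)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total


# Hypothetical example: one Filipino sentence with hand-assigned, simplified tags
# (assumption: these labels are illustrative, not from a real Filipino POS tagger).
tokens = ["Ang", "wika", "ay", "mahalaga", "at", "ang", "wika", "ay", "buhay"]
tags = ["DET", "NOUN", "PART", "ADJ", "CONJ", "DET", "NOUN", "PART", "ADJ"]

ttr = type_token_ratio(tokens)               # 6 types / 9 tokens
trigrams = pos_ngram_counts(tags, n=3)       # 7 trigram occurrences in total
share = repeated_ngram_share(tags, n=3)      # 4 of 7 occurrences are repeats

print(f"TTR: {ttr:.3f}")
print(f"Repeated POS trigram share: {share:.3f}")
```

Raw TTR tends to fall as text length grows, so comparing corpora of different sizes would usually call for a length-normalized variant such as a moving-average TTR; the abstract does not say which variant, if any, the study applied.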
Keywords
AI-generated text, human-written text, stylometry, lexical diversity, syntactic analysis
Initial Consent for Publication
yes
Statement of Originality
yes
https://animorepository.dlsu.edu.ph/conf_shsrescon/2026/BoA_MPS/3