Authentic or Artificial?: A Stylometric Analysis on the Characteristics of Filipino AI-Generated Texts

Document Types

Paper Presentation

Research Theme (for Paper Presentation and Poster Presentation submissions only)

Media and Philippine Studies (MPS)

School Name

De La Salle University

Track or Strand

Science, Technology, Engineering, and Mathematics (STEM)

Research Advisor (Last Name, First Name, Middle Initial)

Cheng, Charibeth

Start Date

25-6-2026 10:30 AM

End Date

25-6-2026 12:00 PM

Zoom Link/ Room Assignment

Online - https://zoom.us/j/94569671692?pwd=Fj3c3ELOebE6QbqbJOOH9wMuildoEc.1 Meeting ID: 945 6967 1692 | Passcode: research

Abstract/Executive Summary

Artificial intelligence (AI) is rapidly advancing in its ability to produce text, images, and audio with human-like fluency. This capability, however, makes it increasingly difficult to distinguish AI-generated content from human-written text (HWT), which may contribute to the spread of misinformation and erode academic integrity. Despite the growing body of stylometric research on AI-generated texts, existing studies have focused almost exclusively on English-language corpora. To the best of our knowledge, this study is the first to conduct a stylometric analysis of AI-generated academic texts written in Filipino, addressing a significant gap in the literature on AI detection in non-English languages. The study analyzes the lexical and syntactic differences between Filipino AI-generated texts (FAIGT) and human-written academic articles using stylometric methods, including word frequency analysis and part-of-speech n-gram patterning. Beyond comparing FAIGT and HWT, the study also examines whether prompt specificity influences the stylometric profile of AI-generated texts by systematically varying the level of detail in the prompts used to generate FAIGT samples. FAIGT samples were generated using Gemini [version], while HWT samples were collected from the Dalumat Journal of De La Salle University, an open-access Filipino academic publication. The results showed that HWT exhibited greater lexical diversity and variability, while FAIGT demonstrated lower lexical diversity but higher consistency across all prompt conditions, a pattern this study terms synthetic consistency. Syntactic analysis further revealed that, while both corpora employed similar grammatical structures, FAIGT exhibited a higher frequency of repeated part-of-speech sequences. These findings provide linguistic insights that can inform the development of AI detection frameworks for Filipino academic texts, contributing to efforts to preserve academic integrity in Philippine educational institutions.
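
The two kinds of measurement named in the abstract, lexical diversity and repeated part-of-speech sequences, can be illustrated with a minimal Python sketch. This is not the study's pipeline: the abstract does not say which diversity metric, tagger, or n-gram length was used, so the type-token ratio, the trigram length, the function names, and the toy sentence and tags below are all assumptions chosen for illustration only.

```python
from collections import Counter


def type_token_ratio(tokens):
    """Share of distinct word forms among all tokens (0.0 for empty input).

    Assumption: type-token ratio stands in for "lexical diversity";
    the source does not name the metric it used.
    """
    if not tokens:
        return 0.0
    lowered = [t.lower() for t in tokens]
    return len(set(lowered)) / len(lowered)


def pos_ngrams(tags, n=3):
    """Count every run of n consecutive part-of-speech tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def repeated_share(counts):
    """Fraction of n-gram occurrences whose pattern appears more than once."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for c in counts.values() if c > 1) / total


# Hypothetical toy data, not drawn from the study's corpora.
# The tags are hand-written placeholders, not output of a real tagger.
sample_tokens = "ang wika ay mahalaga at ang wika ay buhay".split()
sample_tags = ["DET", "NOUN", "PART", "ADJ", "CONJ",
               "DET", "NOUN", "PART", "ADJ"]

ttr = type_token_ratio(sample_tokens)          # 6 distinct forms / 9 tokens
trigram_counts = pos_ngrams(sample_tags, n=3)  # 7 trigram occurrences
share = repeated_share(trigram_counts)         # 4 of 7 belong to repeated patterns

print(f"TTR: {ttr:.3f}")
print(f"Repeated POS-trigram share: {share:.3f}")
```

Under these assumptions, a lower type-token ratio and a higher repeated-trigram share would point in the direction the abstract reports for FAIGT. Type-token ratio is sensitive to text length, so a comparison across real documents would need length-matched samples or a length-robust measure.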

Keywords

AI-generated text, human-written text, stylometry, lexical diversity, syntactic analysis

Statement of Originality

Yes

https://animorepository.dlsu.edu.ph/conf_shsrescon/2026/BoA_MPS/3