Date of Publication
8-5-2023
Document Type
Master's Thesis
Degree Name
Master of Science in Electronics and Communications Engineering
Subject Categories
Electrical and Computer Engineering
College
Gokongwei College of Engineering
Department/Unit
Electronics And Communications Engg
Thesis Advisor
Edwin Sybingco
Defense Panel Chair
Argel Bandala
Defense Panel Member
Ryan Vicerra
Anthony Jose
Abstract/Summary
Conversational agents can be highly beneficial in settings such as government offices, schools, banks, and malls, where people frequently make inquiries and responses from personnel can be slow. Inquiries in many of these settings, however, involve domain-specific vocabulary, and the organizations involved are unlikely to have enough data or computational resources to properly train a complex natural language processing (NLP) model. This paper proposes a method for creating a domain-specific virtual assistant. The method uses Generative Pre-trained Transformer 3 (GPT-3) to generate paraphrases of a relatively small dataset. It pairs this with a Sentence Transformer (SBERT) model that has a DistilBERT (distilled BERT) base, is pretrained on the Quora Question Pairs dataset, and is fine-tuned on the augmented dataset. The method was evaluated on the MS MARCO, SemEval, and PubMed datasets using mean average precision (MAP), precision at k (P@k), normalized discounted cumulative gain (NDCG), and mean reciprocal rank (MRR) as performance metrics. It was also demonstrated on a small dataset of 188 frequently asked questions from the De La Salle University website, which likewise includes domain-specific vocabulary. The fine-tuned model was demonstrated on a simple webpage, and the results were found to be satisfactory.
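The four retrieval metrics named in the abstract can be illustrated with a short sketch. The abstract does not specify which variants were used, such as the cutoff k or whether relevance is graded. The sketch therefore assumes the common binary-relevance definitions, with NDCG using a log2(rank + 1) discount. The function names and the example question IDs are hypothetical and are not taken from the thesis.

```python
import math
from typing import Callable, Sequence, Set, Tuple

# One query's result: (ranked list of retrieved IDs, set of relevant IDs).
Run = Tuple[Sequence[str], Set[str]]


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant (P@k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@i over each rank i where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_over_queries(metric: Callable[..., float], runs: Sequence[Run], **kwargs) -> float:
    """Average a per-query metric over all queries (AP -> MAP, RR -> MRR)."""
    return sum(metric(ranked, relevant, **kwargs) for ranked, relevant in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical example: two user queries ranked against FAQ entries.
    example_runs = [
        (["faq3", "faq1", "faq7"], {"faq1"}),
        (["faq2", "faq5", "faq9"], {"faq2", "faq9"}),
    ]
    print(round(mean_over_queries(average_precision, example_runs), 3))    # 0.667
    print(round(mean_over_queries(reciprocal_rank, example_runs), 3))      # 0.75
    print(round(mean_over_queries(precision_at_k, example_runs, k=1), 3))  # 0.5
    print(round(mean_over_queries(ndcg_at_k, example_runs, k=3), 3))       # 0.775
```

MAP and MRR are simply the per-query average precision and reciprocal rank averaged over all queries. That is why a single `mean_over_queries` helper covers both.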
Language
English
Format
Electronic
Keywords
Chatbots; Natural language processing (Computer science); Human-computer interaction
Recommended Citation
Roque, M. C. (2023). A domain specific virtual assistant using paraphrase generation for data augmentation and sentence transformers on limited data. Retrieved from https://animorepository.dlsu.edu.ph/etdm_ece/28
Embargo Period
8-12-2024