Date of Publication

8-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Science Education Major in Biology

Subject Categories

Science and Mathematics Education

College

Br. Andrew Gonzalez FSC College of Education

Department/Unit

Science Education

Thesis Advisor

Maricar S. Prudente

Defense Panel Chair

Voltaire Mallari Mistades

Defense Panel Member

Mary Jane C. Flores
Lydia S. Roleda
Socorro E. Aguja
Denis Dyvee R. Errabo

Abstract/Summary

This study developed and psychometrically validated a Biology Item Bank using the Item Response Theory Four-Parameter Logistic (IRT–4PL) model, aimed at providing a standardized pool of calibrated items for Senior High School STEM students preparing for biology-intensive and health-allied college programs. The development process followed a multi-phase validation protocol integrating expert evaluation, empirical testing, and advanced psychometric modeling. An initial pool of 120 multiple-choice items was constructed and reviewed by five biology educators through online focus group discussions. Items were evaluated for content accuracy, linguistic clarity, and curricular relevance, and were classified according to Bloom’s revised taxonomy across six cognitive levels. A pilot validation confirmed semantic and content appropriateness, after which the test was administered to 1,017 STEM students from a private university in Metro Manila. Both dichotomous and polytomous scoring were employed, enabling robust distractor analysis. Reliability analysis yielded a strong Cronbach’s alpha (α = 0.920), which improved slightly (α = 0.923) after the removal of underperforming items. Additional distractor diagnostics led to revisions and refinements, producing an 88-item calibrated pool. Structural validity was established through exploratory factor analysis (KMO = 0.879; Bartlett’s test, p < .001) and confirmatory factor analysis, which demonstrated acceptable fit indices (RMSEA = 0.013, SRMR = 0.025, TLI = 0.916, CFI = 0.932). Item-level calibration under the IRT–4PL model provided parameter estimates for discrimination (a), difficulty (b), guessing (c), and slipping (d). Results indicated that a small number of items exhibited misfit or overfit, while the majority performed within psychometric expectations. The Item Characteristic Curves (ICCs) demonstrated the psychometric soundness of retained items across cognitive domains.
Based on integrated statistical and expert criteria, the final classification consisted of 18 retained items, 20 revised, 16 reassigned to alternative domains, and 66 rejected due to psychometric flaws. This study affirms the utility of the IRT–4PL model in developing item banks for high-stakes assessments. The finalized test, rigorously validated, provides a dependable source of calibrated items for biology assessments and diagnostic purposes. Moreover, the study recommends extending the IRT–4PL framework to the development of item banks in other science domains, ensuring validity, fairness, and pedagogical alignment in assessment design.
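The four parameters named in the abstract define the 4PL item characteristic curve, P(θ) = c + (d − c) / (1 + exp(−a(θ − b))). A minimal sketch of this curve is shown below; the parameter values are hypothetical illustrations, not estimates from the study.

```python
import math

def icc_4pl(theta: float, a: float, b: float, c: float, d: float) -> float:
    """Four-parameter logistic (4PL) item characteristic curve.

    P(theta) = c + (d - c) / (1 + exp(-a * (theta - b)))

    a: discrimination (slope at the inflection point)
    b: difficulty (ability level at the curve's midpoint)
    c: guessing (lower asymptote, chance of a correct response at low ability)
    d: slipping (upper asymptote, typically just below 1)
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameter values for illustration only:
# at theta = b, the probability is exactly midway between c and d.
p_mid = icc_4pl(theta=0.5, a=1.2, b=0.5, c=0.2, d=0.95)  # (c + d) / 2 = 0.575
```

In this parameterization, a nonzero c models correct answers obtained by guessing, while d < 1 models "slipping", i.e., even the most able examinees occasionally answering incorrectly.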

Keywords: Biology test, IRT–4PL, item bank, discrimination, difficulty, guessing, slipping, factor analysis, latent trait/construct

Abstract Format

html

Language

English

Format

Electronic

Keywords

Biology—Ability testing; Educational tests and measurements; Item response theory; Psychometrics

Upload Full Text

wf_yes

Embargo Period

9-2028

Available for download on Friday, September 01, 2028
