Dissemination

Conferences

(1) George Sammit, Zhongjie Wu, Yihao Wang, Zhongdi Wu, Akihito Kamata, Joe Nese, and Eric C. Larson (2022). Automated Prosody Classification for Oral Reading Fluency with Quadratic Kappa Loss and Attentive X-vectors. International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022), Singapore.

Automated prosody classification in the context of oral reading fluency is a critical area for the objective evaluation of students’ reading proficiency. In this work, we present the largest dataset to date in this domain. It includes spoken phrases from over 1,300 students assessed by multiple trained raters. Moreover, we investigate the use of X-vectors and two variations thereof that incorporate weighted attention in classifying prosody correctness. We also evaluate the use of quadratic weighted kappa loss to better accommodate the inter-rater differences in the dataset. Results indicate improved performance over baseline convolutional and current state-of-the-art models, with prosodic correctness accuracy of 86.4%.
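
For orientation, the quadratic weighted kappa loss referenced above penalizes misclassifications in proportion to their squared ordinal distance from the rated category. The sketch below shows one common differentiable formulation in PyTorch; it is a minimal illustration under assumed tensor shapes, not the authors' implementation.

import torch

def quadratic_kappa_loss(probs: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """One common differentiable QWK loss: probs is (batch, K) softmax output,
    labels is a (batch,) integer (long) tensor of prosody ratings."""
    idx = torch.arange(num_classes, dtype=probs.dtype, device=probs.device)
    # Quadratic disagreement weights: w[i, j] = (i - j)^2 / (K - 1)^2
    w = (idx.view(-1, 1) - idx.view(1, -1)) ** 2 / (num_classes - 1) ** 2
    onehot = torch.nn.functional.one_hot(labels, num_classes).to(probs.dtype)
    observed = onehot.t() @ probs                                         # soft confusion matrix (K, K)
    expected = torch.outer(onehot.sum(0), probs.sum(0)) / probs.shape[0]  # chance-level confusion
    # Weighted observed disagreement over weighted expected disagreement (= 1 - kappa)
    return (w * observed).sum() / ((w * expected).sum() + 1e-8)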


Related Works


CORE

Journal Articles

(4) Nese, J. F. T. (2022). Comparing the growth and predictive performance of a traditional oral reading fluency measure with an experimental novel measure. AERA Open, 8, 1-19.

Curriculum-based measurement of oral reading fluency (CBM-R) is used as an indicator of reading proficiency, and to measure at-risk students’ response to reading interventions to help ensure effective instruction. The purpose of this study was to compare model-based words read correctly per minute (WCPM) scores (computerized oral reading evaluation [CORE]) with Traditional CBM-R WCPM scores to determine which provides more reliable growth estimates and demonstrates better predictive performance for reading comprehension and state reading test scores. Results indicated that, in general, CORE had better (a) within-growth properties (smaller SDs of slope estimates and higher reliability), and (b) predictive performance (lower root mean square error, and higher \(R^2\), sensitivity, specificity, and area under the curve values). These results suggest increased measurement precision for the model-based CORE scores compared with Traditional CBM-R, providing preliminary evidence that CORE can be used for consequential assessment.
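
For readers less familiar with the predictive-performance metrics listed above, the short sketch below computes RMSE, \(R^2\), sensitivity, specificity, and AUC on made-up values; the data and the 0.5 classification threshold are hypothetical, and this is not the study's analysis code.

import numpy as np
from sklearn.metrics import confusion_matrix, mean_squared_error, r2_score, roc_auc_score

# Continuous outcome: e.g., predicting state reading test scores from WCPM
y_true = np.array([510, 540, 495, 620, 580])
y_pred = np.array([505, 552, 500, 610, 575])
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)

# Binary outcome: e.g., flagging students at risk (1) vs. not at risk (0)
risk_true = np.array([1, 0, 1, 0, 0])
risk_prob = np.array([0.8, 0.3, 0.6, 0.1, 0.4])
risk_pred = (risk_prob >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(risk_true, risk_pred).ravel()
sensitivity = tp / (tp + fn)   # true-positive rate
specificity = tn / (tn + fp)   # true-negative rate
auc = roc_auc_score(risk_true, risk_prob)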

(3) Nese, J. F. T., & Kamata, A. (2021). Evidence for automated scoring and shorter passages of CBM-R in early elementary school. School Psychology, 36, 47-59.

Curriculum-based measurement of oral reading fluency (CBM-R) is widely used across the United States as a strong indicator of comprehension and overall reading achievement, but has several limitations including errors in administration and large standard errors of measurement. The purpose of this study is to compare scoring methods and passage lengths of CBM-R in an effort to evaluate potential improvements upon traditional CBM-R limitations. For a sample of 902 students in Grades 2 through 4, who collectively read 13,766 passages, we used mixed-effect models to estimate differences in CBM-R scores and examine the effects of (a) scoring method (comparing a human scoring criterion vs. traditional human or automatic speech recognition [ASR] scoring), and (b) passage length (25, 50, or 85 words, and traditional CBM-R length). We also examined differences in word score (correct/incorrect) agreement rates between human-to-human scoring and human-to-ASR scoring. Our results indicated that ASR can be applied in schools to score CBM-R, and that scores for shorter passages are comparable to traditional passages.
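
As one way to picture the analysis described above, the sketch below fits a mixed-effects model of WCPM on scoring method and passage length with random intercepts for students (statsmodels). The column and file names are hypothetical, and random intercepts for students only is a simplification, not the exact models fit in the study.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per student-by-passage reading
scores = pd.read_csv("cbmr_scores.csv")
model = smf.mixedlm("wcpm ~ C(scoring_method) + C(passage_length)",
                    data=scores, groups=scores["student_id"])
result = model.fit()
print(result.summary())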

(2) Nese, J. F. T., & Kamata, A. (2021). Addressing the large standard error of traditional CBM-R: Estimating the conditional standard error of a model-based estimate of CBM-R. Assessment for Effective Intervention, 47, 53-58.

Curriculum-based measurement of oral reading fluency (CBM-R) is widely used across the country as a quick measure of reading proficiency that also serves as a good predictor of comprehension and overall reading achievement, but it has several practical and technical inadequacies, including a large standard error of measurement (SEM). Reducing the SEM of CBM-R scores has positive implications for educators using these measures to screen or monitor student growth. The purpose of this study was to compare the SEM of traditional CBM-R words correct per minute (WCPM) fluency scores and the conditional SEM (CSEM) of model-based WCPM estimates, particularly for students with or at risk of poor reading outcomes. We found (a) the average CSEM for the model-based WCPM estimates was substantially smaller than the reported SEMs of traditional CBM-R systems, especially for scores at/below the 25th percentile, and (b) a large proportion (84%) of sample scores, and an even larger proportion of scores at/below the 25th percentile (about 99%) had a smaller CSEM than the reported SEMs of traditional CBM-R systems.
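
For reference, the SEM discussed above is a single reliability-based quantity, whereas a CSEM varies with the score itself; textbook formulations (not specific to the model used in this study) are:

\[ \mathrm{SEM} = \mathrm{SD}_x\,\sqrt{1 - r_{xx}}, \qquad \mathrm{CSEM}(\hat{\theta}) = \frac{1}{\sqrt{I(\hat{\theta})}}, \]

where \(\mathrm{SD}_x\) is the standard deviation of observed scores, \(r_{xx}\) is the score reliability, and \(I(\hat{\theta})\) is the information available about the model-based estimate \(\hat{\theta}\).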

(1) Kara, Y., Kamata, A., Potgieter, C., & Nese, J. F. T. (2020). Estimating model-based oral reading fluency: A Bayesian approach. Educational and Psychological Measurement, 80, 847-869.

Oral reading fluency (ORF), used by teachers and school districts across the country to screen and progress monitor at-risk readers, has been documented as a good indicator of reading comprehension and overall reading competence. In traditional ORF administration, students are given one minute to read a grade-level passage, after which the assessor calculates the words correct per minute (WCPM) fluency score by subtracting the number of incorrectly read words from the total number of words read aloud. As part of a larger effort to develop an improved ORF assessment system, this study expands on and demonstrates the performance of a new model-based estimate of WCPM based on a recently developed latent-variable psychometric model of speed and accuracy for ORF data. The proposed method was applied to a data set collected from 58 fourth-grade students who read four passages (a total of 260 words). The proposed model-based WCPM scores were also evaluated through a simulation study with respect to sample size and number of passages read.
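
The traditional WCPM computation described above is simple arithmetic; the small illustration below also scales it to reading times other than one minute, which is an assumption relevant to multi-passage settings rather than part of the traditional administration.

def wcpm(words_read: int, errors: int, seconds: float = 60.0) -> float:
    """Words correct per minute: correctly read words, scaled to a one-minute rate."""
    return (words_read - errors) * 60.0 / seconds

wcpm(112, 7)           # traditional one-minute administration -> 105.0
wcpm(260, 12, 150.0)   # e.g., 260 words read across passages in 150 seconds -> 99.2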

Conferences

(9) Kamata, A., Kara, Y., Potgieter, C. J., & Nese, J. F. T. (2020, March). Equating oral reading fluency scores: A model-based approach. Paper accepted for presentation at the 8th annual Texas Universities Educational Statistics and Psychometrics Meeting, College Station, TX. (Conference Canceled)

This study demonstrates and evaluates equating procedures for the model-based oral reading fluency (ORF) scores estimated by a latent binomial-lognormal model. Preliminary results showed that model-based ORF scores should be preferred over observed words correct per minute measures for passages that have been equated by the common-item non-equivalent group design.

(8) Nese, J. F. T., Anderson, D., & Kamata, A. (2020, April). Preliminary consequential validity evidence for a computerized oral reading fluency assessment. Paper accepted for presentation at the annual meeting of the American Educational Research Association (AERA), San Francisco, CA. (Conference Canceled)

Curriculum-based measurement of oral reading fluency (ORF) is used to identify students at risk for poor learning outcomes through screening assessments, and to monitor student progress to help guide and inform instructional decision-making. The purpose of this study was to compare the consequential validity properties of CORE and a traditional ORF assessment (easyCBM) for students in Grades 2 through 4. We found good evidence for the predictive and concurrent validity of CORE (comparable to traditional ORF), and improved reliability for CORE compared with traditional ORF in a longitudinal design. We discuss the implications of these results for practitioners using these classroom assessments.

(7) Kamata, A., Kara, Y., Potgieter, C. J., & Nese, J. F. T. (2020, April). Equating oral reading fluency scores: A model-based approach. Paper accepted for presentation at the annual meeting of National Council on Measurement in Education, San Francisco, CA. (Conference Canceled)

This study demonstrates and evaluates equating procedures for the oral reading fluency (ORF) scores estimated by a latent binomial-lognormal joint model. Preliminary results showed that model-based ORF scores should be preferred over observed words correct per minute measures for passages that have been equated by the common-item non-equivalent group design.

(6) Nese, J. F. T. & Kamata, A. (2020, February). Reducing the standard error of measurement (SEM) of oral reading fluency (ORF). Poster accepted for presentation at the annual meeting of the Pacific Coast Research Conference (PCRC), Coronado, CA. (Conference Canceled)

The standard error of measurement (SEM) is a measure of the precision of an assessment score: the smaller the SEM, the more precise the score. Reducing the SEM of curriculum-based measures (CBM) of oral reading fluency (ORF or CBM-R) scores has positive implications for teachers using these measures. With a sample of N = 1,021 students in Grades 2 through 4, we compared our CORE conditional SEM (CSEM) results to the reported traditional SEMs of observed WCPM for the following CBM-R systems: aimsweb Plus, DIBELS 8th Edition, easyCBM, and FastBridge. The average estimated CORE CSEM was lower than the SEMs of all reference CBM-R systems for all grades (Grade 2 = 5.15, Grade 3 = 5.47, and Grade 4 = 7.63). In addition, across all grades, 84% of estimated CORE CSEMs were less than the reference SEM of 8 WCPM (Grade 2 = 92%, Grade 3 = 95%, and Grade 4 = 63%). For students at/below the 20th percentile, the CORE estimated mean CSEMs were 3-4 WCPM, substantially smaller than the reported SEMs of traditional CBM-R systems. Lower CSEM estimates make CORE better suited for measuring ORF, as a more precise score will lead to more accurate instructional decisions.

(5) Nese, J. F. T. & Kamata, A. (2020, February). Accuracy of speech recognition in oral reading fluency for diverse student groups. Poster presented at the annual meeting of the Council for Exceptional Children (CEC), Portland, OR.

Automatic speech recognition (ASR) can be used to score oral reading fluency (ORF) assessments to ameliorate current inadequacies (e.g., administration errors, high opportunity cost), and represents an important part of a larger solution to improve traditional ORF. But more research is needed on how ASR performs for diverse student groups. The purpose of this study is to examine the accuracy of ORF scores as generated by ASR compared to humans, and in particular, differential effects for students with disabilities (SWD) and those receiving English language (EL) supports. The total sample size was N = 650 students. Across Grades 2 to 4, the ORF word score agreement rates between human criterion and ASR were significantly lower for SWDs compared to their non-SWD/non-EL peers. There was no such difference for EL students. The differences in ORF WCPM scores between human and ASR were not exacerbated for SWD or EL students. We speculate that the ASR may be less accurate than a human scorer for SWDs at the word level, but the difference in scoring is mitigated when scores are aggregated at the passage level.

(4) Nese, J. F. T., Kamata, A., & Kahn, J. (2017, April). Predictors of low agreement between automated speech recognition and human scores. Poster presented at the annual meeting of the National Council on Measurement in Education (NCME), San Antonio, TX.

Despite prevalent use and practical application, the current and standard assessment of oral reading fluency (ORF) presents considerable limitations that reduce its validity in estimating growth and monitoring student progress, including: (a) high cost of implementation; (b) tenuous passage equivalence; and (c) bias, large standard error, and tenuous reliability. To address these limitations, the Computerized Oral Reading Evaluation (CORE) system contains an automated scoring algorithm based on a speech recognition engine and a novel latent variable psychometric model. The purpose of this study is to investigate potential student and passage predictors of low agreement between an automated speech recognition (ASR) engine and human scores of words read correctly in student oral reading fluency passages. We fit a cross-classified, variable-exposure Poisson model to estimate agreement and found that the majority of the variance was at the student and recording levels, and that student demographic variables explained only a small amount (13%) of the student-level variance.
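
A generic form of a cross-classified, variable-exposure Poisson model of this kind (a sketch for orientation only; the study's exact specification may differ) is:

\[ y_{ij} \sim \mathrm{Poisson}(\mu_{ij}), \qquad \log \mu_{ij} = \log n_{ij} + \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + v_j, \]

where \(y_{ij}\) is the count of word-level disagreements between ASR and human scores for a recording, \(n_{ij}\) is the number of words in the passage (the exposure offset), \(\mathbf{x}_{ij}\) holds student and passage predictors, and \(u_i\) and \(v_j\) are cross-classified random effects (e.g., for students and passages, with recordings as the observation level).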

(3) Nese, J. F. T., Alonzo, J., Biancarosa, G., Kamata, A., & Kahn, J. (2017, February). Text messages: Examining different estimates of text complexity. Poster presented at the Annual Meeting of the Pacific Coast Research Conference (PCRC), Coronado, CA.

The purpose of this study was to compare quantitative estimates of text complexity based on text features with estimates of text difficulty based on student performance. These comparisons are situated in the context of curriculum-based measurement (CBM) of oral reading fluency (ORF), where the passages range from 20 to 105 words. We administered 330 ORF passages (110 each at Grades 2, 3, and 4) to 910 students (Grade 2 = 259, Grade 3 = 329, Grade 4 = 322). Each passage was an original work of narrative fiction targeting mid-year readability for its grade. Students were assessed online, via laptops in one-to-one administrations, during which each student read approximately 3 long, 5 medium, and 10 short passages. Flesch-Kincaid aligned closest to grade level, likely as an artifact of passage development. WCPM and latent fluency generally increased across passage lengths and grades. All estimates increased across grades, with two exceptions: Formality remained fairly stable across Grades 3 and 4, and Flesch-Kincaid and Automated Readability Index (ARI) estimates remained stable across passage lengths. Formality had the lowest correlations with the other measures, perhaps reflecting a different dimension of text complexity. The highest correlations were among ARI, Flesch-Kincaid, and WCPM by length (and increased with length), which may be partly an artifact of passage development and partly a reflection of the similarity between the ARI and Flesch-Kincaid formulas (particularly for lower-grade texts with fewer multisyllabic words). However, based on within-grade correlations and previous research, this relation appears spurious, resulting from the “developmental” nature of the scales.
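
Two of the text-feature estimates compared above have standard published formulas; the sketch below implements them directly (syllable, word, sentence, and character counts are assumed to be supplied by the caller).

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """Automated Readability Index (ARI)."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

Both formulas share a words-per-sentence term, which is consistent with the high correlation between ARI and Flesch-Kincaid noted above.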

(2) Nese, J. F. T., Alonzo, J., & Kamata, A. (2016, April). Comparing passage lengths and human vs. speech recognition scoring of oral reading fluency. Paper presented at the annual meeting of the American Educational Research Association (AERA), Washington, DC.

The purpose of this study is to explore a computerized oral reading fluency (ORF) assessment system that uses speech recognition software (CORE). Sample sizes were 127 for Grade 2, 158 for Grade 3, and 162 for Grade 4. Eighteen passages (3 long, 5 medium, 10 short) were administered via computer. We used a mixed-model approach with two within-subject variables to test mean WCPM and error rate differences by passage length, scoring method, and their interaction. The length factor included three categories: short, medium, and long. The scoring method factor included three categories: Real-Time, Recorded Audio, and ASR. Across grades, there were significant main effects for passage length and scoring method, no significant interaction effect, and mixed results for pairwise comparisons. Recorded Audio and Real-Time scores differed across grades, but Recorded Audio and ASR scores were quite similar for all passage lengths and grades. Real-Time scores were higher than both the ASR and Recorded Audio scores. When disaggregated by grade, scoring method, and passage length, mean error rates ranged from 3% to 10%; error rates were highest for ASR, shorter passages, and lower grade levels, and lowest for Real-Time scoring. The timed passage duration was consistently greater (approximately 1-2 seconds) for the ASR and Recorded Audio scoring methods than for the Real-Time scoring method; because the ASR and Recorded Audio methods used the same time duration to compute WCPM, this would lead to decreased WCPM scores compared with Real-Time scoring. Average ASR vs. Real-Time Cohen’s kappa was .82 for Grade 2, .90 for Grade 3, and .91 for Grade 4.
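
Word-score agreement between scoring methods was summarized with Cohen's kappa; a minimal sketch of that statistic for binary correct/incorrect word scores (illustrative only, not the study's analysis code) is below.

def cohens_kappa(scores_a, scores_b):
    """Cohen's kappa for two raters' binary word scores (1 = correct, 0 = incorrect)."""
    n = len(scores_a)
    p_observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    p_a1, p_b1 = sum(scores_a) / n, sum(scores_b) / n
    p_chance = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)   # expected agreement by chance
    return (p_observed - p_chance) / (1 - p_chance)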

(1) Nese, J. F. T., Kamata, A., & Alonzo, J. (2015, July). Exploring the evidence of speech recognition and shorter passage length in Computerized Oral Reading Fluency (CORE). In K. Cummings (Chair), Assessment fidelity in reading research: Effects of examiner, reading passage, and scoring methods. Symposium conducted at the Society for the Scientific Study of Reading (SSSR), Hawaii.

Assessing reading fluency is critical because it functions as an indicator of comprehension and overall reading achievement. Although theory and research demonstrate the importance of ORF proficiency, traditional ORF assessment practices are lacking as sensitive measures of progress for educators to make instructional decisions. The purpose of this study is to compare traditional ORF measures and administration to a computerized ORF assessment system based on speech recognition software (CORE). Using WCPM scores as the outcome, we compare: (a) traditional ORF passages to CORE passages, (b) CORE passage lengths, and (c) scoring methods. We used a general linear model with two within-subject factors to test mean WCPM score differences by passage length, scoring method, and their interaction. We found that CORE passages, whether short, medium, or long, tended to have higher WCPM means than the traditional ORF passages. Real-time scoring tended to have higher WCPM means than both Audio Recording and ASR scoring types, and the ASR and Audio Recording scores were quite similar across passage length and grade, providing preliminary evidence that the speech recognition scoring engine can score passages as well as human administrators in real settings.


Computational Tools for Model-Based ORF

Conferences

(5) Kamata, A., & Nese, J. F. T. (2022, April). Introduction to model-based approach to oral reading fluency assessment. In A. Kamata (Chair), Model-based approach to oral reading fluency assessment. National Council on Measurement in Education (NCME), San Diego, CA.

(4) Potgieter, C., Kara, Y., & Kamata, A. (2022, April). Estimating passage parameters by the model-based approach to ORF assessment. In A. Kamata (Chair), Model-based approach to oral reading fluency assessment. National Council on Measurement in Education (NCME), San Diego, CA.

(3) Potgieter, C., Kamata, A., Kara, Y., Somsong, S., & Wang, K. P. (2022, April). Estimating fluency scores by the model-based approach to ORF assessment. In A. Kamata (Chair), Model-based approach to oral reading fluency assessment. National Council on Measurement in Education (NCME), San Diego, CA.

(2) Somsong, S., Wang, K. P., Le, N., & Kara, Y. (2022, April). Evaluation of various estimators for model-based fluency scores. In A. Kamata (Chair), Model-based approach to oral reading fluency assessment. National Council on Measurement in Education (NCME), San Diego, CA.

(1) Kara, Y., Nese, J. F. T., & Kamata, A. (2022, April). Practical implications of the model-based approach to ORF assessment. In A. Kamata (Chair), Model-based approach to oral reading fluency assessment. National Council on Measurement in Education (NCME), San Diego, CA.