All AI-powered test scores are grounded in extensive research, providing a reliable basis for evaluating the test-taker's language proficiency.
Our assessment evaluates a test-taker’s English language skills in real-life situations, especially in professional and academic contexts where English communication is essential.
To achieve this, the AI-powered test assesses the accuracy, variety, and clarity of spoken and written English, covering both linguistic and some pragmatic aspects.
Our adaptive listening and reading tests are based on authentic audio and text passages.
The test is structured so that its scores can be used to evaluate the test-taker's language proficiency. Both the score interpretations and the tests themselves are built on a strong theoretical foundation.
The Speaknow assessment is a computer-based test of spoken English. One of the primary uses of the exam is for higher education admissions or placement into English language programs. As such, it is necessary to evaluate how adequately the exam provides information for making decisions about the English levels of candidates for higher education. The ability of an exam to measure what it purports to measure is known as validity. There are many different methods for establishing validity. This paper examines the construct validity of the Speaknow assessment for use in higher education English programs.
There are two main methods of making admissions and placement decisions: standardized testing and in-house testing. Standardized testing has the advantage of providing scores on a recognized scale and of being comparable with other, similar programs. External, standardized exams are also more cost effective for the program, because students generally pay for these exams themselves. In-house exams have the advantage of being customizable and tailored to the needs of the program. They have the additional benefit to students of shifting the cost of testing onto the program, rather than requiring often costly external exams (Ling et al., 2014).
Personal interviews are often a part of English placement and admissions tests, such as the IELTS, and are also a frequent component of in-house tests. As such, interviews are a logical means of testing the validity of a standardized exam for making placement decisions.
The question this study explored was how well the scores of the Speaknow Assessment correspond to the scores given by face-to-face interviewers for placement in English programs.
Fifty-one prospective students at a teachers' college in Israel volunteered to participate in the study. Each student was interviewed by two different faculty members, who scored the student's speech using CEFR-aligned rubrics. Most, but not all, of the faculty members had experience using the CEFR. Although each student was interviewed by two faculty members, the pairs of interviewers were not the same for every student.
After the interviews, the students took the Speaknow Assessment of speaking and listening. The college compiled the exam and interview scores and provided them to Speaknow with identifying information removed.
The agreement of the Speaknow scores with the college interviewers' scores is reported below. Both Light's Kappa and intraclass correlation coefficient (ICC) statistics are reported. Overall, the agreement between the Speaknow CEFR scores and each college rater's scores is closer (ICC agreement of 0.923 and 0.931) than the agreement between the two college raters themselves (0.882).
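As an illustration only, the short Python sketch below shows how statistics of this kind can be computed for a pair of CEFR ratings of the same test-takers. It is not the study's analysis code; the sample data, column names, and numeric coding of the CEFR levels are assumptions made for the example.

```python
# Illustrative sketch only: Light's Kappa and ICC for two sets of CEFR ratings
# of the same test-takers. Data and column names are hypothetical.
import pandas as pd
from sklearn.metrics import cohen_kappa_score
import pingouin as pg

# Code CEFR levels numerically (A1 = 1 ... C2 = 6) so the ICC can be computed.
levels = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}
ratings = pd.DataFrame({
    "speaknow":    ["B1", "B2", "C1", "B2", "A2", "B1"],
    "interviewer": ["B1", "B2", "B2", "B2", "A2", "B2"],
}).replace(levels)

# Light's Kappa is the mean of all pairwise (unweighted) Cohen's kappas;
# with only two raters it reduces to a single Cohen's kappa.
lights_kappa = cohen_kappa_score(ratings["speaknow"], ratings["interviewer"])

# Intraclass correlation: reshape to long format, one row per student-rater pair.
long = (ratings
        .reset_index()
        .rename(columns={"index": "student"})
        .melt(id_vars="student", var_name="rater", value_name="score"))
icc = pg.intraclass_corr(data=long, targets="student",
                         raters="rater", ratings="score")

print(f"Light's Kappa: {lights_kappa:.3f}")
# ICC2 (absolute agreement) and ICC3 (consistency) are the likely analogues of
# the "Agreement" and "Consistency" columns reported in the tables below.
print(icc[["Type", "ICC"]])
```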
Cohen's Weighted Kappa scores are also reported. These scores take into account the size of the difference between ratings, so that larger disagreements are penalized more heavily than near misses. By this measure, the agreement of the Speaknow Assessment with each of the raters (0.826 and 0.825) is also stronger than the agreement between the raters themselves (0.713).
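Under the assumption that the "equal" weighting reported in the tables corresponds to linear (equal-spacing) weights, a weighted kappa could be computed as in the sketch below; again, the data and CEFR coding are hypothetical.

```python
# Illustrative sketch only: a linearly weighted Cohen's Kappa, where a
# two-level CEFR disagreement counts against agreement more than a one-level one.
from sklearn.metrics import cohen_kappa_score

levels = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}
speaknow    = [levels[x] for x in ["B1", "B2", "C1", "B2", "A2", "B1"]]
interviewer = [levels[x] for x in ["B1", "B2", "B2", "B2", "A2", "B2"]]

weighted_kappa = cohen_kappa_score(speaknow, interviewer, weights="linear")
print(f"Weighted Kappa: {weighted_kappa:.3f}")
```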
Conclusions cannot be drawn about the skill of any individual rater because, as stated above, the interviewer assignments varied across students. Overall, of the 51 exams scored, the Speaknow score agreed exactly with both raters in 27 cases (53%) and with exactly one of the raters in 22 cases (43%). In total, there was exact agreement between the Speaknow CEFR score and at least one of the interviewers in 96% of the cases.
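The exact-agreement percentages above come from simple counting. The sketch below illustrates the tally under the assumption of one row per test-taker holding the Speaknow CEFR level and both interviewers' levels; the column names and data are illustrative only.

```python
# Illustrative sketch only: tallying exact CEFR agreement between the
# Speaknow score and the two interviewers for each test-taker.
import pandas as pd

scores = pd.DataFrame({
    "speaknow": ["B1", "B2", "C1", "B2", "A2"],
    "rater_1":  ["B1", "B2", "B2", "B2", "A2"],
    "rater_2":  ["B1", "B1", "B2", "A2", "A2"],
})

matches = ((scores["speaknow"] == scores["rater_1"]).astype(int)
           + (scores["speaknow"] == scores["rater_2"]).astype(int))
agree_both = (matches == 2).mean()  # agreed with both interviewers
agree_one = (matches == 1).mean()   # agreed with exactly one interviewer
print(f"both: {agree_both:.0%}, one: {agree_one:.0%}, "
      f"at least one: {agree_both + agree_one:.0%}")
```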
Speaknow vs. Interviewer 1

Interrater Reliability (Light's Kappa)

| Method | Subjects | Raters | Statistic | z | p |
|---|---|---|---|---|---|
| Light's Kappa | 51 | 2 | 0.681 | 9.67 | < .001 |

Interrater Reliability (Cohen's Weighted Kappa, equal weights)

| Subjects | Raters | Agreement (%) | Kappa | z | p |
|---|---|---|---|---|---|
| 51 | 2 | 74.5 | 0.826 | 9.13 | < .001 |

Intraclass correlation coefficient

| Subjects | Raters | Subject variance | Rater variance | Residual variance | Consistency | Agreement |
|---|---|---|---|---|---|---|
| 51 | 2 | 1.87 | 0.0251 | 0.132 | 0.934 | 0.923 |
Speaknow vs. Interviewer 2

Interrater Reliability (Light's Kappa)

| Method | Subjects | Raters | Statistic | z | p |
|---|---|---|---|---|---|
| Light's Kappa | 51 | 2 | 0.657 | 9.41 | < .001 |

Interrater Reliability (Cohen's Weighted Kappa, equal weights)

| Subjects | Raters | Agreement (%) | Kappa | z | p |
|---|---|---|---|---|---|
| 51 | 2 | 72.5 | 0.825 | 9.16 | < .001 |

Intraclass correlation coefficient

| Subjects | Raters | Subject variance | Rater variance | Residual variance | Consistency | Agreement |
|---|---|---|---|---|---|---|
| 51 | 2 | 1.86 | 0.00431 | 0.133 | 0.933 | 0.931 |
Interviewer 1 vs. Interviewer 2

Interrater Reliability (Light's Kappa)

| Method | Subjects | Raters | Statistic | z | p |
|---|---|---|---|---|---|
| Light's Kappa | 51 | 2 | 0.444 | 6.57 | 4.89e-11 |

Interrater Reliability (Cohen's Weighted Kappa, equal weights)

| Subjects | Raters | Agreement (%) | Kappa | z | p |
|---|---|---|---|---|---|
| 51 | 2 | 54.9 | 0.713 | 7.93 | < .001 |

Intraclass correlation coefficient

| Subjects | Raters | Subject variance | Rater variance | Residual variance | Consistency | Agreement |
|---|---|---|---|---|---|---|
| 51 | 2 | 1.90 | 0.00196 | 0.253 | 0.883 | 0.882 |
The Speaknow Assessment was more consistent with each individual interviewer's assessment of the students' English proficiency than the interviewers were with each other. The high rate of agreement between the Speaknow Assessment and the human interviewer scores (exact agreement with at least one interviewer in 96% of cases) indicates that the Speaknow Assessment is a valid means of assessing students' English proficiency level for use in a college program.