CityViews: As Standardized Tests Loom, Improvements Are Illusory



Shannan Muskopf

The New York State Education Department (SED) and Education Commissioner MaryEllen Elia are campaigning to convince parents that changes have been made to the statewide exams that “will improve the testing experience for students and the validity of the assessments.” There will be fewer test questions, a shift to untimed testing and a replacement of the previous test vendor.

The changes had already been set forth in the Foreword to each Educator Guide to the 2016 Common Core Tests that was prepared for posting last October. They were broadcast in a January SED memo to superintendents and all public school principals. Since then, the news has filtered down to teachers and the word has spread to parents at meetings and individually.

The advance work continued last week when NYC Schools Chancellor Farina sent a letter to parents echoing the changes and urging them to support their children in taking the tests.

Three days of English Language Arts (ELA) tests will commence on April 5th. Math starts on April 13th and will also be given over a three-day period.

The uniformity and on-message nature of these communications mark an effort to make the testing program more appealing to anyone who might consider opting their children out of the exams.

But what are the changes being underscored, are they improvements, and should parents believe they’re worthwhile?

A decrease in the number of test questions. Day 1 is devoted to Multiple-Choice (M-C) items designed to measure reading comprehension. Last year, in Grades 3 and 4, there were five passages followed by 30 items. Now there will be four readings and 24 items, a 20 percent reduction.

Shorter tests, however, aren’t necessarily better. They tap into smaller samples of what students know. Even if tests are composed of high-caliber items, having fewer of them will provide less reliable information on which to base presumably relevant judgments about students or to meaningfully compare school and district performance in the area being assessed.

A sudden lowering of the item count throws the ball back to critics who feel there is too much testing. But friends or foes of testing alike should keep their eyes on that ball. The issue has less to do with item quantity than item quality. Is a chain with 24 weak links stronger than one with 30?

Not emphasized is the fact that one of the four reading passages and six of the 24 M-C items will be embedded in Book 1 for try-out purposes. Performance on the trial items doesn’t count in the score a student gets. So, one-quarter of the time and effort children spend on Day 1 will be given over to field testing material for the test vendor to use on future exams.

In effect, this means that children will have only 18 M-C items to demonstrate their level of achievement. If trying to gauge reading proficiency based on 24 operational items was precarious last year, how much more dubious will it be to claim that a sample of 18 is sufficient now?
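The item arithmetic above can be checked with a short Python sketch. The counts are the article's Grade 3 and 4 figures. The Spearman-Brown prophecy formula, which predicts how a test's reliability changes with its length if the items are roughly interchangeable, is a standard psychometric rule of thumb brought in here only for illustration, and the 0.85 baseline reliability is a hypothetical value, not a published SED figure.

```python
# Day 1 multiple-choice item counts for Grades 3 and 4, as reported in the article.
ITEMS_2015 = 30          # total M-C items last year
ITEMS_2016 = 24          # total M-C items this year
FIELD_TEST_ITEMS = 6     # embedded try-out items that do not count toward the score
OPERATIONAL_2015 = 24    # scored items last year, per the article


def percent_reduction(before: int, after: int) -> float:
    """Percentage drop from `before` to `after`."""
    return 100 * (before - after) / before


def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Spearman-Brown prophecy formula: predicted reliability when a test's
    length is multiplied by `length_ratio`, assuming parallel items."""
    return length_ratio * reliability / (1 + (length_ratio - 1) * reliability)


operational_2016 = ITEMS_2016 - FIELD_TEST_ITEMS  # scored items this year

print(f"Total items cut by {percent_reduction(ITEMS_2015, ITEMS_2016):.0f}%")
print(f"Share of Day 1 items used for field testing: {FIELD_TEST_ITEMS / ITEMS_2016:.0%}")
print(f"Scored items: {OPERATIONAL_2015} -> {operational_2016}")

# HYPOTHETICAL baseline reliability, chosen only to show the direction of the effect.
baseline = 0.85
projected = spearman_brown(baseline, operational_2016 / OPERATIONAL_2015)
print(f"Projected reliability: {baseline:.2f} -> {projected:.2f}")
```

Under these assumptions, cutting the scored items from 24 to 18 lowers the projected reliability from 0.85 to about 0.81; the real effect depends on the actual items, which only the test developer's technical reports could show.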

(On Days 2 and 3, the number of Constructed Response Questions (CRQs), where students have to produce an answer, will remain nearly the same this year. There will be six readings again, but nine questions instead of ten.)

A shift to untimed testing. This concession was made to educators and parents who saw that children have had trouble finishing the Common Core-aligned tests since 2013. The difficulty of the items and the stress placed on children struggling to complete them were widely cited. This year SED has moved to tests that will not be timed. Children will be allowed to proceed at their own pace without a clock.

It seems humane but presents a conundrum. The Educator Guide says that “The tests must be administered under standard conditions and the directions must be followed carefully. The same test administration procedures must be used with all students so that valid inferences can be drawn from the test results.” How can this be reconciled with the removal of time limits?

The same page of the Guide compounds the dilemma: “Given that the spring 2016 tests have no time limits, schools and districts have the discretion to create their own approach to ensure that all students who are productively working are given the time they need to continue to take the tests.” Yet no procedures are spelled out for giving students as much time as they need.

With all that, the very same page in the 2016 Guide offers that: “On average, students will likely need approximately 60–70 minutes of working time to complete each test session.” The 2015 Guide said the Grades 3 and 4 tests were “designed so most students would complete testing in about 50 minutes,” adding that “students will be permitted 70 minutes to complete the test…. This design provides ample time for students who work at different paces.” Why was 70 minutes to finish 30 items enough time in 2015, but not to complete 24 this April?
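A quick calculation makes the question concrete. This sketch assumes the 2016 estimate of 60-70 minutes applies to the Day 1 multiple-choice session and simply divides the quoted minutes by the quoted item counts; it says nothing about how hard the individual items are.

```python
# Time and item figures quoted from the 2015 and 2016 Educator Guides (Grades 3 and 4).
ALLOWED_MINUTES_2015 = 70       # "students will be permitted 70 minutes"
DESIGN_MINUTES_2015 = 50        # "most students would complete testing in about 50 minutes"
ITEMS_2015 = 30
ITEMS_2016 = 24
ESTIMATED_MINUTES_2016 = (60, 70)  # "approximately 60-70 minutes" per session (assumed Day 1)


def minutes_per_item(minutes: float, items: int) -> float:
    """Average working time available for each item."""
    return minutes / items


print(f"2015 design:  {minutes_per_item(DESIGN_MINUTES_2015, ITEMS_2015):.2f} min/item")
print(f"2015 allowed: {minutes_per_item(ALLOWED_MINUTES_2015, ITEMS_2015):.2f} min/item")
low, high = ESTIMATED_MINUTES_2016
print(f"2016 estimate: {minutes_per_item(low, ITEMS_2016):.2f}-"
      f"{minutes_per_item(high, ITEMS_2016):.2f} min/item")
```

By this rough measure, even the low end of the 2016 estimate (2.50 minutes per item) exceeds what the 2015 time limit allowed (about 2.33 minutes per item).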

Faced with the reality of testing 1.2 million students this year, schools will probably settle on a de facto 70-minute testing period. But timing won’t be uniform in every school, confounding the comparisons that standardization is supposed to afford. This change, however, gives SED a chance to claim it is addressing shortcomings in the tests by giving children a benefit that is arguably more apparent than real.

Change to a new testing vendor. In November, Questar Assessments, Inc. was awarded a five-year $44.7 million contract with SED to develop the ELA and math examinations. But NCS Pearson, Inc., which held the previous five-year contract, amounting to $38 million through December 2015, was quietly given an extension until the end of June.

The amendment called on Pearson to draw questions from its item bank to develop April 2016’s operational exams. Pearson also supplied embedded field test items for next month’s exams, as well as items that will be tried out in separate (aka stand-alone) field tests in May or June. This material will be the basis for constructing the 2017 operational exams.

Contrary to the carefully crafted impression SED wishes to convey, Pearson has been engaged in an additional cycle of item development. Yes, in a sense, Questar replaced Pearson, but the prior vendor has had a significant continuing role in the 2016 and 2017 testing program.

The state’s party line of improved testing is better served by keeping Pearson out of the picture. The prior vendor’s Common Core-aligned ELA and math products were panned as badly developed: poor field-testing methods produced items that, once put to operational use, proved faulty and far too difficult, especially for English Language Learners and special needs students. Mentioning Pearson’s name would disrupt SED’s drumbeat of change.

Closer examination reveals that shortening the tests, removing their time limits and working with a new testing partner are not quite the changes SED wants us to believe in. Public faith in the words of New York’s top education officials will not be restored by giving us one side of the story.

Let us hope that the changes that count—recent changes in the composition of the Board of Regents, placing policy-making in the hands of educators attuned to the needs of schools, teachers, parents and children—will end the dark days that allowed bureaucrats to mislead us.

  • Richard WJ DiSalvo

    The elimination of time limits hasn’t made much sense to me. Does untimed testing in practice just mean school-chosen time limits (rather than statewide time limits)?

    • Two Teachers

      No one is sure. There has been no effort to define what “making progress” looks like or means. The head of the teachers union in NYC is suggesting that teachers insist administrators decide when kids should stop, so that no one tries to blame anything on teachers.

  • Even in Australia

    Also, the schools have been given no guidance on logistical issues that the untimed tests pose: what are students who finish early to do, now that they may have to wait much longer for their classmates to finish? Still just sit and do nothing? And what if testing runs into a class’s regularly scheduled lunch period?