Psst! Four test questions are missing. They were on the New York State English Language Arts (ELA) exams given last April, which were designed to measure students against the Common Core Standards. Evidently, the State Education Department pulled them after seeing the results.

The mystery evokes the 2012 Case of the Pineapple and the Hare when SED was publicly forced to scrap a reading passage and six baffling test items. Is history repeating itself, but in the dark this time, to ward off embarrassment and the protests that more bad testing would fuel?

Three items were yanked from the Grade 3 test results. Proof they existed lies in the 2014 Test Guides that SED put on its web site before the exams were given. A single item was struck from the Grade 7 test results and also banished from a sample of questions and answers the state released afterward to help parents and the public understand the Core exams. Taken together, their disappearance suggests problems with test publisher Pearson’s development and SED’s approval of poor test material.

Now, the fate of a few items may strike readers as a quibble. But given the political stakes riding on these exams, and the personal stakes for students and teachers, details matter a great deal. A single question can decide whether a student is deemed proficient or below, and whether that student is judged to be making progress or not. And teacher competency is being graded on the basis of student performance.

While SED can decide after the fact not to count certain questions that are flawed or ambiguous, if those items are present on test day they serve to ramp up kids’ already high stress levels. What’s more, the state pays a lot of money—and offers up thousands of students as guinea pigs—so Pearson can field test items and devise demanding exams that purportedly tap children’s thinking ability.

Although the omissions I’ve detected affect only two grades, those grades encompass 138,500 kids in the city and 378,000 statewide, all with teachers, principals and families.

Grade 3’s test originally had 31 multiple-choice items worth one point apiece, eight short answer questions worth two points each and two four-point questions calling for extended responses. The test’s maximum raw score was 55 points.

The lost items can be identified from gaps in the Grade 3 Test Map that was posted after results for all questions became known to SED. The Map gives the number of each test item. The gap between item #28 and #31 pinpoints the deletion of questions #29 and #30, the two final multiple-choice items children faced on Test Day 1. Effectively, SED erased the answers of 190,000 8-year-olds. Imagine if a teacher were caught tampering in such a way with even a handful of answer sheets!
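For readers who want to repeat the check, here is a minimal sketch of the gap hunt. The function name and the short lists of item numbers are made up for illustration; only the gap between #28 and #31, and the missing #47, come from the Test Map as described here. It is not SED's or Pearson's procedure.

```python
def missing_item_numbers(listed, first=None, last=None):
    """Return item numbers absent from `listed` between `first` and `last`.

    If `first` or `last` is omitted, the smallest or largest listed
    number is used, so a dropped final item is only caught when `last`
    is supplied from another source (for example, a test guide).
    """
    present = set(listed)
    lo = min(present) if first is None else first
    hi = max(present) if last is None else last
    return [n for n in range(lo, hi + 1) if n not in present]


# Hypothetical excerpt of a test map around the gap (not the real map).
map_excerpt = [26, 27, 28, 31, 32]
print(missing_item_numbers(map_excerpt))  # [29, 30]

# A dropped last item shows up only if the expected final number is known.
tail_excerpt = [45, 46]  # hypothetical
print(missing_item_numbers(tail_excerpt, last=47))  # [47]
```

The second call shows why the pre-test guides matter: a question removed from the end of a test leaves no gap in the map itself.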

And #47, the final 4-point question on Test Day 3, dropped off the Map. It required an extended response intended to be cognitively demanding and gauge the future of third graders as SED saw them setting out on the aspirational path to college and careers. Could it be that SED decided to eliminate #47 because too many youngsters couldn’t reach it in 70 minutes? Or, after three days, were kids so muddled by the rigorous Core-related tests that they wrote incomprehensible answers?

The sudden omission of these three questions accounts for six points out of 55, or about 11 percent of the Grade 3 2014 ELA test taken statewide. The reduction substantially alters the composition and results of the test.
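The arithmetic can be checked in a few lines. This sketch uses only the item counts and point values given above; the variable names are mine.

```python
# Grade 3 test as originally built: (number of items, points per item).
item_types = {
    "multiple_choice": (31, 1),
    "short_answer": (8, 2),
    "extended_response": (2, 4),
}

# Maximum raw score before any items were dropped.
max_raw = sum(count * points for count, points in item_types.values())

# Items removed: #29 and #30 (multiple choice) and #47 (extended response).
removed = {"multiple_choice": 2, "extended_response": 1}
removed_points = sum(n * item_types[kind][1] for kind, n in removed.items())

share = removed_points / max_raw
print(max_raw, removed_points, f"{share:.1%}")  # 55 6 10.9%
```

That leaves 49 points on which the Grade 3 results actually rest.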

But in some ways, the tale of Grade 7’s item #8 is more telling. It was one of 35 multiple-choice items that had been field tested by the publisher and chosen for operational use on Test Day 1. That’s all we would have known about #8 except for two things.

Last August, SED disclosed half of the test questions for “review and use” and explained “why the correct answer is correct.” There had been mounting pressure from parents who wanted to know more about the Core tests, which had sent scores plummeting the year before.

The material included a fictional text about baseball called “The Girl Who Threw Butterflies.” Five items associated with the story were revealed. Each correct answer was justified. Reasons were offered to show the other answer choices, known as “distractors,” were wrong.

Then, someone showed me a copy of the test despite SED’s warning not to “reproduce or transmit by any means” any part of the test. Teachers had been ordered not to discuss details. On the actual test, however, there were six questions, not five. The item remaining invisible was #8.

I invite readers to use the link to Grade 7’s annotated questions (see page 16 of this pdf). Read the baseball story. Try to figure out the correct answer:

8. How is baseball Molly watches on television different from baseball Molly plays in real life?
A. The teams on television are quieter than Molly’s team.
B. The coaches on television are younger than Molly’s coach.
C. On television, baseball looks easy, but in real life, the game is challenging.
D. On television, baseball focuses on the star players, but in real life, the game is cooperative.

One wonders why SED might have killed the item. Might there be no answer? Could there be more than one correct answer? Did a higher percentage of students select one or two confusing distractors than chose the answer SED deemed to be right? Maybe the item is biased against a certain group of students. Any of the above would give it a failing grade.

This need not be left to speculation. Shortly after the April exams were scored, SED and Pearson had item-by-item statistics in hand showing the percentage of children getting each question right and the percentage choosing the distractors. The data enabled them to study how every item functioned and determine which to keep, which to release and which pineapples to bury. All of that data should be released immediately.

But the state has not issued the testing program’s 2014 Technical Report, due from the publisher in December. It would present stats that allow independent researchers and testing specialists to examine all items and appraise the value of the exams. The 2013 Report was not posted until last July—four months after the 2014 tests had been given. Following this lagging pattern, expect the 2015 exams to be completed in April before too-little, too-late insights can be gained into whether the 2014 Core-aligned exams were on track. (As it is, the state already has cut back on item statistics it previously made available routinely. It no longer includes data on how many children selected the “wrong choices,” which can tip off substandard items.)

So, students, what do we draw from these revelations?

A) Clearly items that Pearson claimed were vetted by review panels and experts were unrefined and no better than field test items that somehow passed muster only to flop in prime time.

B) SED’s dirty secret is out: Its performance-defining cutoff scores are set after tests are given, in this case after the raw score distribution had been studied and truncated.

C) SED plays fast and loose with data at its disposal, withholding information from the public that paid for it.

D) Efforts to classify students and evaluate teachers that rest on such shaky grounds are indefensible and unsustainable.

E) All of the above.

“E” certainly seems like the smart choice. But we can’t know for sure until an outside investigation is conducted into how SED and Pearson have run the testing program. Parents should hold their children out of all statewide tests until SED comes clean by providing complete and timely item analysis data and is able to demonstrate that the test results are relevant to the purposes they are being bent to serve—in other words, until there are meaningful alternative assessment programs in place.

Transparency in all matters concerning educational testing is a moral imperative. We must demand passage of revised Truth-in-Testing legislation, opening the testing process to sunshine and scrutiny, restoring its balance and something immeasurable—a level of trust in educational leadership that’s been missing too long.

* * * *
Dennis Tompkins, spokesman for the New York State Education Department, submitted this response: “The processes Fred Smith criticizes are standard psychometric practice. I’d suggest Mr. Smith consult with the National Council on Measurement in Education and other professional assessment organizations. He really should try to understand an issue before he attempts to critique it. A little knowledge can be an embarrassing thing.”
* * * *
Fred Smith, a testing specialist and consultant, was an administrative analyst for the New York City public schools. He’s a member of Change the Stakes, a parent advocacy group.
