This New York Times article appeared in January. I had saved it to write about, and then it fell down my column of drafts. However, it is still very timely: this is the very study debunked by the New York Times when it released the data on 18,000 city teachers a few days ago.
Don’t get me wrong: I also believe that a good teacher is better than a mediocre one. The issue in contention here is the means of measuring. I do not believe in standardized tests. Their usefulness-to-cost ratio is extremely low, in my opinion.
The New Jersey Department of Education contracted yesterday with Rutgers University to dissect a new teacher evaluation system being tried out in 10 school districts across the state.
Acting Education Commissioner Chris Cerf says the findings by a review team from the Rutgers Graduate School of Education will be used to guide implementation of the new system in the 2013-14 school year.
Unlike the proposed New Jersey study, the Harvard Study (HS) was carried out by a team of economists, not educators. It encompasses data from a period of over 20 years. It matches data from classrooms to the same students later in life, through their income tax returns. That raises my first suspicion: the data, by its sheer size and span of time, is unverifiable. The end of the study abounds in compiled data that has been statistically processed. Overall, I fail to find the rigor expected in a scientific paper. The number of assumptions made and liberties taken with the data throughout the study is astounding. The study is highly manipulative (not in the popular sense but in the scientific sense of molding data to fit a theory). The authors also revise the work of others. Although the study claims to be empirical, it is in fact highly theoretical, and it reveals as much in its very conclusion, which I have copied below.
Controlling for numerous factors, including students’ backgrounds, the researchers found that the value-added scores consistently identified some teachers as better than others, even if individual teachers’ value-added scores varied from year to year. Nonetheless, many other factors that occur in the classroom are ignored by the researchers.
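To make the idea of a value-added score concrete, here is a minimal sketch in Python. It is not the Harvard Study's model: it assumes a single control (each student's prior-year score), the function names `fit_line` and `value_added` are my own, and the six records are made-up numbers for illustration only. Each student's current score is predicted from the prior score, and a teacher's value-added is the average amount by which his or her students beat or miss that prediction.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def value_added(records):
    """Average residual per teacher.

    records: list of (teacher, prior_score, current_score) tuples.
    The only control here is the prior score (a simplifying assumption).
    """
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]
    intercept, slope = fit_line(priors, currents)

    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = intercept + slope * prior
        residuals[teacher].append(current - predicted)
    return {teacher: mean(values) for teacher, values in residuals.items()}


# Hypothetical data: two teachers, three students each.
records = [
    ("A", 50, 58), ("A", 60, 67), ("A", 70, 78),
    ("B", 50, 52), ("B", 60, 61), ("B", 70, 72),
]
scores = value_added(records)
```

Note how much rests on the prediction step: anything that happens in the classroom and is not in the list of controls ends up in the residual, and therefore in the teacher's score.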
One sentence: “If you leave a low value-added teacher in your school for 10 years, rather than replacing him with an average teacher, you are hypothetically talking about $2.5 million in lost income,” said Professor Friedman, one of the coauthors.
That “hypothetically” bothers me greatly. The difference between hypothesis and hyperbole is not large. Nonetheless, I am absolutely sure that a good teacher has a positive impact on students. The issue here, I stress again, is whether any model of standardized student testing can be used, not in theory but in the real classroom, to measure teacher effectiveness and, more importantly, whether such testing is worth the cost in both time and money.
After identifying excellent, average and poor teachers, the economists then set out to look at their students over the long term, analyzing information on earnings, college matriculation rates, the age at which they had children, and where they ended up living.
They found a direct relationship between good teachers and successful adults. The entire study is below in PDF:
Here is the fundamental question that the Harvard Study claims to answer:
Does Value-Added Accurately Measure Teacher Quality?
“Recent studies by Kane and Staiger (2008) and Rothstein (2010) among others have reached conflicting conclusions about whether VA estimates are biased by student sorting (i.e., whether Assumption 1 in Section 2.2 holds). In this section, we revisit this debate by presenting new tests for bias in VA estimates.”
Another point that the Harvard Study claims to correct:
“This is the reason that Rothstein (2010) finds that fifth grade teachers, whose students have had above average fourth grade gains, have systematically lower estimated value-added scores than teachers whose students underperformed in the prior year.”
That was another discovery among the 18,000 reports published last Sunday in the NYT. In other words, a teacher who has excellent students in one year will necessarily underperform in the next.
BTW, this was page 21 of the HS.
This is a second entry:
“One important caveat to these calculations is that they assume that teacher effectiveness does not vary with classroom characteristics. Our estimates of VA only identify the component of teacher quality that is orthogonal to lagged test scores and the other characteristics that we control for to account for sorting. That is, teachers are evaluated relative to the average quality of teachers with similar students, not relative to the population. Thus, while we can predict the effects of selecting teachers among those assigned to a sub-population of similar students, we cannot predict the impacts of policies that reassign teachers to randomly selected classrooms from the population (Rubin, Stuart, and Zanutto 2004). This is a limitation in all existing value-added measures of teacher quality and could have significant implications for their use if teaching quality interacts heavily with student attributes. Lockwood and McCaffrey (2009) argue that such interactions are small relative to the overall variation in teacher VA. In addition, our estimates based on teaching staff changes suggest that VA is relatively stable as teachers switch to different grades or schools.”
“Nevertheless, further work is needed on this issue if a policymaker is considering reassigning teachers across classrooms and seeks a global ranking of their relative quality.”
“This paper has presented evidence that existing value-added measures are informative about teachers’ long-term impacts. However, two important issues must be resolved before one can determine whether VA should be used to evaluate teachers. First, using VA measures in high-stakes evaluations could induce responses such as teaching to the test or cheating, eroding the signal in VA measures. This question can be addressed by testing whether VA measures from a high stakes testing environment provide as good of a proxy for long-term impacts as they do in our data. If not, one may need to develop metrics that are more robust to such responses, as in Barlevy and Neal (2012). Districts may also be able to use data on the persistence of test score gains to identify test manipulation, as in Jacob and Levitt (2003), and thereby develop a more robust estimate of VA. Second, one must weigh the cost of errors in personnel decisions against the mean of the benefits from improving teacher value-added. We quantified mean earnings gains from selecting teachers on VA but did not quantify the costs imposed on teachers or schools from the turnover generated by such policies.”
“As we noted above, even in the low-stakes regime we study, some teachers in the upper tail of the VA distribution have test score impacts consistent with test manipulation. If such behavior becomes more prevalent when VA is actually used to evaluate teachers, the predictive content of VA as a measure of true teacher quality could be compromised.”
This ends page 50 of the HS. I suggest you go through the entire study and form your own conclusions.
Testing in Europe counts for much more than it does in the United States. Students are sent to a variety of schools after taking a test around the end of what would be our middle school grades. Those who are not found to have the academic drive and ability are not sent forward to an academic high school. They are sent to trade school or guided to apprenticeships. We may be the only developed nation that strives to place every child in an academic high school. Even some of our former county vocational schools have left their original tracks and become places with medical academies and such.
Do we have a much bigger problem with our education system than the few poor teachers? Trying to educate everyone in higher-level math classes holds back those who can excel. State testing costs millions of dollars and tests only two subjects, albeit critical ones: language arts and math. A lot of high-powered people with vested economic interests and political connections are tied into this and are reaping the millions.