What is at stake with Testing (Part 3 – Teachers)

Tests with “high stakes” are those that have significant consequences (positive or negative) for students, teachers, or schools.  In my first post, I linked the origin of the phrase to the accountability movements of the late 1990s and 2000s.

Next, we looked at the PSSA and the stakes for students.  My conclusion was that the stakes for students are low.   Today we look at what our educators have at risk with PSSAs.

PSSA – Teacher ‘Stakes’

For our teachers, there is the potential for real impact.  If teacher performance is linked to PSSA results, then it is possible that a teacher might experience financial, career, and reputational impact from the PSSAs.   But to determine the actual impact, we need to understand the inner workings of our “teacher effectiveness system”.

[Chart: Teacher Evaluation System]

If you thought local schools had the autonomy to evaluate teachers in their own way, you would be wrong! Pennsylvania has a state-mandated teacher evaluation system that was established under Act 82 in 2012.   The stated purpose of the system, as evidenced by its name, is to improve Teacher Effectiveness.   How is this to be accomplished?  According to our lawmakers, it is done by explicitly connecting teacher performance ratings to student learning, as evidenced by standardized test results.

As seen in the chart, up to 30% of a teacher’s annual performance evaluation comes from students’ PSSA test scores:  15% comes from the building-level results, and up to 15% comes from the teacher’s own results if the teacher instructs in a tested subject area and grade. (Note that ~80% of teachers do not teach in tested subject areas and grades.)

Under Act 82, a teacher is also evaluated on other dimensions, including classroom observations across four practice domains (50%), and student learning objectives (20%).

This evaluation system can be critiqued on many dimensions (I will save that for a future post).  For the purposes of this discussion, let’s accept the evaluation system as a given and assume that it is impartially administered, which I think is a reasonable assumption.

An Uncertain Outcome?

There are two ultimate performance ratings that a classroom teacher can receive:  satisfactory or unsatisfactory.  And it is critical to understand that only an unsatisfactory rating leads to differential consequences.

[Image: PSSA Teacher Stakes]

What is the probability of receiving an “unsatisfactory” rating under this evaluation system?   For that, we only have to look at the ratings actually received.   In 2013-14, 98.2% of PA teachers were rated ‘satisfactory’.  If we exclude charter schools and focus only on public school teachers, that percentage rises to 99.8% (source).   In other words, 998 out of every 1,000 classroom teachers received a satisfactory rating under this system.  Remember that this result includes a 15% to 30% weighting of PSSA results.

Why are ‘unsatisfactory’ ratings so rare?   The mechanics of the evaluation system, with its section scores and weightings, make it very hard to receive any rating other than ‘satisfactory’.  A teacher must receive low scores on each of the four measured dimensions in order to cross the ‘unsatisfactory’ threshold.   (See Classroom Teacher Rating Form.)    So even if a teacher has failing PSSA results, they must also fail in almost all other dimensions to be rated ‘unsat’.  (Whether this is a ‘bug’ or a ‘feature’ is up for debate.)
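To make the arithmetic concrete, here is a minimal sketch of how a weighted composite like this behaves for a teacher in a tested subject and grade. The four weights are the ones described above. Everything else is an assumption made for illustration and is not taken from the rating form itself: the 0-to-3 scoring scale, the 0.5 cutoff for ‘unsatisfactory’, and all of the names in the code.

```python
# Weights are the ones cited in this post. The 0-3 scale and the cutoff
# below are ASSUMPTIONS for illustration, not values quoted from the form.
WEIGHTS = {
    "classroom_observation": 0.50,
    "building_level_pssa": 0.15,
    "teacher_specific_pssa": 0.15,
    "student_learning_objectives": 0.20,
}
ASSUMED_UNSAT_CUTOFF = 0.5  # hypothetical: composites below this are 'unsatisfactory'


def composite_score(scores):
    """Weighted average of the four dimension scores (each assumed to be 0-3)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def final_rating(scores):
    """Collapse the composite into the two final ratings."""
    if composite_score(scores) < ASSUMED_UNSAT_CUTOFF:
        return "unsatisfactory"
    return "satisfactory"


# Hypothetical teacher: zero on both PSSA measures, a low 1 everywhere else.
failing_pssa = {
    "classroom_observation": 1,
    "building_level_pssa": 0,
    "teacher_specific_pssa": 0,
    "student_learning_objectives": 1,
}

# Same hypothetical teacher, but with top marks on both PSSA measures.
excellent_pssa = dict(failing_pssa, building_level_pssa=3, teacher_specific_pssa=3)

print(final_rating(failing_pssa))
print(final_rating(excellent_pssa))
```

Under these assumed numbers, both hypothetical teachers end up in the same final box: the PSSA components move the composite, but because the other 70% of the weight sits elsewhere, they cannot by themselves pull it below the cutoff.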

In a school district like UCFSD, where building-level School Performance Profile (SPP) results are exemplary almost every year, it is even more difficult for a teacher to receive an “unsat” rating.  Mathematically, our teachers could receive “failing” ratings in PSSAs as well as 5 of 6 other dimensions, and still achieve a very-low-but-still-satisfactory rating:

[Image: PDE Teacher Evaluation]

Lest we focus only on the negative situation,  it should by now also be obvious that teachers whose students excel on the PSSA will only receive the same “satisfactory” ratings as other teachers. Although there are three gradations of “satisfactory” within the evaluation system (needs improvement, proficient, and distinguished), each of these ratings translates to the same single ‘satisfactory’ box at the bottom of the performance review.  So great student PSSA results can lead to higher underlying performance ratings, but do not change the final rating.

I will note again that I am merely describing how the evaluation system functions, not offering any judgment of whether this is an effective or ineffective way to evaluate our teachers.  That is a complex discussion for another day.

But I think the way the evaluation system functions today is clear:   there is a very, very low probability of the PSSA affecting any teacher’s annual performance evaluation, positively or negatively.

Is Pay at Stake?

But let’s assume (contrary to the mathematical realities of the teacher evaluation system) that somehow the PSSA is the deciding factor for a teacher, pushing his or her rating down into the “unsat” range.  What are the consequences of receiving a low rating?

Any compensation consequences are governed by local Collective Bargaining Agreements (CBAs).  And any employment consequences are governed by the state teacher tenure system.

[Image: Pay for Performance]

As far as pay is concerned, there are no consequences for an “unsat” rating.  Performance is not a dimension of teacher compensation under almost all CBAs, including UCFSD’s.  As stipulated in the collective bargaining agreement, salaries grow with years of service and as advanced degrees and graduate school credits are earned.   Performance ratings, whether satisfactory or unsatisfactory, do not change pay progression through the salary scale.  So there is no impact to pay from an “unsatisfactory” rating.  Any contractual salary increases due to ‘steps’ or ‘prep’ are provided to “sat” and “unsat” teachers alike.

Is Employment at Stake?

However, an “unsat” rating can have consequences for continued employment.  Under the tenure system in Pennsylvania, a teacher may be dismissed after receiving two consecutive “unsat” ratings at least four months apart.  By law, though, the second unsatisfactory rating must be based on classroom observations only (not the teacher evaluation form and not student test results) and must be supplemented by a detailed narrative (source).  So test results cannot be used to derive the second “unsat” rating.   Student test results could therefore theoretically start the ball rolling toward dismissal, but they cannot be the grounds for the dismissal itself.

On the other end of the performance spectrum, great PSSA results bring a teacher few (if any)  positive consequences.  Unlike in the private sector, high performance does not result in faster promotion, extra pay increases, or performance bonuses.  And there is no prestige benefit, as results are not shared with the public.

In the private sector, performance ratings are sometimes used during downsizings and layoffs to determine whom to lay off first.   Is there a risk of PSSA results being used as the deciding factor to either keep or lay off a teacher during a budget crisis?   No. Pennsylvania is one of nine U.S. states that use a “last-hired, first-fired” seniority-based layoff system.  Years of service is the only factor considered, not job performance (on PSSAs or on any other dimension).

Is Reputation at Stake?

Teacher PSSA results are not available to the public, so there is no reputational risk to individual teachers from test scores.   Of course the administration and principals do see the results, and do discuss these results with teachers.   From what I hear anecdotally, most teachers take this feedback seriously and in the spirit in which it is intended:  helping professionals grow to better serve students.   Is this a “consequence” of testing?   In my view this kind of performance feedback is routine across almost all modern organizations, and it does not rise to the level of a significant “consequence”.

Student test results and a teacher’s own PVAAS score can, of course, have an emotional impact on a teacher.  Those who receive higher PVAAS scores may rightfully take pride and encouragement from those results.  And some with lower scores may feel discouraged, or (with some justification) take issue with the accuracy of the result.  But these are personal reflections conducted in private.   Teachers get feedback all of the time, from students, parents, administrators, peers, and from their own inner voice as they self-evaluate.  One’s own PVAAS report is another piece of feedback.

It is often argued that the reputation of teachers as a whole is at stake when entire schools are rated under NCLB, based on test results.  For example, if a school is rated as ‘failing’, how does that make the individual teachers at that school feel?  While I don’t see the label itself as a high-stakes consequence, I do agree that there is a reputational consequence felt by individual teachers (and especially administrators) when the label is applied to an entire school.  On the flip side, teachers in high-rated school districts like UCF enjoy the benefits of the school’s high reputation, which is evidenced through PSSA (and other) metrics.


So for teachers in Pennsylvania, the stakes of PSSAs seem fairly low.  Student test results almost never impact individual teacher performance ratings.  Test results do not (by themselves) affect one’s employment.  Test results never affect one’s pay.  And the results do not affect one’s individual reputation, although the reputation of groups of teachers can be impacted by the overall results achieved in specific grades, schools, and districts.

The stakes of PSSAs and NCLB are (by design) greater than under the prior system.  Nevertheless, the stakes for educators are still fairly low.  This is especially true in UCFSD, where high student test scores remove the little downside risk that is present under the Teacher Accountability System.