Testing season is once again upon us. Over the next two weeks, thousands of Tennessee students will take their TCAP and End of Course exams, which will then be used to calculate their actual vs. expected growth using models created at the University of Tennessee by William Sanders. These models predict where students should end up based on past growth and other factors such as income, ethnicity, and poverty rate. I'll be running a few pieces over the next couple of weeks looking at testing policies and providing some insights related to data and policy. This post in particular looks at the power of individual variables to predict student test scores.
In general I support the construction and use of growth models like TVAAS. They are not perfect, but when constructed well they give us a reasonable predictive measure to compare against actual student growth. They should not be taken as the entire story, but rather as one of many snapshots that can be used to evaluate student, teacher, school, and district performance.
That said, many people think that it's foolish to look at holistic measures like TVAAS because they believe the vast majority of test scores can be explained by one or two individual variables. While I can't recreate the entire TVAAS system, I thought it would be interesting to do some simple analysis on the predictive power of some often-cited culprits of low educational achievement here in Tennessee. I was able to find information on the following five for almost all districts in Tennessee:
- Percent of the population classified as economically disadvantaged
- Percent of the population below the poverty line
- Per pupil expenditures
- Student teacher ratio
- Household income
Fortunately, the state provides a large database of all past TVAAS scores for districts. Using it, I pulled the 3-year-average student growth data for every district in the state, averaged across all subjects (EOC only, not TCAP). I then graphed comprehensive testing growth against each individual variable and inserted a trend line. The results are shown below, followed by some brief observations.
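For anyone who wants to try this at home, the analysis is simple enough to sketch in a few lines of Python. To be clear, the file name and column names below are hypothetical stand-ins for whatever you export from the state's database, not the actual names:

```python
# Minimal sketch of the graphing approach. The CSV and its column names are
# hypothetical stand-ins for the data pulled from the state's TVAAS database.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("tn_district_data.csv")  # one row per district

variables = ["pct_econ_disadvantaged", "poverty_rate", "per_pupil_spending",
             "student_teacher_ratio", "household_income"]

for var in variables:
    x, y = df[var], df["growth_3yr_avg"]
    slope, intercept = np.polyfit(x, y, 1)  # least-squares trend line

    plt.figure()
    plt.scatter(x, y)                   # one point per district
    plt.plot(x, slope * x + intercept)  # the inserted trend line
    plt.xlabel(var)
    plt.ylabel("3-year average growth (EOC, all subjects)")
    plt.title(f"{var}: slope = {slope:.4f} growth points per unit")
    plt.show()
```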
Here are my five key takeaways from this little experiment:
Every Category Contains Large Variances - The first thing you should see is a HUGE amount of variation regardless of the measure used. Districts at the same level on each graph can vary by as much as 15-20 growth points. This suggests that no single variable, whether poverty, income, class size, or education spending, should be considered the "silver bullet." If there were one, we would see the data conform to a much more linear trend line.
Measures of Economic Success Have Some Predictive Power, but They Aren't the Only Culprit - All three graphs (income, poverty, and percent economically disadvantaged) suggest that the economic success of a district has some power to predict student growth. For example, an extra $100 of household income in a district is predicted to raise growth by 0.01 points, while a 1% increase in the poverty rate is predicted to reduce growth by 0.23 points on average.
That said, the impact is small in all three cases and considerable variation exists, suggesting that poverty alone is NOT the overwhelming predictor of educational outcomes that critics of reformist policies often believe it to be. We can't simply say "it's poverty" when the data clearly show wide variation even among districts at the same poverty level.
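To put a rough number on that variation, you can bin districts by poverty rate and look at the spread of growth scores within each bin. A quick sketch, using the same hypothetical file and column names as above:

```python
# How much do growth scores vary among districts with similar poverty rates?
# Bin poverty into 5-point bands and report the growth range inside each band.
import pandas as pd

df = pd.read_csv("tn_district_data.csv")  # hypothetical file, as above

bands = pd.cut(df["poverty_rate"], bins=range(0, 45, 5))
spread = df.groupby(bands, observed=True)["growth_3yr_avg"].agg(["min", "max"])
spread["range"] = spread["max"] - spread["min"]
print(spread)  # in my data, single bands span something like 15-20 points
```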
Increased Student/Teacher Ratios Actually RAISE Scores - I found this one VERY interesting, given that student/teacher ratio and its cousin, class size, are often cited as deterministic policies by reformers and anti-reformers alike. I found the student/teacher ratio for each district online by dividing students enrolled by the number of teachers (not a perfect measure, but it's the best I can do) and then graphed the ratios against growth rates.
What we find is that every extra student per teacher is predicted to increase growth by 0.26 points. I plan to do some additional digging on this one to see what the research suggests about class sizes, but it should be noted that no district reported a student/teacher ratio above 22, so perhaps they simply hadn't reached the threshold at which growth starts to drop.
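For what it's worth, here's the ratio calculation and trend-line slope described above, sketched in the same hypothetical terms:

```python
# Sketch of the student/teacher ratio calculation. "enrollment" and
# "teachers" are hypothetical column names for the figures I found online.
import numpy as np
import pandas as pd

df = pd.read_csv("tn_district_data.csv")  # hypothetical file, as above

df["student_teacher_ratio"] = df["enrollment"] / df["teachers"]
slope, _ = np.polyfit(df["student_teacher_ratio"], df["growth_3yr_avg"], 1)
print(f"predicted growth change per extra student: {slope:+.2f} points")
# In my data this came out to roughly +0.26.
```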
Expenditures Per Pupil Do Count - Every extra dollar spent is predicted to increase growth by 0.0006 points. That might not seem like much, but when you extrapolate it into the thousands of dollars, a $1,000 increase in expenditures is projected to increase student growth by 0.6 points, and a $3,000 difference means a predicted 1.8-point difference in growth. Again, though, we see wide variation even between districts with similar spending levels, so it's difficult to draw any definitive conclusions.
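The extrapolation is nothing fancier than the fitted slope times the dollar difference:

```python
# Applying the fitted expenditure slope (~0.0006 growth points per dollar
# per pupil) to the kinds of spending gaps we actually see between districts.
slope_per_dollar = 0.0006

for delta in (1000, 3000):
    points = slope_per_dollar * delta
    print(f"${delta:,} more per pupil -> {points:.1f} predicted growth points")
```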
No Singular Culprit Exists - So what can we take away from this? Hopefully this data makes the point that there is no one clear culprit for poor performance in our school districts. All of these factors play a role in improving our schools, and we are best served by a comprehensive plan that addresses each of them. It also suggests to me that we should be wary whenever anyone claims to be able to tell you the silver bullet for fixing education.
Please also feel free to post in the comments if you would like to see any other variables graphed against student growth scores! I picked these five because they are easy to find and often cited as crucial players in educational inequity, but I'm willing to take a look at others as well. Ones that I would have loved to run but couldn't find data for are: growth vs. teacher evaluation scores, growth vs. percent of kids enrolled in PreK, and growth vs. charter school concentration.
[Update 4/28/14, 4:00 PM: Thanks to Nashville Native for pointing out that I didn't include my data sources. Here are my sources for each individual measure:]
Percent economically disadvantaged
Student/teacher ratio, poverty rate, and 3-year growth averages can all be found for multiple years at the State of Tennessee's website on this page. I pulled the data several months ago, and when I tried again just now the links weren't working, unfortunately, but you can find it all here.
You can find similar growth data in this portion of the Department of Education's website (not in Excel format, unfortunately).
Follow Bluff City Education on Twitter @bluffcityed and look for the hashtags #iteachiam and #TNedu to find more of our stories. Please also like our page on Facebook.
Since most people who are calling for lower class sizes are calling for them in the early grades (K-2), your graph isn't particularly relevant. I would love class sizes of 15 in the early grades, and around 25 or 30 in the high school grades. Since your data can't distinguish between the ratios in early versus later grades, it's not useful. Look at the research performed by professional researchers, such as the Tennessee STAR study, to see whether lower class sizes increase student achievement in the lower grades (they do). https://www.princeton.edu/futureofchildren/publications/docs/05_02_08.pdf
My graph says nothing about class sizes in the earlier grades and is only relevant to later grades. I believe I make it clear early in the piece that I use only high school data.
Well, I’m glad I was able to make it a little clearer for your readers.
I do think it's interesting to note that my ratios cover the entire district, K-12. The main conclusion I think we can draw is that in the long run, a blanket policy of simply hiring more teachers isn't going to be enough in itself to bridge the gap between low- and high-performing schools.
I also find it interesting that the only way we can recognize the impact made by smaller class size is through the early grade standardized tests you seem to despise. Maybe tests in the early grades have a role to play after all…
Yes, non-high stakes tests have a role to play. I am against using the results in a high stakes manner. I hope that clarifies my position for you (I have told you this before…).
So in theory you’d be ok with a yearly test in K-2 as long as the results weren’t used in a high stakes manner?
No, I wouldn’t. Would you like to know why?
Up to you.
Also, poverty level IS the huge predictor of educational outcomes when you look at achievement, not growth. Achievement is the end goal, isn't it? So that is what we should focus on in this particular discussion, not growth.
Achievement should be the long-term goal, but in the short run I'm more interested in growth because I think it represents a fairer measure of the value a teacher, school, or district has added to students. Looking at achievement on an absolute scale may not capture the full impact of what has been added to students during the course of a year. This is why I chose to focus on growth.
Check this out: https://edpolicy.stanford.edu/sites/default/files/publications/getting-teacher-evaluation-right-challenge-policy-makers.pdf Highly relevant to our discussion.
From that article: “The research base is currently insufficient to support the use of VAM for high-stakes decisions about individual teachers or schools”
Page 5: "For all of these reasons, most researchers have concluded that VAM is not appropriate as a primary measure for evaluating individual teachers"
Emphasis here on "primary." They don't discount using it in some capacity, which is why I specify that VAM is a snapshot and should be one of many measures, along with classroom observations, measures of professional development, student surveys, etc. Personally, I don't think it should count for more than 33% at the maximum, and in all fairness it should be lower. Additionally, teachers in non-tested subjects shouldn't have to take on school-wide data; we need to come up with alternative methods, like the portfolio system for art teachers created here in Memphis.
“In general, such measures should be used only in a low-stakes fashion when they are part of an integrated analysis of what the teacher is doing and who is being taught.”
This report, written by experts in statistics, does not support your position. In fact, these experts are firmly against what you just suggested.
If we're going to trade quotes from studies, I think this one from the RAND study best states my position (found here: http://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf):
"It is not clear that VAM estimates would be more harmful than alternative methods currently being used for test-based accountability." They do recommend further research to identify alternatives to VAM, but right now it's the best direct measure we have of teacher impact on student educational outcomes, and it can be used in a limited capacity.
I think the argument is more or less settled on the use of VAM for the evaluation of teachers, schools, or districts. It's junk. Too many scholarly organizations have presented arguments against it. The major red flag, for Tennessee, is that TVAAS has never been available for open scrutiny and peer review, as it's a proprietary measure that costs the state $1.7 million a year.
Thanks for the comments. While I would argue that VAM is not junk, I don't dispute that it has its flaws. I think this section from a Brookings article supporting the use of VAM data in limited amounts states my position well:
“We believe that whenever human resource actions are based on evaluations of teachers they will benefit from incorporating all the best available information, which includes value-added measures…It [VAM] is not a perfect system of measurement, but it can complement observational measures, parent feedback, and personal reflections on teaching far better than any available alternative. ”
That said, I would like to see the State of Tennessee open up the formula to scrutiny, if for no other reason than that I want to see the variables they use and the weights they assign to each one.
http://www.brookings.edu/research/reports/2010/11/17-evaluating-teachers
What I hear is, "It's not great, but it's better than nothing and better than anything we've seen thus far." Which, to me, is not a solid argument for keeping something that has been proven to be unreliable and inconsistent, and is thus invalid and therefore junk. Conversely, here are several sources for you, including NEPC at Colorado, Stanford, the NAE, UCLA, the American Statistical Association, and numerous blogs.
http://bitly.com/bundles/rdsathene/9
The NAE link is bad; here it is:
https://www.yumpu.com/en/document/view/1760302/background-paper-getting-teacher-evaluation-right
I found this while looking at a "follow the money" blog post concerning Teach Plus, so I didn't compose the list.
http://www.schoolsmatter.info/2014/04/the-gates-foundations-teach-plus-once.html
Also, you need a source for your data.
Ah, I did forget to include that, didn't I? I've added links to the sites where I pulled the data in an update at the bottom of the post.