What’s at stake when we use gender-biased course evaluations

Women bear the brunt of structural discrimination and decreased upward mobility

At U.S. colleges and universities, professors’ upward mobility, teaching success, and student learning are sometimes judged by a metric that does not actually measure teaching. Course evaluations like the UW’s are gender biased, yet they persist even though they determine so much, explained Dr. Mary Pat Wenderoth, a principal lecturer in the department of biology. Compounded by other factors in academia, this system hurts female professors the most.

An article in ScienceOpen Research, “Student evaluations of teaching (mostly) do not measure teaching effectiveness,” found that student evaluations of teaching (SETs) were biased against women “in an amount that is large and statistically significant,” and that this bias was strong enough to result in less effective instructors being rated higher than more effective ones. The system favors male professors based on perceived ability rather than on concrete evidence of teaching quality.

The same problem can reasonably be extrapolated to teachers of color, non-heteronormative teachers, and teachers with physical disabilities. The lack of standardization in the system is a serious obstacle to the upward mobility and representation of women in academia.

Studies suggest that students in general are inaccurate evaluators. A meta-analysis by researchers at Canada’s Mount Royal University, which examined the psychology behind course evaluations, found almost no correlation between how much students learned and how highly they rated their instructors’ teaching. The study provides strong evidence of the inaccuracy of SETs and challenges the belief that the highest-rated professors teach best; instead, it suggests that SETs amount to a “popularity contest.”

“All learners … are a very poor judge of [their] learning,” Wenderoth said. “In fact, asking students about their learning is kind of silly, because they are not the best judges of that. I think they are very good at knowing if I’ve made the classroom safe, if I’ve created a community of learners, that I’ve encouraged them to become independent learners.”

The findings of this meta-analysis are reflected on sites like RateMyProfessor. Most of the feedback on course evaluations and on sites like RateMyProfessor, in effect, judges only how students feel. Professors described as “caring,” “humorous,” and “easy graders” receive the highest ratings, which skews reviews toward factors unrelated to whether teaching is effective.

“If you want to get a good rating on RateMyProfessor, it really helps to be white and tall and male and rich and hetero and speak with a loud voice; there are a lot of other things that help, but that’s a really good way to get a higher rating,” Dr. Ben Wiggins, manager of program operations in the UW biology department, said.

Wiggins pointed out that using RateMyProfessor is neither good nor bad; the site still serves a valid purpose. Like course evaluations, its ratings capture only opinion, but “[students] use the available data to get where they want to go,” Wiggins said. Students can find some useful information there, such as other students’ comments and a sense of the class environment. However, Wiggins noted that the ratings are heavily skewed in favor of easy, unchallenging classes. Other faculty echoed this, explaining that it is reasonable to use bad data when it is the only data available.

“I’m sure I would [use RateMyProfessor], in the same way that I checked out [student comments at] the tables outside the library; as long as you take it as one data point … and [understand] that the data are outliers,” Colette Moore, an associate professor in the department of English, said.

While RateMyProfessor and course evaluations occupy a gray area in teacher evaluation, it is important to note the consequences of their use. Wenderoth and Wiggins both explained how much weight SETs carry in teacher promotion, review, and evaluation at the UW and other universities. At some universities they may be the only form of teacher evaluation. The real problem of bias arises from the combination of the power SETs hold and the fact that they are not an objective measure.

Wenderoth explained the promotion process, which is not specific to the UW. Teachers must submit their SETs, which, along with a teaching statement, a résumé, and other relevant material in their promotion packet, are reviewed by faculty of the next-highest rank and by an outside board of peers.

Bias in any of these components, especially course evaluations, can do significant damage because departments use them to hire and fire teachers, effectively passing or failing them, Wenderoth explained. Given that course evaluations favor white male teachers, these forms, which may seem innocuous to students, carry far more weight than students may realize.

Course evaluations create roadblocks that keep female instructors from reaching higher-level teaching positions. According to Wenderoth, SETs are only one piece of the larger problem of structural gender bias in education. At most universities, the tenure-track ranks, from highest to lowest, are full professor, associate professor, and assistant professor. Among non-tenure-track faculty, the ranks are principal lecturer, senior lecturer, and lecturer, in descending order of seniority.

These senior positions are held mostly by white men, meaning they are the people who ultimately decide who is promoted and who isn’t. Wenderoth explained that just to reach the level of full professor, female teachers have to overcome the biases in their course evaluations as well as the biases, both implicit and explicit, of their senior colleagues.

“For the longest time we only had a couple full professors who were female —  during the seven years [of the hiring process] female professors have to work their butts off to get the publications, to get the grants,” Wenderoth said.

These hurdles, such as winning grants and fellowships and publishing in journals and elsewhere, show the breadth and difficulty of the work that comes with being a professor. Each candidate’s information, including the reviewee’s gender, is available to the reviewing committee. Once all of the factors working against female teachers are taken into account, the real problem with SETs is one of equity. The many hurdles in place help explain why women are underrepresented in academia. The questions of who wins and who loses under an inaccurate and unfair system, and of what good teaching actually is, are not mutually exclusive.

The current evaluation system has been in place for so long that it is easy to keep using it. According to the ScienceOpen Research study, SETs are more “sensitive to students’ gender bias and grade expectations” than to teaching effectiveness, yet they remain in use because they are easy.

According to Wenderoth, creating a more objective system with standardized ratings requires time, energy, and a more comprehensive analysis of what is taught. She sees creating an assessment rubric, establishing a system for evaluating tests and take-home work, and then processing all of this information as steps in the right direction. The data and methods needed for evidence-based evaluation are available, and implementing them is within universities’ reach.

“But, all of these things take money,” Wenderoth said. “Somebody has to do the classroom observations, somebody has to look at the exams and rate them, and train people to rate them, and then run them through the computer; it’s not a huge task but it’s not a minimal one either.”

Reach contributing writer Thelonious Goerz at

