Bad Research Methods or Just Bad Science?

March 3, 2016; Washington Post and Harvard Gazette

Two Harvard professors recently announced the release of a report that aims to debunk a widely publicized study questioning the replicability of published psychological studies.

The Open Science Collaboration (OSC) released a multi-year study in 2015 that reportedly had a significant impact on the field. According to the Harvard Gazette, the publication led scientific journals to change their policies and grantmakers to shift their funding priorities. The story created “sensational headlines worldwide about the ‘replication crisis’ in psychology” and was Science magazine’s “No. 3 Breakthrough of the Year.”

The OSC is supported by the Center for Open Science (COS), a nonprofit organization focused on increasing transparency in research. According to its website, its services include meta-science, which serves as a platform to demonstrate reproducible research methods. The COS’s meta-science initiative is interesting in itself. For the psychology replication study, 270 expert contributors agreed to participate in what looks a lot like crowdsourcing for science. The OSC described the study as the first in-depth exploration of its kind.

The OSC report drew headlines when fewer than 40 of the 100 published studies, pulled from three psychology journals, were successfully reproduced. However, after reviewing the OSC study closely, professors from Harvard, along with a professor from the University of Virginia, raised serious concerns about the OSC’s research methods.

Probably most alarming was the replication of a study originally conducted at Stanford. The study evaluated the reaction of white students as they watched a video of a black student and three white students discussing admission policies. During the video, one of the white students began to make offensive comments about affirmative action. Researchers found that the observing students looked longer at the black student when they believed he could hear the comments versus when the comments seemed to only be heard by the white students.

The Stanford study was one of the 100 published studies chosen by the OSC for replication, but the OSC team conducted its replication in Amsterdam. Daniel Gilbert, who is the Edgar Pierce Professor of Psychology at Harvard and one of the professors who critiqued the OSC study, said, “They had Dutch students watch a video of Stanford students speaking in English about affirmative action policies at a university 5,000 miles away.” In other words, the students being studied were listening to students in another country and being tested on a topic that is not relevant to their own country.

The Harvard team averred that other studies were also not replicated precisely and that the OSC did not factor in the additional statistical error needed to account for that variance. The team found that when a replication strayed from the original methods, the failure rate was higher. Gary King, the Albert J. Weatherhead III University Professor at Harvard, who was also one of the professors reviewing the study, said that when the OSC made its calculations, “they failed to consider that their studies were quite different.”

The team believes that the replications would have succeeded had they been conducted correctly, or had the OSC team accounted for this statistical error. This was one of several procedural concerns the team raised, including the failure to obtain a true random sample.

On Friday, the journal Science published a response to the Harvard report, saying that the professors’ belief that the reproducibility of the studies is actually high is likely very optimistic and “limited by statistical misconceptions.” The response goes on to say that neither optimistic nor pessimistic conclusions about reproducibility “are possible” or “yet warranted.” Apparently, it’s still unclear whether the problem is that journals are too quickly publicizing bad science or that we need to improve methods of replication to better test our studies.

Ironically, the OSC’s past research has illustrated that different analysts can evaluate the same data and come to different conclusions, which seems to be the case in the psychology replication debate. In effect, experts are challenging the work of experts who challenged their work. The field of psychology cannot afford misguided or unwarranted accusations based on faulty research, but if done correctly, this kind of exchange seems to be a healthy mix of debate, collaboration, and testing of models. Ideally, those involved will learn from the successes and apply the lessons learned.—Michelle Lemming