In Part One of this essay, we explored the basics of the
Scientific Method and demonstrated how various scientific fields are constantly
changing their notion of settled science as theories and hypotheses are refined
and technology for testing those hypotheses improves. In Part Two, we now move
on to examine the ways in which errors, miscalculations and deliberate
deceptions contribute to upsetting settled science.
* * *
Scientists are human, too
Yes, scientists are human, with all of the faults and
foibles that implies. The example of the "Bone Wars" between
paleontologists Cope and Marsh should tell us that. While popular culture
prefers to paint Galileo as persecuted by the Church for his science -- and, in popular fiction, even founding a counter-religious Illuminati of scientists -- careful
study of history reveals that Galileo was not "persecuted" for his
beliefs, but rather was sanctioned by Rome for his personal actions in
defiance of an order from the Church of which he was a member. We certainly have plenty
of parallels today in which it is easy to point to scientists whose behavior
casts a shadow on their own work. Of course, there are a few factors which tend
to assist the process of self-destruction.
The problem of "Publish-or-Perish"
The essential currency of an academic scientist consists of
two items: how many papers they publish, and how well funded their research is.
While many scientists would love to have a job where all they needed to do was
conduct experiments with no obligation to fight for promotion and funding, the
simple truth is that any job must be evaluated by some form of a performance
metric. Within most scientific jobs, that metric consists of having other
scientists evaluate your work and pronounce that it is good. Typically, this
consists of writing up results (and conclusions), submitting them to a
scientific journal, obtaining a favorable review by peers, and then having that
paper published in a journal where others can read it. Most of the evaluation
of "worth" comes from the peer-review process (and more on that
later), since, once published, any confirmation or refutation of the
experimental results must take the form of letters to the editor, or new papers
which agree or disagree with the published results. Letters to the editor are
in fact very rare in science -- not that they do not exist, but the number of
letters compared to the number of published papers is very small (not
all journals accept letters, and even then, there may be 1-2 per issue, while
the number of new papers is often 20-50 per issue).
Studies which produce results and conclusions counter to
those already published must overcome the prior results in both numbers (how
many published papers cite the result or the refutation) and the "Impact
Factor" of the journal in which the study appears. Much as certain
newspapers have reputations based on circulation and the type and number of
articles they print, scientific journals have a ranking system based on
a ratio of citations to articles: roughly, the number of times a journal's
recent articles are cited in a year, divided by the number of articles it
published over that period. Thus, it is not just how many
articles are published, but how many are read and subsequently cited by other
authors (in other papers and journals). This ratio gives a sense of the
relative impact that a journal has compared to others in its field. Thus an
article in Science or Nature has 2-5 times the impact factor of an article in
Journal of Neuroscience, and 10-20 times the impact factor of an article in an
open-access, open-review journal such as Frontiers in Neural Science.
Countervailing research published in a lower-impact journal faces a
battle of King of the Hill, and requires either repeated publication or getting
the countervailing results into a similarly high-impact journal.
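As a rough sketch of how the standard two-year impact factor is computed (the exact citation window and counting rules vary among indices), the impact factor of a journal for year Y is:

\mathrm{IF}_Y = \frac{\text{citations received in year } Y \text{ to articles from years } Y{-}1 \text{ and } Y{-}2}{\text{number of articles published in years } Y{-}1 \text{ and } Y{-}2}

For example, a journal whose articles from the previous two years were cited 1,000 times this year, out of 250 articles published in those two years, would have an impact factor of 1000 / 250 = 4.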
Scientists are thus rated on the number of papers they
publish, the impact factor of the journals in which they publish, and
indirectly (via impact factor) how often their research is cited. At research
foundations, corporate laboratories and government agencies, publication is the
main tool used to assess the productivity of a scientist. In academia, there is
one other factor: funding. Within colleges, universities and medical schools,
some or all of a professor's salary may be "hard money" (paid by the
institution) and some "soft money" (paid from research grants). In
fact, almost all research salaries below the Professor level are paid largely
from research funds, and the greater the scope of a research project, the more
funding sources are required to support it. Research grants go
through a peer review process similar to that for research publications; therefore the
number of funded grants is also used as an evaluation tool for salaries and promotion
of academic scientists.
Thus we come to the publish or perish dictum. The
requirement for published papers and grants varies from place to place and also
depends on the scientist's status: full professors with tenure need not worry
as much as assistant professors without tenure. I have seen one recommendation
that an academic scientist should publish one paper per year per person in the
lab. Thus, a lab consisting of the investigator, one post-doc, two students and
a full-time technician would want to publish 5 papers (or 4 papers plus one
new, funded research grant) per year, with more credit given for
high-impact articles. Personally, I feel that a lab which tries to publish so
much runs a grave risk of errors, although it is true that larger labs have
more opportunity to publish than small labs, so perhaps some variation on the
rule is appropriate. My own preference has long been for one or two moderately
placed (in terms of impact factor) papers per year, plus presentations at two
scientific meetings a year. Still, given that it takes several weeks of writing
just to produce one paper, plus time for the reviews and revision, publish or
perish soon involves more time writing than researching, and all too often,
rushing results into print before they are fully analyzed.
Accidents occur, and scientists are not immune from them.
Hopefully, errors are caught in the review process; it has certainly happened
to me, and I've caught many errors as a reviewer. Too much pressure to publish
too often (or simply rushing the process) can lead to errors that must later be
corrected, either through a published retraction, or simply by other labs
finding and reporting differing results. No scientist truly wants to get a
result published, and then find out later that the results were not valid due
to a decimal point error in statistics... except when the errors are
deliberate...
Lies, Damned Lies, and Statistics!
Since scientists are human, there is always the chance that
one of them (us!) will deliberately manipulate data and or results --
particularly through the use of statistics. Those readers with a background in
statistics will know that the primary use of statistical tests is to determine
whether two sets of observations are different when it is difficult to determine
by other means. Certainly, some scientific results can be clearly determined
without statistics. For example: "Roses are red, violets are blue..."
embodies an observation with a clear difference between the two cases. Ah, but
it is not always so simple. Violets are, in fact, a shade of purple, although
statistically speaking, if one were to measure the color hues of violets, the
distribution might be shown to overlap with blue. As scientists, we would phrase the
statistical question as follows: Does the color variation of the population of
violets include the color blue? We further qualify the question: is it
likely that 95% of all violets in existence (also known as the
"population" of violets) would include the color blue? This,
then, is what is known as a "P < 0.05", or a probability limit of 5%. In other words, does the presence of the color blue fall within 95% of the population of violets, or among the "outliers" comprising the other 5%? Statistical comparison provides a means of answering the question of whether two conditions can produce the same result given the normal, random variability of natural systems.
Now of course, when we get to roses, the situation is very
different. While violets, by definition, have a fairly limited color palette,
roses have a very broad palette -- from white to black, and nearly all colors
in between. Thus in answer to the question: "Are roses red?" we can
say yes. Likewise with: "Are violets blue?" "Yes." Now we
can also look at the additional question: "Are violets red?" and, with
no more than a 5% chance of being wrong, answer "No, violets are not red (P < 0.05)." In other words, we accept the hypothesis: Violets are not red. But, "Are roses blue?" Here we have a problem, because the 95% population of rose colors does include the color blue, and we reject the hypothesis: Roses are not blue.
Thus we come to the crux of the problem -- even without
deliberate malfeasance, it is all too easy to misrepresent the results of
scientific experiments when the statistical tests give ambiguous results. From
a scientific standpoint, Roses are red (among other colors) and violets are
blue (to within a couple of shades); in addition, violets are not red (P < 0.05), but roses are not "not-blue." Note that these "statistical" results are dependent on how thoroughly the scientist samples the populations for their test. If we sampled only American Beauty roses, then indeed, roses would be red and not blue, and our statistical confirmation would be valid, but only for the population of flowers we sampled.
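To make that last point concrete, here is a minimal sketch in Python; the hue numbers are hypothetical values invented for illustration, not real flower measurements. A broad sample of rose hues places blue inside the central 95% of the distribution, while a sample restricted to red (American Beauty) roses does not.

# A minimal sketch with hypothetical hue numbers (invented for illustration,
# not real flower measurements), showing how the sampling choice drives the
# statistical conclusion.
import numpy as np

rng = np.random.default_rng(42)
BLUE_HUE = 240  # hue angle in degrees on a simplified color wheel; red is near 0

# Broad, representative sample: roses come in nearly every hue.
all_roses = rng.uniform(0, 360, size=10_000)

# Cherry-picked sample: only deep-red American Beauty roses.
red_roses_only = rng.normal(loc=20, scale=5, size=100)

def blue_within_95_percent(hues, blue=BLUE_HUE):
    """Check whether the blue hue falls inside the central 95% of the sample."""
    low, high = np.percentile(hues, [2.5, 97.5])
    return low <= blue <= high

print("Broad sample includes blue?        ", blue_within_95_percent(all_roses))       # True
print("Cherry-picked sample includes blue?", blue_within_95_percent(red_roses_only))  # False

The test itself is identical in both cases; only the sampling changed, which is exactly how carefully selected data can "confirm" a predetermined answer.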
A Lack of (Statistical) Power
A paper on this very issue of statistical misuse entitled:
"Power failure: why small sample size undermines the reliability of
neuroscience," by Katherine S. Button et al. appeared in the April 10,
2013 issue of Nature Reviews Neuroscience, a quite respectable journal
featuring reviews in the field of neuroscience. The report reviewed the
statistical tests reported in Neuroscience research papers from 2011, and
concluded that in the data they sampled, the statistical tests from those
studies were very likely either to accept a hypothesis as true -- when it was
not -- or to miss confirming a true hypothesis.
The Nature Reviews article started with a literature search
for meta-analyses in Neuroscience published in 2011, and found 246 articles
that used "meta-analyses" -- essentially combining the data from many
prior papers and re-analyzing those larger data sets for observations that
cannot be seen in small data sets. A meta-analysis thus looks at data gathered
and reported across many primary publications -- original reports from a single
lab. The authors then sorted through those articles for ones that provided
enough information on the original data to allow calculation of statistical
power, which was possible in about 40 of those papers. A power calculation uses the mean, or average,
of a population and a measure of its variability to determine how large a
difference can be reliably detected given a limited sample size. In essence, it
is a way of predicting whether a statistical test is, itself, valid.
Statistical power analysis is the foundation of experimental design, and is the
basis for justifying how many subjects to test and what is considered a
statistically significant result.
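As a minimal sketch of such a power calculation in Python, assuming the statsmodels package is available (the effect size and sample sizes below are illustrative choices, not figures taken from the Button et al. paper):

# A minimal sketch of a statistical power calculation; numbers are illustrative.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# How many subjects per group are needed to detect a "medium" effect
# (Cohen's d = 0.5) with 80% power at the conventional alpha of 0.05?
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Subjects needed per group: {n_per_group:.1f}")  # about 64

# Conversely, if a study enrolls only 10 subjects per group, what power
# does it actually have to detect that same effect?
achieved_power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=10)
print(f"Power with 10 subjects per group: {achieved_power:.2f}")  # about 0.19

Run this way, the calculation shows the trade-off at issue: a study planned for adequate power needs far more subjects than one that simply uses whatever sample is convenient.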
Button et al. concluded that many of the meta-analysis
papers did not have high statistical power, and risked incorrect
interpretations of the statistical comparisons. However -- and this is very
important -- the authors did not apply this conclusion to the field of
Neuroscience as a whole. In short, headlines from April 2013 implying that the
study condemns an entire field of science are false. In perspective, this
article in Nature Reviews Neuroscience sounds a cautionary note regarding the
need for better statistical planning in meta-analysis. What the article does
not do is state that all or even many Neuroscience articles have the same flaw.
In particular, given that this caution applies to making unwarranted
conclusions (and affects our notion of settled science), it again points out
the fact that scientific discovery is an ongoing process, and the very
announcement of settled conclusions sets the research up for scrutiny and
critique. It behooves us all to avoid the temptation of schadenfreude: using the
result outside the scope of the study would commit the exact same error
pointed out in the paper. Meanwhile, there are other factors at work which
point out the good and bad with respect to scientific research, but frankly,
misinterpretation of statistics pales in comparison to deliberate deception.
The vaccine controversy
One of the more famous incidents of scientific malfeasance
involved a study from 1998 which claimed to show a link between the measles vaccine and
autism. The data showing a causal link were taken from autistic children who had
received the "MMR" vaccine to prevent measles. However, when the
study could not be duplicated, investigation revealed that data were taken from
just 12 out of over 200 children available for the study. This procedure is
often called "cherry-picking" and is used to ensure small variability
within a set of data so that any statistical test comes out exactly in the manner
predetermined by the experimenter. On this basis alone, the study was
invalidated and a retraction printed in The Lancet, a high-impact
journal for medical research.
This could have been ruled a mistake, given that
experimenters may "cherry-pick" data from human patients because of the
ethical concerns of withholding a beneficial treatment, thus impairing the
ability to establish strict controls on the study. Under the circumstances, it
may have been necessary to severely limit the study population if there were
various other factors which could have contributed to the effects and confused
the findings. [Incidentally, this accusation is often raised against other
correlative health studies: smoking, diet, cholesterol, gluten, etc.]
Unfortunately, this case was not about simple misuse of statistics, for there
was a deeper thread of malfeasance. The lead author of the study, Dr. Andrew
Wakefield, had filed a patent for an alternative to the MMR vaccine. Furthermore,
a researcher working under Wakefield's supervision reported that there was no
measles virus present (hence no effect of the MMR vaccine) in the children used
for the initial study; while a former graduate student testified in court
proceedings that Wakefield ignored data which did not fit his hypothesis (that
MMR vaccine was linked to autism).
In the aftermath, Wakefield resigned his hospital job in the United Kingdom,
was later censured by the UK General Medical Council, fired from his medical
and research positions, and barred from practicing medicine in the U.K.
He has since moved to the U.S., and despite admitting to the improper study,
is still active in promoting the link between MMR vaccine and autism. In a
strange turnabout on the notion of settled science, Wakefield's supporters
accuse the medical authorities of the U.S. and U.K. of a dogmatic approach and
of failing to acknowledge a link between vaccines and various diseases and
disorders.
If data can be "cherry-picked," statistics can be
misused, and hypotheses incorrectly rejected or confirmed, what are the
protections against scientific malfeasance? What guarantee is there that a
scientific report is valid, even if it goes against the conventional wisdom of
the field? At the same time, how do we tell if a study is making false claims?
The answer is, or should be, peer-review of scientific papers and proposals. A
panel of other scientists reads any submitted paper or grant proposal, reviews
the science for validity -- and recommends acceptance or rejection. At least,
that's the way it is supposed to work, but the process of peer-review has its
faults.
The problems of peer-review
[Disclaimer: I am a "peer-reviewer." I have over
100 scientific primary publications to my credit, and have been asked to review
scientific articles since 1989 and NIH/NSF grant applications since 1998. Thus
my opinion is shaped by 25 years in science participating in, and at the mercy
of, peer review; and my typical workload consists of requests to review about
12-18 papers and 5-10 grants per year. While this is what some in my field might
consider extensive experience with peer-review, it is also fairly limited in
that it is only within my field, and only with respect to research papers and
grants. By the way, my field required me to be the victim (excuse me – the
recipient) of peer-review for many years (7 in my case) before becoming a
reviewer.]
What drives doubts about the effectiveness of peer-review?
Here are some examples:
• Evidence
that the second generation anti-psychotics and antidepressants such as Abilify
are not as effective as they were shown to be in initial research and clinical
trials [The Truth Wears Off: Is there something wrong with the scientific method? by Jonah Lehrer]
• Similar to
the above discussion of statistical power, a 2005 article in a Public Library of
Science (PLoS) journal claiming that 50% of published research findings are false due
to statistical inadequacies [Why Most Published Research Findings Are False by John P. A. Ioannidis]
• An article
stating that pre-publication peer review does not provide any guarantee of
lasting importance of scientific results
[Classical Peer Review: An Empty Gun by Richard Smith]
• A body of
opinion that pre-publication peer review serves only to limit publications to a
level that meets the print capacity of the available scientific journals. The
advent of on-line internet publication eases the space restrictions, so why not
publish everything and let the broader scientific community sort it out?
[Bulk Publishing Keeps PLoS Afloat by Phil Davis]
and
[Open Access 2.0: Access to Scholarly Publications Moves to a New Phase by Joseph J. Esposito]
• The highly
public "ClimateGate" scandal has reportedly shown abuse of
pre-publication peer-review to publish some articles and block others
[Lord Monckton’s summary of Climategate and its issues by Anthony Watts]
While I do acknowledge that there are some merits to the
points addressed above, I don't believe that peer review is broken per se, but I
do agree that the scientific community as a whole needs to police it better.
Violation of public trust by manipulating the system of peer review is an
egregious act. Sadly it is not unusual for a "good-old-boy" network
to operate in science. First there is the very nature of finding the peers to
review the paper. When a paper is submitted to a journal, the authors provide a
list of names of scientists (peers) who have expertise in the field and should
be able to judge the work on its scientific merits. There is often a second
list of persons known or suspected to be biased by virtue of a conflict of
interest. Editors (and funding agencies) are alert to even the appearance of
bias either for or against the authors, but it is often the case that an editor
is unfamiliar with the details of the research and must rely on those
recommendations to choose the peer reviewer. Over the past year I have become
an editor of a journal in my field. It is hard to find enough reviewers willing
to take time out of their research to review papers. Good reviewers get heavy
workloads and many requests simply because they are so good (and available).
Fortunately, most scientists are aware of the appearance of bias, and will be more
critical of their friends than of a complete stranger. I, for one, try to ensure
that someone I know professionally does not get a "pass" on sloppy
science, since it also reflects poorly on me. In addition, recommending only
"friends" as reviewers won't work –- as an editor, I soon discovered
that only one in ten of the recommended reviewers will accept an assignment,
but that those turning it down will recommend someone else; thus, editors work
down the list until they find two-to-five reviewers (depending on the journal.)
One of the problems in peer review is the "not invented
here" syndrome. An article may be very well-written, but rejected by
multiple journals on the basis of "not appropriate (or too complicated)
for the readership of this journal." When reviewed by scientists with
traditional training within a field, such a paper may be subject to highly
critical reviews or unreasonable demands for additional experimentation or controls.
When that same paper is read by cross-disciplinary scientists, it may receive a
much more favorable (or even enthusiastic) reception. When added to the desire
to get a novel finding into print first and lay claim to a result (thus
upsetting settled science), it can be very frustrating to know that an outside
audience would publish it in a heartbeat, while still not getting recognition from
one's peers!
This, however, is where the second and fourth bullet points
above interact. A new model of publishing embodied by the journal PLoS One
(Public Library of Science) is an online publication that does not make value
judgments on the appropriateness of an article, but will subject it to an open
review by 2-5 peers whose names appear with the publication. Typical review is
by 2-3 outside reviewers, plus the editor, and is "blind" in that the
author never knows who reviewed the paper. The philosophy of PLoS One is to let
the scientific community sort it all out post-publication; with unlimited
space, there can be publication of every article that passes basic peer-review;
however, the scientific community will decide for itself what is worth keeping.
This is not an entirely bad approach, but it still has problems: (a) the supply
of reviewers is limited (see above), and (b) once released, there is no good
way to retract a publication later determined to be invalid. Thus, publishing
more while maintaining the peer-review process is not necessarily a winning
game. What if there were a way to reduce the burden of peer review by simply
publishing and letting "society" decide what results are worthwhile?
If all scientific publishing were done on the internet, and anyone wanting to
find a particular result just had to search for it, there would be the issue of
deciding which search results to choose: the most recent, or the one with the
most links or comments? A simple comment count would also not be enough, since
those comments could entail a running argument of the pros and cons of the
scientific paper. If we institute a judgment of worth or a vote on the
acceptability of a scientific paper, we risk turning Science into a popularity
contest. Consider also the Wikipedia model: Should just anybody -- with or
without formal scientific training -- be able to edit our "WikiScience"?
By far, my strongest counter to claims that peer review is
broken and should be replaced (or scrapped) is that if there are no
gatekeepers, then there is no way to weed out the junk science. The continuing
measles/autism scandal is the perfect example of science by public acclaim; if
it had been subject to greater scrutiny, it might never have been published. And
once it was published, it has been damnably difficult to strip it of
credibility among those who choose to believe. Do I think peer-review is broken?
I certainly think it has been warped, which is good for challenging any notion
of settled science, but is simultaneously dangerous in allowing science to be
subject to public whim.
It's a process, not a conclusion
I do not think that any portion of the scientific process
should be scrapped: from hypothesis generation, to statistical analysis, to
peer review. I do think it needs better watchdogs -- and those watchdogs are
the scientists themselves, who must always keep in mind that their job is to
continually renew the process of science, and never "settle" (pun
intended) for the easy answer or the sloppy science. If a scientist witnesses
abuse of the system, they should be able to speak out and not get shut out
because of political whim. When they find truly novel results, or results that
contradict the settled science, they need to be encouraged to publish the
novelty, correct their mistakes, and avoid the trap of thinking that a result
is a conclusion.
At the same time, the public needs to be better educated so
that they do not get told what to do by manipulative media, politicians and, yes,
scientists. I would be all for fully open access to science if the public were
educated enough in the basics to be able to tell what is and is not good
science. Unfortunately, the reality is that some science is so specialized
that only a very few people worldwide understand it or even care. Only time can
judge the worth of such research; the rest requires an educated populace. As
long as there is *any* stratification within the populace based on education,
there will be those who must translate science to the masses, and who thereby
become gatekeepers.
Unfortunately, the gatekeeper position can all too easily be
corrupted as we have seen. Any scientific conclusion which agrees with the
gatekeepers is too easily labeled as a "consensus," while dissenting
opinions are labeled as "fringe," "deniers," or even
"fraud." The section on scientific blunders in the beginning of this
essay certainly highlights the error inherent when new evidence and scientific
results come along and relegate the former consensus position to the same
historic scrap heap as geocentrism. I will sometimes state that any two
scientists will produce three different scientific opinions. In even the
narrowest aspects of my research field -- with possibly a total of only 200
labs in the world which study the same aspect of Neuroscience -- it is
difficult to get even half of them to agree on any one theory. A true consensus,
in the sense of agreement of >90% of scientists in a field, would require
so many coincidences that it is mathematically extremely rare.
Internet memes and the love of science.
As stated above, there is a real need for better public
education in science. In fact, an old friend of mine just went to work for the
U.S. Department of Education in a program working to improve the Science,
Technology, Engineering and Mathematics (STEM) curriculum in schools throughout
the country. It is a daunting job, but frankly, it is not helped by the
tendency for people to latch onto internet memes such as the Facebook website
with the non-PG13 name: "I F---ing Love Science." Unfortunately, this
site and others like it do more damage than good to the notion of real science (not to
mention perpetuating the false notions of settled science and consensus). The
IFLS site and the pictures they post are often the flashy, colorful end result,
and are more indicative of the skill of the graphic artist than the actual
science. They generally ignore the need for a deeper understanding of the
Scientific Method and the sheer mind-numbing tedium of experimental testing in
order to truly "love" science. While a well-coifed scientific pundit
in a tweed jacket (or an engineer in rumpled lab coat) is lauded by the media,
working scientists are often ignored or distrusted. Very few scientists have
publicists and make-up artists; science is not more "true" because
the experiments bubble menacingly, flash lights on complex equipment, or turn
pretty colors. Someone stated in an online discussion that the IFLS memes don't
truly love science, they are merely "admiring its butt as it walks
by."
I can only hope that this exploration of why science is
always changing, and always refining itself, will lead to a stronger, better
educated public, resistant to error and fraud. Our greatest defense against
being fooled or misled -- by changing theories in science, by misinterpretation
or by pseudoscience -- lies with education. A true love of science and a better
understanding of the dangers of thinking that science is settled or represents
a consensus starts with knowledge.
Knowledge is power. Be powerful.
SOURCE: "Why Science is Never Settled", Part Two by Tedd Roberts
Tedd Roberts is the pseudonym of neuroscience researcher Robert E. Hampson, Ph.D., whose cutting edge research includes work on a "Neural Prosthetic" to restore memory function following brain injury. His interest in public education and brain awareness has led him to the goal of writing accurate, yet enjoyable brain science via blogging, short fiction, and nonfiction/science articles for the SF/F community.