Note from Jeremy: This is a guest post by ecologist and regular commenter Carl Boettiger. Data handling and analytical practices are changing fast in many fields of science. Think for instance of widespread uptake of open source software like R, and the data sharing rules now in place at many journals. Here Carl tries to cut through the hype (and the skepticism) about “Big Data” and related trends, laying out what he sees as the key issues and suggesting ways to address them.
On Tuesday the White House Office of Science and Technology Policy announced the creation of a $37.8 million initiative to promote a “Data Science Culture” in academic institutions, funded by the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation and hosted in centers at UC Berkeley, the University of Washington, and New York University. Sadly, these announcements give little description of just what such a center would do, beyond repeating the usual hype of “Big Data.”
Fernando Perez, a research scientist at UC Berkeley closely involved with the process, paints a rather more provocative picture in his own perspective on what this initiative might mean by a “Data Science Culture.” Rather than motivating the need for such a Center merely by expressing terabytes in scientific notation, Perez focuses on something not mentioned in the press releases. In his view, the objective of such a center stems from the observation that:
the incentive mechanisms of academic research are at sharp odds with the rising need for highly collaborative interdisciplinary research, where computation and data are first-class citizens
His list of problems to be tackled by this Data Science Initiative includes some particularly striking references to issues that have come up on Dynamic Ecology before:
- people grab methods like shirts from a rack, to see if they work with the pants they are wearing that day
- methodologists tend to only offer proof-of-concept, synthetic examples, staying largely shielded from real-world concerns
Well, that’s a different tune than the usual big data hype[^1]. While it is easy to find anecdotes that support each of these charges, it is more difficult to assess just how rare or pervasive they really are. Though these are not new complaints among ecologists, the solutions (or at least antidotes) proposed by a Data Science Culture are given a rather different emphasis. At first glance, the Data Science Culture sounds like the more familiar call for an interdisciplinary culture, emphasizing that the world would be a better place if only domain scientists learned more mathematics, statistics and computer science. It is not.
the problem, part 1: statistical machismo?
As to whether ecologists choose methods to match their pants, we have at least some data beyond anecdote. A survey earlier this year (Joppa et al. 2013, Science) has indeed shown that most ecologists select software guided primarily by concerns of fashion (in other words, whatever everybody else uses). The recent expansion of readily available statistical software has greatly increased the number of shirts on the rack. Titles in Ecology reflect the trend of rising complexity in ecological models, such as “Living Dangerously with big fancy models” and “Are exercises like this a good use of anybody’s time?”. Because software lets researchers use methods without the statistical knowledge needed to implement them from the ground up, many echo the position so memorably articulated by Jim Clark that we are “handing guns to children.” This belittling position usually leads to a call for improved education and training in mathematical and statistical underpinnings (see each of the 9 articles in another Ecology Forum on this topic), or the occasional wistful longing for a simpler time.
the solution, part 1: data publication?
What is most interesting to me in Perez’s perspective on the Data Science Institute is its emphasis on changing incentives more than changing educational practices. Perez characterizes the fundamental objective of the initiative as a cultural shift in which
“The creation of usable, robust computational tools, and the work of data acquisition and analysis must be treated as equal partners to methodological advances or domain-specific results”
While this does not tackle the problem of misuse or misinterpretation of statistical methodology head-on, I believe it is a thought-provoking way to mitigate the consequences of mistakes or limiting assumptions. By atomizing the traditional publication into its component parts (data, text, and software implementation), it becomes easier to recognize each for its own contributions. A brilliantly executed experimental manipulation need not live or die on some minor flaw in a routine statistical analysis when the data is a product in its own right. Programmatic access to raw data and computational libraries of statistical tools could make it easy to repeat or alter the methods chosen by the original authors, allowing the consequences of any mistakes to be both understood and corrected. In the current system, in which access to the raw data is rare, statistical mistakes can be difficult to detect and even harder to remedy. This in turn places a high premium on the selection of appropriate statistical methods, while putting little selective pressure on the details of data management or on the implementation of those methods. Allowing the data to stand by itself places a higher premium on careful collection and annotation of data (e.g. the adoption of metadata standards). To the extent that misapplied statistical and modeling approaches may be inflating the error rate of the published literature (Economist, Ioannidis 2005), independent data publication might be an intriguing antidote.
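To make that point concrete, here is a minimal Python sketch of what re-examining a published analysis could look like once the raw data are available. It assumes a hypothetical dataset of a single response measured in a control and a treatment group (the numbers are invented, not from any study). A reader can compute a classical statistic the authors might have used (here Welch's t) and then swap in an alternative method (here a permutation test, which makes no normality assumption) to check whether the conclusion hinges on that choice.

```python
import random
import statistics

# Hypothetical "published" raw data: one response measured in two groups.
# These values are invented purely for illustration.
control = [4.1, 3.8, 5.0, 4.4, 4.7, 3.9, 4.2]
treatment = [5.2, 4.9, 5.8, 4.6, 5.5, 6.1, 5.0]


def welch_t(a, b):
    """Welch's t statistic for the difference in means (b minus a)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(b) - statistics.mean(a)) / se


def permutation_p(a, b, n_perm=10_000, seed=1):
    """Two-sided Monte Carlo permutation p-value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(b) - statistics.mean(a))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign group labels, as if the treatment had no effect.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[len(a):]) - statistics.mean(pooled[:len(a)]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


t = welch_t(control, treatment)
p = permutation_p(control, treatment)
print(f"Welch t = {t:.2f}, permutation p = {p:.4f}")
```

If the two approaches agree, a reader gains some confidence that the result does not depend on the original method; if they disagree, that disagreement is itself worth reporting.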
the problem, part 2: junk software
As Perez is careful to point out, those implementing and publishing methods aren’t helping either. Unreliable, inextensible and opaque computational implementations act as barriers to both adoption and validation. Trouble with scientific software has been well recognized in the literature (e.g. Merali (2010), Nature; Ince et al. (2012), Nature), in the news (Times Higher Education) and by funding agencies (National Science Foundation). While it is difficult to assess how often software bugs actually alter the results (though see Ince et al.), designs that make software challenging or impossible to maintain, scale to larger tasks, or extend as methods evolve are more readily apparent. Cultural challenges around software run as deep as they do around data. When Mozilla’s Science Lab undertook a review of code associated with scientific publications, they took some criticism from other advocates of publishing code. I encountered this first hand in replies from authors, editors and reviewers to my own blog post suggesting we raise the bar on the review of methodological implementations. Despite disagreement about where that bar should be, I think we all felt the community could benefit from clearer guidance or consensus on how to review papers in which the software implementation is an essential part of the contribution.
the solution, part 2: software publication?
As in the case of data, better education is the route usually suggested for improving programming practices, and no doubt it is important. Once again, though, it is interesting to think how stronger incentives for such research products might also improve their quality, or at least make it easier to distill the good from the bad and the ugly. Yet in this case, I think there is a potential downside as well.
While widespread recognition of software’s importance will no doubt help bring us faster software, fewer bugs and more user-friendly interfaces, it may do more harm than good. Promotion of software as a product can lead to empire-building, for which ESRI’s ArcGIS might be a poster child. The scientific concepts become increasingly opaque, while training in a conceptually rich academic field gives way to more mindless training in the user interface of a single giant software tool. I believe that good scientific software should be modular: small code bases that can be easily understood, are interoperable, and perform a single task well (the Unix model). This lets us build more robust computational infrastructure tailored to the problem at hand, just as individual Lego bricks may be assembled and reassembled. Unfortunately, I do not see how recognition for software products would promote small modules over vast software platforms, or interoperability with other software over an exclusive walled garden.
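The Unix-style modularity described above can be illustrated with a toy Python sketch. All function names and the CSV layout (a `site` and a `count` column, with -1 marking missing data) are hypothetical. Each function does one small job and passes plain data to the next, so each piece can be tested, replaced, or recombined on its own, much like a Lego brick.

```python
import csv
import io
import statistics


def read_counts(text):
    """Parse CSV text with 'site' and 'count' columns into (site, count) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["site"], int(row["count"])) for row in reader]


def drop_missing(records, sentinel=-1):
    """Remove records whose count equals a missing-data sentinel value."""
    return [(site, count) for site, count in records if count != sentinel]


def summarize(records):
    """Return the mean count for each site."""
    by_site = {}
    for site, count in records:
        by_site.setdefault(site, []).append(count)
    return {site: statistics.mean(counts) for site, counts in by_site.items()}


# Compose the small pieces, much as a Unix pipeline chains single-purpose tools.
raw = "site,count\nA,3\nA,5\nB,-1\nB,7\n"
print(summarize(drop_missing(read_counts(raw))))
```

Swapping in a different cleaning rule or summary statistic means replacing one small function rather than modifying a monolithic tool.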
So, change incentives how?
If this provides some argument as to why one might want to change incentives around data and software publication, I have said nothing to suggest how. After all, as ecologists we’re trained to reflect on the impact a policy would have, not to advocate for what should be done about it. If decision-makers agree about the effects of the given incentives, then choosing what to reward should be easier.
[^1]: Probably for reasons discussed recently on Dynamic Ecology about politicians and dirty laundry.
In a recent issue of Limnology and Oceanography Bulletin, Stuart Hurlbert reviews (UPDATE: link fixed, venue corrected) the new (4th, 2012) edition of Sokal & Rohlf’s classic biostatistical text, Biometry (HT Carl Boettiger). The first sentence of the review gives you the flavor:
Reader be forewarned: were it allowed the title of this review would be “A readable but overblown, incomplete and error-ridden cookbook”.
Tell us how you really feel, Stuart! And to think that sometimes I worry if I’m too tough on other people’s work…
You should click through and read the whole thing. But if you’re not so inclined, here’s a brief summary of Hurlbert’s beefs with Sokal & Rohlf (the book, not the people; I’ll refer to the book as Sokal & Rohlf because that’s what everyone does). Hurlbert says his beefs apply to all editions, not just the most recent one:
- No coverage of experimental design, or sampling design of observational studies. Relatedly, and worse, incorrect or confusing implications about experimental design and sampling design. For instance, there are no formal definitions of key terms like “experiment”, “experimental unit”, “block”, “repeated measures”, etc. Worse, observational studies often are described using experimental terms like “treatment”, “control”, and “randomized block design”. This leads to serious confusion, even about matters as basic as what an experiment is.
- Too much emphasis on “statistical gimmickry” of little or no practical use, such as standardized effect sizes.
- Superficial, cookbook-type treatment of many procedures, with no conceptual framework for understanding why one might want to use those procedures.
- Incorrect, incomplete, and confusing coverage of other matters, from when it’s appropriate to use a one-tailed test, to whether to correct for multiple comparisons (Hurlbert apparently believes you should never do so, and so slams Sokal & Rohlf for insisting on this), and many more.
- Rigid adherence to Neyman-Pearson null hypothesis testing, at the expense of estimation and more refined, quantitative assessment of the evidence for or against any given hypothesis.*
The only value Hurlbert sees in Sokal & Rohlf is as a reference manual for the “recipes” for how to calculate various statistical procedures. He concludes by blaming the popularity of Sokal & Rohlf for what he sees as decades of poor statistical practice in biology. He also laments that no current biostatistics textbook teaches an appropriately modern philosophy of statistics clearly, with a focus on principles and without errors.
What do you think of all this? I have to say I found it kind of surprising, but not because I revere Sokal & Rohlf. I’ve mostly used it as a reference manual myself. I’d certainly never try to teach from it at any level, if for no other reason than it’s way too voluminous. I guess I always assumed, without really thinking about it, that it was always intended, and mostly used, as a reference manual. Was I wrong to assume that? And while I find Sokal & Rohlf old-fashioned in some ways (e.g., randomization, bootstrapping, and generalized linear models render classical non-parametric tests and data transformations largely irrelevant), that never really bothered me. The first edition came out in 1969; of course it’s going to be old-fashioned. And I don’t know that it’s fair to pick on Sokal & Rohlf and blame it for the purportedly terrible statistical practices of modern biologists, even though the book certainly is popular. Insofar as our statistical practices are terrible (and I don’t know if they are or not), there’s surely plenty of blame to go ’round. And can’t you also give Sokal & Rohlf credit for helping to encourage more biologists to use statistics in the first place? But I’ve never really thought about Sokal & Rohlf all that much, and I actually haven’t cracked it open in years, so I’m sort of a curious bystander here.
As an aside, I found it interesting that such vociferous criticism of Sokal & Rohlf came from someone from basically the same school of statistical thought. Hurlbert isn’t a Bayesian of any stripe, nor is he advocating for computationally-intensive methods, for instance. His criticisms of Sokal & Rohlf mostly aren’t criticisms of what the book sets out to do, they’re mostly criticisms of the book’s execution.
What do you think? Does Sokal & Rohlf deserve the criticism Hurlbert heaps on it? More broadly, what do you see as the biggest problems with how modern biologists teach and use statistics? And what textbook(s) should we be using in our courses in order to fix those problems? (Again, Hurlbert says there’s no biostatistics textbook that’s readable, strong on general principles, and error-free!)
My interest in this isn’t purely academic. I’m not just looking to grab some popcorn and watch proponents and detractors of Sokal & Rohlf argue. 😉 As I noted in a previous post, this fall I’m taking over teaching the introductory undergrad biostats course in my department. So for the first time, I need to think seriously and in great detail about exactly what introductory biostatistical material to teach and how to teach it. I’ve settled on a textbook (Whitlock & Schluter), and I have a tentative list of lectures and the major changes I want to make to the existing labs. But nothing beyond that. And even getting that far has required a lot of thought, in particular about precisely the issues Hurlbert raises. How much emphasis to place on general, unifying principles vs. coverage of specific tests. How much emphasis to place on black-and-white rules of good statistical practice vs. equipping students to make informed judgment calls. Etc.
It occurs to me that teaching biostatistics is something like teaching children good behavior. You start out by teaching kids black-and-white rules, like “don’t lie” and “don’t hit your sister.” And it’s only later that kids learn that good behavior often isn’t black-and-white. Sometimes it’s not only ok to lie (or to hit your sister!), it’s positively a good idea, morally. Heck, there are lots of tricky moral situations that you aren’t even taught about at all until you’re older. And that’s without even getting into competing, mutually-incompatible philosophies as to what good behavior consists of, and what makes it good! So you tell me–what should we be teaching our “kids” about biostatistics if we want to start them down the road towards responsible “adulthood”? (“Don’t fail to correct for multiple comparisons!”)
*Hurlbert actually thinks Sokal & Rohlf should’ve based their book on what Hurlbert calls the “neoFisherian” approach. I confess I’d never heard the term “neoFisherian”, which is Hurlbert’s own recent coinage. Hurlbert has a 2009 paper if you want to find out what he means by “neoFisherian” and why he thinks Neyman-Pearson hypothesis testing is so outdated that it should no longer be taught (UPDATE: link fixed). As far as I can tell, what Hurlbert means by “neoFisherian” doesn’t sound too far from Deborah Mayo’s notion of “error statistics” (which itself is actually not all that far from Neyman-Pearson, or even from some forms of Bayesianism). But it’s a little hard to tell because much of Hurlbert’s paper focuses on what seem to me to be rather nit-picky details of current practice (like conventions for reporting P values). Anyway, I think it would’ve been helpful for Hurlbert to briefly elaborate his own philosophy in his review, rather than just refer to it using a term of his own recent coinage.
About Jeremy Fox: I'm an ecologist at the University of Calgary. I study population and community dynamics, using mathematical models and experiments.