To say that I have very strong feelings about standards and technology used in academic research would be a gross understatement. Our current research practices and standards for publication, sharing of data and methods, and reproducible science are embarrassingly bad, and it’s our responsibility to do better. As a graduate student, it seemed that the “right place” to express these sentiments would be my thesis, and so I poured my heart out into some of the introduction and later chapters. It occurred to me that writing a thesis, like many standard practices in academia, is dated, slow, and ineffective - a document that serves only to stand as a marker in time to signify completion of some epic project, to have eyes laid upon it by possibly 4-6 people, and probably not even that many, as I’ve heard stories of graduate students getting away with copy-pasting large amounts of nonsense between an introduction and conclusion. So should I wait many months for this official pile of paper to be published to some Stanford server, to be forgotten about before it even exists? No thanks. Let’s talk about this here and now.
This reproducibility crisis comes down to interpretation - the glass can be half full or half empty, but it doesn’t really matter because at the end of the day we just need to pour more water in the stupid glass, or ask why we are wasting our time evaluating and complaining about the level of water when we could be digging wells. The metric itself doesn’t even matter, because it casts a shadow of doubt not only on our discoveries, but on our integrity and capabilities as scientists. Here we are tooting on about “big data” and publishing incremental changes to methods when what we desperately need is paradigm shifts in the most basic, standard practices for conducting sound research. Some people might throw their hands up and say “It’s too big of a problem for me to contribute.” or “The process is too political and it’s unlikely that we can make any significant change.” I would suggest that change will come slowly by way of setting the standard through example. I would also say that our saving grace will come by way of leadership and new methods and infrastructure to synthesize data. Yes, our savior comes by way of example from software development and informatics.
Incentives for Publication
It also does not come as a surprise that the incentive structure for conducting science and publishing is a little broken. The standard practice is to aggressively pursue significant findings to publish, and if it’s not significant, then it’s not sexy, and you can file it away in the “forgotten drawer of shame.” In my short time as a graduate student, I have seen other graduate students, and even faculty anguish over the process of publication. I’ve seen graduate students want to get out as quickly as possible, willing to do just about anything “for that one paper.” The incentive structure renders otherwise rational people into publication-hungry wolves that might even want to turn garbage into published work by way of the science of bullshit. As a young graduate student it is stressful to encounter these situations and know that it goes against what you consider to be a sound practice of science. It is always best to listen to your gut about these things, and to pursue working with individuals that have the highest of standards. This is only one of the reasons that Poldrack Lab is so excellent. But I digress. Given that our incentives are in check, what about the publications themselves?
Even when a result makes it as far as a published paper, the representation of results as a static page does not stand up to our current technological capabilities. Why is it that entire careers can be made out of parsing Pubmed to do different flavors of meta-analysis, and a large majority of results seem to be completely overlooked or eventually forgotten? Why is a result a static thing that does not get updated as our understanding of the world, and availability of data, changes? We pour our hearts out into these manuscripts, sometimes making claims that are larger than the result itself, in order to make the paper loftier than it actually is. While a manuscript should be presented with an interesting story to capture the attention of others who may not have interest in a topic, it still bothers me that many results can be over-sensationalized, and other important results, perhaps null or non-significant findings, are not shared. Once the ink has dried on the page, the scientist is incentivized to focus on pursuit of the next impressive p-value. In this landscape, we don’t spend enough time thinking about reproducible science. What does it mean, computationally, to reproduce a result? Where do I go to get an overview of our current understanding for some question in a field without needing to read all published research since the dawn of time? It seems painfully obvious to me that continued confidence in our practice of research requires more standardization and best practices for the methods and infrastructure that lead to such results. We need informed ways to compare a new claim to everything that came before it.
Lessons from Software Development and Informatics
Should this responsibility for a complete restructuring of practices, the albatross for the modern scientist, be his burden? Probably this is not fair. Informatics, a subset of science that focuses on the infrastructure and methodology of a scientific discipline, might come to his aid. I came into this field because I’m not driven by answering biological questions, but by building tools. I’ve had several high status individuals tell me at different times that someone like myself does not belong in a PhD program, and I will continue to strongly disagree. There is a missing level across all universities, across all of academia, and it is called the Academic Software Developer. No one with such a skillset in their right mind would stay in academia when they could be paid two to three times as much in industry. Luckily, some of us either don’t have a right mind, or are so stubborn about this calling that a monetary incentive structure is less important than the mission itself. We need tools to both empower researchers to assess the reproducibility of their work, and to derive new reproducible products. While I will not delve into some of the work I’ve done in my graduate career that is in line with this vision (let’s save that for thesis drivelings), I will discuss some important observations about the academic ecosystem, and make suggestions for current scientists to do better.
Reproducibility and Graduate Students
Reproducibility goes far beyond the creation of a single database to deposit results. Factors such as careful documentation of variables and methods, how the data were derived, and dissemination of results unify to embody a pattern of sound research practices that have previously not been emphasized. Any single step in an analysis pipeline that is not properly documented, or does not allow for a continued life cycle of a method or data, breaks reproducibility. If you are a graduate student, is this your problem? Yes, it is your problem. Each researcher must think about the habits and standards that he or she partakes in from the initial generation of an idea through the publishing of a completed manuscript. On the one hand, I think that there is already a great burden on researchers to design sound experiments, conduct proper statistical tests, and derive reasonable inferences from those tests. Much of the disorganization and inattention to sound practices could be resolved with the advent of better tools such as resources for performing analysis, visualizing and capturing workflows, and assessing the reproducibility of a result. On the other hand, who is going to create these tools? The unspoken expectation is that “This is someone else’s problem.” Many seem to experience tunnel vision during graduate school. There is no reality other than the individual’s thesis, and as graduate students we are protected from the larger problems of the community. I would argue that the thesis is rather trivial, and if you spend most of your graduate career working on just one project, you did not do the experience justice. I don’t mean to say that the thesis is not important, because graduation does not happen without its successful completion. But rather, graduate school is the perfect time to throw yourself into learning, collaborating on projects, and taking risks.
If you have time on the weekends to regularly socialize, go to vineyards, trips, and consistently do things that are not obsessively working on the topic(s) that you claimed to be passionate about when applying, this is unfortunate. If you aim to get a PhD toward the goal of settling into a comfy, high income job that may not even be related to your research, unless you accomplished amazing things during your time as a young researcher, this is also unfortunate. The opportunity cost of these things is that there is probably someone else in the world that would have better taken advantage of the amazing experience that is being a graduate student. The reason I bring this up is because we should be working harder to solve these problems. With this in mind, let’s talk about tiny things that we can do to improve how we conduct research.
The components of a reproducible analysis
A reproducible analysis, in its truest definition, must be easy to do again. This means several key components for the creation and life cycle of the data and methods:
- complete documentation of data derivation, analysis, and structure
- machine accessible methods and data resources
- automatic integration of data, methods, and standards
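As one small, hedged illustration of the first two components, an analysis script can write a machine-readable record of how a result was derived alongside the result itself. The sketch below is only an assumption about what such a record might contain; the function name `describe_analysis`, the field names, and the example data are all hypothetical.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def describe_analysis(data_bytes, method_name, parameters):
    """Build a machine-readable record of how a result was derived."""
    return {
        # A hash identifies exactly which version of the data was analyzed.
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "method": method_name,
        "parameters": parameters,
        "python_version": platform.python_version(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical input data and method name, for illustration only.
record = describe_analysis(
    data_bytes=b"subject,score\n1,0.5\n2,0.7\n",
    method_name="mean_score",
    parameters={"exclude_outliers": False},
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Saving a record like this next to every output costs a few lines of code, and it means the question "which data and which settings produced this number?" can be answered by a program rather than by memory.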
A truly reproducible analysis requires the collection, processing, documentation, standardization, and sound evaluation of a well-scoped hypothesis using large data and openly available methods. From an infrastructural standpoint this extends far beyond requiring expertise in a domain science and writing skills, calling for prowess in high performance computing, programming, database and data structure generation and management, and web development. Given initiatives like the Stanford Center for Reproducible Neuroscience, we may not be too far off from “reproducibility as a service.” This does not change the fact that reproducibility starts on the level of the individual researcher.
While an infrastructure that manages data organization and analysis will immediately provide documentation for workflow, this same standard must trickle into the routine of the average scientist before and during the collection of the input data. The research process is not an algorithm, but rather a set of cultural and personal customs that starts from the generation of new ideas, and encompasses preferences and style in reading papers and taking notes, and even personal reflection. Young scientists learn through personal experience and immersion in highly productive labs with more experienced scientists to advise their learning. A lab at a prestigious University is like a business that exists only by way of having some success with producing research products, and so the underlying assumption is that the scientists in training should follow suit. The unfortunate reality is that the highly competitive nature of obtaining positions in research means that the composition of a lab tends to weigh heavily toward individuals early in their research careers, with a prime focus on procuring grant funding to publish significant results, and on finding emotional closure in establishing security of their entire life path thus far. In this depiction of a lab, we quickly realize that the true expertise comes by way of the Principal Investigator, and the expectation of a single human being to train his or her entire army while simultaneously driving innovative discovery in his or her field is outrageous. Thus, it tends to be the case that young scientists know that it’s important to read papers, take notes, and immerse themselves in their passion, but their method of doing this comes by way of personal stumbling to a local optimum, or embodying the stumbling of a slightly larger fish.
Levels of Writing
A distinction must be made between a scientist pondering a new idea, testing code for a new method, and archiving a procedure for future lab-mates to learn from. We can define different levels of writing based on the intended audience (personal versus shared), and level of privacy (private versus public). From an efficiency standpoint, the scientist has much to gain by instilling organization and recording procedure in personal learning and data exploration, whether it be public or private. A simple research journal provides a reliable means to quickly turn a discovery into a published piece of work. This is an example of personal work, and private may mean that it is stored on an individual’s private online Dropbox, Box, or Google Drive, and public may mean that it is written about on a personal blog or forum. Keeping this kind of documentation, whether it is private or public, can help an individual to keep better track of ideas and learning, and be a more efficient researcher. Many trainees quickly realize the need to record ideas, and stumble on a solution without consciously thinking ahead to what kind of platform would best integrate with a workflow, and allow for future synthesis and use of the knowledge that is recorded.
In the case of shared resources, for computational labs that work primarily with data, an online platform with appropriate privacy and backup is an ideal solution over more fragile solutions such as paper or documents on a local machine. The previously named online platforms for storing documents (Box, Dropbox, and Google Drive), while not appropriate for PI or proprietary documents, are another reasonable solution toward the goal of shared research writing. These platforms are optimized for sharing amongst a select group, and again without conscious decision making, are commonly the resources that labs use in an unstructured fashion.
Documentation of Code
In computational fields, it is typically the case that the most direct link to reproducing an analysis is not perusing research prose, but obtaining the code. Writing is just an idealistic idea and a hope until someone has programmed something. Thus, a researcher in a computational field will find it very hard to be successful if he or she is not comfortable with version control. Version control keeps a record of all changes through the life cycle of a project. It allows for tagging points in time as different versions of a piece of software, and for going back in time. These elements are essential for reproducible science practices that are based on sharing of methods and robust documentation of a research process. It takes very little effort for a researcher to create an account with a version control service (for example, http://www.github.com), and typically the biggest barriers to this practice are cultural. A researcher striving to publish novel ideas and methods is naturally going to be concerned over sharing ideas and methods until they have been given credit for them. It also seems that researchers are terrified of others finding mistakes. I would argue that if the process is open and transparent, coding is collaborative, and peer review includes review of code, finding a bug (oh, you are a human and make mistakes every now and then?) is rather trivial and not to be feared. This calls for a change not only in infrastructure, but in research culture, and there is likely no way to do that other than by slow change of incentives and example over time. It should be natural for a researcher, when starting a new project, to immediately create a repository to organize its life-cycle.
While we cannot be certain that services like Github, Bitbucket, and Sourceforge are completely reliable and will exist ad infinitum, this basic step can minimally ensure that work is not lost to a suddenly dead hard-drive, and methods reported in the text of a manuscript can be immediately found in the language that produced the result. Researchers have much to gain in being able to collaboratively develop methods and thinking by way of slowly gaining expertise in using these services. If a computational graduate student is not established in using Github by the end of his or her career, this is a failure in his or her training as a reproducible scientist.
On the level of documentation in the code itself, this is often a personal, stylistic process that varies by field. An individual in the field of computer science is more likely to have training in algorithms and proper use of data structures and advanced programming ideas, and is more likely to produce computationally efficient applications based on bringing together a cohesive set of functions and objects. We might say this kind of research scientist, by way of choosing to study computer science to begin with, might be more driven to develop tools and applications, and unfortunately for academia will ultimately be most rewarded for pursuing a job in industry. This lack of “academic software developers,” as noted previously, is arguably the prime reason that better, domain-specific, tools do not exist for academic researchers. A scientist that is more driven to answer biological questions sees coding as a means to procure those answers, and is more likely to produce batch scripts that use software or functions provided by others in the field. In both cases, we gripe over “poorly documented” code, which on the most superficial level suggests that the creator did not add a proper comment to each line explaining what it means. An epiphany that sometimes takes years to realize is the idea that documentation of applications lives in the code itself. The design, choice of variable names and data structures, spacing of the lines and functions, and implementation decisions can render a script easy to understand, or a mess of characters that can only be understood by walking through each line in an interactive console. Scientists in training, whether aiming to build elegant tools or simple batch scripts, should be aware of these subtle choices in the structure of their applications. Cryptic syntax and non-intuitive processes can be made up for with a (sometimes seemingly) excessive amount of commenting. 
The ultimate goal is to make sure that a researcher’s flow of thinking and process is sufficiently represented in his programming outputs.
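To make the point about documentation living in the code itself concrete, here is a small sketch with two functions that behave identically. All names and data are hypothetical and chosen only for illustration; the first version can only be understood by tracing it, while the second explains itself through its naming and structure.

```python
def f(x, t):
    # Cryptic: what is x? what is i[1]? what does t mean?
    return [i for i in x if i[1] > t]


def select_subjects_above_threshold(subject_scores, score_threshold):
    """Keep the (subject_id, score) pairs whose score exceeds the threshold."""
    selected = []
    for subject_id, score in subject_scores:
        if score > score_threshold:
            selected.append((subject_id, score))
    return selected


# Hypothetical data: (subject identifier, task score) pairs.
subject_scores = [("sub-01", 0.42), ("sub-02", 0.91), ("sub-03", 0.77)]
print(select_subjects_above_threshold(subject_scores, score_threshold=0.5))
```

Neither version needs a comment on every line. The second one simply makes the choices (names, unpacking, a one-line docstring) that let a reader follow the researcher's thinking without an interactive console.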
Documentation Resources for Scientists
A salient observation is that these are all service-oriented, web-based tools. The preference for Desktop software such as Microsoft Word or Excel is founded on the fact that Desktop software tends to provide better user experience (UX) and functionality. However, the current trend is that the line is blurring between Desktop and browser, and with the growing trend of browser-based offline tools that work with or without an internet connection, it is only a matter of time until there will be no benefit to using a Desktop application over a web-based one. Research institutions have taken notice of the benefit of using these services for scientists, and are working with some of these platforms to provide “branded” versions for their scientists. Stanford University provides easy access to wikis, branded “Box” accounts for labs to share data, along with interactive deployment of Wordpress blogs and websites for individuals and research groups to reach the public. Non-standard resources might include an online platform for writing and sharing LaTeX documents (http://www.overleaf.com), for collecting and sharing citations (http://www.paperpile.com, http://www.mendeley.com), and for communicating about projects and daily activity (http://www.slack.com) or keeping track of projects and tasks (http://www.asana.com).
This link between local and web-based resources continues to be a challenge that is helped with better tools. For example, automated documentation tools (e.g., Sphinx for Python) can immediately transform comments hidden away in a Github repository into a clean, user friendly website for reading about the functions. Dissemination of a result, to both other scientists and the public, is just as important (if not more important) than generation of the result, period. An overlooked component toward understanding of a result is providing the learner with more than a statistical metric reported in a manuscript, but a cohesive story to put the result into terms that he or she can relate to. The culture of publication is to write in what sounds like “research speak,” despite the fact that humans learn best by way of metaphor and story. What this means is that it should become common practice to, along with a publication, write a blog post and link to it. This is not to say that results should be presented as larger than they really are, but put into terms that are clear and understandable for someone outside of the field. Communication about results to other researchers and the public is an entire thesis in itself, but minimally scientists must have power to relate their findings to the world via an internet browser. Right now, that means a simple text report and prose to accompany a thought, or publication. Our standards for dissemination of results should reflect modern technology. We should have interactive posters for conferences, theses and papers immediately parsed for sophisticated natural language processing applications, and integration of social media discussion and author prose to accompany manuscripts. A scientist should be immediately empowered to publish a domain-specific web report that includes meaningful visualization and prose for an analysis. It might be interactive, including the most modern methods for data visualization and sharing.
Importantly, it must integrate seamlessly into the methodology that it aims to explain, and associated resources that were used to derive it. It’s up to us to build these tools. We will try many times, and fail many times. But each effort is meaningful. It might be a great idea, or inspire someone. We have to try harder, and we can use best practices from software development to guide us.
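As a hedged example of the automated documentation idea mentioned earlier in this section, here is what a function might look like when its docstring is written in the field-list style that Sphinx understands. With the `sphinx.ext.autodoc` extension enabled, a docstring like this can be rendered as a web page without writing any extra prose. The function itself, `percent_signal_change`, is a made-up example and not from any particular library.

```python
def percent_signal_change(baseline, measurement):
    """Return the change from a baseline value, expressed as a percentage.

    :param baseline: Reference value; must be non-zero.
    :type baseline: float
    :param measurement: Value to compare against the baseline.
    :type measurement: float
    :returns: ``100 * (measurement - baseline) / baseline``
    :rtype: float
    :raises ValueError: If ``baseline`` is zero.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (measurement - baseline) / baseline


print(percent_signal_change(200.0, 210.0))
```

The same text serves three audiences at once: the person reading the source, the person calling `help(percent_signal_change)` in a console, and the person reading the generated website.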
The Academic Software Developer
I don’t have an “ending” for this story, but I can tell you briefly what I am thinking about. Every paper should be associated with some kind of “reproducible repo.” This could mean one (or more) of several things, depending on the abilities of the researcher and importance of the result. It may mean that I can deploy an entire analysis with the click of a button, akin to the recently published MyConnectome Project. It may mean that a paper comes with a small web interface linking to a database and API to access methods and data, as I attempted even for my first tiny publication. It could be a simple interactive web interface hosted with analysis code on a Github repo to explore a result. We could use continuous integration outside of its scope to run an analysis, or programmatically generate a visualization using completely open source data and methods (APIs). A published result is almost useless if care is not taken to make it an actionable, implementable thing. I’m tired of static text being the output of years of work. As a researcher I want some kind of “reactive analysis” that is an assertion a researcher makes about a data input answering some hypothesis, and receiving notification about a change in results when the state of the world (data) changes. I want current “research culture” to be more open to business and industry practice of using data from unexpected places beyond Pubmed and limited self-report metrics that are somehow “more official” than someone writing about their life experience informally online. I am not convinced that the limited number of datasets that we pass around and protect, not sharing until we’ve squeezed out every last inference, are somehow better than the crapton of data that is sitting right in front of us in unexpected internet places. Outside of a shift in research culture, generation of tools toward this vision is by no means an easy thing to do.
Such desires require intelligent methods and infrastructure that must be thought about carefully, and built. But we don’t currently have these things, and we have already fallen far behind the standard in industry, a standard that probably comes by way of having more financial resources. What do we have? We have ourselves. We have our motivation, and skillset, and we can make a difference. My hope is that other graduate students have equivalent awareness to take responsibility for making things better. Work harder. Take risks, and do not be complacent. Take initiative to set the standard, even if you feel like you are just a little fish.
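To give a rough sense of what a “reactive analysis” could look like in its very simplest form, here is a toy sketch. It is not an existing tool; the class name `ReactiveAnalysis`, its methods, and the example claim are all hypothetical. The idea is only that an analysis and an assertion about its result are stored together, re-evaluated whenever the data changes, and a message is produced when the assertion stops (or starts) holding.

```python
from statistics import mean


class ReactiveAnalysis:
    """Re-run an analysis when data changes and report changes in the claim."""

    def __init__(self, analysis, claim):
        self.analysis = analysis    # function: data -> numeric result
        self.claim = claim          # function: result -> True or False
        self.last_outcome = None    # no data has been seen yet

    def update(self, data):
        """Re-evaluate on new data; return a message if the outcome changed."""
        result = self.analysis(data)
        outcome = self.claim(result)
        changed = self.last_outcome is not None and outcome != self.last_outcome
        self.last_outcome = outcome
        if changed:
            return f"Claim now evaluates to {outcome} (result={result:.2f})"
        return None


# Hypothetical claim: "the mean score is above 0.5".
watch = ReactiveAnalysis(analysis=mean, claim=lambda m: m > 0.5)
print(watch.update([0.6, 0.7, 0.8]))                 # first look at the data
print(watch.update([0.6, 0.7, 0.8, 0.1, 0.1, 0.1]))  # the world changed
```

A real version would need to watch databases and APIs rather than Python lists, and to deliver the notification somewhere useful, which is exactly the kind of infrastructure that does not yet exist.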