Sunday, December 30, 2018

Truly Reproducible Research Papers

A slide from Prof. Barry Smyth's presentation
If you perform an experiment and get interesting results, but nobody else can repeat the experiment and obtain the same results, something is wrong with your finding. This property is called reproducibility of research. If it is not reproducible, it is not science. You might think that academics and professional scientists who carry out systematic research and publish papers in conferences and journals are doing reproducible research. Not really.

The majority of research papers I've come across in my own domain are bare descriptions and explanations of results, without proper support for anybody interested in reproducing them. Even when a good-quality paper provides a lot of detail about its experimental setup and settings, it is difficult to truly recreate the results based only on the details in the paper. It is often necessary to contact the authors and correspond back and forth several times to get things clear. Similarly, if I ask myself whether I could reproduce research I published a few years ago solely from the details I put down in my own paper, I have to give a big 'No', unfortunately.

This is a bad way to do science.

It would be unfair to computer scientists to say they are not putting any effort into making their research reproducible. There are two important ways they try to do it these days. The first is giving away the data sets they have collected. This allows third parties to verify their results and also to extend and build upon them. The second is to provide the source code of their experimental implementations. They usually put the code in a GitHub repository and provide the link in the paper so that readers can find the repository and reuse the code.

Another slide from Prof. Barry Smyth's presentation
Recently I attended a talk delivered by Prof. Barry Smyth at UCD, Ireland, where he suggested two interesting ways to make our research papers reproducible. The first is a practice that is much simpler and easier to adopt: produce a Jupyter Notebook alongside the scientific publication, containing the code, data, descriptions and explanations in a well-documented form that a third party can quickly run and build upon. If you haven't used or read about Jupyter Notebooks, have a look at the first link in the references section. It's a way to produce well-documented code where the code, its description and its output sit together in a report-like format.

There's an even more powerful way of making research papers reproducible. Imagine you are producing a research paper that reports a 30% improvement in something. How do you enable the reader to verify whether this number is truly 30% by using their own experimental data? If I'm giving away the source code of my implementations, does the reader have sufficient information to locate the correct programs and execute them in the correct sequence to get the final result? This is where the tool "Kallysto" comes in. It is a tool developed by Prof. Barry Smyth to make scientific publications fully reproducible and traceable. Kallysto combines LaTeX with Jupyter Notebooks in such a way that your LaTeX manuscript is directly linked with the original data and the code that analyzes it. The typical workflow of writing a research paper is to (1) analyze data, (2) produce graphs as images or PDF files, and finally (3) create a LaTeX manuscript which explicitly includes those graphs. With Kallysto, when you compile your LaTeX source files, the Jupyter Notebooks that analyze the data are run and the results are generated on the fly, to be used by LaTeX to produce the final PDF document.
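To make the idea of linking a number in the manuscript to the code that computes it a bit more concrete, here is a minimal sketch in plain Python. It is not Kallysto's API, which I am not reproducing here; the function names, the file name results.tex, the macro name \improvement and the sample timings are all made up for illustration. The script computes an improvement figure from raw measurements and writes it into a small LaTeX file, so the manuscript can say \input{results.tex} and then use \improvement instead of a hand-typed "30%".

```python
from pathlib import Path
from statistics import mean


def percent_improvement(baseline, improved):
    """Relative reduction of the mean, in percent (assumes lower is better)."""
    base = mean(baseline)
    return 100.0 * (base - mean(improved)) / base


def write_latex_macro(path, name, value):
    """Write a one-line LaTeX file defining \\<name> as value with one decimal."""
    line = f"\\newcommand{{\\{name}}}{{{value:.1f}}}\n"
    Path(path).write_text(line, encoding="utf-8")
    return line


if __name__ == "__main__":
    # Hypothetical measurements; in practice these would be read from the data set.
    baseline_ms = [100.0, 102.0, 98.0]
    improved_ms = [70.0, 71.0, 69.0]
    gain = percent_improvement(baseline_ms, improved_ms)
    # The manuscript would pull this file in with \input{results.tex}.
    print(write_latex_macro("results.tex", "improvement", gain), end="")
```

If the data changes, re-running the script and recompiling the manuscript updates the number in the PDF, and a reader who has the data and the script can trace exactly where the figure came from.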

Prof. Barry Smyth's idea is to make scientific publications truly reproducible by scripting everything, from the data to the results and finally to the LaTeX document.

References:

[1] Jupyter notebooks

[2] Netflix Papermill tool

[3] The tool made by Prof. Barry Smyth called Kallysto
