A slide from Prof. Barry Smyth's presentation |
Majority of research papers I've come across in my own domain are bare descriptions and explanations of their results without proper support for reproduction of the results by anybody interested. Even though a paper with a good quality provides a lot of details of their experimental setups and settings, it difficult to truly recreate their results completely based on the details in the paper. It is often necessary to contact the authors and have a correspondence back and forth several times to get things clear. Similarly, if I ask myself whether I can reproduce a research work I had published few years ago solely based on the details I had put down on my own paper, I have to give a big 'No' unfortunately.
This is a bad way to do science.
It is unfair to computer scientist if I say they are not putting any effort to make their research reproducible. There are two important ways they try do it these days. The first is giving away data sets that they had collected. This allows third parties to verify their results and also to extend and build upon it. The second is to provide the source codes of the experimental implementations they have made. They usually put their codes into a Github repository and provide the link in their research papers so that readers can find the source code repository and reuse their code.
Another slide from Prof. Barry Smyth's presentation |
There's even more powerful way of making reproducible research papers. Imagine you are producing a research paper where the paper talks about a 30% improvement in something. How to enable the reader to verify whether this number is truly 30% by using their own experimental data? If I'm giving away the source codes of my implementations, does the reader has sufficient information to locate the correct programs and execute them in the correct sequence in order to get the final result? This is where the tool "Kallysto" comes in. It is a tool developed by Prof. Barry in order to make scientific publications fully reproducible and traceable. Kallysto combines Latex with Jupyter Notebooks in such a way, your Latex manuscript is directly linked with the original data and the software codes which analyze them. While the typical workflow of writing a research paper is to (1) analyze data, (2) produce graphs as images or PDF files, and finally (3) create a Latex manuscript which explicitly include those graphs. When you compile your Latex source files, Kallysto will run the Jupyter Notebooks analyzing data and generates the results in real-time which will be used by Latex to produce the final PDF document.
The idea of Prof. Barry Smyth is to make scientific publications truly reproducible by scripting everything from the data to results and finally to latex documents.
References:
[1] Jupyter notebooks
[2] Netflix Papermil tool
[3] The tool made by Prof. Barry Smyth called Kallysto