Day 5: Open Results#

By Neuromatch Academy & NASA

Content creators: NASA, Leanna Kalinowski, Hlib Solodzhuk, Ohad Zivan

Content reviewers: Leanna Kalinowski, Hlib Solodzhuk, Ohad Zivan, Shubhrojit Misra, Viviana Greco, Courtney Dean

Production editors: Hlib Solodzhuk, Konstantine Tsafatinos, Ella Batty, Spiros Chavlis


Tutorial Objectives#

Estimated timing of tutorial: 2 hours

This day focuses on giving you the tools you need to kick-start a scientific collaboration by creating contributor guidelines that ensure ethical contributorship. It starts out with a use case of open science in action, then a review of how to discover and assess open results. Next, the focus is on how to publish results, which includes a task checklist. The module wraps up with specific guidance for writing the sharing-results section of the Open Science and Data Management Plan (OSDMP). We will also reflect on how evolving technology and societal norms continue to change the way we do science.


Section 1: Introduction to Open Results#

This section aims to broaden your perspective regarding what shareable research outputs are produced throughout the research lifecycle. We will first consider what constitutes an open result. To do so, we will read an example of a forward-thinking research project that utilizes open-result best practices. The perspectives gained from this example will ultimately get us thinking about how we can work toward creating reproducible research.

What Research Objects are Created Throughout the Research Cycle?#

The Traditional Depiction of a “Scientific Result” Has Changed Over Time#

When we think of results, most of us picture just the final publication.

1665

Publication in 1665 book.

This publication dates back to 1665, when the first scientific journal, Philosophical Transactions, was established to publish letters about scientific observations and experimentation.

1940s

Close and paid science schema.

Later, in the 1940s, publishing became commercialized, and commercial publishers took over as the mechanism for releasing journals, conference proceedings, and books. This new business model normalized publication paywalls.

21st century

Open and accessible science schema.

Only in the 21st century did the scientific community expand the meaning of open results. The evolution of this definition was driven by technological advances, such as the internet, and by new modes of sharing information. The open access movement was established by the Budapest Open Access Initiative in 2002 and the Berlin Declaration on Open Access in 2003, both of which formalized the idea that, with regard to new knowledge, there should be “free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles” (Budapest Open Access Initiative).

But Results Have Always Been Far More Than Just the Publication#

You might be familiar with the research life cycle but may not have considered what results could be shared openly throughout its process. This lesson adopts a definition of the research life cycle based on The Turing Way and breaks it down into nine phases, pictured in the figure below.

Although the phases are presented in a linear fashion, we acknowledge that the research lifecycle is rarely ever linear! Throughout the scientific process, products are created that others need in order to reproduce the findings. These products of research include data, code, analysis pipelines, papers, and more!

Following Garcia-Silva et al. (2019), we define a research object (RO) as a method for the identification, aggregation, and exchange of scholarly information on the Web. Research objects can comprise both research data and digital research objects, as defined by the Organisation for Economic Co-operation and Development (OECD Legal Instruments).

Research life cycle.

The term ‘Open Results’ comprehensively includes all these research products and more.

Open results can include both data and code. Since data and code were covered in previous modules, in this lesson, we focus on sharing science outcomes as open results. Examples of open results can include:

  • Open access peer-reviewed articles

  • Technical reports

  • Computational notebooks

  • Code of conduct, contributor guidelines, publication policies

  • Blog posts

  • Short-form videos and podcasts

  • Social media posts

  • Conference abstracts and presentations

  • Forum discussions

Open-access peer-reviewed articles are archived for long-term preservation and represent a more formal discussion of scientific ideas, interpretations, and conclusions. These discussions inform the method by which researchers share results. In the following tutorial section, we will discuss different types of sharing and methods to build and adapt them for use in your research.

Scientists can share their incremental progress throughout the research process and invite community feedback. Sharing more parts of the research process creates more interactions between researchers and can improve the end result (which may be a peer-reviewed article).

Throughout this day, we will show you how to use, make, and share open results.

Examples of Open Results#

Let’s broaden our perspectives on the types of research objects that are produced throughout the research process. Let’s take a look at some examples from different projects.

Case Study: Reaching New Audiences#

Qiusheng Wu is an associate professor at the University of Tennessee. He has published 500+ video tutorials on YouTube, which have gained 25K+ subscribers and 1.1M+ views (as of 8/2023).

Professor Qiusheng Wu created a YouTube channel in April 2020 for the purpose of sharing video tutorials on the geemap Python package that he was developing. Since then, Wu has published over 500 video tutorials on open-source geospatial topics. The channel has gained over 25K subscribers, with more than 1 million views and 60K watch hours in total. On average, it receives 70 watch hours per day.

The YouTube channel has allowed Wu to reach a much larger audience beyond the confines of a traditional classroom. It has made cutting-edge geospatial research more accessible to the general public and has led to collaborations with individuals from around the world. This has been particularly beneficial for Wu’s tenure promotion as it has resulted in increased funding opportunities, publications, and public engagement through the YouTube channel, social media, and GitHub.

Overall, the YouTube channel serves as an important tool for Wu to disseminate research, inspire others, and contribute to the advancement of science. It has also played a significant role in advancing Wu’s professional career.

Case Study: New Media for Science Products#

“A new method reduced the compute time for this image from ~30 minutes to <1 minute”. In 2021, Lucas Sterzinger spent one summer of his PhD on an internship. During that summer, he wrote a blog post to explain and demonstrate a game-changing technology called Kerchunk – a software package that makes accessing scientific data in the cloud much faster.

Source

Alongside the blog post, he also created a tutorial as a Jupyter Notebook – both of these resources and the associated code are freely accessible to the public, allowing for rapid adoption and iteration by other developers and scientists. He posted the blog on Medium and shared it on Twitter. The blog drew a lot of attention to the technology while it was still being developed! This is starkly different from the slow and complicated world of academic publishing, where this result would not have been shared for about a year (writing it up, the review process, the publication process). He said, “Working on Kerchunk and sharing it widely using open science principles greatly expanded my professional connections and introduced me to the field of research software engineering. The connections I made from this led me directly to my current role as a Scientific Software Developer at NASA.”

Case Study: New Products for Increasing Impact#

From mapping the UK in 2003 to using the data to map the world in 2023, with applications ranging from Uber to the UN Sustainable Development Goals, OpenStreetMap (more than 1.5 million contributors, over 100 million edits) is used for GIS analysis, such as planning and logistics, by humanitarian groups, utilities, governments, and more. This was only possible because the project was set up and shared openly and was built by a community devoted to improving it. You never know where your personal project might go or who might be interested in collaborating!

Case Study: New Visualizations to Share Results#

Matplotlib was developed around 2002 by John Hunter, then a post-doc, to visualize some neurobiology data he was working on. He wasn’t a software developer; he was a neurobiologist! He could have just published the paper in a peer-reviewed journal and perhaps shared the code used to create the figures, but instead he released his work as an open-source project, thinking, ‘Well, if this is useful to me, maybe it will be useful to others…’.

Matplotlib is now the most widely used plotting library for the Python programming language and a core component of the scientific Python stack, along with NumPy, SciPy, and IPython. Matplotlib was used for data visualization during the 2008 landing of the Phoenix spacecraft on Mars and for the creation of the first image of a black hole.

Case Study: Reporting and Publication#

The public is interested in what you are doing, and reaching them can involve communication through traditional and new platforms. Publishing results on platforms such as Twitter/X, YouTube, TikTok, blogs, websites, and other social media platforms is becoming more common. Awareness through social media drastically increases the reach and audience of your work. There have been studies on how this impacts citation rates. For example, The Journal of Medical Internet Research (JMIR) conducted a three-year study of the relative success of JMIR articles in both Twitter and academic worlds. They found that highly tweeted articles were 11 times more likely to be highly cited than less tweeted articles.

Open communication platforms noticeably furthered the reach and audience of results.

What is the Reproducibility Crisis?#

A 2016 Nature survey on reproducibility found that of 1,576 researchers, “More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.” The ‘reproducibility crisis’ in science is a growing concern over several reproducibility studies where previous positive results were not reproduced.

We must consider the full research workflow if we are to solve the reproducibility crisis. The fact that more than 70% of researchers have failed to reproduce other scientists’ results is shocking, especially considering that reproducibility is a cornerstone of the scientific method.

There are many personal incentives to implement open science principles throughout all stages of the research process. By making results open throughout, you increase your ability to reproduce your own results. This also has implications for research beyond the ability to improve your research.

What is the Cause of This Reproducibility Crisis?#

The three main causes of the reproducibility crisis are:

  1. Intermediate methods of research are often described informally or not at all.

  2. Intermediate data are often omitted entirely.

  3. We often only think about results at the time of publication.

We need to think of the entire research process as a result. For example, scientific articles often describe computational methods informally, which demands significant effort from others to understand and reuse them.

Articles often lack sufficient information needed for other researchers to reproduce results, even when data sets are published, according to two studies in Nature Genetics and Nature Methods. Raw and/or intermediate data products and relevant software are often not provided alongside the final manuscript, limiting the reader’s ability to attempt replication.

Without access to the source codes for the papers, reproducibility has been shown to be elusive, according to two other studies in Briefings in Bioinformatics and Nature Physics.

Combating the Reproducibility Crisis#

If your research workflow uses principles of open results, as showcased in the examples above, this will help you combat the reproducibility crisis.

We can create reproducible workflows and combat this crisis by considering open results at each stage of the research lifecycle. An Open Science and Data Management Plan (OSDMP) helps researchers think and plan for all aspects of sharing by determining how they will make software and data available. This plan can be shared publicly early on through a practice called pre-registering, where researchers determine their analysis plan and data collection procedure before a study begins (discussed previously in Section 2 of Day 2).

What Could You Do?#

The OpenSciency team created a large table that describes all the different kinds of shareable research objects that are possible to create throughout the research lifecycle.

Thinking about sharing everything all at once can be overwhelming when you are getting started. To move forward, just focus on picking the most important item. Here, we have pared down the list to only a couple of items per category. You could shorten the list even further when you are getting started. For example, perhaps, for your work, sharing the code used to wrangle the data is the most critical element of reproducibility; code-sharing would then be a good place to start your open science journey. The small steps we make are what move us towards sustainable open science.

  • Ideation: Proposals can be shared on Zenodo and open grant platforms such as ogrants.org.

  • Planning: Projects can be pre-registered before they begin.

  • Project Design: Contributor guidelines or a code of conduct can be posted on Zenodo, GitHub, or team Web Pages.

  • Engagement & Training: Workflow computational notebooks can be shared with the team via GitHub and released on Zenodo.

  • Data Collection: Raw data can be shared through data repositories.

  • Data Wrangling: Code can be shared through software repositories.

  • Data Exploration: Computational notebooks can be shared via GitHub and released on Zenodo.

  • Preservation: Data management plans for archiving can be posted on Zenodo.

  • Reporting & Publication:

    • Open access peer-reviewed articles

    • Computational notebooks

    • Code of conduct, contributor guidelines, publication policies

    • Blog posts

    • Short-form videos and podcasts

    • Social media posts

    • Conference abstracts, posters, and presentations (when made openly available)

    • Forum discussions

Key Takeaways#

In this section, you learned that:

  • The contemporary scientific workflow involves being open about processes and products. Research products (results) include far more than just the final manuscript, which is a drastic change from the historical notion of a scientific result.

  • At every stage of the research lifecycle, there are research objects produced that we can consider results.

  • We can combat the reproducibility crisis by sharing these research objects at each stage of our research workflow.

  • There are amazing examples of research groups sharing different types of open results!


Section 2: Using Open Results#

In this lesson, you will become familiar with resources for using open results. You will learn how and when to cite the sources of the open results that you use, how to provide feedback to open results providers, and how to determine when it is appropriate to invite the authors of open results materials to be formal collaborators versus simply citing those resources in your work.

Published articles, blog posts, and forums can lead to new ideas for your own research. A technique learned from social media can be applied to a use case that you are trying to solve. There are many different ways to discover results.

How to Discover Open Results#

How do I learn about the state of research for a particular field? How do you engage in the current conversation? Researchers often begin with a search of peer-reviewed articles. This review tells you how much research has been done in a field and what conclusions have recently been reached. In most fields, going through the peer-review process can take up to a year. The ability to find pre-prints can help reduce this delay because they offer the latest findings before the publication date. However, researchers who choose to share their results before publication typically do so in the ways listed as best practices above. As you start researching a topic, how do you find all these different types of results and engage in the most relevant research?

The various stages of research, from conceptualization to dissemination of results, produce products that can be put into the public domain as “open results”. Where these results are archived, and to what degree, depends on the discipline and the author. However, some general guidelines on where to start a search for open results include:

  1. Scholarly Search Portals

  2. Web Searches

Scholarly Search Portals

Search engines like Google and Bing have radically changed how we look up information. For research results, specialized academic search engines and portals curate scientific results from researchers based on topic and field. These engines are useful for finding peer-reviewed articles.

Generic:

Discipline-specific:

Publications that provide some levels of open access are tracked in the Directory of Open Access Journals (DOAJ).

Web Searches

Open results include much more than open-access peer-reviewed publications. How do you find these alternative types of research objects?

Open communities and forums offer the best way to find research objects other than complete publications. How do you even find out whether these exist and where they are?

Once you have found a few peer-reviewed articles that are highly relevant, to find additional research objects, you can follow the authors on social media for links to their posts, blogs, and activities. There are open communities in almost every area of research - find yours! Here are different platforms to locate these conversations and resources:

  • GitHub

  • LinkedIn

  • YouTube

  • Google/Bing

  • Conference websites

  • X, formerly known as Twitter

  • Facebook

  • Medium

  • Substack

  • Stack Overflow

  • Reddit

  • Mastodon

Various research objects, including datasets and software, are frequently attached to scholarly publications as supplemental material. At other times, the source is referenced in the paper; it could be a GitHub repository, a personal or institutional website, or another storage site. These repositories can be another starting point, since you can engage directly in their discussions.

How to Assess Open Results#

“Garbage in, garbage out” – your own research products are only as good as the data used in your investigation.

If you use poor-quality data or materials from unreliable and unvetted sources as critical components of your research, you run the risk of producing flawed or low-quality science that may harm your reputation as a scientist. Therefore, it is critical to assess the quality and reliability of open-results sources before you include them in your own work.

What are the best practices for assessing the quality of sources other than research articles, such as blog posts, YouTube videos, and other research objects?

Attributes of Reputable Material#

Let’s take a look at the questions you might consider asking yourself when determining the reliability of any type of open results source.

Here, we list questions about the open results materials themselves, the website or server they come from, and broader indicators of source reliability. The more of these questions that can be answered in the affirmative, the lower the risk of utilizing the open results materials in your own research.

The material itself

  • Is the material associated with a peer-reviewed publication?
  • Are the primary data associated with the results also open-source?
  • Is the code used to generate the Open Results materials also open-source?
  • Are all fields and parameters clearly defined?
  • Is the derivation of measurement uncertainties clearly described?
  • Were any data or results excluded, and if so, were criteria provided?
  • Are authoring teams also members of the field?

The associated website/server

  • Does the host website's URL end in .edu, .gov, or (if managed by a non-profit organization) in .org?
  • Does the host website provide contact information of the author and/or organization?
  • Is the host website updated on a frequent basis?
  • Is the host website free of advertisements and/or sponsored content, the presence of which could indicate bias?

Source reliability indicators

  • Is the result reproducible? Can you interact with the data and results? Have others reported being able to reproduce the results?
  • Is the author reliable? Have you seen them publish or share results in other forums?
  • Is the result from only a single author/voice, or does it include contributions from a broader community?
  • Does the post have a significant number of likes/views and public comments? The value of a blog post with no comments or responses can be difficult to assess. Conversely, a thorough GitHub discussion with multiple views shared indicates a robust post.
  • Is the result part of an active conversation? (Is the information still relevant and current?)

Adapted from source.

Note that failure to meet one or more of these criteria does not automatically mean that the open results are of poor quality, but rather that more caution should be exercised before incorporating them into your own research. It also means that you will have to invest more effort in personally vetting the material to ensure its quality is sufficient for your purposes.
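Some of the website/server checks above can be partly automated. Below is a minimal sketch in Python of a hypothetical helper (not part of any standard tool) that flags whether a host's top-level domain is one the checklist treats as a positive signal. It is only one weak signal and no substitute for vetting the material itself.

```python
from urllib.parse import urlparse

# TLDs the checklist treats as a positive (but not conclusive) signal.
TRUSTED_TLDS = (".edu", ".gov", ".org")

def tld_signal(url: str) -> bool:
    """Return True if the URL's hostname ends in a checklist-favored TLD."""
    host = urlparse(url).hostname or ""
    return host.endswith(TRUSTED_TLDS)

print(tld_signal("https://science.nasa.gov/"))  # True: .gov host
print(tld_signal("https://example.com/blog"))   # False: generic .com host
```

Remember that a favorable TLD only lowers, never eliminates, the need to assess the material against the other criteria above.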

Reliable Example: Qiusheng Wu YouTube videos (as mentioned in the previous section). Professor Wu is an expert in his field. He presents results along with notebooks that demonstrate reproducibility. Comments on his YouTube tutorial videos represent meaningful interactions between users reproducing results and the author.

Activity 1: Find Results For Your Research#

Estimated time for activity: 10 minutes. It is an individual activity.

For this activity, review the scholarly search portals and web platforms proposed above and find studies relevant to your current research interests. In particular, try searching for social media posts and blog posts, which are typically not the first place to look for open results but might benefit you tremendously. Then, evaluate the reliability of what you find.

How to Use Open Results#

While open results benefit science and have already provided valuable societal benefits, the misuse and incautious sharing of open materials can have far-reaching harmful effects. The end-user of open results bears the responsibility to ensure that the data they reference are used in a responsible manner and that any relevant guidelines for the use of the data are followed.

How to Contribute and Provide Constructive Feedback#

Contributing to and providing constructive feedback are vital components for a healthy open access ecosystem, ensuring the long-term sustainability of the open resources by providing continual improvements and capability expansions.

In our current system, there are results creators and consumers. This scenario is a one-way street with no feedback loop, no sharing of data back to publishers, and no sharing between intermediaries.

The practice of producing open results aims to foster a system where feedback loops exist between users and makers. Users share their cleaned, integrated, or improved work with the maker. This feedback creates a symbiotic and sustainable process where everyone benefits.

Your Responsibilities as an Open Results User#

  • Users should familiarize themselves with contributor guidelines posted to open result repositories and follow the associated policies. What if there aren’t contributor guidelines? Contact the creators!

  • Always provide feedback in a respectful and supportive manner.

  • If you discover an error in Open Results materials, the ethical action to take is to contact the author (or repository, depending on the nature of the issue) and give them the opportunity to correct the problem rather than ignoring the issue or (worse!) taking advantage of a fixable issue to elevate your own research.

Different Ways to Provide Feedback#

Use GitHub Issues

  • Pro: The feedback is open, and other community members can see ongoing issues that are being addressed.

  • Pro: Contribution is archived and logged on GitHub.

See this blog for general issue etiquette while working with GitHub Issues.

Email authors

  • Con: The feedback is closed. The information is generally not propagated back to the community unless the creator releases a new version.

  • Con: No way of tracking credit.

Getting Credit for Providing Feedback#

If your feedback results in a substantial intellectual contribution to the work, it is reasonable for you to expect an opportunity for co-authorship in a future version of the open result. The associated contribution guidelines should address this possibility and manage expectations prior to your providing feedback.

Sadly, many times, contributor guidelines do not exist, and it is not clear what is “substantial”.

Open Results User Responsibilities#

  • Institutional Security Compliance: Always download code from an authoritative source and be familiar with / follow your institution’s IT security policies.

  • Licensing Policies: Understand and abide by the license(s) associated with the open results materials being used.

  • Attribution and Contribution: Provide appropriate attribution for the open results used and contribute to the open results community.

Additionally, give credit to repositories that provide open-source materials in the acknowledgment section of your paper. If the repository provides an acknowledgments template in its “About” link, follow that suggestion. Otherwise, a generic “This research has made use of <insert repository name>.” will be sufficient.

Avoid Plagiarism When Using Open Results#

Standard guidelines that you’ve been using in your research all along for providing appropriate attribution and citations of closed-access publications also apply to open-access published works.

Examples of plagiarism include:

  • Word-for-word copying without permission and source acknowledgment.

  • Copying components (tables, processes, equipment) without source attribution.

  • Paraphrasing an idea without proper source referencing.

  • Recycling one’s own past work and presenting it as a new paper.

Here is a useful guide regarding the different forms of plagiarism.

How to Cite Open Results#

Giving proper attribution to open results is an important and ethical responsibility for using open source materials. The process for citation is specific to the nature of the material.

If a paper has been formally published in a journal, then your citation should point to the published version rather than to a preprint server.

Take the time to locate the originating journal to provide an accurate citation.

Preprint server screenshot.

Preprint Server (Cite only if journal publication not available)

Source publication screenshot.

Source Publication (Always cite)

If a paper that you wish to cite is not yet accepted for publication, you should follow the guidelines of the journal to which you are submitting your paper. A preprint reference citation typically includes the author name(s), date of the most recent version posted, paper title, name of the preprint server, object type (“preprint”), and the DOI.
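As an illustration, the components listed above can be assembled into a single reference string. This is a minimal sketch with hypothetical placeholder values (the authors, title, and DOI are invented), not an official citation style:

```python
# Hypothetical preprint citation fields; every value here is a placeholder.
fields = {
    "authors": "Doe, J. and Roe, R.",
    "year": 2023,                       # date of the most recent posted version
    "title": "An Example Study",
    "server": "arXiv",                  # name of the preprint server
    "object_type": "preprint",
    "doi": "10.48550/arXiv.0000.00000",
}

# Assemble the fields in the order listed in the lesson text.
citation = (
    f"{fields['authors']} ({fields['year']}). {fields['title']}. "
    f"{fields['server']} [{fields['object_type']}]. "
    f"https://doi.org/{fields['doi']}"
)
print(citation)
```

Always defer to the target journal's own preprint-citation format when one is specified.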

Another preprint server screenshot.

At the time this lesson was prepared, the paper shown had not yet appeared as a journal publication.

For material that has a DOI

To cite all of the following, follow existing guidelines and community best practices:

  • Cite publications
  • Cite data
  • Cite software
  • Cite any other object with a DOI

Since many journals will only allow authors to cite material that has a DOI, what do you do with other types of open results?

For material that does not have a DOI

Examples include blog posts, videos, and notebooks.

  • You could contact the author and ask them to obtain a DOI (for example, by archiving the material on Zenodo).
  • Leave a comment in the comments section or on the forum letting the author know about your publication.

For other materials or interactions that were helpful for your research

  • Acknowledge communities and forums that helped you advance your research in the Acknowledgements Section. Not only does this give them credit, but it helps others find those communities.
  • Citing open research results advances science by giving appropriate credit for all parts of the research process. This is essential for the cultural shift to open science; we must give credit for all types of contributions and expect them in return. Participatory science allows more people from more places with different voices and experiences to participate in science.
  • Contributing and collaborating this way lowers the barriers (like conference fees) to participate in science and broadens who can participate.

Case Study: Giving Credit#

In the Section 1 blog post example, the researcher acknowledged the people he worked with, an article he found helpful, two different communities, and the computational environment he worked on. This is a great example of giving credit: “I would like to thank Rich Signell (USGS) and Martin Durant (Anaconda) for their help in learning this process. If you’re interested in seeing more detail on how this works, I recommend Rich’s article from 2020 on the topic. I would also like to recognize Pangeo and Pangeo-forge who work hard to make working with big data in geoscience as easy as possible. Work on this project was done on the Pangeo AWS deployment.”

Key Takeaways#

In this section, you learned:

  • Open results can be found using both Scholarly Search Portals and Web searches.

  • The reliability of a post can generally be evaluated by the trustworthiness of the website from which it originated, the engagement of community members, and the scientific rigor of its content.

  • Users of open results, as inherent stewards of the open source community, informally carry some responsibility to contribute to the community’s sustainability. This participation includes providing feedback to open results providers and developers.

  • Giving proper attribution to open results is an important and ethical responsibility for using open source materials. The process for citation is specific to the nature of the material.


Section 3: Making Open Results#

In Section 2, you learned how to use others’ results. In this section, we focus on making open results. We will start by discussing what it means to make reproducible results. Having discussed computational reproducibility practices in open software earlier in the course, here we specifically emphasize the importance of collaboration in making results open and reproducible. This begins with acknowledging that scientific results are rarely made by a single individual. We will then show how to ensure equitable, fair, and successful collaborations that acknowledge all contributions when making your open results. Once you have planned the rules of engagement, we will provide you with ways to ensure that your reporting and publication abide by open results principles and combat the reproducibility crisis.

How to Make Open Results#

Capturing the Research Process Accurately in the Making of Results#

I am aware of the reproducibility crisis and how open science can help combat it. What practical steps can I take to turn my research outputs into open results? How can I ensure that the results I share can be reproduced by others? How can I publish scientific publications that combat, rather than add to, the reproducibility crisis?

In the Ethos of Open Science, you learned about the ethics and principles underlying responsible open science practices. In Open Code, you explored and identified the right tools and methods that ensure the usability and reproducibility of your analysis. In Open Data, you developed a data management plan that can ensure the Findability, Accessibility, Interoperability, and Reusability (FAIR) of your data throughout the research process, and not just at the end when the final report from the project is released. These open science approaches directly address the root causes of the reproducibility crisis, which are a lack of openness throughout the scientific process, lack of documentation, poor description of intermediate methods, or missing data that were used at intermediate stages of the research process. In this lesson, you will learn to put all of these together to ensure that you are prepared to make your open results easy to reproduce by others.

In Section 1, we identified different research components that can be considered open results at various stages of research. In this section, we want to specifically explain what processes are involved in making them.

Case Study: Open Results from Distributed Multi-Team Event Horizon Telescope Collaboration (EHTC)#

In 2017, the Event Horizon Telescope observed the supermassive black holes with the largest apparent event horizons, M87* and Sgr A* (in the Galactic Center), on four separate days. This distributed collaboration yielded multiple petabytes of data that allowed astronomers to unveil the first image of a black hole, providing the strongest visual evidence of their existence. The EHTC website provides information about research projects, scientific methods, instruments, press and media resources (such as blog posts, news articles, and YouTube videos), as well as events, data, proposals, and publications. This project shows large-scale and high-impact work that applies open practices in making results. The different kinds of outputs shared under this project can be mapped to different stages of the research process and the teams involved in creating them.

Making Results and Crediting Contributors Fairly at Different Stages of Research#

The case study above highlights that the results associated with a project are more than a publication. By understanding how open results are created in different projects, we can gain deep insights into the processes for making them. With that goal, the rest of this section divides the process of making results into three parts:

  • making all types of research outputs;

  • recognizing all contributors;

  • combining outputs for scientific reporting and publications.

Making All Types of Research Outputs#

New ways of working with creative approaches for collaboration and communication in research have opened up opportunities to engage with the broader research communities by sharing scientific outcomes as they develop rather than at the end through summary articles. A range of research components are created throughout the research lifecycle that can be shared openly. For example, resources created in a scientific project include, but are not limited to the following:


Ideation and planning – perhaps before the research project is funded or started:

  • Research proposals
  • People and organizations involved
  • Research ethics guidelines
  • Data management plan


Data collection and exploration – research artifacts created during the active research process:

  • Project repository
  • Project roadmap and milestones
  • Resource requirements
  • Project management resources (without sensitive information)
  • Collaboration processes like Code of Conduct and contributor guidelines
  • Virtual research environment
  • Data and metadata information


Community engagement and reproducibility – most valuable during the project period:

  • Training and education materials
  • Computational notebooks
  • Computational workflow
  • Code repository (version controlled)
  • Blog posts
  • Short form videos and podcasts
  • Social media posts
  • Forum discussions (for example, when asking for feedback or troubleshooting)


Preservation and publication – expected to persist long-term:

  • Publication and authorship guidelines
  • Open access peer-reviewed articles
  • Conference abstracts and presentations
  • End of project report
  • User manual or documentation
  • Public outreach and events
Open Science Parts.

Image credit: The Turing Way project illustration by Scriberia. Zenodo.

You have already come across some of these in the previous lessons, and hopefully you can already identify which of these (or additional) outputs you are generating in your own work. To make them part of your open results, it’s important that they are shared openly with appropriate licensing and documentation so that others can read, investigate, and, when possible, reuse or build upon them.

Making Open and Reproducible Results#

Open science ultimately informs our decisions as scientists and guides the selection of approaches that contribute to making our results open at different stages. One of the main purposes of open results is to ensure research reproducibility, often explained through definitions such as the following by Stodden (2015):

“Reproducibility is a researcher’s ability to obtain the same results in a published article using the raw data and code used in the original study.”

Stodden (2015)

Using this definition, results that can be computationally reproduced by others are called reproducible results. The EHTC case study presents open results as collections of research objects created at different stages of the research process. It also provides documentation and resources that allow reanalysis and reproduction of the original results.

Ideally, anyone, anywhere, should be able to read a publication and understand the results, easily find the methods applied, and follow the procedures to achieve the same results as shared in that study. However, as you have already learned, the issue of reproducibility is prevalent across all scientific fields (refer to this Nature report). A well-intentioned scientist may share all research objects and describe all steps applied in their research, but failing to provide the research environment or other technical setup used for analyzing the data can prevent others from reproducing the results. This issue is further compounded by human bias and error: for example, individuals may not always be able to identify how their interests and experiences inform decisions that impact their research conclusions. This makes combating the reproducibility crisis an even bigger challenge.

Approaches for making open results should integrate reproducible tools and methods, such as version control, continuous integration, containerization, code review, code testing, and documentation. Furthermore, to extend the reproducibility beyond computational aspects of research, reporting, and documentation for different types of outputs and decisions should also be supplied transparently.

How to Make Different Types of Open Results#

Sharing different types of results as early as possible not only helps you find solutions faster but also helps your science be more reproducible because that openness helps you understand how to communicate your methodologies and your findings more clearly to others. Here, we provide some easy places to start creating your results openly.

Writing a forum post

Often, when first starting in research, public forums are a great place to begin understanding and collaborating with communities. Most discussion forums have a code of conduct and guidelines on best practices for participation. Some common ones that may be helpful are the guidelines from StackOverflow and Xarray, but most forums have some specific guidance. On forums, you build trust by interacting with the community, so the more you interact, the more likely people are to respond! Best practices often include making sure you are posting to the right area, using tags (when available), and including examples that document the question or issue you are having. If you review a widely viewed post on the Pangeo Discourse Forum, you can see that the author clearly states the problem they are trying to solve, references other posts on similar topics, links to a computational notebook containing an example, and gives an example of the code they are trying to run.

Writing a good blog post

Blogs are long-form articles that aren’t peer-reviewed. Blogs can be a great way to share your scientific process and findings before they are published, but also after they are published to provide another more accessible presentation of the material. For example, maybe you write a scientific article on your research that is highly technical, but then break it down in more accessible language in a blog post. Many scientists use blog posts to develop and test ideas and approaches because they are more interactive. There are science blogs all over the internet. Some popular ones are Medium, Science Bites, and Scientific American. One good way to get started is to find a blog post that you liked or found inspirational and use that as a guide for writing your own post.

Making a good video

Start small! Record a short video where you show how to do something that you struggled with or a new skill or tool that you learned how to use and post it to YouTube or other popular video platforms. Great videos often explain science concepts, ideas, or experiments to a target audience. Videos can inspire others to work in science, so talk about how you got into science, and show some of your research. There are a lot of online resources to help you out here as well!

Writing a social media post

Social media is also a good place to ask questions as you are just starting on a research topic, as well as to share all types of results. Providing a link to a video, blog post, or computational notebook, and/or sharing an image of a scientific result, is a great way to start interactions. You can draw attention to your post by using hashtags and tagging collaborators. There are many online guides for writing social media posts, and it is always good to look at what others in your area are doing. Responding to comments and engaging with others can help you improve your research and learn about new tools or methods.

All these different ways of sharing information will help make your published report or article better. And as you start working more in the open with others, think about how collaborations will work and how you will give credit. All resources can be centralized through reports and documentation on a repository or website so that anyone, including the ‘future you’, can find them later.

More ways to communicate your work can be found in a guide for communication in The Turing Way.

Maintaining Ethical Standards#

Open science, as learned in the Ethos of Open Science, should maintain the highest ethical standards. This can be enabled through the involvement of diverse contributors in the development of scientific outcomes. Participatory approaches allow multiple perspectives and expertise to be integrated into research from the start and ensure that peer review happens for all outputs in an iterative manner, not just for the articles at the end.

In making and planning to share open results, you can apply the “as open as possible, as closed as necessary” principle. This means protecting sensitive information, managing data protection practices where necessary, and not carelessly sharing sensitive data or people’s private information that can be misused. Online repositories, such as GitHub and GitLab, allow online interaction in addition to serving the technical purpose of version control and content hosting. For example, you can use issues and a project board to communicate what is happening in a project at any given point. The use of Pull Requests signals an invitation for peer review on the new development of code or other content. Thanks to a number of reusable templates, you don’t have to set up repositories from scratch. For example, you can directly use a template for reproducible research projects.

Role of Contributors in Open Science#

Collaboration is central to all scientific research. The positive impact of collaboration is achieved when diverse contributors are supported to combine a range of skills, perspectives and resources together to work towards a shared goal. Projects that apply open and reproducible approaches make it easier for diverse contributors to be involved and get recognized for their contributions while supporting the development of solutions that they can all benefit from.

Involving and recognizing the roles of all contributors in making open results is an important part of open science, which we will discuss next.

EHTC Case Study: Recognizing All Contributors#

EHT map.

A map of the EHT. Stations active in 2017 and 2018 are shown with connecting lines and labeled in yellow, sites in commissioning are labeled in green, and legacy sites are labeled in red. From Paper II (Figure 1). IOPscience. https://iopscience.iop.org/journal/2041-8205/page/Focus_on_EHT

The Event Horizon Telescope (EHT) team involved 200 members from 59 institutes in 20 countries, from undergraduates to senior members of the field. They used an array of eight radio telescopes at six geographic locations across the USA, Latin America, Europe, and the South Pole. Collaborators worked from different geographic locations, had access to different instruments, collected data generated by telescopes at different sites, and applied skills from across different teams to create groundbreaking results. Each contributor was acknowledged across different communication channels and given authorship in publications. EHTC also supports the “critical, independent analysis and interpretation” of their published results to facilitate transparency, rigor, and reproducibility (EHTC website).

Making Open Results Starts with Contributors!#

Making different research components and preparing to share them as open results involve a range of activities. Behind these activities are the contributors who engage in various responsibilities that include, but are not limited to:

  • Conceptualizing the idea

  • Designing the project

  • Serving as advisor or mentor

  • Conducting experiments as a student, researcher, or research assistant

  • Creating tools essential for carrying out the research

  • Providing data expertise

  • Developing software

  • Providing specialized expertise and support

  • Managing community and project requirements

  • Providing feedback on the results

  • Designing experiments and interpreting results

  • Manuscript writing and review

  • And more!

Too often, conversations about contribution and authorship take place toward the end of a project or when a scientific publication is drafted. However, as you learned in the previous sections, research outputs are generated throughout the lifetime of a research project. Therefore, it is important to build an agreement at the beginning of the project for how contributorship in the project will be managed.

Developing contribution guidelines and contributor agreements requires collaboratively defining what counts as a contribution in your project, who among the current contributors will get authorship, who will be acknowledged as a contributor, what the significance is of the order in which authors are listed in a scientific publication, and who makes these decisions. Ensuring that all collaborators understand and agree to these guidelines before beginning the project is also important.

Contributors and Authorship#

First and foremost, you must ensure that anyone who has contributed to the research project has their contributions recognized. With that shared understanding, in this section you will explore what recognition as a contributor or author in your research project might look like.

Let’s first define contributor and author roles. A contributor is anyone who has done any activity that made it possible for the research to happen and results to be created, published, or shared. An author of an open result is a contributor who has made a substantial contribution to the conception or design of the work, or to the acquisition, analysis, or interpretation of the data for the published work.

Are All Authors Contributors and Vice Versa?#

An author is a contributor who actively carries out one or several of the tasks listed above (National Institutes of Health (NIH) and ICMJE). All authors are contributors, but not all contributors are authors; for example, someone serving as a mentor, trainer, or infrastructure maintainer may not be. Ideally, all contributors are given the opportunity to author research outputs.

Given the importance traditionally placed on authorship in scientific publication and the fuzziness of the definitions (that often contain relative terms such as “substantial” or “extensive”, leaving too much room for interpretation), it is not surprising that determining who among the contributors gets to be an author can lead to biased or unfair decisions, disputes between contributors, or at the very least leave someone resentful and feeling unappreciated.

There is no single approach for recognizing contributors as authors, but here is what you should consider:

Group power dynamics & equity (e.g., seniority, systems of oppression)

Consider this hypothetical scenario: You are a postdoctoral fellow and the leading author of a research project. A rotating student spends 4 months in the lab helping you set up and perfect the experimental protocol that you will then use to carry out the experiments needed to answer your research question. They may even help you collect some preliminary data, but then they leave and later decide to join another lab. Would you provide authorship for the student?

It would be unethical not to give authorship or credit to someone who has provided significant help and contributed to the success of a research project, even when they are no longer involved. A fair path in this scenario could be to contact the previous contributor and involve them in writing a relevant section of the manuscript.

The type of contribution

The NIH guidelines for authorship outline what type of contribution does or does not warrant authorship. Each contribution is represented on a sliding scale with no rigid cutoffs, and some contributions are given more weight than others. For example, for “design and interpretation of results”, nearly all types of “original ideas, planning, and input” result in authorship, whereas simply supervising the first author usually does not (unless they are also contributing to the paper, of course). This is just one example. You will need to think about what this looks like for your own work!

Clear communication about roles and responsibilities early in the project and guidelines for how credit will be determined can help mitigate some of these issues.

Diverse Role of Contributors#

It is important to set a reference for each research team/project about the different kinds of responsibilities and opportunities available to different contributors and how each of them is acknowledged. The CRediT taxonomy represents roles typically played by contributors to research in creating scholarly output. Below, we provide a table with research roles that extend the CRediT taxonomy to include broader contributorship (Sharan, 2022). Using this as a starting point, open dialogue and discussion among team members can help set a shared understanding and agreement about the diverse roles of contributors, including authorship of publications. The distinction between contribution types can help set clear expectations about responsibilities and how they are recognized in a project.

| Research Role | Definition |
| --- | --- |
| Project Administration | Management and coordination responsibility for the research activity planning and execution |
| Funding Acquisition | Acquisition of the financial support for the project leading to the research and publications |
| Community Engagement | Connecting with project stakeholders, enabling collaboration, identifying resources, and managing contributors’ interactions |
| Equity, Diversity, Inclusion and Accessibility (EDIA) | Inclusive approaches to collaboration and research, involvement of diverse contributors, accessibility of resources, consideration of disability, neurodiversity, and other considerations for equitable participation |
| Ethics Review | Verifying whether the research project needs to undergo an ethics review process |
| Communications and Engagement | Communications about the project and engagement with stakeholders beyond the project and institution |
| Engagement with Experts and Policymakers | Pre-publication review, external advisory board meetings, regular reporting, post-publication reporting, and actively reaching out to relevant policymakers |
| Recognition and Credit | Assessing incentives, creating a fair value system, and fairly recognizing all contributors |
| Project Design | Technical planning, expert recommendations, supervision or guidance, developing project roadmaps and milestones, tooling and template development |
| Conceptualization | Ideas; formulation or evolution of overarching research goals and aims |
| Methodology | Development or design of methodology; creation of models |
| Software | Programming, software development; designing computer programs; implementation of the computer code and supporting algorithms; testing of existing code components |
| Validation | Verification, whether as a part of the activity or separate, of the overall replication/reproducibility of results/experiments and other research outputs (generalizable) |
| Investigation | Conducting the research and investigation process, specifically performing the experiments or data/evidence collection |
| Resources | Provision of study materials, reagents, materials, patients, laboratory samples, animals, instrumentation, computing resources, or other analysis tools |
| Data Curation | Management activities to annotate (produce metadata), scrub, and maintain research data (including software code, where it is necessary for interpreting the data itself) for initial use and later reuse (including licensing) |
| Writing - Original Draft | Preparation, creation and/or presentation of the published work, specifically writing the initial draft (including substantive translation) |
| Writing - Review & Editing | Preparation, creation and/or presentation of the published work by those from the research group, specifically critical review, commentary, or revision, including pre- or post-publication stages |
| Visualization | Preparation, creation and/or presentation of the published work, specifically visualization/data presentation |
| Supervision | Oversight and leadership responsibility for the research activity planning and execution, including mentorship external to the core team |

How to Give Open Recognition#

To openly and fairly recognize all contributors, list their names along with the types of contributions they made in the project documentation. In manuscripts, it is common practice to describe contributors’ roles in the ‘acknowledgments’ section, using CRediT or a similar taxonomy as provided in the table above. All contributors should be encouraged to provide ORCID iDs associated with their names to make them identifiable.

Contribution statements in documentation and manuscripts can specify who did what in the official results. This is great for transparency. It is also a great way to guard against unfair power dynamics. Details about contribution type show explicitly who works on which parts of results and make it easy to give fair authorship. For example: “Pierro Asara: review and editing (equal). Kerys Jones: Conceptualization (lead); writing – original draft (lead); formal analysis (lead); writing – review and editing (equal). Elisha Roberto: Software (lead); writing – review and editing (equal). Hebei Wang: Methodology (lead); writing – review and editing (equal). Jinnie Wu: Conceptualization (supporting); Writing – original draft (supporting); Writing – review and editing (equal).”
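Because contribution statements like the one above follow a regular pattern (name, then roles with a degree of involvement), they can be generated from a structured record kept alongside the project documentation, which helps keep the manuscript statement and the contributor list in sync. A minimal sketch, with hypothetical names and a simplified role list:

```python
# Sketch: generate a CRediT-style contribution statement from structured
# records. Names and role assignments below are hypothetical examples.

contributors = {
    "Kerys Jones": [("Conceptualization", "lead"),
                    ("Writing - Original Draft", "lead"),
                    ("Formal Analysis", "lead"),
                    ("Writing - Review & Editing", "equal")],
    "Pierro Asara": [("Writing - Review & Editing", "equal")],
}

def credit_statement(contributors):
    """Format {name: [(role, degree), ...]} as a CRediT-style statement."""
    parts = []
    for name, roles in contributors.items():
        role_text = "; ".join(f"{role} ({degree})" for role, degree in roles)
        parts.append(f"{name}: {role_text}.")
    return " ".join(parts)

print(credit_statement(contributors))
```

Keeping the records in one place (for example, a file in the project repository) means the same data can feed the manuscript, the repository’s contributors page, and archive metadata.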

If a GitHub repository and website exist, a dedicated page should be created to list and recognize all contributors. If someone made a minor contribution to the paper, code, or data, you could still add them as a contributor on GitHub and as an author on the Zenodo release. Engaged collaborators and contributors not already involved in making research outputs should be given the opportunity to contribute to open results, such as through presentations, posters, talks, blogs, podcasts, data, and software, as well as articles.

A standalone contribution guideline should be created for each open project, even when that means reusing an existing draft that the research team has used in another project.

Note that this is different from “contributing” guidelines that describe “how” to contribute (for example, on code repositories). Contribution guidelines should describe contribution types and ways to acknowledge them, as discussed above.

Contribution guidelines are not set-in-stone, but rather:

  • Are discipline-dependent

  • Can be adapted to your unique situation

You can begin by reviewing the guidelines by NIH and ICMJE for authorship contributions.

Notice that many categories and criteria for authorship, such as those represented in the NIH guidelines’ sliding scale, may be decided differently across fields. For example, in some fields, providing financial resources for a research project always warrants authorship; in others, this is not the case.

Some projects may not follow traditional manuscripts as their outputs. For example, if software is a primary output from a project, there may be a need to define specific roles regarding code contributions. You can work with your research team to create a version of CRediT Taxonomy for your project, such as shared in an expanded version of the table above.

When different kinds of contributorship have been identified, clarify how different contributors will be involved and acknowledged. This may include recommended communication and collaboration processes for the team members, as well as recognition and credit for different kinds of contributions they make.

For additional tips on how to acknowledge different kinds of contributors to developing a resource, including authorship, check out Acknowledging Contributors The Turing Way.

If working with online repositories such as GitHub, an app like ‘all-contributors’ bot is a great way to automate capturing all kinds of contributions, from fixing bugs to organizing events to improving accessibility in the project. Neuromatch uses this bot to keep track of contributors to the courses!
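The all-contributors bot records contributions in a `.all-contributorsrc` file at the repository root, which it updates as contributors are added. A minimal sketch of what such a file might look like (the project and usernames are hypothetical; consult the all-contributors documentation for the full set of fields and contribution-type keys):

```json
{
  "projectName": "example-open-results",
  "projectOwner": "example-lab",
  "files": ["README.md"],
  "contributors": [
    {
      "login": "jdoe",
      "name": "Jane Doe",
      "contributions": ["code", "doc", "ideas"]
    },
    {
      "login": "asmith",
      "name": "Alex Smith",
      "contributions": ["eventOrganizing", "a11y"]
    }
  ]
}
```

The bot then renders this record as a contributors table in the files listed under `files`, so recognition stays visible in the repository itself.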

More systematic work is being undertaken by the Hidden REF, which has constructed a broad set of categories that can be used to celebrate everyone who contributes to research.

There are several infrastructure roles, such as community managers, data stewards, product managers, ethicists, and science communicators, that are also being recognized as valued members of research projects, with the intention of providing leadership paths for technical and subject-matter experts, even when their contributions can’t always be assessed in tangible or traditional outputs [Mazumdar et al. 2015, Bennett et al., 2023].

The Declaration on Research Assessment (DORA) is also a good resource to understand what researchers, institutions, funders and publishers can do to improve the ways in which researchers and the outputs of scholarly research are evaluated.

Combining Open Results for Scientific Reporting and Publications#

Scientific publications have traditionally remained one of the most popular modes of reporting and publication. Over the last decade, it has become standard practice to submit pre-peer-review manuscripts to preprint servers (such as arXiv) to speed access to research before the peer-reviewed journal articles are published (discussed in Section 2). The publication system has also evolved massively: journal articles are no longer limited to overviews and summaries of research but can also share software, data, education materials, and more.

EHTC Case Study: Capturing Results on Activities Ranging From Collaboration to Observations, Image Generation to Interpretation#

Black hole shadow image.

The polarized image of the M87 black hole shadow, as observed on April 11, 2017, by the EHT (left panel), and an image from the EHT Model Library with a MAD magnetic configuration (right panel), with a list of papers describing different sets of results.

Across several preprints and eight peer-reviewed letters, EHTC presented open results issued from different teams on instrumentation, observation, algorithm, software, modeling, and data management, providing the full scope of the project and the conclusions drawn to date.

Open results such as reports, publications, code, white papers, press releases, blog posts, videos, TED talks, and social media posts add to the comprehensive repertoire of open results supported by EHTC. Resources are centralized on the EHTC website, GitHub organization and YouTube channel among others to provide easy access to all open results.

It’s important to highlight that their efforts have led to independent reanalysis and regeneration of black hole images. Specifically, Patel et al. (2022) not only reproduced the original finding but also contributed additional documentation, code, and a computational environment as an open-source containerized software package to ensure future testing. Some of the original authors reviewed this work and made their comments also available online (Authorea).

How Do I Connect Open Results to Make Reproducible Publications?#

If not considered from the start, it can become challenging to ensure result reproducibility at the publication stage. Assuming that you have maintained open results considering their reproducibility, you can start assembling them to connect with the final reporting and publication with appropriate references to previous studies.

  • Before writing your manuscript, assess each output to make sure that an appropriate license is attached for reuse, documentation has been provided, and contributors are clearly listed. You can create a version of record and point to a permanent identifier, for example via Zenodo, so that the link never breaks when you share the output in a public repository (such as GitLab/GitHub) or in manuscripts with a visible list of contributors.

  • Your publications can be created individually (as in the EHTC case study) or by combining several outputs or pieces of information in manuscripts. These will include resource requirements, dependencies, software, data, the repository where code is shared with documentation, and contributor information, among other research artifacts.

  • The manuscript itself will describe the research questions and methods, as well as individual figures and tables explaining the results. When writing a manuscript, you can begin with the figures by packaging the data, code, and parameters used, ensuring that the information represented can be reproduced. You can find a detailed checklist in the publication by Gil et al. (2016).
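One lightweight way to attach the license, identifier, and contributor metadata mentioned in the first step above to a repository is a `CITATION.cff` file, a metadata format that GitHub and Zenodo can read to display citation information. A minimal sketch (the title, names, ORCID, and DOI below are placeholders to be replaced with your own):

```yaml
# Minimal CITATION.cff sketch; all values below are placeholders.
cff-version: 1.2.0
message: "If you use this work, please cite it using the metadata below."
title: "Example open-results analysis pipeline"
version: "1.0.0"
doi: "10.5281/zenodo.0000000"   # placeholder: use the DOI minted by Zenodo
license: "MIT"
authors:
  - family-names: "Doe"
    given-names: "Jane"
    orcid: "https://orcid.org/0000-0000-0000-0000"
```

Because the file lives in the repository, it is versioned and archived together with the code and data it describes.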

As demonstrated in the EHTC case study, a final step towards making open results could be to create a meta article and/or simple website/git page that centralizes all your research outputs. Different parts of research (individual open results) can be accessed centrally with details, including open recognition for all contributors.

If you are looking for concrete actions you can take to make open results, pick one of these four items:

  • Improve how you define contributorship in your project and how authorship is assigned.

  • Ensure the data or software in your paper is uploaded to Zenodo with license and documentation, including metadata, and that the DOI is posted to your scientific report and publication.

  • Ensure that the process you use to collect data and perform its analysis, including all the dependencies and methods used in your data analysis pipeline, are clearly described to allow others to reproduce your results.

  • Create a central repository or a simple git page that gathers all research outputs along with a contributors list.
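If you take the Zenodo route above, the upload can be scripted against Zenodo's REST API. The sketch below only constructs the metadata payload and notes where the HTTP calls would go; the title, author, ORCID, and token are placeholders, and the field names follow Zenodo's deposit API as documented, so check the official API docs before relying on them.

```python
import json

def zenodo_metadata(title, creators, license_id="cc-by-4.0", upload_type="dataset"):
    """Build the JSON metadata body for a Zenodo deposition.
    `creators` is a list of {"name": "Family, Given", "orcid": ...} dicts."""
    return {
        "metadata": {
            "title": title,
            "upload_type": upload_type,
            "creators": creators,
            "license": license_id,
        }
    }

payload = zenodo_metadata(
    "Example analysis outputs",  # hypothetical record title
    [{"name": "Doe, Jane", "orcid": "0000-0000-0000-0000"}],
)
body = json.dumps(payload)

# The actual upload (not performed here) would:
#   1. POST this body to https://zenodo.org/api/deposit/depositions
#      with your personal access token,
#   2. upload files to the "bucket" link returned in the response,
#   3. POST to .../actions/publish to mint the DOI.
```

Scripting the deposit this way makes the archiving step itself reproducible and easy to repeat for new versions.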

Key Takeaways#

The steps needed to make open results are not intractable. In fact, they are things we can do on a regular basis to ensure that all research artifacts can be shared later as open and reproducible results. In this section, we learned:

  • Approaches for making open results.

  • The importance of collaboration in making open results.

  • How to recognize and credit all of the contributors who help produce results.

  • How to combine different open results to create scientific reports and reproducible outputs.


Section 4: Sharing Open Results#

In the previous section, you learned about how to make reproducible results. Now, we can finally think about how to best share those results. In this section, we will place emphasis on publishing manuscripts as open access. You will learn what subtleties to consider when determining what journal to publish in, including how to make sense of a journal’s policies on self-archiving. Finally, we discuss some commonly held concerns about sharing open-access publications and how to overcome them. Ultimately, we want to ensure that you have confidence in your decision to publish as open access.

When to Share#

Sharing and talking about your research as you are doing it, as well as engaging with other scientists, will increase the robustness of your work. Ask questions. Share what you are working on. You will find that many involved in the scientific community want to help. The more you engage, the larger the audience and the more impact you will have when that ‘final’ publication is published.

In the past, scientists made new connections and sought collaborators through letters and at conferences. However, this way of doing science tended to restrict who could participate. Today, most of these discussions take place on the internet, which has enabled new avenues for participatory science, open to all.

The platforms where you share research depend on what you want to share. How will this influence who you have the ability to engage with?

Let’s start with sharing in smaller groups (workshops and conferences) and move to larger audiences. There are distinct reasons for communicating results to different sizes of groups, as explored in the following sections.

At Workshops and Conferences#

Many of us attend scientific conferences, workshops, and other gatherings to discuss our science with peers. The costs associated with attendance and travel to these events may limit who has access to the material presented there. At these events, scientists often give talks or present posters that are not yet peer-reviewed to invite feedback from the community and potentially recruit collaborators. These interactions are important for improving research projects and are often done when a project is still ongoing so that researchers can gather feedback early in their scientific process.

It is important to think about what audience you will be reaching at an event. Conferences have different policies about open access to materials presented at an event. Consider what you are sharing and who you want to share it with. For example, not all events provide long-term open access to workshop materials after the event. If you want to reach a larger audience or preserve the materials long-term, as a scientist, you have options to license and publish presented materials yourself (for example, using Zenodo with a DOI) if an event doesn’t do so.

Other Forms of Interactive Feedback#

Other forms of sharing can serve a similar purpose, documenting and communicating your results and/or software packages, while allowing for additional flexibility and openness! There are a number of additional resources that you can use:

  • Blog posts and online articles

  • Short-form videos and podcasts

  • Computational notebooks

  • Social media posts

  • Forum discussions

These different pathways allow for the dissemination of null results, intermediate science updates and/or software improvements. These alternative ways of sharing your work can benefit your research by facilitating extended dialogue between you and collaborators, and even the general public. Additionally, the public has easier access to these forms than they do to conferences.

Publishing Reproducible Reports and Publications#

An open-access report and paper can be reproducible when its data, software, and content are made available to the readers following best practices. There is a growing list of resources documenting how to make open results reproducible (such as The Turing Way and FORRT).

There are several examples (discussed in these lessons) that demonstrate how we can integrate technical and collaborative solutions to enable reproducibility. For example, executable notebooks allow interactivity and testing, training workshops invite feedback for improvement, and GitHub/GitLab enable community-based open review.

Scholarly Journals

Publishing work in a peer-reviewed journal forms the traditionally written basis of how we share our science and is important for communicating scientific detail and rigor to colleagues. Academic journals also act as a long-term archive of scientific research papers. For many scientists, publishing in peer-reviewed journals and receiving citations are key factors in how they are evaluated for career advancement, position appointments, committee memberships, and honors.

For open-access publication, authors typically pay an Article Processing Charge (APC) that can range from $200 to $12,000 USD; higher-profile journals often charge higher fees. Access to articles, meanwhile, has traditionally been restricted by paywalls that require a subscription or a per-article charge. Journals have different options for making your published work accessible to various communities.

Who Has Access to Journal Subscriptions?

Paywalls limit who can access scientific research. This barrier acts to limit who can participate in science and erodes public trust in results. Part of open science is ensuring worldwide access to research.

Open Access Journals

Open-access journals are peer-reviewed journals that are more accessible because they don’t require readers to have a subscription or pay to access the content. However, open-access journals often charge additional fees to the author. Like other peer-reviewed articles, open-access articles are archived and present a formal discussion of scientific ideas, interpretations, and conclusions. They form the basis of how researchers share results.

Activity 2: Read the Open Access Policies of Publishers That You Use#

Estimated time for activity: 10 minutes. It is an individual activity.

In this activity, you will learn how to access information about a journal’s data archive policies. The Directory of Open Access Journals (DOAJ) provides an extensive index of open access journals around the globe. The DOAJ can be used to look up information, including data archiving policies, for journals that publish research. Let’s open up this website and look up the policies specific to your most-used journals.

  1. First, navigate to the DOAJ website.

  2. Type in the name of one of the following journals in the search box, and then click on the yellow “SEARCH” button.

  • Atmospheric and Oceanic Science Letters

  • Frontiers in Computational Neuroscience

  • Brain Communications

    Note: You may input any journal desired, but for this exercise, use one of those listed to see the Sherpa/Romeo link that is listed in Step 5.

  3. The search results may show more than one match. Select the desired journal within the search results by clicking on the journal name.

    A dashboard appears, giving information regarding publication fees, waiver policies, the type of open license used, and other information on multiple displayed titles.

  4. Click on the “archiving policy” link appearing in one of the displayed boxes as seen here. This will provide links to extensive information regarding the journal’s open access policies for the manuscript itself:

Archiving policy.
An extensive amount of information will be presented, including details on the publishing policies specific to the selected journal.
  5. Alternatively, to get a more condensed view of the journal’s policies, return to the DOAJ dashboard on the About page with the multiple boxes displayed and click on the “Sherpa/Romeo” link as shown here.

Sherpa/Romeo.
  6. On the Sherpa Romeo page, click on the journal name that is displayed in the list (the only journal displayed).

Sherpa/Romeo UI.
  7. When you view the page, you will see that it consolidates and summarizes the open-access policies for that journal and associated materials. The published version is likely to be the most relevant (see red box in figure).

Published version guidelines.
  8. Review the page and determine which license the journal you selected has defined for the reusability of manuscripts.

N.B.: Currently, DOAJ hasn’t updated its links to the new version of the Sherpa Romeo service, which is now Jisc’s Open Policy Finder. Thus, steps 6 to 8 are outdated but should work in the future. For now, after being redirected to Jisc’s website, simply enter the journal name in the search field once more.

This is an example of a site that you can use to determine if a journal’s policy is consistent with how you wish to publish your open access results. Journal policies should always be reviewed and considered during the early planning phase of your project and well before submitting your manuscript for publication.

How to Share#

Perhaps the single most important step to make your results open is to assign them a globally unique and persistent identifier. This will give you a single code, URL, or number that you can use to uniquely refer to a research object. Any derived research object can use this identifier to link to it and create a traceable and rich history of use and development. Crucially, this identifier can be used by others to cite and credit your work (source).

The identifier must also be persistent. This guarantees that the identifier points to the same research object for a long period of time. What counts as “persistent” is, of course, a matter of degree since even the most stable identifier probably won’t survive the Sun engulfing the Earth in a few billion years. In this context, “persistent” implies that it is registered in a database managed by an organization or system that is committed to maintaining it as stable and backward compatible for the foreseeable future.

For example, URLs (for example, a personal website, GitHub repository, or cloud storage) are notoriously not persistent, since their contents can change frequently or become invalid without maintenance. On the other hand, journal publications have a Digital Object Identifier (DOI) whose persistence is guaranteed by the International DOI Foundation.

As well as uniquely identifying each research object, it is important to be able to uniquely identify and cite all the authors and contributors. For this, it is recommended to get the permanent digital ID of each of the authors and contributors. ORCID (Open Researcher and Contributor ID) is an online service where you can get a permanent digital identifier.

Here are some examples of globally unique and persistent identifiers:

Digital Object Identifier

The Digital Object Identifier is provided by the International DOI Foundation, which ensures that each ID is unique and ensures that a DOI link always links to the correct object. Example: 10.1371/journal.pone.0230416.
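As a small illustration of how a DOI is used in practice, any bare DOI can be turned into a resolvable URL by prefixing it with the doi.org resolver. The check below is a loose sketch (every DOI prefix starts with "10."), not the full DOI syntax specification.

```python
def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable URL via the doi.org resolver."""
    doi = doi.strip()
    # Every DOI has a prefix starting with "10." and a suffix, separated by "/".
    prefix, _, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://doi.org/{doi}"

print(doi_to_url("10.1371/journal.pone.0230416"))
# → https://doi.org/10.1371/journal.pone.0230416
```

Because the resolver is maintained by the International DOI Foundation, the resulting link keeps working even if the publisher's own website moves.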

ISBN-13

This is an International Standard Book Number, which publishers purchase from the International ISBN Agency. Example: 978-0735619678.

The Internet Archive

The Internet Archive captures snapshots of websites, and its links are quite stable. Even if not ideal, it’s a handy tool for easily creating identifiers for websites.

Routes for Open Access Publishing#

Pathways to Open Access Publishing Diagram.

Routes to publishing openly. The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 license. Original version on Zenodo. http://doi.org/10.5281/zenodo.5706310

The most common types of open-access publishing are Green, Gold, and Diamond.

Green Open Access Publishing

Green Open Access is the process of self-archiving. The self-archiving movement aims to provide tools and assistance to scholars to deposit and disseminate their refereed journal articles in open institutional or subject-based repositories. You may choose to self-archive your work to make it more discoverable and/or after you’ve published it in a subscription journal to ensure there is an open version of your paper.

The Registry of Open Access Repositories contains a list of repositories that are available for researchers to self-archive. At the beginning of 2019, there were more than 4,000 repositories. It is important to find your self-archiving community!

Gold Open Access Publishing

In Gold Open Access Publishing, authors pay an Article Processing Charge (APC) to a journal so that the journal publishes the final version of the article under an open access license, making it permanently and freely available online to anyone. The authors retain the copyright of their article, usually via a Creative Commons license of their choice, which dictates what others can do with the article. A common criticism of Gold Open Access publishing is the cost.

APCs are generally around $2,000 USD or, in some cases, more, which can be prohibitive for authors around the globe. Some publishers offer discounts or waivers to authors from countries classified by the World Bank as low-income economies, or APCs may be covered by your funder as part of your grant.

Diamond Open Access Publishing

Diamond Open Access publications are those where there is no cost either for reading or for publishing an article. Diamond Open Access journals either have very low costs, because they build on existing infrastructure and volunteer effort, or are supported directly by foundations or institutions. Diamond Open Access publications typically allow authors to retain copyright, with the final version of the article published under an open access license.

Pros and Cons of Preprints#

When publishing in a peer-reviewed journal, you can decide to share a preprint. A preprint is a version of a paper prior to its publication in a journal. This can be the author’s version of the accepted manuscript after peer review or a version prior to submission to a journal.

The accepted manuscript is the final, peer-reviewed version of the article that has been accepted for publication by a publisher. The accepted manuscript includes all changes made during the peer review process and contains the same content as the final published article, but it does not include the publisher’s copy editing, stylistic, or formatting edits that will appear in the final journal publication (i.e., the version of record).

Many journals provide preprint services. If they don’t, there are many public preprint servers available. Often, the funding agency will have a preferred public preprint server.

Advantages of publishing work as a preprint

  • Quickly disseminate findings to communities in a timely manner.
  • Many field-specific preprint servers (e.g. arxiv.org, biorxiv.org, essoar.org) are free to both upload and read.
  • Receive community feedback on your work as it is being done.

Potential disadvantages

  • Work may be shared with critical errors that may have been caught in peer review.
  • In some fields, there is a perception of lessened reliability or quality of research published as a preprint.
  • Some journals do not allow or accept articles if they have been submitted to a preprint server.

What to Consider When Making Preprints#

When deciding to preprint your work, you will need to check:

  1. The copyright policy of the journal with which you aim to publish.

  2. The version of the paper that can be deposited.

  3. When the paper is allowed to be made publicly available.

Additional Reading#

Read the story about how Joanne Cohn’s email list for preprints led to Paul Ginsparg’s development of arXiv.

Other Considerations When Sharing#

Who is Sharing?#

When writing an OSDMP, it’s important to include a plan for the roles and responsibilities needed to share your results. As discussed in Section 3, your community will consist of members in different roles – some actively engaged, some with only a passing interest. Having a clear plan for sharing open results and how credit will be given will help everyone understand their contributions and roles and minimize conflict.

Section 3 describes in detail the different roles that people may play in sharing results. These roles should be clearly described in the OSDMP.

Predatory Publishers#

Predatory publishers are generally for-profit publishers that charge a publishing fee but provide few of the quality checks expected of scholarly publications. They sometimes use the appeal of open access to entice authors to publish with them. If you are unsure whether a publisher may be predatory, checking with your library staff is a good place to start.

There are many red flags that can signal a predatory publisher:

  • There is an urgency and request for an extremely quick turnaround. A very fast publication time might indicate a less rigorous peer-review process.

  • Written English in correspondence is often poor quality with many grammatical errors. (Though it’s important to remember that this alone does not indicate predatory behavior, as grammatical mistakes can be made for innocent reasons, such as being a non-native speaker.)

  • The journal subject is nonspecific.

  • The email is often unsolicited, even if it claims to refer to a previous paper of yours, and may open with an inaccurate or generic salutation such as “professor”.

  • They emphasize ISSN indexing and/or impact factors, even when the journal in question doesn’t have one. Consider the Journal Citation Indicator (JCI) in addition to the Journal Impact Factor (JIF).

  • The publisher/journal sends multiple emails soliciting manuscripts, special issues, and editorial roles.

  • They have a high number of special issues, such that the majority of the papers published appear in special issues.

  • Their name resembles the name of a prestigious journal.

  • They have a high self-citation rate, such as over 20%.

  • They have a very high acceptance rate of submitted papers.

  • They send frequent requests to submit/serve as editors.

Below are some final thoughts on what or what not to consider when deciding where to publish. As with many considerations you will encounter in academia, sometimes deciding the best place to publish will be determined by word-of-mouth conversations with peers. Read more on NOAA’s guidance on predatory publishing.

Common Questions About Sharing Results#

Sharing in different ways, especially without peer review, can be intimidating. Maybe you are worried about the following questions:

  • What if an open result is wrong?
    A tweet, post, or video is only a snapshot in time of a research result. It is understood by all working professional scientists that we are constantly learning and discovering new things. Making reproducible results will necessarily include different versions and revisions of an idea as it develops.

  • I have already published my science as an open result, so do I need to respond to community feedback forever?
    As long as you have done everything to make your work reproducible, you don’t need to worry. Open science can’t be carried solely by a single person. Open science communities can continue to update, refine, and develop your open science result if your work has been shared and openly licensed.

    If you are able to address a question or a concern about your prior research, that’s great. It is also an ethical response to acknowledge that this is research that you are no longer actively involved with, but allow others to continue the work that you began.

  • What if I can’t do everything? Am I a bad open scientist?
    The short answer is no! You have only a limited amount of time. Even with collaborators, you can’t possibly do everything.

Sharing open results improves science - it is faster, more accessible, and more collaborative. In this section, you have learned about all the different ways you can share open results. Think about how you might share something you are working on now!

Key Takeaways#

In this section, you learned:

  • When to share open results and the different ways in which they can be shared. This includes peer-reviewed publications, conference proceedings, blog posts, videos, notebooks, and social media.

  • How to share open results, including considerations around the license for the publication, routes for open access publications (Green, Gold, Diamond), and preprints as part of the publication process.

  • Considerations around sharing, including predatory publishers and common concerns around openly sharing results.


Section 5: From Theory to Practice#

In the previous sections, we learned about various ways to share our science and what steps we should think about when sharing. In this section, we tie the concepts from previous sections together with some specific guidance for writing the Sharing Results section of an Open Science and Data Management Plan (OSDMP). We will also reflect on how our society and technology constantly evolve, as does the way we do science. A new technology with the potential to radically alter the way we do and share science is artificial intelligence (AI), particularly large language models. These AI tools are already changing how we interact with written text. In this section, we discuss some of the ways that AI is affecting, and will affect, how we do and share our science.

Writing an OSDMP: What to Include in the OSDMP for Sharing Results Openly#

The process within an Open Science and Data Management Plan (OSDMP) to share data and software is covered in other modules, so here we will discuss how to share the other types of research outputs. Most proposals require that you include plans for publications such as peer-reviewed manuscripts, technical reports, books, and conference materials.

Though not required, it can be a good idea to include plans for making your results publicly accessible in ways other than traditional publishing, e.g. online blog posts, tutorials, or other materials. After all, writing an OSDMP is often required for funding requests, and this can be a way to show proposal reviewers that you are thinking about how to best share your science.

Activity 3: Pen to Paper#

Estimated time for activity: 10 minutes. It is an individual activity.

Write a sample results section of an OSDMP that details how you would plan to make your results open. Think about an example from your research and what details you would need to include to convince reviewers that you will share open access results.

Example 1: This activity will result in 2 peer-reviewed publications that will be published green open-access. Pre-prints will be archived in PubSpace.

Example 2: This activity will result in the creation of computational notebooks, 4 conference abstracts and posters, 2 peer-reviewed manuscripts, and 2 online plain-language articles summarizing our results. Peer-reviewed publications will be published green open-access, and pre-prints will be archived in PubSpace or the journal’s open-access preprint server. All other materials will be archived at Zenodo, assigned a DOI, and given a CC-BY license or a permissive software license.

For these examples, what other information or details could be added? If you were planning to write a tutorial about your science, what would you include?

Example Steps Toward More Open Results#

When results and research objects are published openly, anyone can reproduce the scientific result. For topics like climate change, the transparency of results helps reduce misinformation and increases public trust in results.

Here is a GitHub repository with an example of a result made available as open access. This visualization is not perfect but provides a snapshot of a work in progress that can be shared with the community for feedback and refinement. This could be further refined, or perhaps serve as the start of a new effort that will extend the initial results. The results are more accessible, inclusive, and reproducible by being published openly.

There are lots of ways that open science can extend the span or scope of projects. Here are some steps you can take to share your open results in a way that makes your work more usable, reproducible, and inclusive:

  • Add a Code of Conduct via the CODE_OF_CONDUCT file and link to other policies that apply to your work.

  • Add contributors and authorship guidelines via a CONTRIBUTING file.

  • Add your collaborators and team members’ names with their permission.

  • Add your proposal but remove any sensitive information.

  • Create a preliminary roadmap describing the goals the project is trying to achieve.

  • Create project management, code, and data folders where you can upload appropriate information as your project develops.

  • Create a list of the resources your project requires.

  • Provide links to training materials that your collaborators and contributors may benefit from.

  • Use issues and project boards to communicate what is happening in the project.

  • Use pull requests to invite reviews of new code and content.

  • Add a user manual and executable notebooks to allow code testing.

  • Create and share executable notebooks that document how the data is processed and the results obtained.

  • Create tutorials or short-form videos demonstrating how a step in your research workflow was accomplished.

  • Write a blog post about your experience wrestling with a particular research challenge and how you solved it.

  • Contribute to documentation to improve the open-source tools based on your own experience.

  • Connect your repository to Binder to allow online testing of your code and executable notebooks.

  • Link all the outputs that are generated outside this repository (such as blog posts, videos, forum posts, and podcasts, as discussed above).

  • Some advanced steps to apply as the project develops include continuous integration, containerization, a CITATION.cff file, and the creation of a simple web page linking all information.
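The CITATION.cff item in the list above is a small YAML file, placed at the root of a repository, that GitHub and citation tools recognize. A minimal sketch follows; the title, author, ORCID, and DOI are placeholders, and you should validate the file against the Citation File Format (CFF) documentation.

```yaml
# CITATION.cff — tells others how to cite this repository (placeholder values)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Example analysis pipeline"
version: 0.1.0
doi: 10.5281/zenodo.0000000
authors:
  - family-names: Doe
    given-names: Jane
    orcid: https://orcid.org/0000-0000-0000-0000
```

Once this file is present, GitHub shows a "Cite this repository" button, which lowers the barrier for others to credit your work correctly.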

How Emerging Technology Like AI is Changing How We Do Science#

Throughout these modules, the internet has been identified as a fundamental disruptive technology that changed how almost all of science is accomplished. Scientists rarely go to libraries to read the latest journal articles. Data is no longer mailed around the world on tape drives. Software isn’t shared via floppy disks. The internet helped create the modern scientific workflow and made science more interactive and accessible. Now AI tools are starting to disrupt science in a similar manner. AI is not only revolutionizing many aspects of our lives; it is also changing how we do science. As companies race to create and integrate new generative AI tools into every aspect of our lives, many scientists, institutions, journal publishers, and agencies are working out how to use these tools effectively, how to understand their reliability, accuracy, and biases, and how to use these cutting-edge tools ethically. An additional concern is that information shared with AI tools may be used to intentionally or unintentionally disclose confidential data, raising privacy concerns.

AI can help us use and share research. It can act as an accelerant, taking care of tedious tasks while leaving scientists free for more creative thought. These tools are better than humans at processing vast amounts of data, but humans are better at creative and nuanced thought. This is important to consider when determining whether or not to use AI. As an example, many people already use AI tools to help manage their inboxes and write emails with AI-generated suggested content. Within science, there are many tasks that could be expedited using AI, according to three studies published in Nature:

Using AI#

Literature reviews

The ever-increasing volume of scientific literature has made it challenging for researchers to stay abreast of recent articles and find relevant older ones. AI tools can be used to create personalized recommendations for relevant articles as well as create summaries of them in various formats. Some examples of these tools include SciSummary, SummarizeBot, Scholarcy, Paper Digest, Lynx AI, TLDR This.

Possible drawbacks when using these tools include:

  • Potential introduction of biases
  • Insufficient contextual understanding or interpretation
  • Possible inability to handle complex technical language
  • Incorrectly identifying key points

Searching for relevant datasets and software tools

AI tools can be used to discover different datasets that may be relevant to a scientific query and recommend relevant software libraries.

Language barriers

AI tools can be used to create automatic translations into different languages. Several of the tools above also offer translation.

Making with AI#

Code

AI tools can be used to generate code to perform analysis tasks and translate between programming languages. Some examples of these tools include Co-Pilot, Codex, ChatGPT, and AlphaCode.

Usage tip: Popular large language models can be used to generate code, but it has been noted by many that breaking down tasks and using careful prompts helps generate better results.

Results

AI tools can be used to generate text, summarize background materials, develop key points, develop images and figures, and draw conclusions. Using these tools may help non-native speakers communicate science in different languages more clearly. Additionally, they could be helpful in developing plain-language summaries, blog posts, and social media posts.

Some possible drawbacks when using these tools:

  • The drawbacks listed above for literature-review tools also apply here.
  • Factual and commonsense reasoning mistakes, because these tools do not (at this time) have the type of cognition or perception needed to understand language and its relationship to the external physical, biological, and social world (source: https://www.tandfonline.com/doi/full/10.1080/08989621.2023.2168535).

Sharing with AI#

  • Results - AI/ML models are increasingly being used in research. When sharing results, follow best practices as outlined in the Ethical and Responsible Use of AI/ML in the Earth, Space, and Environmental Sciences article.

  • Incremental prompting can help create an outline for your research article. An example can be found on X.

  • AI tools can help identify where to share results and help write social media or other short posts based on your article.

Cautions About the Use of AI Tools#

Journals are increasingly implementing guidelines and requirements concerning the usage of AI tools during the writing process. Many require that the use of AI tools for writing, image creation, or other elements must be disclosed and their method of use identified. As is the case with all other material within an article, authors are fully responsible for ensuring that the content is correct. Examples of this policy can be read in the AI guidelines of Nature and NCBI.

Furthermore, there are numerous examples of generative AI (for both code and content) delivering plagiarized information in violation of licenses, as well as fabricating material, including citations. Using these AI tools may lead to findings of academic and research misconduct should fabrication, falsification, or plagiarism be contained within AI-generated materials. So BE CAREFUL. Learn more about possible issues with AI in a Nature example here.

At this time, and for these reasons, AI tools are generally not allowed in grant applications or in peer-review or proposal review activities.

The National Institutes of Health (NIH) has prohibited “scientific peer reviewers from using natural language processors, large language models, or other generative Artificial Intelligence (AI) technologies for analyzing and formulating peer review critiques for grant applications and R&D contract proposals.” Utilizing AI in the peer review process is a breach of confidentiality because these tools “have no guarantee of where data is being sent, saved, viewed or used in the future.” Using AI tools to help draft a critique or to assist with improving the grammar and syntax of a critique draft are both considered breaches of confidentiality. Read NIH’s AI policy here.

AI tools for science are developing rapidly, and the science community's understanding of how to use them ethically and safely is still emerging as their use in research expands. The guidelines above offer a snapshot in time and will likely continue to evolve. If you choose to use these tools for scientific research, carefully consider how much to rely on them and how their biases may impact results, as cautioned in this Nature article. The internet has transformed the world, and AI tools are likely to do the same. As with any tool, it is important that they are used for the appropriate purpose and in an ethical manner.

Key Takeaways#

In this section, you learned:

  • How to include open results in the OSDMP.

  • An example of how results can be shared openly.

  • That evolving AI tools are being used in all parts of the scientific workflow, that they are changing rapidly, and that there are still many open questions about how and when to use them.


Summary#

After completing this day, you should be able to:

  • Describe what constitutes an open result.

  • Explain what the reproducibility crisis is and how open science can help combat it.

  • Use a process to discover, assess, and cite open results for reuse.

  • List the responsibilities of the following participants in creating open results: open results user, project leader, collaborator, contributor, and author.

  • List the tasks for creating reproducible results and the items to include in a manuscript to ensure reproducible results.

  • Define a strategy for sharing your results, including selecting publishers, interpreting journal policies and licenses, and determining when to share your data or software with your manuscript.