Day 1: The Ethos of Open Science#

By Neuromatch Academy & NASA

Content creators: NASA, Leanna Kalinowski, Hlib Solodzhuk, Ohad Zivan

Content reviewers: Leanna Kalinowski, Hlib Solodzhuk, Ohad Zivan, Shubhrojit Misra, Viviana Greco, Courtney Dean

Production editors: Hlib Solodzhuk, Konstantine Tsafatinos, Ella Batty, Spiros Chavlis


Tutorial Objectives#

Estimated timing of tutorial: 2 hours

This tutorial is an introduction to open science, which is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity. In this module, you will take a closer look at what open science is, including the current landscape as well as the benefits and challenges. You will then get a glimpse into the practice of open science, including case studies and examples. Lastly, you will be presented with actions that you can take starting today, such as exploring communities that you can engage with.


Section 1: What is Open Science?#

In this section, you take a closer look at what open science means, including the intended goals and outcomes of adopting open science as an individual and as part of a larger community. You then review examples of open science in action. Finally, you wrap up the section by taking a closer look at why adopting open science is needed.

Ethos of Open Science#

Let’s begin by explaining the word “ethos”.

“Ethos is the distinguishing character, sentiment, moral nature, or guiding beliefs of a person, group, or institution.”

Merriam-Webster

Note that “ethos” is not exactly “ethics”, but offers a broad enough term to include the moral attitudes held by the individuals or institutions who practice open science. To clarify the moral element of this discussion, we speak of “responsible open science” going forward. Throughout this tutorial, we have integrated ethics around open science that dictate how you share, give due credit, and work together. “Practice the Golden Rule”: treat others the way you would like to be treated in their situation.

Open Science at NASA#

NASA funds some of the most diverse research of any federal agency and has a history of sharing research and results going back to the Apollo Program in the 1960s. NASA’s Transform to Open Science program emphasizes sharing guidelines and best practices that apply to its diverse research efforts, cultivating a culture of openness. NASA’s commitment to open science enhances collaboration across various research domains, from astrobiology to physics, allowing broader access to important scientific information. NASA datasets include biology, chemistry, environmental science, geology, and other fields related to robotic and human planetary exploration, stellar evolution, and the search for extraterrestrial life.

NASA's logo

The open science practices and principles that play a critical role supporting NASA mission success are equally relevant to other government agencies and institutions. Similar considerations, approaches, and behaviors are needed in a variety of scientific contexts. Tools for open science frameworks and workflows follow generally similar models.

Case Study: Open Science in Action at NASA#

Open science practices and principles can be applied to all stages of the research process. One early example of NASA’s efforts to involve more people in science is the exoplanet citizen science projects, with the Exoplanet Explorers being a significant part of this effort.

Exoplanet Explorers program results

“Stargazing Live”, a live television program, took place across three consecutive nights in 2017. The hosts invited viewers to identify exoplanets in an open access dataset. Within 48 hours of the program’s debut, more than 10,000 people had participated in Exoplanet Explorers and classified over 2 million systems.

Following the first night of the program, the researchers watched the results roll in, as citizen scientists helped sift through the data. On the second night, enough people had participated that the researchers were able to share that 44 Jupiter-size candidate planets, 72 Neptune-size candidate planets, 53 sub-Neptune size candidate planets (larger than Earth but smaller than Neptune), and 44 Earth-size candidate planets had already been found and were undergoing additional analysis.

Communities, working together on a problem, can rapidly find new results! Open science enables this and more.

The Internet and Open Science#

Historically, factors like time, access to sufficient tools and data, and physical proximity limited who could be involved in science, as well as how easily collaboration could take place within the scientific community. More recently, digital resources like the Internet have increased participation by lowering barriers to entry and presenting a platform for digital collaboration on a global scale. The Internet offered people access to the appropriate infrastructure to conduct open science, while the practices of open science enabled more people to engage with research products. Unfortunately, challenges remain for people who don’t have the right computational tools and/or don’t speak the relevant languages.

The Internet creates many outlets for public hosting and free access to research and data. These outlets combined with advances in computational power enable nearly anyone to perform complex data analysis. It is now possible to connect participants, stakeholders, and outputs of open science on the Internet to make scientific processes and products easier to discover and access.

Why Should We Do Open Science Now?#

Science and science communication increasingly face severe pushback from the public because of inadequacies in the reproducibility of results and the spread of misinformation, respectively, that foster mistrust. The practice of open science counteracts this by involving community feedback to validate results in a more robust manner and combats misinformation by making results available to the public.

Reproducibility Challenges#

Science becomes more robust and accurate when scientists validate their colleagues’ results. However, reproducing the rapidly growing pool of published research presents an overwhelming challenge:

  • In 2011, the AAAS, publisher of Science, began requiring the authors of computational research reports to share data and software upon request.

  • In 2018, a study investigated the reproducibility of 204 articles published in the journal Science after 2011. Only 26% of the papers could be reproduced. The two primary reasons were that the data and software could not be accessed and that the methods were not described in sufficient detail.

Case Study: Open Results Enable Iteration and Improve Error-Detection#

We will look at an example of how closed science can restrict research impact by following the outcome of a highly cited journal article to understand how science functions to inform a field’s state of research, the decisions of policymakers, and the actions of society.

The global cooling error timeline

A 1990 analysis of satellite data on climate temperature concluded that the upper troposphere experienced no warming, a finding that contradicted early climate model predictions. Policymakers concluded from this result that researchers did not understand climate models well enough to warrant changes in environmental policy. The processed data from this study were made open-access but, as was typical for the time, neither the original data nor the code used for processing and analyzing the data was shared by the original research team. Eight years after the article was published, other scientists noticed that the original authors hadn’t accounted for several important effects. This oversight introduced errors into the dataset and produced artificial cooling in the temperature measurements. It took another five years and additional funding to reproduce the code and conduct a new analysis. Thirteen years after the original paper, it was confirmed that the upper troposphere was warming, in agreement with climate model predictions.

Note: Learn about the layers of Earth’s atmosphere here.

The inability of the scientific community to access an article’s original data and code slows the pace of discovery, by thirteen years in this case, and forces other research teams to repeat the work (here, rewriting the code) instead of moving on to new projects. This is not the pace at which we want to advance science, with one step forward and two steps back to iterate and resolve problems.

The original research group did not intend to conceal or prevent access to their data and methods; the community norms at that time simply did not include sharing data and software openly. This is, in part, because withholding data and software allows researchers to keep a competitive advantage when seeking funding opportunities. In this case, the research group simply followed this common practice. This culture of closed science needs to change because the practice of withholding code (or data or other research artifacts) can stifle scientific progress. In the climate change example, a flawed study could have been swiftly corrected by open peer feedback, but it instead undermined the credibility of climate scientists. The cost to progress on climate change research and the lost benefit to society were enormous. It is imperative to shift the entire science ecosystem, policies, and rewards toward the prioritization of openness if the full and immediate benefits of research are to be realized.

Limitations of Scientific Publishing#

Historically, scientific publishers have charged subscription fees to access journals and, often, article processing charges (APCs) to cover the costs of preparing a manuscript for press (even when the peer reviewers were volunteering their time). These practices limit both who can read papers and who can publish results.

Open access publishing has significantly increased the number of articles that are available as electronic copies online. A growing number of governments and funding agencies are starting to mandate that research funded by taxpayers must be accessible to the public after publication. However, the current hybrid system still does a poor job of allocating costs fairly across the research publication process (more on this in Day 5).

The issue of who has access to published papers also motivates open science. For example, even though more climate research is made available as open access than that from other scientific fields, the majority of climate research articles, including many important ones, remain behind paywalls. Climate misinformation is freely available to anyone online but scientific climate results are mostly hidden from the public behind paywalls. This practice does not increase trust in science.

Chart depicting percentage of open access publications

What is Open Science?#

What is open science exactly? To illustrate, first, we’ll present a definition of open science that was developed by the U.S. federal government.

“Open Science is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity.”

The White House Office of Science and Technology Policy Memo, 2022 (adapted)

Let’s break down the definition a bit more:

  • Research products and processes should be available to all, not just a small subset of experts, particularly if funded with public funds.

  • Research products and processes should be ‘respecting diverse cultures’ – fostering an open dialogue between researchers, indigenous people, and local communities. This also means that research must respect the diversity of laws and customs in different countries and/or as they apply to different kinds of research.

  • While open science is our aim, security and privacy remain important concerns. Therefore, select sensitive information should be protected.

  • Of the stated principles, “Fostering collaborations, reproducibility, and equity”, the first two are research standards, while the last refers to the inclusion of people who might otherwise get left out.

Open science is a culture intended to promote science and its social impact.

How Do You Do Open Science?#

The Ethos of Open Science is a broad term that encompasses the moral and ethical attitudes held by individuals and institutions about practicing ‘open’ science. There is an ethical element to sharing both new knowledge and the processes used to obtain said knowledge. It is important to note that there is no single, definitive way of practicing or conducting open science.

Diverse practices, assumptions, and goals are just part of the complexity of open science. There are also divergent moral principles that guide open science communities. Such principles are captured in “codes of conduct”. A code of conduct is a community governance mechanism that outlines the principles and practices expected of a given research community’s members, as well as the process for investigating and reprimanding those in violation of the code.

In a sense, a code of conduct constitutes the moral backbone of a research community. However, as with the numerous schools of thought, there are similarly many codes of conduct. In other words, there is no one set of universal principles that all open science practitioners abide by. For example, consider how OLS, INOSC, allea, AGU and Ethical Source all have different codes of conduct and guiding principles.

This great diversity responds to the growing proliferation of open science initiatives and the great use we can make of open science approaches to knowledge.

Fostering Collaboration, Reproducibility, and Equity#

The IDEA of Open Science: Inclusive, Diverse, Equitable, and Accessible.

Open source gives credit

Openly using, making, and sharing research analyses, software, or datasets gives everyone credit for their work.

Sharing is grounded in the belief that access to information and the ability to collaborate are essential for advancing scientific understanding and solving complex problems.

Open sharing enables greater transparency in the scientific process and facilitates reproducibility; it enables collaboration and inclusion of more diverse perspectives and expertise; and it makes scientific knowledge more accessible to the public.

Not only does open sharing help society, but it also can benefit each of us as individual researchers. It can lead to greater visibility, impact, and credit of your results, data, and software; it can provide access to new collaborations and ideas, and it can fulfill ethical and social responsibilities.

Case Study: Radar Data and Climate Change#

Have you ever seen weather forecast images for your location? That data comes from NEXRAD radar stations, many of which have been operating for over 30 years. The data has always been made publicly available, but can be difficult to use. It was mostly used for rain information, so stations didn’t see a need to make it readily accessible after 24 hours. Users who wanted the historical data from NEXRAD had to work through the following arduous steps:

  • Go to a website.

  • Make a request (but not one too large).

  • Wait for a robot to read the data off tape storage and copy it online.

  • Receive an email with instructions on where to download the requested data.

  • Download the data.

The massive size of the dataset, more than 250TB, made it essentially impossible to do large-scale analysis. Nobody had the time to make these requests and download the data bit by bit.
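To give a rough sense of why downloading the archive piece by piece was impractical, the sketch below estimates how long it would take to transfer 250 TB over a single connection. The 100 Mbit/s bandwidth, the decimal definition of a terabyte, and the function name are illustrative assumptions, not figures from the NEXRAD archive.

```python
# Rough estimate of how long it takes to transfer a large dataset.
# The bandwidth used below is a hypothetical value chosen for illustration.

SECONDS_PER_DAY = 24 * 60 * 60


def transfer_days(size_terabytes: float, bandwidth_mbit_per_s: float) -> float:
    """Return the transfer time in days, assuming a constant, fully used link."""
    size_bits = size_terabytes * 1e12 * 8  # decimal terabytes -> bits
    bandwidth_bits_per_s = bandwidth_mbit_per_s * 1e6
    return size_bits / bandwidth_bits_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    days = transfer_days(size_terabytes=250, bandwidth_mbit_per_s=100)
    print(f"About {days:.0f} days of continuous downloading")
```

Under these assumptions, the transfer alone would take more than seven months of uninterrupted downloading, before counting the time spent submitting requests and waiting for tape retrieval.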

However, in 2015, all NEXRAD data were moved to and made freely available in the cloud. Usage of the dataset increased almost immediately!

Researchers started using the NEXRAD data for other types of science. For example, they used NEXRAD radar readings of birds to monitor flight patterns. In particular, purple martins! Purple martins form huge roosts of up to 50,000 birds that can be tracked using radar. Their stunning aerial displays can now be tracked with the same technology previously reserved for rain measurements.

In another example of new NEXRAD uses, a NASA-led study linked variability in bird migration to large-scale climate patterns that originate thousands of miles away. The better land managers understand current migration patterns and foresee behavioral changes in these birds due to climate change, the better they can direct their conservation and habitat restoration efforts. The newly accessible radar data provide valuable insight needed to achieve their goals. This study was funded by NASA and used NOAA NEXRAD data, which were made fully available for the first time by the AWS Public data program.

Who Does Open Science?#

As briefly discussed in previous lessons, open science doesn’t only involve researchers; many other stakeholders are affected by the outcomes of open science. Stakeholders include any individuals who can affect or be affected by open science projects.

Open science stakeholders

Scientific research should benefit humanity. Although open science has many stakeholders, the advantageous interaction between science and society takes place among three core groups: scientific researchers, policymakers, and the public. Researchers do science and share their results with policymakers and the general public to inform their decisions and improve their lives. The public helps to fund research through taxes and can provide input to future areas of study. Policymakers help to implement measures that are informed by scientific results to improve the health, environment, and livability of society.

These three stakeholder groups remain central to the world of open science. However, the inclusive nature of open science demands participation from the broader public. Growth in public participation in science can occur by removing barriers to those historically excluded and by expanding the community of people who support scientific research itself.

Here, we list some core groups who we envision as taking part in and/or benefitting from open science while being fully aware that this list is not exhaustive and the categories we choose here have very blurred boundaries.

Researchers#

Researchers are often thought of as the ones who do open science to benefit others. However, researchers themselves can also greatly benefit from open science. Their work can achieve higher visibility among colleagues and the public, they receive credit for a full range of activities related to their science (including time spent sharing data and code, for instance), and they have more access to datasets.

A team of supporters and collaborators enables this research to take place. Open science aims to include these supporting members of the scientific process and ensure they receive credit for their contribution to improving science.

Policymakers#

Policymakers represent another key community in the science environment. Policymakers can reference scientific findings to inform their decisions for the betterment of society. Those who help in the understanding and dissemination of these policies (including educators and science journalists) are crucial to this process. Policymakers can also play important roles in ensuring and facilitating open science by setting data management processes, encouraging open access legislation, and developing ethical guidelines for experiments. Policymakers can benefit from open science by gaining better access to scientific output via the open sharing of research results.

General public#

The public plays a crucial role in science today as consumers of scientific results who make decisions based on, and adhere to policies shaped by, scientific results. Open science can make scientific results, data, and workflows more accessible to the public by strengthening routes of access to trustworthy sources of information, which in turn increases trust in science. The public can also take part in open science through community science projects, for example as volunteers to collect or manage data. As a result, participants boost their understanding of science and feel empowered through opportunities to exert influence.

Open science can strengthen the connection between all of these groups. Communication between researchers and both the public and policymakers stands to drastically improve with more transparent and accessible scientific knowledge.

Activity 1: Think About the What and How of Open Science#

Estimated time for activity: 10 minutes. It is a group activity.

In this activity, reflect on your answers to the questions and discuss them in a group.

  • What does the act of open science look like? Does a scientist use or create something specific that would characterize their research as open? What comes to your mind?

  • How do you currently share your materials (data, code, results)?

  • How might you share materials in the future more openly?

  • What stands in the way?

Key Takeaways#

In this section, you learned:

  • The motivation for open science as well as its goals and outcomes.

  • Why we should be doing open science now and how technology has made it more achievable.

  • The definition of open science.

  • Different groups that do open science.


Section 2: Why is Open Science Important?#

In this section, you will learn how adopting open science benefits both you as a researcher and society as a whole. You will also learn about some of the challenges and hurdles of applying open science principles and how to navigate them.

“We need more WE science rather than ME science.”

Harlan Krumholz, Yale School of Medicine at 2022 CZI Meeting

Benefits of open science

Figure: There are many benefits of open science. CC-BY Danny Kingsley & Sarah Brown.

Benefits to You#

You are Your Best Future Collaborator!#

Doing open science not only lets other people understand and reproduce your results but lets you do so as well! Implementing open science principles such as good documentation and version control helps you, potential collaborators, and anyone else to understand your results.

If your work is shared publicly, you will never lose access, even if you move institutes or change jobs. Many researchers move around institutions and organizations. By having your data, software, and results in repositories, you will always have access to them.

Implementing best practices for open science in your work not only helps you document but also strengthens your funding proposals. Funding agencies have begun to realize that openly sharing research products can increase the citations and uptake those products receive, resulting in a better return on investment.

Well-documented research products also demonstrate the quality of your work, which helps with public communication efforts and can also attract better collaborators. Reliability and a strong work ethic motivate others to want to work with you.

Give and Get Credit When Using Results of Others#

The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 license. DOI: 10.5281/zenodo.3332807.

In addition to documenting your own research, the practice of giving credit to everyone who has contributed will strengthen your scientific community’s reputation and actualize the shared values of open science. As people gain confidence in the benefits of cooperative research, they will also start giving credit to more contributions that might previously have gone unacknowledged. The different kinds of work performed for a paper can be described in an author contribution statement, which journals often require, like the example below, taken from this paper.

Example of acknowledgement section

Additionally, it is important to also include an acknowledgment statement to give credit to those who have shared resources, equipment, or knowledge, without which the final product or paper may not have been possible.

More Visibility and Impact#

In addition to improved scientific accuracy, adhering to open science practices potentially offers personal career benefits to researchers.

  • Openly published research has significantly more visibility and impact potential with large audiences across the internet, which can lead to more citations, like-minded collaborators, and career/funding opportunities, according to a 2016 study.

  • Publishing open access increases citation count by 18%, according to a 2018 study.

  • Articles that make their data openly accessible via a direct link to a repository see ~25% higher citation impact, according to a 2020 study.

Publishing as open access may have prohibitive costs for some researchers, depending on the venue. There are often other options that allow authors to share their work freely and openly. In Module 5 on Open Results, we discuss some of these other options, including preprints and diamond open access.

There are many different research outputs that can be openly shared and made citable:

  • Code

  • Data

  • Research talk slides

  • Lectures

  • Blog posts

  • And more!

All of these are tangible, scientific outputs! Much of our time as researchers is spent writing code, collecting data, and putting together lectures, not *just* writing publications. Publicly sharing materials makes receiving a citation more likely.

More Collaborations#

Open science practices can also enable stronger collaborations, both within and between disciplines, as evidenced by a 2016 study. The ease of access to open data brings new agents to the landscape, allowing for broader and more diverse participation. Open science practices such as pre-registration, where researchers document their research plan at the start of a study, allow for a stronger research design because feedback from various collaborators and stakeholders can be solicited before data collection begins. Similarly, preprints allow for swifter feedback on conclusions drawn from data once they are collected.

Benefits to Science#

Transparent Science is Reproducible Science#

When computers are used to produce scientific research, the code is considered a “method”, much as a set of instructions for working with cells or agar plates is considered a method in a lab research setting. Peer review of methods is an essential step in the scientific process. When methods are not shared, no one else can reproduce the work or build upon it for future scientific endeavors. Open methods allow people to judge whether or not the methods are trustworthy. In Section 1, the story of the Global Cooling Error presented a poignant example of science that was not reproducible because of a lack of data transparency.
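As a minimal sketch of what treating code as a shareable method can look like, the hypothetical analysis below fixes its random seed and records a checksum of its input data, so that anyone rerunning it can confirm they obtain the same result from the same data. The function names, the toy measurements, and the choice of a bootstrap analysis are all illustrative assumptions and do not come from any study discussed here.

```python
import hashlib
import json
import random
import statistics


def data_fingerprint(values: list[float]) -> str:
    """Return a SHA-256 checksum of the input so others can verify they use the same data."""
    encoded = json.dumps(values).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def bootstrap_mean(values: list[float], n_resamples: int, seed: int) -> float:
    """Average the means of resampled copies of the data, using a fixed seed."""
    rng = random.Random(seed)  # local generator: the result depends only on the seed
    resample_means = [
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]
    return statistics.fmean(resample_means)


if __name__ == "__main__":
    measurements = [2.1, 2.4, 1.9, 2.8, 2.2]  # toy data, invented for illustration
    first_run = bootstrap_mean(measurements, n_resamples=1000, seed=42)
    second_run = bootstrap_mean(measurements, n_resamples=1000, seed=42)
    print("data checksum:", data_fingerprint(measurements)[:12])
    print("identical results across runs:", first_run == second_run)
```

Sharing the script together with the seed and the data checksum gives readers what they need to repeat the computation exactly, rather than relying on a prose description of it.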

Open Science Can Improve Accuracy#

A 2022 study found that when researchers practice transparency and promote verifiability, readers and stakeholders are better able to judge whether the presented results are accurate and, according to a related study, whether they were produced by questionable research practices that lead to misleading or unreliable results.

Open science also allows others to scrutinize the analytic decisions of researchers, such as whether the analysis was planned before or after observing the data, according to a 2018 study.

This allows others to check if they can arrive at the same conclusion as the original research team and facilitates stronger public trust and support, according to a 2021 UNESCO report.

Case Study: Allen Brain Observatory. When Open Science Leads to More Discoveries#

Allen Institute logo

Since its founding, the Allen Institute has made open data one of its core principles. Specifically, it has become known for generating and sharing survey datasets within the field of neuroscience, taking inspiration from domains such as astronomy, where such surveys are common. These survey datasets (1) are collected in a highly standardized manner with stringent quality controls, (2) contain a volume of data that is much larger than typical individual studies within their particular disciplines, and (3) are collected without a specific hypothesis to facilitate a diverse range of use cases.

The Allen Brain Observatory consists of a set of standardized instruments and protocols designed to carry out surveys of cellular-scale neurophysiology in awake brains. Its initial focus was on neuronal activity in the mouse visual cortex.

One of the use cases of the Allen Brain Observatory dataset in the research community is generating novel discoveries about brain function:

  • Sweeney and Clopath, 2020 used Allen Brain Observatory two-photon imaging data to explore the stability of neural responses over time. The authors found that, indeed, population coupling is correlated with the change in orientation and direction tuning of neurons over the course of a single experiment, an unexpected result linking population activity with individual neural responses.

  • Bakhtiari et al., 2021 examined whether a deep artificial neural network (ANN) could model both the ventral and dorsal pathways of the visual system in a single network with a single cost function. Comparing the representations of these networks with the neural responses in the two-photon imaging dataset, they found that the single pathway produced ventral-like representations but failed to capture the representational similarity of the dorsal areas.

  • Fritsche et al., 2022 analyzed the time course of stimulus-specific adaptation in 2365 neurons in the Neuropixels dataset and discovered that a single presentation of a drifting or static grating in a specific orientation leads to a reduction in the response to the same visual stimulus up to eight trials (22 s) in the future. This stimulus-specific, long-term adaptation persists despite intervening stimuli, and is seen in all six visual cortical areas, but not in visual thalamic areas (LGN and LP), which returned to baseline after one or two trials. This is a remarkable example of a discovery that was not envisioned when designing the survey, but for which the stimulus set was well suited.

Information on the case study is taken from a 2023 article.

Quality and Diversity of Scholarly Communications#

Furthermore, open science improves the state of scientific literature. Scientific journals have traditionally faced the severe issue of publication bias, where journal articles overwhelmingly feature novel and positive results, according to a 2018 study. As a result, published scientific results in certain disciplines may contain exaggerated effects, or even be “false positives” (wrongly claiming that an effect exists), making it difficult to evaluate the trustworthiness of published results, according to studies from 2011 and 2016. Open science practices, such as registered reports, mitigate publication bias and improve the trustworthiness of the scientific literature. Registered reports are journal publication formats in which articles are peer-reviewed and accepted before data collection is undertaken, eliminating the pressure to distort results, according to a 2022 study. Other open science practices, such as pre-registration, also allow a partial look into projects that for various reasons (such as lack of funding, logistical issues, or shifts in organizational priorities) have not been completed or disseminated, according to a 2023 study, giving these projects a publicly available output that can help inform others about the current state of research.
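The way publication bias exaggerates effects can be illustrated with a small simulation. The sketch below is a toy model under stated assumptions: many small studies estimate the same true effect with unit-variance noise, and only positive estimates that pass a simple z-test at the 5% level are "published". The true effect, sample sizes, and function names are hypothetical and are not taken from the studies cited above.

```python
import math
import random
import statistics


def simulate_studies(true_effect: float, n_per_study: int, n_studies: int, seed: int) -> list[float]:
    """Simulate the effect estimate (sample mean) from each of many small studies."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(true_effect, 1.0) for _ in range(n_per_study)])
        for _ in range(n_studies)
    ]


def published_only(estimates: list[float], n_per_study: int) -> list[float]:
    """Keep only positive estimates that pass a two-sided z-test at the 5% level."""
    threshold = 1.96 / math.sqrt(n_per_study)  # standard error is 1/sqrt(n) here
    return [estimate for estimate in estimates if estimate > threshold]


if __name__ == "__main__":
    true_effect = 0.2  # hypothetical value chosen for illustration
    estimates = simulate_studies(true_effect, n_per_study=20, n_studies=5000, seed=0)
    published = published_only(estimates, n_per_study=20)
    print(f"true effect:                {true_effect:.2f}")
    print(f"mean of all studies:        {statistics.fmean(estimates):.2f}")
    print(f"mean of 'published' only:   {statistics.fmean(published):.2f}")
    print(f"share of studies published: {len(published) / len(estimates):.0%}")
```

In this setup, the average over all studies stays close to the true effect, while the average over the "published" subset is necessarily larger, because only estimates above the significance threshold survive. Formats such as registered reports, which accept articles before results are known, remove that filter.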

Benefits to Society#

Collaboration, innovation, education, technology advancement, and science-based public policy are all improved by the open availability of research products. Sharing all research products (e.g., data, code, results) makes the scientific process more transparent, which may help increase public trust in science. Also, open science encourages IDEA (Inclusion, Diversity, Equity, Accessibility) and increases the involvement of citizen-scientists and non-experts in the research process. The inclusion of diverse perspectives from an open community invites unique perspectives that contribute to a more robust and often more accurate scientific outcome.

Scientists study issues that affect every aspect of life. Yet, public interest in science remains low due to sociocultural factors and a lack of trust and understanding. How can scientists expect the public to trust science about complex and often contentious issues, whether it is vaccine development or landing on the moon, if they don’t allow the public to see the process and results? Building trust in science is essential to a well-informed society. Open science provides a pathway to do this.

The public who funds government research through taxes should be entitled to its results and data, as long as safety and security are not an issue. Science should be more open to ensure its insights benefit the public who enables it.

Open science introduces more scrutiny into research that helps ensure accuracy and encourages efficiency through open discourse. This approach accelerates the pace of discovery and, subsequently, the dissemination of results to the public and policymakers.

Case Study: Open Science Can Accelerate the Pace of Science#

Open science practices accelerate the pace of scientific discovery by involving ideas and labor from the broader community. The rapid response to the COVID-19 pandemic showed open science in action.

Researchers uploaded the initial genome sequence of SARS-CoV-2 into an open-access database in January 2020, creating a data-sharing precedent and metadata that would later enable insights about new COVID-19 variants. The NIH developed a dedicated platform for sharing research tools for COVID-19 and encouraged investigators to expedite reporting to ClinicalTrials.gov ahead of requirements. Open-science publishing agreements that support evidence dissemination have complemented these practices and policies. One day after the World Health Organization declared COVID-19 a public health emergency, more than 50 academic publishers issued a joint statement committing to open-access policies for COVID-19 research. Support for preprint servers has promoted awareness of research successes and failures, and journals have helped accelerate the distribution of actionable information, including by means of dedicated COVID-19 web pages, endorsement of preprints, and an emphasis on sharing data with public health authorities.

Open Science is Efficient Science#

The benefits that open science provides to researchers carry over to the communities that scientists hope to serve. Data from one observation or science experiment can have unanticipated uses. In Section 1, we discussed an example in which radar data collected for tracking the effects of climate change was also used to track bird migration.

Through open science practices, research waste can be avoided, such as unintentional and costly repetition of previous studies, according to a 2020 European Commission report. In the human sciences, this also reduces participant fatigue in the long term. By maximizing what is learned from publicly available data, one does not need to test repeatedly, especially on already vulnerable communities. By “giving away” science, individuals, communities, and organizations can more easily adopt research results to inform interventions for their own needs without the knowledge being gatekept by the original researchers and organizations involved. In this way, open science can strengthen the social and economic impacts of scientific results.

Open Science Attracts a Diverse Set of Participants#

The open sharing of scientific products and processes makes science accessible to everyone. This allows full participation from everyone, and also maximizes the number of people who can benefit from the work.

The best ways to include a diverse group of open science practitioners and stakeholders are to remove existing barriers and design for inclusion. Beyond this, it is important to learn how to communicate effectively with diverse collaborators and people at different skill levels, career levels, backgrounds, and areas of expertise. The ability to build diverse teams is a skill that everyone can learn. For example, NASA has its own commitment to diversity and inclusion.

Diversity of participants in scientific collaboration.

Image credit: Andy Brunning/Compound Interest. CC BY-NC-ND 4.0

Key Takeaways#

The following are the key takeaways from this section:

  • Citing the work of other scientists whose work you build upon or reuse supports the community-minded open science practice of using, making, and sharing.

  • Doing science openly can boost the visibility of research and lead to more meaningful collaborations.

  • Science quality and efficiency are improved when open science best practices are followed.

  • Open science helps society by allowing more people to participate in science, which increases the accuracy and impact of results.


Section 3: How to do Open Science#

The ability to discern when and how to share information in an appropriate manner is an essential skill of open science. Practitioners of open science must balance their pursuit to maximize openness while respecting diverse cultures, maintaining security and privacy, and following institutional policies and practices.

This section introduces important security and privacy considerations for scientists when sharing information. Next, the section discusses how sharing information may impact different communities. Following this, the section explains the topic of intellectual property, how it can be protected, and the different types of licenses available to facilitate sharing while ensuring the owner of the information receives credit for their work. Lastly, this section covers the effect of rules and regulations set by an organization, grant, or publisher on a scientist’s options to make their research open access.

Maintaining Security and Protecting Privacy#

Previous sections have showcased a broad range of open science success stories, but we recognize that there are still plenty of valid concerns and unexplored challenges to implementing open science. Open science demands the valuable but complex practices of respecting diverse cultures, maintaining security, and protecting privacy. This lesson presents a strategic approach to making decisions about doing open science in common scenarios. For those scenarios that we cannot foresee, this lesson offers mitigation strategies to help overcome unique challenges with mindful preparation and community support.

Data That Reveals a Country’s Military Secrets or Violates National Interests#

When the release of data or research can lead to national security concerns, there are added restrictions around sharing this information. In the U.S., sharing of this type of information often falls under export control regulations, namely the International Traffic in Arms Regulations (ITAR) and the Export Administration Regulations (EAR). Sharing ITAR/EAR-regulated data, equipment, resources, or research without clearance to do so can put the country’s national security at risk and may bring about both severe criminal and administrative penalties.

Human Patient Privacy#

NASA has collected human spaceflight biomedical data since the start of Apollo, but the only human data in the Life Sciences Data Archive are from astronauts who signed releases for their data to be public.

In the U.S., health data is protected under the Health Insurance Portability and Accountability Act of 1996 (HIPAA) and may not be shared without the express written consent of the patient. As such, health information about astronauts is something NASA protects carefully, working to balance the publicity of the job with regulations and best practices for medical privacy while also enabling peer-reviewed biomedical research.

See this example and more at NASA’s Open Science Data Repository.

Respecting Diverse Cultures#

Open Science advocates for making research widely available, while also recognizing that there are many reasons why some information should not be released, and that these decisions need to involve the people who provided input and/or could be harmed by the consequences of release.

Indigenous, Cultural, and Conservation Concerns#

When considering the impacts of data sharing, it is important to recognize if those affected are equally represented in the discussion. For example, historically excluded communities, the environment, and wildlife are too often not considered when deciding to make research open access.

For example, while genomic research often relies on individual-based consent, it is often used to make decisions that impact indigenous communities without their consent.

Another example of how data can inadvertently impact vulnerable communities is the use of LiDAR by archaeologists to study remote areas. This type of data has the potential to reveal the locations of vulnerable Indigenous sites that are not yet protected.

CARE Principles#

The CARE Principles of Indigenous Data Sovereignty are people- and purpose-oriented, and were originally set up to use data in a way that advances data governance and self-determination among Indigenous Peoples. CARE principles can be applied by involving communities or local stakeholders and should be covered at the start of a research project.

Environmental Justice#

When sharing your results, are you sharing them with the groups that are most impacted in ways that are accessible to them? When studying the impact or effect on a specific community, it is important to include that community in the design of your work and ensure that the results of the work are accessible (both freely available and understandable) to the communities involved.

Environmental justice is the fair treatment and meaningful involvement of all people, regardless of race, color, national origin, or income, with respect to the development, implementation, and enforcement of environmental laws, regulations, and policies. Read more about how NASA Earth data is being made more accessible to those communities most affected.

Protecting Endangered Species#

Humans aren’t the only group that can be negatively impacted by data sharing. Rare and endangered species can also be impacted. For example, the sharing of breeding sites for declining wildlife populations can further exacerbate the population decline. For this reason, rare animals may have their breeding sites kept secret.

Intellectual Property#

Intellectual property is the recognition of rights associated with the content created by human intellect. There are several different types of intellectual property, and how they are recognized varies by country, type, and timescale.

It’s important to understand who has the rights to the content you create. It can depend on a number of different factors. Work that you create may belong to your employer, may be in the public domain, may depend on the license of underlying work, may belong to the publisher of your work, or may be your own intellectual property. Ownership may affect how your work can be shared.

Most Common Types of Intellectual Property Protection#

Copyright

A copyright protects original works of authorship. This could be artistic or literary works and also applies to software. In general, and if applicable, copyright is automatically applied at the moment of creation, with no further registration needed.

Most open licenses depend on copyright. The person(s) who owns the copyright has the right to apply a license to the work.

Example: An image in a scientific journal or something from the web. Generally speaking, using copyrighted images for teaching and education is considered fair use. However, if that includes posting images to a website, that could be considered a publication and, therefore, copyright infringement.

Trademark

A trademark can be applied to any content, including words, phrases, symbols, designs, or a combination of these things that identifies your product. Trademarks, in general, are not relevant for scientific purposes.

Patents

A patent is an exclusive right granted for an invention, which is a product or a process that provides, in general, a new way of doing something, or offers a new technical solution to a problem. Patents are another way to make your work open while protecting your intellectual property.

Many organizations have groups that will support the development and commercialization of inventions. NASA’s Tech Transfer office is an example of one of these, making many of NASA’s inventions available for licensing as part of the NASA Patent Portfolio.

Public Domain

In some cases, intellectual property is not protected at all. Public domain is when a creative work has no intellectual property rights associated with it. Some types of intellectual property expire after a certain time scale. Some types of work, such as those created by civil servants in the United States, are not covered by copyright and can appear immediately in the public domain. For others, the creator donates the work to the public domain, or intellectual property rights are not applicable.

Why Should You Care About Intellectual Property Policies?#

Why should I, as a scientist, care about this? Well, consider what happens to the ownership of your research if you move institutions:

  • Can you take your paper drafts, presentations, and copies of publications with you?

  • Can you take your data?

  • Can you take your software?

Worrying about intellectual property and copyright can seem like an unnecessary detail early on. However, anticipating changes to your situation by ensuring permanent ownership of your work in the planning phase of your research can help you avoid legal and institutional issues later on.

If you submit your manuscript to a publisher that requires that they own the copyright of the work, will you be able to access that paper when you change jobs and no longer have a subscription to that work? Are you able to meet the mandates of your funding agency to openly share your work? Can you reuse the figures that you made in derivative works? Will others be able to access your work? While these may seem like questions you shouldn’t have to worry about, they can become very difficult to deal with after the fact.

Licensing#

Licensing allows others to reuse your work legally. It is a way to specify under what conditions, if any, others can use, build upon, or distribute your work. It is also a method to ensure that your work is appropriately credited. It is generally illegal and may be a form of academic misconduct to reuse content without a license, even if the content can be found on the internet. Copyright law protects content creators, just as it protects your work from being used by others without clear permission. Thankfully, it’s easy to allow others to reuse your work. By applying a license to your work, you make clear what others can do with the things you’re sharing and also establish the conditions under which you’re providing them (such as citing you).

If you don’t license your work, others can’t (or at least shouldn’t) reuse it, even if you want them to. Licenses can be applied to data, code, reports, publications, and almost any other “creative” output. There are several different types of licenses, as well as the case where no license needs to apply:

Permissive Licenses

Permissive Licenses allow users a wide range of rights, including the ability to use, modify, and distribute the work with no restrictions or very few. Examples of permissive licenses would be open source software licenses such as Apache 2.0 or MIT licenses or the Creative Commons licenses such as Creative Commons Attribution (CC-BY).

Protective Licenses

Protective Licenses are a legal technique of granting certain freedoms over copies of copyrighted works while including some limitations. This may include copyleft licenses, commercial licenses, or other restrictions.

Public Domain

Public Domain is not a license, but it is an indication that there are no reuse restrictions on the work. Creative Commons Zero (CC0) is a worldwide public domain dedication that indicates that the material is free to use without any restrictions.

More details about licensing for each of these types of products can be found in later days, including different types of licenses, when to apply a license, and tools for applying licenses. Creative Commons and the Open Source Initiative are two resources with more information on open licenses.
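As one hedged illustration of applying a license to code, many projects mark each source file with a machine-readable SPDX identifier. The sketch below assumes Python source text. The helper name `add_spdx_header` and the small lookup table are hypothetical and exist only for this example; the identifiers themselves (MIT, Apache-2.0, CC-BY-4.0, CC0-1.0) come from the SPDX license list. A tag like this complements, but does not replace, including the full license text with your project.

```python
# Hypothetical helper for illustration: tag Python source text with an
# SPDX license identifier. Shebang lines are ignored for brevity.
SPDX_IDS = {
    "MIT": "MIT",
    "Apache 2.0": "Apache-2.0",
    "CC BY 4.0": "CC-BY-4.0",
    "CC0": "CC0-1.0",
}


def add_spdx_header(source: str, license_name: str) -> str:
    """Return `source` with an SPDX-License-Identifier comment as its first line."""
    spdx_id = SPDX_IDS[license_name]  # raises KeyError for licenses not in the table
    header = f"# SPDX-License-Identifier: {spdx_id}"
    if source.startswith(header):
        return source  # already tagged, so leave the text unchanged
    return f"{header}\n{source}"


licensed = add_spdx_header("print('hello')\n", "MIT")
print(licensed)
```

The lookup table keeps the human-readable names used in this section separate from the exact identifier strings that tools expect.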

Case Study: Neuromatch Academy Licensing#

All Academy material (tutorial code, tutorial videos, lecture PowerPoint slides, etc.) is published under a CC BY license. This means the content creators are giving others permission to reuse the content. This also means that the Academy can’t publish content that it doesn’t own or that isn’t already under a CC BY license. It’s a decision Neuromatch, Inc. took to pursue the aim of facilitating inclusive, collaborative, and global participation in the computational sciences through education by making the learning content accessible and reproducible.

CC (‘Creative Commons’) licenses are one of several public copyright licenses that enable the free distribution of content. A CC license is used when an author wants to give other people the right to share, use, and build upon a work that the author has created. There are lots of different CC licenses. CC BY is the most generous as it “lets others distribute, remix, adapt, and build upon your work, even commercially, as long as they credit you for the original creation. This is the most accommodating of licenses offered. Recommended for maximum dissemination and use of licensed materials.”

To check the copyright license of the figures that you would like to use, you can visit the ‘Rights and permissions’ section in the header of the article from which they originated. Below there is a list of distinct options and corresponding icons:

Different license options.

For example, it may look like the following in the context of a particular material:

Copyright example.

Academy can only use a figure or other content if it’s under a CC BY license. Because Neuromatch, Inc. is an educational nonprofit and the primary use of images is for education and not for commercial use, it can use content under any CC BY license.

Tip from Neuromatch team: A rule of thumb is that any image that you yourself did not produce is likely to be subject to someone else’s copyright. Figures from most journals are not under a CC BY license. Figures from many places on the internet (except Wikipedia) are usually subject to copyright. If the content you are using is not under CC BY, the best course of action is to find a CC BY version of the figure on bioRxiv, use a similar figure published in an open-access journal, or create a new figure yourself.

Policies and Practices around Open Science#

Preparing to Use and Make Controlled Research#

It is important to plan for the release of your data and results from the very beginning of your research project. Investigate and obtain all permits, approvals, and/or certifications needed to ensure you can share your research products.

Remember: Reputable journals and repositories will reject submissions if compliance can’t be documented!

Material-sharing and commercial product agreements

  • Can be permissive or restrictive.

  • Many versions are available.

Human or animal subject institutional review boards

  • Check experiment-specific requirements early.

  • Be sure to comply with all aspects of ongoing review.

Collecting permits

  • Don’t assume collection is allowed just because a sampling location seems unmanaged.

  • Engage and consult with local communities to ensure their concerns are addressed.

Sharing Controlled Research#

As we’ve previously shown, different kinds of intellectual property are released using different formal structures. It is important to understand these structures and to check with specialist communities when preparing your research plan. Methods for sharing results may follow different standards of practice or may require a special data format for distribution or submission to common repositories.

Creative Commons vs. open source vs. public domain licenses

  • Can be permissive or restrictive.

  • Many versions are available.

Repositories

  • General and discipline-specific options.

  • Check submission requirements early.

  • Often have user communities willing to help.

Guiding elements for selecting between options

  • Choose ‘supported’ versions with active and friendly communities.

  • Take precautions to reduce security risk.

What are the rules for science? Before sharing, check you have the right to do so:

  1. What does your supervisor or Principal Investigator say?

  2. What does your grant/contract say?

  3. What does your organization say?

  4. What does your funding agency say?

  5. If you are planning to publish, what does the publisher say?

Remember, sometimes what they say may conflict, for example:

  • If your grant / funder says outputs should be open, usually your institute will permit you to share items even if they are normally more restrictive.

  • Different types of outputs may have different types of restrictions (e.g., software or hardware might have one expectation, whilst data might have others).

Universities and other institutions may have OSPOs (Open Source Program Offices) or commercialization offices. Most institutes will have intellectual property counsel to help answer questions. Librarians are another good resource to consult when looking for advice on sharing.

Early is Better#

It is important to think about what policies may affect your research outputs as early as possible so that when you want to share information, you have either already obtained approvals or know where to go to get approvals to share. This ensures that you don’t inadvertently share (or fail to share) something that could affect your career, negatively impact others, or pose legal issues.

Remember: You can’t unshare something that is already shared! Equally, if your research requires ethical approval or consent to share, this may be harder to gain after you’ve done your study.

Reusing Science Ethically - Give Credit!#

It’s always important to properly source any content you use and remember to only share properly licensed content. Even if a license does not require attribution, providing credit helps increase reproducibility by providing the provenance of your work. This is the norm in scientific communities.

Remember when reusing science:

  • Open science is a partnership, and giving credit is critical to make it work.

  • Consider citing all resources used: datasets, software, infrastructure, etc.

  • Hopefully, others will reciprocate when reusing your work. (Scientific ethics dictate they should).
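A credit line is easier to write consistently when its parts are recorded alongside the material you reuse. The sketch below follows the commonly recommended pattern of stating title, author, source, and license. The class `ReusedWork`, the function `attribution`, and all the example values are hypothetical placeholders, not a required format.

```python
from dataclasses import dataclass


@dataclass
class ReusedWork:
    """Minimal record of a reused item (all fields are illustrative)."""
    title: str
    author: str
    source: str        # URL or DOI where the original can be found
    license_name: str  # e.g. "CC BY 4.0"


def attribution(work: ReusedWork) -> str:
    """Build a one-line credit: title, author, source, license."""
    return (
        f'"{work.title}" by {work.author}, {work.source}, '
        f"licensed under {work.license_name}."
    )


# Placeholder values only; replace with the details of the work you reuse.
example = ReusedWork(
    title="Example Figure",
    author="A. Researcher",
    source="https://example.org/figure",
    license_name="CC BY 4.0",
)
print(attribution(example))
```

Keeping these fields in a structured record also makes it straightforward to list every reused dataset, figure, or software package in one place.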

Key Takeaways#

In this section, you learned:

  • Situations when it may be inappropriate or harmful to share your data or research. These include maintaining security, protecting privacy, and respecting diverse communities.

  • What intellectual property is, who owns it, and how it is protected through licenses.

  • Various organizations within science (e.g., universities, publishers, funding agencies) may have their own sharing policies, which are best considered at the beginning of a research project to avoid potential pitfalls along the way.


Section 4: When Not to Be Open#

In this section, you will consider potential barriers to adopting open science practices. Barriers can come in the form of personal fears as a result of misaligned social challenges or institutional/infrastructure barriers. We begin with an exercise to identify your own concerns or fears about adopting open science. This leads to a discussion about common barriers and mitigation strategies.

Activity 2: Self-Reflection on Open Science Concerns#

Estimated time for activity: 10 minutes. It is a group activity.

In this activity, reflect on the given topic and discuss your thoughts in a group.

Take a moment to think about what fears or concerns you have about adopting open science. These could be concerns you have experienced in your work or fears you have about being more open moving forward. There are no wrong answers here – this is a time for you to reflect on what might be holding you back from doing open science.

Some Fears Around Adopting Open Science Practices#

Now that you’ve reflected on some of your concerns or fears around open science, below we have listed a few common fears of doing open science and some potential mitigation strategies. Even if you personally don’t have this fear, it can be useful to think about the different concerns that others may have to better understand and even help others address them.

Mistakes: What if my work is wrong or inelegant?

It can be intimidating to share your research materials publicly because someone might find a mistake or inefficiency. But isn’t it better for science if we can quickly find and fix mistakes or improve quality? Peer review is a core pillar of the scientific method and is a mechanism for others to help find and correct mistakes and make improvements. To make this work, we will need to be more open to finding and fixing mistakes or inefficiencies. It’s true that in many science communities, a mistake is considered a failure, or a certain style may be considered lackluster. However, open science policies aim to change the perception of mistakes from that of failure to a step in the discovery process that can be aided by open community feedback.

Scooping: What if someone re-uses my work and gets the credit?

Yes, this can happen. Depositing your work early and making it citable are ways to establish your work. This serves as evidence of when you started working on it and makes it easier for others to cite you. Details of how to do this are provided in the following days. In many fields, if it is clear that someone is actively working on a problem, the decision to scoop that work may bring a short-term gain but a long-term loss. In science, reputations are very important, and being collaborative generally leads to greater career success. Read more about scooping here.

Misinterpretation of my work.

This can happen regardless of the form or openness of your work - many publications have ended up being misinterpreted. Openness does help to provide further context for the work. Documenting your research plan and software management practices allows others to understand your work fully, and thus helps reduce the risk that others will misinterpret it. For example, if you share code, you can include a description of what the code does, along with brief usage instructions and examples. On Day 4, we will discuss proper data and code documentation that can help reduce misinterpretation.
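As a small sketch of what such documentation can look like, the function below carries a docstring that states what it does, what its inputs and outputs mean (including units), and how to call it. The function `firing_rate` and its parameters are hypothetical and chosen only for illustration.

```python
def firing_rate(spike_count, duration_s):
    """Compute a mean firing rate in spikes per second (Hz).

    Parameters
    ----------
    spike_count : int
        Number of spikes observed during the recording.
    duration_s : float
        Length of the recording in seconds. Must be positive.

    Returns
    -------
    float
        Mean firing rate in Hz.

    Raises
    ------
    ValueError
        If `duration_s` is zero or negative.

    Examples
    --------
    >>> firing_rate(50, 2.0)
    25.0
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return spike_count / duration_s


print(firing_rate(50, 2.0))
```

Stating units and valid input ranges in the docstring addresses two common sources of misinterpretation when others reuse shared code.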

My work will be used, but not cited.

Science ethics dictates that you should be cited if your work is used. Part of open science is valuing all steps of the scientific workflow, and encouraging researchers to cite code, data, or other non-published articles. Make it easy for others to cite you by adding a digital object identifier (DOI - discussed later in the course) to your research product. Remember to cite others’ materials, so you’re not adding to the problem.

Data is too sensitive to share.

Following appropriate anonymization or using controlled access can address this concern.
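One common first step toward anonymization is pseudonymization: removing direct identifiers and replacing participant IDs with keyed hashes. The sketch below uses Python’s standard `hmac` and `hashlib` modules. The field names (`participant_id`, `name`, `email`) and the function `pseudonymize` are assumptions made for this example. This step alone does not guarantee anonymity, since combinations of remaining fields may still identify a person, so it should be used alongside your institution’s ethical and legal guidance.

```python
import hashlib
import hmac

# Hypothetical field names treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace `participant_id` with a keyed hash.

    The same ID and key always give the same pseudonym, so records from one
    participant stay linkable without exposing the original ID.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(
        secret_key,
        str(record["participant_id"]).encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    cleaned["participant_id"] = digest[:16]  # shortened for readability
    return cleaned


# Placeholder data and key; keep the real key private and out of shared files.
raw = {"participant_id": 7, "name": "Jane Doe", "email": "jane@example.org", "score": 3}
print(pseudonymize(raw, secret_key=b"replace-with-a-private-key"))
```

A keyed hash is used rather than a plain hash so that someone without the key cannot recompute pseudonyms from a list of known IDs.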

I don’t want to maintain or update my work.

Sharing what you did allows others to reproduce, replicate, and build upon your work. That doesn’t mean you have to maintain it for the rest of your life, or even at all. If you don’t plan to maintain your code, it is still recommended that you share the code publicly and archive it. By adding appropriate licensing, documentation, and contributing guidelines, you can make it clear how long you plan to keep your materials maintained (if at all). In fact - others might help maintain it for you!

My work won’t be useful to anyone else.

You never know how materials might be used. Individuals who contributed to all different types of software projects ended up helping NASA land a rover on Mars!

Partially drawn from Malvika Sharan’s “Ten Lessons Against Open Science You Can Win”.

Some of the fears listed above are not unique to open science and can occur in closed scientific systems. For example, scooping and reusing without citation are both examples of scientific misconduct that can happen in closed science scenarios. Open science practices can provide more avenues for recourse, such as making a preprint available or giving your data or code a DOI and license. Having more of your work shared in citable ways gives you more power to prove when misconduct has occurred.

Another example of a fear that occurs in both open and closed spaces is the commitment to maintaining your work beyond publication. Maintenance is a consideration regardless of whether your work was shared - you need to decide how long to store your data and code for yourself in order to reproduce your work, should any questions arise even after publication (we cover sharing and archiving data and code in later days, Open Data and Open Code). By sharing your research materials, you may actually increase the longevity and impact of what you’ve done if others find your materials useful and help maintain and build on top of them.

We recognize that this is not an exhaustive list of concerns and fears toward adopting open science. This list is developed to provide guidance and instill confidence in researchers who intend to do their work more openly moving forward.

Misaligned Incentives#

In this section, we discuss barriers to participation in open science that stem from misaligned incentive structures. These all relate to scientific incentives for individuals and organizations that are not aligned with open values.

We distinguish concerns and fears, which are associated with changing the culture of how we do science, from the structural barriers that block researchers’ abilities to adopt open science practices. We recognize that there is overlap in these categories, but this framing might be useful for understanding what we have control of as individuals and where we need to encourage more structural changes to our scientific ecosystem.

Incentives can come in many forms, but most in science involve proposal funding and career advancement. In both of these cases, metrics are used for measuring scientific success (e.g., publication and citation count, as discussed earlier in this course). These current metrics do not capture the entire impact of activities that scientists spend their time doing. Below, we present a few examples of misaligned incentives. While there aren’t perfect answers for overcoming these yet, agencies like NASA and initiatives like DORA and CoARA are actively working to update the metrics that define what success means in science, and it will take community action to ensure that open and inclusive practices get the merit they deserve.

Challenge: Overvaluing Novelty

Nobel prize.

Awards (for example, prizes or funding) are often given to those who make a big, new scientific discovery or who create a new, exciting tool. This practice overlooks the community that wrote code, curated datasets, maintained fundamental existing tools, and many other important steps that enabled these novelties.

Prizes often disincentivize crediting a team, since only one person or a small group can be awarded a prize (for example, a Nobel Prize can be awarded to up to 3 people only). This emphasis on novelty and the individual is starting to change, with awards being offered to groups (e.g., The White House Office of Science & Technology Policy Open Science Recognition Challenge) and the addition of funding solicitations for maintaining tools and infrastructure. However, it will take time for these changes to become the norm.

Challenge: It Takes More Time to be Open

Doing open science often requires more time and effort from researchers to start and maintain. For instance, it can take significantly more time to document and clean code to a degree that the public can easily understand and use it. At the moment, the scientific system doesn’t always reward extra effort like this, which can make it difficult for individuals to spend their time on open activities because it takes time away from starting their next paper. After all, published papers are the main currency of the current scientific system.

Updated metrics of success can help to incentivize individuals to do their work openly. The science community is currently in a transition phase where new metrics are being developed, but the old metrics still dominate in many fields and organizations. It’s important for researchers to recognize that they might not be able to achieve complete openness until the system and culture shift.

Social Barriers#

Meaningful collaborations across diverse communities can require additional time and effort to coordinate across groups and to address conflicts. While interacting with the community can be one of the most fulfilling things about Open Science, it might also be a source of disagreements about the direction of the project or how it should be used. That’s where licenses and codes of conduct come into play. Clear rules for community and colleague interactions and for the use of resources provide a framework for making decisions in a fair and agreed-upon manner. This can all take additional time, especially at the beginning of a research project, but can save time and headaches down the road.

Strategies for Communicating Across Differences#

These are ways you can encourage openness in your discussions around research. For in-person sessions, it’s good to encourage discussion of these strategies:

  • Presume that everyone you work with is doing the best they can at the time.

  • Attempt collaboration before conflict.

  • Listen carefully and actively.

  • Encourage other people to listen as much as they speak.

  • Practice empathy and humility.

  • Ask questions that seek to understand your colleagues’ context.

  • Participate in an authentic and active way that supports the health and longevity of your community.

  • Exercise consideration and respect in your speech and actions.

  • Treat other people’s identities and cultures with respect: e.g., make an effort to say people’s names correctly and refer to them by their chosen pronouns.

  • Be mindful of your surroundings and of your fellow participants, and take action if you notice a dangerous situation or someone in distress.

Institutional and Infrastructure Barriers#

Institutional Barriers: Institutions Often Move Slowly#

Institutional barriers to the researcher or practitioner present an additional challenge to adopting open science practices. Researchers interested in adopting open science practices might lack support from their department or project supervisors. The budget, resources, or time in a project cycle might be insufficient to practice open science. Institutions might not recognize open science practices in recruiting, training, or promoting in the organization. Even if organizations show interest in moving toward open science, they can move slowly when setting up new systems of support.

In these situations, there isn’t always an obvious mitigation strategy. While we encourage individuals to practice open science, some aspects may not be feasible right now without a lot of extra time and effort that your institution may not recognize or support. It’s best to work within the bounds of the system you are in. While the scientific community as a whole is in transition toward greater openness, it may not make sense to be open in every way until institutional barriers are lowered. That said, the more individuals who push for openness, the more it will become part of the scientific mindset, and the more likely our organizations are to recognize and support our efforts.

Tools & Infrastructure#

Do the right tools and infrastructure exist to support my work?

There are many tools and resources for making our code, data, and results more open, but the required infrastructure is still being built, and may not be in place yet to support open science in each discipline. This is where community input can be helpful. Perhaps there is a community already working on implementing the infrastructure you need. If not, you can start discussions at conferences or on open online forums to help organize the creation of the tools and infrastructure you and your community need to effectively do open science.

How can I get around institute-specific infrastructure when trying to collaborate with people outside my organization?

Some of the infrastructure (like our computing platforms) is institute-specific, which can be a barrier to collaboration outside of the organization. However, by planning for open collaboration from the start, you can minimize these barriers. For example, you can use freely available tools like GitHub and Google Docs for communication and coordination, even if the computing facilities are institute-specific.

Open Science is Worth the Effort!

While there are many challenges to the adoption of open science, we believe that its benefits and its ethical imperative to the self and to scientific communities, citizens, and policy-makers outweigh the cost of barriers. In addition, the recognition of barriers and areas for caution provides a first step toward resolving them.

Key Takeaways#

The following are the key takeaways from this section:

  • There are valid concerns and fears around making our science more open, but there are often specific open science practices that can help to mitigate these fears.

  • The misalignment of incentives creates real-world challenges that act as barriers to adopting open science practices. There are ways that individuals can minimize or work with these barriers, as well as organizations and groups that are actively working to update the incentive structure.

  • Working openly and collaboratively has its challenges, but there are some strategies for communicating across differences.

  • There are also institutional and infrastructure barriers to adopting open practices, but by using general tools and infrastructure, we can minimize some of these challenges.


Section 5: Planning for Open Science: From Theory to Practice#

This day is nearly over, but there’s so much more information available about open science – so our last section is for everyone who wants to learn more. In this section, you review ways to start your journey with open science, including a list of resources that you can use now.

Planning for Open Science#

Questions to ask when planning for open science.

It is important to think about, discuss, and plan for desired outcomes and processes when you begin your research. Learn where the best repositories are for your materials, discuss credit and authorship for each separate open science output, and start using open science tools to organize your work. Reach out to repositories in your discipline and at your institution (usually the library) for help. Including this information in your plans can strengthen your funding proposals.

Planning for outputs in advance includes:

  • Speaking about it and organizing with your research team;

  • Deciding which tools to use;

  • Thinking about authorship and credit;

  • Engaging with relevant stakeholders and research partners, for example, industry, around open science;

  • Identifying repositories for software and data;

  • Identifying journals (or other outlets) for publications;

  • Highlighting these approaches in your grant proposal, and much more.

In reality, there is an exploratory stage where sharing one’s product may not be part of the plan. During active research and data exploration, data, code, and ideas may be created and deleted even daily. It may not be efficient to spend time making these fully open (e.g., creating DOIs, documentation) because you are just exploring! Still, one may choose to make their code public through this process (it should be in some version control repository anyway; there is no harm in making it public). Part of this planning is beginning to think about what would be valuable to science and figuring out how you might share it.

It is important to discuss open science with your research team, lab, group, or partners regularly. Much of responsible open science may seem to be related to outputs – such as data, software, and publications – but preparing and organizing work for these in advance is critical. It is much more difficult to follow leading practices for these at the end of research, in the ‘afterthought’ mode. Open science is both a mindset and culture that starts when you begin a project.

Open Science and Data Management Plans#

Federal agencies and funders consider data management crucial for open science because it ensures that research data is well-organized, accessible, and preserved. In recent years, many have required an Open Science and Data Management Plan (OSDMP) as part of proposals or project plans. The OSDMP includes a description of the resources to be used, the products that will be created, how they will be shared, and who will be responsible. These plans can cover the data, software, publications, and project governance.
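One way to keep these four elements in view is to draft the plan as structured data before writing it up. The sketch below is a hypothetical illustration, not a NASA or funder template: the class name `OSDMPDraft`, its field names, and the example entries are all assumptions chosen to mirror the elements listed above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class OSDMPDraft:
    """Hypothetical container mirroring the four OSDMP elements described above."""

    # Existing data, software, or publications the project relies on.
    resources_used: list[str] = field(default_factory=list)
    # Outputs the project expects to create.
    products_created: list[str] = field(default_factory=list)
    # Maps each product to where and how it will be shared.
    sharing_plan: dict[str, str] = field(default_factory=dict)
    # Maps each product to the person or role in charge of it.
    responsible: dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def unshared_products(self) -> list[str]:
        """Return products that have no entry in the sharing plan yet."""
        return [p for p in self.products_created if p not in self.sharing_plan]


# Illustrative entries only; a real plan would follow the funder's own template.
draft = OSDMPDraft(
    resources_used=["public satellite imagery"],
    products_created=["processed dataset", "analysis code"],
    sharing_plan={"processed dataset": "discipline data repository"},
)

print(draft.missing_sections())
print(draft.unshared_products())
```

Running a check like this during planning meetings can surface gaps early, for example a product with no agreed sharing location or no named owner.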

Open science and data management plans are essential because they enhance the credibility and reproducibility of research by ensuring that data is well-documented, organized, and preserved over time. Effective OSDMPs can have the following benefits:

Transparency

Transparency not only builds trust in scientific findings but also allows other researchers to validate and build upon them, fostering a culture of openness and cooperation.

Effective

Effective data management can lead to more efficient and cost-effective research processes. By reducing the time spent searching for and organizing data, researchers can dedicate more time to analysis and interpretation, potentially accelerating the pace of discovery and innovation.

Reproducibility

A key tenet of the scientific method is reproducibility, and a well-developed OSDMP helps ensure that others are able to validate your results.

Preservation

The research produced by federal funding represents a significant investment, and it is important that research is saved for future generations to access and understand.

Inclusive

OSDMPs can include research tools and processes that can significantly improve research outcomes through collaboration and consultation.

You will learn more about OSDMPs in Day 2.

An Open Strategy#

In today’s world, many foundations and agencies that award research grants increasingly expect proposals to include an open science strategy. By including an open science strategy document in your scientific plan, you ensure accessibility and openness in each step of your workflow. Conclude your comprehensive plan with clearly defined steps to make research outputs easily accessible and openly available. The steps identified in your strategy should be integrated into your everyday scientific processes and practices.

Requirements

Many major research foundations and federal government agencies now require scientists to file a data management plan (DMP) along with their proposed scientific research plan. Some ask for additional details on software/code and publications.

Include Entire Data Workflow Details in the Plan

Describe your management workflow for data and related research. Other elements, such as code or a publication, have their own lifecycles and workflows, which also need to be in the plan.

Include Open Terminology and Concepts

Plans that are successful typically include clear terminology about how information is made findable, accessible, interoperable, and reusable. This can include licenses, repositories, formats, and governance of the project.

Preservation

Research materials are valuable and reusable long after the project’s financial support ends. Reuse can extend beyond our own lifetimes. Therefore, researchers must plan steps for preservation and accessibility to ensure work is not lost after the research project ends.

Designing for Openness#

Open Science Applies to the Entire Workflow#

Figure: Open Science Workflow Phases. Source: Opensciency

Regardless of your science discipline or the methodology that you use, the workflow remains relatively the same. It has a planning phase, an implementation phase, and a release phase. Within these phases, there are milestones that vary depending on the workflow you follow. For the purpose of our discussion in this section and the other days in the curriculum, we have adopted the scientific workflow with general milestones described in the Opensciency curriculum. The details in your workflow may vary, but the overall concepts are the same. What is relevant here is that when adopting open science, it permeates all phases of the workflow. You prepare for it in the planning phase but then continue to integrate the principles of it throughout the implementation and release phases.

Products created throughout the scientific process are needed to enable others to reproduce the findings. Researchers who wish to make their results reproducible must make key elements of their study openly available for others to test.

Figure: Open Science Workflow Products. Source: Opensciency

Continuing through the workflow, this updated diagram now shows the types of scientific products that are created at each milestone. The specialized products that you create may vary or be completely different, but the focus on discovery for the public remains the same. Any type of products you create can be modified to support the principles and concepts of open science.

Use, Make, Share#

Here, we introduce the “Use, Make, Share” framework that can start to gradually increase your adoption of open science depending on the nature and scope of your project. Throughout the course, we will explore how this framework can be used to make your science more open!

What Resources Will You Use?#

There are already many open science resources for you to use, and open science has a long history. For example, the act that created NASA mandated sharing of its discoveries with all of humanity, and NASA has been sharing its data openly on the internet since the 1980s. There are now over 100 petabytes of openly available NASA data for you to search, download, and use; examples of these services are provided in Day 3. Technology and practices have been developed around code that make it easy to collaborate on building complex solutions, and examples are given in Day 4. A range of services make it easy to share and discover open access publications, and these are discussed in Day 5.

What Outputs Will You Make?#

Throughout the research process, different products and results will be produced. These can include data sets, samples, code, reports, manuscripts, conference proceedings, blog posts, and videos. Each of these has different considerations for how to make it, including how it can be made in open and collaborative ways.

There are also different ways to run a scientific project. Is your project going to be open from inception or open at publication? There are valid reasons for both approaches, but generally the earlier you are open with data, code, and results, the more opportunities there are to grow collaboration networks and build with others (which is quite fun). Often researchers choose to be open within their project teams during development, exchanging data, code, and results, but then only sharing with the world once they feel they have a result they can trust. While this approach has been the cultural ‘norm’ within many communities, this is changing as groups grow more comfortable with openness earlier in projects and experience valuable contributions from others and build new collaboration networks.

Days 3, 4, and 5 will discuss how to make your data, code, and results open.

How Will You Share?#

Where you choose to share your research materials and results will have a large influence on their impact: how easy they are for others to find, how long they remain available, and how easy they are to reuse.

Will you share data in a file filled with columns of unlabeled numbers without any units or explanations, or will it be in an open, standard format and following the Findable, Accessible, Interoperable, Reusable (FAIR) principles? Day 3 has more details to help you better understand how to share your data and explains ideas like FAIR and best practices in sharing data. This includes different considerations for where to share your data as well so that it is both accessible and preserved.
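To make the contrast concrete, here is a minimal sketch of the second option: a CSV file with labeled columns plus a small metadata record describing each column and its unit. The column names, the made-up measurements, the license choice, and the helper names `to_csv` and `undocumented_columns` are all assumptions for illustration. FAIR covers much more than this (for example, persistent identifiers and repository indexing), which Day 3 discusses.

```python
import csv
import io
import json

# Hypothetical measurements; the values are made up for illustration only.
rows = [
    {"station_id": "A1", "time_utc": "2024-01-01T00:00:00Z", "air_temperature_degC": 3.2},
    {"station_id": "A1", "time_utc": "2024-01-01T01:00:00Z", "air_temperature_degC": 2.9},
]

# Sidecar metadata: describes every column so a reader does not have to guess.
metadata = {
    "title": "Example hourly air temperature",
    "license": "CC-BY-4.0",  # assumed license, chosen only for the example
    "columns": {
        "station_id": {"description": "Station identifier", "unit": None},
        "time_utc": {"description": "Observation time, ISO 8601", "unit": "UTC"},
        "air_temperature_degC": {"description": "Air temperature", "unit": "degree Celsius"},
    },
}


def to_csv(records):
    """Serialize records to CSV text with a header row of column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def undocumented_columns(records, meta):
    """Return data columns that have no description in the metadata."""
    return [name for name in records[0] if name not in meta["columns"]]


csv_text = to_csv(rows)
metadata_text = json.dumps(metadata, indent=2)

print(csv_text)
print(undocumented_columns(rows, metadata))
```

Keeping the metadata in a separate, machine-readable file means the same description can travel with the data into whichever repository you choose.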

For software, since it is often updated and changed, many researchers first share it on a version control platform like GitHub or GitLab but then archive a version of it in a repository that has long-term preservation capabilities – more on this in Day 4!
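As a rough sketch of the "archive a version" step, the snippet below bundles a project folder into a versioned archive and computes a checksum that could be recorded alongside the deposit. It uses only the Python standard library; the function name `make_release_archive`, the project name, and the version label are hypothetical, and the actual deposit process depends on the repository you choose.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def make_release_archive(source_dir: Path, version: str, out_dir: Path) -> tuple[Path, str]:
    """Bundle source_dir into <name>-<version>.tar.gz and return (path, SHA-256 hex digest)."""
    archive_path = out_dir / f"{source_dir.name}-{version}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store files under a versioned top-level folder inside the archive.
        tar.add(source_dir, arcname=f"{source_dir.name}-{version}")
    digest = hashlib.sha256(archive_path.read_bytes()).hexdigest()
    return archive_path, digest


# Build a tiny throwaway project so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my_analysis"
    project.mkdir()
    (project / "README.md").write_text("Example project\n")
    (project / "analysis.py").write_text("print('hello')\n")

    archive, checksum = make_release_archive(project, "v1.0.0", Path(tmp))
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())

    print(archive.name)
    print(names)
```

Recording the checksum with the archived version lets others confirm that the copy they download matches the one you deposited.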

For results, open access publications and preprint servers are common locations to share. Day 5 discusses all these options.

Steps to Continue Your Open Science Journey#

Here, we will explore the next steps to open science that everybody can take. The thought that open science can impact your entire scientific workflow may seem overwhelming and unachievable, but this is not the case. You can start slowly and gradually increase your adoption depending on the nature and scope of your project. Here are a few immediate ways that you can start engaging in open science.

Where to Go From Here#

  • Get involved: Become part of an open science community in your sector.

  • Start using/sharing the open science tools of your community.

  • Learn how to use/archive data in repositories and community tools and resources.

  • Concise statement of the Ethos of Open Science: Find, collaborate, and share!

Identify Your Open Science Communities#

Here are the steps you can take to find your own open science community:

  • Talk with your colleagues.

  • Read your field’s literature.

  • Run searches, in general and discipline-specific areas.

  • Investigate online communities encouraging open science, such as:

Join open science communities. There are generic ones as listed here or you can seek out communities that are not only within your domain but also within your geographical area.

Explore Open Repositories#

There are many repositories that host open data, software, and results. We share many of these resources in the later modules, but here are two NASA repositories that allow you to search for existing data collections that might be relevant to your interests.

Four Steps to Open Science that Anyone Can Take#

  1. Keep seeking best practices for open science, and develop plans to be more open in your science or research.

  2. Think about all the different types of reviews you are involved with, and how to improve them with a goal of openness.

  3. Ask colleagues about open science activities, and award credit for them in evaluations.

  4. Engage with underrepresented communities to ensure science encourages a more equitable, impactful, and positive future.

Additional Resources#

In addition to the resources listed elsewhere in this training, the community resources below are excellent sources of information about Open Software.

Disclaimer: Please note that we reference several papers throughout the course, and some of them might be behind a paywall. If you would like a copy of a paper, please contact the author or search for it in an online preprint archive, for example, bioRxiv.org.

Key Takeaways#

There is no one way of doing open science, and any steps you take to make your science more open are extremely valuable, especially as we transition to a more open scientific ecosystem in the future. We want people to be able to identify the most important things they “can” openly share, but with the ultimate goal of complete openness.

The following are the key takeaways from this section:

  • Preparing and organizing in advance are crucial components for ensuring the effectiveness of open science work.

  • Open Science and Data Management Plans (OSDMP) provide a plan for how open science is integrated into a project, including the sharing of data, software, and results.

  • Designing for openness is a critical aspect of making sure that open science is integrated into the entire scientific workflow from start to finish. This includes resources that can be used, products that will be made, and how the science will be shared.

  • Open Science is already happening - there are already teams conducting their research openly and many resources that can be used to make your research more open.

  • There are more opportunities to participate and learn about Open Science – this is just the start!

Activity 3: Use, Make, Share#

Estimated time for activity: 10 minutes. It is a group activity.

Take a moment to answer the following questions on your current research or on research that you would like to do:

  • What data, software, or publications do you currently use or would like to use? Are they open or closed?

  • What are the tools and processes that you currently use? Is it easy to include others in collaboration?

  • How is your work shared or planned to be shared? Can anyone access your results?

Discuss the answers in the group.


Summary#

After completing this day, you should be able to:

  • Explain what open science is, why it’s a good thing to do, and list some of the benefits and challenges of open science adoption.

  • Describe the practice of open science, including considerations when writing a management plan and the tasks in the “Use, Make, Share” framework.

  • Evaluate available options when determining whether research products should or should not be open.

  • List ways to connect with others who are part of the open science community.