Day 4: Open Code

By Neuromatch Academy & NASA

Content creators: NASA, Leanna Kalinowski, Hlib Solodzhuk, Ohad Zivan

Content reviewers: Leanna Kalinowski, Hlib Solodzhuk, Ohad Zivan, Shubhrojit Misra, Viviana Greco, Courtney Dean

Production editors: Hlib Solodzhuk, Konstantine Tsafatinos, Ella Batty, Spiros Chavlis

Tutorial Objectives#

Estimated timing of tutorial: 2 hours

This day focuses on the practice and application of open code as part of the open science workflow. It provides a ‘how to’ process that follows the code development lifecycle and “Use, Make, Share” framework. Some of the key topics discussed include: benefits and limitations of open code, how to discover and assess code, considerations and methods for programming following open principles, and finally, when and how to share your code.

Section 1: Introduction to Open Code#

This section defines the key terms, core principles, benefits, and challenges of open code. The practice of making code openly available to the public occurs within a spectrum from more to less protected. Ethical and legal conditions can limit the degree of openness that researchers can permit. This section will introduce the critical questions to consider when determining the appropriate accessibility of code to external users along with best practices to overcome common constraints to maximize availability. The section concludes with a discussion on the software lifecycle and how it fits with the “Use, Make, Share” framework and its relationship to a management plan.

Success Stories#

Why does good science demand that researchers make their code open-access? Sharing your code (and data) makes it easier for others to reproduce your results, helping to validate findings and reduce resources required to duplicate experiments. As a bonus, this decision can lead to new collaborations made possible through a shared dataset and a common understanding of scientific material.

Many journals and funding agencies require that you share your code at the time of publication. However, the prospect of opening code up to criticism, not receiving attribution, or missing out on a result that external researchers discover can deter scientists from making their code open-access. What if people find an error? What if they criticize your coding style? What if they take your code and publish a new result without including you? This module will help you gain confidence in sharing your code by walking you through the basic details to consider when practicing open science.

Let’s review some well-known examples of groups that shared their code and what the impacts were.

Case Study: Climate Models (Isca)

New open-source sets of climate models incorporate features that aim to make climate research more collaborative, efficient and reliable. Scientists have published an open-source framework of climate models (Isca), which contains models that are easy to obtain, completely free, documented, and come with software to make installation and operation easier. All changes are documented and can be reverted. Therefore, anyone can easily use the same models. Although the Isca model was initially used to examine the tropical upper atmosphere, researchers from other fields of science have used it to study the life cycle of weather systems, the Indian monsoon, and the effect of volcanic eruptions on climate. New research across all of these fields was possible within only one year of Isca’s first publication. This is how we want all of science to work!

Credit

Case Study: Computational Neuroscience: Nengo Software

Neural Engineering Object (Nengo) is open-source software for simulating large-scale neural systems. Over the last decade, Nengo has been used to build increasingly sophisticated neural subsystems: path integration, working memory, list memory, inductive reasoning, motor control, and decision-making. A prominent achievement along its development path was Spaun, the world’s largest functional brain model (later extended into Spaun 2.0). Because Nengo depends on only one third-party library (more on that later in the lesson), it is easy to integrate Nengo models into arbitrary CPython programs, opening up possibilities for using neurally implemented algorithms in web services, games, and other applications. It has a simple object model, which makes it easy to document, test, and modify.

Credit

Definitions and Considerations of Open Code#

All science builds on what has already been accomplished. Code is no different. Many scientists use code to do data analysis. This process begins with the acquisition of data, either by running an experiment or model that generates data or by identifying observational data that may be useful to test a hypothesis. Next, the data is analyzed. It is very likely that the code required to read or analyze a new data set was already created by someone. The existing code might require some degree of modification to meet a researcher’s unique parameters. Even the development of a new model can incorporate specific elements of existing code from different sources.

Understanding how to find and use others’ code, create your own, and share it is an important part of advancing open science. As with good data management practices, knowing some of the details about how to share code will not only help you use it later, but also help others understand how to use and cite it, so you get credit!

What is Code vs Software?#

When we write “software,” we are actually writing text code and using an interpreter or compiler to translate it into a program that the machine can run. Code is a language that humans can type and understand. Software is often a collection of programs, data, and other information that a computer system uses to perform specific tasks. An example is a software library, which is a suite of data and programming code that is used to develop software programs and applications.

Often, scientists write and publish code that helps others reproduce their results rather than creating software packages. But many scientists aren’t starting their code from scratch. There are large open-source software libraries that scientists use and contribute to, such as scipy, numpy, matplotlib, and others. These libraries let everyone do science faster and better because they have been written, tested, and used by thousands, if not hundreds of thousands, of people. These libraries have been widely adopted because they are open-source – which makes it easier to collaborate with anyone, anywhere.

What is Open Source Software#

Open-source software is distributed with its source code without cost, making it available for others to use, modify, and distribute with its original rights and permissions.

Often, open-source software is transparently shared in a public repository and sometimes maintained through collaboration. Open-source software development is the basis for a vast range of research software packages.

There are a variety of license choices that can be made for open software which can allow the creator to retain various levels of ownership and rights. The choice of license impacts reuse by others. But first, let’s break down the main types of software scientists use based on their purpose by showing examples of each type.

Types of Software#

Scientists use and produce a wide variety of different types of software during projects. While many researchers might just use equations in a spreadsheet, others may use open-source libraries for advanced machine learning model development and plotting results, while others may contribute to open-source libraries in their field and grow their reputation and impact that way. Here are some examples of different types of software that you might encounter.

General Purpose Software – General purpose software is produced for wide use and not specialized scientific purposes. This includes both commercial software and open-source software. Many widely used productivity software packages are open-source success stories:

Linux kernel, GNU userspace, and various Linux and UNIX distributions
PostgreSQL – open source enterprise-grade database
WordPress and Apache web hosting tools
Firefox and Chrome
- Chrome’s engine is Chromium, which was forked from WebKit, which was in turn forked from KHTML. This was possible because it had a license that allowed for this type of reuse. All major browsers today except Firefox can be traced back to KHTML.
Android operating system, among others
- The Android Open Source Project code can be viewed and modified, but many devices ship with locked bootloaders that prevent installing a modified build. And even where you can install one, the standard Google services (e.g., the Google Play Store) are proprietary and not part of the open-source release. So in practice Android is less “open” than its source availability suggests.

Operational Software – Software delivered to individuals as part of a program or product. Examples include automated workflows, data consolidation, and role-based interfacing and reporting.

Fprime – Space mission flight software

Infrastructure Software – Forms the central framework of computer systems, also known as the computer’s set-up foundation. Examples include operating systems, database management systems, web servers, middleware, and virtualization software.

Fprime – Space mission flight software
PODAAC – Distributed archiving and processing software
UFS – Operational weather forecasting model software
Metadata Compliance Checker, APIs, Web apps, Giovanni, McIDAS

Libraries – Libraries are generic tools for implementing well-known algorithms and providing statistical analysis or visualization, which are incorporated into other software categories. Examples include:

NumPy – Scientific computing with Python
scikit-image – Image processing algorithms in Python
deal.II – Library of algorithms to solve partial differential equations with finite elements

Modeling and Simulation Software – Modeling and Simulation Software either implements solutions to mathematical equations given input data and boundary conditions or infers models from data. They often use libraries. Examples include: first-principles models, data-assimilation tools, empirical models, machine learning, mission planning, and engineering tools, among others.

OpenFOAM – Computational fluid dynamics software
MOM6 – General ocean circulation model
ASPECT – Planetary convection software
Atmospheric radiative transfer, stellar evolution, upper ocean turbulence, solar wind predictions, orbit propagation (e.g., OpenGGCM, MESA)

Analysis Software – Analysis software is developed to manipulate measurements or model results to visualize or gain understanding. This software often evolves from single-use utility software and may incorporate libraries.

Photutils – tools for detecting and performing photometry of astronomical sources

Single-Use Utility Software – Single-use utility software is written for use in unique instances, such as making a plot for a paper, or manipulating data in a specific way. This code often uses libraries for analysis, plotting, or reading data. This software is the most common type that gets included in Open Science and Data Management Plans (OSDMP), which we will talk about shortly. Examples include:

Angus et al. 2019 – Fitting a gyro relation to Praesepe
Webb telescope spots CO2 on exoplanet for the first time: what it means for finding alien life. All the data and models presented in this publication can be found here.
Constraining the increased frequency of global precipitation extremes under warming
Code at: https://doi.org/10.5281/zenodo.6288035 (2022)

Principles, Benefits, and Challenges#

Principles of Open Code#

Open software principles are derived from open-source software best practices. They establish guidelines that advance open science and aim to enhance the value and impact of research.


Transparency	Whether you are developing software or solving a business problem, we all have access to the information and materials necessary for doing our best work. When these materials are accessible, we can build upon each other’s ideas and discoveries. We can make more effective decisions and understand how those decisions affect us.
Collaboration	When we’re free to participate, we can enhance each other’s work in unanticipated ways. When we can modify what others have shared, we unlock new possibilities. By initiating new projects together, we can solve problems that no one can solve alone. And when we implement open standards, we enable others to contribute in the future.
Share early and often	Rapid prototypes can lead to rapid discoveries. An iterative approach leads to better solutions faster. When you’re free to experiment, you can look at problems in new ways and seek answers in new places. You can learn by doing.
Inclusive	Good ideas can come from anywhere, and the best ideas should win. Only by including diverse perspectives in our conversations can we be certain we’ve identified the best ideas, and good decision-makers continually seek those perspectives. We may not operate by consensus, but successful work determines which projects gather support and effort from the community.
Community	Communities form when different people unite around a common purpose. Shared values guide decision-making, and community goals supersede individual interests and agendas.

Credit: The open source way | Opensource.com

Sharing code enhances science because it enables reproducibility, reusability, and replicability. The decision to share code benefits the scientific community because it increases transparency, participation, and collaboration. Sharing code at any point in the research process can be valuable.

In most cases, the source code used to generate results in peer-reviewed papers should be published, cited, and accessible.

Benefits of Moving to Open Software#

Science moves faster when researchers are able to work together, help correct errors, build on each other’s results, and share resources. Sharing software is a key part of open science that:

Accelerates science by making it easier to use and build on software developed in previous work.
Minimizes the time and cost of repeated development of similar software and the reproduction of scientific computations.
Increases the potential number of users and developers and thus helps improve quality and trust in the software.
Increases developers’ visibility, improves the sustainability and quality of their software, and advances their employability.

Challenges of Moving to Open Software#

It is not uncommon for research groups to spend years developing code, writing papers with the results, and gaining scientific influence by not sharing the code. Anyone new who wants to work on a similar project is at a huge disadvantage because they would have to start from scratch. Also, anyone wanting to work in that area is forced to collaborate with the group. This group retains a very real competitive advantage by keeping it closed-source. However, this approach stifles innovation and hurts scientific progress. Many funding agencies are now requiring that code is shared at the time of publication, if not before. But challenges and fears remain:

Openness has costs: time spent documenting, publishing, responding to users/maintenance and cleaning up/enhancing quality.
Effort is required to learn how to leverage the new tools and knowledge (resources are available to ease this effort).

Ultimately, you are free to deploy the open software principles and resources in your research to maximize its impact and meet the expectations of your sponsors and community while managing costs.

Key Takeaways: Relating Principles to Benefits and Challenges

Making software more open by following the principles has benefits and challenges, which are related.
Greater benefits typically come with greater challenges.
In most cases, individual scientists and society will both benefit from more open software.

Software Management Plans (SMP)#

Software management plans encompass both code and software.


What?	Description of types, management, preservation, and release of software.
When?	The schedule for software archiving and sharing.
Where?	Location where software will be shared and archived over the long term.
How?	Enable reuse of software through assigning a DOI, license, contribution guidelines, etc.
Who?	Roles and responsibilities of the team members.

As your research starts using, creating, and sharing code, the SMP provides a guidebook for everyone on the project that establishes a common understanding.

Is your project sharing all code publicly or just code that goes into a publication? Will your team be contributing back to open-source projects or just writing code that builds on them to produce results? Considering these questions early will influence how much time and energy you may want to spend on documentation and how you plan to share the code.

Open Code is a Spectrum#

Just like data, code can be shared in many different ways to increase reusability. Code can be shared without any documentation, purely as a reproducibility artifact, or code can be well-written, documented, and openly-licensed to maximize re-use. Both of these approaches have value and depend on the time, energy, and funding that researchers have available.

There is a spectrum of openness when it comes to open software that ranges from open-source software to closed-source software.
An example of something “in-between” could be an executable file with documentation on how the code works.
Some projects may be open from inception and continuously share all code throughout development. Others may share some of the code at the time of publication. Other projects may only make code available once funding ends. A variety of valid reasons factor into a project’s approach to sharing.
While some factors restrict the degree of openness that software can be, each step towards sharing advances the open science movement.
By sharing more ideas and software, communities have driven creative, scientific, and technological advancement faster than the restricted pace of closed science. Peer production and mass collaboration create more sustainable software development.

While researchers and institutions may not be able to share all their code, they can make efforts to shift the openness spectrum from closed code to open-source code and software.

The Practice of ‘Open’#

Review how the key tasks in the software development life cycle are covered in the “Use, Make, Share” framework flow.

As with open data, different aspects of open software are described in terms of Using, Making, and Sharing of open software.

A key difference with software is that the process is typically more cyclical and repetitive than with data or results. Typically, software constantly evolves. Thus, the boundaries between “Use, Make, Share” are less rigid and the process is typically more dynamic and circular than pre-planned/fixed and sequential.

Key Takeaways#

In this section, you learned that:

In open-source software, anyone can see the underlying source code.
Open-source principles promote transparency, collaboration, sharing, inclusiveness, and communities.
Open-source software accelerates science, minimizes time and cost of repeated development of similar software and reproducing scientific computations, and can improve quality and trust in science.
Licenses for open-source software dictate its shareability and reusability to developers and prospective contributors. Funding entities and affiliated institutions may impose restrictions on how developers license their software.
A software management plan (SMP) is a project guidebook that gives a research team a common understanding of software management practices to work from.

Section 2: Using Open Code#

In this section, you will learn the steps for using existing open code in your work. These steps include discovering, assessing, reusing, citing, and acknowledging.

Discovering Open Code and Software#

Many people discover code through discussions with their colleagues or by reading journal articles and attending talks at conferences. This is a great way to find out about code that might have applications for your scientific problem.

What other ways can someone search for open code? As a first step, look for code that already exists because chances are that someone else has already had a similar problem and published their code online. A common way to search for existing code is with a general search engine. Search results offer some indication of a code’s relevance, how recently it was updated, and how frequently others reference it.


Example	I’m a new graduate student starting to work on modeling turbulence in the Southern Ocean to better understand sea surface temperature (or ocean heat uptake) and climate change. Is there some software available to model how eddies in the ocean affect sea-surface temperature?
Exercise	General Search on the term “Software for ocean turbulence modeling”
Result	General Ocean Turbulence Model (GOTM)

This successful search is predicated on the developers of GOTM making their code open.

Open Software Discovery Depends on Developers Following FAIR Principles#

Discovering open software depends on developers making their software easy to find. The Findable, Accessible, Interoperable and Reusable (FAIR) Principles for research software suggest:

Software and its associated metadata must be easy for humans and machines to find.
Software must be described with rich, searchable, and indexable metadata.
Software must be findable from all relevant search points.

Reference: Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). See also Module 1.

However, you may have more specific needs. The following sections cover additional ways to help discover relevant software that meets specific research demands.

How to Search for Open Code#

A successful search for open code demands a clearly defined purpose. Developers must first determine the tasks they expect their code to carry out. The requirements associated with these tasks can determine the best suited programming language.

Next, familiarize yourself with the terminology of others who created open software with similar requirements to your own. The keywords affiliated with your programming purpose or requirements can serve as a starting point when searching for relevant code. These keywords can be found in community forums about open-source programming and in related scientific journal articles. With the adoption of open-access principles by many academic journals, prospective programmers can peruse scientific papers from fields related to their research in order to find and sometimes make use of existing code that will fulfill their requirements.

Know Where to Search#

The open software ecosystem is vast, organic, multifaceted, and highly distributed.

If you are looking for scientific software, community standards increasingly require code to be published and linked to scientific papers.

Thus, the scientific literature and its ancillary code archives are increasingly a great place to look for scientific open code.

Most open code is not developed by or for scientists. However, open code enables research every day.

Where to Look Depends on What You Need#

There are several popular search engines for code snippets. First, you can simply search on Google. Other commonly used search engines include GitHub Code Search and Stack Overflow. These search engines allow you to search for specific code snippets by programming language, keyword, or other criteria. GitHub Code Search allows you to search GitHub, a popular code repository for scientific software. Stack Overflow allows you to search forums, where users discuss solutions to coding problems.


GitHub	GitLab	Bitbucket

GitHub Code Search

In this example, we will practice searching for open access code on GitHub. Let’s work through a scenario in which you would like to search for the Lomb-Scargle method for estimating a power spectrum.

GitHub enables users to collaborate on a shared project and track their changes with version control. Users can create a repository and grant others access or make it open access. GitHub involves a large community of open-access users who make their code available for free.

Begin by visiting the GitHub website to search for openly available software packages. You will need to create a free account for this action. Navigate to the Search Code page to begin your search and access tutorials on the interface and capabilities of the search portal. Alternatively, you can simply input your search terms in the search bar while on your profile page. Next, input the related keywords into the search bar. Search for “Lomb Scargle” and find several repositories with relevant code in various languages, along with thousands of related snippets of code. Congratulations! You have begun your open access software journey and can now view the work of thousands of others who once were where you are now. Upwards and onwards!

Screenshot of the repositories returned from our search.

Screenshot of the code snippets returned from our search.
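The same search can also be scripted. The sketch below builds a query URL for GitHub's REST search endpoint for repositories (`/search/repositories`). The helper name `build_repo_search_url` and the choice of parameters are assumptions made for illustration, and the request itself is left commented out because it needs network access and is rate-limited for unauthenticated clients.

```python
from urllib.parse import urlencode

GITHUB_SEARCH_REPOS = "https://api.github.com/search/repositories"


def build_repo_search_url(keywords, language=None, per_page=5):
    """Build a GitHub repository-search URL (hypothetical helper)."""
    query = keywords
    if language:
        # GitHub search qualifiers are written as key:value inside the query.
        query += f" language:{language}"
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_REPOS}?{urlencode(params)}"


url = build_repo_search_url("Lomb Scargle", language="Python")
print(url)

# To run the search (requires network access):
# import json, urllib.request
# with urllib.request.urlopen(url) as response:
#     for repo in json.load(response)["items"]:
#         print(repo["full_name"], repo["stargazers_count"])
```

Sorting by stars is only one possible ordering; it surfaces widely used repositories first but says nothing about whether a given repository fits your scientific need.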

With open software, knowing where to search and what to search for can be a challenging problem. You can always start with a Google Search. However, it can be valuable to think through some of the questions that guide the discovery process. If the user lacks relevant experience, it can also be helpful to engage experienced colleagues at this stage.

Review the flow chart that illustrates how the search follows the definition of the need.

Algorithm for the relevant software search.

Open Software is Aggregated and Searchable in Repositories#

A software repository is an online collection of stand-alone application software packages. Repositories typically control access and track the deployments/downloads of packages.

Software packages are often provided as executables without code.

The collection typically includes metadata, documentation, and licensing restrictions on each package. It may include different software package versions and the platforms or environments on which the software package can be executed.

Most research code should be open-source software, which is stored in code repositories.

Examples of software repositories are:#


Software Heritage	Open Source Development Network (OSDN)

SourceForge	Free and Open-Source Software Hub (FOSSHUB)

Google Code	Comprehensive Perl Archive Network

PyPI	CRAN

Activity 1: Find Code For Your Research#

Estimated time for activity: 7 minutes. It is an individual activity.

In this activity, you are invited to find open-source code resources that might be beneficial for your current research and/or studies! You can start your searches with something general, such as “spike sorting”, and then end with something more specific to your particular area; who knows, you might find a lab that practices similar research techniques as your own!

Assessing Open Code and Software#

So, you’ve discovered some exciting open code that might help you solve your scientific problem. Can you trust this code you discovered on the web? Will it be useful? How much time will it take to learn it? Could the code contain malware? Could you get in legal trouble for using it?

Examples: You found the “General Ocean Turbulence Model (GOTM)” on the internet, and it looks promising. Or, you just found lots of code snippets and functions related to the Lomb-Scargle power spectrum. Now, you would like to assess these pieces of code to help you decide if you should use them. This subsection discusses some best practices for assessing if the code will help you.

Four General Considerations for Assessing Open Software#

Software assessment criteria are similar for any level of openness:

Functionality: Will it be useful for your scientific problem?
Interoperability: How hard will it be to use?
Security: Is it safe? Would using the software create a security risk?
Licenses/restrictions: Can you use it? Is it legal to use the software in your project?

Functionality: Assessing Scientific Utility#

Does the software meet your scientific needs?

Does it address your specific science question?
Do studies similar to yours use it?
What papers cite it, and how do they use it?
Talk to your advisors or colleagues who might have experience with it.

Testing the scientific compatibility

Does the software contain scientific test cases? If so, reproduce a case that is applicable to your problem; make sure the results are as expected.
If you’ve done similar scientific analysis/modeling previously, reproduce your prior results with the new software. Are the results consistent?
Incrementally modify a given test case to address new scientific questions. Alternatively, develop your own case, if necessary, following relevant examples.
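A test case of this kind can be small. The sketch below assumes you have found a Lomb-Scargle implementation and want to confirm that it recovers a known frequency from synthetic, irregularly sampled data before trusting it on real measurements. The function `lomb_scargle_power` is a plain-Python stand-in for the discovered code (it follows the classical periodogram formula), and the signal parameters are made up for illustration.

```python
import math
import random


def lomb_scargle_power(times, values, frequency):
    """Classical Lomb-Scargle power at one frequency (cycles per unit time)."""
    omega = 2.0 * math.pi * frequency
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    # The time offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(
        sum(math.sin(2.0 * omega * t) for t in times),
        sum(math.cos(2.0 * omega * t) for t in times),
    ) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    cos_part = sum(y * c for y, c in zip(centered, cos_terms)) ** 2
    cos_part /= sum(c * c for c in cos_terms)
    sin_part = sum(y * s for y, s in zip(centered, sin_terms)) ** 2
    sin_part /= sum(s * s for s in sin_terms)
    return 0.5 * (cos_part + sin_part)


# Synthetic test case: a 0.5 Hz sine sampled at 200 irregular times over 20 s.
rng = random.Random(0)
true_frequency = 0.5
times = sorted(rng.uniform(0.0, 20.0) for _ in range(200))
values = [math.sin(2.0 * math.pi * true_frequency * t) for t in times]

# Scan a frequency grid and find where the power peaks.
frequencies = [0.1 + 0.05 * k for k in range(39)]  # 0.1 to 2.0 Hz
powers = [lomb_scargle_power(times, values, f) for f in frequencies]
peak_frequency = frequencies[powers.index(max(powers))]

print(f"Recovered peak at {peak_frequency:.2f} Hz (expected {true_frequency} Hz)")
```

If the recovered peak does not sit at the frequency you put in, that is a reason to read the documentation again (for example, some implementations expect angular frequencies) before applying the code to your own data.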

Interoperability: Ease of Use#

Is the code written in a language that you are familiar with?

It can be easier to use coding languages that you are familiar with and then import the code into existing software rather than try to use a new language. On the other hand, the use of existing packages and executables can accelerate your work.

Check for good documentation

Read the README file. Does the software meet your functional requirements? Are the environmental dependencies well-defined and reasonable?

Check the evidence of interoperability with other projects and codes

It is a good sign if you can find evidence that the code has been used successfully by other users who have similar scientific or technical needs.

Factors for assessing the quality of open source software#

To quickly assess the community usage and quality of a software repository, use the tools from the repository where you found it. GitHub, for example, permits a quick scan of development activity, as evidenced by the number of times the code has been starred or copied into another account (‘forked’ in GitHub parlance). You can also view the amount of activity in a community, such as recent commits, issues, and pull requests. GitHub also provides insights into the quality of the software.
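These signals can also be read programmatically. GitHub's REST API returns repository metadata that includes fields such as `stargazers_count`, `forks_count`, `open_issues_count`, `pushed_at`, and `archived`. The sketch below summarizes such a record. The helper `summarize_repo`, the one-year staleness threshold, and the sample values are assumptions made for illustration, not measurements of any real project.

```python
from datetime import datetime, timezone


def summarize_repo(metadata, now=None, stale_after_days=365):
    """Summarize activity signals from a GitHub repository metadata record."""
    now = now or datetime.now(timezone.utc)
    # GitHub timestamps look like "2024-01-15T12:00:00Z".
    last_push = datetime.strptime(metadata["pushed_at"], "%Y-%m-%dT%H:%M:%SZ")
    last_push = last_push.replace(tzinfo=timezone.utc)
    days_since_push = (now - last_push).days
    return {
        "stars": metadata["stargazers_count"],
        "forks": metadata["forks_count"],
        "open_issues": metadata["open_issues_count"],
        "days_since_push": days_since_push,
        "looks_maintained": not metadata["archived"]
        and days_since_push <= stale_after_days,
    }


# Made-up record in the shape returned by GET /repos/{owner}/{repo}.
sample = {
    "stargazers_count": 1200,
    "forks_count": 340,
    "open_issues_count": 57,
    "pushed_at": "2024-01-15T12:00:00Z",
    "archived": False,
}
summary = summarize_repo(sample, now=datetime(2024, 3, 1, tzinfo=timezone.utc))
print(summary)
```

Counts like these indicate attention, not correctness, so treat them as a first filter alongside the functionality and documentation checks in this section.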

The Importance of the README File

Example above: Astropy
Always the starting point when assessing software.
Explains what the software does, how to install and use it, or points to files with that information.
Assumes limited prior knowledge by the reader / potential user.
Includes a compatibility description, e.g., dependencies.
Includes usage examples and/or test cases.
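Once you have downloaded or cloned a project, a first pass over its documentation can be automated. The sketch below checks a local folder for a README, a license file, and a citation file. The helper `check_project_files` and the list of file names are assumptions based on common conventions; a project may legitimately keep this information elsewhere.

```python
from pathlib import Path

# File names commonly used for each kind of information
# (an assumed, non-exhaustive list).
EXPECTED_FILES = {
    "readme": ["README.md", "README.rst", "README.txt", "README"],
    "license": ["LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING"],
    "citation": ["CITATION.cff", "CITATION"],
}


def check_project_files(project_dir):
    """Report which documentation files are present in a downloaded project."""
    root = Path(project_dir)
    present = {}
    for kind, candidates in EXPECTED_FILES.items():
        found = [name for name in candidates if (root / name).is_file()]
        present[kind] = found[0] if found else None
    return present


if __name__ == "__main__":
    # Check the current directory; replace "." with the path to a cloned repository.
    for kind, name in check_project_files(".").items():
        print(f"{kind:>8}: {name or 'missing'}")
```

A missing file is a prompt to look more closely, not a verdict: the check tells you where to start reading, and the README itself still has to be read.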

Security: Considerations When Using Open Code#

You have found some Open Code that will help you solve your scientific problem, and it looks easy to use. However, you may still have some reservations. Perhaps you are unsure if the code poses a security risk, for example.

The risks are relatively low for small snippets of code that are easy for you to fully understand. However, you may not be able to fully understand all components of a large Open Software Package.

Open software is perceived to have more security risks. This is generally less of a problem for open-source code than executables because the code can be audited for security vulnerabilities by the community. How can you assess security in this case?

Consult with your institutional open software policies and IT staff
Use authoritative, reputable sources to minimize security risks
Set strict security rules and standards when using a dependency
Use security tools to check for vulnerabilities (e.g., Open Worldwide Application Security Project®)
Avoid unsupported open-source software. Switch to actively developed components or develop it yourself
Check with your latest institutional policies on using Machine Learning and Artificial Intelligence tools
Use caution when using external tools with secure or closed access data. It may be possible for the external tool to publicly share what should be restricted information
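One concrete habit that supports several of the points above is verifying that a downloaded file matches the checksum published by its maintainers, when they provide one. The sketch below uses Python's standard `hashlib` module. The helper names are hypothetical, and it assumes the project publishes a SHA-256 checksum alongside its release.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Return True if the file's digest equals the published checksum."""
    return sha256_of_file(path) == published_hex.strip().lower()


# Example call (hypothetical file name, placeholder checksum):
# matches_published_checksum("package-1.0.tar.gz", "<checksum from the release page>")
```

A matching checksum only shows that the file is the one the maintainers published. It does not show that the software itself is free of vulnerabilities, so the other checks in the list still apply.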

Licenses#

So, you want to reuse some open code you discovered. It is essential to check the legal restrictions and requirements imposed on users, which are generally provided in the license.

Although licensing is a nuanced subject that you will learn more about in Section 3, it is useful to be aware that there are generally two classes of license: permissive and non-permissive. Permissive licenses, most commonly Apache 2.0, MIT, or BSD, will generally allow you to use the code for your scientific research with little restriction, whereas non-permissive licenses, such as copyleft licenses, impose substantial restrictions on how you use the code and require more careful consideration.

Reusing Open Code#

Software can be reused in a variety of ways. A software package can be executed on its own to provide a complete analysis or models depending on the input parameters. Alternatively, the package could be imported as part of a larger library to provide specific functionality. Also, code snippets can be copied into existing code, if permitted, or the code could be re-written and incorporated into new software.

If you simply intend to reuse a code snippet, continuously test that your selected code works as you expect. If you are reusing a more complex code, there are additional considerations.

Selecting the Appropriate Version for Reuse#

Consider the following when selecting among multiple versions of open-source software.


Use the latest stable release when possible	Just like software updates to your phone or computer’s operating system or apps, it is important to use the latest stable release. Developers often release developmental versions that include new features or bug fixes that are not fully tested. For this reason, using a developmental release is generally not recommended.
Determine the origin of the version you intend to use	Determine whether the version you intend to use comes from a modified open-source project or from its original source project. With this information, determine which source is more appropriate for your project.
Check for issues and bugs	Check for any known issues or bugs with your selected version that could cause problems. Find current information on issues or bugs by checking release notes, issue trackers, and developer forums.
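
The first consideration above can be partly automated. The following is a minimal sketch, assuming version strings follow the common convention in which pre-release and developmental versions carry markers such as `a`, `b`, `rc`, or `.dev`. The helper names `is_stable_release` and `latest_stable` are hypothetical and not part of any package.

```python
import re

# Assumption: a "stable" version is purely numeric, e.g. "1.4.2".
# Anything with extra markers (rc1, .dev3, b2, ...) is treated as developmental.
_STABLE_PATTERN = re.compile(r"^\d+(\.\d+)*$")


def is_stable_release(version):
    """Return True if the version string looks like a stable release."""
    return bool(_STABLE_PATTERN.match(version))


def latest_stable(versions):
    """Return the highest stable version from a list, or None if there is none."""
    stable = [v for v in versions if is_stable_release(v)]
    if not stable:
        return None
    # Compare numerically, so that "1.10.0" ranks above "1.9.0".
    return max(stable, key=lambda v: tuple(int(part) for part in v.split(".")))
```

For example, `latest_stable(["1.9.0", "1.10.0", "2.0.0rc1"])` skips the release candidate and returns `"1.10.0"`. Real projects may use other versioning schemes, so check the project's release notes before relying on a rule like this.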

Resolve Problems in Reusing Software#

Implement tests to verify that the software performs as expected in your application.
If you run into problems, revisit the release notes, issue tracker, and/or user/developer forums.
Don’t be afraid to ask experienced colleagues for help.
It is better to seek and obtain help in a public forum than in private (e.g., email). Part of open science is working in the open. Often you may find through a search that other users have similar questions. Someone may have already offered a solution. If not, it is likely that others will benefit from your question being answered in public.

Citing and Acknowledging Open Code Use#

Imagine that you’ve used Open Code pulled from the web and it made a big difference for your research paper. How should you give due credit to the open-access code that contributed to your research?

Example: You managed to implement GOTM to learn something new about ocean turbulence in the Southern Ocean, or you managed to compute a Lomb-Scargle periodogram using astropy. Here are some questions to consider:

Should you cite the Open Code?

Cite any code that you view as having contributed to your research:

Did the code play a critical part in your research?
Did the code provide something novel?

In most cases, a code snippet on Stack Overflow does not constitute a citable research contribution. However, an author can still decide to cite it if they choose.

Instances when shared code directly impacts the scientific results and requires a detailed description include:

Numerical modeling or simulation
Automated analysis, such as image processing or optical recognition

Check whether the journal where you are publishing has specific instructions on how to cite software (e.g., AAS Software Citation Suggestions).

In some cases, a software’s licensing terms and conditions require acknowledgment or citation in the references or bibliography of any publications based on research that made use of the software.

How to cite?

Ideally, use and cite code that is archived in a long-term repository with a persistent DOI. Follow the guidance about the preferred citation format, which is provided in the long-term repository and may appear in a README or a CITATION file.

DOIs provide a persistent identifier/link for research outputs. Thus, it is preferable to cite code in long-term repositories linked to a DOI. URLs (e.g., Stack Overflow) and active repositories (e.g., on GitHub) are mutable but can be used if there is no alternative.

Packages may provide a way to cite individual versions as well. For reproducibility, cite both the overall package and the version that is used in your work. As the functionality of a package may evolve with the release of new versions, this helps provide a specific description of your work.
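
To cite the version you actually used, it helps to record it at the time you run your analysis. The following is a minimal sketch using Python's standard-library `importlib.metadata`. The helper name `describe_versions` is hypothetical.

```python
from importlib import metadata


def describe_versions(package_names):
    """Map each package name to its installed version, for use in citations."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            versions[name] = "not installed"
    return versions
```

Calling `describe_versions(["numpy", "astropy"])` in the environment used for your analysis returns a dictionary that you can save alongside your results and consult when writing the citation.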

If you are writing software, you can also cite the software that you have used in your code comments and documentation.

Key Takeaways#

In this section, you learned that:

Open code exists in a vast, organic, and distributed ecosystem. Discovering Open Code depends on defining your requirements, knowing where to look, and developers using FAIR principles.
Scientific papers are now a good place to discover scientific Open Code since many journals require the code used in the paper to be linked via a DOI.
Before use, it is important to assess open software for functionality, quality, interoperability, security, and license/reuse restrictions. Your first step should be to look for a README file.
When reusing open software, use the latest supported version and test the software to ensure it functions as expected. If problems arise, reach out to the developers or user community, ideally via a public forum.
It is important to cite and acknowledge open software that significantly contributes to your work, as well as share your lessons learned and any contributions with the developers and user community.

Section 3: Making Open Code#

In this section, you will learn about the practical steps to make code openly accessible. Large, well-established software has different needs than an incipient project. For example, a script written to create a simple plot has different requirements than a software package that models the Earth’s climate. The size of a research team can also determine the steps required to make code open access. This lesson covers the process of making code usable to other researchers through documentation, considerations around licenses, and software development best practices.

How do We Plan for Making Code?#

Code is written to solve a challenge. This can range from producing a plot to processing Earth observation data to modeling the Universe. The challenges associated with writing code can range in difficulty from simpler tasks, such as the use of spreadsheets, to more complex activities, such as the creation of extensive libraries and the use of high-performance or cloud computing. Code can be developed as an individual, team, or community. Once written, code might be used for decades or never again.

When starting a research project, it is useful to answer the following questions:

What problem am I trying to solve, and are others in my community facing it as well?
Are there existing solutions? (In Section 2, we explored how to look for existing solutions.)
Did you find code that was close to what you want but didn’t quite meet your needs?

You could potentially contribute to it instead of writing something new.

Even if a solution already exists, there might be good reasons to develop your own code. Instances include:

The code is written in a different programming language than you are familiar with.
The license is not open enough to adopt it.
To try new techniques or to develop a deeper understanding of the problem.

It might take more time to start a new project, or it might take more time to integrate someone else’s code than writing your own. You will have to make that call.

We looked for existing code, and though we found a few things that were close we decided in the end our needs were unique enough - we’re starting a new project!

Starting a New Project#

When starting a new project, the key things to consider are:

Define the project scope, its primary features, any limitations, and the intended audience.
Consider the resources required for the software to run. Will it be on a personal computer, a high-performance computing server, or on the cloud?
How will it be managed?

This section focuses predominantly on the question of how to manage open access code.

Who will be working on the project? What are some of the development best practices? How will you share it openly? How will it be licensed?

Organizing a Project#

Software projects can be organized in a variety of ways, each of which involves unique considerations about how to begin. Many projects start out as a single script that was only intended for a single use. However, a script can grow into a much larger project with unforeseen applications in its original or new field of research. Other projects can start with formal requirements and standards.

Making code public has many advantages:

It enables open collaboration.
It invites constructive feedback that contributes to a code’s accuracy and robustness.
People with less experience with the subject matter will learn more.
Those with less programming experience can learn from those with more programming experience as they improve the code.
It provides an intermediate product that can still be cited.

When naming a project, conduct a quick search of the envisioned name to see what shows up. Avoid names with many other uses, as this will make it difficult for others to discover the code. Also, do not choose embarrassing or trademarked names.

Hosting the product on a version control platform ensures the permanence of your project. If code only exists on your computer, it may disappear if the computer is damaged or lost.

Documenting the production and management of your code benefits both you and those who might use your code in the future. You are your own best collaborator. Documentation can save you from a headache should you reuse the code in six months or attempt to recall meticulous details about your process later on.

Questions to consider when choosing a programming language:

Will potential collaborators be able to contribute in the chosen language?
Which languages are you most experienced with?
Are there any limitations from your computing environment that would impede your ability to write or manage this code?
Languages have strengths and weaknesses; which are most important for your project?

Before someone else can use your code, they’re going to ask some questions:

Where can I find your code?
Is your code documented?
In what ways am I allowed to use your code?
Will you accept changes to your code? If I find a bug, what do I do?
How do I trust your code works?
How do I know if the code will be supported long-term?

Importance of Version Control#

Your code will change significantly over the lifetime of your project. Just as we appreciate the ability to track earlier versions of documents or versions created by different people, inevitably, someone will want to be able to revert, compare, and synthesize changes in code.

The most popular tool for version control is Git. Git is a system that tracks changes in computer files, similar to the version history in Google Docs or SharePoint but better suited to code. Git is usually used in conjunction with a version control platform such as GitHub, GitLab, or Bitbucket. These tools were covered in Day 2.

Version control enables the following:

Helps developers keep track of changes to a project’s code (as well as supplemental files and documentation) over the entire course of a project’s evolution.
Revisions to a project’s files can be tracked, including contributions made by different people.
Undesirable changes (like errors or bugs) can be reverted at any time.

Version control is a good practice for coding, even if you are not immediately sharing the code. You can use version control with your code privately on your computer or use the private mode on hosting services (e.g., GitHub and GitLab). By setting up version control early on, you prepare your code for intended and unforeseen future use.

Further Resources on version control

Describing Our Code to Others#

README#

The first stop for a user when they approach a new project should be the README file. Aptly named, this file contains orientation information that will help a user understand a project’s purpose, provides examples of how it can be used, and lists other important information that the creator deems pertinent.

At the minimum, a README should contain the name of the project and a very short paragraph explaining what the software is. Aim for two to three sentences in plain language that make no assumptions about who is reading. It’s the elevator pitch for the project.


Bad README example	“This code recomputes the fundamental permutation factor of the downward flow (for J < 10, obviously).”
Good README example	“LeapKitten. This Python software package takes any picture of a kitten (JPEG, PNG) and uses artificial intelligence to output what it would look like leaping into the air. In addition, the code takes leap years into account on the timestamp on the image.”

In addition, the following information is helpful to add to the README, especially if it is not listed elsewhere:

A list of any code dependencies the software has, e.g. “Numpy, kitten-rng, and human-readable must be installed to run this software.”
How to install and a brief description of how to run the software.
Detailed description of the software, especially if there is no external documentation.
Examples of how to use the software.
Acknowledgement of team members or sources of support.

As seen in these examples, README files can be useful for a collection of scripts supporting a publication or an extensively developed software package.

Contributor Guidelines#

The CONTRIBUTING.md file gives information about how to contribute to the project. It details how the contribution process works and what type of contributions are needed. While not every project has a CONTRIBUTING.md file, the existence of one is a clear indicator that contributions are welcomed.

You’ll need to decide for yourself when your project has progressed enough to consider inviting contributors. When it has, create a document called CONTRIBUTING at the top level of your repository.

The Astropy contributing guidelines and Numpy contributing guidelines provide two examples.

Bonus Tip: Even if you are developing your code publicly, this does not mean you have to accept contributions from others or maintain your code forever. The contributing guidelines or README are good places to indicate what your expectations are for your code. This can clarify that the code is not maintained or not accepting contributions.

Code of Conduct#

The code of conduct sets ground rules for participants’ behavior and helps to facilitate a friendly, welcoming environment. While not every project has a CODE_OF_CONDUCT file, its presence signals that this is a welcoming project to contribute to.

Code Documentation#

Code Level Documentation for the Developer

Your software should be documented within the source code. Each function should have comments at the start that briefly state, in plain language, what the function is for. This is not only for other developers, but also for yourself a week later, when you have forgotten what you wrote.

Examples:

# This function takes the image array and crops it from the center to 50% of the original size.

Without going into details of the data type, calling parameters, etc. this description immediately puts someone looking at the code into the context of what the function aims to accomplish; they can then explore the details.

While you should consider placing a description at the start of a function, use your discretion on where you put similar descriptions of code. At the start of a complex loop or analysis would be a good idea. Don’t go overboard - things like this aren’t useful:

# set x to 17
x = 17

Descriptive variable, class, and function names can make your code very readable. Sometimes even great coders are working fast and will name variables ‘a’, ‘temp’, or other names that probably won’t make a lot of sense in a week or two when they come back to something they were working on. Names like ‘baking_time’ or ‘velocity’ are more clear. Variable names should be easy to understand and clearly represent what they are.

Ideally, someone who doesn’t write in the software language of the code can read the comments in the file and have a rough idea of what is happening.

Use the comments to put URLs that reference where you might have found the algorithm you’re using (e.g. Stack Overflow) or the journal paper where you found the formula you’re implementing.
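
The commenting advice above can be illustrated with a short sketch. It assumes the image is stored as a list of rows and that "50% of the original size" means half the height and half the width. The function name `crop_center` is hypothetical.

```python
def crop_center(image):
    # This function takes the image array and crops it from the center
    # to 50% of the original size.
    original_height = len(image)
    original_width = len(image[0])

    cropped_height = original_height // 2
    cropped_width = original_width // 2

    # Find the top-left corner of the region to keep, so that the
    # cropped area is centered in the original image.
    top = (original_height - cropped_height) // 2
    left = (original_width - cropped_width) // 2

    return [row[left:left + cropped_width]
            for row in image[top:top + cropped_height]]
```

The comment at the top states the purpose in plain language, the second comment marks the one step that is not obvious, and descriptive names such as `cropped_height` make further comments unnecessary.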

Code Level Documentation for the User#

If you are developing code that you expect others to use, produce a manual on how to use the code. As code constantly develops, it is much easier to write the documentation while you write the code, or even before, than to add it afterwards.

If you write your documentation within the code itself, there are pieces of software that can then extract it, format it, and present it as a polished manual. Examples of documentation generated from the code can be seen for Astropy or NumPy.

They look polished, and also very similar to each other. These sites were completely generated from comments and documents written in the source code. Unlike the comments written for developers described above, these comments were written specifically for external users of the code: they form the manual.

While there are multiple software packages for automatic documentation generation, the most commonly used ones are Sphinx for Python and Doxygen for most everything else. Markdown is also a popular choice for the formatting language for documentation.
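
As an illustration of user-facing documentation written inside the code, here is a minimal sketch of a Python docstring laid out in the sectioned style used by NumPy. The function `celsius_to_kelvin` is a hypothetical example, and whether a documentation generator renders these sections depends on how your project configures it.

```python
import inspect


def celsius_to_kelvin(temperature_c):
    """Convert a temperature from degrees Celsius to kelvin.

    Parameters
    ----------
    temperature_c : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in kelvin.

    Raises
    ------
    ValueError
        If the temperature is below absolute zero.
    """
    if temperature_c < -273.15:
        raise ValueError("temperature is below absolute zero")
    return temperature_c + 273.15


# The same text that a documentation tool would extract is available
# at runtime, for example through help() or inspect.getdoc().
user_manual_entry = inspect.getdoc(celsius_to_kelvin)
```

Because the docstring lives next to the code it describes, it is more likely to be updated when the function changes.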

Programming and Documenting#

Establishing a Development Environment - Establishing an appropriate development environment will help you write good, clean code and will help you maintain the project as it evolves.

Configure any necessary tools for writing the code, such as an IDE (Integrated Development Environment) or text editor. Some popular examples include VS Code, PyCharm, RStudio, and Xcode.
Set up a package manager. For example, for Python, one could use ‘anaconda’ or ‘poetry’.
Create a virtual environment specific to your project to isolate its dependencies (and their versions) from those used for other projects
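
For Python projects, the last step can be sketched with the standard-library `venv` module. This is only one option; environments managed by tools such as anaconda or poetry are created differently. The helper name `create_project_environment` and the `.venv` folder name are assumptions made for illustration.

```python
import venv
from pathlib import Path


def create_project_environment(project_dir):
    """Create a virtual environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False keeps this sketch fast and offline; set it to True
    # if you want pip installed into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir
```

After creating the environment, activate it from your shell before installing dependencies, so that they stay isolated from other projects.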

Structuring Files and Folders - How you structure the files in your project from the beginning will contribute to the success of the final results.

Different programming languages have different standard folder structures. Familiarize yourself with the standards before starting, as it will help others collaborate and will likely save you from difficulties later.

There are a variety of sample code structures that can be used to get started. For example, for Python, there is Cookiecutter and an Astropy package template.

What License Should We Choose for Our Code?#

Licensing Considerations when Using Open Software#

Open-source software licenses are the basis for how scientists use, make, and share code and software. Understanding some of the nuances of these licenses is important because it will affect how your project can license and share code.

A software license is a legal document that states the rights of the developer and user of a piece of software.

An open source license is a type of software license approved by the Open Source Initiative (OSI) as compliant with the Open Source Definition. An open-source license grants permission for anyone to inspect, use, modify, and distribute the software’s source code for any purpose.

Licenses ensure that developers receive credit and control over how their work is used. Without a license, software is assumed to be copyrighted, and others have no permission to reuse it. Programmers include licenses to allow reuse.

Licenses take various forms in order to outline:

Contractual obligations (if any exist) between the developer and user.
What the user may do with the software.
To whom the user may distribute the software (if any such right exists).
Length of time the user has the right to use the software.

Some Common Types of Software License#

Public Domain

Anyone is free to use the software.

Lesser General Public License (LGPL)

Software can link to LGPL-licensed open source libraries, and the linking code can be licensed under any license type.

Permissive

Gives users wide but not complete latitude to reuse/relicense.

Non-permissive

Allows users to reuse, but also gives users the responsibility to share their changes with the community.

Copyleft

Can be distributed or modified if all the code involved is licensed under the same license.

Proprietary

Cannot be copied, modified, or distributed.

Before you choose a license, first check with your organization or employer. They may have specific guidelines about what software license you are allowed to use. Your research grant may also stipulate permissible license types. The software management plan should specify what license you plan to use.

If code is shared without a license, it is, like any creative work, assumed to be copyrighted by default in the United States. It does not need to be registered, and it is assumed to be automatically protected by copyright the moment it is created.

For software, the license is shared in a file called LICENSE at the top of the repository. It’s a standard location people will know to look at. It’s not bad practice to put a one line version of the license at the top of each file of code as well, with a pointer to where one could find the full license.
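
A one-line license notice at the top of each file can be added with a short script. The sketch below assumes the SPDX short-identifier convention and uses Apache-2.0 purely as an illustration; substitute the license your project has actually chosen. The helper names are hypothetical.

```python
# Assumption: the project uses Apache-2.0; replace with your own license.
LICENSE_HEADER = "# SPDX-License-Identifier: Apache-2.0"
LICENSE_POINTER = "# See the LICENSE file at the top of the repository for the full text."


def has_license_header(source_text, header=LICENSE_HEADER):
    """Return True if the header appears within the first five lines."""
    first_lines = source_text.splitlines()[:5]
    return any(line.strip() == header for line in first_lines)


def add_license_header(source_text, header=LICENSE_HEADER):
    """Return the source text with the license header prepended if missing."""
    if has_license_header(source_text, header):
        return source_text
    return f"{header}\n{LICENSE_POINTER}\n{source_text}"
```

Applying `add_license_header` twice leaves the text unchanged the second time, so the script is safe to re-run over a repository.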

Types of Open-Source Software Licenses#

There are two main types of open-source licenses: permissive and protective (sometimes referred to as copyleft). The difference between these types is primarily related to the type of license that users of the code are allowed to apply to their derivative works.

Permissive License

The Open Source Initiative defines a permissive software license as a license that guarantees the freedoms to use, modify, redistribute, and create derivative works. An example of this type of license is the Apache 2.0 license by the Apache Software Foundation. It is one of the most popular and widely used permissive licenses.

Users have wide latitude for reuse under this license. They are generally free to incorporate the code into their project or use it how they wish. A user of permissive-license open source in a product could redeploy the open source software with a wide range of licenses, including proprietary closed source software.

Protective License

Protective (copyleft) licenses are a legal technique of granting certain freedoms over copies of copyrighted works with the requirement that the same rights be preserved in derivative works. This allows users to reuse the code, but also requires them to share their changes with the community using the same license. An example of a protective license is the GNU General Public License (GPL), which ensures users have the freedom and responsibility to share their changes with the community. It is the most widely used protective license. These types of licenses can result in less reuse by users who may prefer or be required to only use permissive licenses.

Common Licenses for Open Software#

Some of the most popular licenses used in open software are:

Permissive (can apply any license to derivative works)

Apache License 2.0
MIT License
BSD licenses

Protective/copyleft (derivative works must distribute their source code under the same license)

GNU General Public License (GPL)
Mozilla Public License
Common Development and Distribution License (CDDL)

For more information on different types of licenses, please refer to the Open Source Initiative (OSI).

Programming Best Practices#

In this subsection, some best practices in development are provided, including code review, testing, security, and accessibility. These best practices will improve the quality of code, reproducibility of results, and security of a project. Combined, these actions help improve the robustness of open access code and help to meet the unique challenges that can arise with multiple contributors and revisions that occur over an extended period of time.

Code Review#

Code benefits from peer review in the same way as science. Having someone else read over your code and test it is one of the best ways to improve the quality of the code.

Many version control platforms have built-in tools that enable developers to review, comment, and iterate on each other’s code. These can be done in the open and allow anyone to comment.

Here is a great example of the discussion that can happen when the original creator of an algorithm comments on a Python implementation made by a first-time contributor to the Astropy project. The open and constructive discussion led to a better implementation of the algorithm along with possible future improvements.

Software packages can be reviewed as their own products as well. Many scientific publications now accept papers focused on software. There are entities like PyOpenSci and the Journal of Open Source Software that provide open peer review of scientific packages. See more details about JOSS in the next lesson on sharing your code.

Testing#

A proven method to evaluate the reproducibility of your software is through testing. There are many types of testing that range from testing the smallest testable parts of a code to verifying if a code works as a whole under different scenarios. Code testing can include a wide range of different techniques. This section provides only a brief introduction to the topic.

The main objective of code testing is to evaluate if a code does what its authors intended it to do. Comprehensively testing code can be very difficult as it involves testing the code for generating expected outputs as well as for failing when it should.

Scientific validation

Whether producing a script or an entire data processing pipeline, the validation of software is critical to ensuring the quality and trustworthiness of the scientific results. This could mean manually calculating the results to check the output of the code, comparing it to previously produced results, or having another team member test it.

Reproducibility testing

Given the same inputs and parameters, can the same results be produced? Making the configuration files, input data, etc., openly available so users can easily run and produce the same published results is a critical way to increase trust in your code.
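
When an analysis involves random numbers, reproducibility also depends on fixing the seed. The following is a minimal sketch, assuming a hypothetical function `simulate_measurements` that stands in for a real analysis step.

```python
import random


def simulate_measurements(seed, n_samples=5):
    """Draw pseudo-random 'measurements' from a seeded generator."""
    # A local generator avoids hidden global state, so the result
    # depends only on the seed and the number of samples.
    rng = random.Random(seed)
    return [rng.gauss(10.0, 2.0) for _ in range(n_samples)]


# Reproducibility check: the same inputs must give the same outputs.
first_run = simulate_measurements(seed=42)
second_run = simulate_measurements(seed=42)
assert first_run == second_run
```

Publishing the seed together with the configuration files and input data lets other users repeat the run and compare their output with yours.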

Built-in tests

Unit tests enable software developers to bolster their confidence in their code’s ability to perform as expected. Unit tests are small functions that sit outside the code base and test a specific function or run a specific test. For example, if a function takes an image and flips it horizontally, one test might check that the resulting image is the same size. Another compares the output using a known image with the expected result. Another checks that a new image is returned.
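
The three tests described above can be sketched as follows. The sketch assumes the image is stored as a list of rows. The function `flip_horizontal` and the test names are hypothetical, and the `test_` prefix follows a common convention that lets test runners such as pytest discover the functions.

```python
def flip_horizontal(image):
    """Return a new image with each row reversed (a left-right flip)."""
    return [list(reversed(row)) for row in image]


def test_flipped_image_has_same_size():
    image = [[1, 2, 3], [4, 5, 6]]
    flipped = flip_horizontal(image)
    assert len(flipped) == len(image)
    assert all(len(new) == len(old) for new, old in zip(flipped, image))


def test_known_image_gives_expected_result():
    image = [[1, 2, 3], [4, 5, 6]]
    assert flip_horizontal(image) == [[3, 2, 1], [6, 5, 4]]


def test_new_image_is_returned():
    image = [[1, 2, 3], [4, 5, 6]]
    flipped = flip_horizontal(image)
    assert flipped is not image
    # The original image must be left unchanged.
    assert image == [[1, 2, 3], [4, 5, 6]]
```

Each test checks one property, so a failure points directly at the behavior that broke.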

Automated testing

Built-in tests can usually be run both manually and automatically. Most version control platforms offer services for running tests automatically. When run this way, code can be checked to see if changes raise any problems. This process of checking the code automatically as it is developed is called continuous integration (CI), and it is often paired with continuous delivery or deployment (CI/CD). If a small change made in one part of the code results in an unexpected change in another part, running the tests will uncover this immediately.

Minimizing the Risk of Security Vulnerabilities#

Whether using open-source, closed-source, or commercial software, it is important to consider the security risks inherent in the development of software.

Ensure minimal, DRY (Don’t repeat yourself) code (easier to maintain and fix).
Use environment variables or key managers for credentials. Never include credentials in your code.
Use well-tested and maintained dependencies. In packages that you maintain, keep the list of dependencies up to date.
Create software with tools that provide automated scanning and auditing.
If there are unsupported dependencies that you rely on, assess them to determine how they might introduce security risks and whether it would be appropriate to switch to a different package.
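
One way to keep credentials out of the code is to read them from an environment variable at runtime. The following is a minimal sketch. The variable name `MY_PROJECT_API_TOKEN` and the helper `get_api_token` are hypothetical.

```python
import os


def get_api_token(variable_name="MY_PROJECT_API_TOKEN"):
    """Read a credential from the environment instead of the source code."""
    token = os.environ.get(variable_name)
    if not token:
        # Fail with a clear message rather than falling back to a
        # hardcoded value, which could end up in version control.
        raise RuntimeError(
            f"Set the {variable_name} environment variable before running this code."
        )
    return token
```

With this approach the token never appears in the repository, and each user or server supplies its own value.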

Security tools and security vulnerabilities

Commercial and open-source tools have been developed to address the challenge of identifying the security vulnerabilities in different source components. If you do not have any technology to secure your open source usage, you can consider using the Dependabot or OWASP dependency check tools.

The Open Worldwide Application Security Project (OWASP) is an online community that produces free tools and technologies in the field of web application security. OWASP dependency check is a utility created for developers, which identifies project dependencies and checks if they contain any known, publicly disclosed, open-source vulnerabilities.

Test components and dependencies

Testing the security of the open-source components you are using is the best way to ensure the safety of your applications and your organization. Your commitment to timely and frequent analysis of open-source components should be the same as to your proprietary code.

This is especially true as the component in question may have unknown security vulnerabilities or dependencies that differ with each use case. It is possible for a component to be secure in a particular application but vulnerable in another.

Creating FAIR Software#

Findable

Software includes a persistent and unique identifier and rich metadata, so it is easy for humans and machines to find.

Accessible

Software is retrievable from its identifier via standard communication protocols.

Interoperable

Software interoperates with other software; it exchanges data and/or metadata via community standards.

Reusable

Software is fully described by metadata with provenance, meeting community standards. Its license permits reuse.

Additional Helpful Tips#

Here are some further suggestions on how to make your code more accessible, reproducible, and transparent:


Descriptive Names	Variables, functions, and similar entities should be given descriptive names as opposed to vague names. Descriptive names instantly give other programmers an idea of what the variable or function is. For example, the variable name colourOfCat is a good name because it describes what the variable holds: the color of a cat.
Metadata File	Consider including a metadata file for your software to make it more discoverable. A ‘codemeta.json’ can be created using Code Meta’s generator to include with your package.
Operation Documentation	Share details about how you are running the code. For example, document the version of a software library you are using, or the version of the compiler. These are often shared in an ‘environment.yml’ file.
Automation	Consider the following scenario: You are getting ready to publish your paper that includes 17 plots that all depend on a data set released by a mission. Right before you are about to submit, the mission releases an updated version of the data set. How easy will it be to recreate those plots? Write your scripts so that they can be run automatically and so that input files are not hardcoded. This allows you to easily re-run the code if an input file or initial parameter changes.
Using Standards	Most languages have their own coding style adopted by their respective communities. Following those conventions makes it easier for others to contribute to your code and makes your project more inclusive.
Portability	Write your code so that it can be moved between platforms and computing environments, letting others run it on their own systems.
Naming	Many historical terms used in software have negative connotations depending on the context. When considering different terms or naming, consider how different audiences may react to those terms.
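
The automation tip above can be illustrated with a minimal Python sketch. It assumes a hypothetical analysis step whose input file is passed as an argument instead of being hardcoded. The function names, the CSV layout, and the column name `value` are illustrative assumptions and do not come from any mission's software.

```python
import argparse
import csv
from pathlib import Path


def summarize_measurements(input_path: Path) -> dict:
    """Return the row count and mean of the 'value' column in a CSV file."""
    with input_path.open(newline="") as handle:
        values = [float(row["value"]) for row in csv.DictReader(handle)]
    if not values:
        raise ValueError(f"no rows found in {input_path}")
    return {"count": len(values), "mean": sum(values) / len(values)}


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser that takes the data file as an argument."""
    parser = argparse.ArgumentParser(description="Summarize a data release.")
    # The data file is an argument, so a new release only changes the command line.
    parser.add_argument("input_path", type=Path, help="CSV file to analyze")
    return parser


def main(argv=None) -> None:
    """Parse arguments, summarize the given file, and print the result."""
    args = build_parser().parse_args(argv)
    summary = summarize_measurements(args.input_path)
    print(f"{args.input_path.name}: n={summary['count']}, mean={summary['mean']:.3f}")
```

Calling `main()` from the usual `if __name__ == "__main__":` guard would turn this into a command-line script. Re-running the analysis after an updated data release then only requires passing the new file name, with no edits to the code itself.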

Key Takeaways#

In this section, you learned:

Planning a new project requires programmers to have a clearly defined purpose, recognize any resource limitations, and envision a data management plan.
Using a repository with version control allows developers to track changes across time and from multiple contributors, which can help with troubleshooting for errors and with managing a team of programmers.
A README file should include the name of a project and short but clear description of the software.
Licenses ensure that developers receive credit and retain control over how their work is used. Without a license, software is assumed to be copyrighted, and others have no permission to use it.
Testing, labeling, and implementing security measures are examples of programming best practices that support Open Science.

Section 4: Sharing Open Code#

In this section, you will learn the steps for sharing the software that you developed. These steps include determining if, when, and where software should be shared, which roles are needed, and how to enable others to use the code.

Legal and Security Concerns#

Legal concerns

Anyone writing research code and software should familiarize themselves with their organization's policies on sharing and publishing software. Funding agencies, government or private, may have strict software openness requirements. In other cases, sharing software may not be allowed by the organization.

Legal concerns can include questions such as:

Does a developer or institution own the software?
Does sharing (or not sharing) the software violate the funding agency’s policies?
Are there any local laws or regulations in your area that govern the sharing of intellectual property?
What software license is required?

Once you decide to participate in or begin a new open software project, familiarize yourself with your organization’s policies and practices.

Find out more about the legal concerns here.

Security concerns

Security is a concern when sharing software. Bad actors can attach malicious code to software in an attempt to infiltrate computer systems through security vulnerabilities, potentially exposing sensitive and proprietary information that can lead to great financial loss for users. Security risks must be considered when sharing software.

Security concerns can include:

Does your organization’s Information Technology (IT) policy allow you to check out the code you want to use on your machine?
Is the repository you want to contribute to reputable?
Are there any open security-related issues with the code?

Once you decide to participate in or begin a new open software project, familiarize yourself with your organization's IT policies.

Find out more about the security concerns here.

How: How to Enable Reuse of Code#

Now that you have shared your code in the appropriate way, it’s important to consider if you’ve made it easy for others (or your future self) to reuse your code.

Assigning a License#

As you may recall from the previous section, assigning an appropriate license is necessary for others to know how to use your code.

As an example, here’s how you’d assign a license to a GitHub repository:

Choose the appropriate software-sharing license that meets your organization’s requirements. To create a license template in GitHub, add a new file and type “LICENSE” in the name field; then, the “Choose a license template” option will appear.

Make sure that your GitHub repository is public, making it searchable by anyone.

Making the Code Citable#

Not all code needs to be citable. When code is released on its own, however, there are a few best practices for making it citable.

Adding code to a GitHub repository is not sufficient for archiving code. To archive, we must assign a persistent identifier.

Producing a persistent identifier for your code is the best way to make it citable. This could take form through a peer-reviewed publication that describes the software or by archiving the software with a long-term repository that produces a DOI or similar identifier. For code shared on GitHub, a DOI can be easily produced for each release of the software from Zenodo.
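
Once a DOI exists, it can be turned into a resolvable link and a citation string. The Python sketch below is a minimal illustration under stated assumptions: the DOI pattern is a simplified check rather than a full validator, the citation layout is one common pattern (journals may require a different style), and the DOI shown in the usage note is a placeholder that does not point to a real record.

```python
import re

DOI_RESOLVER = "https://doi.org/"
# Simplified pattern: "10.", a registrant code of 4 to 9 digits, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a DOI, raising ValueError if it looks malformed."""
    cleaned = doi.strip()
    # Accept DOIs pasted with a "doi:" prefix or as a full resolver URL.
    for prefix in ("doi:", DOI_RESOLVER):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    return DOI_RESOLVER + cleaned


def format_software_citation(authors: str, title: str, version: str, year: int, doi: str) -> str:
    """Build a simple software citation string (illustrative layout only)."""
    return f"{authors} ({year}). {title} (Version {version}) [Computer software]. {doi_to_url(doi)}"
```

For example, `format_software_citation("Doe, J.", "os-test", "0.1.0", 2024, "10.5281/zenodo.0000000")` produces a citation ending in `https://doi.org/10.5281/zenodo.0000000`. Because the identifier is persistent, the citation keeps pointing at the archived release even if the repository later moves.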

Activity 2: Create a DOI for a Test Code File#

Estimated time for activity: 10 minutes. It is an individual activity.

You can create Digital Object Identifiers (DOIs) for your code to make it citable. You do this by archiving a GitHub code repository at Zenodo and issuing a DOI for the record.

Steps for this activity:

Part 1: Create a test public GitHub repository.

Navigate to the login page for GitHub and log in. If you haven’t already, create a free user account.
Create a new repository with this link.
Type a short, memorable name for your repository. For example, “os-test”.
Set the repository visibility to ‘Public’ by selecting this option below the repository description.
In the following section ‘Initialize this repository with:’ select ‘Add a README file’.
Select any license.
Click ‘Create repository’.
You will be automatically directed to your new repository webpage.
Now, we will get a DOI from the Zenodo application. Note that we are going to use https://sandbox.zenodo.org/ to do this. The sandbox offers the same capabilities as https://zenodo.org but is a testing site. You need to sign up for the sandbox even if you already have a Zenodo account, because the two sites keep separate user databases. Please sign up with GitHub or with classic registration, but not with your ORCID!

Part 2: Create an archived repository and affiliated DOI.

Navigate to the Zenodo GitHub page. Click on the button ‘Connect’ to allow Zenodo to access your GitHub repositories.

Review the information about access permissions, then click ‘Authorize Zenodo’.
Sync your GitHub with Zenodo by clicking ‘Sync now’ in the upper right corner.

To the right of the name of the repository you want to archive (‘os-test’), toggle the button to On.
Click on the name of the repository.
Click the big green button that shows ‘username/os-test’.

Add a tag ‘test’. You may have to create a new tag for ‘test’ if prompted.
Scroll down and click the green ‘Publish release’ button.

Navigate to the Zenodo GitHub page and see the DOI for ‘os-test’.
Congrats! You have your DOI for the repo.

Zenodo archives your repository and issues a new DOI each time you create a new GitHub release. Follow the steps at “Managing releases in a repository” to create a new one.

Making it Easy to Cite Your Code#

Information about how to cite the software can then be added to your README or other documentation in your repository. Another useful step for making your repository citation information accessible is to add a CITATION file to the repository.

CITATION files make citation information easily accessible in open-source software repositories. The Citation File Format (CFF) is a human- and machine-readable standard format developed for CITATION files.
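
As a rough illustration, the Python sketch below assembles the text of a minimal `CITATION.cff` file. It assumes CFF version 1.2.0 and uses only a handful of its keys (`cff-version`, `message`, `title`, `authors`, `version`, `date-released`, `doi`). The helper name `build_citation_cff` and all example values are hypothetical.

```python
import json


def build_citation_cff(title, authors, version, date_released, doi=None):
    """Return the text of a minimal CITATION.cff file.

    `authors` is a list of (family_name, given_name) tuples and
    `date_released` is a string in YYYY-MM-DD form.
    """
    quote = json.dumps  # JSON strings are valid double-quoted YAML scalars
    lines = [
        "cff-version: 1.2.0",
        f"message: {quote('If you use this software, please cite it as below.')}",
        f"title: {quote(title)}",
        "authors:",
    ]
    for family_name, given_name in authors:
        lines.append(f"  - family-names: {quote(family_name)}")
        lines.append(f"    given-names: {quote(given_name)}")
    lines.append(f"version: {quote(version)}")
    lines.append(f"date-released: {quote(date_released)}")
    if doi is not None:
        # The DOI is optional because it may not exist before the first archived release.
        lines.append(f"doi: {quote(doi)}")
    return "\n".join(lines) + "\n"
```

Saving the output of `build_citation_cff("os-test", [("Doe", "Jane")], "0.1.0", "2024-01-15")` as `CITATION.cff` in the top level of the repository would give both people and tools a single place to look up how to cite the software.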

Adding Contributor Guidelines#

If you are hoping for community input on your software, it is a best practice to include CONTRIBUTING and CODE_OF_CONDUCT files in your repository that outline expectations for member interactions.

We won’t go into these in detail here, but you can check out the Xarray package’s GitHub repository for a good example.
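
The files discussed in this section and elsewhere in this tutorial (README, LICENSE, CITATION, CONTRIBUTING, and CODE_OF_CONDUCT) can be checked for with a short script before sharing a repository. The Python sketch below is one possible approach, not an official tool: the helper name `missing_reuse_files` is hypothetical, and it only inspects the top level of a local copy of the repository.

```python
from pathlib import Path

# File stems recommended in this tutorial; any extension (e.g. .md, .txt, .cff) is accepted.
RECOMMENDED_FILES = ("README", "LICENSE", "CITATION", "CONTRIBUTING", "CODE_OF_CONDUCT")


def missing_reuse_files(repo_dir):
    """Return the recommended file stems that are absent from the top level of repo_dir."""
    present = {
        path.name.split(".")[0].upper()
        for path in Path(repo_dir).iterdir()
        if path.is_file()
    }
    return [stem for stem in RECOMMENDED_FILES if stem not in present]
```

Running `missing_reuse_files(".")` from the root of a repository returns an empty list when all five files are present. A non-empty result is a prompt to decide whether the missing file is actually needed for your project, since small archival repositories may not need contributor guidelines.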

Who: Roles and Responsibilities of the Team Members in Implementing the SMP#

When writing an SMP, it’s important to include a plan for the roles and responsibilities needed to share and (if applicable) maintain your code. Your community will consist of members in different roles: some actively engaged, some with only a passing interest. Sometimes, multiple roles can easily be filled by one person (e.g., if you are just archiving a piece of code).

Some roles might include:

Who will add the code to a public repository?

Uploading the code
Assigning a license

Who will take care of code documentation?

Writing a README
Adding explanatory comments to the code

Who will help with code reuse?

Adding CITATION, CONTRIBUTING, and CODE_OF_CONDUCT files

Who will maintain the software (if applicable)?

Who will respond to community input (e.g. via GitHub issues)?
Who will be responsible for making decisions about which code to add/update from other contributors? (e.g. via GitHub pull requests)

All of these roles may or may not be needed, depending on the size of your project. Have a transparent process for assigning any roles to community members.

Key Takeaways#

In this section, you learned the key steps in sharing open software:

Should you share? When sharing software, the policies of your institution and funding agency must be followed. These may limit the openness of the software. Software sharing policies also vary by organization.
When to share? Follow guidance from your organization, funding agency, or publisher.
Where to share? It depends on whether you are archiving or sharing for community input. Use domain-specific repositories where appropriate.
How to enable reuse? Enable reuse by assigning a DOI and including a license, citation information, and contributor guidelines.
Who helps share? Plan for the roles and responsibilities when sharing and (if applicable) for maintaining software.

Section 5: From Theory to Practice#

This section ties the concepts of open-access software development to the operation of a software management plan. The section also introduces you to the community aspect of open software. It begins with a discussion on writing software management plans and then continues with information on how to connect with open software communities. This information is contextualized with an introduction to the benefits of a software community and the roles involved in these groups. A list of communities is also presented, and you are asked to explore and engage with some of them. The section wraps up with helpful suggestions to contribute to open software and additional resources.

How Do We Plan to Make our Code Open?#

If you are planning a project that requires a management plan, writing that plan is a good first step. There is a threshold above which you should write a software/data management plan. “Software” here means scientifically or technically relevant computer programs as both source code and executable software.

SMP is required

You need an SMP to:

Apply for funding (e.g., NASA, NSF, and likely everywhere soon)
Collaborate on a team that intends to release code to the public
Successfully manage any large mission or project

SMP is not required

You probably don’t need an SMP if you are working on:

A paper by yourself (or with a very small group of collaborators)
The initial exploration of ideas or experimentation with analysis code
Education-focused activities

Perhaps your project does not fit into these categories. For example, if your aim is for your results to be reproduced by others, then writing an SMP is at your discretion.

The following material assumes that you have met the threshold and are writing a data/software management plan.

If you are applying for funding, it is almost guaranteed that there will be specific data management requirements detailed in the funding opportunity. For example, the funder may require a certain license or use of a specific repository. Make sure to cross-reference your plan with these requirements.

Examples of Software Management Plans

Policies

What are the policies for an SMP? (What does the funding agency say to do?)

Data formats
Plan for data/code archival/preservation
Roles and responsibilities

Funding Agencies

Scientific funding agencies generally solicit peer reviews to support funding decisions. These reviews explicitly or implicitly evaluate related open software. Community participation is necessary to arrive at consensus regarding community standards for funding.

For example, NASA policy explicitly states that “funded software should follow best practices in the relevant open source and research communities.”

Established Open Software Policies of Professional Societies

Professional societies such as AAAS, AGU, AAS, etc., influence funding agency policies and directly influence the policies surrounding software used to generate publications. It is important to engage with the community via consensus papers and professional societies to guide policy decisions regarding open-source software in science.

Science/AAAS explicitly states that “In general, all computer code central to the findings being reported should be available to readers to ensure reproducibility.”

Institutions

The individual institutions where we work impose highly variable restrictions on open-source software due to security, privacy, intellectual property, commercial, or other concerns that do not necessarily align with the ethos of open science. It is important to engage with the institutional community to facilitate the movement toward policies that facilitate open-source software as a foundation of open science.

Activity 3: Discussing an SMP#

Estimated time for activity: 12 minutes. It is a group activity.

In this activity, review the SMP below and think about these questions:

What kinds of software does the SMP describe?
When will it be shared?
Where will it be shared?
How will it be shared so it is a citable artifact?
Who will be responsible for different aspects of the software?
What are some of the limitations for some of the software?
How can not having an agreed-upon plan at the start of code development have impacts years down the line?
Are results reproducible without the original Fortran code?
Are there things in the example plan that you would add or be more specific about?

Discuss all of the questions in a group.

Example: Software Management Plan

1. Expected Software Types

We will use established simulation models to conduct initial simulations for this work. These simulation models are written in Fortran and developed over the last decade. While not publicly available, they are available for the project to use (private communication). The simulation models will lead to the generation of output files as described in the Data Management Plan (DMP). We will develop analysis software in Python to analyze the model output files, which will enable the development of derived data products, maps, and figures. Development of the Python analysis software will be shared on a GitHub repository.

2. Development of Analysis Software

All new development of Python code will be conducted openly on GitHub by members of this project. We will post and follow the established Code of Conduct for software development for our research project, which includes guidelines for contributions by additional members of the scientific community.

3. Repositories and Timeline for Sharing Software

This work will support the development of two peer-reviewed journal articles. All source code developed in Python to support each article will be archived on Zenodo no later than the article’s publication date. The software will be made available under a permissive Apache License 2.0. Zenodo will assign a DOI to the archived software when it is archived.

4. Software Sharing Exemptions

This work does not support further development of the existing Fortran simulation models, which are maintained independently. We do not have permission to publicly share the Fortran source code for the simulation models.

5. Roles and Responsibilities

Initial simulation modeling and the development of Python analysis software will be completed by PhD students and postdocs. The PI of this project holds overall responsibility for the execution of this plan.

Engage and Build Communities#

Open software communities are social learning spaces where individuals come together to learn a new skill, exchange knowledge and experiences, and then apply what they’ve learned from the community in their day-to-day work.

Communities offer:

A low entry point for learning and improving your use of software in research.
An opportunity to share individual experiences, identify common hurdles, and iteratively enhance knowledge and resolve problems.
A way to build the culture around open source software in science and a great way to keep updated on the latest tools and practices.
A non-hierarchical community of practice where all members of the community should be treated equally.

Connect with Communities#

Here are some communities that can help you get started:

Subscribe to and/or participate in forums (e.g., GitHub discussions, Stack Overflow, or discipline/software specific), in-person workshops, conferences, hackathons, etc., related to your discipline or software you contribute to or use. Connect on social media. And last but not least, talk with your colleagues!

Learn more about community building from this tutorial by The Turing Way team.

Activity 4: Browse Through Some of the Communities of Practice#

Estimated time for activity: 6 minutes. It is an individual activity.

Find and browse through the websites associated with two communities of practice listed in the previous section, “Connect with Communities,” or explore more of the options if the proposed ones are not of particular interest to you.
Identify at least two points of entry for engagement, e.g., an upcoming event (virtual or in person), how you could contribute, forums, etc.

Contribute to Open-Source Software#

Contributing to open software provides many advantages and opens doors to a number of rewarding opportunities. There are few other industries that can boast the massive number of global contributions that the open-source community can. Contributing to open-source software is a great way to improve your coding skills and document your work while growing your community.

There are several ways to contribute to open software. Not all of them require writing actual code:


Add New Features	The most obvious case for contributing to open software is enhancing its usability by adding new features.
Fix Bugs	Alternatively, you can reply to an already opened issue by fixing it.
Report Issues and Make Suggestions About Improving Code	Reporting an issue is a valuable contribution, even if you don’t know how to fix it. For example, you might be using a browser in which the software has not been tested yet, have discovered a particularly uninformative error message, or be able to feed a valuable user experience back to the developers that can help to improve the overall usability of the software.
Improving and Contributing to Documentation	Contributing to documentation constitutes a great starting point for contributing to open-source software and is often overlooked in its importance. Writing documentation allows you to familiarize yourself with the use of the software while helping to teach others.
Create Tutorials, Use Cases, or Visuals	Another way to contribute is to make your experience and use of the software publicly available. For example, you could create a tutorial based on your use of the software, summarize a use case, or provide a summary of your use in a graphic. This kind of contribution is particularly appealing because publishing what you have already used the software for does not create much extra work.
Improve Layout, Automation, Structure of Code	Apart from creating new code, a good way to contribute to open-source software can also be to improve, restructure, or automate existing code. This is called refactoring and helps to make the software project more effective and stable.
Organize and Attend a Community Meet-Up	Another way to contribute to open-source software is via community building. Many software products and toolboxes have a lively community of users that meet on a regular basis in person and online to discuss and improve the software and its use. Participating in or even organizing such a meetup can be a good way to improve your knowledge of the software, get to know its community, and contribute to open-source projects.
Code Review	Requests to integrate new contributions into the main code base usually require a review of the contribution by at least one other user. Similar to peer review, code review entails writing a short summary about the quality of the code and making suggestions about improvements.

Additional Resources#

References and Guides#

In addition to the resources listed elsewhere in this training, the community resources below are excellent sources of information about Open Software.

Additional Training#

In addition to the resources listed elsewhere in this training, the below resources represent additional training on Open Source Software.

A Journal with Thousands of Open-Source Research Software Success Stories#

The Journal of Open Source Software has presented a venue for enhancing the quality and minimizing the effort of publishing open-source research software:

Peer-reviewed, open source “journal” covering open source research software published via GitHub.
The emphasis is on the software.
Published thousands of open-source research software projects, several of which are highly cited. JOSS is one of several journals. Click here for a list of many more journals that publish software.

Key Takeaways#

In this section, you learned:

When an SMP should be written and that your funding organization or institution may have rules around how you develop and share your code.
That joining software communities can be a great way to exchange knowledge and learn new skills around open code.
That there are many ways to contribute to open code, and that not all of them require writing code.

Summary#

After completing this day, you should be able to:

Explain what open-source software means, including the software development cycle, the benefits, some common limitations, and how they are addressed.
Assess open-source software for reuse by evaluating provided documentation, including README files and licensing details, and then cite the software appropriately.
Create an open-source software management plan that includes the strategy for selecting open software dependencies and open repositories and how open elements, including metadata, README files, and version control, will be included to make the software reusable and findable.
Evaluate whether your open-source software can be shared and the best options for sharing to increase visibility.
List the responsibilities a software developer has once the open-source software is shared, including managing legal requirements and ensuring the software is maintained.
