Glossary of open science and open publishing
Open access means online access to publications and data that are made freely available immediately or after a limited embargo period, subject to copyright and intellectual property rights.
Green open access is open access implemented through online repositories. An article, monograph or dataset is deposited by the author or their representative (known as self-archiving) at the time of publication or immediately afterwards.
Gold open access means publishing in journals and books with immediate open access, with the costs of publication (APC/BPC, Article Processing Charges/Book Processing Charges) borne by the author, institution or grant agency.
Open educational resources are "educational and research materials in any medium, digital or otherwise, that are in the public domain or have been published under a licence that permits free access, use, adaptation and redistribution with no or limited restrictions".
Open peer review refers to scientific peer review mechanisms in which the identity of the reviewer, the author, or both is not hidden. The actors in the review and publication process are not anonymised and the reviews are openly accessible.
Open data is characterized as data that can be freely used and redistributed by anyone, subject at most to the condition of attribution and citation. It is digital information that is available at any time and to any user, with no further restrictions on access.
The key characteristics of open data are:
- Availability and access: the data must be accessible, as far as possible understandable and easily downloadable.
- Reuse and redistribution: data must be provided under terms that permit reuse and redistribution.
- Universal participation: everyone must be able to use and redistribute information without discrimination.
However, scientific outputs should be as open as possible and as closed as necessary. In some cases, access to data may be restricted for reasons of national security, data confidentiality, privacy or respect for the object of study. Such reasons include, for example, legal processes, trade secrets, intellectual property rights, personal information, and the protection of human subjects or of endangered and rare species.
Research data is data in any format or form that is collected, observed, generated, created or obtained in the course of a research project. This includes numerical, descriptive, audio, visual or physical forms recorded by an institutional staff member, generated by equipment, or derived from models and simulations.
A data management plan, or DMP, is a formal document that describes how research data will be handled both during and after the research process. The plan specifies how the data will be stored, secured and managed. Adherence to the FAIR principles is recommended.
The FAIR principles form a set of guidelines and standards in science and research that should be considered when publishing research data. FAIR is an acronym of four words indicating that shared research data should be Findable, Accessible, Interoperable and Reusable.
The term open data is associated with FAIR data, which is an essential part of open science and describes some of the central principles of good data management and open access to research data. These principles focus mainly on machine readability of data, but also on human understanding of research data to enable its reuse. The FAIR principles were first published in 2016. They have been adopted by the European Union, but also by a number of other organisations, including universities and various research institutions.
Under the FAIR principles, data must be:
FINDABLE
The first step in using data is making it findable. Metadata and data should be easy to find for both humans and computers.
- Metadata should be assigned a persistent identifier. A persistent identifier helps to remove ambiguity in published data.
- Data should be described by metadata that includes information about the context, quality and conditions or characteristics of the data. This helps to better locate the data and make it more reusable and citable.
- Metadata should be registered or indexed in a findable source, as identifiers and metadata descriptions alone do not guarantee their easy searchability on the Internet.
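The three findability points above can be sketched as a small check on a metadata record. This is only an illustration: the field names, the `findability_gaps` helper, the simplified DOI pattern and the example DOI are all assumptions made for this sketch, not part of the FAIR principles or of any repository's schema.

```python
import re

# Simplified DOI shape (assumption): "10.", a 4-9 digit prefix, "/", a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Hypothetical minimal set of descriptive fields for this sketch.
REQUIRED_FIELDS = ("identifier", "title", "creators", "description", "indexed_in")


def findability_gaps(record):
    """Return a list of problems; an empty list means no gaps were found."""
    gaps = [f"missing field: {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    identifier = record.get("identifier", "")
    if identifier and not DOI_PATTERN.match(identifier):
        gaps.append("identifier does not look like a DOI")
    return gaps


example_record = {
    "identifier": "10.1234/example.5678",  # made-up DOI for illustration
    "title": "Example dataset",
    "creators": ["A. Researcher"],
    "description": "Context, quality and collection conditions of the data.",
    "indexed_in": ["hypothetical-catalogue"],  # where the metadata is registered
}

print(findability_gaps(example_record))
```

A record that passes such a check is not automatically findable in practice; the check only flags the most obvious omissions.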
ACCESSIBLE
If a user finds the data they want, they need to know how to access it.
- Metadata should be retrievable by its identifier using a standardised communication protocol (for example HTTP). This protocol is open and universally implementable.
- Metadata should remain accessible even if the data is no longer available. Data files can degrade or disappear over time, and metadata storage is generally easier and cheaper.
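As a rough illustration of retrieving metadata by identifier over HTTP, the sketch below prepares a request to the public DOI resolver and asks for machine-readable citation metadata through the `Accept` header. The `build_metadata_request` helper and the DOI are made up for this example, and the request is only constructed, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_metadata_request(doi):
    """Prepare (but do not send) an HTTP GET request for a DOI's metadata."""
    url = "https://doi.org/" + quote(doi, safe="/")
    # Ask for machine-readable citation metadata rather than the landing page.
    return Request(url, headers={"Accept": "application/vnd.citationstyles.csl+json"})


request = build_metadata_request("10.1234/example.5678")  # made-up DOI
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Sending the request (for instance with `urllib.request.urlopen`) needs network access, and whether a given DOI returns metadata in this format depends on the agency that registered it.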
INTEROPERABLE
- Metadata should use a formal, accessible, shared and widely applicable language for knowledge representation. Ontologies, thesauri and data models should be used to ensure automatic discoverability and interoperability of datasets.
- Metadata should also include references to other metadata. The goal of making connections between metadata sources is to better understand the data context.
REUSABLE
The main goal of the FAIR principles is to optimize data reuse. Therefore, both data and metadata should be well described.
- Metadata should be richly described with precise and relevant attributes so that the data can be used in a particular context. One important attribute is a clear and accessible licence governing the use of the data. For others to reuse the data, know where it came from and know how to cite it, accurate information about its origin is also essential.
- Data that follows common conventions is easier to reuse. This means the same type of data, data organized in a standardized form, established and sustainable formats, and the use of controlled vocabularies. Therefore, where such 'community' standards or examples of good practice exist, they should be followed.
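The reusability points above (licence, provenance, standard formats) can be sketched in the same spirit. The field names, the `reuse_gaps` helper and the list of preferred formats are assumptions chosen for illustration; real community standards differ between disciplines.

```python
# Hypothetical list of open, widely supported formats for this sketch.
PREFERRED_FORMATS = {"csv", "json", "txt", "xml"}


def reuse_gaps(record):
    """Return a list of obstacles to reuse; an empty list means none were found."""
    gaps = []
    if not record.get("license"):
        gaps.append("no licence stated")
    if not record.get("provenance"):
        gaps.append("no provenance information")
    fmt = record.get("format", "").lower()
    if fmt not in PREFERRED_FORMATS:
        gaps.append(f"format '{fmt}' is not in the preferred list")
    return gaps


record = {
    "license": "CC-BY-4.0",
    "provenance": "Collected during a hypothetical survey project.",
    "format": "CSV",
}

print(reuse_gaps(record))
print(reuse_gaps({"format": "xlsx"}))
```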
An e-print is an electronic copy of a publication. It can be a preliminary document before it is peer-reviewed and published (preprint), the final version of a manuscript accepted for publication after peer review (postprint), or the published version itself.
Metadata is the data used to describe, search and manage publications. It generally includes the title of the publication, author data, journal or proceedings name, publisher data, date of publication, DOI, ISBN/ISSN, etc.
Open source software generally includes software whose source code is publicly available in a human- and machine-readable, modifiable format under an open licence that grants others the right to use, modify, distribute and share the software, its source code or its design, and to create derivative works from them.
Citizen science covers public participation in scientific projects. Through citizen science, citizens can participate in many stages of the scientific process, from the design of a research question, to data collection, data interpretation and analysis, to the publication and dissemination of results. Citizen science has the potential to raise awareness of science and strengthen collaboration with the scientific community, opening up new opportunities not only for the development of science but also for the development of society.
Paper mills, or article factories, are associated with buying authorship of scientific articles and 'manufacturing' manuscripts for sale. Article factories are a new phenomenon in scholarly communication and publishing. Their fraudulent practices particularly affect journal editors, who have to implement effective procedures and find efficient tools to catch fake manuscripts within the established publishing process. The non-profit organisation Committee on Publication Ethics (COPE) defines article factories as "profit-driven, unofficial and potentially illegal organisations that produce and sell fraudulent manuscripts that appear to resemble genuine research."
A repository is a digital platform that stores research results and provides content free of charge, online and permanently. According to the Directory of Open Access Repositories, most of the freely available content (83.2%) is provided through institutional repositories.
Examples of repositories:
- Zenodo - A general-purpose open access repository operated by CERN. It serves primarily scientists and researchers as a repository for research data, documents and other research outputs. Files in any format can be uploaded, and each upload is automatically assigned a Digital Object Identifier (DOI). The repository was established in 2013 as an open science repository that does not require institutional affiliation.
- Re3data - A global registry of research data repositories from different scientific disciplines. It offers researchers, organizations, libraries and publishers an overview of existing international research data repositories. Re3data promotes a culture of sharing, better accessibility and visibility of research data. The registry was officially launched in May 2013.
- OAPEN Library (Open Access Publishing in European Networks) - A central repository for hosting and disseminating open access books. It provides services to researchers, publishers, libraries and research funders in the areas of hosting, deposit, quality assurance, dissemination and digital preservation. Publishers are required to publish scholarly, peer-reviewed books that are freely available or released under an open licence. The website offers publishers various guides and information on services, review policy, ERC conditions for publishers, etc.
- ROARMAP - The Registry of Open Access Repository Mandates and Policies (ROARMAP) is an international database that maps the growth of open access mandates and policies adopted by, for example, universities, scientific institutions and research funding bodies that require their researchers to provide open access to their peer-reviewed outputs by depositing them in an open access repository.
- ROAR - Registry of Open Access Repositories.