LibGuides: Research Data Management: Data publishing and preservation

Publishing (meta)data in line with the FAIR data principles

Research data and related published research results produced at Hanken are open and available for shared use. Hanken's Guidelines on Open Science and Research (2021, p.4) states that "Hanken endeavours to ensure the findability and citability of the research data produced by the school’s researchers, while sees to that the degree of data openness and sharing is ethically and legally justifiable."

The openness and reuse of research data increase the visibility and impact of your research, improve data verifiability and research reproducibility, and contribute to attaining the SDGs in many aspects, for example, by saving time and resources in data production. Appropriate data management and carefully organized and described research (meta)data that are published for data retrieval and reuse are recognised and considered as part of a researcher’s academic merits. See Benefits of open data and data reuse below.

The FAIR data principles (findable, accessible, interoperable, and reusable) are a set of guiding principles for ensuring that your digital research data are truly open and reusable. See FAIR data principles below.

The FAIR data principles are mainly about metadata, which appear in almost all of the principles. Metadata are data about data and describe the context, content, structure, compilation, and management of your research data. It is through the metadata that your research data become visible and findable, and are first assessed for actual downloads and reuse. Creating appropriate and rich metadata is the key to making data open, understandable, and reusable. See Metadata and data documentation below.

When opening and publishing your research (meta)data, consider the following questions for your research data to go FAIR:

1. How to describe and publish the metadata of your research data?

It is strongly recommended to use the Fairdata Qvain metadata tool to describe and publish your (meta)data in line with the FAIR data principles. Qvain is part of the Fairdata services offered by the Ministry of Education and Culture and produced by CSC. Data described and published with Qvain are transferred automatically to both Etsin (research dataset finder, also part of the Fairdata services) and the Finnish National Research Information Hub (research.fi, a service also commissioned by the Ministry and CSC).

You can log in to Qvain with your HAKA account, click CREATE DATASET, and fill in the form. Please see Qvain User Guide.
It is through the metadata that your research data become visible, findable and first assessed for downloads and reuse. Creating appropriate and rich metadata is the key to making data truly open, understandable, and reusable.

Open/FAIR data can increase the visibility and impact of your research, facilitate disciplinary and interdisciplinary collaboration, improve data verifiability and research reproducibility, decrease duplication costs in data production, improve knowledge sharing, and contribute to attaining several SDGs. The openness and reuse of research data are recognised as part of a researcher’s academic merits. See Benefits of open data and data reuse below.

Please note that even if you cannot publish and archive your research data because, for example, they contain personal information, sensitive personal data or confidential data, you can still publish their metadata. The metadata of data holding personal or confidential information can be published, although the actual data cannot be. You can publish a general description of your research and research data, and leave your contact information so that others can find your research and request further information about it and your data.

The Creative Commons CC BY 4.0 license is recommended for published (meta)data when possible.

2. Where will the research data be opened and published?

Research data are archived and published in a national or international repository, e.g., Zenodo, IDA or Aila, when possible.

Choose suitable repositories for publishing and archiving your data already at the beginning of the project and archive your research data when possible.
Recommended general repositories include:
- Zenodo by the OpenAIRE project and CERN.
- IDA, Data storage and archival service, also part of the Fairdata services. Please read the instructions on how to Apply for IDA storage space.
- Aila by the Finnish Social Science Data Archive (FSD). You can contact them directly at asiakaspalvelu.fsd@uta.fi for further assistance.
To find repositories for a specific data type, check re3data.org, a registry of research data repositories covering over 2,000 repositories.
Criteria for choosing a repository include:
- A repository which uses persistent identifiers (e.g., DOI, URN) for metadata and, if applicable, for the underlying data. See The use of Persistent Identifiers for Research Datasets: Recommendation by the Finnish Scientific Community for Open Research.
- A repository which publishes machine-readable metadata and uses a known metadata standard.
- A repository widely used by your colleagues. Also check the recommendations of the publishers, learned societies, and funders in your own field.
- A repository which allows you to choose the terms of use and internationally standardized licenses under which the data can be reused, and states them clearly as part of the metadata.
Define an appropriate access type (open, embargoed or restricted) for your research data based on the nature of the data, your research process, the need to protect trade secrets and other confidential data, and intellectual property agreements, as well as funders’ and publishers’ requirements.
If your data have long-term value, consider preserving them in the Digital Preservation Service for Research Data maintained by the Ministry and CSC. See Long-term preservation of data below.

3. What part of the data will be opened and published?

Anonymised data are published and archived in a data repository for shared reuse whenever possible.
According to the Data Protection Act (1050/2018, Section 4 (4)) and the GDPR (point (e) of Art. 6 (1)), processing research material containing personal data, and processing personal data included in its metadata, for archiving purposes is lawful if it is necessary and proportionate to the aim of public interest pursued and to the rights of the data subject. Pseudonymised data are still personal data. Restricted access can be used as a measure to archive pseudonymised data. The research participants need to be informed of your open data plans in the privacy notice. See Anonymize data prior to publishing and archiving.

4. When will the data be available? Do you need to set any embargo period?

5. Which license will you use to open and publish your (meta)data? Licensing is necessary for publishing data. It is recommended to use Creative Commons (CC) license CC BY 4.0 for published datasets when possible. See IPRs in data management.

6. Will some part of the data be erased and destroyed? See Data erasure and (meta)data publishing.

7. Other important issues in publishing research data include using open, standard, interchangeable, and non-proprietary data formats, sensible and consistent file naming conventions, well-organised directory structure, and version control. See Data formats and organizing.

8. Remember to register your datasets in Hanken's research database - Haris and add the persistent identifiers (PIDs, e.g., DOI and URN) for your metadata (for example, from Qvain) and for your datasets in the repository where you have archived and published your datasets.

If a publication has a related dataset, create two separate records in Haris – one for the publication and one for the dataset. The records can then be connected under Relations to other content in the template.

The information you have registered in Haris about your datasets will be displayed on the Haris public portal under Dataset.

For more information, please see Register your datasets in the LibGuide on Haris.

Contact openresearch@hanken.fi or haris@hanken.fi if you have questions about publishing your research (meta)data or reporting datasets in Haris.
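
Point 7 above mentions sensible and consistent file naming conventions. Such a convention can be checked automatically before data are deposited. The following is a minimal Python sketch, assuming a hypothetical convention of the form project_description_YYYY-MM-DD_vNN.ext in lowercase. The pattern and the function names are illustrative assumptions, not a Hanken or repository requirement.

```python
import re

# Hypothetical naming convention (an assumption for illustration only):
#   <project>_<description>_<YYYY-MM-DD>_v<NN>.<ext>
# for example "survey_responses_2023-05-01_v02.csv"
NAME_PATTERN = re.compile(
    r"[a-z0-9]+_[a-z0-9-]+_\d{4}-\d{2}-\d{2}_v\d{2}\.[a-z0-9]+"
)


def follows_convention(filename: str) -> bool:
    """Return True if the whole file name matches the assumed convention."""
    return NAME_PATTERN.fullmatch(filename) is not None


def nonconforming(filenames: list[str]) -> list[str]:
    """Return the file names that do not match the assumed convention."""
    return [name for name in filenames if not follows_convention(name)]
```

Running nonconforming() over the file names in a project folder gives a list of files to rename before archiving. Adapt the pattern to whatever convention your project has agreed on.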

FAIR data principles

The FAIR data principles, formulated by Force11, are the guiding principles on how to make data truly open. FAIR is the acronym for "findable, accessible, interoperable, and reusable".

The FAIR data principles can be summarised as “Findable + Accessible + Interoperable = Reusable.” Making data reusable, and reusing and benefiting from existing datasets, are the fundamental motives of open data. A FAIR + (FAIR and Reproducible) solution is also promoted (see Christophe Bontemps and Valérie Orozco. 2021. “Toward a FAIR Reproducible Research”, in Abdelaati Daouia and Anne Ruiz-Gazen (eds.) Advances in Contemporary Statistics and Econometrics. Springer International Publishing).

FAIR is not the same as open or free. Data can be closed and paid for yet perfectly FAIR, while data that are open and free are often not FAIR, and are thus regarded as cost-inefficient and not reusable.

The FAIR data principles are mainly about metadata which appears in almost all the FAIR principles. It is through the metadata that your research data become visible, findable and first assessed for downloads and reuse. Creating appropriate and rich metadata is the key to making data open, understandable, and reusable.

To ensure that your research (meta)data are FAIR, take the following steps:

- Save your data in an open file format such as Rich Text Format (.rtf) or .csv. These are more interoperable and less subject to loss and obsolescence than proprietary formats.
- Archive your data in an established digital repository at the end of the project. Remember to choose a repository that provides a persistent identifier (PID), such as DOI or URN.
- Create descriptive metadata for the data. Most of the FAIR data principles concern metadata. It is crucial to describe and document your research data to make them truly open and reusable. See Data documentation and metadata in the following section.
- License your data with a license that clearly states the conditions and restrictions for reuse.
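
As an illustration of the first step above, tabular data held in a script can be written out as plain CSV text using only the Python standard library. This is a minimal sketch. The function name rows_to_csv and the example column names are assumptions made for illustration.

```python
import csv
import io


def rows_to_csv(rows: list[dict[str, object]], fieldnames: list[str]) -> str:
    """Serialise records to CSV text with a header row.

    CSV is an open, non-proprietary format, so the result can be read
    without any specific software.
    """
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example records
example = rows_to_csv(
    [{"respondent_id": 1, "score": 4.5}, {"respondent_id": 2, "score": 3.0}],
    fieldnames=["respondent_id", "score"],
)
```

The returned text can then be saved to a .csv file with UTF-8 encoding, for example via pathlib.Path("data.csv").write_text(example, encoding="utf-8").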

It is recommended to use the Fairdata services offered by the Ministry of Education and Culture and produced by CSC for data management, data storage, metadata creation, dataset dissemination and distribution as well as digital preservation of research data. The services include:

- IDA, Research Data Storage – Safe storage for research data.
- Qvain, Research Metadata Tool – A metadata tool for describing and publishing datasets.
- Etsin, Research Dataset Finder – Discover, access and download research data from all fields of science.
- PAS, Digital Preservation Service for Research Data – Reliable preservation of digital information for decades or even centuries.

Read How to make the research dataset FAIR? and learn more about the Fairdata services.

Metadata and data documentation

Data documentation means describing the data: it is data about data, and provides information about the who, what, when, where, why, and how of the data. Investing time in documenting the data makes them easier to understand, both for others and for yourself, and decreases the risk of misinterpreting the data. Data documentation can take the form of a readme file (human-readable) and metadata (computer-readable):

Readme files are text documents (e.g., in the format .txt) providing information about data files to ensure they are interpreted correctly. A readme file explains what data a research project has, how the data were created, where the data originate from, how to interpret them, what the abbreviations mean, what software is needed to use the data, how the data have been modified, and can include information about the title, creator, funder, relevant dates of data collection and publication, location, methodology, subject, file formats, file naming system and folder structure, data version, licence, and repository.

Write a readme file about your data and data files. Put the readme file in the most obvious place in the data file folders to ensure that it can be noticed and seen immediately.
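
A readme file can be started from a simple template so that no item is forgotten. The following minimal Python sketch builds such a template from a selection of the items listed above. The field list, the TODO placeholder, and the function name build_readme are illustrative assumptions rather than a required format.

```python
# Fields taken from the items a readme file can include (a selection,
# chosen here for illustration)
README_FIELDS = [
    "Title",
    "Creator",
    "Funder",
    "Data collection dates",
    "Methodology",
    "File formats",
    "File naming and folder structure",
    "Data version",
    "Licence",
    "Repository",
]


def build_readme(values: dict[str, str]) -> str:
    """Return readme text with one 'Field: value' line per field.

    Fields without a value are marked TODO so that gaps are easy to spot.
    """
    lines = [f"{field}: {values.get(field, 'TODO')}" for field in README_FIELDS]
    return "\n".join(lines) + "\n"


# Hypothetical example values
readme_text = build_readme({"Title": "Example survey data", "Licence": "CC BY 4.0"})
```

The resulting text can be saved as README.txt at the top level of the data folder.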

Metadata are technical data that describe a research dataset. When making data FAIR (Findable, Accessible, Interoperable, and Reusable), metadata play the key role. Systematically described research data are the key to making your data understandable, findable and reusable.

Metadata should be machine-readable and machine-actionable. That is, data need to be richly and systematically described in such a way that machines can interpret and navigate all the metadata and linked data across different websites, and retrieve and transmit the right ones for a person conducting semantic queries. There are standard methods for data documentation, called metadata standards, which should be used if suitable for the data. The Fairdata Qvain metadata tool makes describing and publishing research data smooth and effortless for researchers without requiring technical skills.
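
To give a rough idea of what machine-readable metadata look like, the sketch below stores a dataset description as a structured record and serialises it to JSON. The field names are hypothetical and do not reproduce the Qvain form or any particular metadata standard. In practice, use the fields defined by the tool or standard you have chosen.

```python
import json

# Hypothetical minimum set of fields (an assumption, not a standard)
REQUIRED_FIELDS = ("title", "creator", "description", "license", "access_type")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def to_json(record: dict) -> str:
    """Serialise the record to JSON text that software can parse."""
    return json.dumps(record, ensure_ascii=False, indent=2, sort_keys=True)


# Hypothetical example record
record = {
    "title": "Example survey data",
    "creator": "Example Researcher",
    "description": "Anonymised responses to an example survey.",
    "license": "CC BY 4.0",
    "access_type": "open",
}
```

Because the record has fixed field names and a defined syntax, a program can check it for completeness with missing_fields(record) and exchange it with other systems via to_json(record), which is not possible with a free-text description alone.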

It is strongly recommended to use the Fairdata Qvain metadata tool to describe and publish your (meta)data. Qvain is part of the Fairdata services that support making your research data FAIR. Data described and published with the Qvain metadata tool are transferred automatically to the Finnish metadata warehouse Metax, which is integrated with both Etsin (research dataset finder) and the Finnish National Research Information Hub (in Finnish: Tutkimustietovaranto, a service also commissioned by the Ministry of Education and Culture and CSC).

You can log in to Qvain with your HAKA account, click CREATE DATASET, and fill in the form. Please see Qvain User Guide.

If you cannot publish and archive your research data, because, e.g., your data contain personal information, sensitive personal data or confidential data, you can still publish the metadata of your research data. The metadata of the data holding personal or confidential information can be published, although the actual data cannot be.

For more information, see:

Data documentation by CSC
Data description and metadata by the Finnish Social Science Data Archive (FSD)
Making a research project understandable - Guide for data documentation by Siiri Fuchs and Mari Elisa Kuusniemi at Helsinki University Library
Disciplinary Metadata by the Digital Curation Centre (DCC)

Long-term preservation of data

Long-term preservation means that data are preserved for several decades or even centuries. You can categorise your datasets according to the anticipated retention periods:

1) Data to be destroyed upon the ending of the project.
2) Data to be archived for a verification period, which varies across disciplines, e.g., 5–15 years.
3) Data to be archived for potential reuse, e.g., for 25 years.
4) Data with long-term value to be preserved by a curated facility for future generations for tens or hundreds of years.

Long-term preservation refers to the fourth category. That is, data are preserved for more than 25 years. When creating your data, consider how long they will be retained. Also remember to check discipline-specific, funder-related, and publishers' data retention requirements.
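
The four categories above can be written down as a simple lookup, for instance when planning retention for several datasets at once. This is a minimal sketch. The thresholds of 15 and 25 years are assumptions taken from the examples in the list, and actual periods depend on discipline, funder, and publisher requirements.

```python
from enum import Enum


class Retention(Enum):
    DESTROY_AT_PROJECT_END = 1
    VERIFICATION_PERIOD = 2
    POTENTIAL_REUSE = 3
    LONG_TERM_PRESERVATION = 4


def categorise(retention_years: int) -> Retention:
    """Map a planned retention period (years after project end) to a category.

    The cut-off values are illustrative assumptions based on the example
    periods given in the list of categories.
    """
    if retention_years < 0:
        raise ValueError("retention_years must not be negative")
    if retention_years == 0:
        return Retention.DESTROY_AT_PROJECT_END
    if retention_years <= 15:
        return Retention.VERIFICATION_PERIOD
    if retention_years <= 25:
        return Retention.POTENTIAL_REUSE
    return Retention.LONG_TERM_PRESERVATION
```

For example, categorise(30) returns Retention.LONG_TERM_PRESERVATION, the category to which long-term preservation refers.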

The Finnish Ministry of Education and Culture has established the Fairdata-PAS service (Digital Preservation Service for Research Data, DPS for Research Data) for Finnish research organizations for long-term preservation of the nationally most significant research data. The service is meant for digital preservation of research datasets that have significant value to the organization or on a national level, both now and especially in the future.

If you wish to sign up for the queue for DPS for Research Data, please contact openresearch@hanken.fi.

For more information, see Digital Preservation (Fairdata-PAS): Guidelines for UH Evaluators by the University of Helsinki.

Benefits of open data and data reuse

Making research data open and reusable, and reusing and benefiting from existing datasets, are the fundamental motives of open data. The FAIR data principles can be summarised as “Findable + Accessible + Interoperable = Reusable.” The openness and reuse of research data:

- Increase the visibility and impact of your research.
- Are recognised as part of a researcher’s academic merits. Activities related to the promotion of good data management and the appropriate opening of research data are part of academic work and are valued and included as impact merits in the research evaluation criteria for recruitment and career promotion decisions (National policy and executive plan on open access to research data, 2021, p.5; Hanken's Guidelines on Open Science and Research, 2021, p.7).
- Speed up the adoption of your research findings and the creation of innovations.
- Facilitate disciplinary and interdisciplinary collaboration, both within the scientific community and in wider society.
- Improve knowledge sharing, and increase the transparency and reliability of science, both empowering and democratizing science.
- Contribute to attaining several SDGs.
Reusing published data from previous studies not only saves time and resources in data production, but also improves data repeatability and verifiability, research reproducibility, and the reliability of research outputs.

For more information about the benefits of open data, see:

UNESCO Recommendation on Open Science (2021).
Open science and the SDGs in the LibGuide on Open science.

When reusing data, good practices for the attribution of authorship and data citation shall be followed. See Reusing and citing data.
