Converting non-digital to digital data
Not all research data are born digital. Many projects still rely on physical materials such as handwritten notes, printed documents, audio tapes, or physical artefacts. To ensure that such materials remain usable, shareable, and preserved for the long term, they can be converted into digital formats using appropriate technologies.
As a general guideline, primary research data and associated materials should be retained for at least ten years after the completion of the study.
Digitizing text
Paper-based materials such as notes, manuscripts, or printed reports can be scanned and stored as digital images. Optical Character Recognition (OCR) software can then convert the text within the scanned images into editable, machine-readable text, for example to produce searchable PDFs. This makes the material easier to search, analyze, and reuse in digital environments.
Digitizing video and audio material
Video and audio recordings can be converted into digital media files using a variety of hardware and software tools. These digitized versions preserve the content while reducing the risk of deterioration over time. If the content of speech is the main focus, recordings can be transcribed into text files and the original audio or video files archived or deleted, depending on ethical or project-specific requirements.
Digitizing physical objects
Researchers working with tangible materials such as manuscripts, textiles, artefacts, or artworks often rely on digital technologies to document and study them. When the object cannot be scanned directly, high-resolution photography is often the best approach. Each image should be reviewed to ensure:
- it accurately represents the original object
- it provides sufficient detail and clarity
- it is stored in a format suitable for long-term preservation and reuse
Why digitize?
Digitizing research materials offers multiple accessibility, preservation, and efficiency benefits:
- A well-planned digitization process enables researchers and collaborators to share, compare, and access materials easily.
- Digitization reduces or eliminates the need for printing, photocopying, and physical storage, lowering long-term project costs.
- Digital files can be accessed anytime and anywhere through secure cloud services or institutional networks.
- Access controls and user permissions can be applied to restrict or monitor data use, protecting confidential or sensitive materials.
- Physical objects and paper-based records degrade over time, whereas properly maintained digital copies help keep valuable information intact and usable.
Sensitive data
Managing sensitive data
Sensitive data refers to any information that must be protected from unauthorized access or disclosure. Such protection may be required by law, ethical standards, or institutional policy, particularly when data involves personal privacy, security, or proprietary information.
Sensitive data can take many forms, for example:
- Personal data: Identifiers such as names, national IDs, or contact details, as well as information on health, cultural or economic characteristics, and geolocation data.
- Security-related data: Passwords, cybersecurity keys, or data relevant to national security.
- Confidential data: Business or financial information, unpublished research, and data protected by intellectual property rights.
- Combined datasets: Data that, when merged, could indirectly identify individuals or reveal sensitive information.
- Sensitive metadata: Even metadata can contain sensitive information, such as details that identify individuals or institutions.
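The risk from combined data can be made concrete by counting how many records share each combination of indirect identifiers. The following is a minimal sketch in Python, assuming hypothetical field names (`postcode`, `birth_year`, `occupation`) and made-up records. A combination held by only one record may point to a single person even though no name is present.

```python
from collections import Counter

# Made-up records for illustration; no direct identifiers are present.
records = [
    {"postcode": "101", "birth_year": 1980, "occupation": "teacher"},
    {"postcode": "101", "birth_year": 1980, "occupation": "teacher"},
    {"postcode": "101", "birth_year": 1975, "occupation": "mayor"},
    {"postcode": "220", "birth_year": 1990, "occupation": "nurse"},
    {"postcode": "220", "birth_year": 1990, "occupation": "nurse"},
]

def unique_combinations(rows, fields):
    """Return the field-value combinations that occur in exactly one row."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]

# Combinations that single out one record and may need extra protection.
risky = unique_combinations(records, ["postcode", "birth_year", "occupation"])
print(risky)
```

A check like this is only a first screening step: a combination shared by several records can still reveal sensitive information if all of those records have the same sensitive value.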
Ethical and legal responsibilities
Sensitive or personal data must be managed with informed consent, transparency, and care at every stage of the research process, from collection and processing to long-term storage. Particular care is needed with textual, image, and sound data that contain personal data through which a living person can be identified. This concerns both direct identifiers, such as names, addresses, or photographs, and indirect identifiers, including workplace details or other contextual information that could reveal a person’s identity when combined with other data.
The European Union has strong data protection rules, most notably the General Data Protection Regulation (GDPR), which governs the processing of personal data and sets stricter conditions for special categories such as health data.
How to prepare sensitive data?
Anonymization
Permanently removing all direct and indirect identifiers so that individuals cannot be re-identified, even when the data are combined with other sources. Once data are truly anonymized, they are no longer considered personal data under the GDPR.
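As an illustration of what removing identifiers can involve, the sketch below drops direct identifiers and coarsens an indirect one. It assumes hypothetical field names (`name`, `email`, `age`) and a made-up record. It shows the mechanics only and does not by itself guarantee that a dataset is anonymous.

```python
# Hypothetical field names treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "email"}

def anonymize(row):
    """Drop direct identifiers and generalize the exact age into a band."""
    out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        # Replace the exact age with a ten-year band, e.g. 34 becomes "30-39".
        low = (out.pop("age") // 10) * 10
        out["age_band"] = f"{low}-{low + 9}"
    return out

# Made-up record for illustration.
row = {"name": "Jane Doe", "email": "jane@example.org", "age": 34, "answer": "yes"}
anonymized = anonymize(row)
print(anonymized)
```

Whether the result counts as anonymized depends on the whole dataset and its context, so the remaining fields should still be reviewed for combinations that could single out a person.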
Pseudonymization
Replacing personal identifiers with pseudonyms or codes. The additional information needed to trace the data back to their origin is stored separately and securely. Because re-identification remains technically possible, pseudonymized data are still personal data under the GDPR, but the risk of disclosure is reduced because the records are no longer directly linked to personal identifiers.
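A minimal sketch of this approach in Python is shown below. The function name `pseudonymize`, the `participant` field, and the `P-` code prefix are illustrative assumptions, and the records are made up. The key table that maps original identifiers to codes is returned separately so that it can be stored apart from the data.

```python
import secrets

def pseudonymize(rows, id_field):
    """Replace id_field with a random code; return new rows and the key table."""
    key_table = {}  # original identifier -> code; keep this separate and secure
    result = []
    for row in rows:
        original = row[id_field]
        if original not in key_table:
            # Random code that cannot be derived from the original identifier.
            key_table[original] = "P-" + secrets.token_hex(4)
        new_row = dict(row)  # copy, so the input rows are left unchanged
        new_row[id_field] = key_table[original]
        result.append(new_row)
    return result, key_table

# Made-up records for illustration.
interviews = [
    {"participant": "Jane Doe", "answer": "yes"},
    {"participant": "John Roe", "answer": "no"},
    {"participant": "Jane Doe", "answer": "maybe"},
]
coded, key_table = pseudonymize(interviews, "participant")
print(coded)
```

The same person receives the same code in every record, so analyses across records still work, while anyone without the key table sees only the codes.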
Encryption
Converting data into a coded format that can only be read with a decryption key. It is an essential method for protecting data during storage and transfer. Effective encryption relies on strong algorithms and secure key management. However, encrypted data can only be reused by those who hold the key, so lost or poorly managed keys can prevent legitimate access.
If none of these three approaches is viable, the dataset should not be made publicly available. Instead:
- Archive it under a closed or restricted license in a data repository.
- Publish metadata only, describing the dataset and access conditions. This maintains transparency while protecting sensitive content.
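A metadata-only record might look like the sketch below. The field names and values are hypothetical placeholders rather than the schema of any particular repository; the point is that the record describes the dataset and its access conditions without exposing the data.

```python
import json

# Hypothetical metadata record; field names and values are placeholders,
# not a specific repository's schema.
metadata = {
    "title": "Example interview dataset",
    "description": "Transcripts of interviews; not publicly available.",
    "access": "restricted",
    "access_conditions": "Available on request, subject to a data use agreement.",
    "contains_personal_data": True,
}

# Serialize the record so it can be published alongside the closed dataset.
record = json.dumps(metadata, indent=2)
print(record)
```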
Help and further information
OpenAIRE – Open Access Infrastructure for Research in Europe: Sensitive data guide
Amnesia – An OpenAIRE anonymization tool
DATICE – The Icelandic Research Data Service guide on access control