Data Management: Data Documentation & Metadata

What is metadata?

  • Metadata is information created to describe something with the intent of aiding future discovery and retrieval of that thing.
  • A common example is a library's catalog. Information describing a book's characteristics (e.g., author, title, subject headings, and call number) is stored in a searchable computer system so that patrons can identify a book of interest (based on title, author, or subject headings) and then locate it in the library building (based on call number).
  • Creating metadata is important for making your data discoverable and accessible.

 

Riley, J. (2017) Understanding Metadata: What is Metadata, and What is it For? https://groups.niso.org/apps/group_public/download.php/17446/Understanding%20Metadata.pdf

Controlled vocabularies

  • Using consistent language when developing metadata helps later users identify your item. Controlled vocabularies provide that consistency.
  • Preferred terms/spellings are chosen to represent distinct concepts in order to remove confusion which might be caused by synonyms and word variants.
  • A controlled vocabulary is sometimes referred to as a taxonomy or thesaurus.
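
As a minimal sketch of how a controlled vocabulary removes the confusion caused by synonyms and word variants, the Python snippet below maps several variant keywords onto one preferred term. The term list and the normalize_term helper are hypothetical illustrations and are not drawn from any published vocabulary.

```python
# Hypothetical mini-vocabulary: each preferred term lists its accepted variants.
PREFERRED_TERMS = {
    "myocardial infarction": ["heart attack", "mi", "cardiac infarction"],
    "neoplasms": ["cancer", "tumor", "tumour"],
}

# Build a reverse lookup from every variant (and the preferred term itself)
# to the preferred term.
LOOKUP = {}
for preferred, variants in PREFERRED_TERMS.items():
    LOOKUP[preferred] = preferred
    for variant in variants:
        LOOKUP[variant] = preferred


def normalize_term(term):
    """Return the preferred term for a keyword, or None if it is not in the vocabulary."""
    return LOOKUP.get(term.strip().lower())
```

With this lookup, "Heart Attack" and "MI" both resolve to the same preferred term, so a search on that term finds items tagged with either variant. A keyword outside the vocabulary returns None, which can serve as a prompt to choose an approved term.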

There are several controlled vocabularies that are especially useful in biomedical research, including:

  • ICD (International Classification of Diseases) - "The global standard for health data, clinical documentation, and statistical aggregation."
  • LOINC (Logical Observation Identifiers Names and Codes) - "The international standard for identifying health measurements, observations, and documents."
  • MeSH (Medical Subject Headings) - maintained by the National Library of Medicine, MeSH has long been used in the indexing of scientific articles. 
  • RxNorm - "RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software."
  • UMLS (Unified Medical Language System) - "The UMLS integrates and distributes key terminology, classification and coding standards, and associated resources to promote creation of more effective and interoperable biomedical information systems and services, including electronic health records."

File Naming Conventions

When naming files, follow these guidelines to ensure that your files remain accessible long term:

  • Be consistent and descriptive - use the same format for all files
  • Create a 'readme.txt' file with an explanation of the file naming system to guide those who may be trying to decipher your files in the future (for more information on creating a readme.txt file, see this page)
  • Be as descriptive as possible; consider including the following information in the file name:
    • Date
    • Experiment name
    • Name/initials of researcher
    • Type of data
    • Version 
  • Enter dates as YYYYMMDD or YYMMDD, so that your files sort in chronological order when listed by name
  • Avoid special characters (e.g., % ^ & * $ #) as they may be misread by some systems
  • Avoid spaces in file names; instead, use one of the following:
    • Camel case: FileName.txt
    • Snake case: file_name.txt
    • Kebab case: file-name.txt
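
The guidelines on special characters, spaces, and date order can be sketched in Python as follows. This is one possible approach, assuming snake case as the chosen convention; the clean_file_name helper and the sample file names are hypothetical.

```python
import re
from pathlib import PurePath


def clean_file_name(file_name):
    """Rewrite a file name in snake case, keeping its extension."""
    path = PurePath(file_name)
    # Replace every run of characters other than letters and digits
    # (spaces, % ^ & * $ #, and so on) with a single underscore.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", path.stem)
    # Drop leading/trailing underscores and lowercase the result.
    return stem.strip("_").lower() + path.suffix.lower()


# Dates written as YYYYMMDD sort chronologically under a plain alphabetical sort.
names = ["20201201_assay.txt", "20190315_assay.txt", "20200704_assay.txt"]
sorted_names = sorted(names)
```

For example, clean_file_name("My Results #2 (final).TXT") yields "my_results_2_final.txt", and sorted_names lists the 2019 file first and the December 2020 file last.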

File name example

  • DST_20201201_LB_SEQ.txt
    • DST - acronym for the experiment being performed
    • 20201201 - date on which the data was collected
    • LB - experimenter's initials
    • SEQ - type of data
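
A naming convention like this one can also be built and checked programmatically. The following Python sketch assumes the four-part pattern shown above (experiment, date, initials, data type); the function names and the regular expression are illustrative assumptions rather than part of any standard.

```python
import re
from datetime import date, datetime

# Pattern for names shaped like EXPERIMENT_YYYYMMDD_INITIALS_TYPE.ext
FILE_NAME_PATTERN = re.compile(
    r"^(?P<experiment>[A-Za-z0-9]+)"
    r"_(?P<date>\d{8})"
    r"_(?P<initials>[A-Za-z]+)"
    r"_(?P<data_type>[A-Za-z0-9]+)"
    r"\.(?P<extension>[A-Za-z0-9]+)$"
)


def build_file_name(experiment, collected_on, initials, data_type, extension="txt"):
    """Assemble a file name from its parts, writing the date as YYYYMMDD."""
    return f"{experiment}_{collected_on:%Y%m%d}_{initials}_{data_type}.{extension}"


def parse_file_name(file_name):
    """Split a file name into its parts; raise ValueError if it breaks the convention."""
    match = FILE_NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"{file_name!r} does not follow the naming convention")
    parts = match.groupdict()
    # Convert the date string to a date object; this also rejects impossible dates.
    parts["date"] = datetime.strptime(parts["date"], "%Y%m%d").date()
    return parts
```

Calling build_file_name("DST", date(2020, 12, 1), "LB", "SEQ") reproduces the example name DST_20201201_LB_SEQ.txt, and parse_file_name recovers the individual parts from it. Running the parser over a folder is a quick way to spot files that drifted from the convention.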

 

Stanford Libraries (n.d.) Best practices for file naming. https://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-naming

Elements to include in your metadata

Here are some general elements to consider including when you create metadata for your work. It is also advisable to consult established schemas to identify the elements your specific needs require (see the 'Metadata schemas' box below).

  • Title
    • Name of the research project
  • Creator
    • Names and contact info of individuals/organizations responsible for data creation
  • Identifier
    • Unique identifier for the project
  • Dates
    • Important dates associated with the project (start and end dates, timeframe covered by data, release dates)
  • Subject
    • Keywords which describe the data
  • Funders
    • Names of those funding the research
  • Rights
    • Any intellectual property rights associated with the data
  • Language
    • Language of the content
  • Location
    • Location where data was collected if important to the type of data
  • Methodology
    • Details on how the data was created (e.g., protocols, equipment, and software used)
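
The elements above could be captured in a simple machine-readable record stored alongside the data. The Python sketch below is one possible layout, not an established schema; every value is a hypothetical placeholder, and the key names are assumptions chosen to mirror the list.

```python
import json

# All values below are hypothetical placeholders for illustration only.
metadata = {
    "title": "Example Sequencing Study",
    "creator": [{"name": "L. B. Researcher", "contact": "researcher@example.org"}],
    "identifier": "EXAMPLE-0001",
    "dates": {"start": "2020-12-01", "end": "2021-06-30"},
    "subject": ["sequencing", "example keyword"],
    "funders": ["Example Funding Agency"],
    "rights": "Placeholder rights statement",
    "language": "en",
    "location": "Example City",
    "methodology": "Protocols, equipment, and software are described in readme.txt",
}

# Element names taken from the list above.
EXPECTED_ELEMENTS = [
    "title", "creator", "identifier", "dates", "subject",
    "funders", "rights", "language", "location", "methodology",
]


def missing_elements(record):
    """Return the expected elements that are absent or empty in a record."""
    return [element for element in EXPECTED_ELEMENTS if not record.get(element)]


# Serialize the record as JSON text, ready to save next to the data files.
metadata_json = json.dumps(metadata, indent=2)
```

A check such as missing_elements makes it easy to confirm that a record is complete before the data is shared. A formal schema will define its own element names and formats, which should take precedence over this layout.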

 

MIT Libraries (n.d.) Data Management: Documentation & Metadata. https://libraries.mit.edu/data-management/store/documentation/

Metadata schemas

Numerous resources can help in the creation of metadata. Among these are schemas, which describe the elements of metadata that should be included as well as a standard way of formatting those elements.

Here are some schema to consider

Electronic Lab Notebook (ELN)

Basics of Electronic Lab Notebooks (ELNs)

Paper-based lab notebooks have long been used in research labs as a means of tracking the progress of ongoing experiments.  In these notebooks, researchers often include written descriptions of experimental design/conditions, resulting raw data, as well as lab equipment printouts and images pertaining to their experimental results.

In recent years, ELNs have been growing in popularity as an alternative to the traditional paper-based notebook. ELNs provide the same basic functionality as their paper-based counterparts, but they can also offer additional benefits, including:

  • Increased findability of results through search features
  • Increased security by password protection
  • Automatic backup features to minimize chances of data loss
  • Improved data sharing among lab members 

Here are some resources to learn more about ELNs:

  • Harvard University (2020) Electronic Lab Notebooks. https://datamanagement.hms.harvard.edu/electronic-lab-notebooks
    • Contains a chart comparing features across many popular ELNs
  • Kwok, R. (2018) How to pick an electronic notebook. Nature. https://www.nature.com/articles/d41586-018-05895-3

 

Jupyter Notebook

The Jupyter Notebook is a tool that allows users to create notebooks which combine narrative text with executable code.

For more information, read:

The Himmelfarb Health Sciences Library

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

The George Washington University