Metadata is structured information that describes a dataset and the project that produced it. It provides context and detail by addressing the who, what, when, where, why, and how of the dataset. To make the dataset more findable, accessible, interoperable, and reusable (i.e., in compliance with the FAIR Data Principles), metadata often includes but is not limited to these elements:
This video from the National Archives of Australia provides a quick overview of metadata and its use in daily life.
Some disciplines have their own metadata standards that determine what information should be captured for dataset descriptions. However, there are also discipline-neutral metadata standards for general use. These two resources can help researchers find what metadata standards are available:
Additionally, some disciplines use specific controlled vocabularies to ensure accurate and consistent descriptions of research datasets. If you have any questions about metadata, metadata standards, or controlled vocabularies, feel free to contact the Center for Digital Scholarship.
A research dataset should have a Readme file that holds the metadata about the dataset. The Readme file can be a plain text file (with the .txt extension) or a spreadsheet saved as a comma-separated values file (with the .csv extension). It enhances the transparency of a research project and is the first file a researcher should look at when handling a dataset. If a dataset contains multiple files, the Readme file explains how the files relate to one another and how they are organized.
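As a rough illustration of the idea, the Python sketch below assembles a plain text Readme that records basic descriptive metadata and lists each file in the dataset with its purpose. The function name `build_readme`, the field names, and the file names are all hypothetical choices made for this example. They do not reproduce any particular template, so adapt them to the conventions of your discipline or repository.

```python
from datetime import date


def build_readme(title, creator, description, files, created=None):
    """Return Readme text. `files` maps each file name to a one-line description."""
    # Fall back to today's date when no creation date is supplied.
    created = created or date.today().isoformat()
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Date created: {created}",
        "",
        "Description:",
        description,
        "",
        "Files in this dataset:",
    ]
    # Pad file names so the descriptions line up in a plain text viewer.
    width = max((len(name) for name in files), default=0)
    for name, purpose in files.items():
        lines.append(f"  {name.ljust(width)}  {purpose}")
    return "\n".join(lines) + "\n"


# Hypothetical example values, for illustration only.
readme_text = build_readme(
    title="Example survey dataset",
    creator="Hypothetical Researcher",
    description="Placeholder description of the project and data collection.",
    files={
        "README.txt": "This file",
        "survey_responses.csv": "One row per respondent",
        "codebook.csv": "Codes used in survey_responses.csv",
    },
    created="2024-01-15",
)
print(readme_text)
```

To save the result next to the data, the returned text could be written out with `open("README.txt", "w", encoding="utf-8")`.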
Cornell University provides a Readme file template that indicates what information would be useful to researchers who may reuse a dataset. The Data Cooperative at the University of Arizona Libraries has provided Readme file examples for reference. This video from Harvard Library discusses the importance and uses of Readme files.
If you have questions about Readme files, feel free to contact the Center for Digital Scholarship.
A research dataset is sometimes accompanied by an auxiliary resource such as a codebook or a data dictionary. Survey researchers often use a codebook to document the layout and structure of the data file and to explain how data elements are represented by codes for categorization and analysis. A data dictionary defines the data elements in a dataset by specifying names, labels, units, constraints, and other characteristics. If your dataset includes R or Python code or scripts, the data dictionary should address the purpose of the code and how to use it to process the data. This article provides a good introduction to compiling a data dictionary, while this article explains some principles of organizing data in a spreadsheet. If you prefer watching videos, the following two provide overviews of a codebook and a data dictionary.
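One possible starting point for a data dictionary is to draft it from the data file itself and then complete it by hand. The Python sketch below reads CSV text, guesses a simple type for each column, counts missing values, and attaches labels and units supplied by the researcher. The function names, the sample data, and the choice of fields (`name`, `label`, `type`, `unit`, `missing`) are assumptions made for this example, not a standard. The type guess is deliberately crude and should be reviewed before the dictionary is shared.

```python
import csv
import io


def infer_type(values):
    """Guess 'integer', 'number', or 'text' from a column's non-empty values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    # Try the strictest type first; fall through to 'text' if nothing fits.
    for label, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "text"


def build_data_dictionary(csv_text, labels=None, units=None):
    """Return one dictionary entry per column of the CSV text."""
    labels = labels or {}
    units = units or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Treat absent cells the same as empty ones.
        values = [(row[name] or "") for row in rows]
        entries.append({
            "name": name,
            "label": labels.get(name, ""),
            "type": infer_type(values),
            "unit": units.get(name, ""),
            "missing": sum(1 for v in values if v == ""),
        })
    return entries


# Hypothetical sample data, for illustration only.
sample_csv = "id,age,height_cm,city\n1,34,170.5,Lima\n2,,182,Quito\n"
dictionary = build_data_dictionary(
    sample_csv,
    labels={"id": "Respondent identifier", "age": "Age at interview"},
    units={"age": "years", "height_cm": "cm"},
)
for entry in dictionary:
    print(entry)
```

The resulting entries can be saved as their own CSV file with `csv.DictWriter`, so that the data dictionary travels with the dataset. Constraints such as allowed ranges or code lists still need to be added by someone who knows the data.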