Research Data Management and Sharing

This guide addresses good practices for research data management and sharing.

Choosing a Data Format

The format of your research data files may be determined by the equipment, computer hardware, and/or software you use. In general, recommended file formats that best support sharing, reuse, preservation, and interoperability are non-proprietary, software-neutral, unencrypted, uncompressed, and in common use in the research communities.

The Library of Congress has made available a detailed list and a summary of recommended file formats.  Here is an overview of the recommended formats:

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Moving images: MOV, MP4, AVI, MXF
  • Sounds: WAVE, AIFF, MP3, MXF
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, PNG, GIF
  • Tabular data: CSV
  • Text: XML, EPUB, PDF/A, RTF, ODF, TXT
  • Web archive: WARC

When considering data file formats, you may want to evaluate the following:

  • Does accessing the data require proprietary software? If yes, preserving a copy in an open/non-proprietary file format will help ensure that you and others can access it in the future. Consider keeping a copy of the original data together with a copy of the software, if possible.
  • If you plan to deposit your data in a repository, does the repository have specific guidelines or requirements with respect to the file format? If yes, create a copy in the required format for deposit and document the conversion for users.
  • Will converting to another file format modify the data or cause a loss of features? If yes, consider creating a copy in an open/non-proprietary format while also preserving the data in its native format.

Naming Data Files

Planning how you are going to name your files will help you organize and find the files and avoid mistakes or duplication.  Suggested principles for developing a naming convention are as follows:

  • Be consistent 
  • Give each file a brief and descriptive name, which may include a combination of elements such as date, location, and researcher's surname
  • Put the elements in a particular order to facilitate sorting
  • Use underscores or hyphens instead of spaces to separate the elements
  • Use letters and numbers instead of special characters such as ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ' "
  • Standardize the date for chronological sorting, i.e., YYYYMMDD or YYYY-MM-DD
  • Include a number at the end to indicate the version (if applicable), e.g.,
    • Original document: Bisondata_1.0 
    • Original document with minor revisions: Bisondata_1.1 
    • Document with substantial revisions: Bisondata_2.0 
  • Document the naming convention
  • Ensure that all members of the research team understand and use the convention
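
The principles above can be sketched in a few lines of Python. This is only an illustration, not a prescribed tool: the function name `build_filename`, the order of the elements (project, date, surname, version), and the choice to replace special characters with hyphens are all assumptions made for this example.

```python
import re
from datetime import date


def build_filename(project, collected, surname, major, minor, ext):
    """Assemble a name as project_YYYYMMDD_surname_major.minor.ext.

    The element order and separators are hypothetical; adapt them to
    the convention your team has documented.
    """
    parts = [project, collected.strftime("%Y%m%d"), surname]
    # Replace runs of anything other than letters, digits, or hyphens
    # with a single hyphen, then trim hyphens left at either end.
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "-", p).strip("-") for p in parts]
    return "_".join(cleaned) + f"_{major}.{minor}.{ext}"


name = build_filename("Bisondata", date(2021, 9, 30), "O'Neil", 1, 1, "csv")
print(name)  # Bisondata_20210930_O-Neil_1.1.csv
```

Generating names from one shared function, rather than typing them by hand, is one way to keep every team member consistent with the documented convention.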

Harvard Medical School has provided helpful guidance on file naming conventions. Also, Kristin Briney at the California Institute of Technology has created a worksheet to help researchers come up with naming conventions.

Figure: A comparison of data sorted by date, with one column using a standardized date format of YYYYMMDD.

Versioning

Versioning should be taken into consideration in the process of developing a naming convention. The ETDplus project offers the following guidance for version control.

A simple method to designate a revision is to note it at the end of the file name. This way, files can be grouped by their name and sorted by version number. For example:

  • image1_v1.jpg
  • image1_v2.jpg
  • image2_v1.jpg
  • image2_v2.jpg
  • ...
If you use version numbers, one issue that can arise is that computers sort file names character by character rather than by numeric value, so v10 is placed before v2. This can lead to unhelpful results. For example:
  • image1_v1.jpg
  • image1_v10.jpg
  • image1_v2.jpg
  • ...
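
The ordering above can be reproduced with a short Python sketch. The helper `version_number` is a hypothetical name introduced for this example; it assumes every file name contains `_v` followed by digits.

```python
import re

names = ["image1_v2.jpg", "image1_v10.jpg", "image1_v1.jpg"]

# A plain sort compares names character by character, so "v10" lands
# between "v1" and "v2".
plain = sorted(names)
print(plain)  # ['image1_v1.jpg', 'image1_v10.jpg', 'image1_v2.jpg']


def version_number(name):
    """Return the integer after '_v' (assumes the pattern is present)."""
    match = re.search(r"_v(\d+)", name)
    return int(match.group(1))


# Sorting on the extracted number gives the order a person expects.
by_version = sorted(names, key=version_number)
print(by_version)  # ['image1_v1.jpg', 'image1_v2.jpg', 'image1_v10.jpg']
```

Most file browsers do not apply this kind of numeric key, which is why the character-by-character order is the one you usually see.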
A good practice that helps you avoid these problems is to use dates to designate versions. If you choose this strategy, format dates as year-month-day (20210930 or 2021-09-30). Using this order will help avoid confusion when collaborating with researchers or systems that use a day-month-year or month-day-year order. It also lets you sort the versions in chronological order. For example:
  • image1_20151021
  • image1_20151214
  • image1_20160123
  • ...
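
As a quick check, the following Python sketch confirms that a plain alphabetical sort of date-stamped names matches chronological order. The helper `version_date` is a hypothetical name, and the sketch assumes the date is the last underscore-separated element of each name.

```python
from datetime import datetime

names = ["image1_20160123", "image1_20151021", "image1_20151214"]


def version_date(name):
    """Parse the trailing YYYYMMDD element into a date object."""
    stamp = name.rsplit("_", 1)[1]
    return datetime.strptime(stamp, "%Y%m%d").date()


alphabetical = sorted(names)
chronological = sorted(names, key=version_date)

# With year-month-day stamps the two orders are identical.
print(alphabetical == chronological)  # True
print(alphabetical)
```

Because `strptime` raises a `ValueError` on a malformed stamp, the same helper can also be used to spot names that do not follow the date format.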
If your files are created and/or edited collaboratively by a team, you can consider incorporating the team members' names or initials into the file naming conventions.  This way, you will have a better idea of who on your team has created which version of the file.  For example:
  • dataset1_20160402_KES
  • dataset1_20160501_WTC
  • dataset1_20160814_GSC
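
A naming pattern like this can also be read back by a script. The sketch below assumes names of the form name_YYYYMMDD_INITIALS with uppercase initials; the regular expression and the function `versions_by_member` are hypothetical and would need adjusting to your own convention.

```python
import re
from collections import defaultdict

# Assumed pattern: <name>_<YYYYMMDD>_<INITIALS>
PATTERN = re.compile(r"^(?P<stem>.+)_(?P<date>\d{8})_(?P<initials>[A-Z]+)$")


def versions_by_member(names):
    """Map each team member's initials to the dated versions they created."""
    result = defaultdict(list)
    for name in names:
        match = PATTERN.match(name)
        if match is None:
            continue  # skip names that do not follow the convention
        result[match["initials"]].append(match["date"])
    return dict(result)


files = ["dataset1_20160402_KES", "dataset1_20160501_WTC", "dataset1_20160814_GSC"]
print(versions_by_member(files))
```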