Research Data Management

This guide is designed to support research data management by providing guidance on good practices and pointing to additional resources.

Choosing a Data Format

The format of the digital data files you work with during your research may be determined by the research equipment and computer hardware and software that you use. However, depending on the format and the tools you are using, converting to a different file format may be best for preservation and access.

Recommended file formats that best support sharing, reuse, and preservation are formats that are open, software-neutral, unencrypted, uncompressed, and in use within disciplinary communities.

Stanford University Libraries Data Management Services has made a useful overview of recommended file formats available: 

  • Containers: TAR, GZIP, ZIP

  • Databases: XML, CSV

  • Geospatial: SHP, DBF, GeoTIFF, NetCDF

  • Moving images: MOV, MPEG, AVI, MXF

  • Sounds: WAVE, AIFF, MP3, MXF

  • Statistics: ASCII, DTA, POR, SAS, SAV

  • Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP

  • Tabular data: CSV

  • Text: XML, PDF/A, HTML, ASCII, UTF-8

  • Web archive: WARC

When making a plan of action for data, evaluating the following may be useful:

  • Is the data reliant on proprietary software to access it? If yes, preserving a copy in an open, sustainable file format will help to ensure that both you and others can access it in the future. Consider also keeping a copy of the data in its original format, together with a copy of the software if possible.
  • If you will be depositing your data in a repository at the end of your project, does the repository have specific guidelines or requirements with respect to file format? If yes, create a copy in the required format for deposit and document the conversion for users.
  • Will converting to another file format modify the data or cause a loss of features? If yes, consider creating a copy in an open format but preserving the original data format.
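
The first check above can be partly automated. The following is a minimal sketch, assuming a hand-picked subset of the formats listed earlier; the `RECOMMENDED_EXTENSIONS` mapping and the `review_format` function are hypothetical names introduced for illustration, not part of any official tool.

```python
from pathlib import Path

# Illustrative subset of the recommended formats listed above, keyed by
# lowercase file extension. This mapping is an assumption made for the
# sketch, not an official registry.
RECOMMENDED_EXTENSIONS = {
    ".csv": "Tabular data",
    ".xml": "Text",
    ".tif": "Still images",
    ".tiff": "Still images",
    ".png": "Still images",
    ".wav": "Sounds",
    ".warc": "Web archive",
}


def review_format(filename: str) -> str:
    """Return a short note on whether the file's extension is in the subset."""
    suffix = Path(filename).suffix.lower()
    category = RECOMMENDED_EXTENSIONS.get(suffix)
    if category is None:
        return (
            f"{filename}: not in the recommended subset; "
            "consider keeping the original and adding an open-format copy"
        )
    return f"{filename}: recommended format ({category})"


for name in ["survey_results.CSV", "survey_results.xlsx"]:
    print(review_format(name))
```

A check like this only looks at the extension. It cannot tell whether a conversion would lose features, so the questions above still need a human answer.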

File Naming

Planning how you are going to name your files will help you find files more easily, avoid duplication, and close out projects more quickly. When naming files, remember to:

  • Be consistent 
  • Ensure that your project team knows the naming conventions
  • Give files a meaningful, descriptive name. A file name might include a combination of elements, such as type of equipment used, date, and researcher's surname 
  • Decide on the best order for elements in a file name; it will affect how the files are sorted
  • Be brief 
  • Use underscores instead of spaces to separate words/dates
  • Stick to letters and numbers (rather than special characters like  ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ‘ “)
  • Account for versions. The US Geological Survey recommends including a number after the file name to indicate the version, e.g.:
    • original document: Bisondata_1.0
    • original document with minor revisions: Bisondata_1.1
    • document with substantial revisions: Bisondata_2.0
  • Use a date standard that allows for chronological sorting: YYYYMMDD

A comparison of data sorted by date, with one column using a standardized date format of YYYYMMDD
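
The naming tips above can be combined into a small helper. This is a minimal sketch under stated assumptions: the function name `build_file_name`, the element order (descriptive elements, then date, then version), and the example values are all hypothetical choices, and your project may settle on a different order.

```python
import re
from datetime import date


def build_file_name(elements, when, version=None, extension=""):
    """Join descriptive elements, a YYYYMMDD date, and an optional version."""
    cleaned = []
    for element in elements:
        # Replace runs of whitespace with a single underscore, then drop
        # anything that is not a letter, digit, or underscore.
        text = re.sub(r"\s+", "_", element.strip())
        text = re.sub(r"[^A-Za-z0-9_]", "", text)
        if text:
            cleaned.append(text)
    parts = cleaned + [when.strftime("%Y%m%d")]
    if version is not None:
        parts.append(version)
    return "_".join(parts) + extension


name = build_file_name(
    ["Bison data", "Smith"], date(2015, 9, 30), version="1.1", extension=".csv"
)
print(name)  # Bison_data_Smith_20150930_1.1.csv
```

Generating names from one function, rather than typing them by hand, is one way to keep a project team consistent.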

Versioning

Versioning should be taken into account when developing the folder and file naming structure. The ETDplus project, led by the Educopia Institute, offers guidance for version control; the following guidance is taken from the ETDplus brief on version control, available on the project site:

A simple method to designate a revision is to note it at the end of the file name. This way, files can be grouped by their name and sorted by version number. For example:

  • image1_v1.jpg
  • image1_v2.jpg
  • image2_v1.jpg
  • image2_v2.jpg
  • ...
If you use version numbers, one issue that can arise is that computers will sort files based on the position of the characters. This can lead to strange, unhelpful results. For example:
  • image1_v1.jpg
  • image1_v10.jpg
  • image1_v2.jpg
  • ...
A good practice that can help you to avoid these problems is to use dates to designate version numbers. If you choose this strategy, format dates as year-month-day (20150930). Using this order will help avoid confusion when collaborating with other researchers or systems that use a day-month-year or month-day-year, and it will help your computer sort versions in chronological order. For example:
  • image1_20151021
  • image1_20151214
  • image1_20160123
  • ...
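
The sorting behavior described above can be checked directly. The following minimal sketch uses the example file names from this section; the helper `date_from_name` is a hypothetical name introduced for illustration.

```python
from datetime import datetime

# Version numbers: plain string sorting compares character by character,
# so "v10" is placed before "v2".
numbered = ["image1_v1.jpg", "image1_v2.jpg", "image1_v10.jpg"]
print(sorted(numbered))
# ['image1_v1.jpg', 'image1_v10.jpg', 'image1_v2.jpg']

# YYYYMMDD dates: the same plain string sort is already chronological.
dated = ["image1_20160123", "image1_20151021", "image1_20151214"]
print(sorted(dated))
# ['image1_20151021', 'image1_20151214', 'image1_20160123']


def date_from_name(name):
    """Parse the trailing YYYYMMDD stamp of a file name into a date."""
    stamp = name.rsplit("_", 1)[-1]
    return datetime.strptime(stamp, "%Y%m%d").date()


# Sorting by the parsed date gives the same order as sorting by name.
print(sorted(dated) == sorted(dated, key=date_from_name))  # True
```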
If the files you are using are created or edited collaboratively, you may want to incorporate names or initials into your file naming conventions so that you know which versions contain updates by each individual on your team. For example:
  • dataset1_20160402_KES
  • dataset1_20160301_WTC
  • dataset1_20160814_GSC
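
File names that follow this pattern can also be read back by a script, for example to find the most recent version and who saved it. This is a minimal sketch, assuming names of the form base_YYYYMMDD_INITIALS with no file extension, as in the examples above; `VersionedName`, `PATTERN`, and `parse_name` are hypothetical names introduced for illustration.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class VersionedName(NamedTuple):
    base: str
    saved_on: date
    initials: str


# Assumed convention: <base>_<YYYYMMDD>_<initials>
PATTERN = re.compile(r"^(?P<base>.+)_(?P<stamp>\d{8})_(?P<initials>[A-Za-z]+)$")


def parse_name(name: str) -> VersionedName:
    """Split a file name into its base, date, and initials."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow base_YYYYMMDD_INITIALS")
    saved_on = datetime.strptime(match["stamp"], "%Y%m%d").date()
    return VersionedName(match["base"], saved_on, match["initials"])


names = ["dataset1_20160402_KES", "dataset1_20160301_WTC", "dataset1_20160814_GSC"]
latest = max((parse_name(n) for n in names), key=lambda v: v.saved_on)
print(latest.saved_on.isoformat(), latest.initials)  # 2016-08-14 GSC
```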