
Requirements for primary data for sharing with Database Manager team

  1. Do not share patient protected health information (PHI), e.g., MRN. Create a new unique identifier in place of the MRN for each record/observation. Keep a cross-reference between each MRN and its new unique identifier so that the shared data can be linked back to the source.
  2. No color coding (no formatting).
  3. No additional header rows, just variable names.
  4. Include variable/column name/labels for all columns/variables.
  5. Include a definition for each variable in a separate spreadsheet (a data dictionary). Do not use the Excel Comments functionality to provide data or the data dictionary.
  6. Do not use special characters to signify analytical data (e.g., using * in the study ID to distinguish cases from controls).
  7. Do not put comments, words or characters in date or numeric fields.
  8. Do not put lab result and date test conducted in same column.
  9. For date and number fields, enter only one value per column. For example, if a lab test was run multiple times and all values are analytically relevant, put each result in a separate column.
  10. Use consistent formats for all variables, e.g. one format for date values, currency values, etc.
  11. For ages given in months or days (especially for pediatrics), do not convert to years. If dates are available, provide just the dates, all in the same format.
  12. Do not use the same column header for multiple columns.
  13. Remove PHI (protected health information) before sharing.
  14. Audit the data for missingness and fill in missing values with available information before sharing. If a value is still missing, leave the cell blank.
  15. For efficiency, share only what is considered complete data for analysis.
  16. Share data in CSV, XLS, or XLSX format.
  17. Variable names should be shorter than 20 characters so that analytical software can work with them more efficiently. This also saves some storage space on the disks. Variable names in the data file must match the names in the data dictionary.
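
Requirement 1 can be illustrated with a minimal Python sketch. It is not a prescribed procedure: the function name `deidentify`, the column names `MRN` and `study_id`, and the sample values are all hypothetical, and the sketch assumes one study ID per distinct MRN (if each patient appears in only one row, that is also one ID per record).

```python
def deidentify(rows, phi_column="MRN", id_column="study_id"):
    """Replace the PHI column with a sequential study ID.

    Returns (shareable_rows, crosswalk). The crosswalk maps each MRN to
    its new study ID and is meant to stay with the data owner; only the
    shareable rows would be sent on.
    """
    crosswalk = {}
    shareable = []
    for row in rows:
        mrn = row[phi_column]
        # Assign the next sequential ID the first time an MRN is seen.
        if mrn not in crosswalk:
            crosswalk[mrn] = len(crosswalk) + 1
        # Build the shared record: new ID first, PHI column dropped.
        clean = {id_column: crosswalk[mrn]}
        clean.update({k: v for k, v in row.items() if k != phi_column})
        shareable.append(clean)
    return shareable, crosswalk


# Hypothetical source records (made-up MRNs, not real data).
rows = [
    {"MRN": "A1001", "age": "54"},
    {"MRN": "B2002", "age": "61"},
]
shareable, crosswalk = deidentify(rows)
```

Here `shareable` no longer contains the `MRN` column, while `crosswalk` is the cross-referencing table that allows the shared data to be linked back to the source. Other PHI columns would need the same treatment before sharing.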

 

Example of data table for sharing:

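| Variable 1 (unique record identifier-study id) | Variable 2 | Variable 3 | Variable 4 | Variable 5 | Variable 6 | Variable 7 |
| --- | --- | --- | --- | --- | --- | --- |
| Observation 1 | | | | | | |
| Observation 2 | | | | | | |
| Observation 3 | | | | | | |
| Observation 4 | | | | | | |
| Observation 5 | | | | | | |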

 

Example data dictionary

| Variable | Definition | Values |
| --- | --- | --- |
| Variable 1 (unique record identifier-study id) | Unique identifier for observations | 1-152 |
| Variable 2 | Age of patient | 18-104 years |
| Variable 3 | Treatment category received | 0-no treatment received; 1-chemotherapy only; 2-radiation therapy + chemotherapy; 3-radiation therapy only |
| Variable 4 | Etc | etc |
| Variable 5 | | |
| Variable 6 | | |
| Variable 7 | | |
| Variable 8 | | |
| Variable 9 | | |
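
Before sharing, the naming requirements (unique column headers, names shorter than 20 characters, and names matching the data dictionary) can be checked automatically. The following is a minimal sketch in Python; the function `check_names` and the example variable names are hypothetical and only illustrate the idea.

```python
def check_names(data_header, dictionary_names, max_len=20):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    seen = set()
    for name in data_header:
        if not name.strip():
            problems.append("blank column name")
            continue
        # Names should be shorter than max_len characters.
        if len(name) >= max_len:
            problems.append(f"{name!r}: {len(name)} characters, expected fewer than {max_len}")
        # The same header must not be used for more than one column.
        if name in seen:
            problems.append(f"{name!r}: used for more than one column")
        seen.add(name)
    # Every name in the data file must also appear in the data dictionary...
    dictionary_set = set(dictionary_names)
    for name in dict.fromkeys(data_header):
        if name.strip() and name not in dictionary_set:
            problems.append(f"{name!r}: missing from the data dictionary")
    # ...and every dictionary entry should correspond to a column.
    for name in dictionary_names:
        if name not in seen:
            problems.append(f"{name!r}: in the data dictionary but not in the data file")
    return problems


# Hypothetical header row and dictionary entries.
ok = check_names(["study_id", "age", "treatment"], ["study_id", "age", "treatment"])
bad = check_names(
    ["study_id", "age", "age", "date_of_first_chemotherapy_cycle"],
    ["study_id", "age"],
)
for problem in bad:
    print(problem)
```

In this example `ok` is empty, while `bad` reports the duplicated `age` header and the overlong name, which is also absent from the dictionary.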