Data Management

Responsible data management begins with planning for data collection and continues after the work is published. It involves keeping records in a way that ensures accuracy and avoids bias; it guides criteria for including and excluding data from statistical analyses; and it entails responsibility for the collection, use, and sharing of data.

Legally, data are usually the property of the institution, not the investigator. Research by investigators within an institution is supported either directly by the institution or indirectly by funding awarded to the institution. Typically, products of work by employees of an institution are the property of that institution.

Records of research are necessary, and can have legal standing for a variety of purposes such as demonstration of priority for claims of intellectual property, ownership, or patent rights. In addition, nearly all aspects of misconduct allegations hinge on the extent and quality of documentation of the research.

Principles of Data Management

  • Integrity of research depends on the integrity of the data. Because data provide a factual basis for scientific work, the integrity of research depends on all aspects of the collection, use, and sharing of data.
  • Integrity of the data is a shared responsibility. Although the ultimate responsibility falls to the principal investigator, all people involved in the research have an obligation to act in a manner that ensures the integrity of the data.
  • Research data should be shared with other scientists. Progress in science will be achieved most readily when information is freely exchanged. In most cases, an open data policy reflects positively on those who share and increases the likelihood of new insights, collaboration, and reciprocal sharing.


The responsible conduct of research includes considerations that arise even before data collection begins. Carefully designing the study to identify what data will be needed helps ensure that resources are not wasted and that significant results can be obtained.

Data Collection and Recordkeeping

Because data collection can be repetitious, time-consuming, and tedious, its importance can be underestimated. Care should be taken to ensure that those responsible for collecting data are adequately trained and motivated, that they employ methods that limit or eliminate the effect of bias, and that they keep records of what was done, by whom, and when.
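As one illustration of recording what was done, by whom, and when, the following is a minimal sketch of an append-only collection log. The names CollectionRecord, make_record, and append_record are hypothetical, and the CSV layout is an assumption chosen for simplicity; a laboratory notebook or an institutional system may impose its own format.

```python
import csv
import io
from dataclasses import dataclass, fields
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CollectionRecord:
    """One row of a hypothetical data-collection log."""
    collected_by: str  # who did the work
    action: str        # what was done
    recorded_at: str   # when, as an ISO 8601 timestamp in UTC


def make_record(collected_by: str, action: str,
                when: Optional[datetime] = None) -> CollectionRecord:
    """Build a record, stamping it with the current UTC time by default."""
    when = when or datetime.now(timezone.utc)
    return CollectionRecord(collected_by, action, when.isoformat())


def append_record(stream, record: CollectionRecord) -> None:
    """Append one record as a CSV row to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow([getattr(record, f.name) for f in fields(record)])


# Usage: an in-memory stream stands in for a log file opened in append mode.
log = io.StringIO()
append_record(log, make_record("A. Researcher", "Weighed sample 12"))
print(log.getvalue(), end="")
```

Entries are only ever appended, never edited, so the log preserves the order in which work was actually done.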

Analysis and Selection of Data

The use of statistical methods varies widely among research disciplines. However, the research processes used to quantify confidence in accepting or rejecting hypotheses depend largely on industry-accepted statistical and experimental assumptions. Violation of those assumptions, or a misunderstanding of the methods of analysis, can result in significant misrepresentation of the results of a study, regardless of intent.

Because it is often impossible to report everything that has been done, researchers must make decisions about which studies, data points, and methods of analysis to present. The selection of what to present should be based on objective criteria, preferably ones specified before data collection. Critically evaluate the reasons for inclusion or exclusion of data, and the measures to be taken to avoid bias. Clearly document how the data were obtained, selected, and analyzed.
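The idea of fixing exclusion criteria in advance and documenting every exclusion can be sketched as follows. This is an illustration only: ExclusionRule, apply_rules, the field names, and the example rule are all hypothetical, and real criteria would come from the study protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ExclusionRule:
    """A named criterion, written down before data collection (hypothetical)."""
    name: str
    applies: Callable[[Dict], bool]


def apply_rules(observations: List[Dict],
                rules: List[ExclusionRule]) -> Tuple[List[Dict], List[Tuple[Dict, str]]]:
    """Split observations into kept and excluded, recording why each was excluded."""
    kept = []
    excluded = []  # pairs of (observation, name of the first rule that matched)
    for obs in observations:
        reason = next((rule.name for rule in rules if rule.applies(obs)), None)
        if reason is None:
            kept.append(obs)
        else:
            excluded.append((obs, reason))
    return kept, excluded


# Usage: the rule depends on how the measurement was made, not on its value.
rules = [ExclusionRule("instrument_fault", lambda o: not o["instrument_ok"])]
observations = [
    {"id": 1, "value": 4.2, "instrument_ok": True},
    {"id": 2, "value": 97.0, "instrument_ok": False},
    {"id": 3, "value": 5.1, "instrument_ok": True},
]
kept, excluded = apply_rules(observations, rules)
for obs, reason in excluded:
    print(f"excluded id={obs['id']} reason={reason}")
```

Because each excluded observation is stored with the name of the rule that removed it, the record of what was excluded and why can be reported alongside the analysis.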

Responsibility for Data

Absent some explicit agreement, the principal investigator has primary responsibility for decisions about the collection, use, and sharing of data. Student or postdoctoral researchers working with a faculty or staff principal investigator should assume that their original data will stay with the principal investigator. However, most institutions expect that graduating students may take copies of their research records. If regulations preclude researchers from taking such copies, then the principal investigator is responsible for making this clear to members of the research group before work begins.

Retention of Data

The quality of data supporting published work becomes moot if the data are lost. At a minimum, enough data should be retained to reconstruct, at least in principle, what was done.

Any data stored will be rendered useless if there are insufficient records to locate and identify the material in question. Ease of access must also be balanced against security and confidentiality concerns. Although the institution is the legal owner of the data, it is usually the responsibility of the principal investigator to ensure that records are stored in a secure, accessible fashion.

Federal regulations and/or institutional guidelines determine how long data must be retained. For example, under current Health and Human Services requirements, research records must be maintained for at least three years after the final expenditure report. In general, formal requirements should be regarded as minimal constraints. Decisions about retention of records should take into account the extent to which a line of research is still being pursued, the likelihood of ongoing interest in the research, continued assurances of confidentiality for any human subjects, and the space and expense necessary for storage.
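As a small worked example, the earliest date on which records could be discarded under a fixed minimum retention period can be computed as sketched below. The three-year figure is taken from the example in the text and should be confirmed against current regulations and institutional policy; the function name earliest_disposal_date is hypothetical.

```python
from datetime import date

# Minimum retention period from the example in the text; treat it as an
# assumption to verify, since formal requirements are only a lower bound.
MIN_RETENTION_YEARS = 3


def earliest_disposal_date(final_report: date,
                           years: int = MIN_RETENTION_YEARS) -> date:
    """Return the first date on which the minimum retention period has elapsed."""
    try:
        return final_report.replace(year=final_report.year + years)
    except ValueError:
        # February 29 has no counterpart in a non-leap year; roll forward to
        # March 1 so the retention period is never shortened.
        return date(final_report.year + years, 3, 1)


# Usage
print(earliest_disposal_date(date(2021, 6, 30)))  # 2024-06-30
```

The result is a floor, not a target: ongoing research, continued interest, or confidentiality obligations may all justify keeping records longer.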

Sharing of Data

Although sharing of data is generally in the best interest of research, it is clear that there are risks involved for the individual researcher. There are several factors to consider when determining what to share, including possible loss of credit, compromise of confidentiality for human subjects, and the expense and time needed to meet sharing requests. However, reasonable strategies to minimize potential problems should make it possible to accommodate sharing requests.

Before publication, it is best to maintain an open data policy with appropriate caution. After publication, be prepared to grant reasonable access to the raw data; that is, honor requests that are in the interest of scientific inquiry and can be accomplished without inordinate expense or delay.