RefSeq Processing Pipelines.
Image source: RefSeq Handbook.

Quality Control: Data

Internet-based biological databases vary so widely that no single list of quality control methods applies to all of them. Some general factors to consider include:

  1. For a standalone database or repository:
    1. The type of database being accessed and whether it is curated. If so, what are the curation criteria?
    2. What are the quality assurance procedures implemented by database developers?
    3. If the data are subject to change, how often is the database updated?
    4. Are metadata available describing experimental details that are not captured in the database?
    5. Is it possible to evaluate the quality of the data within the database?
    6. If data are edited in the curation process, are raw data also accessible?
  2. For a data warehouse or data integration tool, all of the above apply to the collaborating databases. Additional considerations include:
    1. Frequency of data updates from collaborating databases.
    2. Quality assurance procedures to ensure that data from collaborating databases are accurately reported by the data warehouse/integration tool.
    3. If the database is new, it might be worth double-checking the results of simple queries by running the same query against the collaborating databases.
  3. Providing user feedback to database managers helps to maintain data quality.
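
The cross-check suggested in item 2.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `compare_query_results` and the record identifiers are hypothetical, and it assumes the identifiers returned by the same query have already been retrieved from the data warehouse and from the collaborating database by whatever means each resource provides.

```python
def compare_query_results(warehouse_ids, source_ids):
    """Compare identifiers returned by the same query from two resources.

    warehouse_ids: identifiers returned by the data warehouse/integration tool
    source_ids:    identifiers returned by the collaborating (source) database
    """
    warehouse = set(warehouse_ids)
    source = set(source_ids)
    return {
        # Records reported consistently by both resources
        "shared": sorted(warehouse & source),
        # Records the warehouse may not have picked up yet (e.g. update lag)
        "missing_from_warehouse": sorted(source - warehouse),
        # Records the source no longer returns (e.g. withdrawn or renamed)
        "only_in_warehouse": sorted(warehouse - source),
    }


# Hypothetical identifiers, for illustration only
report = compare_query_results(
    warehouse_ids=["REC_001", "REC_002", "REC_003"],
    source_ids=["REC_002", "REC_003", "REC_004"],
)
for category, ids in report.items():
    print(f"{category}: {ids}")
```

A non-empty `missing_from_warehouse` or `only_in_warehouse` list does not by itself indicate an error; it may simply reflect the update frequency noted in item 2.1, but it is a reasonable prompt to look more closely or to contact the database managers.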