The Seven C’s of Data Curation for the Two C's—Command and Control

February 2015
IDA document: D-5441
FFRDC: Systems and Analyses Center
Type: Documents
Division: Operational Evaluation Division
Agre, Jonathan R.; Vassiliou, Marius S.; Gordon, Karen D.; et al.
Many important and complex C2 activities require the use of disparate data sources (structured and unstructured) that are time-varying, of varying quality (completeness, accuracy, etc.), and of ambiguous origin. Currently, dealing with such disparate data is manually intensive and expensive, in large part because of problems with the quality of the data and the difficulty of processing it quickly. Data curation can enable automated data discovery, advanced search and retrieval, improvement in overall data quality, and increased data reuse. The process can be described using what we call the "Seven C's" of data curation: (1) Collect: interface to the data sources and accept the inputs; (2) Characterize: capture available metadata; (3) Clean: identify and correct data quality issues; (4) Contextualize: provide context and provenance; (5) Categorize: fit the data within a framework that defines the problem domain; (6) Correlate: find relationships among the various data; and (7) Catalog: store the data and metadata and make them accessible through application programming interfaces (APIs) for search and analysis. The benefits of the data curation process are a reduction in problem-solving time, improved data quality, increased confidence in solutions, reduced time and manual effort to perform the curation itself, and the ability to solve problems that were previously too complex or time-consuming to solve because of data problems.
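As an illustration only, the seven steps listed above can be sketched as a small Python pipeline. This is a minimal sketch, not the method described in the IDA document: the Record type, the function names, the "unit", "lat", and "lon" fields, and the "position"/"other" categories are all hypothetical, chosen just to show how each step could hand its result to the next.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Record:
    """Hypothetical container for one curated item (not from the source document)."""
    source: str
    payload: dict
    metadata: dict = field(default_factory=dict)
    links: list = field(default_factory=list)


def collect(sources):
    # (1) Collect: accept inputs from each named source.
    return [Record(name, dict(item)) for name, items in sources.items() for item in items]


def characterize(rec):
    # (2) Characterize: capture available metadata (here, just the field names as received).
    rec.metadata["fields"] = sorted(rec.payload)
    return rec


def clean(rec):
    # (3) Clean: trim string values and drop empty ones.
    cleaned = {}
    for key, value in rec.payload.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            cleaned[key] = value
    rec.payload = cleaned
    return rec


def contextualize(rec, now):
    # (4) Contextualize: record provenance (origin and ingest time).
    rec.metadata["provenance"] = {"source": rec.source, "ingested_at": now.isoformat()}
    return rec


def categorize(rec):
    # (5) Categorize: place the record in an assumed two-category framework.
    has_position = {"lat", "lon"}.issubset(rec.payload)
    rec.metadata["category"] = "position" if has_position else "other"
    return rec


def correlate(records, key="unit"):
    # (6) Correlate: link records that share the same value for `key`.
    groups = {}
    for i, rec in enumerate(records):
        if key in rec.payload:
            groups.setdefault(rec.payload[key], []).append(i)
    for group in groups.values():
        for i in group:
            records[i].links = [j for j in group if j != i]
    return records


def catalog(records):
    # (7) Catalog: store records under an integer id for later lookup.
    return {i: rec for i, rec in enumerate(records)}


def search(cat, **criteria):
    # Minimal search API: ids of records whose metadata matches every criterion.
    return [i for i, rec in cat.items()
            if all(rec.metadata.get(k) == v for k, v in criteria.items())]


def curate(sources):
    now = datetime.now(timezone.utc)
    records = [categorize(contextualize(clean(characterize(r)), now))
               for r in collect(sources)]
    return catalog(correlate(records))
```

Calling curate() with a dictionary that maps source names to lists of raw items returns the catalog, and search(cat, category="position") then returns the ids of records that carry both coordinates. A real curation system would replace each step with far richer logic; the sketch only shows how the steps compose.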