Data Quality

Description

This volume provides an exposé of research and practice in the data quality field for technically oriented readers. It is based on research conducted in the MIT Total Data Quality Management (TDQM) programme and on work from other research institutions. The book is intended primarily for researchers, practitioners, educators, and graduate students in computer science, information technology, and related interdisciplinary areas. It lays a theoretical foundation that is both rigorous and relevant for dealing with advanced issues in data quality. Written to give an overview of the cumulative research results of the MIT TDQM programme as they relate to database research, the book also serves as an introduction for Ph.D. students who wish to pursue research in the data quality area.

Information

Published 01 January 2000
ISBN-10 0306469871
License: All rights reserved
Language English

Table of Contents
Preface
Chapter 1
Introduction
Fundamental Concepts; Data vs. Information; Product vs. Information Manufacturing; The Information Manufacturing System; Information Quality is a Multi-Dimensional Concept; The TDQM Cycle; A Framework for TDQM; Define IP; Measure IP; Analyze IP; Improve IP; Summary; Book Organization
Chapter 2
Extending the Relational Model to Capture Data Quality Attributes
The Polygen Model; Architecture; Polygen Data Structure; Polygen Algebra; The Attribute-Based Model; Data Structure; Data Manipulation; QI-Compatibility and QIV-Equal; Quality Indicator Algebra; Data Integrity; Conclusion
Chapter 3
Extending the ER Model to Represent Data Quality Requirements
Motivating Example; Quality Requirements Identification; Requirements Modeling; Modeling Data Quality Requirements; Data Quality Dimension Entity; Data Quality Measure Entity; Attribute Gerund Representation; Conceptual Design Example; Concluding Remarks
Chapter 4
Automating Data Quality Judgment
Introduction; Quality Indicators and Quality Parameters; Overview; Related Work; Data Quality Reasoner; Representation of Local Dominance Relationships; Reasoning Component of DQR; First-Order Data Quality Reasoner; The Q-Reduction Algorithm; Q-Merge Algorithm; Conclusion
Chapter 5
Developing a Data Quality Algebra
Assumptions; Notation; Definitions; A Data Quality Algebra; Accuracy Estimation of Data Derived by Selection; Worst Case when Error Distribution is Non-Uniform; Best Case when Error Distribution is Non-Uniform; Accuracy Estimation of Data Derived by Projection; Worst Case when Error Distribution is Non-Uniform; Best Case when Error Distribution is Non-Uniform; Illustrative Example; Conclusion
Chapter 6
The MIT Context Interchange Project
Integrating Heterogeneous Sources and Uses; The Information Integration Challenge; Information Extraction and Dissemination Challenges; Information Interpretation Challenges; Overview of the Context Interchange Approach; Context Interchange Architecture; Wrapping; Mediation; Conclusion
Chapter 7
The European Union Data Warehouse Quality Project
Introduction; The DWQ Project; DWQ Project Structure; DWQ Objectives; Data Warehouse Components; The Linkage to Data Quality; DWQ Architectural Framework; The Conceptual Perspective; The Logical Perspective; The Physical Perspective; Relationships between the Perspectives; Research Issues; Rich Data Warehouse Architecture Modeling Languages; Data Extraction and Reconciliation; Data Aggregation and Customization; Query Optimization; Update Propagation; Schema and Instance Evolution; Quantitative Design Optimization; Overview of the DWQ Demonstration; Source Integration (Steps 1, 2); Aggregation and OLAP Query Generation (Steps 3, 4); Design Optimization and Data Reconciliation (Steps 5, 6); Summary and Conclusions
Chapter 8
The Purdue University Data Quality Project
Introduction; Background; Related Work; Searching Process; Matching Process; Methodology; Data Pre-processing; Sorting, Sampling & the Sorted Neighborhood Approach; Clustering and Classification; Decision Tree Transformation and Simplification; Experiments and Observations; Conclusions
Chapter 9
Conclusion
Follow-up Research; Other Research; Future Directions
Bibliography
Index