Data Quality

The productivity of PageSeeder users is directly affected by the quality of their data. This document explains where to look for problems and what can be done to address them.

Description of problems

Of all the factors that contribute to the effectiveness of a PageSeeder implementation, few deliver the returns of high-quality data. As a discipline, data quality covers processes for monitoring, measuring, reporting and remediating data issues. The processes that contribute to quality include the following:

  • Data profiling and data quality measurement – The analysis of data to capture statistics (metadata) that provide insight into the quality of data and help to identify data quality issues (see the profiling sketch after this list).

  • Parsing and standardization – The decomposition of text fields into component parts and the formatting of values into consistent layouts, based on industry standards, local standards (for example, postal authority standards for address data), user-defined business rules, and knowledge bases of values and patterns.

  • Generalized "cleansing" – The modification of data values to meet domain restrictions, integrity constraints or other business rules that define when the quality of data is sufficient for an organization.

  • Matching – The identifying, linking or merging of related entries within or across sets of data (see the matching sketch after this list).

  • Monitoring – The deployment of controls to ensure that data continues to conform to business rules that define data quality for an organization.

  • Issue resolution and workflow – The identification, quarantining, escalation and resolution of data quality issues through processes and interfaces that enable collaboration with key roles, such as data stewards.

  • Enrichment – The enhancement of the value of internally held data by appending related attributes from external sources (for example, consumer demographic attributes and geographic descriptors).
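
The profiling item above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration (the CSV input and the "contacts.csv" file name are not part of PageSeeder), but it shows the kind of column-level metadata (missing values, distinct counts, value patterns) that a profiling step typically captures.

    import csv
    import re
    from collections import Counter

    def profile_column(values):
        """Collect simple profiling statistics for one column."""
        non_empty = [v for v in values if v.strip()]
        # Reduce each value to a coarse pattern: letters become "A", digits become "9".
        patterns = Counter(
            re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) for v in non_empty
        )
        return {
            "count": len(values),
            "missing": len(values) - len(non_empty),
            "distinct": len(set(non_empty)),
            "top_patterns": patterns.most_common(3),
        }

    def profile_csv(path):
        """Profile every column of a CSV file and return the statistics per column."""
        with open(path, newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
        columns = rows[0].keys() if rows else []
        return {name: profile_column([row[name] or "" for row in rows]) for name in columns}

    if __name__ == "__main__":
        # "contacts.csv" is a placeholder file name used only for this example.
        for column, stats in profile_csv("contacts.csv").items():
            print(column, stats)

A report like this makes quality issues visible at a glance: a high "missing" count or an unexpected value pattern in a column is usually the first sign that cleansing or standardization is needed.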
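
The matching item can be sketched in the same spirit, using fuzzy string comparison from the Python standard library. The normalization rules and the 0.85 similarity threshold are illustrative assumptions rather than a prescribed algorithm; real matching engines use more sophisticated blocking and scoring.

    from difflib import SequenceMatcher

    def normalize(value):
        """Apply light standardization before comparison."""
        return " ".join(value.lower().split())

    def is_match(a, b, threshold=0.85):
        """Treat two entries as the same real-world record above a similarity threshold."""
        return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

    def find_duplicates(entries, threshold=0.85):
        """Return pairs of entries that are likely duplicates (naive pairwise comparison)."""
        pairs = []
        for i, a in enumerate(entries):
            for b in entries[i + 1:]:
                if is_match(a, b, threshold):
                    pairs.append((a, b))
        return pairs

    if __name__ == "__main__":
        names = ["Acme Pty Ltd", "ACME Pty. Ltd", "Globex Corporation"]
        # The first two names are reported as a likely duplicate pair.
        print(find_duplicates(names))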

In addition, data quality tools provide a range of related capabilities that are not unique to this market but are required to execute many of the core functions of data quality assurance, or to support specific data quality applications:

  • Connectivity/adapters confer the ability to interact with a range of different data structure types.

  • Subject-area-specific support provides standardization capabilities for specific data subject areas.

  • International support provides the ability to offer relevant data quality operations on a global basis (such as handling data in multiple languages and writing systems).

  • Metadata management provides the ability to capture, reconcile and exchange metadata relating to the data quality process.

  • A configuration environment enables the creation, management and deployment of data quality rules.

  • Operations and administration facilities support the monitoring, managing, auditing and control of data quality processes.

  • Service enablement provides service-oriented characteristics and support for service-oriented architecture (SOA) deployments.

  • Alternative deployment options provide the ability to implement some or all data quality functions and/or services beyond on-premises deployments (for example, via the cloud).

The tools provided by vendors in this market are generally deployed by organizations within their own IT infrastructure. Organizations use them to directly support scenarios that require better data quality for business operations (such as transactional processing, master data management [MDM], big data, business intelligence [BI] and analytics) and to enable staff in data-quality-oriented roles, such as data stewards, to carry out data quality improvement work. Off-premises solutions, in the form of hosted data quality offerings, SaaS delivery models and cloud services, continue to evolve and grow in popularity.
