IBM InfoSphere QualityStage

Data Cleansing: Standardize, De-Dupe, Enrich, Merge, Survivorship

Information that you can Trust!

Staying in touch with customers

Businesses often spend as much time and effort acquiring new customers as they do on anything else, and customer acquisition is one of the most costly parts of doing business. It is therefore important not to lose the customers you have worked so hard to acquire. To support this goal, businesses buy third-party applications and tools and build many home-grown applications, which spreads a large volume of customer data across many systems within the enterprise.

The quality of data within the enterprise is measured by its reliability and validity, that is, by the completeness and accuracy of a data set. It is usually assessed by comparing the data set to another data set identified as the "gold standard" and measuring the level of agreement between the two. The reconciled result forms the Master Record.

Understanding data domains improves data quality assessment programs

Greater precision and reliability of intended information usage

Empowerment of business users

The following are typical tasks of Data Quality:
Cleanse
Standardize
De-dupe
Validate (U.S. and Global Name and Address matching)
Enrich (with D&B, Experian, TransUnion, US Postal Service, government watch lists, Patriot Act, etc.)
Set up unmerged and suspect data for correction, survivorship, and reject processing

Specialized data quality functions seamlessly integrated with DataStage
Visual tools for defining complex matching and survivorship logic
Ensures clean, standardized, de-duplicated information
Enables a single version of the truth

AVS Systems Data Quality Assessment (DQA) gives customers the following processes for improving the data that drives an organization’s success.

Investigation of source data to understand the nature, scope, and detail of data quality challenges.
Standardization to ensure that data is formatted and conforms to organization-wide specifications, including name and firm standards as well as address cleansing and verification.
Matching of data to identify duplicate records within and across data sets.
Survivorship to eliminate duplicate records and create the “best record/golden copy”.
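
The standardization, matching, and survivorship processes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not QualityStage logic: the abbreviation table, the field names (name, address, phone, updated), the exact-key matching, and the "most recent non-empty value wins" survivorship rule are all hypothetical choices made for the example.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real project would use
# organization-wide standards instead.
ABBREVIATIONS = {"street": "ST", "st": "ST", "avenue": "AVE",
                 "ave": "AVE", "road": "RD", "rd": "RD"}


def standardize(value):
    """Strip punctuation, upper-case tokens, normalize street suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", value).split()
    return " ".join(ABBREVIATIONS.get(t.lower(), t.upper()) for t in tokens)


def match_key(record):
    """Assumed match rule: same standardized name and address."""
    return (standardize(record["name"]), standardize(record["address"]))


def group_duplicates(records):
    """Group records that share a match key (in first-seen order)."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return list(groups.values())


def survive(group):
    """Assumed survivorship rule: for each field, keep the non-empty
    value from the most recently updated record in the group."""
    ordered = sorted(group, key=lambda r: r["updated"], reverse=True)
    return {
        field: next((r[field] for r in ordered if r.get(field)), "")
        for field in ("name", "address", "phone")
    }
```

Calling group_duplicates on a list of customer records and then survive on each group yields one candidate golden record per group. Production matching is normally probabilistic rather than exact-key, so treat this as a sketch of the data flow only.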

A process for re-engineering data can help accomplish the following goals:

Resolve conflicting and ambiguous meanings for data values
Identify new or hidden attributes from free-form and loosely controlled source fields
Standardize data to make it easier to find
Identify duplication and relationships among such business entities as customers, prospects, vendors, suppliers, parts, locations, and events
Create one unique view of the business entity
Facilitate enrichment of re-engineered data, such as adding information from third-party vendor sources or applying standard postal certification routines (e.g., USPS CASS Certification, and WAVES for international validation)

 

Note: You can use a data re-engineering process in batch or real time for continuous data quality improvement.

The following methods can be used in the data re-engineering process to restructure the data. The methods are usually performed in the order listed:

  1. Source access or extraction*
  2. Conditioning
  3. Standardization
  4. Address verification*
  5. Matching
  6. Group association*
  7. Survivorship*
  8. Data enrichment*
  9. Output formatting
  10. Auditing the load process

An asterisk (*) indicates an optional method. Data enrichment can also take place before matching to add match fields (for example, a Dun and Bradstreet number or Experian data for organization matching).

Note: Details of these 10 steps are available on request; they are omitted here to keep this document short.

Here’s what we do

Profiling includes:

Collect an inventory of data sources (e.g., files, databases)
Import metadata from the identified data sources
Perform column analysis, primary-key analysis, domain analysis, and cross-domain analysis
Set up analysis results for the review and approval process
Set up a baseline for approved analysis results
Investigate data structure through word and character discrete investigation
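
As a rough illustration of column analysis and character-level investigation, the sketch below profiles a single column: it counts values, nulls, and distinct values, and builds a frequency distribution of character patterns. The pattern convention (letters become "a", digits become "n", other characters are kept) and the function names are assumptions made for this example, not the output format of any particular tool.

```python
from collections import Counter


def char_pattern(value):
    """Map each letter to 'a' and each digit to 'n'; keep other characters."""
    return "".join(
        "a" if c.isalpha() else "n" if c.isdigit() else c for c in value
    )


def profile_column(values):
    """Return basic column statistics plus a pattern frequency distribution."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "patterns": Counter(char_pattern(v) for v in non_null),
    }
```

Running profile_column on a ZIP code column, for instance, makes unexpected patterns such as "nnnna" stand out against the dominant "nnnnn", which is the kind of finding that feeds the review and baseline steps.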

Data Quality Rigor includes:

Develop frequency distribution and pattern reports
Uncover information buried in free-form fields and identify relationships between data values
Develop a matching and blocking strategy for the match design specification
Develop standardization, de-duplication, merge, and enrichment rules for name, address, city, state, zip, phone, fax, email, etc. in the profiled data
Develop a review process for master data, duplicates, unmatched, and unresolved data
Set up data for survivorship
Automate data quality measures and compare them with baseline results
Ensure that the data is of the highest quality
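
To make the idea of a matching and blocking strategy concrete, here is a minimal sketch. Blocking limits comparisons to records that share a cheap key, and only pairs inside the same block are scored. The blocking key (ZIP code plus surname initial), the field names, the use of Python's difflib.SequenceMatcher for scoring, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record):
    """Hypothetical blocking key: ZIP code plus first letter of the surname."""
    return (record["zip"], record["last_name"][:1].upper())


def candidate_pairs(records):
    """Yield only the record pairs that fall into the same block."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_matches(records, threshold=0.85):
    """Return id pairs whose full names score at or above the threshold."""
    matches = []
    for a, b in candidate_pairs(records):
        name_a = a["last_name"] + " " + a["first_name"]
        name_b = b["last_name"] + " " + b["first_name"]
        if similarity(name_a, name_b) >= threshold:
            matches.append((a["id"], b["id"]))
    return matches
```

The trade-off is visible in the design: a tight blocking key keeps the number of comparisons small, but records that differ in the key itself (for example, a mistyped ZIP code) are never compared. That is one reason a match specification often defines several passes with different blocking keys.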

Increase your return on investment

 


Copyright 1991-2011 AVS SYSTEMS, INC. All rights reserved