IBM InfoSphere QualityStage
Data Cleansing: Standardize, De-Dupe, Enrich, Merge, Survivorship
Information that you can trust!
Staying in touch with customers
Businesses often spend as much time and effort acquiring new customers as they do on anything else. It is also one of the most costly functions of doing business, so it is important not to lose the customers you have spent so much energy to acquire. To support this goal, businesses acquire various third-party applications and tools and also develop many home-grown applications, which leaves a large volume of data spread across various systems within the enterprise.
The quality of data within the enterprise is measured by its reliability and validity, that is, the completeness and accuracy of a data set. It is usually assessed by comparing the data set with another data set identified as the "gold standard" and measuring the level of agreement between them, which is the basis for creating a Master Record.
- Understanding data domains improves data quality assessment programs
- Greater precision and reliability of intended information usage
- Empowerment of business users
The following are typical tasks of Data Quality:
- Cleanse
- Standardize
- De-dupe
- Validate (U.S. and global name and address matching)
- Enrich (with D&B, Experian, TransUnion, US Postal Service, government watch lists, Patriot Act, etc.)
- Set up unmerged and suspect data for correction, survivorship, and the reject process

- Specialized data quality functions seamlessly integrated with DataStage
- Visual tools for defining complex matching and survivorship logic
- Ensures clean, standardized, de-duplicated information
- Enables a single version of the truth
AVS Systems Data Quality Assessment (DQA) enables customers to use the following processes to make a significant impact on the data that drives an organization’s success.
- Investigation of source data to understand the nature, scope, and detail of data quality challenges.
- Standardization to ensure that data is formatted and conforms to organization-wide specifications, including name and firm standards as well as address cleansing and verification.
- Matching of data to identify duplicate records within and across data sets.
- Survivorship to eliminate duplicate records and create the “best record/golden copy”.
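The standardization, matching, and survivorship processes above can be illustrated with a minimal Python sketch. This is not how QualityStage implements them; the field names (`name`, `zip`, `phone`), the exact-match key, and the "longest non-empty value wins" survivorship rule are all assumptions chosen to keep the example short.

```python
import re
from collections import defaultdict

def standardize(record):
    """Strip punctuation, collapse whitespace, and upper-case every string field."""
    clean = {}
    for field, value in record.items():
        if isinstance(value, str):
            value = re.sub(r"[^\w\s]", "", value)       # drop punctuation
            value = re.sub(r"\s+", " ", value).strip()  # collapse whitespace
            value = value.upper()
        clean[field] = value
    return clean

def match_key(record):
    # Assumed match rule: records with the same standardized name and ZIP
    # are treated as duplicates. Real match designs are usually probabilistic.
    return (record["name"], record["zip"])

def survive(group):
    """Assumed survivorship rule: per field, keep the longest non-empty value."""
    golden = {}
    for field in group[0]:
        candidates = [r[field] for r in group if r[field]]
        golden[field] = max(candidates, key=len) if candidates else ""
    return golden

def dedupe(records):
    """Standardize, group duplicates by match key, and emit one golden record per group."""
    groups = defaultdict(list)
    for rec in map(standardize, records):
        groups[match_key(rec)].append(rec)
    return [survive(group) for group in groups.values()]

# Illustrative records only.
records = [
    {"name": "John  Smith", "zip": "60601", "phone": ""},
    {"name": "john smith.", "zip": "60601", "phone": "312-555-0100"},
    {"name": "Mary Jones", "zip": "10001", "phone": "212-555-0199"},
]
golden = dedupe(records)
```

Here the two John Smith rows collapse into one golden record that keeps the phone number found on only one of them.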
A process for re-engineering data can help accomplish the following goals:
- Resolve conflicting and ambiguous meanings for data values
- Identify new or hidden attributes from free-form and loosely controlled source fields
- Standardize data to make it easier to find
- Identify duplication and relationships among such business entities as customers, prospects, vendors, suppliers, parts, locations, and events
- Create one unique view of the business entity
- Facilitate enrichment of re-engineered data, such as adding information from third-party vendor sources or applying standard postal certification routines (for example, USPS CASS Certification and WAVES for international validation)
Note: You can use a data re-engineering process in batch or real time for continuous data quality improvement.
The following methods can be used in the data re-engineering process to restructure the data. The methods are usually performed in the order listed:
- Source access or extraction*
- Conditioning
- Standardization
- Address verification*
- Matching
- Group association*
- Survivorship*
- Data enrichment*
- Output formatting
- Auditing the load process
An asterisk (*) indicates an optional method. Data enrichment can also take place before matching to add match fields (for example, a Dun and Bradstreet number or Experian data for organization matching).
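As a rough illustration of enriching before matching, the Python sketch below attaches an organization identifier from a lookup table and then groups records on it. The `REFERENCE_IDS` table, the `org_id` field, and the identifier values are hypothetical stand-ins for a third-party source; no real vendor API or data is shown.

```python
from collections import defaultdict

# Hypothetical reference data standing in for a third-party enrichment source.
REFERENCE_IDS = {
    "ACME CORP": "ID-0001",
    "ACME CORPORATION": "ID-0001",
    "GLOBEX INC": "ID-0002",
}

def enrich(record):
    """Return a copy of the record with an org_id added (None if not found)."""
    enriched = dict(record)
    enriched["org_id"] = REFERENCE_IDS.get(record["name"].strip().upper())
    return enriched

def match_on_org_id(records):
    """Group enriched records by org_id; records without one go to review."""
    groups = defaultdict(list)
    unmatched = []
    for rec in map(enrich, records):
        if rec["org_id"] is None:
            unmatched.append(rec)
        else:
            groups[rec["org_id"]].append(rec)
    return dict(groups), unmatched

# Illustrative records only.
records = [
    {"name": "Acme Corp"},
    {"name": "ACME Corporation"},
    {"name": "Initech"},
]
groups, unmatched = match_on_org_id(records)
```

The two Acme spellings would not match on name alone, but they share the added identifier and so land in the same group.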
Note: We can provide details of these 10 steps on request. To keep this document short, they are not explained in detail here.
Here’s what we do
Profiling includes:
- Collect an inventory of data sources (for example, files and databases)
- Import metadata from the identified data sources
- Perform column analysis, primary-key analysis, domain analysis, and cross-domain analysis
- Set up analysis results for the review and approval process
- Set up a baseline for approved analysis results
- Investigate data structure using word and character discrete investigation
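To give a feel for what column analysis and primary-key analysis report, here is a minimal Python sketch. The function name `profile_column`, the sample rows, and the rule "a column is a key candidate when every row has a distinct, non-empty value" are assumptions for illustration, not the behavior of any particular profiling tool.

```python
def profile_column(rows, column):
    """Summarize one column: row count, null count, distinct count, key candidacy."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v not in (None, "")]
    distinct = set(non_null)
    return {
        "column": column,
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(distinct),
        # Assumed rule: unique and fully populated means primary-key candidate.
        "is_key_candidate": len(values) > 0 and len(distinct) == len(values),
    }

# Illustrative rows only.
rows = [
    {"cust_id": "1001", "state": "IL"},
    {"cust_id": "1002", "state": "IL"},
    {"cust_id": "1003", "state": ""},
]
profiles = [profile_column(rows, col) for col in ("cust_id", "state")]
```

Results like these are what would be reviewed, approved, and stored as a baseline for later comparison.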
Data Quality Rigor includes:
- Develop frequency distribution and pattern reports
- Uncover information buried in free-form fields and identify relationships between data values
- Develop a matching and blocking strategy for the match design specification
- Develop standardization, de-dupe, merge, and enrichment processes for name, address, city, state, ZIP, phone, fax, email, etc. for profiled data
- Develop a review process for master data, duplicates, unmatched, and unresolved data
- Set up data for survivorship
- Automate data quality measures and compare them with baseline results
- Ensure that the data is of the highest quality
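A frequency distribution and pattern report can be sketched in a few lines of Python. The pattern alphabet used here (`a` for a letter, `n` for a digit, other characters kept as-is) is an assumption for this example and is not QualityStage's own notation; the phone values are illustrative.

```python
from collections import Counter

def pattern_of(value):
    """Map each character to a class symbol: letter -> 'a', digit -> 'n', else itself."""
    symbols = []
    for ch in value:
        if ch.isalpha():
            symbols.append("a")
        elif ch.isdigit():
            symbols.append("n")
        else:
            symbols.append(ch)
    return "".join(symbols)

def pattern_report(values):
    """Return (pattern, count) pairs, most frequent pattern first."""
    return Counter(pattern_of(v) for v in values).most_common()

# Illustrative values only.
phones = ["312-555-0100", "212-555-0199", "(312) 555-0100", "5550100"]
report = pattern_report(phones)
```

A report like this shows at a glance which formats dominate a field and which rare patterns need a standardization rule or manual review.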
Increase your return on investment


Copyright 1991-2011 AVS SYSTEMS, INC. All rights reserved