Archive for the ‘Data Quality’ Tag

npENGAGE – Data Hygiene

Good article… Thanks Mary Dempsey.

Go to this link and sign up for one of the many newsletters that focus on different aspects of your business. I believe the section where I found this article was Analytics – Data Hygiene: Is a bad address costing your organization?


Wiki – Data Cleansing

Data cleansing, data cleaning or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, irrelevant, etc. parts of the data and then replacing, modifying, or deleting this dirty data.

After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores.

Data cleansing differs from data validation in that validation almost invariably means data is rejected from the system at entry and is performed at entry time, rather than on batches of data.
The actual process of data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities.
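The list-based correction the excerpt mentions can be sketched in a few lines of Python. This is a minimal illustration, not a production cleanser; the state list and the similarity cutoff are assumptions made for the example:

```python
# Sketch of list-based cleansing: validate a value against a known list of
# entities, correct typos to the closest match, or flag the value as dirty.
from difflib import get_close_matches

# Illustrative reference list; a real system would use a complete dictionary.
KNOWN_STATES = ["Nebraska", "Nevada", "New Jersey", "New Mexico", "New York"]

def cleanse_value(value, known, cutoff=0.8):
    """Return the value unchanged if valid, a corrected value if a close
    match exists in the known list, or None to flag it as dirty."""
    if value in known:
        return value
    matches = get_close_matches(value, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(cleanse_value("Nebaska", KNOWN_STATES))  # typo corrected to "Nebraska"
print(cleanse_value("Xyz", KNOWN_STATES))      # no close match: flagged dirty
```

Note the two outcomes correspond to the article's distinction: correcting dirty data where possible, and otherwise isolating it for deletion or manual review.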

PC Magazine’s Definition

The condition of data in a database. Clean data are error free or have very few errors. Dirty data have errors, including incorrect spelling and punctuation of names and addresses, redundant data in several records or simply erroneous data (not the correct amounts, names, etc.).

Dots On A Map Provide Unique Insights Into Data Quality

This was a presentation I originally prepared back in 2005, but it is probably even more applicable in 2009, given the impact a GIS tool can have on visualizing data quality – customer addresses on a map! The next time you conduct a customer "data" assessment, try this! You can also see a high-level data profile I prepared for this trade area of specific customers.

What Different Routines Do You Consider Important When “Data Profiling” In Order To Reveal The Quality Of Information In A Data Source?

There are several different types of data quality tools in the marketplace today, and they essentially exist to do one important thing: cleanse, validate, correct, and enhance your data.

In order to better understand what the "quality expectation" is for YOUR CLIENT, a baseline (or scorecard) must be established for each source system. Data profiling is an ideal way to reveal the current state and share the results with others, so you can make an informed decision and rank your findings.
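As a rough sketch of what such a baseline scorecard might look like, the snippet below profiles each field of a record set and reports the percentage of records passing a simple validity rule. The field names, sample records, and rules are illustrative assumptions, not a prescribed methodology:

```python
# Sketch of a data-profiling scorecard: for each field, measure what share
# of records is populated and passes a basic validity check, so the results
# can be shared as a baseline for each source system.
import re

# Illustrative sample records; a real profile would run over a full source.
records = [
    {"name": "Mary Smith", "zip": "68102", "email": "mary@example.com"},
    {"name": "",           "zip": "6810",  "email": "bad-email"},
    {"name": "Ray Jones",  "zip": "08054", "email": None},
]

# Simple per-field validity rules (assumed for the example).
rules = {
    "name":  lambda v: bool(v and v.strip()),
    "zip":   lambda v: bool(v and re.fullmatch(r"\d{5}(-\d{4})?", v)),
    "email": lambda v: bool(v and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v)),
}

def profile(records, rules):
    """Return {field: percent of records passing that field's rule}."""
    total = len(records)
    return {field: round(100.0 * sum(rule(r.get(field)) for r in records) / total, 1)
            for field, rule in rules.items()}

scorecard = profile(records, rules)
print(scorecard)  # e.g. name/zip at 66.7%, email at 33.3%
```

Run against each source system, a scorecard like this gives you the ranked, shareable findings described above.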

Address Quality Extends Beyond CASS and NCOA

Several consultants have asked me over the last 6-8 months why they (their respective firms) don't just build their own address management solution (coded from scratch!) and purchase/license the CASS, NCOA, etc. content "only" directly from the USPS.

My answer, in short, is: "Let me tell you some reasons why not."

1.) First and most important: many of the well-known original suppliers of postal coding solutions in place today had to go through (and still go through) rigorous certification processes to ensure their software and subsequent updates (versions) continue to comply with USPS guidelines. (Note: Many of these well-known suppliers have been around since the early 1980s, when I began my career at Metromail. Now, that's old. LOL)

2.) These same vendors have had "bugs" reported to them over the years by their respective client bases (2,000+ clients in some cases), so they are best positioned to minimize risk for each "new license" of their products and services sold.

3.) The "people" behind the design, creation, and ongoing development of these (existing) postal products and services have 15+ years (minimum) of experience in the industry, rather than a newly formed team that may have little or no knowledge of this process.

4.) The barrier to entry is such that it is hard to envision a new "postal coding" engine succeeding in 2009 (with the exception of a new add-on service you may want to bolt on to an existing postal coding engine). But that's my opinion.

In summary, my advice is to stick to creating some kind of exception process, or a client-specific data governance process (or standard), using an existing vendor solution that already has an established relationship with the USPS.

Here is a good example of one software supplier who exemplifies several of my points above:

GrayHair Software, Inc. goes beyond CASS/NCOA as the major sources powering its address quality (best practice) offering, which includes other alliances like the UAA Clearinghouse.

Let me explain further:

Here is a brief excerpt from an article published last year by GrayHair Software, Inc. Note: One of the executives at GrayHair is a past work associate of mine – Raymond Chin, Vice President of Product Management & Development. (See point #3 above.)

Hold that thought, and read about how providers are enhancing their traditional "postal" offerings to expand beyond the traditional USPS CASS and NCOA content!

Publication: Business Wire
Date: Wednesday, April 9, 2008

GrayHair Software, Inc. and UAA Clearinghouse today introduced the most comprehensive set of offerings for managing Address Quality and reducing Undeliverable-As-Addressed (UAA) mail. By using source data from the USPS® next-generation Intelligent Mail® Barcode with change-of-address data from publishing and telecommunications organizations, best-of-breed solutions are now available for suppressing and/or redirecting addresses.

This will improve responses and reduce the cost of business mailings, thus enhancing the return-on-investment of direct mail programs, and making a significant contribution to the bottom line.

The article goes on… you can read the rest by going to:

(End of article excerpt.)

In summary, consultants: "do your research"… find existing companies like GrayHair Software to support your basic client needs (with confidence), plus any other unique requirements.

Additional note:

Address quality (best practices) today provides more "value" than just saving postage ($$$) and improving the deliverability of a piece of mail, as in days past.

The benefit of good address quality (best practices) is also a big plus for customer data integration (CDI) initiatives… resulting in improved customer match/merge/link/search scenarios, especially in customer (MDM) hubs, where clients today are centralizing disparate customer data sources across the enterprise into a single view of the customer.
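To illustrate why standardized addresses help match/merge in a CDI or MDM hub, consider two raw records for the same customer that differ only in abbreviation, punctuation, and case. This is a toy sketch with an assumed abbreviation map; it is not USPS-certified standardization, which the vendors above provide:

```python
# Sketch of address normalization feeding a match/merge step: once both raw
# addresses are normalized to the same form, the records link as one customer.
# The abbreviation map below is an illustrative assumption.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "north": "n", "apartment": "apt"}

def normalize(address):
    """Lowercase, strip punctuation, and apply standard abbreviations."""
    tokens = address.lower().replace(",", "").replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

a = normalize("123 North Main Street, Apt. 4")
b = normalize("123 N. Main St Apartment 4")
print(a)       # both normalize to "123 n main st apt 4"
print(a == b)  # the two records now match and can be merged/linked
```

Without the normalization step, a naive string comparison would treat these as two different customers – exactly the duplicate problem a single-view-of-the-customer hub is meant to solve.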

To my fellow consultants… and postal software vendors… you are welcome to add your own comments or share with us your unique “product” differentiators.

Enterprise Data Quality Blog

Here is the link to the most recent article on my Enterprise Data Quality blog:

From TDAN: 11 Predictions About Data Quality Space

Diby Malakar has written an interesting article on possible upcoming trends in Data Quality, given the current economic climate:

Read this article and more at TDAN – The Data Administration Newsletter.

Data Cleansing With Datactics

Datactics delivers rapidly deployed, user-friendly products that convert and cleanse data from disparate sources in multiple languages into reliable business information for Fortune 500 and other leading companies.

Be the first to comment with any success stories or background regarding this company.