Data Hygiene – A Simple Definition

Here is a simple definition I found about data hygiene: 

Data Hygiene are principles and practices that serve to maintain accuracy in computer data.

This term originated back around 1995 in the direct mail industry.

What data hygiene tool(s) does your company use today?

Red Gate – Data Cleanser for SSIS

1.) Red Gate Forum:  Click Here.

2.) Free “Beta Tool” For Data Cleansing:  Click Here.

3.) Products Page:  Click Here.

4.) Adding Data Cleanser To Your SQL Toolbox:  Click Here.

 

Data Warehouse Studio

Here is another request I received and would like to share with my readers:

Data Warehouse Studio complements your current ETL technology and helps to automate the management and code generation processes as you build your Warehouse and/or Mart. Data Warehouse Studio customers are accelerating their implementations by as much as 3 − 10 times vs. conventional methods.

The Software collects specifications from various contributors – architects, analysts and modelers taking in to consideration that a large number of specs can be reused across different feeds and projects. By providing each type of contributor a custom entry screen we are eliminating redundant manual activities that are present in conventional methodologies. Once the specs have been entered, with the push of a button DWS generates ETL & Sql code in the tool of your choice (Informatica, DataStage, Talend, etc.)

Here are some of the benefits:

For Architects:

> We ensure consistency of implementation across the entire project and adhere to architectural requirements

For Developers:

> The automation of mundane tasks
> The generation of staging jobs
> Automated surrogate key generation
> Automated Referential Integrity validation
> Automated History Management
> Automated Error Handling
> Impact Analysis

For Analysts:

> We ensure generated code fully corresponds to teh business logic
> Fast turnaround time
> Facilitation of data mapping activities

For Managers:

> Short project timelines
> Lower cost deployments
> Documentation generations
> Inventory of Feeds
> Hardware and Software Utilization Metrics
> Data Lineage Reporting

Please email Scott Davidson below if you would like to receive a 2 week trial license and/or a demo.

Regards,
Scott Davidson
Gamma Systems
scott.davidson@gamma-sys.com
www.gamma-sys.com

For a real life customer success story, please check out this link:

http://www.talend.com/open-source-provider/casestudy/CaseStudy_Allianz_EN.php
Reply

Is Your Data Getting Better or Worse?

Over the past 25 years for those of us that have been in the industry that long… We have seen a number of advancements in data hygiene applications and techniques? Regardless, if you are old or new to the industry – What data hygiene software best suits your current needs today?

(LOL: Even the word Data Hygiene is old school… today we use more robust terms like Data Management, Data Quality, Enterprise Data Quality, Master Data Management, Data Governance, etc…)

Dots On A Map Provide Unique Insights Into Data Quality

This was a presentation I originally prepared back in 2005, but is probably even more applicable in 2009 given the impact using a GIS tool can have on visualizing data quality – customer addresses on a map! The next time you conduct a customer “data” assessment – try this! You can also see a high level data profile I prepared for this trade area of specific customers.

What Different Routines Do You Consider Important When “Data Profiling” In Order To Reveal The Quality Of Information In A Data Source?

There are several different types of data quality tools in the marketplace today to essentially do one important thing – cleanse, validate, correct, and enhance your data.

In order to better understand what the “quality expectation” is for YOUR CLIENT a baseline (or scorecard) must be established for each source system. Data profiling is an ideal way to reveal and share the results with others in order to make an informed decision and rank your findings.

Address Quality Extends Beyond CASS and NCOA

Several consultants have asked me over the last 6-8 months why don’t they (their respective firms) just build there own address management solution (code from scratch!) and purchase/license the CASS, NCOA, etc… content “only” directly from the USPS.

My answer in short is “let me tell you some reasons why not”.

1.) First and most important is many of the well known original suppliers of postal coding solutions in place today had to (and still go through) rigorous certification processes to insure their software and subsequent updates (versions) continue to comply with the USPS guidelines. (Note: Many of these well known suppliers I am referring to have been around since the early 1980′s when I began my career at Metromail. Now, that’s old. LOL)

2.) The number of “bugs” that have been reported to these same vendors over the years by their respective client bases (2,000+ clients in some cases) are best positioned to minimize risk for each “new license” of their products and services sold.

3.) The “people” behind the design, creation, and ongoing development of these (existing) postal products and services have 15+ (minimum) years of experience in the industry rather than a new team that may be just formed with little or no knowledge about this process.

4.) The barrier to entry for a new “postal coding” engine in 2009 (with the exception of a new add-on service you may want to bolt-on) to an existing postal coding engine is hard to envision. But, that’s my opinion.

In summary, my advice is to stick to creating some kind of exception process or create a client-specific data governance process (or standard) using an existing vendor solution offering that already has an established relationship with the USPS.

Here is a good example of one software supplier who exemplifies several of my points above:

GreyHair Software, Inc. goes beyond CASS/NCOA as major sources to power their address quality (best practice) offering which includes other alliances like the UAA Clearinghouse.

Let me explain further:

Here is a brief excerpt from an article last years published by GreyHair Software, Inc. Note: One of the executives at GrayHair is a past work associate of mine – Raymond Chin, Vice President of Product Management & Development. (See point #3 above)

Hold that thought, and read about how providers today are enhancing the traditional “postal” offerings today to expand beyond traditional USPS – CASS and NCOA… content!

Publication: Business Wire
Date: Wednesday, April 9 2008

GrayHair Software, Inc. and UAA Clearinghouse today introduced the most comprehensive set of offerings for managing Address Quality and reducing Undeliverable-As-Addressed (UAA) mail. By using source data from the USPS([R]) next-generation Intelligent Mail([R]) Barcode with change-of-address data from publishing and telecommunications organizations, best-of-breed solutions are now available for suppressing and/or redirecting addresses.

This will improve responses and reduce the cost of business mailings, thus enhancing the return-on-investment of direct mail programs, and making a significant contribution to the bottom line.

The article goes on… you can read the rest by going to:

http://www.allbusiness.com/marketing-advertising/marketing-advertising/8945207-1.html

(End of article, excerpt.)

In summary, consultants “do your research”… find existing companies like GreyHair Software to support your basic client needs (with confidence) plus any other unique requirements.

Addtional note,

Address Quality (Best Practices) today are providing more “value” than just save postage ($$$) and improved deliverability of a piece of mail… like days past.

The benefit of good address quality (best practices) is also a big plus for customer data integration (CDI) initiatives… resulting in increased customer match/merge/link/search scenario’s, especially in customer (MDM) hubs where clients today are centralizing disparate customer data sources across the enterprise into a single view of a customer.

To my fellow consultants… and postal software vendors… you are welcome to add your own comments or share with us your unique “product” differentiators.

Enterprise Data Quality Blog

Here is the link to the most recent article on my Enterprise Data Quality blog: http://enterprisedataquality.com/2009/03/31/data-quality-and-master-data-initiatives/

Premier-International’s EPACTL Tool (Applaud)

Premier-International is based in Chicago and has software and consulting services:

What is Applaud?

Applaud is the only “EPACTL” tool – the only single software product with integrated tools to extract, profile, analyze, cleanse, transform and load data.

EPACTL is a new breed of software that provides integrated tools to accomplish all requirements of data quality and data migration/consolidation projects.

After reviewing the website, here are some of the key service offerings I would like to share which has been directly taken from their website to avoid mis-representation:

1.) Data Migration and Data Conversion – Migrating data from legacy systems to a new replacement system.

2.) Data Consolidation – Consolidating data from multiple instances of the same system or multiple disparate systems.

3.) Data Cleansing – Cleansing data and supporting data quality initiatives.

4.) Data Quality Audits – Performing data quality audits.

5.) Data Integration – Constructing interfaces between on-going systems.

6.) Data Management for IT – Building customized data management solutions.

7.) Data Management for Employee Benefits – Delivering customized data management solutions for employee benefit consultants and actuaries.

8.) Rapid Application Development – Using Applaud’s RAD tools to deliver dynamic system solutions fast.

If you want to learn more about Applaud and Premier International, visit…

http://www.premier-international.com/company_about_us.aspx?SubMenuID=11

If there are any readers out their who have knowledge about Premier-International or Applaud, please feel free to comment.

Free Dirty Data Challenge At DataMentors

If you want to take this challenge, go to:

http://www.datamentors.com/Products-and-Services/Dirty-Data-Challenge.html

I will give you a sneak preview of what you should be ready to do… and it’s all for FREE! See details for number of records that qualify for test at DataMentors website.

(from the website)
It’s easy and only five simple steps:

1. Select the Data Evaluation File Layout instructions that best match your business
2. Attached the correctly configured data files
3. Complete the form below
4. Choose your preferred method of transfer
5. Select Submit

DataMentors Summary About Dirty Data

Dirty data will cost Corporate America over $2 billion dollars this year. And you’re probably an unwilling contributor to this senseless waste. The success of any sales or marketing initiative is based on optimizing data and that means starting with the cleanest data possible.

Contact Information:

Email: info@datamentors.com
Phone: 877.888.3282

Data Quality and Identity Resolution (White Paper)

Here is a white paper available from Informatica that does a good job at discussing data quality and identity resolution.

Go to this page and register to download:

http://networking.ittoolbox.com/research/data-quality-and-identity-resolution-19104?sp=cm

Here is a brief excerpt about the white paper and Informatica.

Identity Resolution helps meet the challenges of dealing with identity data by emulating an intelligent user’s ability to recognize matches but also to discover connections – between people, accounts, products – that might be hidden in the data.

Data Quality processes have evolved beyond the traditional cleansing of names and addresses. In parallel, the market for Identity Resolution processes has evolved from simple search and match functionality for customer service, fraud detection, and security screening processes to today’s highly accurate, high-performance, real-time, cross-language search and match services across multiple applications. Download this white paper now to learn about how Identity Resolution complements and extends the application of data quality and data integration processes into business applications.
About this Sponsor

Informatica Corporation delivers data integration software and services to solve the problem of data fragmentation across disparate systems, helping organizations gain greater business value from all their information assets.

Follow

Get every new post delivered to your Inbox.