Archive for the ‘Data Cleansing’ Tag

Wiki – Data Cleansing

Data cleansing, data cleaning or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data.

After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores.

Data cleansing differs from data validation in that validation almost invariably means bad data is rejected from the system at entry time, whereas cleansing is performed on batches of data.
The actual process of data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities.
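
The "validating and correcting values against a known list of entities" step can be sketched in a few lines of Python. This is only an illustration, not a method described by any vendor on this page: the list of state names, the `cleanse_value` helper, and the 0.8 similarity cutoff are all assumptions chosen for the example.

```python
import difflib

# Hypothetical reference list of valid entities (here, a few US state names).
KNOWN_STATES = ["Illinois", "Indiana", "Iowa", "Ohio", "Wisconsin"]


def cleanse_value(raw, known=KNOWN_STATES, cutoff=0.8):
    """Return the matching known entity, or None if nothing is close enough."""
    # Collapse stray whitespace and normalize capitalization first.
    candidate = " ".join(raw.split()).title()
    if candidate in known:
        return candidate
    # Fall back to fuzzy matching to catch typographical errors.
    matches = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


records = ["  illinois ", "Ilinois", "Wisconsn", "Atlantis"]
cleaned = [cleanse_value(r) for r in records]
```

Values that come back as `None` are the ones a person would need to review or remove; a lower cutoff corrects more typos but raises the risk of a wrong match.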


Red Gate – Data Cleanser for SSIS

1.) Red Gate Forum:  Click Here.

2.) Free “Beta Tool” For Data Cleansing:  Click Here.

3.) Products Page:  Click Here.

4.) Adding Data Cleanser To Your SQL Toolbox:  Click Here.


Premier-International’s EPACTL Tool (Applaud)

Premier-International is based in Chicago and has software and consulting services:

What is Applaud?

Applaud is the only “EPACTL” tool – the only single software product with integrated tools to extract, profile, analyze, cleanse, transform and load data.

EPACTL is a new breed of software that provides integrated tools to accomplish all requirements of data quality and data migration/consolidation projects.

After reviewing the website, here are some of the key service offerings I would like to share, which have been taken directly from their website to avoid misrepresentation:

1.) Data Migration and Data Conversion – Migrating data from legacy systems to a new replacement system.

2.) Data Consolidation – Consolidating data from multiple instances of the same system or multiple disparate systems.

3.) Data Cleansing – Cleansing data and supporting data quality initiatives.

4.) Data Quality Audits – Performing data quality audits.

5.) Data Integration – Constructing interfaces between on-going systems.

6.) Data Management for IT – Building customized data management solutions.

7.) Data Management for Employee Benefits – Delivering customized data management solutions for employee benefit consultants and actuaries.

8.) Rapid Application Development – Using Applaud’s RAD tools to deliver dynamic system solutions fast.

If you want to learn more about Applaud and Premier International, visit…

If there are any readers out there who have knowledge about Premier-International or Applaud, please feel free to comment.

From TDAN: 11 Predictions About Data Quality Space

Diby Malakar has written an interesting article on possible upcoming trends regarding Data Quality given the current economic climate:

Read this article and more at TDAN – The Data Administration Newsletter.

Data Cleansing With Datactics

Datactics delivers rapidly deployed and user-friendly products, converting and cleansing data from disparate sources in multiple languages into reliable business information for Fortune 500 and other leading companies.

Be the first to comment about any success stories or background regarding this company.

Gartner Says… Companies Want to Get The Data Right

Here is a good video (under 10 minutes) on MDM from Gartner, dated November 2008, by Ted Friedman, Vice President, covering Data Integration, Data Quality and Data Warehousing.

My high-level notes:

1.) Ties to critical business initiatives are a must.

2.) Gartner is seeing “pre-packaged” product offerings on the rise.

3.) Appliance offerings also… “Datawarehouse in a Box”.

4.) Challenges of Data Integration and Data Quality – automating data transformation and data cleansing routines.

5.) Companies getting even more serious about Data Quality given the regulatory issues.

6.) Data Quality and its impact on lost productivity, inaccurate data, etc.

7.) Companies want to get the data right.

8.) Business issues > IT issues according to Gartner

9.) A key question to ask your client: what does Data Quality mean to you?

10.) Dimensions are several – identify key metrics and they must be fact-based.

11.) Data Quality tools continue to emerge in the industry.

12.) Information Management issues are top of mind.

If your company has a Master Data Management (MDM) offering you would like to share – click here – and it will take you to another blog where you can request to have your company name added to the “links” section of the blog. Include a brief description, as well.

Here is Ted’s video:

Offers FREE Data Quality Analysis!

For a limited time, a FREE Data Quality analysis is on offer; just tell them where you read about it. The provider specializes in cleaning and enhancing your customer files, databases and mailing lists. They support many of the world’s largest companies, as well as hundreds of smaller organizations. Their services are provided in a controlled and secure environment where protection of your data is their number one priority.

The main data center is in Canada but they operate throughout North America.

For a limited time, or until someone notifies us to discontinue this offer, the FREE Data Quality analysis remains available; just tell them where you read about it.

Furthermore, if anyone has used this secure service and would like to comment about their past experience with Interact Direct Marketing, please let our readers know more.

Thanks, Dave Anderson, for making this offer possible.

Enclarity – ProviderPoint

I continue to get requests from companies who would like to have some information posted about one or more of their offerings or to be added to the blogroll. Here is a company that specializes in the healthcare payer market:

Company Name: Enclarity

Service Offering: ProviderPoint

When Enclarity receives your provider file, it uses AcuSync to cleanse the file – identifying and replacing inaccurate, duplicate and incomplete records. It also augments the file with additional data attributes.  Enclarity’s data scientists work with you to establish business rules that govern what updates, corrections and augmentations happen automatically, and where your staff’s expertise will come into play. After those rules are applied, ProviderPoint delivers a clean provider file ready for easy integration into your systems.

ProviderPoint cleans and enhances your provider files with information from Enclarity’s Master Provider Referential Database, which uses Enclarity’s innovative AcuSync™ process to leverage thousands of referential and transactional data sources.

AcuSync uses advanced analytical and database methods to efficiently and reliably standardize, match and join data from different sources, and then produce a provider profile that contains the best available information.
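
Enclarity does not publish how AcuSync works, so the following is not its implementation. It is only a toy Python sketch of the general "standardize, match and join" idea, under assumptions made up for the example: records are plain dictionaries, two records describe the same provider when name and ZIP code agree, and earlier sources are trusted over later ones.

```python
def standardize(record):
    """Trim whitespace and normalize case so equivalent values compare equal."""
    return {k: " ".join(str(v).split()).upper() for k, v in record.items()}


def match_key(record):
    # Hypothetical match rule: same name and same ZIP code means same provider.
    return (record["name"], record["zip"])


def merge_sources(*sources):
    """Join records across sources, keeping the first non-empty value per field."""
    profiles = {}
    for source in sources:  # earlier sources take priority
        for raw in source:
            rec = standardize(raw)
            profile = profiles.setdefault(match_key(rec), {})
            for field, value in rec.items():
                if value and not profile.get(field):
                    profile[field] = value
    return list(profiles.values())


source_a = [{"name": "Jane  Doe", "zip": "60601", "phone": ""}]
source_b = [{"name": "jane doe", "zip": "60601", "phone": "312-555-0100"}]
profiles = merge_sources(source_a, source_b)
```

Here the two inputs collapse into a single profile, with the phone number missing from the first source filled in from the second.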

If anyone would like to comment about Enclarity or any one of their service offerings, please do so.

For more information regarding Enclarity:

Data Hygiene T-Shirts With A Smile


For This T-Shirt And More:  Click Here

If you would like to purchase a DataHygiene T-Shirt with your company’s (or vendor’s) name on it, please email or leave a comment.

Companies with T-shirts already customized are:  Group 1 Software, Informatica, and Trillium Software.

There are also T-shirts with just a smile or question mark on them – if you don’t want one that is personalized with a vendor’s name on it.

Data Hygiene 101

It is a never-ending job to ensure a customer’s full name and mailing address are accurate, complete, and up-to-date using postal services, such as NCOA. One of the data elements I used to continually monitor (and update) was multi-family dwelling unit (MFDU) styled addresses where the apartment number was blank. Obtaining the apartment number was essential to having a piece of mail delivered without having it returned to sender.

What specific data elements do you spend a lot of time enhancing in your customer master database?
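
A check like the MFDU one is easy to automate. Below is a minimal Python sketch, assuming each address record already carries a dwelling-type flag; the `Address` fields and the "MFDU"/"SFDU" codes are hypothetical names picked for the example, not taken from any particular postal product.

```python
from dataclasses import dataclass


@dataclass
class Address:
    customer_id: int
    street: str
    unit: str           # apartment or suite number; may be blank
    dwelling_type: str  # hypothetical flag: "MFDU" or "SFDU"


def missing_unit(addresses):
    """Return multi-family addresses whose unit number is blank."""
    return [
        a for a in addresses
        if a.dwelling_type == "MFDU" and not a.unit.strip()
    ]


addresses = [
    Address(1, "100 Main St", "4B", "MFDU"),
    Address(2, "200 Oak Ave", "  ", "MFDU"),
    Address(3, "300 Elm Rd", "", "SFDU"),
]
needs_followup = [a.customer_id for a in missing_unit(addresses)]
```

Only customer 2 is flagged: the single-family address is allowed to have no unit, and whitespace-only units are treated as blank.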