When is Data Cleansing required ?
Generally Data Cleansing is required in a systems integration flavor of project. For example when different systems which might be logically related like back-office and front-office systems, or non-related systems like different business application that might be independent in its own respect but functioning under a common umbrella. In both these examples one thing is common, these systems operate differently and are subject to integration.
What is Data Cleansing ?
Data is the above mentioned systems are not always as expected, or the data is not of the quality that is required to be fed into new system. For example, there can be OLTP applications or front-office systems where data is not fed in the precise manner due to operator error or limited system design. So these data issues needs to be fixed, mostly in the host or intermediate system which is mostly effective to the requirements of the new system where data of a particular quality level is required.
How to do Data Cleansing, using SQL Server in specific ?
There are different ways of doing Data Cleansing, and there is not particular theory or formula to it, in the best of my knowledge. But one thing that applies universally to any data cleansing exercise is, the data should be profiled thoroughly. Thorough data profiling is directly proportional to level of data cleansing. Mostly the hard part is data profiling, than data cleansing. It takes more business knowledge combined with technical analysis of the data for profiling. Once the details about what to fix in the data is catalogued, it's generally not that hard to fix those issues using ETL tools like SSIS. SSIS comes with a dedicated task for Data Profiling and a dedicated tool for viewing the profiled data.