Simple Data Cleaning Trick In SQL

Hi Everyone,

It’s been a month since I’ve started my new job (Yay!!) and I’ve been doing a lot of Data Conversion!!! I wrote a dynamic script yesterday that removes whitespaces and any form of garbage data like #REF! or #N/A and so on from all character-based columns.

I’m using cursors to go through all the character-based columns after querying the view INFORMATION_SCHEMA.COLUMNS.

DECLARE @COL_NAME NVARCHAR(50);
DECLARE @TABLE_NAME NVARCHAR(200);
DECLARE @unwantedCharStr NVARCHAR(MAX);
DECLARE @sqlString NVARCHAR(MAX);

SET @unwantedCharStr = '''#REF!'', ''#N/A'', ''.'', ''0''';

BEGIN TRANSACTION

DECLARE tbl_for_cleaning CURSOR FAST_FORWARD FOR
SELECT TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME IN ('table1', 'table2') AND DATA_TYPE IN ('nvarchar', 'char', 'varchar');

OPEN tbl_for_cleaning;

FETCH NEXT FROM tbl_for_cleaning INTO @COL_NAME, @TABLE_NAME;

WHILE @@FETCH_STATUS = 0 BEGIN
	-- 1. Remove white spaces
	SET @sqlString = 'UPDATE ' + @TABLE_NAME + ' SET ' + @COL_NAME + ' = LTRIM(RTRIM(' + @COL_NAME + '));';
	--PRINT(@sqlString);
	EXEC(@sqlString);

	-- 2. Remove unwanted characters
	SET @sqlString = 'UPDATE ' + @TABLE_NAME + ' SET ' + @COL_NAME + ' = NULL WHERE ' + @COL_NAME + ' IN (' + @unwantedCharStr + ');';
	--PRINT(@sqlString);
	EXEC(@sqlString);

	FETCH NEXT FROM tbl_for_cleaning INTO @COL_NAME, @TABLE_NAME;
END;

CLOSE tbl_for_cleaning;
DEALLOCATE tbl_for_cleaning;

-- 3. Checks!!
SELECT *
FROM table1;

SELECT *
FROM table2;

-- ROLLBACK
-- COMMIT

I usually export the tables out to Excel and have a quick look and try to spot any other unusual things.

Thanks for reading!!

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.