Oftentimes, data is not cleansed properly when it is pulled from files or other database tables. This can cause problems when ETL processes or services like Amazon Athena try to ingest data from its source. To remove leading and trailing blanks, spaces, or whitespace in SQL, we can use the SQL TRIM function.
The TRIM function eliminates all leading and trailing spaces from a column value. It will NOT eliminate any whitespace or blanks that fall between characters in a field. If those need to go as well, REPLACE can be used instead. Because TRIM is part of the ANSI SQL standard, most databases support it.
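To make the difference between TRIM and REPLACE concrete, here is a minimal sketch that runs both functions against the same string. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in database, and the sample value is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for "most databases"; for ordinary spaces,
# TRIM and REPLACE behave the same way here as in other SQL dialects.
conn = sqlite3.connect(":memory:")

value = "  Jane  Doe  "  # hypothetical value: outer padding plus an internal gap

trimmed, replaced = conn.execute(
    "SELECT TRIM(?), REPLACE(?, ' ', '')", (value, value)
).fetchone()

print(repr(trimmed))   # 'Jane  Doe'  (internal spaces are kept)
print(repr(replaced))  # 'JaneDoe'    (every space is removed)
conn.close()
```

TRIM only touches the ends of the value, while REPLACE removes every space, including the one separating the two words.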
These are a few functions that we can use to remove blanks in SQL:
ANSI-Standard SQL Syntax – TRIM()
SELECT TRIM(<column_name>) FROM <table_name>; -- trim syntax example
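Similarly, LTRIM (left trim) eliminates any leading spaces in a column, while RTRIM has the opposite effect and removes all trailing spaces. These two functions are not part of the ANSI standard, but SQL Server supports them, as do many other databases, including MySQL, PostgreSQL, Oracle, and SQLite. On SQL Server versions earlier than 2017, which lack TRIM, nesting them as LTRIM(RTRIM(<column_name>)) removes spaces from both ends.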
SQL Server Syntax – LTRIM() & RTRIM()
SELECT LTRIM(<column_name>) AS C1, RTRIM(<column_name>) AS C2 FROM <table_name>;
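The sketch below shows what C1 and C2 hold for a padded value, plus the nested LTRIM(RTRIM(...)) form. It is only an illustration: SQLite (via Python's sqlite3 module) stands in for SQL Server, since both provide LTRIM and RTRIM, and the sample value is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; both support LTRIM and RTRIM.
conn = sqlite3.connect(":memory:")

value = "   padded   "  # hypothetical value: 3 leading and 3 trailing spaces

c1, c2, both = conn.execute(
    "SELECT LTRIM(?) AS C1, RTRIM(?) AS C2, LTRIM(RTRIM(?)) AS C3",
    (value, value, value),
).fetchone()

print(repr(c1))    # 'padded   '  (leading spaces removed)
print(repr(c2))    # '   padded'  (trailing spaces removed)
print(repr(both))  # 'padded'     (nested calls remove both sides)
conn.close()
```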
Practical Examples of Using Trim
As an example, let's look at a record in the Customer table that has padded spaces.
We can see that our field has both leading and trailing padding, which makes this a great use case for TRIM.
The resulting output returns the same value without the leading and trailing spaces. We could also use REPLACE to accomplish the same task, although the syntax is slightly longer. Keep in mind that REPLACE removes every space, so it only matches TRIM when the value has no spaces between characters.
In this case, our output ends up the same as with TRIM.
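The original screenshots of this example are not reproduced here, so the following is a hypothetical reconstruction. It assumes a Customer table with a single CustomerName column holding an invented padded value, and uses SQLite through Python's sqlite3 module as the database.

```python
import sqlite3

# Hypothetical Customer table: the column name and sample value are
# illustrative assumptions, not the article's original data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerName TEXT)")
conn.execute("INSERT INTO Customer (CustomerName) VALUES ('   Contoso   ')")

# Compare the raw value with its TRIM and REPLACE versions.
raw, via_trim, via_replace = conn.execute(
    """
    SELECT CustomerName,
           TRIM(CustomerName),
           REPLACE(CustomerName, ' ', '')
    FROM Customer
    """
).fetchone()

print(repr(raw))          # '   Contoso   '
print(repr(via_trim))     # 'Contoso'
print(repr(via_replace))  # 'Contoso' (same only because there are no internal spaces)
conn.close()
```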
Troubleshooting & Final Thoughts
Sometimes TRIM appears not to remove blanks or whitespace. This is usually because the value contains other spacing characters, such as tabs, line breaks, or non-breaking spaces, that the function does not treat as ordinary spaces. In this case, REPLACE can often be used to eliminate these special characters.
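Here is a minimal sketch of that situation. It assumes SQLite (via Python's sqlite3 module), where TRIM(X) strips only the ordinary space character, so a tab (char 9) and a non-breaking space (char 160) survive it. The sample value is hypothetical.

```python
import sqlite3

# Assumption: SQLite's one-argument TRIM removes only ordinary spaces,
# so tabs and non-breaking spaces are left in place.
conn = sqlite3.connect(":memory:")

value = "\u00a0\tAcme\u00a0"  # hypothetical value: NBSP + tab, text, NBSP

plain_trim, replaced = conn.execute(
    """
    SELECT TRIM(?),
           -- Swap each special character for an ordinary space, then TRIM.
           TRIM(REPLACE(REPLACE(?, char(160), ' '), char(9), ' '))
    """,
    (value, value),
).fetchone()

print(repr(plain_trim))  # '\xa0\tAcme\xa0'  (unchanged by TRIM alone)
print(repr(replaced))    # 'Acme'
conn.close()
```

Replacing the special characters with an ordinary space before calling TRIM keeps any internal word spacing intact. Replacing them with an empty string instead would remove them everywhere in the value, including between words.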
Scrubbing blank spaces and whitespace is a common data cleansing practice. We often expect data to be pre-cleansed when it is moved from legacy systems, but this is almost never the case. Data entry practices are often sloppy, and many tools do little to prevent bad entries.
It's up to back-end database developers and data scientists to make sure that reporting and analytics appear clean and seamless on the front end. Trimming and replacing bad characters is just one of the many tasks involved.