SQL Data Quality Services: How to Find Special Characters and Extra Spaces between Words in your Data
You can use Data Quality Services (DQS) in SQL Server to find special characters and extra spaces between words in a string value in your data. You can do this easily by using regular expressions (regex) in a domain rule in DQS. Since you can use regular expressions in a domain rule for a String domain only, you can check for special characters and extra spaces for only the character/string type of data.
In this example, we will create a domain rule using regular expressions that only treats the following as the valid values in the domain: English lowercase letter (a, b, c,… z), English uppercase letter (A, B, C,… Z), Numeral (0, 1, 2,… 9), and a single space between the words if there is more than one word in a field.
NOTE: You cannot find leading and trailing spaces in your string values using DQS because DQS automatically trims leading/trailing spaces in string values while checking for domain rules and values. You will have to manually remove the leading and trailing spaces in your string values.
To check for special characters and extra spaces between words:
- In DQS, create or select a select a domain of String data type.
- Click the Domain Rules tab.
- Click the Add a new domain rule icon () to create a domain rule.
- Specify a name for the domain rule, and then select the Value matches regular expression option from the rules list. In the value box, type the following: ^[a-zA-Z0-9]+([\s][a-zA-Z0-9]+)*$.
- Click the Run the selected domain rule on test data icon () to test the rule on dummy data. Here are the results of running the rule:
- Close the Test Domain Rule dialog box, and then click Finish to save the domain rule and publish the knowledge base.
You can modify the regular expression in the domain rule to include or exclude characters as per your data validation requirements.
See Also
- [[articles:Using DQS: Cleansing complex data using composite domains]]
- [[articles:Using DQS: How to Find Blank/Empty Values in Your Data]]
- DQS Resources on TechNet Wiki