

Data Quality Services (DQS) FAQ

DQS is a new feature in SQL Server 2012 that provides you with a knowledge-driven data cleansing solution. For more information about DQS, see Introducing Data Quality Services. The FAQs were originally created by the following people in the DQS product team:

  • Elad Ziklik (Principal Group Program Manager)
  • Gadi Peleg (Senior Program Manager)

Note: This article is closely monitored. Any changes that you make will be evaluated and then quickly accepted, refined, or reverted. Because this is a wiki, additions or refinements to these FAQs might have been made by community members. To read the original FAQ document, click here.


Q.1 What is data quality?

Data quality represents the degree to which data is suitable for use in the required business processes. The quality of data can be defined, measured, and managed through various data quality metrics such as completeness, conformity, consistency, accuracy, and duplication. Data quality is achieved through people, technology, and processes.



Q.2 What is DQS?

DQS is a knowledge-driven solution, focusing on creation and maintenance of a Data Quality Knowledge Base (DQKB) that is reused for performing various data quality operations, such as data cleansing and matching.

The main concept behind DQS is a rapid, easy-to-deploy, and easy-to-use data quality system that can be set up and used practically in minutes.



Q.3 Who is the target audience?

DQS targets organizations of all sizes that seek to improve the quality of their business data. The product's functionality enables business users, information workers, and IT professionals to improve the quality of their data and to manage their data quality processes and tasks.



Q.4 What is data stewardship?

The main objective of data stewardship is to manage the corporation's data assets in order to improve their reusability, accessibility, and quality. Data stewards are responsible for approving business naming standards, developing consistent data definitions, determining data aliases and derivations, documenting the business rules of the corporation, monitoring the quality of the data, and so forth.



Q.5 What are the core problems customers should expect to solve with DQS?

DQS provides customers with capabilities that help improve the quality of their data. Data is usually generated by multiple systems and parties across organizational and geographic boundaries, and often contains inaccurate, incomplete, or stale elements. DQS in SQL Server 2012 addresses the following data quality problems.

Data quality issues and their descriptions:

  • Completeness: Is all the required information available? Are data values missing, or in an unusable state? In some cases, missing data is irrelevant, but when the missing information is critical to a specific business process, completeness becomes an issue.
    Example: If an email field has only 50,000 values present out of a total of 75,000 records, then the email field is approximately 66.7% complete.
  • Conformity: Are data values expected to conform to specified formats? If so, do all the values conform to these formats? Maintaining conformance to specific formats is important in data representation, presentation, aggregate reporting, search, and establishing key relationships.
    Example: The gender codes in two different systems are represented differently; in one system the codes are defined as ‘M’, ‘F’, and ‘U’, whereas in the second system they appear as 0, 1, and 2.
  • Consistency: Do values represent the same meaning?
    Example: Is a city name used consistently? New York, NY, NYC, and The Big Apple all refer to the same city.
  • Accuracy: Do data objects accurately represent the “real-world” values they are expected to model? Incorrect spellings of product or person names, incorrect addresses, and even outdated data can affect operational and analytical applications.
    Example: A customer’s address is a valid USPS address. However, the ZIP code is incorrect and the customer name contains a spelling mistake.
  • Validity: Do data values fall within acceptable ranges?
    Example: Salary values should be between 60,000 and 120,000 for position levels 51 and 52.
  • Duplication: Are there multiple, unnecessary representations of the same data objects within your data set? The inability to maintain a single representation for each entity across your systems poses numerous vulnerabilities and risks. Duplicates are measured as a percentage of the overall number of records. There can be duplicate individuals, companies, addresses, product lines, invoices, and so on. The following example depicts duplicate records existing in a data set:

    Name | Address | Postal Code | City | State
    Mag. Smith | 545 S Valley View D. # 136 | 34563 | <Anytown> | New York
    Margaret smith | 545 Valley View ave unit 136 | 34563-2341 | <Anytown> | New-York
    Maggie Smith | 545 S Valley View Dr | (blank) | <Anytown> | NY.
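The completeness, conformity, and validity examples above can be expressed as simple checks. The following is a minimal Python sketch for illustration only; it does not use or represent any DQS API. The function names are hypothetical, and the pairing of the numeric gender codes 0, 1, and 2 with ‘M’, ‘F’, and ‘U’ is an assumption, because the example does not state which code corresponds to which.

```python
from typing import Optional, Sequence


def completeness(values: Sequence[Optional[str]]) -> float:
    """Percentage of values that are present (not None and not blank)."""
    if not values:
        return 0.0
    present = sum(1 for v in values if v is not None and v.strip() != "")
    return 100.0 * present / len(values)


# Hypothetical mapping between the two systems' gender codes. Which number
# corresponds to which letter is an assumption made for illustration.
GENDER_CODE_MAP = {"0": "M", "1": "F", "2": "U"}


def conform_gender(code: str) -> Optional[str]:
    """Return the code in the 'M'/'F'/'U' format, or None if unrecognized."""
    code = code.strip().upper()
    if code in ("M", "F", "U"):
        return code
    return GENDER_CODE_MAP.get(code)


def salary_is_valid(salary: float, level: int) -> bool:
    """Range rule from the validity example: 60,000 to 120,000 for levels 51 and 52."""
    if level in (51, 52):
        return 60_000 <= salary <= 120_000
    return True  # this sketch defines no rule for other position levels


# Two of three email values are present.
emails = ["a@example.com", "b@example.com", None]
print(round(completeness(emails), 1))  # prints 66.7
```

The same ratio applies to the completeness example: 50,000 present values out of 75,000 records is two thirds, or roughly 66.7%.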



Q.6 What are the business benefits of DQS? How does it change things for the better (the before/after picture)?

Delivering higher data quality in a consistent, controlled, managed, integrated, and fast manner leads to better business results. The DQS knowledge base approach enables the organization, through its data experts, to efficiently capture and refine data quality knowledge in a Data Quality Knowledge Base (DQKB).

Through the interactive cleansing capabilities of DQS and its integration with Integration Services and Windows Azure Marketplace, information workers and IT professionals can collaborate and reuse this knowledge for various data quality improvements and enterprise data management processes (cleansing, matching, standardization, enrichment, and so on).



Q.7 What does DQS consist of?

DQS is a part of the SQL Server product, and comprises a Data Quality Server and a dedicated Data Quality Client application. DQS also provides a DQS Cleansing component in Integration Services for an integrated easy-to-use cleansing experience.



Q.8 What is the partner opportunity and value with DQS?

DQS is a knowledge-driven solution, focusing on the creation and maintenance of a Data Quality Knowledge Base (DQKB) that can then be reused to perform various data quality operations, such as data cleansing and matching.

The main concept behind DQS is a rapid, easy-to-deploy, easy-to-use data quality product that can be set up with minimal effort. To that end, DQS focuses on creating an open environment for consuming third-party intellectual property (IP) and knowledge. This enables partners and ISVs to build DQKB content and to help customers launch their data quality initiatives with DQS more smoothly.



Q.9 What is a Data Quality Knowledge Base (DQKB)?

DQS is a knowledge-driven solution, and at its heart is the DQKB. A DQKB stores all the knowledge related to a specific type of data source, and is maintained by the organization’s data expert (often referred to as a data steward). For example, one DQKB can handle information on an organization’s customer database, while another can handle its employee database.

The DQKB contains data domains that relate to the data source (for example: name, city, state, zip code, ID). For each data domain, the DQKB stores all identified terms, spelling errors, validation and business rules, and reference data that can be used to perform data quality actions on the data source.
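To make the idea of a domain more concrete, the following minimal Python sketch models one domain as a set of known corrections plus validation rules. It is an illustration only: the `Domain` class, its fields, and the `cleanse` method are hypothetical names, and this is not how DQS stores or applies knowledge internally.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Domain:
    """Hypothetical in-memory model of one data domain (not a DQS type)."""
    name: str
    # Maps a known variant or misspelling (lowercase) to the preferred value.
    corrections: Dict[str, str] = field(default_factory=dict)
    # Validation rules: each returns True when the value is acceptable.
    rules: List[Callable[[str], bool]] = field(default_factory=list)

    def cleanse(self, value: str) -> Optional[str]:
        """Apply known corrections, then the rules; None means invalid."""
        trimmed = value.strip()
        corrected = self.corrections.get(trimmed.lower(), trimmed)
        if all(rule(corrected) for rule in self.rules):
            return corrected
        return None


city = Domain(
    name="City",
    corrections={
        "ny": "New York",
        "nyc": "New York",
        "the big apple": "New York",
    },
    rules=[lambda v: len(v) > 0],  # a city value must not be empty
)

print(city.cleanse("NYC"))        # prints New York
print(city.cleanse(" Seattle "))  # prints Seattle
```

In this sketch, a knowledge base would simply be a collection of such domains, one per column of the data source.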

For detailed information about DQKB and domains, see DQS Knowledge Bases and Domains.



Q.10 What capabilities can I expect to see in DQS?

DQS enables a “self-service data quality experience” through a dedicated Data Quality Client application, where any data expert with virtually no database expertise can create, maintain, and run data-quality operations, with minimal setup and preparation time. Data Quality Client and the DQS Cleansing component in Integration Services enable the following capabilities in SQL Server.

DQS components and their capabilities:

Data Quality Client
  • Knowledge Base Management: Creating and maintaining a DQKB, including:
    • Knowledge management: A set of functionalities that enables the data steward to manually define, update, and review the DQKB’s knowledge.
    • Knowledge discovery: Computer-assisted acquisition of knowledge from a data source sample.
    • Matching policy training: Defining a set of rules that serves as the policy governing the matching process.
    • Reference data exploration: Exploring, choosing, and integrating reference data from third parties into the DQKB domains.
  • Data Quality Projects: Correcting, standardizing, and matching source data according to domain values, rules, and reference data associated with a designated data quality knowledge base.
  • Administration: Several administrative functions, such as:
    • Monitoring current and past DQS processes such as data correction and matching.
    • Defining reference data providers.
    • Setting parameters related to DQS activities.

For more information about Data Quality Client, see Data Quality Client Application.

DQS Cleansing component in Integration Services
  • A synchronous data flow transformation component that corrects the input data according to domain values, rules, and reference data associated with a designated DQS knowledge base.

For more information about the DQS Cleansing component, see Using the SSIS DQS Cleansing Component.



Q.11 How do you build a DQS knowledge base (DQKB)?

A DQKB can be built by acquiring knowledge through data samples and user feedback. The DQKB is enriched through a computer-assisted knowledge discovery process, through user-generated knowledge, or through IP supplied by third-party reference data providers.



Q.12 Does DQS provide any extensibility support in terms of public APIs or Web services that third-parties can programmatically use to run DQS operations?

DQS does not provide any programmatic access in terms of public APIs and Web services that can be used by third-party apps/developers to extend DQS functionality. You must use the Data Quality Client or the DQS Cleansing component in SSIS to perform DQS operations.



Q.13 What is the ship vehicle for DQS?

DQS ships as part of SQL Server 2012.



Q.14 Is there an innovation, citizenship, or quality angle?

Yes. DQS is a knowledge-driven solution that is based on quality-specific knowledge bases that reside in SQL Server. The DQS knowledge base stores comprehensive quality-related knowledge in the form of data domains. These domains encapsulate the semantic representation of a specific type of data source (for example, name, city, state, zip code, ID number). For each data domain, the DQS knowledge base stores all identified terms, spelling errors, rules, and external reference data that can be used to cleanse the enterprise business data.

Building the DQKB combines advanced automatic algorithms and well-defined and streamlined processes that enable rapid knowledge acquisition, aligned with the specific enterprise data.



Q.15 What is the relationship to Microsoft’s cloud initiatives?

DQS includes defined connectivity to Windows Azure Marketplace and other third-party business services and data sets to enhance the DQKB with third-party cloud-based IP. For more information, see Reference Data Services in DQS.

Future versions of DQS might also include end-to-end (e2e) cloud solutions.



Q.16 DQS sounds exciting. When and how can I get it?

DQS is available now as part of the SQL Server 2012 RTM release. You can download SQL Server 2012 RTM from here. For detailed information about installing and configuring DQS, see the DQS Installation Guide and watch the installation video. To learn DQS, see DQS Learning Resources.


