Monitor Data Exchange between DQS and Reference Data Service Providers
One of the key features in Data Quality Services (DQS) is to use knowledge from a cloud-based reference data service (Windows Azure Marketplace) to cleanse your data. However, in Data Quality Client, you cannot monitor information about the data exchanged between DQS and the reference data service providers during the cleansing process. This article provides information about why and how you can view/monitor this information.
Why to monitor data exchange?
Well, primarily for the following two reasons:
- Financial auditing: You are charged/billed for data exchange between DQS and reference data service providers in the cloud. It is good to have the information handy in case you find any discrepancy in your billing, and want to verify actual usage of the reference data service.
- Troubleshooting: Provides various information about the data exchange such as total time taken for the cleansing of a batch of records, how many records were cleansed, response from the reference data service provider (HTTP status), and so on. This information enables you to efficiently track issues with reference data-based cleansing, and quickly answer support calls from customer.
How to monitor data exchange?
The V_A_REFERENCE_DATA_AUDIT view in the DQS_MAIN database contains information about the data exchange between DQS and reference data service. Each batch data request is stored as a row in the view. This V_A_REFERENCE_DATA_AUDIT view contains the following columns:
- ID: Unique ID of the data exchange record in this view.
- Provider Name: Reference data service provider name used for cleansing.
- Project: Data Quality Project name used for cleansing.
- Knowledge Base: DQS Knowledge base used for cleansing.
- Batch ID: Unique ID of the data batch sent to the reference data provider.
IMPORTANT: When a domain value is successfully queried once from the reference data service provider, it is cached locally to prevent the same domain values from being queried multiple times from the reference data service provider. You are not billed for queries against the cached data. In this case, ##Cache Lookup## is displayed in the Batch ID column instead of a GUID to signify that the data has been queried against the local cache.
- Number of records sent: Number of records sent to the reference data provider.
- Number of records received: Number of records received from the reference data provider.
- Time sent: Time when the batch request was sent to the reference data provider for data cleansing. (Note: this time stays constant across multiple batches within the same activity, because there is a bug where the time is not refreshed for each individual batch in SQL Server 2012 RTM)
- Time received: Time when the reference date service responded with the cleansed data.
- Duration: Time taken by the reference data provider to respond with the cleansed data. This is calculated as the difference between the request and response time. (Note: this duration is cumulative is since the beginning of the activity, since the Time sent column is not incremented per individual batch in SQL Server 2012 RTM)
- Response Code: The HTTP status code for the operation. For example, 200 OK, 504 - Gateway Timeout, and so on.
To view data in the V_A_REFERENCE_DATA_AUDIT view in the DQS_MAIN database:
- Start Microsoft SQL Server Management Studio
- In Microsoft SQL Server Management Studio, right-click the SQL Server instance where DQS is installed, and then click New Query in the shortcut menu.
- In the Query Editor window, copy the following SQL statement:
select * from DQS_MAIN.dbo.V_A_REFERENCE_DATA_AUDIT - Press F5 to run the statement. The Results pane will display the data in the V_A_REFERENCE_DATA_AUDIT view.
Example Output:
ID
PROVIDER NAME
PROJECT
KNOWLEDGEBASE
BATCH ID
NUMBER OF RECORDS SENT
NUMBER OF RECORDS RECEIVED
TIME SENT
TIME RECEIVED
DURATION
RESPONSE CODE
1000000
Provider 1
Project1
KB1
##Cache Lookup##
38
0
2011-12-12 14:49:40.973
2011-12-12 14:49:41.063
00:00:00.0900000
OK
1000001
Provider 1
Project1
KB1
354aae07-bb1e-47ea-a9f8-43afdcbacefb
38
0
2011-12-12 14:49:41.127
2011-12-12 14:50:27.067
00:00:45.9400000
504 - Gateway Timeout
1000002
Provider 1
Project1
KB1
191c6a4e-af18-4117-a273-53d9e80988c8
20
20
2011-12-12 14:50:27.097
2011-12-12 14:50:35.153
00:00:08.0570000
OK
1000003
Provider 2
Project2
KB2
##Cache Lookup##
288
0
2012-02-07 11:38:40.163
2012-02-07 11:38:40.560
00:00:00.3970000
OK
1000004
Provider 2
Project2
KB2
82db1c29-b77b-4452-a837-d4e1744b655e
100
100
2012-02-07 11:38:40.630
2012-02-07 11:38:46.057
00:00:05.4270000
OK
1000005
Provider 2
Project2
KB2
5d0e526d-56db-47bc-bebe-caaa43123665
100
100
2012-02-07 11:38:40.630
2012-02-07 11:38:50.447
00:00:09.8170000
OK
1000006
Provider 2
Project2
KB2
847ba394-92aa-4a55-b882-940834838d69
88
88
2012-02-07 11:38:40.630
2012-02-07 11:38:54.453
00:00:13.8230000
OK
If you have enabled logging for reference data service (RDS) in Data Quality Client (see Configure Severity Levels for DQS Log Files), each batch request is also logged in the DQS Server log file (DQServerLog.DQS_MAIN.log). A sample entry in the DQS Server log file looks like the following:
12/14/2011 9:44:55 AM|[]|15|INFO|f592c205-3a81-40d3-8fe8-08a511df6762|Microsoft.Ssdqs.ReferenceData.Auditing.ReferenceDataAuditor| Batch Job: PROVIDER [1002], PROJECT [1000078], PROCESS [1079], BATCH ID [f6388357-a209-413f-b2df-499c3ea8b12e], NUMBER OF RECORDS SENT [10], NUMBER OF RECORDS RECEIVED [10], TIME SENT = [12/14/2011 09:44:03], TIME RECEIVED [12/14/2011 09:44:55], DURATION [00:00:52.2082203], RESPONSE CODE [OK]