Custom Logs.

Apurva Tiwari 40 Reputation points
2024-07-15T08:16:44.56+00:00

I have a CSV file (say "empfile.csv") in my data lake, which is then loaded into Azure Synapse's dedicated SQL pool. So my data now exists in both the data lake and the SQL pool.


Suppose I update some field values in this data (say "Manager" to "Sr. Manager" for emp_id 101) and upload the file again. The table in Synapse is then updated with the new values.


Task: I want to create custom logs that capture these changes and store them in my warehouse.

Any idea how I can implement this, or does Azure provide any such functionality for auditing the data?
Thanks.


1 answer

Sort by: Most helpful
  1. Amira Bedhiafi 26,186 Reputation points
    2024-07-15T11:29:16.35+00:00

    Azure Synapse does not natively provide a comprehensive change-tracking or auditing solution, so you can build your own in the dedicated SQL pool using an audit table, a staging table, and a trigger.

    Start by creating a table in your SQL pool to store the audit logs.

     CREATE TABLE audit_logs (
         Audit_Id INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED NOT ENFORCED,  -- dedicated SQL pool requires NONCLUSTERED NOT ENFORCED for primary keys
         TableName NVARCHAR(50),
         ChangedBy NVARCHAR(50),
         emp_id INT,
         ChangedFromValues NVARCHAR(MAX),
         ChangedToValues NVARCHAR(MAX),
         [DateTime] DATETIME
     );
    

    Then create a staging table where you can load the new data before merging it with the existing data.

    CREATE TABLE empfile_staging (
        emp_id INT,
        emp_name NVARCHAR(50),
        emp_design NVARCHAR(50),
        dept_id INT
    );
    

    Load the updated data into your staging table:

    COPY INTO empfile_staging
    FROM 'path_to_your_csv_file'
    WITH (
        FILE_TYPE = 'CSV',
        FIRSTROW = 2,            -- assuming the first row is a header
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n'
    );
    

    Then create a trigger that fires after an update operation on your main table:

    CREATE TRIGGER trg_empfile_update
    ON empfile
    AFTER UPDATE
    AS
    BEGIN
        -- Set-based insert so that multi-row updates are audited correctly
        -- (scalar variables would only capture one of the affected rows)
        INSERT INTO audit_logs (TableName, ChangedBy, emp_id, ChangedFromValues, ChangedToValues, [DateTime])
        SELECT 'empfile',
               SUSER_SNAME(),          -- or a fixed value such as 'abc person'
               i.emp_id,
               d.emp_design,           -- old value
               i.emp_design,           -- new value
               GETDATE()
        FROM inserted AS i
        JOIN deleted AS d
            ON i.emp_id = d.emp_id
        WHERE d.emp_design <> i.emp_design;
    END
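
    To verify the trigger, you can run a manual update against the main table and then check the audit table; the values below are just the example from the question:

    -- Example check only: emp_id 101 and 'Sr. Manager' come from the question scenario
    UPDATE empfile SET emp_design = 'Sr. Manager' WHERE emp_id = 101;
    SELECT * FROM audit_logs WHERE emp_id = 101;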
    

    Merge the data from the staging table into the main table. Any updates will fire the trigger defined above.

    MERGE INTO empfile AS target
    USING empfile_staging AS source
    ON target.emp_id = source.emp_id
    WHEN MATCHED AND (target.emp_design <> source.emp_design OR target.emp_name <> source.emp_name OR target.dept_id <> source.dept_id)
    THEN
        UPDATE SET target.emp_name = source.emp_name,
                   target.emp_design = source.emp_design,
                   target.dept_id = source.dept_id
    WHEN NOT MATCHED
    THEN
        INSERT (emp_id, emp_name, emp_design, dept_id)
        VALUES (source.emp_id, source.emp_name, source.emp_design, source.dept_id);
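
    If DML triggers are not an option in your environment, a similar audit entry can be produced set-based by comparing the staging table with the target table before running the MERGE. A minimal sketch, using the same tables and columns as above:

    -- Alternative sketch (no trigger): log the differences before the MERGE overwrites them
    INSERT INTO audit_logs (TableName, ChangedBy, emp_id, ChangedFromValues, ChangedToValues, [DateTime])
    SELECT 'empfile',
           SUSER_SNAME(),
           t.emp_id,
           t.emp_design,        -- value currently in the target table
           s.emp_design,        -- incoming value from the staging table
           GETDATE()
    FROM empfile AS t
    JOIN empfile_staging AS s
        ON t.emp_id = s.emp_id
    WHERE t.emp_design <> s.emp_design;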
    

    Optionally, truncate the staging table after the merge operation to keep it clean for future use.

    TRUNCATE TABLE empfile_staging
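
    Once the process has run, the custom log can be read straight from the audit table, for example for the employee from the question:

    -- Read back the change history for emp_id 101
    SELECT TableName, ChangedBy, emp_id, ChangedFromValues, ChangedToValues, [DateTime]
    FROM audit_logs
    WHERE emp_id = 101
    ORDER BY [DateTime] DESC;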
    
