What’s the best approach to create external tables from JSON logs using a Serverless SQL Pool?

Saul Figueroa 0 Reputation points
2025-03-10T21:16:18.3+00:00

Hello,
I am currently using Azure Logging to monitor logs in real-time via Log Stream. The goal is to store these logs in a Storage Account Gen 2 container and create an external table in Azure Synapse Analytics for data analysis.

However, I encountered an issue: logs generated via Diagnostic Settings (App Services Console Logs) are stored in JSON Lines format by default. This prevents direct external table creation in Synapse with JSON keys as individual columns.

I have considered some alternatives, but each presents challenges in terms of cost.

Alternative Approaches:

  1. JSON to Parquet Conversion
    • Using Spark notebooks or Azure Data Factory (ADF) pipelines (Copy activity): requires storing both the original JSON and the converted Parquet files, increasing storage costs. Processing costs could also be high given the large daily log volume (logs are stored in time-partitioned folders in the Storage Account Gen2 container, e.g. ".../LOGGING/y=2025/m=03/d=10/h=11/m=00/PT1H.json").
  2. Using Databricks or Snowflake
    • Involves additional costs and the need to introduce new services.
  3. Using Log Analytics Views
    • Helps structure data using OPENROWSET but does not support external table creation with JSON keys as columns.
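For completeness, here is a sketch of the OPENROWSET pattern mentioned in alternative 3, which reads the JSON Lines files directly without any conversion: each line is loaded as a single text column by choosing delimiters that never occur in the data (0x0b, vertical tab), and JSON_VALUE extracts the keys. The storage URL mirrors the example path above and is a placeholder:

SELECT
  JSON_VALUE(doc, '$.time')              AS [time],
  JSON_VALUE(doc, '$.resultDescription') AS resultDescription
FROM OPENROWSET(
  BULK 'https://<myworkspaceaccount>.blob.core.windows.net/<my_container>/...../LOGGING/y=2025/m=03/d=10/h=11/m=00/PT1H.json',
  FORMAT = 'CSV',
  FIELDTERMINATOR = '0x0b',  -- vertical tab: never appears in the log lines
  FIELDQUOTE = '0x0b'
) WITH (doc NVARCHAR(MAX)) AS rows;

This queries the raw JSON on every run (so you pay per TB scanned), but it avoids storing a second copy of the data.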

My JSON file example:

{ "time": "2025-03-07T17:00:00.0491935Z", "resultDescription": ".. some text ..", ... more key:value pairs }
{ "time": "2025-03-07T17:00:00.0492294Z", "resultDescription": ".. some text ..", ... more key:value pairs }

Script used:

CREATE EXTERNAL DATA SOURCE LogsDataSource
WITH (
  LOCATION = 'https://<myworkspaceaccount>.blob.core.windows.net'
);

--
CREATE EXTERNAL FILE FORMAT JsonLinesFormat
WITH (
  FORMAT_TYPE = DELIMITEDTEXT,
  FORMAT_OPTIONS (
    FIELD_TERMINATOR = '0x0',
    ENCODING = 'UTF8'
  )
);

--
CREATE EXTERNAL TABLE MyExternalTable
(
  time NVARCHAR(255),
  resultDescription NVARCHAR(MAX)
)
WITH (
  LOCATION = '/<my_container>/...../LOGGING/y=2025/m=03/d=10/h=11/m=00/PT1H.json',
  DATA_SOURCE = LogsDataSource,
  FILE_FORMAT = JsonLinesFormat
);

--
SELECT * FROM MyExternalTable;

Expected output:
time       | resultDescription | ...
2025-03-.. | ... some text     | ...
2025-03-.. | ... some text     | ...

Questions:

  • Is there a way to configure Log Stream to store logs directly in a Synapse-compatible format (e.g., Parquet instead of JSON) to avoid additional processing?
  • What is the most cost-effective approach to transforming JSON logs into a format suitable for external tables in Azure Synapse Analytics?

I appreciate any guidance on optimizing this process to properly create an external table with JSON keys as columns.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

1 answer

  1. Venkat Reddy Navari 0 Reputation points Microsoft External Staff
    2025-03-11T13:22:20.6966667+00:00

    Hi Saul Figueroa

    Is there a way to configure Log Stream to store logs directly in a Synapse-compatible format (e.g., Parquet instead of JSON) to avoid additional processing?

    Azure Log Stream does not natively support storing logs in Synapse-compatible formats (like Parquet). By default, logs streamed via Diagnostic Settings to a Storage Account are written in JSON Lines format.

    Transforming JSON Logs to Parquet:

    The suggested alternatives, such as Azure Data Factory (ADF), Azure Databricks, or Azure Synapse Spark pools, are solid approaches for transforming JSON logs into a more query-efficient format like Parquet. Each can automate the transformation; the most cost-effective option depends on your specific needs.

    What is the most cost-effective approach to transforming JSON logs into a format suitable for external tables in Azure Synapse Analytics?

    Azure Data Factory (ADF) is cost-effective and easy to set up for regular, scheduled transformations. You only pay for data movement and transformation tasks, making it suitable for scenarios where logs are processed frequently (e.g., daily or hourly).

    Azure Synapse Spark Pools allow you to perform these transformations directly within Synapse, saving you additional costs and leveraging your existing infrastructure if you're already using Synapse for analytics.
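    A minimal sketch of that conversion in a Spark SQL (%%sql) cell of a Synapse notebook; the abfss paths and the raw_logs view name are hypothetical placeholders for your container and account:

    sql
    -- Read the JSON Lines files for one time partition
    CREATE OR REPLACE TEMPORARY VIEW raw_logs
    USING json
    OPTIONS (path 'abfss://<my_container>@<account>.dfs.core.windows.net/LOGGING/y=2025/m=03/d=10/');

    -- Write the parsed logs back out as Parquet
    INSERT OVERWRITE DIRECTORY
      'abfss://<my_container>@<account>.dfs.core.windows.net/converted/y=2025/m=03/d=10/'
    USING PARQUET
    SELECT `time`, resultDescription FROM raw_logs;

    An external table over the Parquet output then maps each key to a column directly, with no JSON parsing at query time.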

    Azure Databricks is ideal for larger datasets and more complex transformations but could be more expensive, so it’s important to manage resources well.

    Here's an updated version of your SQL to create the external table in Azure Synapse Analytics.

    External file format for JSON Lines: note that serverless SQL pools do not support FORMAT_TYPE = JSON in CREATE EXTERNAL FILE FORMAT (that option applies only to Azure SQL Edge). The documented workaround is to keep FORMAT_TYPE = DELIMITEDTEXT but choose field and string delimiters that never occur in the data (e.g. 0x0b, vertical tab), so each JSON line is read into a single NVARCHAR(MAX) column.

    Parsing the keys: once each row holds the raw JSON document, JSON_VALUE (or OPENJSON) extracts the keys as columns, typically through a view layered on top of the external table.

    sql
    -- Create an external data source pointing to the Blob storage location
    CREATE EXTERNAL DATA SOURCE LogsDataSource
    WITH (
      LOCATION = 'https://<your_storage_account>.blob.core.windows.net'
    );
    -- JSON Lines: use delimiters that never appear in the data so each
    -- line is read as one value
    CREATE EXTERNAL FILE FORMAT JsonLinesFormat
    WITH (
      FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (
        FIELD_TERMINATOR = '0x0b',
        STRING_DELIMITER = '0x0b',
        ENCODING = 'UTF8'
      )
    );
    -- Each row is one raw JSON document
    CREATE EXTERNAL TABLE MyExternalTable
    (
      jsonDoc NVARCHAR(MAX)
    )
    WITH (
      LOCATION = '/<my_container>/...../LOGGING/y=2025/m=03/d=10/h=11/m=00/PT1H.json',  -- Modify the path as needed
      DATA_SOURCE = LogsDataSource,
      FILE_FORMAT = JsonLinesFormat
    );
    -- Expose the JSON keys as columns through a view
    CREATE VIEW MyLogs AS
    SELECT
      JSON_VALUE(jsonDoc, '$.time')              AS [time],
      JSON_VALUE(jsonDoc, '$.resultDescription') AS resultDescription
    FROM MyExternalTable;
    -- Query the logs
    SELECT * FROM MyLogs;
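    One more cost angle worth mentioning: the serverless SQL pool itself can materialize the parsed logs as Parquet with CETAS (CREATE EXTERNAL TABLE AS SELECT), so no Spark or ADF is needed for the conversion. A sketch under these assumptions: the ParquetFormat file format, LogsParquet table name, and output path are hypothetical, and LogsDataSource must be created with a credential that allows writing to the target location:

    sql
    -- Hypothetical Parquet file format for the converted output
    CREATE EXTERNAL FILE FORMAT ParquetFormat
    WITH (FORMAT_TYPE = PARQUET);

    -- One-shot conversion: parse the JSON lines and write Parquet
    CREATE EXTERNAL TABLE LogsParquet
    WITH (
      LOCATION = '/<my_container>/converted/logs/',  -- hypothetical output path
      DATA_SOURCE = LogsDataSource,                  -- needs a writable credential
      FILE_FORMAT = ParquetFormat
    )
    AS
    SELECT
      JSON_VALUE(doc, '$.time')              AS [time],
      JSON_VALUE(doc, '$.resultDescription') AS resultDescription
    FROM OPENROWSET(
      BULK '<my_container>/...../LOGGING/y=2025/m=03/d=10/h=11/m=00/PT1H.json',
      DATA_SOURCE = 'LogsDataSource',
      FORMAT = 'CSV',
      FIELDTERMINATOR = '0x0b',  -- byte that never occurs in the logs
      FIELDQUOTE = '0x0b'
    ) WITH (doc NVARCHAR(MAX)) AS rows;

    You pay only for the data scanned during the conversion, and subsequent queries hit the compact Parquet files.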
    

    I hope this helps. Please let us know if you have any further questions.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

    1 person found this answer helpful.
