Is there a way to configure Log Stream to store logs directly in a Synapse-compatible format (e.g., Parquet instead of JSON) to avoid additional processing?
Azure Log Stream does not natively support storing logs in a Synapse-compatible format such as Parquet. By default, logs streamed via Diagnostic Settings to a Storage Account are stored in JSON Lines format.
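In JSON Lines, each line of the blob is one standalone JSON object. A minimal Python sketch of parsing such a file (the sample records are illustrative; the `time` and `resultDescription` fields mirror the columns used in the external table later in this answer):

```python
import json

# Two illustrative records standing in for lines of a PT1H.json blob.
sample_log = "\n".join([
    '{"time": "2025-03-10T11:00:00Z", "resultDescription": "Request completed"}',
    '{"time": "2025-03-10T11:00:05Z", "resultDescription": "Request failed"}',
])

# Parse line by line: each non-empty line is a complete JSON document.
records = [json.loads(line) for line in sample_log.splitlines() if line.strip()]
for rec in records:
    print(rec["time"], rec["resultDescription"])
```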
Transforming JSON Logs to Parquet:
The suggested alternatives, such as Azure Data Factory (ADF), Azure Databricks, and Azure Synapse Spark pools, are all solid approaches for transforming JSON logs into a more query-efficient format like Parquet. Each can automate the transformation; the most cost-effective option depends on your specific needs.
What is the most cost-effective approach to transforming JSON logs into a format suitable for external tables in Azure Synapse Analytics?
Azure Data Factory (ADF) is cost-effective and easy to set up for regular, scheduled transformations. You only pay for data movement and transformation tasks, making it suitable for scenarios where logs are processed frequently (e.g., daily or hourly).
Azure Synapse Spark pools let you perform these transformations directly within Synapse, avoiding an additional service and leveraging your existing infrastructure if you are already using Synapse for analytics.
Azure Databricks is ideal for larger datasets and more complex transformations but could be more expensive, so it’s important to manage resources well.
Here's an updated version of your SQL to create an external table in Azure Synapse Analytics, with two corrections:
External File Format for JSON Lines: Instead of using FORMAT_TYPE = DELIMITEDTEXT, the correct type should be FORMAT_TYPE = JSON since you're working with JSON logs. The DELIMITEDTEXT format is for plain text or CSV-style logs, not JSON.
FIELD_TERMINATOR: This is unnecessary for JSON data. The JSON Lines format doesn't need a field terminator (which is typically used in delimited files). Instead, you just need the correct FORMAT_TYPE to specify that the data is in JSON format.
```sql
-- Create an external data source pointing to the Blob Storage location
CREATE EXTERNAL DATA SOURCE LogsDataSource
WITH (
    LOCATION = 'https://<your_storage_account>.blob.core.windows.net'
);

-- Create an external file format for JSON Lines
CREATE EXTERNAL FILE FORMAT JsonLinesFormat
WITH (
    FORMAT_TYPE = JSON
);

-- Create the external table to map the JSON data to columns
CREATE EXTERNAL TABLE MyExternalTable
(
    time NVARCHAR(255),
    resultDescription NVARCHAR(MAX)
)
WITH (
    LOCATION = '/<my_container>/...../LOGGING/y=2025/m=03/d=10/h=11/m=00/PT1H.json', -- Modify the path as needed
    DATA_SOURCE = LogsDataSource,
    FILE_FORMAT = JsonLinesFormat
);

-- Query the external table to get the logs
SELECT * FROM MyExternalTable;
```
I hope this helps. Please let us know if you have any further questions.
If the information provided is helpful, please consider upvoting the answer so it can assist other community members in resolving similar issues.