Files vanished from storage account

Aleksandra Stan 0 Reputation points
2025-01-13T06:37:02.28+00:00

Morning!

I was wondering if anyone knows what might be going on. We have a list of JSON files stored in a blob storage container. We need those files in Azure Synapse, so I created a notebook to ingest them. After I ran the ingestion a couple of times, the files vanished from the source folder. There are no deletion logs whatsoever, and no clue why this happened. The ingestion code:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder \
    .appName("Ingest JSON from Synapse") \
    .getOrCreate()

# base path
base_path = "abfss://container@storageaccountX.dfs.core.windows.net/folder1"
folder_name = "affectedfolder"
target_path = f"{base_path}/{folder_name}"

# load all JSON files
df = spark.read.format("json") \
    .option("multiLine", True) \
    .load(f"{target_path}/*.json") \
    .withColumn("file_path", F.input_file_name())

# file_date from the JSON file name
df = df.withColumn("file_date", F.regexp_extract(F.col("file_path"), r"filename(\d{4}-\d{2}-\d{2})\.json", 1))

# silver path (no trailing slash, so the joined path has a single separator)
silver_base_path = "abfss://silver@storageaccountX.dfs.core.windows.net"
table_name = folder_name
silver_table_path = f"{silver_base_path}/{table_name}"

# save the DF
df.write.format("delta") \
    .mode("overwrite") \
    .option("delta.columnMapping.mode", "name") \
    .option("overwriteSchema", "true") \
    .save(silver_table_path)

print(f"Data ingested successfully into '{silver_table_path}'.")
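As an aside, the `regexp_extract` pattern used for `file_date` can be sanity-checked locally with plain Python, since `re` and Spark use compatible regex syntax here. A minimal sketch; the sample file names below are hypothetical stand-ins for your real naming scheme:

```python
import re

# Same pattern the notebook passes to F.regexp_extract.
PATTERN = re.compile(r"filename(\d{4}-\d{2}-\d{2})\.json")

def extract_file_date(path: str) -> str:
    """Return the date embedded in the file name, or '' if there is
    no match (mirroring regexp_extract's behaviour on no match)."""
    m = PATTERN.search(path)
    return m.group(1) if m else ""

# Hypothetical sample paths -- adjust to your actual file names.
print(extract_file_date("folder1/affectedfolder/filename2025-01-10.json"))  # 2025-01-10
print(extract_file_date("folder1/affectedfolder/other.json"))  # prints an empty string
```

If `file_date` comes back empty for every row, the file names simply don't match the pattern; that would be a data-quality issue, but it would not delete anything from the source folder.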
Azure Synapse Analytics

1 answer

  1. Abiola Akinbade 21,460 Reputation points
    2025-01-13T07:17:36.58+00:00

    Hello Aleksandra Stan,

    Thanks for your question.

    To check for delete operations, go to the Azure Storage Account > Monitoring > Activity log and look for any delete operations related to the container. If logging isn't enabled, you may need to enable Azure Storage diagnostic settings to capture future events.

    That's the best way to capture Azure operations.

    Regarding your code: as a test, temporarily remove the .mode("overwrite") and write to a new path instead of silver_table_path, to confirm that the write operation doesn't touch the source folder.
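    One quick sanity check before suspecting anything else: a Delta `overwrite` deletes files only under the path it writes to, so confirm the target path cannot overlap the source folder. A minimal sketch, using the paths from your notebook (a simple string-prefix check, which assumes plain ABFSS URIs without `..` segments or wildcards):

    ```python
    def paths_overlap(source: str, target: str) -> bool:
        """True if one ABFSS path is a prefix of the other, i.e. a
        write to target could touch files under source (or vice versa)."""
        s = source.rstrip("/") + "/"
        t = target.rstrip("/") + "/"
        return s.startswith(t) or t.startswith(s)

    source = "abfss://container@storageaccountX.dfs.core.windows.net/folder1/affectedfolder"
    target = "abfss://silver@storageaccountX.dfs.core.windows.net/affectedfolder"
    print(paths_overlap(source, target))  # False -> the overwrite cannot delete the source
    ```

    With the paths shown in your notebook the containers differ (`container` vs `silver`), so the overwrite should not be able to remove the source files; if they did overlap, that would explain the disappearance.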

    You can mark it 'Accept Answer' and 'Upvote' if this helped you

    Regards,

    Abiola

