Hello @Pratik Roy,
Welcome to the MS Q&A platform.
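I see you are using UDFs. In general, UDFs are slow because Spark treats them as a black box and cannot optimize them the way it optimizes built-in SQL functions. Where a built-in function can do the same job, it is usually the faster choice.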
This blog explains UDFs in more detail.
Apart from this, I suspect the data load is creating a very large DataFrame that does not fit into memory.
Can you please increase the cluster size and number of worker nodes from the current 8 nodes to a higher number and see if it makes any difference?
Also, please check the network configuration of ADLS. Using a private link to connect to ADLS is recommended.