Hello RJ -
To speed up the process of inserting only new records into a large table, you can consider using a more efficient approach. Here are a few methods that might help:
Use a LEFT JOIN: Instead of using the EXCEPT
clause, you can use a LEFT JOIN
to identify new records. This can be more efficient, especially with proper indexing.
INSERT INTO [dbo].[large_accumulation_table] (amt, lastdate, type, doc1, line1, type1, doc2, line2, type2)
SELECT t.amt, t.lastdate, t.type, t.doc1, t.line1, t.type1, t.doc2, t.line2, t.type2
FROM [dbo].[todays_data] t
LEFT JOIN [dbo].[large_accumulation_table] l
ON t.amt = l.amt AND t.lastdate = l.lastdate AND t.type = l.type AND t.doc1 = l.doc1 AND t.line1 = l.line1 AND t.type1 = l.type1 AND t.doc2 = l.doc2 AND t.line2 = l.line2 AND t.type2 = l.type2
WHERE l.amt IS NULL;
Batch Processing: If the dataset is extremely large, consider processing the data in smaller batches. This can help reduce the load on the server and improve performance.
DECLARE @BatchSize INT = 10000;
DECLARE @BatchStart INT = 0;
DECLARE @BatchEnd INT = @BatchSize;
WHILE @BatchStart < (SELECT COUNT(*) FROM [dbo].[todays_data])
BEGIN
INSERT INTO [dbo].[large_accumulation_table] (amt, lastdate, type, doc1, line1, type1, doc2, line2, type2)
SELECT t.amt, t.lastdate, t.type, t.doc1, t.line1, t.type1, t.doc2, t.line2, t.type2
FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum FROM [dbo].[todays_data]) t
LEFT JOIN [dbo].[large_accumulation_table] l
ON t.amt = l.amt AND t.lastdate = l.lastdate AND t.type = l.type AND t.doc1 = l.doc1 AND t.line1 = l.line1 AND t.type1 = l.type1 AND t.doc2 = l.doc2 AND t.line2 = l.line2 AND t.type2 = l.type2
WHERE l.amt IS NULL AND t.RowNum BETWEEN @BatchStart AND @BatchEnd;
SET @BatchStart = @BatchEnd + 1;
SET @BatchEnd = @BatchEnd + @BatchSize;
END;
Use Indexes: Ensure that you have appropriate indexes on the columns used in the join condition. This can significantly improve the performance of the query.
Consider Using HASHBYTES: If the number of columns is large, you can use HASHBYTES
to create a hash of the row and compare the hashes instead of comparing each column individually.
INSERT INTO [dbo].[large_accumulation_table] (amt, lastdate, type, doc1, line1, type1, doc2, line2, type2)
SELECT t.amt, t.lastdate, t.type, t.doc1, t.line1, t.type1, t.doc2, t.line2, t.type2
FROM [dbo].[todays_data] t
LEFT JOIN [dbo].[large_accumulation_table] l
ON HASHBYTES('SHA2_256', CONCAT(t.amt, t.lastdate, t.type, t.doc1, t.line1, t.type1, t.doc2, t.line2, t.type2)) = HASHBYTES('SHA2_256', CONCAT(l.amt, l.lastdate, l.type, l.doc1, l.line1, l.type1, l.doc2, l.line2, l.type2))
WHERE l.amt IS NULL;
These methods should help improve the performance of your insert operation. Let me know if you need further assistance!