SQL Server MERGE vs UPSERT
Today we are comparing a performance between the T-SQL MERGE statement and the UPSERT.
A test has been performed on:
SQL SERVER 2019 (15.0.4043.16), Maximum server memory = 16000 MB, Cost threshold for parallelism = 50, 6 physical cores.
I prepared an [Integration].[Vote] table which is our target table. I won’t use any hints that could lock tables.
Its size is 19 GB and 183 million records.
Our source is an [Integration].[VoteView] view with the same structure as the table above.
CREATE VIEW [Integration].[VoteView]
AS
SELECT
v.[Id] AS [VoteId]
,p.Title AS [PostTitle]
,p.Tags AS [PostTags]
,u.DisplayName as [UserName]
,[BountyAmount] as [BountyAmount]
,vt.Name as [VoteType]
,v.[CreationDate] AS [VoteCreationDate]
,p.CreationDate AS [PostCreationDate]
FROM [dbo].[Votes] v
LEFT JOIN dbo.VoteTypes vt
ON vt.Id = v.VoteTypeId
LEFT JOIN dbo.Posts p
ON p.Id = v.PostId
LEFT JOIN dbo.Users u
ON u.Id = v.UserId
GO
Before every test I had updated, deleted and inserted records according to the script bellow.
--UPDATE
UPDATE Integration.Vote
SET
PostTitle = 'updated',
UserName = 'updated',
BountyAmount = 10,
VoteCreationDate = '2000-01-01'
WHERE
VoteID BETWEEN 200000 AND 1500000
--DELETE
DELETE FROM Integration.Vote
WHERE VoteId between 2000000 AND 3400000
--INSERT
INSERT INTO Integration.Vote
SELECT TOP(1200000)
1988763080
,'NoTitle'
,'NoTag'
,'NoUserName'
,0
,'NoVoteType'
,'2020-01-01'
,'2020-01-01'
FROM sys.all_columns c1
CROSS JOIN sys.all_columns c2
Updated: 1.043.253 rows, Deleted: 1.233.373 rows, Inserted: 1.200.000 rows.
Firstly, we check how much time the MERGE performs this job.
SET STATISTICS IO ON
DBCC DROPCLEANBUFFERS
MERGE Integration.Vote t
USING Integration.VoteView s
ON s.VoteId = t.VoteId
WHEN MATCHED AND t.VoteCreationDate <> s.VoteCreationDate
THEN UPDATE SET
t.[PostTitle] = s.[PostTitle]
,t.[PostTags] = s.[PostTags]
,t.[UserName] = s.[UserName]
,t.[BountyAmount] = s.[BountyAmount]
,t.[VoteType] = s.[VoteType]
,t.[VoteCreationDate] = s.[VoteCreationDate]
,t.[PostCreationDate] = s.[PostCreationDate]
WHEN NOT MATCHED BY TARGET
THEN INSERT ([VoteId],[PostTitle],[PostTags],[UserName],[BountyAmount],[VoteType],[VoteCreationDate],[PostCreationDate])
VALUES (s.[VoteId],s.[PostTitle],s.[PostTags],s.[UserName],s.[BountyAmount],s.[VoteType],s.[VoteCreationDate],s.[PostCreationDate])
WHEN NOT MATCHED BY SOURCE
THEN DELETE;
The next is the UPSERT with the UPDATE.
SET STATISTICS IO ON
DBCC DROPCLEANBUFFERS
IF OBJECT_ID('tempdb..#VoteId') IS NOT NULL
BEGIN
DROP TABLE #VoteId
END
SELECT
v.VoteId AS [vVoteId]
,vv.VoteId AS [vvVoteId]
INTO #VoteId
FROM Integration.Vote v
FULL JOIN Integration.VoteView vv
ON vv.VoteId = v.VoteId
WHERE
v.VoteId IS NULL
OR vv.VoteId IS NULL
DELETE FROM Integration.Vote
WHERE VoteId IN (SELECT vVoteId FROM #VoteId WHERE vvVoteId IS NULL)
UPDATE t
SET
t.[PostTitle] = s.[PostTitle]
,t.[PostTags] = s.[PostTags]
,t.[UserName] = s.[UserName]
,t.[BountyAmount] = s.[BountyAmount]
,t.[VoteType] = s.[VoteType]
,t.[VoteCreationDate] = s.[VoteCreationDate]
,t.[PostCreationDate] = s.[PostCreationDate]
FROM Integration.Vote t
JOIN Integration.VoteView s
ON s.VoteId = t.VoteId
WHERE s.VoteId = t.VoteId AND s.VoteCreationDate <> t.VoteCreationDate
INSERT INTO Integration.Vote
SELECT *
FROM Integration.VoteView
WHERE VoteId IN (SELECT vvVoteId FROM #VoteId WHERE vVoteId IS NULL)
And at the end, the UPSERT without the UPDATE — updated records are deleted and inserted.
SET STATISTICS IO ON
DBCC DROPCLEANBUFFERS
IF OBJECT_ID('tempdb..#VoteId') IS NOT NULL
BEGIN
DROP TABLE #VoteId
END
SELECT
v.VoteId AS [vVoteId]
,vv.VoteId AS [vvVoteId]
INTO #VoteId
FROM Integration.Vote v
FULL JOIN Integration.VoteView vv
ON vv.VoteId = v.VoteId
WHERE
v.VoteId IS NULL
OR vv.VoteId IS NULL
OR (v.VoteId IS NOT NULL AND vv.VoteId IS NOT NULL AND v.VoteCreationDate <> vv.VoteCreationDate)
DELETE FROM Integration.Vote
WHERE VoteId IN (SELECT vVoteId FROM #VoteId WHERE vvVoteId IS NULL OR (vVoteId IS NOT NULL AND vvVoteId IS NOT NULL))
INSERT INTO Integration.Vote
SELECT *
FROM Integration.VoteView
WHERE VoteId IN (SELECT vvVoteId FROM #VoteId WHERE vVoteId IS NULL OR (vVoteId IS NOT NULL AND vvVoteId IS NOT NULL))
Results:
MERGE: 00:12:26
UPSERT(with an UPDATE): 00:03:52
The most expensive is the UPDATE statement, so the rebuilt procedure avoids it.
The UPSERT (without an UPDATE): 00:02:06