calllogid is the primary key in the table
uicallid is a column that _should_ be unique but is not, because data from a legacy system was imported into the table twice.
The rows that were imported twice were identical in every way except for the calllogid, so I chose to match on uicallid, since that is a value that should never be duplicated.
I started out trying to get the calllogid of every record that was duplicated and I used a subquery to do this:
SELECT
    count(main.uicallid) AS count,
    ( SELECT TOP 1 CallLogID FROM dbo.tablename AS sub WHERE sub.uicallid = main.uicallid ) AS id
FROM dbo.tablename AS main
GROUP BY main.uicallid
HAVING count(main.uicallid) > 1
Manually checking the output, I could see how many times each record was duplicated, and each record from this morning showed up twice, just as I expected. Once I was confident I had the correct data selected, I asked a coworker for help, and he showed me that you can do a DELETE WHERE fieldname IN (<a subquery>).
Now the final query looks like this:
DELETE FROM dbo.tablename
WHERE CallLogID IN (
    SELECT
        ( SELECT TOP 1 CallLogID FROM dbo.tablename AS sub WHERE sub.uicallid = main.uicallid ) AS id
    FROM dbo.tablename AS main
    GROUP BY main.uicallid
    HAVING count(main.uicallid) > 1 )
The query to delete the duplicate records ran on the 5.5 million row, 1 GB table in 15 seconds, and the SELECT to find them took 3 seconds. Not too bad. One caveat: if a uicallid appeared more than twice, this query would only delete one of the extra copies per run, so it would need to be run multiple times to clear out every duplicate.
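An alternative that handles any number of copies in a single pass is the ROW_NUMBER() windowed delete, a common SQL Server idiom: number the rows within each uicallid group and delete everything after the first. This is a sketch using the same table and column names as above, not something I ran against the real table:

```sql
-- Number each row within its uicallid group; rn = 1 is the copy we keep.
-- ORDER BY CallLogID keeps the lowest-id copy; any deterministic order works.
WITH numbered AS (
    SELECT CallLogID,
           ROW_NUMBER() OVER (PARTITION BY uicallid ORDER BY CallLogID) AS rn
    FROM dbo.tablename
)
DELETE FROM numbered
WHERE rn > 1;
```

Deleting through the CTE works because the CTE is updatable here (it selects from a single base table), so the DELETE removes the underlying rows.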