How to Delete Duplicate Rows in SQL

How to Delete Duplicate Rows in SQL

If you have a table with duplicate data, it can be challenging to clean up the table without deleting important information. However, SQL provides a straightforward way to delete duplicate rows in a table. In this tutorial, you will learn how to identify and delete duplicate rows using SQL.

Step 1: Identify Duplicate Rows

Before deleting duplicate rows, you need to identify which rows are duplicates. You can do this by using the GROUP BY clause and the HAVING clause. The GROUP BY clause groups rows with the same values in the specified columns, and the HAVING clause filters groups based on a specific condition.

For instance, let’s say we have a table called students with the columns id, name, age, and class. To identify duplicate rows in the students table based on the name and age columns, you can use the following SQL query:

SELECT name, age, COUNT(*) 
FROM students
GROUP BY name, age
HAVING COUNT(*) > 1;

This query groups the rows in the students table by name and age and returns the number of occurrences of each group. The HAVING clause filters the groups that have more than one occurrence, which means they are duplicates.

The output of this query will show you the duplicate rows based on the name and age columns, along with the number of times they occur.

Step 2: Delete Duplicate Rows

Once you have identified the duplicate rows, you can delete them using the DELETE statement. To delete duplicate rows based on the name and age columns, you can use the following SQL query:

DELETE FROM students 
WHERE id NOT IN (
   SELECT MIN(id)
   FROM students
   GROUP BY name, age
);

This query deletes all rows from the students table where the id is not the minimum value of each group of duplicates. The MIN() function selects the minimum id value for each group of duplicates.

Troubleshooting Tips

  • Always make a backup of your data before deleting any rows from a table.
  • Be careful when using the DELETE statement as it permanently deletes data from the table.
  • Make sure you have correctly identified the duplicate rows before deleting them.
  • Double-check the WHERE clause in the DELETE statement to ensure that it only deletes the intended rows.

Conclusion

Deleting duplicate rows in SQL is a straightforward process that involves identifying the duplicates and deleting them using the DELETE statement. By using the GROUP BY and HAVING clauses, you can easily identify duplicate rows based on specific columns. And, by using the DELETE statement with the WHERE clause, you can delete the duplicate rows while retaining the unique rows in the table.