Selecting Cases that have Duplicates in SPSS

To flag duplicate cases in SPSS so that you can exclude them from your analysis, you can do the following:

  1. Click: Data - Identify Duplicate Cases...
  2. Choose: Define matching cases by: (this is the primary variable, like a customer's ID)
  3. (If needed) Sort within matching groups by: (this variable sets the order of cases within each group of matches, and therefore which case counts as the primary one; a date variable is a common choice)
  4. To ignore the duplicate cases, check the "Indicator of primary cases" box in "Variables to Create"
  5. To ignore every case that belongs to a group of duplicates (the primary case as well as its duplicates), check the "Sequential count of matching case in each group" box.
  6. Click: OK

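Depending on which checkbox you chose in #4 and/or #5, different variables will be created. If you checked the box in #4, the PrimaryLast variable will contain a 1 for each primary case (this includes cases that have no duplicates at all) and a 0 for each duplicate case. With the default setting, the last case in each group of matches is the primary one, hence the name PrimaryLast. Additionally, if the option to move matching cases to the top of the file is checked, all cases that have duplicates will be at the top of your dataset. You can then use Data - Select Cases... to select all cases with PrimaryLast = 1, which ignores the duplicate cases.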
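To make the coding of PrimaryLast concrete, here is a minimal Python sketch of the same logic outside SPSS. It assumes the default "last case in each group is primary" setting. The function name primary_last_flags and the sample customer IDs are hypothetical, and this only illustrates how the flag behaves; it is not how SPSS implements it.

```python
def primary_last_flags(ids):
    """Return 1 for the last case of each ID (primary or unique), 0 for earlier duplicates."""
    last_index = {}
    for i, case_id in enumerate(ids):
        last_index[case_id] = i  # later occurrences overwrite earlier ones
    return [1 if last_index[case_id] == i else 0 for i, case_id in enumerate(ids)]


# Hypothetical customer IDs: 101 appears three times, 103 twice, 102 once.
ids = [101, 101, 101, 102, 103, 103]
flags = primary_last_flags(ids)

# Keeping only the cases flagged 1 leaves exactly one case per ID.
kept = [case_id for case_id, flag in zip(ids, flags) if flag == 1]
print(flags)  # [0, 0, 1, 1, 0, 1]
print(kept)   # [101, 102, 103]
```

Note that the unique case (102) is flagged 1 just like the primary cases, which is why selecting on PrimaryLast = 1 keeps it.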

The MatchSequence variable will be created if you selected the checkbox in #5. Within each group of matching cases, every case (the primary one included) is numbered sequentially: 1, 2, 3, etc. Any cases that don't have duplicates will have a MatchSequence value of 0. If you want to ignore all cases that have duplicates, you can use Data - Select Cases... to set a filter that keeps only cases with MatchSequence = 0.
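The numbering of MatchSequence can be sketched the same way in Python. As before, this is an illustration under assumptions rather than SPSS's own implementation: the function name match_sequence and the sample IDs are hypothetical, and the cases are assumed to be already sorted by the matching variable.

```python
from collections import Counter


def match_sequence(ids):
    """Return 0 for IDs that occur once, else a running count 1, 2, 3, ... within each ID."""
    totals = Counter(ids)  # how many cases share each ID
    seen = Counter()       # how many cases of each ID have been numbered so far
    sequence = []
    for case_id in ids:
        if totals[case_id] == 1:
            sequence.append(0)  # no duplicates for this ID
        else:
            seen[case_id] += 1
            sequence.append(seen[case_id])
    return sequence


# Hypothetical customer IDs: 101 appears three times, 103 twice, 102 once.
ids = [101, 101, 101, 102, 103, 103]
sequence = match_sequence(ids)

# Keeping only the cases numbered 0 drops every ID that has any duplicate.
unique_only = [case_id for case_id, seq in zip(ids, sequence) if seq == 0]
print(sequence)     # [1, 2, 3, 0, 1, 2]
print(unique_only)  # [102]
```

Compared with filtering on PrimaryLast, this filter is stricter: it removes the primary cases of duplicated IDs too, leaving only the IDs that occur exactly once.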