When analyzing PSAT or SAT data from the College Board, I have to frequently merge the PSAT/SAT data with a school's spreadsheet of students' course and GPA information. That way, I can perform a MANOVA on the data with PSAT/SAT subscores as dependent variables and course group or GPA group (e.g., 0.0 - 1.9 GPA as one group) as independent variables. Before merging the files, I have to decide on a key record that is present in both files to use as an index. Usually, it is the student ID. The following instructions work with SPSS version 22.
Merging Files That Do Not Have Duplicate Keyed Records
- Open both files in SPSS. Then sort each file in ascending order on the index key variable. It's important that the files are sorted in ascending order on the key variable! Right-click on the index key variable's column and choose "Sort Ascending" from the pop-up menu.
- Ensure that the key variables in both datasets are named exactly the same. (You might have to go to the Variable View of the second dataset to make the key variable's name be the same as the key variable in the primary dataset.)
- Click on the main data file in SPSS to make it visible and active.
- Click: Data - Merge Files - Add Variables...
- In the "Add Variables..." dialog box that pops up, click on the secondary data file in the open dataset frame to highlight it. Then, click on the Continue button.
- A new "Add Variables..." pop up window will appear. In the "New Active Dataset" frame, you can choose variables to be excluded from the merge. Variables marked with a "*" are from the main dataset while variables marked with a "+" are from the secondary dataset. If SPSS finds a variable with the same name in both datasets (which it should if you ensured the key variable is named the same in both datasets), it will be displayed in the "Excluded Variables" frame.
- Check the box next to "Match cases on key variables".
- Click on the key variable in "Excluded Variables" to select it. Then, click on the arrow button next to the "Key Variables" frame to enable the key variable.
- Next, check the box next to "Cases are sorted in order of key variables in both datasets". The radio button next to "Both files provide cases" should automatically be selected.
- Click the OK button to perform the merge. A dialog box will pop up to warn you about not sorting the datasets in ascending order. Click OK.
If there is a one-to-many relationship (the primary dataset has duplicate key values, e.g., multiple records for each subject), only the first record will get the merged data. Subsequent duplicate keyed records will not. It is best to ensure that both datasets have one record per keyed variable.
Merging with a Data File that has Duplicate Key Records
Sometimes, you might have a primary data file that has multiple records for each subject. For example, you might have a dataset that has a record for each course (with the course's data) for a student. This would result in multiple records (one for English, one for Math, one for Social Studies, etc.) for each student. To merge a secondary dataset with this kind of primary dataset, the secondary dataset must not contain duplicate keyed records. To check and make sure the secondary dataset does not have duplicates in SPSS, do the following:
- With the dataset visible and active, click on Data - Identify Duplicate Cases
- Select the variable that is the key in the left frame and click the arrow button next to "Define matching cases by:" That way, the keyed variable is selected for processing.
- Check: Sequential count of matching case in each group
- The "Display frequencies for created variables" should automatically be checked.
- Click: OK
- The output should then display if you have any duplicate cases that you need to address. ("Valid" should be 0 in the "Sequential count of matching cases" table.)
Now that you've ensured there are no duplicate cases of key variables in the secondary dataset, go back to the main dataset by making it visible and active. Follow the same steps as above in merging the files from 1 to 8, however, you will need to do something different starting at step 9:
9. Check the box next to "Cases are sorted in order of key variables in both datasets".
10. Choose "Non-active dataset is keyed table" (previously, "Both files provide cases" was chosen).
11. Click the OK button to perform the merge. A dialog box will pop up to warn you about not sorting the datasets in ascending order. Click OK.
The new merged dataset should now have the new variables and properly duplicated for each subject.