How to Merge Datasets in SAS: Step-by-Step Guide for Beginners
In the world of data analysis, you’ll rarely work with just one clean dataset. More often, you’ll find yourself needing to combine information from multiple tables such as customer details in one file and their transaction history in another. That’s where merging datasets in SAS comes in.
Whether you’re just getting started or need a quick refresher, this guide will walk you through how to merge datasets in SAS using the DATA
step, along with helpful tips to avoid common mistakes.
What Does It Mean to Merge Datasets in SAS?
Merging datasets means bringing together two or more datasets into a single dataset based on one or more common variables (often called key variables or BY variables in SAS). In SAS, this is typically done using a DATA
step with a MERGE
statement and a BY
statement.
Step 1: Ensure Datasets Are Sorted
Before merging, both datasets must be sorted by the variable which will be used to merge the data. SAS requires this for proper alignment of observations. An example of sorting datasets is included in the screenshot below.

Step 2: Use the DATA Step to Merge
Here’s a basic merge example which combines Dataset 1 and Dataset 2 by ID:


Watch for Many-to-Many Merges
When merging datasets in SAS, it’s possible that you end up with a many-to-many merge. This can be identified by paying close attention to the record count of your datasets before and after merging, and by looking at your log for the following note:

This note indicates that the datasets you’re trying to merge contain non-unique BY values, which can lead to inflated record counts or unintentional loss of data. To avoid this issue, you may need to deduplicate your data before performing the merge.
It’s critical to fully understand the data and the relationships between the BY values before merging. While many-to-many merges can sometimes be intentional, they are often an unintended consequence of the merging process.
Final Thoughts
Merging datasets in SAS is an essential skill that you’ll use regularly as you work with different data sources. By understanding the basics of sorting datasets, using the MERGE
statement, and handling common issues like duplicate values, you’ll be able to create clean, accurate merged datasets. Remember to always check your results, especially when working with non-unique keys, to ensure your merge is behaving as expected. With practice, merging data will become second nature, allowing you to focus more on analysis and insights!