PharmaSUG 2014 - Paper BB10
Using the Power of SAS SQL
Jessica Wang, Regeneron Pharmaceuticals Inc., Basking Ridge, NJ
ABSTRACT
SAS is a flexible language in that the same task can be accomplished in numerous ways. SAS SQL is a powerful tool
for data manipulation and querying. SAS SQL can make your programs more efficient, simpler, and more readable.
This paper covers: merging multiple tables by different columns and different rules; SQL in-line views; SQL
set operations; using dictionary tables; creating empty tables with a pre-defined dataset structure; inserting rows into a
dataset; and assigning a list of macro variables. Some tips and tricks from the author’s personal experience as a SAS
user are also shared, e.g. using COMPRESS to save storage space; using the SQL option _METHOD to understand
your SQL code better; and using NOPRINT to suppress unnecessary display in the output window. This paper is intended for
intermediate to advanced SAS users who already know the basics of SAS SQL and want to better exploit the
power that SQL offers.
INTRODUCTION
PROC SQL is the implementation of SQL (Structured Query Language) built into the SAS system, used for data retrieval,
manipulation, analysis, and reporting. It also brings SAS language elements, such as formats/informats, functions, and
data set options, into SQL. The syntax for a basic SQL query looks like the following:
PROC SQL;
CREATE TABLE new_table_name AS
SELECT column_names
FROM table_name
WHERE conditions
GROUP BY column_names
HAVING having_conditions
ORDER BY column_name_list
;
QUIT;
The PROC SQL statement and the SELECT and FROM clauses are required in a SQL query; the others are optional.
The SELECT clause lists the columns to be selected, or uses an asterisk (*) to select all columns; it can also
create new columns from functions and calculations, and assign alias names to the new columns. The FROM clause
names the source table. The CREATE TABLE clause saves the query result into a new table. The WHERE clause
lists the conditions that a row must satisfy to be included in the query result. The ORDER BY clause sorts the
result. GROUP BY is usually used together with summary functions in the SELECT clause to summarize the result
by group. The HAVING clause, used together with the GROUP BY clause, keeps only the groups that
satisfy a condition evaluated after the summary function is applied. The SQL clauses must appear in the order given
in the syntax above.
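As a minimal sketch of how these clauses fit together, the query below uses SASHELP.CLASS, a sample table shipped with SAS, rather than the clinical data in this paper. The output table name (age_summary) and the column aliases (n_students, mean_height) are illustrative assumptions, not names used elsewhere in the paper.

```sas
PROC SQL;
   /* Save the query result into a new table (illustrative name) */
   CREATE TABLE work.age_summary AS
   SELECT age,
          COUNT(*)     AS n_students,               /* rows per age group    */
          MEAN(height) AS mean_height FORMAT=6.1    /* mean height per group */
   FROM sashelp.class
   WHERE sex = 'F'              /* row-level filter, applied before grouping */
   GROUP BY age                 /* one summary row per age                   */
   HAVING COUNT(*) >= 2         /* group-level filter, applied after grouping */
   ORDER BY age
   ;
QUIT;
```

WHERE removes individual rows before the summary functions are computed, while HAVING removes whole groups afterwards, which is why a condition on COUNT(*) belongs in HAVING rather than in WHERE.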
In SQL terminology, a SAS dataset, observation, and variable are called a table, row, and column,
respectively. To distinguish PROC SQL from other SAS procedures, we use the SQL terms in this paper. Now, we are
going to exploit the power of SQL further.
In this paper, fictitious clinical trial data from the pharmaceutical industry are used in the examples, such as
Demographics (DM), Exposure (EX), Adverse Events (AE), Laboratory Test Results (LB), IVRS (Interactive Voice
Response System), and Randomization. The data definition metadata table is listed in Appendix 1.
1. MERGING MULTIPLE TABLES
In some cases we need to obtain data from multiple tables and combine the tables horizontally according to specified
conditions. A PROC SQL join can do this type of work efficiently.
Example: Three tables, IVRS, RANDTRT, and TRTDEC (with alias names A, B, and C), contain the treatment information
we need for patients. Table A has individual patient information and randomization numbers; B has randomization
numbers and the corresponding treatment groups; C has treatment groups and treatment descriptions. We need to merge A
and B by randomization number, and merge B and C by treatment group, in order to get each subject’s treatment
group and treatment description. Note: the randomization number has different names in tables A and B (rand_number and