What’s New in SAS Version 9?

SAS is statistical software available at no cost to all University of Oregon students, faculty, and staff for installation on personal computers. The most recent version, 9.1.3, is now available for installation on personal computers or laptops from the Computing Center Documents Room (MacKenzie 175). In recent years the PC version of SAS has become much more versatile, powerful, and convenient for running both small and large data analysis projects.

Most new SAS users should install Version 9. If you already have version 8.2 you may want to upgrade to version 9; to do so, however, you will need to go through a completely new installation process. You can register and obtain installation or license update instructions at:

http://ssil.uoregon.edu/sas/

Versions 8.2 and 9 of SAS may both be installed on your computer. A few situations may require you to install version 8.2.

Documentation

Online documentation for version 8.2 will continue to be available at:

http://sas.uoregon.edu/sashtml/main.htm

Documentation for Version 9.1.3 is available in two formats from the SAS support web site. The first website (recommended for slower connection speeds) allows you to search the table of contents for specific topics and then select a link for further information:

http://support.sas.com/onlinedoc/913/docMainpage.jsp

A second website provides links to the contents of the SAS manuals in PDF format, which you can browse:

http://support.sas.com/documentation/onlinedoc/91pdf/index_913.html

You should access these documents with a high speed connection, and you must have Adobe Acrobat Reader 6.0 or later to view them. Please note that the entire contents of each manual are placed into one PDF document. Some of them have thousands of pages (the STAT manual by itself has over 5000!), and many pages are blank, so be conservative when printing them.
SAS is also more than willing to sell you these manuals if you wish to fill your bookcase with the latest documentation.

If you are beginning to learn SAS, a helpful resource to get you started is "The Little SAS Book", third edition, by Lora Delwiche and Susan Slaughter. Copies of this book are available for checkout in the Documents Room (MacKenzie 175). Its chapters cover enough material to get you started in SAS. However, the book is not designed to give you a theoretical introduction to the more advanced applications, such as PROC MIXED, SAS/IML, SAS/GRAPH, or SAS/ETS.

What’s New in SAS 9?

A complete description of what’s new in Version 9 can be found at:

http://support.sas.com/documentation/onlinedoc/91pdf/sasdoc_913/whatnew_8350.pdf

This document is indexed by topic, and the table of contents makes it simple to look up specific topics of interest. It contains 270 pages with rather large print (great for viewing), so, as with the SAS manuals, exercise control if you decide to print portions of it.

Much of what you may already know about SAS has not changed. Since SAS is backward compatible, the great majority of programs written for version 8.2 (and earlier) will continue to work in 9.1.3 with few or no modifications. However, SAS has added many new features, including functions for the DATA step, options for statistical programs, and specialized data analysis procedures not previously available. This article attempts to summarize only a few of the most helpful tools SAS has added to its arsenal.

New Functions

The SAS function library includes several new mathematical and statistical functions. Also included are functions which help you work more effectively with date and character-string data. In particular, several new functions whose names begin with the letters ANY, NOT, and CAT help you search and combine character data.
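As a minimal sketch of the ANY, NOT, and CAT families mentioned above, the DATA step below applies ANYDIGIT, NOTDIGIT, and CATS to a few made-up values. The dataset name, variable names, and data lines are assumptions chosen only for illustration.

```sas
* Hypothetical sample values, for illustration only;
DATA codes;
   LENGTH id $8 prefix $4 joined $12;
   INPUT id $ prefix $;
   first_digit    = ANYDIGIT(id);   * position of the first digit, 0 if there is none;
   first_nondigit = NOTDIGIT(id);   * position of the first character that is not a digit;
   joined         = CATS(prefix, id); * concatenate after stripping leading and trailing blanks;
   DATALINES;
AB123 x
4567 yz
;
PROC PRINT DATA=codes NOobs;
RUN;
```

Because `id` is stored with a fixed length, the search functions examine the whole stored value, so it may be worth checking the results for values shorter than the declared length.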
    * NOTE: you should specify the LENGTH of these character data
      to avoid SAS assigning a default length of 200;
    DATA places;
       LENGTH location $22 city $10 state $12 ;
       FORMAT visit_date mmddyy10. ;
       * anydtdte10. will read a mixture of date formats;
       INPUT visit_date anydtdte10. city state;
       location=CATX(", ",city,state);  * one of 4 new concatenate functions;
       DATALINES;
    12/10/2004 Ashland Oregon
    16JUL2004 Bertrand Nebraska
    ;
    PROC PRINT DATA=places NOobs;
       VAR city state visit_date location;
    RUN;

    city        state       visit_date    location
    Ashland     Oregon      12/10/2004    Ashland, Oregon
    Bertrand    Nebraska    07/16/2004    Bertrand, Nebraska

New functions for descriptive statistics allow you to identify the smallest value, the largest value, or the range of the data from a specified list of variables contained in the same row (observation). Similar to the previously available functions which compute the mean or the number of missing values for a list of variables, the new MEDIAN function allows you to compute the median of a set of variables from each observation. Compare the behavior of the ORDINAL function with the SMALLEST function in this example to see how they work with missing data: ORDINAL counts missing values (including the special missing values b and z) when it ranks the arguments, whereas SMALLEST ignores them.

    DATA tmp;
       INPUT a b c d e ;
       MISSING b z;
       min_r_or   = ORDINAL(1,a,b,c,d,e);
       min_row    = SMALLEST(1,a,b,c,d,e);
       mean_row   = MEAN(a,b,c,d,e);
       median_row = MEDIAN(a,b,c,d,e);
       max_row    = LARGEST(1,a,b,c,d,e);
       nmiss_row  = NMISS(a,b,c,d,e);
       DATALINES;
    3 2 z 4 5
    13 12 34 34 35
    9 . 5 3 1
    2 5 99 z b
    ;
    PROC PRINT DATA=tmp NOobs;
       VAR min_r_or min_row median_row mean_row max_row nmiss_row ;
    RUN;

    min_r_or    min_row    median_row    mean_row    max_row    nmiss_row
       Z           2           3.5         3.5000        5          1
      12          12          34.0        25.6000       35          0
       .           1           4.0         4.5000        9          1
       B           2           5.0        35.3333       99          2

Functions that work with location, particularly ZIP codes, are also present. For example, the new ZIPCITY function returns the city name and the two-character postal code that correspond to a specified ZIP code:

    DATA one;
       zip_code=97402; location = ZIPCITY(zip_code); OUTPUT;
       zip_code=68927; location = ZIPCITY(zip_code); OUTPUT;
    PROC PRINT;
    RUN;

    Obs    zip_code    location
     1       97402     Eugene, OR
     2       68927     Bertrand, NE

Study tip: descriptions and applications of most functions used in SAS programs can be found in “SAS Functions by Example” by Ron Cody, published by the SAS Institute.

Longer Format Names

SAS formats allow you to display descriptive labels for data stored as numbers or short alphabetic codes. SAS variable and dataset names were extended to 32 characters with version 8. However, version 8 still required format names (the link between the variable and its value labels) to be no longer than 8 characters; longer names were truncated to the first 8. The maximum length for character format names is now 31 (since, as shown below, the $ sign must also be included). The maximum length for format names for numerical data is now 32. This feature enables you to provide format names that are more descriptive.

    PROC FORMAT;
       VALUE $gender_respondent 'F'='Female' 'M'='Male';
    RUN;
    PROC PRINT DATA=sashelp.class;
       VAR name sex age height weight;
       FORMAT sex $gender_respondent. ;
    RUN;

PROC FORMAT now allows you to define multilabel formats, which assign more than one label to the same value. These formats may be written and applied with the procedures MEANS, SUMMARY, and TABULATE. They allow you to compute summary statistics for individual values or for multiple, possibly overlapping, ranges. You may read about this new feature at:

http://support.sas.com/onlinedoc/912/getDoc/proc.hlp/a002473472.htm

The FREQ Procedure

The FREQ procedure allows you to summarize categorical data.
The NLEVELS option, now available on the PROC FREQ statement, prints a table that shows the number of levels for each variable named in the TABLES statement.

    PROC FREQ DATA=sashelp.class Nlevels;
       TABLES sex;
    RUN;

The top portion of the output now includes the following summary:

    The FREQ Procedure

    Number of Variable Levels

    Variable      Levels
    --------------------
    sex                2

ODS for Statistical Graphics

You can read an introduction and review new features of the Output Delivery System (ODS) at:

http://darkwing.uoregon.edu/~robinh/110_ods.txt

New Statistical Procedures

Version 9 includes several statistical procedures which are new to SAS/STAT software.

Robust Regression

The new ROBUSTREG procedure (robust regression) computes stable regression results in the presence of outliers by limiting their influence. Two of the most commonly employed methods found here are Huber’s M estimation and LTS estimation.

Power and Sample Size Analysis

Power analysis often requires intuition and insight into your data, in addition to the values entered into the procedures. It also requires you to think about analysis issues even before any data are collected. PROC POWER and PROC GLMPOWER provide power and sample size computations for several types of prospective analyses from a variety of statistical models, with the following objectives:

* to determine the sample size required to detect an effect size with specified power
* to characterize the power of a study to detect a desired effect size for a given number of subjects
* to assess the sensitivity of power or sample size calculations to other factors

Here is example SAS code to run a power analysis for the two-sample t-test that reproduces the results found in the first few rows of Table 2.3.5 in “Statistical Power Analysis for the Behavioral Sciences” by Jacob Cohen. Note that Cohen lists the number of subjects in one group, whereas SAS computes the total number from both groups.
    PROC POWER;
       TWOSAMPLEMEANS TEST = diff
          ALPHA    = .05
          SIDES    = 2
          MeanDiff = .1 .2 .3 .4 .5 .6 .7 .8 1.0 1.2 1.4
          STDDEV   = 1
          NTotal   = 16 to 50 by 2   /* total in the 2 independent groups */
          POWER    = . ;
    RUN;

Keeping statements such as these for PROC POWER or PROC GLMPOWER lets you reproduce the same or similar power analyses later simply by changing the input values. Power calculations can be saved to a dataset with ODS and then plotted with the SAS/GRAPH procedure GPLOT.

New Survey Analysis Procedures

In previous versions of SAS the statistical procedures MEANS, FREQ, and TABULATE were the primary choice for computing summary statistics from survey data; the procedures REG, ANOVA, or LOGISTIC computed estimates for regression or analysis of variance relationships. Statistical inferences from these procedures assume that subjects are selected by simple random sampling from an infinite population. If data are sampled from a population with a known finite size or from a stratified design, as is often the case with surveys, the statistical procedures listed above will not calculate the estimates and their variances properly, especially if the sample includes a substantial proportion of the subjects in the entire population. Analyses of survey data which do not consider the sample design and the population size can lead to incorrect statistical inferences.

SAS 9.1 introduces several new procedures for the analysis of survey data. Your data analyses may be improved with PROC SURVEYMEANS instead of PROC MEANS, PROC SURVEYREG instead of PROC REG, or PROC SURVEYLOGISTIC instead of PROC LOGISTIC. The estimated variances computed with them may be quite different from what you would compute with the previously existing procedures, since the calculations come from formulas based on sample survey algorithms. For more information on new and existing procedures for survey sampling, see Chapter 10 of the SAS/STAT manual.
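
The contrast between the survey procedures and the older ones can be sketched with a small stratified sample. Everything in this example is an assumption made for illustration: the dataset names, the variables stratum, income, and wt, the data values, and the stratum population sizes. The stratum totals are passed to PROC SURVEYMEANS through its TOTAL= option, which expects the population counts in a variable named _TOTAL_.

```sas
* Hypothetical survey data: names, values, and weights are assumptions for illustration;
DATA survey;
   INPUT stratum $ income wt;
   DATALINES;
A 42000 50
A 38000 50
A 51000 50
B 61000 100
B 58000 100
B 70000 100
;

* Assumed population size of each stratum, read by the TOTAL= option;
DATA strata_totals;
   INPUT stratum $ _total_;
   DATALINES;
A 150
B 300
;

* Design-based estimate that uses the strata, the weights, and the population sizes;
PROC SURVEYMEANS DATA=survey TOTAL=strata_totals MEAN STDERR CLM;
   STRATA stratum;
   WEIGHT wt;
   VAR income;
RUN;

* Same variable treated as a simple random sample from an infinite population;
PROC MEANS DATA=survey MEAN STDERR CLM;
   VAR income;
RUN;
```

Comparing the standard errors from the two procedures gives a rough sense of how much the sample design and the finite population size matter for a particular data set.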