Monday, April 29, 2013

SAS functions

Image and video hosting by TinyPic
What's function? 
In wiki:
In mathematics, a function is a relation between a set of inputs and a set of permissible outputs with the property that each input is related to exactly one output.
What about SAS functions? 
Image and video hosting by TinyPic<
As you can see, SAS 9.2 now has nearly 500 functions available. And SAS 9.2 brings one of the greatest improvement---user-defined functions.
It's no need to say how important to learn the SAS functions if you want to improve your sas programming.
SAS/BASE provides two types of powerful functions (from SAS.com) :
1. Function
"A SAS function performs a computation or system manipulation on arguments, and returns a value that can be used in an assignment statement or elsewhere in expressions.
In Base SAS software, you can use SAS functions in DATA step programming statements, in a WHERE expression, in macro language statements, in PROC REPORT, and in Structured Query Language (SQL).
Some statistical procedures also use SAS functions. In addition, some other SAS software products offer functions that you can use in the DATA step. Refer to the documentation that pertains to the specific SAS software product for additional information about these functions."
Examples:
  • x=max(cash,credit);
  • x=sqrt(1500);
  • NewCity=left(upcase(City));
  • x=min(YearTemperature-July,YearTemperature-Dec);
  • s=repeat('----+',16);
  • x=min((enroll-drop),(enroll-fail));
  • dollars=int(cash);
  • if sum(cash,credit)>1000 then
       put 'Goal reached';
2. Call routine
"A CALL routine alters variable values or performs other system functions. CALL routines are similar to functions, but differ from functions in that you cannot use them in assignment statements or expressions.
All SAS CALL routines are invoked with CALL statements. That is, the name of the routine must appear after the keyword CALL in the CALL statement."
Examples:
  • call prxsubstr(prx,string,position);
  • call prxchange('/old/new',1+k,trim(string),result,length);
  • call set(dsid);
  • call ranbin(Seed_1,n,p,X1);
  • call label(abc{j},lab);
  • call cats(result,'abc',123);
Learn by Example
Suppose you need to import and process a messy txt file "Demog":
Image and video hosting by TinyPic
And do the following:
1. find the last name from the name
2. put (Year) after the Age
3. extract the number of weight from the WT (eg, 155lbs)
4. keep the Quote of each person with the quotation mark
5. order the scores and calculate the mean of the top three scores
6. replace the "XXX" in the "Demog" with "Unknown"
Target
Image and video hosting by TinyPic
Code
/* macro variable to store the path */
%let path=C:\Documents and Settings\Administrator\My Documents\Dropbox\Transfer\tech post\dlmstr; 

data raw; 
 /* Define the sequence of the variables and length */
 length  Name $20. Last_name $10. Gender $7. Age 3. Age_Year $15. 
   Height 3. WT $6. WT_Lbs Position $15. Quote $20.
   score1-score5 3.; 

 /* Options: truncover & dlmstr & dsd */
 infile "&path\demog.txt" truncover dlmstr="@#" dsd;

 input @;  

 /* Perl regular expressions */
 _infile_ = prxchange("s/XXX/Unknown/",-1,_infile_); 

 input          Name  $ 
   Gender  $ 
   Age 
   Height 
   WT   $ 
   Position $ 
 /* use ~ to keep quotation mark; */
   Quote ~ $    
   score1-score5; 

 /* put */
 Age_Year = put(Age, 3.)||" (Years)"; 

 /* compress */
 WT_Lbs = input( compress(WT, , "kd"), 3.); 

 /* propcase */
 Last_name = propcase( scan(Name, -1, " ") ); 

 /* call & sortn */
 call sortn(of score1-score5); 

 /* round & mean */
 top3 = round(mean(of score3-score5), 0.1); 
run; 
Discussion
Lots of things need to take a close look:
  • infile method to import txt flat file & options: truncover, dlmstr, dsd
  • Perl regular expressions : prxchange
  • automatic variable : _infile_
  • compress
  • propcase
  • scan 
  • sortn
  • call routine

Thursday, April 25, 2013

To excel

Image and video hosting by TinyPic
Here are methods for exporting data to excel:
%let gd = your_file_storage_path; 

/* 1. proc export */
proc export data = eg
 dbms = excel
 outfile = "&gd.\eg_export.xls"
 replace; 
 sheet = "page1";
run; 

/* 2. libname */
libname here "&gd."; 
libname here excel 
"&gd.\eg_libname.xls" ver=2002;
data here.eg(dblabel=yes);
 set eg;
run; 
libname here clear;

/* 3. ODS tagset */
ods tagsets.excelxp
 file  = "&gd.\eg_tagset.xls"
 style = minimal
 options (Orientation = "landscape"
 FitToPage = "yes"
 Pages_FitWidth = "1"
 Pages_FitHeight = "100"); 
proc print data = eg NOOBS; 
run; 
ods tagsets.excelxp close; 

/* 4. ODS CSV */
ods csv file = "&gd.\eg_csv.csv";
proc print data = eg;
run; 
ods csv close; 

/* 5. ODS html */
ods html body = "&gd.\eg_html.xls";
proc print data = eg; 
run; 
ods html close; 

/* 6. ODS MSOFFICE2k */
ods tagsets.msoffice2k file = "&gd.\eg_msoffice2k.xls";
proc print data = eg; 
run; 
ods tagsets.msoffice2k close; 

/* 7. DDE */

/* 8. Integrated object model */

Image and video hosting by TinyPic

Tuesday, April 16, 2013

Transforming the data structure

Image and video hosting by TinyPic
Fate rarely calls upon us at a moment of our choosing. 
---Prime, Transformers Revenge of the Fallen
Why do we need "Transforming" the dataset structure?
Lots of reasons. (it's our fate!)
Like after proc freq or means, you need to transform the statistics for various tables.

Basically: 
  • You may want to create multiple observations from a single observation to count frequencies, or to allow for BY variable processing.
  • You may also want to restructure SAS data sets for certain statistical analyses. 
The routine would be proc transpose. But array would be also good to know.

Example: 
Here we have a data set "salary" from another "jobs". The data set is more like "wide", horizontal structure.
Image and video hosting by TinyPic
First, we will "transform" it to "deep", vertical structure like this:
Image and video hosting by TinyPic
Secondly, we will "transform" it back to horizontal structure.
Image and video hosting by TinyPic

Code: 
We will do the following things:
  • 0. input the data
  • 1.1 transform the data to vertical by array
  • 1.2 transform back to horizontal by array
  • 2.1 transform the data to vertical by transpose
  • 2.2 transform the back to horizontal by transpose
dm 'clear output'; 
dm 'clear log'; 

/*==============================================================================================*/
/* Programe    : transformer.sas      */
/* Objective  :  Transform Data Set Structure by array and transpose */
/* Author   : Ray       */
/* SAS Version  :  9.2              */
/*==============================================================================================*/

/*----------------------------------------*/
/* 0 Input data             */
/*----------------------------------------*/
data jobs; 
    infile datalines missover;
    input #1 Info_Company   $20.
  #2  Score_Personal   10.2
  #3  Info_Web          $50.
  #4  Info_HQ    $50.
  #5 Info_Country   $4.
  #6 Info_Employees   10.
  #7 Job_Statistician  $50.
  #8 Salary_Stat   10.2
  #9 Job_Programmer   $50.
  #10 Salary_Pro          10.2
  #11     Job_Manager   $50.
  #12     Salary_Man   10.2; 
datalines; 
ICON
60.00
www.iconplc.com
Dublin, Ireland
IRL
5000
Statistician I

SAS Programmer Analyst I
59553.00
Group Manager
90000.00
Covance
80.00
www.covance.com
Princeton, NJ
USA
11200
Bio Statistician I
59000.00
SAS Programmer I
51872.00
Manager
87029.00
Medpace
81.00
www.medpace.com
Cincinnati, OH
USA
990
Bio Statisitican I
70418.00
SAS Programmer I
40000.00
Clinical Trial Manager
84027.00
Shire Pharmaceuticals
90.00
www.shire.com
Basingstoke, United Kingdom
GBR
5000
Statistician I

SAS Programmer I
61000.00
Manager
96000.00
TechData Service
90.01
www.techdataservice.com
King of Prussia, PA
USA
5000
Statistician

SAS Programmer
86667.00
Manager

; 
run; 

* subset data set to salary; 
data salary; 
    set jobs (keep = Info_Company  Job_Statistician  Job_Programmer  Job_Manager 
              Salary_Stat      Salary_Pro  Salary_Man); 
run; 

* sort the salary; 
proc sort data = salary; 
    by Info_Company; 
run; 

* print out; 
proc print data = salary; 
run; 

/*--------------------------------------------------------------*/
/* 1.1 Transform the structure from horizontal to vertical */
/* In other words:   from wide to deep  */
/*--------------------------------------------------------------*/
* create salary_vertical; 
data salary_vertical; 
    set salary; 
    array job{3} Job_Statistician    Job_Programmer Job_Manager ; 
    array sal{3} Salary_Stat      Salary_Pro   Salary_Man; 
    do i = 1 to 3; 
        Position = job{i}; 
 Salary = sal{i}; 
 output; 
    end; 
    keep Info_Company Position Salary; 
run; 

* change the Salary from numeric to character; 
data salary_ver;
    set salary_vertical; 
    sal = put(Salary, 10.2); 
    drop Salary; 
    rename sal=Salary; 
run; 

proc print data = salary_vertical; 
run; 

/*----------------------------------------------------------*/
/* 1.2 Transform back to horizontal              */
/*----------------------------------------------------------*/
* transform the structure from vertical back to horizontal; 
data salary_horizontal; 
    set salary_ver; 
    by Info_Company; 
    if first.Info_Company then do; 
        i = 0; 
 array all(3, 3) $50. a1-a9; 
 array var{3} $50. Info_Company Position Salary; 
 retain a1-a9; 
    end; 
 i + 1; 
 do j = 1 to 3; 
     all(i, j) = var(j); 
 end; 
    if last.Info_Company then output; 
    rename  a2 = Job_Statistician
  a3 = Salary_Stat
  a5 = Job_Programmer
  a6 = Salary_Pro
  a8 = Job_Manager
  a9 = Salary_Man
    ; 
    keep Info_Company a2 a3 a5 a6 a8 a9; 
run; 

* sort the dataset; 
proc sort data = salary_horizontal; 
    by Info_Company; 
run; 

proc print data = salary_horizontal; 
run; 

/*------------------------------------------------------------------------------*/
/* 2.1 Transform the structure from horizontal to vertical by proc transpose */
/*------------------------------------------------------------------------------*/
proc transpose data = salary 
 out = first
 prefix = Position; 
 by Info_Company; 
 var Job_Statistician Job_Programmer Job_Manager; 
run; 

proc transpose data = salary 
 out = second
 prefix = Salary; 
 by Info_Company; 
 var Salary_Stat Salary_Pro Salary_Man; 
run; 

* merge the first and second; 
* sort before you merge. but it's already been sorted; 
data salary_ver_transpose; 
 merge first second; 
 by Info_Company; 
 drop _NAME_; 
 rename  Position1 = Position
 Salary1 = Salary
 ; 
run; 

/*----------------------------------------------------------*/
/* 2.2 Transform back to horizontal by proc transpose     */
/*----------------------------------------------------------*/
proc transpose data = salary_ver_transpose
 out = third
 prefix = Position; 
 by Info_Company; 
 var Position; 
run; 

proc transpose data = salary_ver_transpose
 out = fourth
 prefix = Salary;
 by Info_Company; 
 var Salary; 
run; 

* sort before you merge. but it's already been sorted; 
data salary_hor_transpose; 
 keep Info_Company Position1 Salary1 Position2 Salary2 Position3 Salary3; 
 format Info_Company Position1 Salary1 Position2 Salary2 Position3 Salary3; 
 merge third fourth; 
 by Info_Company;
 rename  Position1 = Job_Statistician
  Salary1 = Salary_Stat
  Position2 = Job_Programmer
  Salary2 = Salary_Pro
  Position3 = Job_Manager
  Salary3 = Salary_Man
 ; 
run; 


Good job!
Now you've got the power to save the world. Go! Transformers!

Wednesday, April 10, 2013

Timestamp

Image and video hosting by TinyPic
"What's the time?"
Certainly it's very important for everyone to keep track of the time.
For SAS programmer, sometimes it's also important to label the files or certain place (like footnote or title) with "timestamp".
Here comes a simple SAS macro to perform this function:
%now( ) 
%MACRO now( fmt= DATETIME23.3 ) /DES= 'timestamp'; 
 %sysfunc( compress( %SYSFUNC( DATETIME(), &fmt))  )
%MEND;
It's a utility macro. You can put it in the autocall library whenever you need it.

Example: 
Here I put it in the title and file name for the data set "eg".
ods listing close; 
options nonumber nodate; 
ods pdf file = "C:\Bancova\eg_%now(fmt=b8601dt).pdf"; 
title "eg_%now(fmt=b8601dt)"; 
proc print data = eg; 
run; 
ods pdf close; 
ods listing; 
You will see the result:
Image and video hosting by TinyPic
For file names, the ISO 8601 format YYYYMMDDThhmmss maintains readability and also sorts in chronological order.
Image and video hosting by TinyPic

Remember: 
Although it is very short, the application of this base SAS macro requires knowledge of the following:
  • macro basics (know how to write a macro)
  • format (know how to define a date format)
  • %sysfunc (one of the most important tools to the macro laguage since the %let statement)
  • ISO 8601 standard (It's compliant with CDISC, FYI)
  • ODS pdf (This destination is part of the ODS PRINTER family) 

Simple Quiz: 
  • Why do we need compress inside the %now( ) ? 
References:
Does your SAS code know what time it is? Add a timestamp.