Tuesday, April 16, 2013

Transforming the data structure

Image and video hosting by TinyPic
Fate rarely calls upon us at a moment of our choosing. 
---Prime, Transformers Revenge of the Fallen
Why do we need "Transforming" the dataset structure?
Lots of reasons. (it's our fate!)
Like after proc freq or means, you need to transform the statistics for various tables.

Basically: 
  • You may want to create multiple observations from a single observation to count frequencies, or to allow for BY variable processing.
  • You may also want to restructure SAS data sets for certain statistical analyses. 
The routine would be proc transpose. But array would be also good to know.

Example: 
Here we have a data set "salary" from another "jobs". The data set is more like "wide", horizontal structure.
Image and video hosting by TinyPic
First, we will "transform" it to "deep", vertical structure like this:
Image and video hosting by TinyPic
Secondly, we will "transform" it back to horizontal structure.
Image and video hosting by TinyPic

Code: 
We will do the following things:
  • 0. input the data
  • 1.1 transform the data to vertical by array
  • 1.2 transform back to horizontal by array
  • 2.1 transform the data to vertical by transpose
  • 2.2 transform the back to horizontal by transpose
dm 'clear output'; 
dm 'clear log'; 

/*==============================================================================================*/
/* Programe    : transformer.sas      */
/* Objective  :  Transform Data Set Structure by array and transpose */
/* Author   : Ray       */
/* SAS Version  :  9.2              */
/*==============================================================================================*/

/*----------------------------------------*/
/* 0 Input data             */
/*----------------------------------------*/
data jobs; 
    infile datalines missover;
    input #1 Info_Company   $20.
  #2  Score_Personal   10.2
  #3  Info_Web          $50.
  #4  Info_HQ    $50.
  #5 Info_Country   $4.
  #6 Info_Employees   10.
  #7 Job_Statistician  $50.
  #8 Salary_Stat   10.2
  #9 Job_Programmer   $50.
  #10 Salary_Pro          10.2
  #11     Job_Manager   $50.
  #12     Salary_Man   10.2; 
datalines; 
ICON
60.00
www.iconplc.com
Dublin, Ireland
IRL
5000
Statistician I

SAS Programmer Analyst I
59553.00
Group Manager
90000.00
Covance
80.00
www.covance.com
Princeton, NJ
USA
11200
Bio Statistician I
59000.00
SAS Programmer I
51872.00
Manager
87029.00
Medpace
81.00
www.medpace.com
Cincinnati, OH
USA
990
Bio Statisitican I
70418.00
SAS Programmer I
40000.00
Clinical Trial Manager
84027.00
Shire Pharmaceuticals
90.00
www.shire.com
Basingstoke, United Kingdom
GBR
5000
Statistician I

SAS Programmer I
61000.00
Manager
96000.00
TechData Service
90.01
www.techdataservice.com
King of Prussia, PA
USA
5000
Statistician

SAS Programmer
86667.00
Manager

; 
run; 

* subset data set to salary; 
data salary; 
    set jobs (keep = Info_Company  Job_Statistician  Job_Programmer  Job_Manager 
              Salary_Stat      Salary_Pro  Salary_Man); 
run; 

* sort the salary; 
proc sort data = salary; 
    by Info_Company; 
run; 

* print out; 
proc print data = salary; 
run; 

/*--------------------------------------------------------------*/
/* 1.1 Transform the structure from horizontal to vertical */
/* In other words:   from wide to deep  */
/*--------------------------------------------------------------*/
* create salary_vertical; 
data salary_vertical; 
    set salary; 
    array job{3} Job_Statistician    Job_Programmer Job_Manager ; 
    array sal{3} Salary_Stat      Salary_Pro   Salary_Man; 
    do i = 1 to 3; 
        Position = job{i}; 
 Salary = sal{i}; 
 output; 
    end; 
    keep Info_Company Position Salary; 
run; 

* change the Salary from numeric to character; 
data salary_ver;
    set salary_vertical; 
    sal = put(Salary, 10.2); 
    drop Salary; 
    rename sal=Salary; 
run; 

proc print data = salary_vertical; 
run; 

/*----------------------------------------------------------*/
/* 1.2 Transform back to horizontal              */
/*----------------------------------------------------------*/
* transform the structure from vertical back to horizontal; 
data salary_horizontal; 
    set salary_ver; 
    by Info_Company; 
    if first.Info_Company then do; 
        i = 0; 
 array all(3, 3) $50. a1-a9; 
 array var{3} $50. Info_Company Position Salary; 
 retain a1-a9; 
    end; 
 i + 1; 
 do j = 1 to 3; 
     all(i, j) = var(j); 
 end; 
    if last.Info_Company then output; 
    rename  a2 = Job_Statistician
  a3 = Salary_Stat
  a5 = Job_Programmer
  a6 = Salary_Pro
  a8 = Job_Manager
  a9 = Salary_Man
    ; 
    keep Info_Company a2 a3 a5 a6 a8 a9; 
run; 

* sort the dataset; 
proc sort data = salary_horizontal; 
    by Info_Company; 
run; 

proc print data = salary_horizontal; 
run; 

/*------------------------------------------------------------------------------*/
/* 2.1 Transform the structure from horizontal to vertical by proc transpose */
/*------------------------------------------------------------------------------*/
proc transpose data = salary 
 out = first
 prefix = Position; 
 by Info_Company; 
 var Job_Statistician Job_Programmer Job_Manager; 
run; 

proc transpose data = salary 
 out = second
 prefix = Salary; 
 by Info_Company; 
 var Salary_Stat Salary_Pro Salary_Man; 
run; 

* merge the first and second; 
* sort before you merge. but it's already been sorted; 
data salary_ver_transpose; 
 merge first second; 
 by Info_Company; 
 drop _NAME_; 
 rename  Position1 = Position
 Salary1 = Salary
 ; 
run; 

/*----------------------------------------------------------*/
/* 2.2 Transform back to horizontal by proc transpose     */
/*----------------------------------------------------------*/
proc transpose data = salary_ver_transpose
 out = third
 prefix = Position; 
 by Info_Company; 
 var Position; 
run; 

proc transpose data = salary_ver_transpose
 out = fourth
 prefix = Salary;
 by Info_Company; 
 var Salary; 
run; 

* sort before you merge. but it's already been sorted; 
data salary_hor_transpose; 
 keep Info_Company Position1 Salary1 Position2 Salary2 Position3 Salary3; 
 format Info_Company Position1 Salary1 Position2 Salary2 Position3 Salary3; 
 merge third fourth; 
 by Info_Company;
 rename  Position1 = Job_Statistician
  Salary1 = Salary_Stat
  Position2 = Job_Programmer
  Salary2 = Salary_Pro
  Position3 = Job_Manager
  Salary3 = Salary_Man
 ; 
run; 


Good job!
Now you've got the power to save the world. Go! Transformers!

No comments:

Post a Comment