
Why do we need "Transforming" the dataset structure?Fate rarely calls upon us at a moment of our choosing.
---Prime, Transformers Revenge of the Fallen
Lots of reasons. (it's our fate!)
Like after proc freq or means, you need to transform the statistics for various tables.
Basically:
- You may want to create multiple observations from a single observation to count frequencies, or to allow for BY variable processing.
- You may also want to restructure SAS data sets for certain statistical analyses.
Example:
Here we have a data set "salary" from another "jobs". The data set is more like "wide", horizontal structure.
First, we will "transform" it to "deep", vertical structure like this:
Secondly, we will "transform" it back to horizontal structure.
Code:
We will do the following things:
- 0. input the data
- 1.1 transform the data to vertical by array
- 1.2 transform back to horizontal by array
- 2.1 transform the data to vertical by transpose
- 2.2 transform the back to horizontal by transpose
dm 'clear output';
dm 'clear log';
/*==============================================================================================*/
/* Programe : transformer.sas */
/* Objective : Transform Data Set Structure by array and transpose */
/* Author : Ray */
/* SAS Version : 9.2 */
/*==============================================================================================*/
/*----------------------------------------*/
/* 0 Input data */
/*----------------------------------------*/
data jobs;
infile datalines missover;
input #1 Info_Company $20.
#2 Score_Personal 10.2
#3 Info_Web $50.
#4 Info_HQ $50.
#5 Info_Country $4.
#6 Info_Employees 10.
#7 Job_Statistician $50.
#8 Salary_Stat 10.2
#9 Job_Programmer $50.
#10 Salary_Pro 10.2
#11 Job_Manager $50.
#12 Salary_Man 10.2;
datalines;
ICON
60.00
www.iconplc.com
Dublin, Ireland
IRL
5000
Statistician I
SAS Programmer Analyst I
59553.00
Group Manager
90000.00
Covance
80.00
www.covance.com
Princeton, NJ
USA
11200
Bio Statistician I
59000.00
SAS Programmer I
51872.00
Manager
87029.00
Medpace
81.00
www.medpace.com
Cincinnati, OH
USA
990
Bio Statisitican I
70418.00
SAS Programmer I
40000.00
Clinical Trial Manager
84027.00
Shire Pharmaceuticals
90.00
www.shire.com
Basingstoke, United Kingdom
GBR
5000
Statistician I
SAS Programmer I
61000.00
Manager
96000.00
TechData Service
90.01
www.techdataservice.com
King of Prussia, PA
USA
5000
Statistician
SAS Programmer
86667.00
Manager
;
run;
* subset data set to salary;
data salary;
set jobs (keep = Info_Company Job_Statistician Job_Programmer Job_Manager
Salary_Stat Salary_Pro Salary_Man);
run;
* sort the salary;
proc sort data = salary;
by Info_Company;
run;
* print out;
proc print data = salary;
run;
/*--------------------------------------------------------------*/
/* 1.1 Transform the structure from horizontal to vertical */
/* In other words: from wide to deep */
/*--------------------------------------------------------------*/
* create salary_vertical;
data salary_vertical;
set salary;
array job{3} Job_Statistician Job_Programmer Job_Manager ;
array sal{3} Salary_Stat Salary_Pro Salary_Man;
do i = 1 to 3;
Position = job{i};
Salary = sal{i};
output;
end;
keep Info_Company Position Salary;
run;
* change the Salary from numeric to character;
data salary_ver;
set salary_vertical;
sal = put(Salary, 10.2);
drop Salary;
rename sal=Salary;
run;
proc print data = salary_vertical;
run;
/*----------------------------------------------------------*/
/* 1.2 Transform back to horizontal */
/*----------------------------------------------------------*/
* transform the structure from vertical back to horizontal;
data salary_horizontal;
set salary_ver;
by Info_Company;
if first.Info_Company then do;
i = 0;
array all(3, 3) $50. a1-a9;
array var{3} $50. Info_Company Position Salary;
retain a1-a9;
end;
i + 1;
do j = 1 to 3;
all(i, j) = var(j);
end;
if last.Info_Company then output;
rename a2 = Job_Statistician
a3 = Salary_Stat
a5 = Job_Programmer
a6 = Salary_Pro
a8 = Job_Manager
a9 = Salary_Man
;
keep Info_Company a2 a3 a5 a6 a8 a9;
run;
* sort the dataset;
proc sort data = salary_horizontal;
by Info_Company;
run;
proc print data = salary_horizontal;
run;
/*------------------------------------------------------------------------------*/
/* 2.1 Transform the structure from horizontal to vertical by proc transpose */
/*------------------------------------------------------------------------------*/
proc transpose data = salary
out = first
prefix = Position;
by Info_Company;
var Job_Statistician Job_Programmer Job_Manager;
run;
proc transpose data = salary
out = second
prefix = Salary;
by Info_Company;
var Salary_Stat Salary_Pro Salary_Man;
run;
* merge the first and second;
* sort before you merge. but it's already been sorted;
data salary_ver_transpose;
merge first second;
by Info_Company;
drop _NAME_;
rename Position1 = Position
Salary1 = Salary
;
run;
/*----------------------------------------------------------*/
/* 2.2 Transform back to horizontal by proc transpose */
/*----------------------------------------------------------*/
proc transpose data = salary_ver_transpose
out = third
prefix = Position;
by Info_Company;
var Position;
run;
proc transpose data = salary_ver_transpose
out = fourth
prefix = Salary;
by Info_Company;
var Salary;
run;
* sort before you merge. but it's already been sorted;
data salary_hor_transpose;
keep Info_Company Position1 Salary1 Position2 Salary2 Position3 Salary3;
format Info_Company Position1 Salary1 Position2 Salary2 Position3 Salary3;
merge third fourth;
by Info_Company;
rename Position1 = Job_Statistician
Salary1 = Salary_Stat
Position2 = Job_Programmer
Salary2 = Salary_Pro
Position3 = Job_Manager
Salary3 = Salary_Man
;
run;
Good job! Now you've got the power to save the world. Go! Transformers!
No comments:
Post a Comment