Thursday, May 9, 2013

QC by proc format

Quality control is a very important part of a SAS programmer's job. Sometimes we want to find unexpected values hidden in a data set. Proc format is a good way to do that.
Suppose we have a lab test data set "eg".
We know the expected range for each of the variables BASOS, HCT and HGB. Any value that falls outside its range needs to be flagged.
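The contents of "eg" are not reproduced in the text, so to try the steps in this post you could build a small stand-in. The sketch below is only that: the values are made up for illustration, and each of the three out-of-range rows trips a different variable.

```sas
/* Made-up values for illustration only; not the data from the original post. */
data eg;
    input basos hct hgb;
cards;
2.0 42 14
0.5 45 13
3.0 65 15
4.0 38 22
2.5 40 12
;
run;
```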
Here is a simple way to flag out-of-range values with proc format:
/* 99 = outside the expected range; 1 = everything else (missing values also fall under other) */
proc format; 
    value basos
        low - 1, 
        6 - high = 99
        other = 1; 
    value hct
        low - 20, 
        60 - high = 99
        other = 1; 
    value hgb
        low - 5, 
        20 - high = 99
        other = 1; 
run; 
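One detail worth checking: a range written as `low - 1` or `6 - high` includes its endpoint, so these formats treat a value of exactly 1 or 6 as out of range. If the limits themselves should count as acceptable (an assumption; the post does not say), the `<` exclusion operator can be used. A minimal sketch for BASOS, with `basosx` as a hypothetical format name:

```sas
proc format;
    value basosx
        low -< 1,            /* below 1; 1 itself is excluded */
        6 <- high = 99       /* above 6; 6 itself is excluded */
        other     = 1;
run;
```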
We can either create a new variable, diag, to flag the unexpected values, or use proc freq.
1. The "egDiag" data set:
data egDiag;
    set eg; 
    /* put() returns the character labels '99' or '1', so compare them as text */
    diag = (put(basos, basos.) = '99'
         or put(hct,   hct.)   = '99'
         or put(hgb,   hgb.)   = '99'); 
run; 
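Once diag exists, the flagged rows can be listed for review. A short sketch, assuming the egDiag data set created by the data step above:

```sas
/* List only the observations with at least one out-of-range value */
proc print data = egDiag;
    where diag = 1;
run;
```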
2. Proc Freq:
proc freq data = eg; 
    tables basos hct hgb; 
    format basos basos. 
           hct hct.
           hgb hgb.
    ; 
run; 

Wednesday, May 1, 2013

Yearcutoff

If you read in two-digit years, you will run into the system option "yearcutoff".
How do you control the century prefix? For example, given the two-digit year "18", how do you get "1918" rather than "2018"?
First, let's look at the option "yearcutoff".
Here is a good explanation:
Before you use the YEARCUTOFF= system option, examine the dates in your data:
  • If the dates in your data fall within a 100-year span, you can use the YEARCUTOFF= system option.
  • If the dates in your data do not fall within a 100-year span, you must either convert the two-digit years to four-digit years or use a DATA step with conditional logic to assign the proper century prefix.
Once you've determined that the YEARCUTOFF= system option is appropriate for your range of data, you can determine the setting to use. The best setting for YEARCUTOFF= is a year just slightly lower than the lowest year in your data. For example, if you have data in a range from 1921 to 1999, set YEARCUTOFF= to 1920, if that is not already your system default. The result of setting YEARCUTOFF= to 1920 is that
  • SAS interprets all two-digit dates in the range of 20 through 99 as 1920 through 1999.
  • SAS interprets all two-digit dates in the range 00 through 19 as 2000 through 2019.
The following figure shows the span of years when the YEARCUTOFF= option is set to a value of 1920. The 100-year span in this case is from 1920 to 2019.
Span of Years When the YEARCUTOFF= Option Is Set to 1920
[IMAGE]
With YEARCUTOFF= set to 1920, a two-digit year of 10 would be interpreted as 2010, and a two-digit year of 22 would be interpreted as 1922.
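The rule described above can be written out by hand to see how the century is chosen. This is only an illustrative sketch of the logic, not how SAS implements it internally; the variable names cutoff, yy and year4 are made up for the example.

```sas
/* Hand-rolled version of the two-digit-year rule; for illustration only */
data _null_;
    cutoff = 1920;
    do yy = 10, 22;
        /* try the century of the cutoff first */
        year4 = int(cutoff / 100) * 100 + yy;
        /* anything before the cutoff belongs to the next century */
        if year4 < cutoff then year4 = year4 + 100;
        put yy= year4=;
    end;
run;
```

With a cutoff of 1920, the log should show 2010 for yy=10 and 1922 for yy=22, matching the example in the text.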
Learn by example
Now let's code. 
/* macro variable to store the path */
%let path=C:\Documents and Settings\Administrator\My Documents\Dropbox\Transfer\tech post\yearcutoff; 

options yearcutoff = 1820; 

proc options option = yearcutoff; 
run; 

data year_1820; 
 input month day year; 
 bday = mdy(month, day, year); 
 format bday date9.; 
cards; 
7 11 18
7 11 48
1 1 60 
; 
run; 

options yearcutoff = 1920; 

proc options option = yearcutoff; 
run; 

data year_1920; 
 input month day year; 
 bday = mdy(month, day, year); 
 format bday date9.; 
cards; 
7 11 18
7 11 48
1 1 60 
; 
run; 
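Result
With YEARCUTOFF=1820 (data set year_1820), the two-digit years 18, 48 and 60 are read as 1918, 1848 and 1860. With YEARCUTOFF=1920 (data set year_1920), the same values are read as 2018, 1948 and 1960.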
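The option is not specific to mdy(); it also applies when a date informat reads a two-digit year, and it has no effect once the year is written with four digits. A small sketch of that contrast, with made-up variable names:

```sas
options yearcutoff = 1920;

data _null_;
    two_digit  = input('07/11/18',   mmddyy8.);   /* century comes from YEARCUTOFF= */
    four_digit = input('07/11/1918', mmddyy10.);  /* century is explicit */
    put two_digit= date9. four_digit= date9.;
run;
```

Under this setting the first value should come out as 11JUL2018 and the second as 11JUL1918, so supplying four-digit years is the more robust choice whenever the source data allows it.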