Calculate Medians, IF Conditions
Example 7 - Calculating medians (or any percentile) on-the-fly
Other features illustrated: Column formulas and IF conditions
Background: The DataFerrett spreadsheet allows you to calculate medians (or any percentile) on-the-fly by
creating your own bins (microdata) or utilizing income distribution counts (aggregate data), and using standard formulas to
calculate the linear percentile you have defined within a given distribution.
Goal: Create a table that calculates the median income for age groups (for the population 15 and over)
using the March CPS supplement(Annual Social and Economic Supplement)
- Open the Ferrett Tabulation File named cps_income_percentile+mean.ftf
- "Go Get Data" to see the tabulation.
- This tabulation calculates the median (the 50th percentile) which is defined in column 2 (C2) as 0.5. This value can be changed to any percentile you want (number greater than 0 and less than 1, e.g 0.25 for first quartile). To change the value, click in the gray C2 column header. This will show the formula in the formula bar. Edit the formula.
- This table uses the 2006 March file (which reports income for 2005). To calculate for a different year, leave the spreadsheet window open and go to Step 1 in the DataFerrett window to select a different year. Then go back to the spreadsheet window and "Go Get Data".
- You can also calculate for different ages or other variables. To do so, you either select new variables or create new recodes (while the spreadsheet window is open). New variables or recodes will become available in the spreadsheet window. Then, go to "Edit - Clear > All rows" to clear the existing age variable. Then drop your variable(s) into column 1 and "Go Get Data".
- This table's universe eliminates income values of zero through the Advanced SQL Option in the Step 2 tab of DataFerrett. If you wanted to include these values, or if you wanted to determine the medians for only positive income (current table includes negative income), you can do so in the Advanced SQL Option. The existing SQL universe is
(AGE1 in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17) and ((PTOTVAL between -999999 and -1) or (PTOTVAL between 1 and 999999)))
If you wanted to include the zero PTOTVAL values, you could simply remove everything related to PTOTVAL like:
(AGE1 in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17))
Or if you wanted to use only positive income values, change the universe to: (AGE1 in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17) and (PTOTVAL between 1 and 999999))
These percentile calculations are not statistically accurate and will not match published numbers exactly, although they should be within the confidence interval.