Example 4 - Spreadsheet functions - Ranking, weighted ranking, order of formula processing.
Other features illustrated: Summing calculation, hiding columns, creating column spanners, viewing hidden columns, IF conditions
Background: This example shows how you can use ranking
and apply weights to those ranks in order to answer questions like
"What is the best county for ...?"
Goals: 1) Create a table that will rank the best
county in the US in which to live, based on our 3 criteria - commute
time, median household income, and median housing value. 2) Limit our
rankings to counties that meet our population size criteria - between
50,000 and 150,000 people.
Steps:
Goal 1
- Open the Ferrett Session File (or saved data basket) named sf3_bestcounty_tolive.fsf .
- Go to the Step 2 tab to see the variables in the data basket -
number of workers not working at home, aggregate travel time to work,
median household income, median value of owner-occupied housing units,
and county (all counties in US).
- "Make a Table"
- Turn off the "Automatic Totals" (in the "Options" menu) and drag the county variable to column 1 to define the rows.
- Drag the aggregate travel time variable, P033001, to column 2 - C2, R1.
- Drag the workers variable, P031002, to column 3 - C3, R1.
- Skip 2 columns and drag the median household income variable, P053001, to column 6 - C6, R1.
- Skip 2 columns and drag the median housing unit value variable, P085001, to column 9 - C9, R1.
- "Go Get Data"
Create calculations and rankings.
- Calculate the average commute time in column 4.
- Click in the C4 column header to select the column in which to create the calculation.
- In the formula bar, enter =comp(c2/c3) == aggregate travel time divided by workers not working at home equals average travel time.
- Create the county ranking for commute time to work. The lower the commute time, the higher the rank number.
- Click the C5 column header to select the column in which to create the ranking.
- In the formula bar, enter =rank(c4)
- Create the county ranking for median household income. In this
case we want the higher value to have the higher rank number. Therefore
we will need to create an inverse income column on which to create the
ranking.
- Click the C7 column header to select the column in which to create the calculation for inverting the income.
- In the formula bar, enter =comp(0-c6)
- Click the C8 column header to select the column in which to create the ranking.
- In the formula bar, enter =rank(c7)
- Create the county ranking for median housing unit value. The
lower the median value, the higher the rank number. If for some reason
you were to want to rank the counties higher based on HIGH housing
values, then you would need to create an inverted housing value column
like we did for income.
- Click the C10 column header to select the column in which to create the ranking.
- In the formula bar, enter =rank(c9)
Hide Columns
Since several of our columns were used to
create our desired columns, we can hide them. In our example, we do not
need to see the columns used to calculate the commute time or the
inverted income column used for ranking income.
- Click the C2 column header to select the column you wish to hide, then select the "Edit" menu, then select "Hide".
- Repeat for columns 3 and 7. Currently you can only highlight one column at a time.
- If you ever want to see the hidden columns, select the "View" menu, then select "Hidden Rows/Columns".
- If you ever want to change a hidden column back to not being
hidden (unhide), first view the hidden columns as in the previous step,
then highlight the column, then select the "Edit" menu and select
"Unhide". The "Unhide" selection allows you to unhide only the selected
column, or unhide all hidden columns.
Create Custom Column Spanners
You can create your own
custom column spanners to group columns together. In our example, we
can create spanners to group the three different measures and their
rankings together.
- Select the "Edit" menu, then select "Column Spanners". This opens
the "Column Spanner Editing Dialog" box. There are three fields for you
to supply - 1) the text for the spanner, 2) the beginning column, and
3) the ending column. See the image.
Screenshot of column spanner creation:
- Hit the "Insert" button at the bottom to enter the first spanner.
- Click in the empty "Column Spanner" field and type in your first spanner - Commuting - then hit the "Enter" key.
- Click in the empty "From" column and enter the column number the spanner should start with - 2 - then hit the "Enter" key.
- Click in the empty "To" column and enter the column number where the spanner should end - 5 - then hit the "Enter" key.
- Hit the "Insert" button at the bottom to enter another spanner.
- Enter the information as show in the image above.
- Once the three spanners have been entered, hit the "OK" button.
Create the Combined Ranking
Now we want to sum all three rankings together to find which counties
rank the best based on the three criteria. The summed rankings will
show us which counties have the shortest commute times, the highest
median income, and the lowest median housing values.
- Click in the C11 column header to select the column in which to create the summed ranking.
- In the formula bar, enter =sum(c5,c8,c10) == this sums the rankings together.
- A small dialog box will appear showing the computations being done and the last one shows a conflict. See the image.
Screenshot of calculation sequence conflict dialog:
The computations must happen in order, or they cannot be completed
properly. In this case, all previous ranking functions must be
completed prior to our summing the ranks together, therefore we want to
change the conflicting computation's sequence.
- Click in the "Sequence" cell of the "Conflict" row and change
the 0 to a 2 and hit the "Enter" key. This will make this computation
occur last.
- Hit the "OK" button.
To see which county ranks the highest or lowest, highlight column 11 and use the sort buttons in the toolbar.
Adding Weights to the Ranking Criteria
We can customize
our final ranking by weighting the criteria that are of more importance
to us. For example, if the commute time is the most important criterion
to us, we can weight it higher than the others. The weights must add up
to 1. We will weight commute time as 0.5, and the other two as 0.25
each.
- Unsort the table by hitting the "Return to original order" button next to the sort buttons on the toolbar.
- Click in the C12 column header to select the column in which to create the weighted ranking.
- In the formula bar, enter =comp((c5*0.5)+(c8*0.25)+(c10*0.25)) == this weights the ranks and sums them together.
- We notice that we get all 0s in our table. We have a sequence
conflict again, but we were not notified the this time. We can change
the sequence manually.
- Go to the "Options" menu and select the "Specify Sequence"
option. This will bring up the dialog we saw before. Now, we want to
change the last two calculations to have a sequence of 2, making sure
to hit the "Enter" key after each change. See the image.
Screenshot of calculation sequence specification dialog:
Once the sequence is changed, the weighted rankings appear. You can
easily change these weighted rankings by changing the weights in the
formula for column 12. To edit a formula, simply highlight the column
and the formula should appear in the formula bar. Edit the formula and
hit the "Enter" key and the changes will be applied.
You can sort the table to find the best and worst counties based on your weighted ranks.
Goal 2
If we only want to see the best counties of a certain size, we can
include the population variable and then use an IF condition in a
formula to remove counties from the top of the list.
- DO NOT CLOSE your spreadsheet window.
- Go back to the Step 1 tab and find the P1. Total Population table in Summary File 3.
- Get the variable, and add the P001001 variable to your data basket.
- Now go back to the spreadsheet window and you will see the P001001 variable in your variable list.
- Drag P001001 into column 13. Hit the "Go Get Data" button.
- Click in the C14 column header to select the column in which to create the weighted ranking for counties of a certain size.
- In the formula bar, enter =IF(((C13>50000)AND(C13<250001)),C12,3500)
== this condition will use the weighted rank (C12) if the county
population is between 50,001 and 250,000 and if the population is
smaller or greater than that, it will set the rank to 3500 (an
arbitrary number higher than the highest county rank).
- We notice that we get all 0s and 3500s in our table. We have
a sequence conflict again, and again we were not notified. We must
change the sequence manually.
- Go to the "Options" menu and select the "Specify Sequence"
option. This will bring up the dialog we saw before. Now, we want to
change the last three calculations - two to have a sequence of 2, and
the last one to a sequence of 3, making sure to hit the "Enter" key
after each change. See the image.
Screenshot of calculation sequence specification dialog:
Once the sequence is changed, we now can sort the list and see the best
counties based on the weighted criteria we used and limiting the
counties to the size we specified.