Tags: r, dataframe, max, min, r-rownames

USArrests data.frame in R: which state (row) has the smallest and the largest crime rate (column)?


I am using the USArrests data.frame in R, and for each crime (Murder, Assault and Rape) I need to find which state has the smallest and which has the largest rate. I guess I have to calculate the max and min for each crime, and I have done that:

    which(USArrests$Murder == min(USArrests$Murder))
    [1] 34

The problem is that I cannot retrieve the name of the state in row 34, only the whole row:

    USArrests[34, ]
                 Murder Assault UrbanPop Rape
    North Dakota    0.8      45       44  7.3

I am just getting started with R, so can anyone help me, please?


Solution

  • I would usually suggest a different approach to a problem like this, but for simplicity I'm going to offer the following solution and maybe come back later with a better thought-out one.

    You can use the attributes() function to list the attributes of a data frame.

    For example:

    attributes(USArrests)
    

    will give you the following output.

    $names
    [1] "Murder"   "Assault"  "UrbanPop" "Rape"    
    
    $class
    [1] "data.frame"
    
    $row.names
     [1] "Alabama"        "Alaska"         "Arizona"        "Arkansas"       "California"     "Colorado"      
     [7] "Connecticut"    "Delaware"       "Florida"        "Georgia"        "Hawaii"         "Idaho"         
    [13] "Illinois"       "Indiana"        "Iowa"           "Kansas"         "Kentucky"       "Louisiana"     
    [19] "Maine"          "Maryland"       "Massachusetts"  "Michigan"       "Minnesota"      "Mississippi"   
    [25] "Missouri"       "Montana"        "Nebraska"       "Nevada"         "New Hampshire"  "New Jersey"    
    [31] "New Mexico"     "New York"       "North Carolina" "North Dakota"   "Ohio"           "Oklahoma"      
    [37] "Oregon"         "Pennsylvania"   "Rhode Island"   "South Carolina" "South Dakota"   "Tennessee"     
    [43] "Texas"          "Utah"           "Vermont"        "Virginia"       "Washington"     "West Virginia" 
    [49] "Wisconsin"      "Wyoming"     
    

    So now we know the data frame has 'names' (the column names: the three charges plus UrbanPop, the percentage of the population living in urban areas), 'row.names' (the state names) and a 'class' of data.frame. As a newcomer to R, note that in the printed results above, the number in square brackets at the start of each line is the index of the first element on that line, not a label for every element. This will make more sense in the final step.

    Using this knowledge, we can pull out just the state names:

    attributes(USArrests)$row.names
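    Base R also has dedicated accessors for these attributes: rownames() (or row.names()) for the row names and names() for the column names. A minimal sketch, assuming only base R and the built-in USArrests data, checking that rownames() returns the same vector as attributes()$row.names:

```r
# USArrests ships with base R (package "datasets"), so no library() call is needed.
states_via_attributes <- attributes(USArrests)$row.names
states_via_rownames   <- rownames(USArrests)

# Both routes return the same character vector of 50 state names.
identical(states_via_attributes, states_via_rownames)

# names() returns the column names: the three charges plus UrbanPop.
names(USArrests)
```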
    

    To get the 34th state in the list, the row you identified as North Dakota, we simply index the vector with that position:

    attributes(USArrests)$row.names[34]
    

    This gives:

    [1] "North Dakota"
    

    Again, this is probably not the most elegant way of doing this, but it will work for your scenario.
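    Rather than typing 34 by hand, the index returned by the which() call from the question can be passed straight into the row-name vector. A small base-R sketch (the variable names idx and lowest_murder_state are just illustrative):

```r
# Position(s) of the lowest Murder rate, computed exactly as in the question.
idx <- which(USArrests$Murder == min(USArrests$Murder))

# Use that position to look up the state name instead of printing the whole row.
lowest_murder_state <- attributes(USArrests)$row.names[idx]
lowest_murder_state
```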

    Hope this helps and happy coding.

    EDIT

    As I mentioned, there's usually a more elegant and efficient way of doing things. Here is another way of achieving your goal:

    row.names(USArrests)[which.min(USArrests$Murder)]
    

    You can probably see what is happening here: which.min() returns the position of the lowest Murder value, and we use that position to pick out the matching row name. Again, this gives:

    [1] "North Dakota"
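    One caveat before applying this to other columns: which.min() and which.max() return only the first position of the extreme value, so a tie would hide the other states. The which(x == min(x)) form used in the question returns every tied position. Illustrated on a small hypothetical data frame (toy is made up for this example and is not part of USArrests):

```r
# Hypothetical data: rows "B" and "C" tie for the lowest score.
toy <- data.frame(score = c(3, 1, 1), row.names = c("A", "B", "C"))

# which.min() keeps only the first minimum.
first_only <- row.names(toy)[which.min(toy$score)]                 # "B"

# which() with an equality test keeps every tied row.
all_ties   <- row.names(toy)[which(toy$score == min(toy$score))]   # "B" "C"
```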
    

    You can now apply the same logic to find the states with the highest and lowest rates for each offence. For example, for the highest Assault rate:

    row.names(USArrests)[which.max(USArrests$Assault)]
    

    This gives:

    [1] "North Carolina"
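    To cover every offence in one go, the same row.names()/which.min()/which.max() pattern can be applied column by column with sapply(). A minimal sketch; the crimes vector and the min/max labels are choices made for this example:

```r
# The three offence columns (UrbanPop is a population share, not a crime rate).
crimes <- c("Murder", "Assault", "Rape")

# For each column, look up the state with the lowest and the highest rate.
# sapply() simplifies the per-column results into a 2 x 3 character matrix
# with rows "min"/"max" and one column per crime.
extremes <- sapply(USArrests[crimes], function(rate) {
  c(min = row.names(USArrests)[which.min(rate)],
    max = row.names(USArrests)[which.max(rate)])
})

extremes
```

    Because which.min() and which.max() are used, any tie resolves to the first state in row order.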