swirl Study Notes, Part 6: Subsetting Vectors // 小默的博客 (Xiaomo's Blog)


| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files     
 3: Sequences of Numbers       4: Vectors                 
 5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                   
 9: Functions                 10: lapply and sapply       
11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times         
15: Base Graphics             

Selection: 6

  |                                                        |   0%

| In this lesson, we'll see how to extract elements from a vector
| based on some conditions that we specify.

...

  |=                                                       |   3%

| For example, we may only be interested in the first 20 elements
| of a vector, or only the elements that are not NA, or only
| those that are positive or correspond to a specific variable of
| interest. By the end of this lesson, you'll know how to handle
| each of these scenarios.

...

  |===                                                     |   5%

| I've created for you a vector called x that contains a random
| ordering of 20 numbers (from a standard normal distribution)
| and 20 NAs. Type x now to see what it looks like.

> x
 [1]          NA  1.01612351  0.17390520          NA -0.62466706
 [6]          NA -2.57269671          NA -0.44002462          NA
[11]  0.37101633  0.65818630  1.03885003  0.16175551          NA
[16] -0.32999611          NA          NA          NA  0.40024254
[21]          NA  0.53018587          NA          NA          NA
[26]          NA          NA  0.28211580 -0.04009442          NA
[31]  0.79493463  0.60598426          NA -1.42021598          NA
[36]  0.17550349  0.39153186          NA  1.07989501          NA

| You are really on a roll!
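The lesson does not show how swirl builds x. The sketch below is one way to get a vector of the same shape. It assumes the vector is made by shuffling 20 standard-normal draws together with 20 NAs. The seed and the construction are assumptions, so the values will not match the transcript:

```r
# Assumption: x can be built by shuffling 20 draws from N(0, 1)
# together with 20 NA values (not necessarily what swirl does internally).
set.seed(42)                              # arbitrary seed, only for reproducibility
x <- sample(c(rnorm(20), rep(NA, 20)))    # sample() with one argument permutes it

length(x)        # 40 elements in total
sum(is.na(x))    # 20 of them are NA
```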

  |====                                                    |   8%

| The way you tell R that you want to select some particular
| elements (i.e. a 'subset') from a vector is by placing an
| 'index vector' in square brackets immediately following the
| name of the vector.

...

  |======                                                  |  11%

| For a simple example, try x[1:10] to view the first ten
| elements of x.

> x[1:10]
 [1]         NA  1.0161235  0.1739052         NA -0.6246671
 [6]         NA -2.5726967         NA -0.4400246         NA

| All that hard work is paying off!

  |=======                                                 |  13%

| Index vectors come in four different flavors -- logical
| vectors, vectors of positive integers, vectors of negative
| integers, and vectors of character strings -- each of which
| we'll cover in this lesson.
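Here the four flavors are applied to a small made-up vector v, which is hypothetical and not part of the lesson. Each line selects the same two elements, so you can compare the four forms side by side:

```r
# A small named vector (hypothetical) for comparing the four kinds of index vectors
v <- c(a = 10, b = 20, c = 30, d = 40)

v[c(TRUE, FALSE, TRUE, FALSE)]  # logical: keep positions where the index is TRUE   -> a = 10, c = 30
v[c(1, 3)]                      # positive integers: keep positions 1 and 3         -> a = 10, c = 30
v[-c(2, 4)]                     # negative integers: drop positions 2 and 4         -> a = 10, c = 30
v[c("a", "c")]                  # character strings: keep elements with these names -> a = 10, c = 30
```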

...

  |=========                                               |  16%

| Let's start by indexing with logical vectors. One common
| scenario when working with real-world data is that we want to
| extract all elements of a vector that are not NA (i.e. missing
| data). Recall that is.na(x) yields a vector of logical values
| the same length as x, with TRUEs corresponding to NA values in
| x and FALSEs corresponding to non-NA values in x.

...

  |==========                                              |  18%

| What do you think x[is.na(x)] will give you?

1: A vector of length 0
2: A vector of TRUEs and FALSEs
3: A vector with no NAs
4: A vector of all NAs

Selection: 4

| You are amazing!

  |============                                            |  21%

| Prove it to yourself by typing x[is.na(x)].

> x[is.na(x)]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| Excellent job!

  |=============                                           |  24%

| Recall that `!` gives us the negation of a logical expression,
| so !is.na(x) can be read as 'is not NA'. Therefore, if we want
| to create a vector called y that contains all of the non-NA
| values from x, we can use y <- x[!is.na(x)]. Give it a try.

> y<-x[!is.na(x)]

| Your dedication is inspiring!

  |===============                                         |  26%

| Print y to the console.

> y
 [1]  1.01612351  0.17390520 -0.62466706 -2.57269671 -0.44002462
 [6]  0.37101633  0.65818630  1.03885003  0.16175551 -0.32999611
[11]  0.40024254  0.53018587  0.28211580 -0.04009442  0.79493463
[16]  0.60598426 -1.42021598  0.17550349  0.39153186  1.07989501

| Keep working like that and you'll get there!
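The same is.na() / !is.na() pattern works on a short vector with known values. The vector z below is made-up data, so the results are easy to check by eye:

```r
# Hypothetical short vector with two missing values
z <- c(1.5, NA, -0.3, NA, 2.1)

is.na(z)          # FALSE TRUE FALSE TRUE FALSE: same length as z
z[is.na(z)]       # NA NA: only the missing entries
z[!is.na(z)]      # 1.5 -0.3 2.1: the missing entries removed
```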

  |================                                        |  29%

| Now that we've isolated the non-missing values of x and put
| them in y, we can subset y as we please.

...

  |==================                                      |  32%

| Recall that the expression y > 0 will give us a vector of
| logical values the same length as y, with TRUEs corresponding
| to values of y that are greater than zero and FALSEs
| corresponding to values of y that are less than or equal to
| zero. What do you think y[y > 0] will give you?

1: A vector of all NAs
2: A vector of all the positive elements of y
3: A vector of length 0
4: A vector of all the negative elements of y
5: A vector of TRUEs and FALSEs

Selection: 2

| All that hard work is paying off!

  |===================                                     |  34%

| Type y[y > 0] to see that we get all of the positive elements
| of y, which are also the positive elements of our original
| vector x.

> y[y>0]
 [1] 1.0161235 0.1739052 0.3710163 0.6581863 1.0388500 0.1617555
 [7] 0.4002425 0.5301859 0.2821158 0.7949346 0.6059843 0.1755035
[13] 0.3915319 1.0798950

| You're the best!

  |=====================                                   |  37%

| You might wonder why we didn't just start with x[x > 0] to
| isolate the positive elements of x. Try that now to see why.

> x[x>0]
 [1]        NA 1.0161235 0.1739052        NA        NA        NA
 [7]        NA 0.3710163 0.6581863 1.0388500 0.1617555        NA
[13]        NA        NA        NA 0.4002425        NA 0.5301859
[19]        NA        NA        NA        NA        NA 0.2821158
[25]        NA 0.7949346 0.6059843        NA        NA 0.1755035
[31] 0.3915319        NA 1.0798950        NA

| Keep up the great work!

  |======================                                  |  39%

| Since NA is not a value, but rather a placeholder for an
| unknown quantity, the expression NA > 0 evaluates to NA. An NA
| in an index vector returns NA, so we get a bunch of NAs mixed
| in with our positive numbers when we do this.

...
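The NA effect described above can be isolated on a three-element vector. z is a hypothetical example:

```r
NA > 0                # NA: comparing an unknown value gives an unknown result

z <- c(1.5, NA, -0.3) # hypothetical data
z > 0                 # TRUE NA FALSE
z[z > 0]              # 1.5 NA: the NA in the index vector yields an NA in the result
```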

  |========================                                |  42%

| Combining our knowledge of logical operators with our new
| knowledge of subsetting, we could do this --
| x[!is.na(x) & x > 0]. Try it out.

> x[!is.na(x)&x>0]
 [1] 1.0161235 0.1739052 0.3710163 0.6581863 1.0388500 0.1617555
 [7] 0.4002425 0.5301859 0.2821158 0.7949346 0.6059843 0.1755035
[13] 0.3915319 1.0798950

| You got it right!

  |=========================                               |  45%

| In this case, we request only values of x that are both
| non-missing AND greater than zero. Wherever x is NA,
| !is.na(x) is FALSE, and FALSE & NA is FALSE, so those
| positions are dropped.

...
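This sketch uses a small hypothetical vector to show each intermediate logical vector. It makes clear why the combined condition removes the NAs: `&` returns FALSE whenever one side is FALSE, even if the other side is NA.

```r
z <- c(1.5, NA, -0.3, 0.8)   # hypothetical data

!is.na(z)                    # TRUE FALSE TRUE TRUE
z > 0                        # TRUE NA FALSE TRUE
!is.na(z) & z > 0            # TRUE FALSE FALSE TRUE (FALSE & NA is FALSE)
z[!is.na(z) & z > 0]         # 1.5 0.8
```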

  |===========================                             |  47%

| I've already shown you how to subset just the first ten values
| of x using x[1:10]. In this case, we're providing a vector of
| positive integers inside of the square brackets, which tells R
| to return only the elements of x numbered 1 through 10.

...

  |============================                            |  50%

| Many programming languages use what's called 'zero-based
| indexing', which means that the first element of a vector is
| considered element 0. R uses 'one-based indexing', which (you
| guessed it!) means the first element of a vector is considered
| element 1.

...

  |=============================                           |  53%

| Can you figure out how we'd subset the 3rd, 5th, and 7th
| elements of x? Hint -- Use the c() function to specify the
| element numbers as a numeric vector.

> x[c(3,5,7)]
[1]  0.1739052 -0.6246671 -2.5726967

| Keep up the great work!
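A quick check of one-based indexing and positive-integer index vectors, using a hypothetical vector w with values chosen so that each position is easy to read off:

```r
w <- c(10, 20, 30, 40, 50, 60, 70)   # hypothetical data

w[1]             # 10: the first element has index 1, not 0
w[length(w)]     # 70: the last element has index length(w)
w[c(3, 5, 7)]    # 30 50 70: c() builds the vector of positions to keep
w[1:3]           # 10 20 30
```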

  |===============================                         |  55%

| It's important that when using integer vectors to subset our
| vector x, we stick with the set of indexes {1, 2, ..., 40}
| since x only has 40 elements. What happens if we ask for the
| zeroth element of x (i.e. x[0])? Give it a try.

> x[0]
numeric(0)

| You're the best!

  |================================                        |  58%

| As you might expect, we get nothing useful: numeric(0) is an
| empty numeric vector. Unfortunately, R doesn't prevent us from
| doing this. What if we ask for the 3000th element of x? Try it
| out.

> x[3000]
[1] NA

| Great job!

  |==================================                      |  61%

| Again, nothing useful, but R doesn't prevent us from asking for
| it. This should be a cautionary tale. You should always make
| sure that what you are asking for is within the bounds of the
| vector you're working with.
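One way to act on this advice is to check the indexes before subsetting. checked_index() below is a hypothetical helper, not a base R function. It stops with an error instead of silently returning numeric(0) or NA:

```r
w <- c(10, 20, 30)   # hypothetical data

w[0]                 # numeric(0): index 0 selects nothing
w[3000]              # NA: an out-of-range index is not an error

# Hypothetical helper: refuse indexes outside 1..length(v)
checked_index <- function(v, i) {
  stopifnot(is.numeric(i), all(i >= 1), all(i <= length(v)))
  v[i]
}

checked_index(w, 2)        # 20
# checked_index(w, 3000)   # stops with an error instead of returning NA
```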

...

  |===================================                     |  63%

| What if we're interested in all elements of x EXCEPT the 2nd
| and 10th? It would be pretty tedious to construct a vector
| containing all numbers 1 through 40 EXCEPT 2 and 10.

...

  |=====================================                   |  66%

| Luckily, R accepts negative integer indexes. Whereas
| x[c(2, 10)] gives us ONLY the 2nd and 10th elements of x,
| x[c(-2, -10)] gives us all elements of x EXCEPT for the 2nd
| and 10th elements. Try x[c(-2, -10)] now to see this.

> x[c(-2,-10)]
 [1]          NA  0.17390520          NA -0.62466706          NA
 [6] -2.57269671          NA -0.44002462  0.37101633  0.65818630
[11]  1.03885003  0.16175551          NA -0.32999611          NA
[16]          NA          NA  0.40024254          NA  0.53018587
[21]          NA          NA          NA          NA          NA
[26]  0.28211580 -0.04009442          NA  0.79493463  0.60598426
[31]          NA -1.42021598          NA  0.17550349  0.39153186
[36]          NA  1.07989501          NA

| Excellent job!

  |======================================                  |  68%

| A shorthand way of specifying multiple negative numbers is to
| put the negative sign out in front of the vector of positive
| numbers. Type x[-c(2, 10)] to get the exact same result.

> x[-c(2,10)]
 [1]          NA  0.17390520          NA -0.62466706          NA
 [6] -2.57269671          NA -0.44002462  0.37101633  0.65818630
[11]  1.03885003  0.16175551          NA -0.32999611          NA
[16]          NA          NA  0.40024254          NA  0.53018587
[21]          NA          NA          NA          NA          NA
[26]  0.28211580 -0.04009442          NA  0.79493463  0.60598426
[31]          NA -1.42021598          NA  0.17550349  0.39153186
[36]          NA  1.07989501          NA

| All that hard work is paying off!
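The two negative-index spellings are equivalent because -c(2, 4) is just the vector c(-2, -4). Base R also refuses an index vector that mixes positive and negative numbers. The vector w is hypothetical:

```r
w <- c(10, 20, 30, 40, 50)               # hypothetical data

w[c(-2, -4)]                             # 10 30 50: drop positions 2 and 4
identical(w[c(-2, -4)], w[-c(2, 4)])     # TRUE

# Mixing positive and negative indexes raises an error in R
res <- tryCatch(w[c(-1, 2)], error = function(e) "error")
res                                      # "error"
```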

  |========================================                |  71%

| So far, we've covered three types of index vectors -- logical,
| positive integer, and negative integer. The only remaining type
| requires us to introduce the concept of 'named' elements.

...

  |=========================================               |  74%

| Create a numeric vector with three named elements using
| vect <- c(foo = 11, bar = 2, norf = NA).

> vect<-c(foo=11,bar=2,norf=NA)

| Great job!

  |===========================================             |  76%

| When we print vect to the console, you'll see that each element
| has a name. Try it out.

> vect
 foo  bar norf 
  11    2   NA 

| You are amazing!

  |============================================            |  79%

| We can also get the names of vect by passing vect as an
| argument to the names() function. Give that a try.

> names(vect)
[1] "foo"  "bar"  "norf"

| You are quite good my friend!

  |==============================================          |  82%

| Alternatively, we can create an unnamed vector vect2 with
| c(11, 2, NA). Do that now.

> vect2<-c(11,2,NA)

| That's the answer I was looking for.

  |===============================================         |  84%

| Then, we can add the `names` attribute to vect2 after the fact
| with names(vect2) <- c("foo", "bar", "norf"). Go ahead.

> names(vect2)<-c("foo","bar","norf")

| You got it right!

  |=================================================       |  87%

| Now, let's check that vect and vect2 are the same by passing
| them as arguments to the identical() function.

> identical(vect,vect2)
[1] TRUE

| Great job!

  |==================================================      |  89%

| Indeed, vect and vect2 are identical named vectors.
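Both ways of naming elements, collected in one place. The object names a and b are illustrative. unname() is shown only to make the point that the names are an attribute on top of the values:

```r
a <- c(foo = 11, bar = 2, norf = NA)      # names given inside c()
b <- c(11, 2, NA)
names(b) <- c("foo", "bar", "norf")       # names attached afterwards

identical(a, b)                           # TRUE
names(a)                                  # "foo" "bar" "norf"
unname(a)                                 # 11 2 NA: same values, names dropped
```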

...

  |====================================================    |  92%

| Now, back to the matter of subsetting a vector by named
| elements. Which of the following commands do you think would
| give us the second element of vect?

1: vect["bar"]
2: vect["2"]
3: vect[bar]

Selection: 1

| You are really on a roll!

  |=====================================================   |  95%

| Now, try it out.

> vect["bar"]
bar 
  2 

| All that hard work is paying off!

  |======================================================= |  97%

| Likewise, we can specify a vector of names with
| vect[c("foo", "bar")]. Try it out.

> vect[c("foo","bar")]
foo bar 
 11   2 

| You are doing so well!
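A few more character-index cases, reusing the lesson's vect. The result keeps the names, and the order follows the index vector. A name that does not exist (here "baz", chosen for illustration) returns NA instead of raising an error, which echoes the out-of-bounds warning earlier in the lesson:

```r
vect <- c(foo = 11, bar = 2, norf = NA)

vect["bar"]              # bar 2: the name is kept in the result
vect[c("foo", "bar")]    # foo 11, bar 2
vect[c("bar", "foo")]    # bar 2, foo 11: order follows the index vector
vect["baz"]              # NA (named <NA>): unknown names are not an error
```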

  |========================================================| 100%

| Now you know all four methods of subsetting data from vectors.
| Different approaches are best in different scenarios, and when
| in doubt, try it out!

...

| Are you currently enrolled in the Coursera course associated
| with this lesson?

1: Yes
2: No

Selection: 2

| You've reached the end of this lesson! Returning to the main
| menu...