Learning swirl, Part 5: Missing Values // 小默的博客


| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files     
 3: Sequences of Numbers       4: Vectors                 
 5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                   
 9: Functions                 10: lapply and sapply       
11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times         
15: Base Graphics             

Selection: 5

  |                                                        |   0%

| Missing values play an important role in statistics and data
| analysis. Often, missing values must not be ignored, but rather
| they should be carefully studied to see if there's an
| underlying pattern or cause for their missingness.

...

  |===                                                     |   5%

| In R, NA is used to represent any value that is 'not available'
| or 'missing' (in the statistical sense). In this lesson, we'll
| explore missing values further.

...

  |======                                                  |  11%

| Any operation involving NA generally yields NA as the result.
| To illustrate, let's create a vector c(44, NA, 5, NA) and
| assign it to a variable x.

> x<-c(44,NA,5,NA)

| Perseverance, that's the answer.

  |=========                                               |  16%

| Now, let's multiply x by 3.

> x*3
[1] 132  NA  15  NA

| That's a job well done!

  |============                                            |  21%

| Notice that the elements of the resulting vector that
| correspond with the NA values in x are also NA.

...
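
Note: the propagation rule can be checked on the same small vector. This is a minimal sketch that goes slightly beyond the lesson: the na.rm = TRUE argument of sum() and mean() is not part of the swirl session, but it is the usual base R way to drop NAs before summarising.

```r
x <- c(44, NA, 5, NA)

# Arithmetic is element-wise, so every NA position stays NA.
x * 3                  # 132  NA  15  NA
x + 1                  #  45  NA   6  NA

# A summary over the whole vector is NA as soon as one element is NA ...
sum(x)                 # NA

# ... unless the missing values are dropped first.
sum(x, na.rm = TRUE)   # 49
mean(x, na.rm = TRUE)  # 24.5
```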

  |===============                                         |  26%

| To make things a little more interesting, let's create a vector
| containing 1000 draws from a standard normal distribution with
| y <- rnorm(1000).

> y<-rnorm(1000)

| That's the answer I was looking for.

  |==================                                      |  32%

| Next, let's create a vector containing 1000 NAs with z <-
| rep(NA, 1000).

> z<-rep(NA,1000)

| You nailed it! Good job!

  |=====================                                   |  37%

| Finally, let's select 100 elements at random from these 2000
| values (combining y and z) such that we don't know how many NAs
| we'll wind up with or what positions they'll occupy in our
| final vector -- my_data <- sample(c(y, z), 100).

> my_data<-sample(c(y,z),100)

| You are doing so well!
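
Note: because rnorm() and sample() are random, every run of these three steps gives a different my_data. The sketch below repeats them with a fixed seed so the result can be reproduced. The seed value 42 is an arbitrary assumption and is not part of the lesson.

```r
set.seed(42)  # hypothetical seed, chosen only to make the run repeatable

y <- rnorm(1000)    # 1000 draws from a standard normal distribution
z <- rep(NA, 1000)  # 1000 missing values

# c(y, z) is a numeric vector of length 2000; sample() picks 100 of its
# elements without replacement, so the number of NAs is left to chance.
my_data <- sample(c(y, z), 100)

length(my_data)       # 100
sum(is.na(my_data))   # number of NAs drawn; depends on the seed
```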

  |========================                                |  42%

| Let's first ask the question of where our NAs are located in
| our data. The is.na() function tells us whether each element of
| a vector is NA. Call is.na() on my_data and assign the result
| to my_na.

> my_na<-is.na(my_data)

| All that practice is paying off!

  |===========================                             |  47%

| Now, print my_na to see what you came up with.

> my_na
  [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE
 [11]  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE
 [21] FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
 [31]  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE
 [41] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE
 [51] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE
 [61] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [71]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE
 [81] FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
 [91] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE

| All that hard work is paying off!

  |=============================                           |  53%

| Everywhere you see a TRUE, you know the corresponding element
| of my_data is NA. Likewise, everywhere you see a FALSE, you
| know the corresponding element of my_data is one of our random
| draws from the standard normal distribution.

...
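
Note: a logical vector of 100 elements is hard to read by eye. As a small sketch on a made-up vector v (a hypothetical stand-in for my_data), base R can also report the positions of the NAs or keep only the observed values.

```r
v <- c(NA, 0.12, NA, -1.75)  # hypothetical stand-in for my_data

is.na(v)          # TRUE FALSE  TRUE FALSE
which(is.na(v))   # 1 3   (positions of the missing values)
v[!is.na(v)]      # 0.12 -1.75   (only the observed values)
anyNA(v)          # TRUE  (is there at least one NA?)
```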

  |================================                        |  58%

| In our previous discussion of logical operators, we introduced
| the `==` operator as a method of testing for equality between
| two objects. So, you might think the expression my_data == NA
| yields the same results as is.na(). Give it a try.

> my_data==NA
  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [21] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [41] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [61] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [81] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| Keep up the great work!

  |===================================                     |  63%

| The reason you got a vector of all NAs is that NA is not really
| a value, but just a placeholder for a quantity that is not
| available. Therefore the logical expression is incomplete and R
| has no choice but to return a vector of the same length as
| my_data that contains all NAs.

...
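
Note: the same behaviour shows up on a three-element vector (v is a hypothetical example, not from the lesson). Comparing anything with NA, even NA itself, gives NA, which is why is.na() is the right test.

```r
v <- c(1, NA, 3)  # hypothetical example vector

v == NA     # NA NA NA   ("is this equal to an unknown value?" is unknown)
NA == NA    # NA         (two unknowns are not known to be equal)
is.na(v)    # FALSE  TRUE FALSE   (the test that actually answers the question)
```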

  |======================================                  |  68%

| Don't worry if that's a little confusing. The key takeaway is
| to be cautious when using logical expressions anytime NAs might
| creep in, since a single NA value can derail the entire thing.

...
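
Note: a short sketch of how an NA can creep into a logical expression, again on a made-up vector v. The which() and isTRUE() calls are common base R guards that I am adding here; the lesson itself does not cover them.

```r
v <- c(3, NA, -1)  # hypothetical example vector

v > 0             # TRUE    NA FALSE
v[v > 0]          # 3 NA   (an NA index produces an NA element)
v[which(v > 0)]   # 3      (which() keeps only the positions that are TRUE)

# NA does not always win: the result is known when the other operand decides it.
NA & FALSE        # FALSE
NA | TRUE         # TRUE

# isTRUE() gives a plain TRUE/FALSE that is safe to use in if().
isTRUE(v[2] > 0)  # FALSE
```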

  |=========================================               |  74%

| So, back to the task at hand. Now that we have a vector, my_na,
| that has a TRUE for every NA and FALSE for every numeric value,
| we can compute the total number of NAs in our data.

...

  |============================================            |  79%

| The trick is to recognize that underneath the surface, R
| represents TRUE as the number 1 and FALSE as the number 0.
| Therefore, if we take the sum of a bunch of TRUEs and FALSEs,
| we get the total number of TRUEs.

...
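
Note: the TRUE-as-1 trick is easy to check on a tiny logical vector (flags is a hypothetical name). As a side effect, mean() of a logical vector gives the proportion of TRUEs.

```r
flags <- c(TRUE, FALSE, TRUE, TRUE)  # hypothetical example vector

as.integer(flags)  # 1 0 1 1
sum(flags)         # 3     (number of TRUEs)
mean(flags)        # 0.75  (share of TRUEs)
TRUE + TRUE        # 2
```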

  |===============================================         |  84%

| Let's give that a try here. Call the sum() function on my_na to
| count the total number of TRUEs in my_na, and thus the total
| number of NAs in my_data. Don't assign the result to a new
| variable.

> sum(my_na)
[1] 53

| All that practice is paying off!

  |==================================================      |  89%

| Pretty cool, huh? Finally, let's take a look at the data to
| convince ourselves that everything 'adds up'. Print my_data to
| the console.

> my_data
  [1]           NA           NA           NA           NA
  [5]  0.124769189           NA           NA  0.692392963
  [9] -1.746465523           NA           NA -0.821663967
 [13]           NA -0.580694318 -1.511836462  0.081071870
 [17]           NA           NA           NA  1.097226579
 [21] -3.126426132           NA -1.199908058 -0.794525073
 [25]           NA -0.443946101           NA           NA
 [29]           NA           NA           NA  0.742624944
 [33]           NA           NA -1.634124579           NA
 [37] -0.850173971  0.441734720  0.513475081           NA
 [41] -0.368936480 -1.357784834           NA           NA
 [45]  0.007424283 -1.258690752  0.779107391 -1.419960183
 [49]           NA -0.763940473  0.450923280           NA
 [53]           NA           NA           NA           NA
 [57]  0.925643135 -0.003863920           NA           NA
 [61] -0.062849926 -1.557277905           NA           NA
 [65]           NA           NA           NA           NA
 [69]           NA           NA           NA -0.284868951
 [73]           NA           NA  0.056676275  0.240678898
 [77]           NA           NA -0.432834665           NA
 [81]  0.784445940           NA           NA -1.192080644
 [85]           NA  0.768473262 -0.170659651 -1.795948523
 [89]  1.249158629 -0.723159498 -0.460614065  0.238104108
 [93] -1.025906852           NA           NA           NA
 [97]  0.982965761 -0.084049625 -0.102720652  0.552020816

| You are really on a roll!
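
Note: counting by eye, the printout above has 47 numeric values, and 53 NAs plus 47 draws gives the expected 100 elements. The sketch below does the same bookkeeping in code. It regenerates my_data, so the individual counts will differ from the 53 in this session; n_missing and n_observed are names I made up for illustration.

```r
my_data <- sample(c(rnorm(1000), rep(NA, 1000)), 100)

n_missing  <- sum(is.na(my_data))   # how many NAs
n_observed <- sum(!is.na(my_data))  # how many real draws

# Every element is either missing or observed, so the two counts
# must add up to the length of the vector.
n_missing + n_observed == length(my_data)  # TRUE
```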

  |=====================================================   |  95%

| Now that we've got NAs down pat, let's look at a second type of
| missing value -- NaN, which stands for 'not a number'. To
| generate NaN, try dividing (using a forward slash) 0 by 0 now.

> 0/0
[1] NaN

| You are amazing!

  |========================================================| 100%

| Let's do one more, just for fun. In R, Inf stands for infinity.
| What happens if you subtract Inf from Inf?

> Inf-Inf
[1] NaN

| You are amazing!
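
Note: NA and NaN are related but not identical. The sketch below compares the base R test functions on a made-up vector v. is.nan() and is.finite() are not part of this lesson, so treat it as a side note.

```r
0 / 0          # NaN
Inf - Inf      # NaN
1 / 0          # Inf  (dividing a non-zero number by zero is not NaN)

is.nan(0 / 0)  # TRUE
is.na(0 / 0)   # TRUE   (is.na() also flags NaN)
is.nan(NA)     # FALSE  (but is.nan() does not flag NA)

v <- c(1, NA, NaN, Inf)  # hypothetical example vector
is.na(v)       # FALSE  TRUE  TRUE FALSE
is.nan(v)      # FALSE FALSE  TRUE FALSE
is.finite(v)   #  TRUE FALSE FALSE FALSE
```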

| Are you currently enrolled in the Coursera course associated
| with this lesson?

1: Yes
2: No

Selection: 2

| You've reached the end of this lesson! Returning to the main
| menu...