swirl Learning Notes, Part 5: Missing Values


| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

 1: Basic Building Blocks      2: Workspace and Files
 3: Sequences of Numbers       4: Vectors
 5: Missing Values             6: Subsetting Vectors
 7: Matrices and Data Frames   8: Logic
 9: Functions                 10: lapply and sapply
11: vapply and tapply         12: Looking at Data
13: Simulation                14: Dates and Times
15: Base Graphics

Selection: 5

| | 0%

| Missing values play an important role in statistics and data
| analysis. Often, missing values must not be ignored, but rather
| they should be carefully studied to see if there's an
| underlying pattern or cause for their missingness.

...

|=== | 5%

| In R, NA is used to represent any value that is 'not available'
| or 'missing' (in the statistical sense). In this lesson, we'll
| explore missing values further.

...

|====== | 11%

| Any operation involving NA generally yields NA as the result.
| To illustrate, let's create a vector c(44, NA, 5, NA) and
| assign it to a variable x.

> x<-c(44,NA,5,NA)

| Perseverance, that's the answer.

|========= | 16%

| Now, let's multiply x by 3.

> x*3
[1] 132 NA 15 NA

| That's a job well done!

|============ | 21%

| Notice that the elements of the resulting vector that
| correspond with the NA values in x are also NA.

...
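
A side note that is not part of the swirl lesson: the same propagation applies to summary functions such as sum() and mean(), and their na.rm argument is the usual way around it. The sketch below reuses the lesson's vector x; the rest is only an illustration.

```r
# x is the vector created in the lesson above
x <- c(44, NA, 5, NA)

# Arithmetic propagates NA element by element
x * 3

# A summary returns NA as soon as one element is missing
sum(x)

# na.rm = TRUE drops the NAs before computing
sum(x, na.rm = TRUE)   # adds 44 and 5
mean(x, na.rm = TRUE)  # averages 44 and 5
```

Whether dropping the NAs is appropriate depends on why the values are missing, which is the point the lesson makes at the start.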

|=============== | 26%

| To make things a little more interesting, let's create a vector
| containing 1000 draws from a standard normal distribution with
| y <- rnorm(1000).

> y<-rnorm(1000)

| That's the answer I was looking for.

|================== | 32%

| Next, let's create a vector containing 1000 NAs with z <-
| rep(NA, 1000).

> z<-rep(NA,1000)

| You nailed it! Good job!

|===================== | 37%

| Finally, let's select 100 elements at random from these 2000
| values (combining y and z) such that we don't know how many NAs
| we'll wind up with or what positions they'll occupy in our
| final vector -- my_data <- sample(c(y, z), 100).

> my_data<-sample(c(y,z),100)

| You are doing so well!
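
A side note that is not part of the swirl lesson: because rnorm() and sample() are random, every run gives a different my_data and a different number of NAs. Below is a minimal sketch of making the draw repeatable with set.seed(); the seed value 42 is an arbitrary choice, and n_missing is a name introduced only for this example.

```r
# Fix the random number generator state so the draw can be repeated.
# The seed value 42 is arbitrary (an assumption for this example).
set.seed(42)

y <- rnorm(1000)       # 1000 draws from a standard normal distribution
z <- rep(NA, 1000)     # 1000 missing values

# Pick 100 of the 2000 values at random, without replacement
my_data <- sample(c(y, z), 100)

# Count how many of the sampled values are NA
n_missing <- sum(is.na(my_data))
n_missing
```

Rerunning the whole block with the same seed should reproduce the same vector.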

|======================== | 42%

| Let's first ask the question of where our NAs are located in
| our data. The is.na() function tells us whether each element of
| a vector is NA. Call is.na() on my_data and assign the result
| to my_na.

> my_na<-is.na(my_data)

| All that practice is paying off!

|=========================== | 47%

| Now, print my_na to see what you came up with.

> my_na

[1] TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE FALSE TRUE
[11] TRUE FALSE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE
[21] FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
[31] TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE
[41] FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE
[51] FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
[61] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[71] TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE
[81] FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[91] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE

| All that hard work is paying off!

|============================= | 53%

| Everywhere you see a TRUE, you know the corresponding element
| of my_data is NA. Likewise, everywhere you see a FALSE, you
| know the corresponding element of my_data is one of our random
| draws from the standard normal distribution.

...
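
A side note that is not part of the swirl lesson: the logical vector from is.na() can also be used to find the positions of the NAs or to keep only the observed values. This sketch uses a small made-up vector v so the result is easy to check by eye.

```r
# A small illustrative vector (hypothetical, not the lesson's my_data)
v <- c(1.5, NA, -0.3, NA, NA, 2.0)

na_flags <- is.na(v)       # TRUE where v is missing

na_positions <- which(na_flags)  # indices of the missing elements
observed <- v[!na_flags]         # only the non-missing values

na_positions
observed
```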

|================================ | 58%

| In our previous discussion of logical operators, we introduced
| the `==` operator as a method of testing for equality between
| two objects. So, you might think the expression my_data == NA
| yields the same results as is.na(). Give it a try.

> my_data==NA
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[21] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[41] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[61] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[81] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| Keep up the great work!

|=================================== | 63%

| The reason you got a vector of all NAs is that NA is not really
| a value, but just a placeholder for a quantity that is not
| available. Therefore the logical expression is incomplete and R
| has no choice but to return a vector of the same length as
| my_data that contains all NAs.

...
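
A side note that is not part of the swirl lesson: the difference between `==` and is.na() is easier to see on a three-element vector. The vector v below is made up for this example.

```r
# A small illustrative vector (hypothetical)
v <- c(1, NA, 3)

eq_result <- v == NA    # every comparison with NA is itself NA
na_vs_na  <- NA == NA   # even NA compared with NA is NA
na_test   <- is.na(v)   # is.na() gives a proper TRUE/FALSE answer

eq_result
na_vs_na
na_test
```

So is.na() is the function to use when testing for missing values; `== NA` never returns TRUE or FALSE.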

|====================================== | 68%

| Don't worry if that's a little confusing. The key takeaway is
| to be cautious when using logical expressions anytime NAs might
| creep in, since a single NA value can derail the entire thing.

...
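
A side note that is not part of the swirl lesson: here is a short sketch of how an NA can creep into ordinary logical code, and a few base R tools that handle it. The vector v and the other variable names are made up for this example.

```r
# A small illustrative vector (hypothetical)
v <- c(2, NA, -1)

pos <- v > 0               # TRUE, NA, FALSE: the NA carries through

with_na    <- v[pos]         # an NA index yields an NA element
without_na <- v[which(pos)]  # which() keeps only the TRUE positions

any_pos <- any(pos)        # TRUE: one TRUE already decides the answer
all_pos <- all(pos)        # FALSE: one FALSE already decides the answer

# if() cannot branch on NA and raises an error;
# tryCatch() captures it here so the script keeps running
if_result <- tryCatch(if (NA) "yes" else "no", error = function(e) "error")

# isTRUE() is a safe test: it is FALSE for anything that is not a single TRUE
safe <- isTRUE(NA)
```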

|========================================= | 74%

| So, back to the task at hand. Now that we have a vector, my_na,
| that has a TRUE for every NA and FALSE for every numeric value,
| we can compute the total number of NAs in our data.

...

|============================================ | 79%

| The trick is to recognize that underneath the surface, R
| represents TRUE as the number 1 and FALSE as the number 0.
| Therefore, if we take the sum of a bunch of TRUEs and FALSEs,
| we get the total number of TRUEs.

...
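
A side note that is not part of the swirl lesson: the coercion of TRUE to 1 and FALSE to 0 can be checked directly on a short vector. The name flags is made up for this example; mean() is added only to show that the same trick gives the proportion of TRUEs.

```r
# A small illustrative logical vector (hypothetical)
flags <- c(TRUE, FALSE, TRUE, TRUE)

as_numbers <- as.integer(flags)  # 1 0 1 1
n_true     <- sum(flags)         # number of TRUEs
share_true <- mean(flags)        # proportion of TRUEs

as_numbers
n_true
share_true
```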

|=============================================== | 84%

| Let's give that a try here. Call the sum() function on my_na to
| count the total number of TRUEs in my_na, and thus the total
| number of NAs in my_data. Don't assign the result to a new
| variable.

> sum(my_na)
[1] 53

| All that practice is paying off!

|================================================== | 89%

| Pretty cool, huh? Finally, let's take a look at the data to
| convince ourselves that everything 'adds up'. Print my_data to
| the console.

> my_data
[1] NA NA NA NA
[5] 0.124769189 NA NA 0.692392963
[9] -1.746465523 NA NA -0.821663967
[13] NA -0.580694318 -1.511836462 0.081071870
[17] NA NA NA 1.097226579
[21] -3.126426132 NA -1.199908058 -0.794525073
[25] NA -0.443946101 NA NA
[29] NA NA NA 0.742624944
[33] NA NA -1.634124579 NA
[37] -0.850173971 0.441734720 0.513475081 NA
[41] -0.368936480 -1.357784834 NA NA
[45] 0.007424283 -1.258690752 0.779107391 -1.419960183
[49] NA -0.763940473 0.450923280 NA
[53] NA NA NA NA
[57] 0.925643135 -0.003863920 NA NA
[61] -0.062849926 -1.557277905 NA NA
[65] NA NA NA NA
[69] NA NA NA -0.284868951
[73] NA NA 0.056676275 0.240678898
[77] NA NA -0.432834665 NA
[81] 0.784445940 NA NA -1.192080644
[85] NA 0.768473262 -0.170659651 -1.795948523
[89] 1.249158629 -0.723159498 -0.460614065 0.238104108
[93] -1.025906852 NA NA NA
[97] 0.982965761 -0.084049625 -0.102720652 0.552020816

| You are really on a roll!

|===================================================== | 95%

| Now that we've got NAs down pat, let's look at a second type of
| missing value -- NaN, which stands for 'not a number'. To
| generate NaN, try dividing (using a forward slash) 0 by 0 now.

> 0/0
[1] NaN

| You are amazing!

|========================================================| 100%

| Let's do one more, just for fun. In R, Inf stands for infinity.
| What happens if you subtract Inf from Inf?

> Inf-Inf
[1] NaN

| You are amazing!
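
A side note that is not part of the swirl lesson: NaN and NA are related but not identical, and base R has separate tests for them. The variable names in this sketch are made up for the example.

```r
nan_value <- 0 / 0          # NaN, as in the lesson
inf_value <- 1 / 0          # Inf: division by zero does not raise an error

nan_is_nan <- is.nan(nan_value)  # TRUE
nan_is_na  <- is.na(nan_value)   # TRUE: is.na() also flags NaN
na_is_nan  <- is.nan(NA)         # FALSE: NA is missing, not 'not a number'

# is.finite() is FALSE for Inf, NA and NaN alike
finite_flags <- is.finite(c(1, Inf, NA, NaN))
finite_flags
```

In other words, is.na() catches both kinds of missing value, while is.nan() singles out NaN.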

| Are you currently enrolled in the Coursera course associated
| with this lesson?

1: Yes
2: No

Selection: 2

| You've reached the end of this lesson! Returning to the main
| menu...