Real-world data is rarely clean, complete, and ready to use. You almost always have to clean it before you can analyze it, and how you clean it depends on the specific data and context.
When doing your analyses, you will want to be clear about the following:
- what NA and NaN values are, and
- how to treat NA and NaN values in your analysis: do you want to include them or exclude them from your calculations?
Sometimes you want to examine a list to see if there are missing values. Let’s quickly define a list and test it with these functions:
lst <- list("a", 3, TRUE, FALSE, NA, NaN, 0/0)
is.na(lst)
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE
sapply(lst, function(x) ifelse(is.nan(x), NaN, x))
[1] "a" "3" "TRUE" "FALSE" NA "NaN" "NaN"
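To make the include-or-exclude decision concrete, here is a minimal sketch on a small numeric vector. The vector x is made up for illustration and is not part of the survey data. It shows that is.na() flags both NA and NaN, that is.nan() flags only NaN, and that base R summary functions such as mean() and sum() propagate missing values unless you pass na.rm = TRUE.

```r
# Hypothetical example vector, not from the survey data
x <- c(4, NA, 10, NaN)

is.na(x)    # TRUE for both NA and NaN
is.nan(x)   # TRUE only for NaN

# Including missing values: the result is itself missing
mean(x)

# Excluding missing values: na.rm = TRUE drops both NA and NaN
mean(x, na.rm = TRUE)
sum(x, na.rm = TRUE)
```

Whether dropping missing values is appropriate depends on why they are missing, so treat na.rm = TRUE as a deliberate choice rather than a default.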
df <- data.frame(a=c(1, 2, 3), b=c(5, NA, NaN))
is.na(df)
a b
[1,] FALSE FALSE
[2,] FALSE TRUE
[3,] FALSE TRUE
sapply(df, function(x) ifelse(is.nan(x), NaN, x))
a b
[1,] 1 5
[2,] 2 NA
[3,] 3 NaN
library(skimr)  # provides skim()
skim(df)
Name | df |
Number of rows | 3 |
Number of columns | 2 |
_______________________ | |
Column type frequency: | |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
a | 0 | 1.00 | 2 | 1 | 1 | 1.5 | 2 | 2.5 | 3 | ▇▁▇▁▇ |
b | 2 | 0.33 | 5 | NA | 5 | 5.0 | 5 | 5.0 | 5 | ▁▁▇▁▁ |
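The n_missing and complete_rate columns in the skim() output can also be approximated in base R. The sketch below redefines the same df so that it runs on its own; the variable names n_missing, complete_rate, and df_complete are chosen here for illustration.

```r
# Same data frame as above, redefined so the block runs on its own
df <- data.frame(a = c(1, 2, 3), b = c(5, NA, NaN))

# Number of missing values (NA or NaN) in each column
n_missing <- colSums(is.na(df))
n_missing

# Share of non-missing values in each column
complete_rate <- colMeans(!is.na(df))
round(complete_rate, 2)

# Keep only the rows that have no missing value in any column
df_complete <- df[complete.cases(df), ]
df_complete
```

complete.cases() is one way to exclude incomplete observations; it drops a whole row even if only one column is missing, which may or may not be what your analysis needs.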
This is how you retrieve a specific observation (row) from a data frame:
survey[6,]
Year ID NPS Field ClassLevel Status Gender BirthYear FinPL
6 2012 mdoqvaalcscx 8 Undecl Sr Part-time Female 1988 Yes
FinSch FinGov FinSelf FinPar FinOther TooDifficult NotRelevant
6 Yes No Yes No No
PoorTeaching UnsuppFac Grades Sched ClassTooBig BadAdvising
6 Strongly Disagree Neutral Agree Strongly Agree <NA> Disagree
FinAid OverallValue
6 <NA> Strongly Agree
The anyNA(x) function reports whether a vector or list x contains any NA values:
anyNA(lst)
[1] TRUE
Here we use it on the 6th row of survey:
anyNA(survey[6,])
[1] TRUE
The following call returns the positions of the items that have the value NA (here, the column numbers within the row):
which(is.na(survey[6,]))
[1] 21 23
The following counts how many items have the value NA:
sum(is.na(survey[6,]))
[1] 2
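Column positions such as 21 and 23 are easier to act on once they are mapped back to column names. The sketch below uses a small, hypothetical one-row data frame called row as a stand-in for survey[6,], since the survey data is not reproduced here; the column names are borrowed from the output above, but the data frame itself is an assumption.

```r
# Hypothetical stand-in for one row of a survey-like data frame
row <- data.frame(
  Year = 2012,
  NPS = 8,
  ClassTooBig = NA_character_,
  BadAdvising = "Disagree",
  FinAid = NA_character_
)

anyNA(row)                   # are there any missing values at all?
na_pos <- which(is.na(row))  # column positions of the missing values
na_pos
names(row)[na_pos]           # the corresponding column names
sum(is.na(row))              # how many values are missing
```

Applied to the real data, names(survey)[which(is.na(survey[6,]))] would follow the same pattern.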