We are testing the null hypothesis that an intervention (a procedural change for the clinician) had no effect on quality of care. We use the correctness of discharge reports as our measure of this effect: a more correct report indicates an improvement in treatment.
If the pre- and post-intervention values are not significantly different, we cannot reject the null hypothesis, because any difference we observe could plausibly be due to chance.
Let’s first read in our data.
#install.packages('exactRankTests')
library(exactRankTests)
## Package 'exactRankTests' is no longer under development.
## Please consider using package 'coin' instead.
anticoag = read.csv("anticoagulation.csv")
head(anticoag)
## pre_points pre_total_possible post_points post_possible
## 1 4 5 2 5
## 2 5 7 7 7
## 3 5 7 5 5
## 4 5 5 7 7
## 5 7 7 5 5
## 6 5 5 5 5
You’ll notice that I converted the .xls to a comma-separated values (csv) file and deleted the calculations that were done in Excel. We are going to do them here! Here is how we list our columns:
colnames(anticoag)
## [1] "pre_points" "pre_total_possible" "post_points"
## [4] "post_possible"
Let’s calculate a new column, the percentage (or score) for each timepoint. The values are stored as proportions between 0 and 1, so 0.8 means 80%.
anticoag$pre_percentage = anticoag$pre_points / anticoag$pre_total_possible
anticoag$post_percentage = anticoag$post_points / anticoag$post_possible
head(anticoag)
## pre_points pre_total_possible post_points post_possible pre_percentage
## 1 4 5 2 5 0.8000000
## 2 5 7 7 7 0.7142857
## 3 5 7 5 5 0.7142857
## 4 5 5 7 7 1.0000000
## 5 7 7 5 5 1.0000000
## 6 5 5 5 5 1.0000000
## post_percentage
## 1 0.4
## 2 1.0
## 3 1.0
## 4 1.0
## 5 1.0
## 6 1.0
You can see that we get better precision than we did with Excel, “better” meaning more consistent with decimal places, etc.
We want to compare the before percentages with the after percentages to determine if they are significantly different, but some of our data has NA values! Let’s put each set of percentages into its own variable, removing the NAs as we go:
before = anticoag$pre_percentage[!is.na(anticoag$pre_percentage)]
after = anticoag$post_percentage[!is.na(anticoag$post_percentage)]
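Before dropping missing values, it can help to check how many there are. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not the real anticoagulation data) to count the NAs in each column and apply the same filtering. Because each column is filtered separately, the two vectors can end up with different lengths and no longer line up row by row. That is fine for a two-sample test, but it would not work for a paired one.

```r
# Hypothetical toy data, NOT the real anticoagulation.csv
toy <- data.frame(
  pre_percentage  = c(0.8, NA, 1.0, 0.714),
  post_percentage = c(0.4, 1.0, NA, NA)
)

# Count the missing values in each column before dropping them
n_missing <- colSums(is.na(toy))
print(n_missing)

# Same filtering pattern as used for before and after
toy_before <- toy$pre_percentage[!is.na(toy$pre_percentage)]
toy_after  <- toy$post_percentage[!is.na(toy$post_percentage)]

# The two vectors may now have different lengths
print(length(toy_before))
print(length(toy_after))
```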
There are tests that you can do to determine if your data are normally distributed, but sometimes it is sufficient to just look at them. I can look at the numbers (my, there are a lot of 1’s!) or the plots below to see that we definitely do not have normally distributed data:
hist(before,breaks=5,col="blue",main="Pre-Intervention Anticoagulation Percentage Scores")
hist(after,breaks=5,col="pink",main="Post-Intervention Anticoagulation Percentage Scores")
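If you want a formal check to go with the histograms, one common option is the Shapiro-Wilk test, available in base R as `shapiro.test()`. This is only a sketch on made-up scores (`toy_scores` is hypothetical, chosen to mimic a pile-up of 1’s), not a result from the real data. A small p-value suggests the data depart from a normal distribution.

```r
# Hypothetical scores with many 1's, NOT the real data
toy_scores <- c(1, 1, 1, 1, 1, 1, 0.8, 0.714, 1, 1, 0.4, 1)

# Shapiro-Wilk test: the null hypothesis is that the data are normal
sw <- shapiro.test(toy_scores)
print(sw)

# The p-value can be pulled out of the result for later use
sw$p.value
```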
Because the data are not normally distributed, we use a permutation test, which does not assume normality, to assess for differences. The test is two-sided:
perm.test(before,after,alternative="two.sided",exact=TRUE)
##
## 2-sample Permutation Test (scores mapped into 1:(m+n) using
## rounded scores)
##
## data: before and after
## T = 1202, p-value = 0.1441
## alternative hypothesis: true mu is not equal to 0
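With p = 0.1441, above the conventional 0.05 threshold, we fail to reject the null hypothesis: the pre- and post-intervention scores are not significantly different.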
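To see what a permutation test does under the hood, here is a minimal base R sketch. It is not how `perm.test` is implemented (that function computes an exact p-value on rounded scores). Instead it is a Monte Carlo approximation that shuffles the group labels many times. The function name `perm_test_mean_diff`, the toy vectors, and the choice of 10,000 shuffles are all assumptions made for illustration.

```r
# Hypothetical toy scores, NOT the real data
set.seed(42)
toy_before <- c(0.8, 0.714, 0.714, 1, 1, 1, 0.6, 1)
toy_after  <- c(0.4, 1, 1, 1, 1, 1, 1, 0.857)

# Monte Carlo two-sided permutation test on the difference in means
perm_test_mean_diff <- function(x, y, n_perm = 10000) {
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  n_x <- length(x)

  # Shuffle the pooled scores and recompute the difference each time
  perm_diffs <- replicate(n_perm, {
    shuffled <- sample(pooled)
    mean(shuffled[seq_len(n_x)]) - mean(shuffled[-seq_len(n_x)])
  })

  # Proportion of shuffles at least as extreme as what we observed.
  # The +1 counts the observed arrangement itself, so p is never 0;
  # the small tolerance guards against floating point ties.
  extreme <- sum(abs(perm_diffs) >= abs(observed) - 1e-12)
  p_value <- (extreme + 1) / (n_perm + 1)

  list(observed = observed, p_value = p_value)
}

result <- perm_test_mean_diff(toy_before, toy_after)
print(result)
```

The idea is that if the intervention made no difference, the "before" and "after" labels are interchangeable, so shuffling them shows how large a difference chance alone tends to produce.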