Abstract
Two population difference in means. A maker of high-fiber cereal is looking for additional material to market its product. It already uses the “healthy image of high-fiber” but wants to exploit the issue even more. Its officers believe that people who eat high-fiber cereal for breakfast consume, on average, fewer calories than people who do not eat high-fiber cereal for breakfast. To test their claim, the company interviewed 150 people, identified each person as a consumer or a nonconsumer of high-fiber cereal, and recorded the number of calories each person consumed at lunch. The results of the interview are stored in this Excel file. Can this high-fiber producer claim that people eating high-fiber cereal for breakfast consume fewer calories for lunch? For this problem assume that you are dealing with two populations with unequal variances.
What is the relevant point estimate?
This test statistic has _______ distribution. What is the value of the test statistic?
What are the degrees of freedom associated with this test? (round to the nearest integer)
What is the p-value for this test?
Allowing for a 5% chance of a Type I error, what is your conclusion for this test?
First of all, you need to import the Excel file. To do that, identify your working directory and place the .xls file inside it. After that, load the package xlsx and use the function read.xlsx.
#### Using getwd() to identify the current directory
getwd()
## [1] "C:/Users/User/OneDrive - UIUC/OneDrive - University of Illinois - Urbana/Semestre 6/Econ203 Spring 2020/labmd"
#### Setting up my directory
setwd("C:/Users/User/Desktop/")
#### If you don't have the package xlsx, install it first with install.packages("xlsx")
library(xlsx)
fiber<-read.xlsx("fiberv3.xls", sheetName = "Sheet1", as.data.frame = TRUE, header = TRUE)
head(fiber,5)
## Consumers Nonconsumers
## 1 568 705
## 2 978 819
## 3 589 706
## 4 681 509
## 5 765 613
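The t.test function gives you everything you need. With two samples it runs Welch's test (unequal variances) by default, which matches the assumption stated in the problem.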
ttest<-t.test(fiber$Consumers, fiber$Nonconsumers, alternative = "less")
ttest
##
## Welch Two Sample t-test
##
## data: fiber$Consumers and fiber$Nonconsumers
## t = 0.10913, df = 84.444, p-value = 0.5433
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf 32.42525
## sample estimates:
## mean of x mean of y
## 632.1456 630.1489
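The point estimate is \(\bar{X}_{1}-\bar{X}_{2}\). R reports mean of x (\(\bar{X}_{1}\approx 632.15\)) and mean of y (\(\bar{X}_{2}\approx 630.15\)), so the point estimate is about \(2.00\). The test statistic follows (approximately) a t distribution, with \(t=0.109\) and \(df=84\) (rounding to the nearest integer), and the \(p\)-value is equal to \(0.54\). Since \(0.54>0.05\), we do not reject the null hypothesis: the data do not support the claim that consumers of high-fiber cereal eat fewer calories at lunch.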
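For reference, the quantities in the Welch output can be written out as follows. This is the standard unequal-variances form; the symbols \(s_{1}^{2}, s_{2}^{2}\) (sample variances) and \(n_{1}, n_{2}\) (group sizes) are notation introduced here, and their values are not shown in the output above, so no numbers are plugged in.

\[
t=\frac{\bar{X}_{1}-\bar{X}_{2}}{\sqrt{\dfrac{s_{1}^{2}}{n_{1}}+\dfrac{s_{2}^{2}}{n_{2}}}},
\qquad
df=\frac{\left(\dfrac{s_{1}^{2}}{n_{1}}+\dfrac{s_{2}^{2}}{n_{2}}\right)^{2}}{\dfrac{\left(s_{1}^{2}/n_{1}\right)^{2}}{n_{1}-1}+\dfrac{\left(s_{2}^{2}/n_{2}\right)^{2}}{n_{2}-1}}
\]

The degrees of freedom are generally not an integer, which is why R prints \(df=84.444\) and the question asks you to round.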
Another option is to extract each piece of information individually:
names(ttest)
## [1] "statistic" "parameter" "p.value" "conf.int" "estimate" "null.value" "stderr" "alternative" "method" "data.name"
pointest<-ttest$estimate[1]-ttest$estimate[2]
pointest
## mean of x
## 1.996695
testat<-ttest$statistic
testat
## t
## 0.1091311
pvalue<-ttest$p.value
pvalue
## [1] 0.5433213
#### So, no need to do 1-ttest$p.value (WHY?)
what_to_do<-ifelse(pvalue<0.05, "reject", "don't")
what_to_do
## [1] "don't"
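The "WHY?" in the comment above can be checked by rebuilding the test by hand. The sketch below is only an illustration: it uses simulated calorie values (the vectors consumers and nonconsumers, and the group sizes 43 and 107, are hypothetical and are not the contents of fiberv3.xls). With alternative = "less", the p-value is the lower-tail probability \(P(T\le t)\), which is what pt() returns by default, so there is nothing to subtract from 1.

```r
# Hypothetical data: simulated calories, NOT the values in fiberv3.xls
set.seed(1)
consumers    <- rnorm(43,  mean = 600, sd = 100)
nonconsumers <- rnorm(107, mean = 630, sd = 120)

# Variance of each sample mean (s^2 / n)
n1 <- length(consumers);    v1 <- var(consumers) / n1
n2 <- length(nonconsumers); v2 <- var(nonconsumers) / n2

# Welch test statistic and Welch-Satterthwaite degrees of freedom
t_manual  <- (mean(consumers) - mean(nonconsumers)) / sqrt(v1 + v2)
df_manual <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Lower-tail probability, because the alternative is "less"
p_manual <- pt(t_manual, df_manual)

# Compare with what t.test reports
check <- t.test(consumers, nonconsumers, alternative = "less")
all.equal(unname(check$statistic), t_manual)
all.equal(unname(check$parameter), df_manual)
all.equal(check$p.value, p_manual)
```

If the alternative were "greater", the matching manual p-value would be 1 - pt(t_manual, df_manual) instead.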
As a new intern at Yellow Pages, your job is to call retailers and encourage them to advertise with the company in the future. You obtain from your manager a random sampling of 52 different firms that advertised with the Yellow Pages this year, but not last year. You are interested in comparing their annual sales figures to see whether there is a clear increase in sales this year vs. last year. Data can be found in this Excel file (in thousands of dollars).
What is the relevant point estimate?
What are the degrees of freedom for your test?
What is the value of the test statistic?
What is the p-value for this test?
Allowing for a 5% chance of a Type I error, what is your conclusion for this test?
Now create a 95% confidence interval for the population difference between the mean sales with Yellow Pages advertising and without. The 95% confidence interval for the population mean difference is bounded by: _________ and _________.
Again, the first step is to read the .xls file:
setwd("C:/Users/User/Desktop/")
yellowp<-read.xlsx("yellow_pagesv4.xls", sheetName = "Sheet1", as.data.frame = TRUE, header = TRUE)
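Use the t.test function again. Important: for this question the inputs to t.test are different. The same 52 firms are observed in both years, so the samples are matched and we set paired = TRUE; we are testing for an increase, so alternative = "greater".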
ttest2<-t.test(yellowp$This.Year, yellowp$Last.Year, paired = TRUE, alternative = "greater", mu= 0)
ttest2
##
## Paired t-test
##
## data: yellowp$This.Year and yellowp$Last.Year
## t = 1.7556, df = 51, p-value = 0.04258
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.3397045 Inf
## sample estimates:
## mean of the differences
## 7.423077
Check the output above. Getting the info individually:
pointest2<-ttest2$estimate
pointest2
## mean of the differences
## 7.423077
tstat2<-ttest2$statistic
tstat2
## t
## 1.755628
pvalue2<-ttest2$p.value
pvalue2
## [1] 0.04257826
what_to_do2<-ifelse(pvalue2<0.05, "reject", "don't")
what_to_do2
## [1] "reject"
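It may help to see that a paired t-test is just a one-sample t-test on the within-firm differences. The sketch below illustrates this with simulated sales figures (the vectors last_year and this_year are hypothetical and are not the values in yellow_pagesv4.xls).

```r
# Hypothetical data: simulated sales for 52 firms, NOT the lab data
set.seed(2)
last_year <- rnorm(52, mean = 200, sd = 40)
this_year <- last_year + rnorm(52, mean = 5, sd = 30)

# Within-firm differences
d <- this_year - last_year

# Paired test on the two columns versus one-sample test on the differences
paired_test <- t.test(this_year, last_year, paired = TRUE, alternative = "greater")
diff_test   <- t.test(d, alternative = "greater", mu = 0)

# The same statistic by hand: mean difference over its standard error
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

all.equal(unname(paired_test$statistic), t_manual)
all.equal(paired_test$p.value, diff_test$p.value)
unname(paired_test$parameter)  # degrees of freedom: n - 1
```

This is also why the degrees of freedom in the output above are 51: there are 52 differences, so \(df=n-1\).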
For the confidence interval we need \(S_{D}\) (see lab session 4, slide 19). There we created a column “Difference” using Excel. Let’s do the same using R. Then, find \(\bar{X}_{D}\) and \(S_{D}\).
### Confidence Interval
### Creating a column "Difference"
yellowp$Difference<-yellowp$This.Year-yellowp$Last.Year
## xD and your pointest2 are the same
xD<-mean(yellowp$Difference)
xD
## [1] 7.423077
sD<-sd(yellowp$Difference)
sD
## [1] 30.48969
The final step is to get the critical value (use \(\frac{\alpha}{2}\), since the interval is two-sided) and calculate the confidence interval:
tc<-abs(qt(0.05/2,51))
tc
## [1] 2.007584
### Named upper/lower rather than max/min to avoid reusing the names of base R functions
upper<-xD+tc*sD/sqrt(52)
upper
## [1] 15.91146
lower<-xD-tc*sD/sqrt(52)
lower
## [1] -1.065308
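The interval just computed is \(\bar{X}_{D}\pm t_{\alpha/2,\,n-1}\,S_{D}/\sqrt{n}\). As a cross-check, t.test with its default two-sided alternative should return the same interval in conf.int. The sketch below illustrates this on simulated differences (the vector d is hypothetical, not the lab data).

```r
# Hypothetical data: 52 simulated differences, NOT the lab data
set.seed(3)
d <- rnorm(52, mean = 7, sd = 30)
n <- length(d)

# Manual two-sided 95% interval
t_crit     <- qt(1 - 0.05 / 2, df = n - 1)
half_width <- t_crit * sd(d) / sqrt(n)
ci_manual  <- c(mean(d) - half_width, mean(d) + half_width)

# Built-in interval (two-sided is the default alternative)
ci_builtin <- t.test(d, conf.level = 0.95)$conf.int

all.equal(ci_manual, as.numeric(ci_builtin))
```

For the lab data the interval is roughly \((-1.07,\ 15.91)\). It contains 0 even though the one-sided test above rejected at the 5% level: the two-sided interval puts 2.5% in each tail, while the one-sided test puts the whole 5% in the upper tail.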
For the final part, the null hypothesis is \(\mu_{D}=0.5\), so the alternative becomes \(\mu_{D}>0.5\). Just change the mu argument of t.test.
ttest3<-t.test(yellowp$This.Year, yellowp$Last.Year, paired = TRUE, alternative = "greater", mu= 0.5)
ttest3
##
## Paired t-test
##
## data: yellowp$This.Year and yellowp$Last.Year
## t = 1.6374, df = 51, p-value = 0.05385
## alternative hypothesis: true difference in means is greater than 0.5
## 95 percent confidence interval:
## 0.3397045 Inf
## sample estimates:
## mean of the differences
## 7.423077
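The new statistic can be verified by hand from \(\bar{X}_{D}\) and \(S_{D}\) computed earlier; here \(\mu_{0}=0.5\) denotes the hypothesized value (a symbol introduced for this check):

\[
t=\frac{\bar{X}_{D}-\mu_{0}}{S_{D}/\sqrt{n}}
=\frac{7.423077-0.5}{30.48969/\sqrt{52}}
\approx\frac{6.923077}{4.228159}
\approx 1.6374
\]

This matches the t reported by R. With a \(p\)-value of \(0.05385>0.05\), we do not reject the null hypothesis at the 5% level in this case. The confidence bound is the same as before because it does not depend on the hypothesized value.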