# R Computing Project

STA2ABS/AMS R Computing Project
Due Friday the 31st of May 2013 no later than 5pm
• You must submit a script file containing all your solutions to the project. This includes all the R commands and worded answers. For worded answers, use the # symbol to distinguish them from your R commands.
• Please save the script using the format: RProject Familyname Studentnumber.R (e.g. RProject Smith 12345678.R). Make sure the file name ends with .R so that the file is associated with the R software.
• Email your script to J.Zhang@latrobe.edu.au with the subject line specifying your computer laboratory time, e.g. R Project submission - Friday 11pm.
• In submitting your work, you are consenting that it may be copied and transmitted by the University for the detection of plagiarism. At the start of your script file please type the following statement of originality: “This is my own work. I did not copy any of it from anyone else”. Also type your name and student number, together with your computer lab time, underneath the statement of originality.
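As a rough illustration of the layout described in the points above, the start of a script might look like the sketch below. The name, student number, lab time and the toy calculation are placeholders invented for this example and are not part of the project.

```r
# This is my own work. I did not copy any of it from anyone else.
# Name: Jane Smith (placeholder)
# Student number: 12345678 (placeholder)
# Computer lab time: Friday 11am (placeholder)

# ---- Illustrative question (not part of the project) ----
# R command answer:
example.values <- c(2, 4, 6)
example.mean <- mean(example.values)
example.mean

# Worded answer: the mean of the three example values is 4.
```

Every line beginning with # is ignored by R, so worded answers can sit next to the commands they refer to.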
1. The R package includes many useful functions for generating various types of random data.
(a) Using the help file for the function rnorm, explain in words how you could use rnorm to randomly
generate 10 numbers from a normal distribution with mean 25 and standard deviation 3.
(b) What are the optional arguments for rnorm and what are their default values?
(c) Randomly generate 10 numbers from a normal distribution with mean 25 and standard deviation 3 and
store these in a vector variable called ten.random.numbers. What commands did you use and
what were the numbers generated?
2. Consider the following data, which consist of a group classification (Treatment or Control), weight (in kilograms) and a numeric response to a new drug for 7 patients.
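
| Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Group | Control | Treatment | Treatment | Treatment | Control | Treatment | Control |
| Weight (kg) | 59 | 90 | 47 | 106 | 85 | 73 | 61 |
| Response | 0.0 | 0.8 | 0.1 | 0.1 | 0.7 | 0.6 | 0.2 |

Table 1: Patient records data.
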
(a) Provide R commands that create a data frame called patient.records containing the group classification, weight and drug response information for the seven patients, where each of these variables is named appropriately within the data frame and is of the appropriate type (i.e. numeric, character etc). Carry this out in R.
(b) With reference to patient.records only, give two distinct one-line R commands that will display
just the weight of the patients.
(c) With reference to patient.records only, give a simple one-line R command that displays the
group, weight and drug response for just the fourth patient.
(d) Create a list called patient.data that includes the patient.records data frame, the average
weight of the patients and the average drug response of the patients. Objects within the list should be
given appropriate names.
3. A famous formula that can be used to roughly estimate the Blood Alcohol Concentration (BAC) percentage
is the Widmark formula given as
$$\text{BAC\%} = \left( \frac{\text{Ounces} \times 5.4 \times \text{ADR}}{2.2 \times \text{Weight}} \right) - 0.015 \times \text{Hours}$$
where
• ‘Ounces’ is the liquid ounces of alcohol consumed.
• ‘ADR’ is the alcohol distribution ratio. This is equal to 0.73 for males and 0.66 for females.
• ‘Weight’ is the weight (kg) of the individual whose BAC is to be estimated.
• ‘Hours’ is the time in hours since the first drink.
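To make the arithmetic in the formula concrete, here is a minimal sketch that wraps it in a function and evaluates it for a made-up individual. The helper name widmark.bac and the input values are assumptions chosen for illustration; they are not the values asked for in the parts that follow.

```r
# Widmark estimate exactly as written in the formula above.
# widmark.bac is a hypothetical helper name, not a built-in R function.
widmark.bac <- function(ounces, weight, adr, hours) {
  (ounces * 5.4 * adr) / (2.2 * weight) - 0.015 * hours
}

# Made-up individual: female (ADR = 0.66), 60 kg,
# 2 liquid ounces of alcohol, 1 hour since the first drink.
example.bac <- widmark.bac(ounces = 2, weight = 60, adr = 0.66, hours = 1)
example.bac
```

For these inputs the first term is 7.128 / 132 = 0.054, and subtracting 0.015 × 1 gives an estimate of about 0.039.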
(a) Suppose we wish to estimate the BAC% of a male who weighs 85kg and who has consumed 3.1 liquid
ounces of alcohol over the past 1.5 hours. Provide R commands that
i. Assign appropriate values to the R objects Ounces, Weight, ADR and Hours that will be used
to estimate the BAC% for this individual.
ii. Use these R objects and the Widmark formula to estimate the BAC%.
What is the estimated BAC% for this person?
(b) Now suppose that this person has consumed the same amount of alcohol over 2 hours. By changing
just the value you assigned to Hours in your script, what is the estimated BAC% now?
4. An evolutionary biologist examined the relative fitness of Escherichia coli bacteria evolved for 300 days
at stressful acidic pH level 5.5 and their parental generation, evolved at neutral pH level 7.2. Both types
were later grown together in an acidic medium and their relative fitness was computed. The experiment was
replicated with 10 different lines of Escherichia coli giving the following fitness values
1.08,0.98,0.89,1.22,1.07,1.10,1.15,1.04,1.00,1.09.
We assume these values are sampled from a normal distribution. A relative fitness of 1 indicates that both
the acidic and neutrally evolved line are equally fit when both are later grown in acidic conditions. A relative
fitness larger than 1 indicates that the acidic-evolved line is more fit than the neutrally-evolved line when
both are later grown in acidic conditions (that is, the acid-evolved bacteria grew the most). The evolutionary
biologist claims that acidic-evolved bacteria are better adapted to acidic conditions.¹
(a) Let µ denote the mean relative fitness between acidic-evolved bacteria and neutrally-evolved bacteria
when both are later grown in acidic conditions. Suppose we want to test the claim made by the
biologist. State the null and alternative hypothesis for this test.
(b) Store the sampled relative fitness values in a numeric vector object named Rel_Fit.
(c) Use R to calculate the sample mean and sample standard deviation of the relative fitness values and assign these to R objects named x_bar and st_d, respectively.
(d) Use the objects that have been assigned the sample mean and sample standard deviation to calculate
the observed test statistic for the hypothesis test stated in part (a).
(e) Use R to carry out the hypothesis test stated in part (a). What is the p-value for this test? Also, verify
from the R output that your calculation for the observed test statistic calculated in part (d) is correct.
(f) Based on your R output for the hypothesis test you conducted in part (e), make an appropriate conclusion for this test at the 5% level of significance.
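The check described in parts (d) and (e) relies on the one-sample t statistic, $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$, matching the statistic reported by t.test. The sketch below shows that comparison on a made-up sample. The data, the hypothesised mean and the object names are assumptions for illustration only, and the two-sided alternative is used simply because it is the default.

```r
# Made-up sample; not the relative fitness data from Question 4.
toy.sample <- c(4.8, 5.3, 5.1, 4.9, 5.6, 5.2)
mu.0 <- 5  # hypothesised mean, assumed for illustration

# One-sample t statistic computed by hand:
# (sample mean - mu.0) / (sample sd / sqrt(n))
n.toy <- length(toy.sample)
t.manual <- (mean(toy.sample) - mu.0) / (sd(toy.sample) / sqrt(n.toy))

# t.test reports the same statistic. The 'alternative' argument sets the
# direction of H1 ("two.sided", "less" or "greater") and affects only
# the p-value, not the statistic.
toy.test <- t.test(toy.sample, mu = mu.0, alternative = "two.sided")
t.from.test <- unname(toy.test$statistic)

all.equal(t.manual, t.from.test)
```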
5. Let $x_1, \ldots, x_n$ denote a sample of $n$ observations, where $\bar{x}$ and $s^2$ denote the sample mean and sample variance respectively. Also suppose that we are interested in testing the hypotheses
$$H_0 : \mu = \mu_0 \quad \text{versus} \quad H_1 : \mu \neq \mu_0$$
where $\mu$ is the population mean from which the data is sampled.
If $x_1, \ldots, x_n$ are sampled from a normal distribution, then we may test the hypotheses using a t-test. Furthermore, if we nominate a significance level of $\alpha = 0.05$ then the probability that $H_0$ will be rejected when it is in fact true is 0.05. Consider the following R code:
n <- 20
x <- rnorm(n, mean = 25)
p.value <- t.test(x, mu = 25)$p.value
p.value
This code randomly generates 20 observations from the $N(25, 1)$ distribution and obtains a p-value for the test of $H_0 : \mu = 25$ versus $H_1 : \mu \neq 25$, where we reject $H_0$ if the p-value is less than 0.05. Note here that the population mean is in fact $\mu = 25$ so that $H_0$ is true. This means that the probability of us rejecting $H_0$ is 0.05.
Another way of looking at this is as follows. If we were to repeat this process many times, then $H_0$ would be rejected around 5% of the time. That is, if we repeated this 1000 times then we would expect $H_0$ to be rejected around $0.05 \times 1000 = 50$ times.
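The repeated-sampling argument can be sketched in R for the normal case just described. The seed, the number of repetitions and the object names below are arbitrary choices made for this illustration; the observed rate will vary slightly from run to run if the seed is changed.

```r
set.seed(1)        # arbitrary seed so the run is reproducible
n.reps <- 1000     # number of times the experiment is repeated
n <- 20            # observations per experiment
alpha <- 0.05      # nominated significance level

rejected <- logical(n.reps)  # records whether H0 was rejected each time

for (i in seq_len(n.reps)) {
  x <- rnorm(n, mean = 25)               # H0 is true: the mean really is 25
  p.value <- t.test(x, mu = 25)$p.value
  rejected[i] <- p.value < alpha
}

# Proportion of the repetitions in which H0 was rejected
rejection.rate <- mean(rejected)
rejection.rate
```

Because the data really are normal with mean 25, the proportion should land near the nominal 0.05, that is, roughly 50 rejections out of 1000.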
An important question arises. What if the data is not sampled from a normal distribution? For example, suppose that the 20 observations are sampled from the $F_{5,10}$ distribution such that the command
x <- rf(n, df1 = 5, df2 = 10)
is used instead of x <- rnorm(n, mean = 25). The mean of an $F_{\nu_1,\nu_2}$ random variable (for $\nu_2 > 2$) is equal to
$$\frac{\nu_2}{\nu_2 - 2}$$
so that the true population mean is $\mu = 10/(10 - 2) = 1.25$ and we are interested in the hypotheses $H_0 : \mu = 1.25$ versus $H_1 : \mu \neq 1.25$. If the t-test is appropriate and we used the commands
p.value <- t.test(x, mu = 1.25)$p.value
p.value
to obtain a p-value, then in the long run we would expect to reject $H_0$ around 5% of the time.
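As a quick numerical sanity check of the stated population mean, the sketch below compares the formula with the average of a large number of simulated F draws. The seed, the number of draws and the object names are arbitrary choices for this illustration.

```r
set.seed(1)   # arbitrary seed so the run is reproducible
nu.1 <- 5     # numerator degrees of freedom
nu.2 <- 10    # denominator degrees of freedom

# Mean of the F distribution from the formula (valid for nu.2 > 2)
theoretical.mean <- nu.2 / (nu.2 - 2)

# Average of many simulated draws; should be close to the formula value
draws <- rf(100000, df1 = nu.1, df2 = nu.2)
simulated.mean <- mean(draws)

c(theoretical = theoretical.mean, simulated = simulated.mean)

# The F distribution is right-skewed, so its median sits below its mean
median(draws)
```

The gap between the median and the mean is one sign of the skewness that separates this distribution from the normal case.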
If $n$ is large then the t-test may be used regardless of the underlying distribution from which the data was sampled. Another important question now arises. When is $n$ large enough such that we can use the t-test appropriately? For this project you are required to use an R script to simulate the effectiveness of the t-test. A for loop will be used to carry out the following 2000 times:
• Sample $n$ observations from the $F_{5,\nu_2}$ distribution.
• Obtain a p-value for the t-test carried out on the sampled data which tests the hypotheses $H_0 : \mu = \frac{\nu_2}{\nu_2 - 2}$ versus $H_1 : \mu \neq \frac{\nu_2}{\nu_2 - 2}$.
• Check whether or not $H_0$ is to be rejected.
Your script needs to give the proportion of times that $H_0$ was rejected so that you can check whether it was close to expected (i.e. close to the nominated significance level of 0.05).
(a) Write a script file that may be used to carry out this simulation for $n = 5$, $\nu_1 = 5$ and $\nu_2 = 10$. For this question you will be assessed on (i) whether the script can carry out the simulation correctly, (ii) clarity (i.e. whether it is easy to read and follow, including the use of indentation within the code where appropriate, and whether appropriate object names were used) and (iii) how easy it is to change the script file so that the simulation can be carried out for different values of $n$ and $\nu_2$ (the fewer changes the better).
(b) Carry out the simulation for $n = 5$ and $\nu_2 = 10$ and enter the proportion of times $H_0$ was rejected into the appropriate spot in Table 2, which can be found in the Word document named Table 2. Is the proportion of times $H_0$ was rejected close to what we would expect if the t-test is appropriate? Explain.
(c) Complete the rest of Table 2, where you will need to make changes to your script to account for the choices of $n$ and $\nu_2$ required. Remember that (i) $\nu_1 = 5$ throughout, and (ii) the value assigned to the argument mu within the t.test function will change for each choice of $\nu_2$. (Important: The values you place in the table should be given to 4 decimal places. You need to submit the Table 2 Word document at the same time you submit your script file that contains all your solutions to the project.)
(d) By comparing the proportion of times you expected $H_0$ to be rejected (if the t-test is appropriate) with the proportion of times $H_0$ was actually rejected, assess the effectiveness of the t-test (with respect to a level of significance chosen to be 0.05) when data is sampled from an $F_{5,\nu_2}$ distribution. Your answer should discuss the effect of changing $n$ and $\nu_2$ and make use of your findings reported in Table 2.
1. Question 4 is adapted from Example 17.41, pp. 454-455 of Baldi, B., & Moore, D.S. (2009). The practice of statistics in the life sciences. Freeman, New York.