R exam project case study

Instructions: Upload two files to Blackboard as your answers for this test: (1) a Word document with descriptive answers to the questions; (2) a .txt file (created with TextEdit on Mac or Notepad on Windows) containing the input and output from the R console. Copy and paste your work from the R console into the .txt file. Answer all questions:

- Load the “Affairs” dataset from the AER package. It contains data for analyzing the determinants of the number of extramarital affairs that people have.
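As a minimal sketch of the loading step, assuming the AER package is already installed (the commented `install.packages` line is only needed once), the dataset can be attached and its dimensions checked like this:

```r
# Sketch only: assumes the AER package is available on this machine.
# install.packages("AER")   # one-time installation, if needed
library(AER)

# Load the Affairs data frame that ships with AER
data("Affairs")

# Check the number of rows and columns, and list the variable names
dim(Affairs)
names(Affairs)
```

Note that the variable names come back as single words, for example `yearsmarried`, which is how they must be typed in model formulas.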

It is a data frame containing 601 observations on 9 variables:

affairs numeric. How often engaged in extramarital sexual intercourse during the past year? 0 = none, 1 = once, 2 = twice, 3 = 3 times, 7 = 4–10 times, 12 = monthly, 12 = weekly, 12 = daily.

gender factor indicating gender.

age numeric variable coding age in years: 17.5 = under 20, 22 = 20–24, 27 = 25–29, 32 = 30–34, 37 = 35–39, 42 = 40–44, 47 = 45–49, 52 = 50–54, 57 = 55 or over.

yearsmarried numeric variable coding number of years married: 0.125 = 3 months or less, 0.417 = 4–6 months, 0.75 = 6 months–1 year, 1.5 = 1–2 years, 4 = 3–5 years, 7 = 6–8 years, 10 = 9–11 years, 15 = 12 or more years.

children factor. Are there children in the marriage?

religiousness numeric variable coding religiousness: 1 = anti, 2 = not at all, 3 = slightly, 4 = somewhat, 5 = very.

education numeric variable coding level of education: 9 = grade school, 12 = high school graduate, 14 = some college, 16 = college graduate, 17 = some graduate work, 18 = master’s degree, 20 = Ph.D., M.D., or other advanced degree.

occupation numeric variable coding occupation according to Hollingshead classification (reverse numbering).

rating numeric variable coding self rating of marriage: 1 = very unhappy, 2 = somewhat unhappy, 3 = average, 4 = happier than average, 5 = very happy.

- Carry out the following operations on the data.

a. Present a brief description of the data. In particular, display the first six rows of the data to reveal the types of variables in the dataset. [6 points]

b. Plot affairs against the rating variable. What kind of regression model specification would be suitable to represent the relationship between the two variables? Run the appropriate regression model. Create additional variables if required for this model. Interpret the variables. [17 points]

c. Report the summary of results. Report the confidence intervals of the coefficients. Interpret the coefficients and comment on their individual significance. Comment on the goodness of fit of the regression model. [15 points]

d. Describe the omitted variable bias that may arise in the regression specification you have estimated above. [15 points]

e. Run a regression model that adds the gender, age, yearsmarried, children, religiousness, education, and occupation variables to the specification you estimated in (b). [5 points]

f. Compare the summary statistics from (c) with those of the specification estimated in (e). [16 points]

g. Report and interpret the F statistic. Why might the t-test for significance of coefficients be inadequate in the specification described in (e)? [8 points]

h. Compute and report the standard error of the regression, SER. [8 points]

i. Do you think the specification in (e) is a better representation of the determinants of affairs? [10 points]
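The operations listed above map onto a handful of standard R calls. The following is a rough sketch rather than a model answer: it assumes the AER package is installed, and it assumes a simple linear specification for affairs on rating purely for illustration (choosing and justifying the specification is part of the question). The object names `fit_simple`, `fit_dummy`, `fit_full`, `ser`, and `ser_manual` are illustrative, not prescribed.

```r
# Sketch only: assumes the AER package is installed.
library(AER)
data("Affairs")

# (a) First six rows and the type of each variable
head(Affairs)
str(Affairs)

# (b) Plot affairs against rating, then fit a regression.
# Assumption: a linear specification in rating; an alternative is to
# treat rating as categorical, which makes lm() create dummy variables.
plot(affairs ~ rating, data = Affairs)
fit_simple <- lm(affairs ~ rating, data = Affairs)
fit_dummy  <- lm(affairs ~ factor(rating), data = Affairs)

# (c) Coefficient table, R-squared, and 95% confidence intervals
summary(fit_simple)
confint(fit_simple, level = 0.95)

# (e) Add the remaining regressors to the specification from (b)
fit_full <- lm(affairs ~ rating + gender + age + yearsmarried + children +
                 religiousness + education + occupation, data = Affairs)

# (f) Compare fit measures across the two specifications
summary(fit_simple)$adj.r.squared
summary(fit_full)$adj.r.squared

# (g) Overall F statistic of the full model, and a joint F test of the
# added regressors against the simple model
summary(fit_full)$fstatistic
anova(fit_simple, fit_full)

# (h) Standard error of the regression, two equivalent ways
ser        <- sigma(fit_full)
ser_manual <- sqrt(sum(residuals(fit_full)^2) / df.residual(fit_full))
```

Everything printed at the console by these calls can be pasted into the .txt file; the interpretation belongs in the Word document.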

