1. Data collected from approximately the same period of time from a cross-section of a population are called:
a. time series data
b. linear data
c. cross-sectional data
d. historical data
2. Regression analysis asks:
a. if there are differences between distinct populations
b. if the sample is representative of the population
c. how a single variable depends on other relevant variables
d. how several variables depend on each other
3. In regression analysis, the variables used to help explain or predict the response variable are called the
a. independent variables
b. dependent variables
c. regression variables
d. statistical variables
4. In regression analysis, the variable we are trying to explain or predict is called the
a. independent variable
b. dependent variable
c. regression variable
d. statistical variable
e. residual variable
5. In regression analysis, if there are several explanatory variables, it is called:
a. simple regression
b. multiple regression
c. compound regression
d. composite regression
Copyright Cengage Learning. Powered by Cognero. Page 1
Name: Class: Date:
Chapter 10
6. In regression analysis, which of the following causal relationships are possible?
a. X causes Y to vary
b. Y causes X to vary
c. Other variables cause both X and Y to vary
d. All of these options
7. ______ is/are especially helpful in identifying outliers.
a. Linear regression
b. Regression analysis
c. Normal curves
d. Scatterplots
e. Multiple regression
8. Outliers are observations that
a. lie outside the sample
b. render the study useless
c. lie outside the typical pattern of points on a scatterplot
d. disrupt the entire linear trend
9. A “fan” shape in a scatterplot indicates:
a. unequal variance
b. a nonlinear relationship
c. the absence of outliers
d. sampling error
10. A scatterplot that appears as a shapeless mass of data points indicates:
a. a curved relationship among the variables
b. a linear relationship among the variables
c. a nonlinear relationship among the variables
d. no relationship among the variables
11. Correlation is a summary measure that indicates:
a. a curved relationship among the variables
b. the rate of change in Y for a one unit change in X
c. the strength of the linear relationship between pairs of variables
d. the magnitude of difference between two variables
12. A correlation value of zero indicates:
a. a strong linear relationship
b. a weak linear relationship
c. no linear relationship
d. a perfect linear relationship
13. The correlation value ranges from
a. 0 to +1
b. –1 to +1
c. –2 to +2
d. –Y to +Y
14. The covariance is not used as much as the correlation because
a. it is not always a valid predictor of linear relationships
b. it is difficult to calculate
c. it is difficult to interpret
d. all of these options
15. A single variable X can explain a large percentage of the variation in some other variable Y when the two variables are:
a. mutually exclusive
b. inversely related
c. directly related
d. highly correlated
e. None of the above
16. The term autocorrelation refers to:
a. the analyzed data refers to itself
b. the sample is related too closely to the population
c. the data are in a loop (values repeat themselves)
d. time series variables are usually related to their own past values
17. The weakness of scatterplots is that they:
a. do not help identify linear relationships
b. can be misleading about the types of relationships they indicate
c. only help identify outliers
d. do not actually quantify the relationships between variables
18. In linear regression, we fit the least squares line to a set of values (or points on a scatterplot). The distance from the
line to a point is called the:
a. fitted value
b. residual
c. correlation
d. covariance
e. None of these options
19. In linear regression, the fitted value is the:
a. predicted value of the dependent variable
b. predicted value of the independent variable
c. predicted value of the slope
d. predicted value of the intercept
e. None of these options
20. In choosing the “best-fitting” line through a set of points in linear regression, we choose the one with the:
a. smallest sum of squared residuals
b. largest sum of squared residuals
c. smallest number of outliers
d. largest number of points on the line
e. None of these options
21. The standard error of the estimate (se) is essentially the
a. mean of the residuals
b. standard deviation of the residuals
c. mean of the explanatory variable
d. standard deviation of the explanatory variable
22. A multiple regression analysis including 50 data points and 5 independent variables results in a sum of squared residuals (SSE) of 40. The multiple
standard error of estimate will be:
a. 0.901
b. 0.888
c. 0.800
d. 0.953
e. 0.894
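As a quick check on Question 22, the sketch below applies the usual multiple standard error of estimate formula, se = sqrt(SSE / (n - k - 1)). It assumes the missing quantity in the question stem is the sum of squared residuals (SSE = 40); the function name is illustrative only.

```python
import math


def std_error_of_estimate(sse: float, n: int, k: int) -> float:
    """Standard error of estimate for a regression with k explanatory variables."""
    # Degrees of freedom: n observations minus k slopes minus 1 intercept.
    return math.sqrt(sse / (n - k - 1))


se_q22 = std_error_of_estimate(sse=40, n=50, k=5)
print(round(se_q22, 3))  # 0.953
```

Under that assumption the result, sqrt(40 / 44), matches option d.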
23. Approximately what percentage of the observed Y values are within one standard error of the estimate of the
corresponding fitted Y values?
a. 67%
b. 95%
c. 99%
d. It is not possible to say
24. The percentage of variation (R2) can be interpreted as the fraction (or percent) of variation of the
a. explanatory variable explained by the independent variable
b. explanatory variable explained by the regression line
c. response variable explained by the regression line
d. error explained by the regression line
25. The percentage of variation (R2) ranges from
a. 0 to +1
b. –1 to +1
c. –2 to +2
d. –1 to 0
26. In a simple linear regression analysis, the following sums of squares are produced:
The proportion of the variation in Y that is explained by the variation in X is:
a. 20%
b. 80%
c. 25%
d. 50%
e. None of the above
27. Given the least squares regression line,
a. the relationship between X and Y is positive
b. the relationship between X and Y is negative
c. as X increases, so does Y
d. as X decreases, so does Y
e. there is no relationship between X and Y
28. The regression line has been fitted to the data points (28, 60), (20, 50), (10, 18), and (25, 55). The sum
of the squared residuals will be:
a. 20.25
b. 16.00
c. 49.00
d. 94.25
29. In multiple regression, the constant:
a. Is the expected value of the dependent variable Y when all of the independent variables have the value zero
b. Is necessary to fit the multiple regression line to a set of points
c. Must be adjusted for the number of independent variables
d. All of these options
30. In multiple regression, the coefficients reflect the expected change in:
a. Y when the associated X value increases by one unit
b. X when the associated Y value increases by one unit
c. Y when the associated X value decreases by one unit
d. X when the associated Y value decreases by one unit
31. An important condition when interpreting the coefficient for a particular independent variable X in a multiple regression
equation is that:
a. the dependent variable will remain constant
b. the dependent variable will be allowed to vary
c. all of the other independent variables remain constant
d. all of the other independent variables be allowed to vary
32. The adjusted R2 adjusts R2 for:
a. non-linearity
b. outliers
c. low correlation
d. the number of explanatory variables in a multiple regression model
33. In linear regression, a dummy variable is used:
a. to represent residual variables
b. to represent missing data in each sample
c. to include hypothetical data in the regression equation
d. to include categorical variables in the regression equation
e. when “dumb” responses are included in the data
34. In linear regression, we can have an interaction variable. Algebraically, the interaction variable is the ______ of the other variables in
the regression equation.
a. sum
b. ratio
c. product
d. mean
35. Which of the following is an example of a nonlinear regression model?
a. A quadratic regression equation
b. A logarithmic regression equation
c. A constant elasticity equation
d. The learning curve model
e. All of these options
36. The two primary objectives of regression analysis are to study relationships between variables and to use those
relationships to make predictions.
a. True
b. False
37. Cross-sectional data are usually data gathered from approximately the same period of time from a cross-section of a
population.
a. True
b. False
38. Regression analysis can be applied equally well to cross-sectional and time series data.
a. True
b. False
39. In every regression study there is a single variable that we are trying to explain or predict. This is called the response
variable or dependent variable.
a. True
b. False
40. To help explain or predict the response variable in every regression study, we use one or more explanatory variables.
These variables are also called response variables or independent variables.
a. True
b. False
41. Scatterplots are used for identifying outliers and quantifying relationships between variables.
a. True
b. False
42. An outlier is an observation that falls outside of the general pattern of the rest of the observations on a scatterplot.
a. True
b. False
43. When the scatterplot appears as a shapeless swarm of points, this can indicate that there is no relationship between
the response variable Y and the explanatory variable X, or at least none worth pursuing.
a. True
b. False
44. Correlation is used to determine the strength of the linear relationship between an explanatory variable X and response
variable Y.
a. True
b. False
45. Correlation is measured on a scale from 0 to 1, where 0 indicates no linear relationship between two variables, and 1
indicates a perfect linear relationship.
a. True
b. False
46. The residual is defined as the difference between the actual and predicted (or fitted) values of the response variable.
a. True
b. False
47. The least squares line is the line that minimizes the sum of the residuals.
a. True
b. False
48. A useful graph in almost any regression analysis is a scatterplot of residuals (on the vertical axis) versus fitted values
(on the horizontal axis), where a “good” fit not only has small residuals, but it has residuals scattered randomly around
zero with no apparent pattern.
a. True
b. False
49. A negative relationship between an explanatory variable X and a response variable Y means that as X increases, Y
decreases, and vice versa.
a. True
b. False
50. In reference to the equation, , the value 0.10 is the expected change in Y per unit change in .
a. True
b. False
51. A regression analysis between sales (in $1000) and advertising (in $100) resulted in the following least squares line:
Ŷ = 84 + 7X. This implies that if advertising is $800, then the predicted amount of sales (in dollars) is $140,000.
a. True
b. False
52. A regression analysis between sales (in $1000) and advertising (in $100) resulted in the following least squares line:
Ŷ = 84 + 7X. This implies that if there is no advertising, then the predicted amount of sales (in dollars) is $84,000.
a. True
b. False
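The unit conversions in Questions 51 and 52 are easy to get wrong, so here is a small sketch that converts advertising dollars to X (in $100), evaluates the least squares line Ŷ = 84 + 7X, and converts Ŷ (in $1000) back to dollars. The function name is hypothetical.

```python
def predicted_sales_dollars(advertising_dollars: float) -> float:
    """Predicted sales in dollars from the line Y-hat = 84 + 7X."""
    x = advertising_dollars / 100  # X is measured in hundreds of dollars
    y_hat = 84 + 7 * x             # Y-hat is measured in thousands of dollars
    return y_hat * 1000            # convert back to dollars


print(predicted_sales_dollars(800))  # 140000.0
print(predicted_sales_dollars(0))    # 84000.0
```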
53. A regression analysis between weight (Y in pounds) and height (X in inches) resulted in the following least squares line:
Ŷ = 140 + 5X. This implies that if the height is increased by 1 inch, the weight is expected to increase on average by 5
pounds.
a. True
b. False
54. A regression analysis between sales (in $1000) and advertising (in $) resulted in the following least squares line:
Ŷ = 32 + 8X. This implies that an increase of $1 in advertising is expected to result in an increase of $40 in sales.
a. True
b. False
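For Question 54, the slope must be read in the units of Y. A minimal sketch, assuming X is advertising in dollars and Ŷ is sales in thousands of dollars as the stem states; the function name and the starting advertising level of $100 are arbitrary illustrations.

```python
def predicted_sales_q54(advertising_dollars: float) -> float:
    """Predicted sales in dollars from Y-hat = 32 + 8X (X in $, Y-hat in $1000)."""
    return (32 + 8 * advertising_dollars) * 1000


# Effect of a $1 increase in advertising, starting from an arbitrary level.
change = predicted_sales_q54(101) - predicted_sales_q54(100)
print(change)  # 8000
```

Because the line is straight, the $8,000 change does not depend on the starting level chosen.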
55. In regression analysis, we can often use the standard error of estimate to judge which of several potential regression
equations is the most useful.
a. True
b. False
56. In simple linear regression, the divisor of the standard error of estimate is n – 1, simply because there is only one
explanatory variable of interest.
a. True
b. False
57. The regression line Ŷ = 3 + 2X has been fitted to the data points (4, 14), (2, 7), and (1, 4). The sum of the squared
residuals will be 8.0.
a. True
b. False
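The arithmetic behind Question 57 can be checked directly: each residual is the observed Y minus the fitted value 3 + 2X, and the residuals are then squared and summed. A short sketch:

```python
points = [(4, 14), (2, 7), (1, 4)]

# Residual = observed Y minus fitted Y, where the fitted line is Y-hat = 3 + 2X.
residuals = [y - (3 + 2 * x) for x, y in points]
sse_q57 = sum(r ** 2 for r in residuals)

print(residuals)  # [3, 0, -1]
print(sse_q57)    # 10
```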
58. In a simple regression analysis, if the standard error of estimate = 15 and the number of observations n = 10, then
the sum of the residuals squared must be 120.
a. True
b. False
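Rearranging the simple-regression formula se = sqrt(SSE / (n - 2)) gives SSE = se²(n - 2). The sketch below applies it to Question 58, assuming the 15 in the stem is se itself rather than its square.

```python
se_q58 = 15
n_q58 = 10

# In simple regression, se = sqrt(SSE / (n - 2)), so SSE = se**2 * (n - 2).
sse_q58 = se_q58 ** 2 * (n_q58 - 2)
print(sse_q58)  # 1800
```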
59. In a simple linear regression problem, if the percentage of variation explained is 0.95, this means that 95% of the
variation in the explanatory variable X can be explained by regression.
a. True
b. False
60. The percentage of variation explained is the square of the correlation between the observed Y values and the fitted Y
values.
a. True
b. False
61. The multiple R for a regression is the correlation between the observed Y values and the fitted Y values.
a. True
b. False
62. In a simple regression with a single explanatory variable, the multiple R is the same as the standard correlation
between the Y variable and the explanatory X variable.
a. True
b. False
63. In a simple linear regression problem, suppose that . Then the percentage of
variation explained must be 0.90.
a. True
b. False
64. In a multiple regression problem with two explanatory variables if, the fitted regression equation is
.
a. True
b. False
65. In the multiple regression model we interpret X1 as follows: holding X2 constant, if X1
increases by 1 unit, then the expected value of Y will increase by 9 units.
a. True
b. False
66. For the multiple regression model , if were to increase by 5 units, holding and
constant, the value of Y would be expected to decrease by 50 units.
a. True
b. False
67. In a multiple regression analysis with three explanatory variables, suppose that there are 60 observations and the sum
of the residuals squared is 28. The standard error of estimate must be 0.7071.
a. True
b. False
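Question 67 uses the same formula with k = 3 explanatory variables, so the divisor is n - k - 1 = 56. A one-line check:

```python
import math

n_q67, k_q67, sse_q67 = 60, 3, 28
se_q67 = math.sqrt(sse_q67 / (n_q67 - k_q67 - 1))  # sqrt(28 / 56)
print(round(se_q67, 4))  # 0.7071
```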
68. The R2 can only increase when extra explanatory variables are added to a multiple regression model.
a. True
b. False
69. The adjusted R2 is adjusted for the number of explanatory variables in a regression equation, and it has the same
interpretation as the standard R2.
a. True
b. False
70. The adjusted R2 is used primarily to monitor whether extra explanatory variables really belong in a multiple regression
model.
a. True
b. False
71. If a categorical variable is to be included in a multiple regression, a dummy variable for each category of the variable
should be used, but the original categorical variable should not be used.
a. True
b. False
72. An interaction variable is the product of an explanatory variable and the dependent variable.
a. True
b. False
73. We should include an interaction variable in a regression model if we believe that the effect of one explanatory variable
on the response variable Y depends on the value of another explanatory variable.
a. True
b. False
74. If the regression equation includes anything other than a constant plus the sum of products of constants and variables,
the model will not be linear.
a. True
b. False
75. In a nonlinear transformation of data, the Y variable or the X variables may be transformed, but not both.
a. True
b. False
76. The primary purpose of a nonlinear transformation is to “straighten out” the data on a scatterplot.
a. True
b. False
77. If a scatterplot of residuals shows a parabola shape, then a logarithmic transformation may be useful in obtaining a
better fit.
a. True
b. False
78. The coefficients for logarithmically transformed explanatory variables should be interpreted as the percent change in the
dependent variable for a 1% change in the explanatory variable.
a. True
b. False
79. The effect of a logarithmic transformation on a variable that is skewed to the right by a few large values is to “squeeze”
the values together and make the distribution more symmetric.
a. True
b. False
80. A logarithmic transformation of the response variable Y is often useful when the distribution of Y is symmetric.
a. True
b. False
81. In a constant elasticity, or multiplicative, model, the dependent variable is expressed as a product of explanatory variables
raised to powers.
a. True
b. False
The marketing manager of a large supermarket chain would like to determine the effect of shelf space (in feet) on the
weekly sales of international food (in hundreds of dollars). A random sample of 12 equal-sized stores is selected, with
the following results:
Store   Shelf Space X (feet)   Weekly Sales Y ($100s)
1       10                     2.0
2       10                     2.6
3       10                     1.8
4       15                     2.3
5       15                     2.8
6       15                     3.0
7       20                     2.7
8       20                     3.1
9       20                     3.2
10      25                     3.0
11      25                     3.3
12      25                     3.5
82. (A) Draw a scatterplot of the data and comment on the relationship between shelf space and weekly sales.
(B) Run a regression on this data set and report the results.
(C) What are the least squares regression coefficients of the Y-intercept (a) and slope (b)?
(D) Interpret the meaning of the slope b.
(E) Predict the average weekly sales (in hundreds of dollars) of international food for stores with 13 feet of shelf space
for international food.
(F) Why would it not be appropriate to predict the average weekly sales (in hundreds of dollars) of international food for
stores with 35 feet of shelf space for international food?
(G) Identify the coefficient of determination, R2, and interpret its meaning.
(H) Determine the standard error of the estimate. What does it represent?
(I) Draw a scatterplot of residuals versus fitted values. What does this graph indicate?
ANSWER:
(A)
It seems that a linear relationship is appropriate to describe the relationship between shelf space and
weekly sales.
(B)
(C) a = 1.48, and b = 0.074
(D) For each increase in shelf space by one foot, there is an expected increase in weekly sales by $7.40.
(E) Ŷ = 1.48 + 0.074(13) = 2.442 (in $100), or $244.20
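(F) Shelf space of 35 feet is outside the relevant range for the independent variable X (10 to 25 feet in the sample), so the
prediction would be an extrapolation beyond the observed data.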
(G) R2 = 0.6839. This means that 68.39% of the variation in weekly sales can be explained by the variation
in shelf space available for international food.
(H) The standard error of the estimate = 0.3081. This represents the standard deviation of the residuals.
This value can be compared to the standard deviation of the weekly sales of international food to
determine if much improvement in accuracy has been gained by using the regression equation for
predicting the weekly sales.
(I)
This is a useful graph in almost any regression analysis. We typically examine such a scatterplot for any
striking patterns. A “good” fit not only has small residuals, but it also has residuals scattered randomly
around 0 with no apparent pattern. This appears to be the case for the shelf space data.
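The numbers in the answer to Question 82 can be reproduced from the 12 stores in the table with a plain least squares calculation. This is a minimal sketch using only the standard library (not the spreadsheet add-in that produced the original output); it computes the slope and intercept, R2, the standard error of estimate with n - 2 degrees of freedom, and the prediction for 13 feet.

```python
import math

shelf = [10, 10, 10, 15, 15, 15, 20, 20, 20, 25, 25, 25]             # X, feet
sales = [2.0, 2.6, 1.8, 2.3, 2.8, 3.0, 2.7, 3.1, 3.2, 3.0, 3.3, 3.5]  # Y, $100s

n = len(shelf)
x_bar = sum(shelf) / n
y_bar = sum(sales) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(shelf, sales))
sxx = sum((x - x_bar) ** 2 for x in shelf)

b = sxy / sxx          # least squares slope
a = y_bar - b * x_bar  # least squares intercept

fitted = [a + b * x for x in shelf]
residuals = [y - f for y, f in zip(sales, fitted)]
sse_shelf = sum(r ** 2 for r in residuals)
sst_shelf = sum((y - y_bar) ** 2 for y in sales)
r2_shelf = 1 - sse_shelf / sst_shelf
se_shelf = math.sqrt(sse_shelf / (n - 2))  # simple regression: n - 2 df

prediction_13 = a + b * 13  # predicted weekly sales for 13 feet, in $100s

print(f"a = {a:.2f}, b = {b:.3f}")                 # a = 1.48, b = 0.074
print(f"R2 = {r2_shelf:.4f}, se = {se_shelf:.4f}")  # R2 = 0.6839, se = 0.3081
print(f"Prediction at 13 ft: {prediction_13:.3f}")  # 2.442
```

The `residuals` and `fitted` lists are exactly what a residuals-versus-fitted scatterplot, as in part (I), would display.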
The information below represents the relationship between the selling price (Y, in $1000) of a home, the square footage
of the home (X1), and the number of bedrooms in the home (X2). The data represent 65 homes sold in a particular
area of a city and were analyzed using a separate simple linear regression for each independent variable.
Summary measures
Multiple R 0.8148
R-Square 0.6640
StErr of Estimate 8.5572
Regression coefficients
Coefficient Std Err t-value p-value
Constant 52.157 7.4784 6.9744 0.0000
Square Footage 4.646 0.4164 11.1575 0.0000
Summary measures
Multiple R 0.6487
R-Square 0.4208
StErr of Estimate 11.2344
Regression coefficients
Coefficient Std Err t-value p-value
Constant 100.628 5.2324 19.2316 0.0000
Number of Bedrooms 11.035 1.6310 6.7660 0.0000
83. (A) Is there evidence of a linear relationship between the selling price and the square footage of the homes? If so,
interpret the least squares line and characterize the relationship (i.e., positive, negative, strong, weak, etc.).
(B) Identify and interpret the coefficient of determination (R2) for the model in (A).
(C) Identify and interpret the standard error of estimate for the model in (A).
(D) Is there evidence of a linear relationship between the selling price and number of bedrooms of the homes? If so,
interpret the least squares line and characterize the relationship (i.e., positive, negative, strong, weak, etc.).
(E) Identify and interpret the coefficient of determination (R2) for the model in (D).
(F) Identify and interpret the standard error of the estimate (se) for the model in (D).
(G) Which of the two variables, the square footage or the number of bedrooms, has the stronger relationship with home
selling price? Justify your choice.
ANSWER:
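(A) Yes; there is evidence of a linear relationship between the selling price and the square footage of the
homes. Ŷ = 52.157 + 4.646(Square Footage); this model shows that homes in this area start at an average of $52,157
and the selling price increases by approximately $4,646 for each one-unit increase in square footage (as measured in
the data). The relationship is positive and fairly strong (multiple R = 0.8148).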
(B) The coefficient of determination R2 = 0.6640; this means that 66.4% of the variation in selling price can
be explained by this regression equation.
(C) The standard error of the estimate = 8.5572. This represents the standard deviation of the
residuals. This value can be compared to the standard deviation of the selling price (variable Y) to
determine if much improvement in accuracy has been gained by using the regression equation to predict
this price.
(D) Yes; there is evidence of a linear relationship between the selling price and number of bedrooms of
the homes. Ŷ = 100.628 + 11.035(Number of Bedrooms); this model shows that homes in this area start at an average of
$100,628 and the selling price increases by approximately $11,035 for each additional bedroom. The relationship is
positive and moderately strong (multiple R = 0.6487).
(E) The coefficient of determination R2 = 0.4208; this means that 42.08% of the variation in selling price
can be explained by this regression equation.
(F) The standard error of the estimate se = 11.2344; this represents the standard deviation of the
residuals. This value can be compared to the standard deviation of the selling price (variable Y) to
determine if much improvement in accuracy has been gained by using the regression equation to predict
this price.
(G) Square footage seems to have a stronger relationship with the selling price. When square
footage is the explanatory variable, R2 is higher (0.6640 > 0.4208) and the standard error of
estimate se is lower (8.5572 < 11.2344). This indicates that the first model (using square footage) is
a better-fitting model.
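The comparison in part (G) can be expressed as a small check on the two summary tables. The sketch below copies the reported summary measures, confirms that in each simple regression R2 is the square of multiple R (up to rounding), and picks the model with the higher R2 and the lower se; the dictionary layout is only an illustration.

```python
models = {
    "Square Footage": {"multiple_r": 0.8148, "r_square": 0.6640, "se": 8.5572},
    "Number of Bedrooms": {"multiple_r": 0.6487, "r_square": 0.4208, "se": 11.2344},
}

for name, stats in models.items():
    # With one explanatory variable, R2 is the square of multiple R (up to rounding).
    squared = stats["multiple_r"] ** 2
    print(f"{name}: (multiple R)^2 = {squared:.4f}, reported R2 = {stats['r_square']}")

best_r2 = max(models, key=lambda name: models[name]["r_square"])
best_se = min(models, key=lambda name: models[name]["se"])
print(best_r2, best_se)  # Square Footage Square Footage
```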
An automobile rental company wants to predict the yearly maintenance expense (Y) for an automobile using the
number of miles driven during the year (X1) and the age of the car (X2, in years) at the beginning of the year. The
company has gathered the data on 10 automobiles and run a regression analysis with the results shown below.
Summary measures
Multiple R 0.9689
R-Square 0.9387
Adj R-Square 0.9212
StErr of Estimate 72.218
Regression coefficients
Coefficient Std Err t-value p-value
Constant 33.796 48.181 0.7014 0.5057
Miles Driven 0.0549 0.0191 2.8666 0.0241
Age of car 21.467 20.573 1.0434 0.3314
84. (A) Use the information above to estimate the linear regression model.
(B) Interpret each of the estimated regression coefficients of the regression model in (A).
(C) Identify and interpret the coefficient of determination (R2) for the model in (A).
(D) Identify and interpret the adjusted R2 for the model in (A).
ANSWER:
(A) Ŷ = 33.796 + 0.0549(Miles Driven) + 21.467(Age of car)
(B) This model shows that the estimated yearly maintenance expense starts at $33.80 and increases by about 5.5 cents
for each mile driven (holding the age of the car constant) and by $21.47 for each additional year of the car's age
(holding the miles driven constant). However, the age of the car is not significant in this model (p-value = 0.3314).
(C) R2 = 0.9387; this means that 93.87% of the variation in the yearly maintenance expense can be
explained by this regression equation.
(D) Adjusted R2 = 0.9212. This is a useful index for monitoring the impact of adding
explanatory variables to the model, but it does not have a direct interpretation like R2 does for the model
in (A).
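The reported Adj R-Square can be reproduced from R2 = 0.9387, n = 10 automobiles, and k = 2 explanatory variables with the standard adjustment 1 - (1 - R2)(n - 1)/(n - k - 1). A minimal sketch:

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R2 for a model with k explanatory variables fit to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


adj_q84 = adjusted_r_squared(0.9387, n=10, k=2)
print(round(adj_q84, 4))  # 0.9212
```

Adding a third variable would raise k to 3 and shrink the divisor to 6, so the adjusted value falls unless R2 rises enough to compensate.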
La Cabaña, a popular motel chain in the southwest, is interested in developing a regression model that can predict the
occupancy rate (%) of its motels. Currently, the company is interested in using two explanatory variables to predict
occupancy. They want to use the amount of advertising (in $) spent by each motel and whether the particular location is a
franchised location. Some regression information is presented below:
Summary measures
Multiple R 0.5358
R-Square 0.2871
Adj R-Square 0.2223
StErr of Estimate 7.582
Regression coefficients
Coefficient Std Err t-value p-value
Constant 43.118 11.4263 3.7735 0.0010
Advertising 0.0013 0.0006 2.4119 0.0247
Franchise 3.038 3.1759 0.9567 0.3491
85. (A) Use the information above to estimate the linear regression model.
(B) Interpret each of the estimated regression coefficients of the regression model in (A).
(C) Would any of the variables in this model be considered a dummy variable? Explain your answer.
(D) Identify and interpret the coefficient of determination (R2) and the standard error of the estimate (se) for the model in
(A).
ANSWER:
(A) Ŷ = 43.118 + 0.0013(Advertising) + 3.038(Franchise)
(B) This model shows that the occupancy rate (%) increases slightly, by 0.0013 percentage points, for every additional
dollar of advertising (holding the location type constant), and is 3.038 percentage points higher if the location
is franchised (holding advertising constant).
(C) Yes; whether the motel is a franchised location is a dummy (0, 1) variable, because it records a yes-or-no response.
(D) The coefficient of determination R2 = 0.2871; this means that 28.71% of the variation in the occupancy rate
can be explained by this regression equation. The standard error of the estimate se = 7.582; this
represents the standard deviation of the residuals.
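A dummy variable shifts the intercept only. The sketch below illustrates this for the La Cabaña model, assuming Franchise is coded 1 for a franchised location and 0 otherwise; the $5,000 advertising level and the function name are hypothetical.

```python
def predicted_occupancy(advertising: float, franchise: int) -> float:
    """Predicted occupancy rate (%) from the fitted La Cabana coefficients."""
    return 43.118 + 0.0013 * advertising + 3.038 * franchise


ADVERTISING = 5000  # hypothetical advertising level, in dollars
gap = predicted_occupancy(ADVERTISING, 1) - predicted_occupancy(ADVERTISING, 0)
print(round(gap, 3))  # 3.038
```

Whatever advertising level is chosen, the gap between franchised and non-franchised locations stays at the Franchise coefficient.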
A large auto dealership is interested in determining the number of cars that will be sold in a given quarter. The
management of the dealership believes that a relationship can be found between the number of cars sold (Y), the
advertised price (X1) and the current interest rates (X2). Their past experience shows that they tend to have better luck
using a non-linear relationship. Below is the output from a regression analysis using the natural logarithm of the
variables in the model.
Summary measures
Multiple R 0.9326
R-Square 0.8698
Adj R-Square 0.8498
StErr of Estimate 0.0259
ANOVA Table
Source df SS MS F p-value
Explained 2 0.0581 0.0290 43.4187 0.0000
Unexplained 13 0.0087 0.0007
Regression coefficients
Coefficient Std Err t-value p-value
Constant 4.3965 0.7549 5.8239 0.0001
Log Price -0.8255 0.2467 -3.3456 0.0053
Log Interest -0.1225 0.1880 -0.6512 0.5262
86. (A) Use the information above to estimate the regression model.
(B) Interpret each of the estimated regression coefficients of the regression model in (A).
(C) Does using a non-linear model seem to be a good choice in this example? Explain your answer.
ANSWER:
(A) ln Ŷ = 4.3965 - 0.8255 ln(Price) - 0.1225 ln(Interest)
(C) The R2 value is fairly strong at 0.8698. The natural log of the advertised price seems to be doing a good job of
predicting the value of Y. However, you could determine the linear regression model and then compare the fit to this
model if the original data were available.
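In a log-log model, each slope acts as an elasticity. The sketch below back-transforms the fitted equation and shows that a 1% rise in the advertised price changes predicted sales by roughly -0.8255%, whatever the starting values. The price of 20 and interest rate of 5 are hypothetical inputs (the units of the original data are not given), and the sketch simply exponentiates the fitted log value without any retransformation bias correction.

```python
import math

B0, B_PRICE, B_RATE = 4.3965, -0.8255, -0.1225


def predicted_cars(price: float, rate: float) -> float:
    """Back-transform ln(Y-hat) = B0 + B_PRICE*ln(price) + B_RATE*ln(rate)."""
    log_y = B0 + B_PRICE * math.log(price) + B_RATE * math.log(rate)
    return math.exp(log_y)


# Hypothetical inputs; only the ratio between the two predictions matters here.
base = predicted_cars(price=20.0, rate=5.0)
raised = predicted_cars(price=20.0 * 1.01, rate=5.0)  # price up 1%
pct_change = 100 * (raised / base - 1)
print(round(pct_change, 3))  # about -0.82
```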
The station manager of a local television station is interested in predicting the amount of television (in hours) that people
will watch in the viewing area. The explanatory variables are: age (in years), education (highest level obtained,
in years) and family size (number of family members in household). The multiple regression output is shown below:
Summary measures
Multiple R 0.8440
R-Square 0.7123
Adj R-Square 0.6644
StErr of Estimate 0.5598
ANOVA Table
Source df SS MS F p-value
Explained 3 13.9682 4.6561 14.8564 0.0000
Unexplained 18 5.6413 0.3134
Regression coefficients
Coefficient Std Err t-value p-value
Constant 1.683 1.1696 1.4389 0.1674
Age -0.0498 0.0199 -2.5018 0.0222
Education 0.2135 0.0503 4.2426 0.0005
Family Size 0.0405 0.0784 0.5168 0.6116
87. (A) Use the information above to estimate the linear regression model.
(B) Interpret each of the estimated regression coefficients of the regression model in (A).
(C) Identify and interpret the coefficient of determination (R2) for the model in (A).
(D) Identify and interpret the standard error of the estimate for the model in (A).
ANSWER:
(A) Ŷ = 1.683 - 0.0498(Age) + 0.2135(Education) + 0.0405(Family Size)
(B) This model shows that the number of hours people spend watching television decreases by 0.0498
hours with every additional year of age (holding education level and family size constant), increases
by 0.2135 hours for each additional year of education (holding age and family size constant), and
increases by 0.0405 hours for each additional family member (holding age and education level constant).
(C) The coefficient of determination R2 = 0.7123; this means that 71.23% of the variation in the hours
spent watching television can be explained by this regression equation.
(D) se = 0.5598; this represents the standard deviation of the residuals.
This value can be compared to the standard deviation of the hours spent watching television (Y) to
determine if much improvement in accuracy has been gained by using the regression equation to predict
viewing time.
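The summary measures for the television-viewing model all follow from its ANOVA table. This sketch recomputes the F statistic, R2 = SSR / (SSR + SSE), and se = sqrt(MSE) from the two sums of squares and their degrees of freedom.

```python
ssr_q87, sse_q87 = 13.9682, 5.6413  # explained and unexplained sums of squares
df_reg, df_err = 3, 18              # k explanatory variables; n - k - 1

msr = ssr_q87 / df_reg
mse = sse_q87 / df_err
f_q87 = msr / mse
r2_q87 = ssr_q87 / (ssr_q87 + sse_q87)
se_q87 = mse ** 0.5

print(round(f_q87, 2), round(r2_q87, 4), round(se_q87, 4))
```

The printed values agree with the F, R-Square, and StErr of Estimate entries in the output above.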
The human resource manager at Gamma, Inc. wants to examine the relationship between annual salaries (Y), the
number of years employees have worked at Gamma, Inc. (X1) and whether the employee is male or female (X2).
They are also interested in whether the interaction between the two explanatory variables (X1X2) has a significant
impact on salaries. These data have been collected for a sample of 28 employees and the regression output is shown
below.
Summary measures
Multiple R 0.8065
R-Square 0.6504
Adj R-Square 0.6067
StErr of Estimate 6572.3
Regression coefficients
Coefficient Std Err t-value p-value
Constant 29831.68 3904.56 7.640 0.0000
Years Employed 869.04 266.78 3.258 0.0033
Gender -2396.54 4620.04 -0.519 0.6087
Years & Gender 403.93 350.38 1.153 0.2603
88. (A) Use the information above to estimate the linear regression model.
(B) Write the regression equation in (A) as two separate equations; one for females and one for males, and interpret the
results.
(C) Would any of the variables in the linear regression model in (A) be considered a dummy variable? Explain your
answer.
(D) Identify and interpret the coefficient of determination (R2) for the model in (A).
(E) Identify and interpret the standard error of estimate (se) for the model in (A).
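Part (B) of Question 88 asks for one equation per gender. With an interaction term, setting the dummy to 0 or 1 changes both the intercept and the slope on years employed. The output does not say which gender is coded 1, so this sketch assumes only that Gender is a 0/1 dummy and labels the two lines by code value; the constant and function names are illustrative.

```python
G_B0, G_YEARS, G_GENDER, G_INTERACT = 29831.68, 869.04, -2396.54, 403.93


def salary_line(gender: int) -> tuple[float, float]:
    """Return (intercept, slope on years) for one value of the Gender dummy."""
    intercept = G_B0 + G_GENDER * gender  # dummy shifts the intercept
    slope = G_YEARS + G_INTERACT * gender  # interaction shifts the slope
    return intercept, slope


for g in (0, 1):
    intercept, slope = salary_line(g)
    print(f"Gender = {g}: Salary-hat = {intercept:.2f} + {slope:.2f} * Years")
```

Note that neither the Gender nor the interaction coefficient is significant in the output (p-values 0.6087 and 0.2603), so the difference between the two lines may not be reliable.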
Regression coefficients
Variable            Coefficient    Std Err    t-value    p-value
Constant                -19.026     54.769    -0.3474     0.7355
Size                      7.494      1.529     4.9010     0.0006
Number of Rooms           7.153      9.211     0.7767     0.4553
Age                      -0.673      0.992    -0.6789     0.5126
Attached Garage           0.453     20.192     0.0224     0.9826
89. (A) Use the information above to estimate the linear regression model.
(B) Interpret each of the estimated regression coefficients of the regression model in (A).
(C) Would any of the variables in this model be considered a dummy variable? Explain your answer.
(D) Identify and interpret the coefficient of determination (R²) and the standard error of the estimate (se) for the model in (A).
(E) Use the estimated model in (A) to predict the sales price of a 2,500-square-foot, 15-year-old house that has 5 rooms and an attached garage.
ANSWER:
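(A) Predicted Price = -19.026 + 7.494(Size) + 7.153(Number of Rooms) - 0.673(Age) + 0.453(Attached Garage)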
(B) This model shows that the selling price (in $1,000) increases by about 7.5 for each one-unit increase in size (one unit appears to be 100 square feet, since (E) enters a 2,500-square-foot house as 25), increases by about 7.15 for each additional room, decreases by about 0.67 for each additional year of age, and is about 0.453 higher for a house with an attached garage. (In each case, all other variables are held constant.)
(C) Yes; Attached Garage is a dummy (0, 1) variable. It records a yes/no response: 1 if the house has an attached garage and 0 if it does not.
(D) The coefficient of determination is 0.8910; that is, 89.1% of the variation in the selling price can be explained by this regression equation. The standard error of the estimate is se = 22.241 (in $1,000); this is the standard deviation of the residuals.
(E) Predicted Price = -19.026 + 7.494(25) + 7.153(5) - 0.673(15) + 0.453(1) = 194.447 (in $1,000), or $194,447.
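The prediction in (E) is just the estimated equation evaluated at one set of inputs. A small Python sketch of that arithmetic follows, assuming (as (E) implies) that Size is measured in hundreds of square feet; the dictionary layout and the name `predict_price` are illustrative, not part of the original output.

```python
# Coefficients copied from the house-price regression output above (price in $1,000).
CONSTANT = -19.026
COEFFICIENTS = {
    "Size": 7.494,             # assumed to be in hundreds of square feet
    "Number of Rooms": 7.153,
    "Age": -0.673,             # years
    "Attached Garage": 0.453,  # dummy: 1 = attached garage, 0 = none
}


def predict_price(house: dict[str, float]) -> float:
    """Predicted selling price (in $1,000) from the estimated linear model."""
    return CONSTANT + sum(COEFFICIENTS[name] * value for name, value in house.items())


house = {"Size": 25, "Number of Rooms": 5, "Age": 15, "Attached Garage": 1}
price = predict_price(house)
print(f"Predicted price: {price:.3f} thousand dollars")
```

Entering Size as 2500 instead of 25 would produce an implausible price of several million (in $1,000), which is one quick way to confirm the units of the Size variable.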
A new online auction site specializes in selling automotive parts for classic cars. The founder of the company believes that the price received for a particular item increases with its age (i.e., the age, in years, of the car on which the item can be used) and with the number of bidders. The multiple regression output is shown below.
Summary measures
Multiple R            0.8391
R-Square              0.7041
Adj R-Square          0.6783
StErr of Estimate     148.828

Regression coefficients
Variable              Coefficient    Std Err    t-value    p-value
Constant                -1242.986    331.204    -3.7529     0.0010
Age of part                75.017     10.647     7.0459     0.0000
Number of Bidders          13.973     10.443     1.3380     0.1940
90. (A) Use the information above to estimate the linear regression model.
(B) Interpret each of the estimated regression coefficients of the regression model in (A).
(C) Identify and interpret the coefficient of determination (R²) for the model in (A).
(D) Identify and interpret the standard error of the estimate (se) for the model in (A).
(E) Would you recommend that this company examine any other factors to predict the selling price? If yes, what other
factors would you want to consider? Explain your answer.
ANSWER:
(A) Predicted Price = -1242.986 + 75.017(Age of part) + 13.973(Number of Bidders)
(B) This model shows that the price received for a particular item increases by about $75 for each additional year of age of the automobile, holding the number of bidders constant. The price also increases by about $13.97 for each additional bidder, holding the age of the automobile constant (however, the number of bidders is not significant in this model; its p-value is 0.1940).
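The significance judgment in (B) rests on the t-values, each of which is the coefficient divided by its standard error. A short Python check of those ratios follows, assuming the conventional 5% significance level (the source does not name one); the names `ROWS` and `t_ratio` are illustrative.

```python
# (coefficient, standard error, reported p-value) copied from the auction-site output above.
ROWS = {
    "Constant": (-1242.986, 331.204, 0.0010),
    "Age of part": (75.017, 10.647, 0.0000),
    "Number of Bidders": (13.973, 10.443, 0.1940),
}
ALPHA = 0.05  # assumed significance level


def t_ratio(coef: float, std_err: float) -> float:
    """t-value as reported by regression software: estimate divided by its standard error."""
    return coef / std_err


for name, (coef, se, p) in ROWS.items():
    t = t_ratio(coef, se)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{name:18s} t = {t:8.4f}  p = {p:.4f}  ({verdict} at the {ALPHA:.0%} level)")
```

Recomputed ratios can differ from the printed t-values in the last digit (for example 7.0458 versus 7.0459 for Age of part) because the displayed coefficients and standard errors are themselves rounded.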
(C) The coefficient of determination is 0.7041; that is, 70.41% of the variation in the selling price can be explained by this regression equation.
(D) The standard error of the estimate is se = 148.828; this is the standard deviation of the residuals. This value can be compared to the standard deviation of the selling price (Y) to determine how much improvement in accuracy has been gained by using the regression equation to predict this price.
(E) Yes, I would recommend that this company examine other factors to predict the selling price. An R² of 0.7041 leaves about 30% of the variation in selling price unexplained, and the number of bidders is not significant in the current model. Other factors may include the type of automobile on which the part can be used, whether the car is foreign or domestic, the asking price, etc.
An express delivery service company recently conducted a study to investigate the relationship between the cost of shipping a package (Y), the package weight (X₁), and the distance shipped (X₂). Twenty packages were randomly
selected from among the large number received for shipment, and a detailed analysis of the shipping cost was
conducted for each package. The sample information is shown in the table below:
91. (A) Estimate a simple linear regression model involving shipping cost and package weight. Interpret the slope coefficient of the least squares line as well as the computed value of R².
(B) Add another explanatory variable, distance shipped, to the regression in (A). Estimate and interpret this expanded model. How does the R² value for this multiple regression model compare to that of the simple regression model estimated in (A)? Explain any difference between the two values. Compute and interpret the adjusted R² value for the revised model.
(C) Suppose that one of the managers of this express delivery service company is trying to decide whether to add an
interaction term involving the package weight and the distance shipped in the multiple regression model
developed previously. Why would the manager want to add such a term to the regression equation?
(D) Estimate the revised model using the interaction term suggested in (C).
(E) Interpret each of the estimated coefficients in your revised model in (D). In particular, how do you interpret the
coefficient for the interaction term in the revised model?
(F) Does this revised model in (D) fit the given data better than the original multiple regression model in (B)? Explain
why or why not.
ANSWER: (A)
As the package weight increases by one pound, the cost of shipping the package increases by $1.49 on
average.
This simple linear regression model explains 59.85% of the total variation in the cost of shipment.
(B)
Now, holding all else constant, the cost of shipping a package rises by $1.29 when the package weight
increases by one pound. Furthermore, holding all else constant, the cost of shipping a package rises by
approximately $0.04 when the distance shipped increases by one mile.
Both the R² and adjusted R² values have increased considerably with the addition of the second explanatory variable; this multiple regression model fits the given data better than did the simple linear model. Note that the R² and adjusted R² values are quite similar here. Both explanatory variables are adding to the explanation of the variation in the cost of shipment.
(C) The manager might want to add such a term if she/he believes that the rate of increase of the cost of
shipment with the package weight will be driven upward by a larger shipping distance.
(D)
(E) The estimated regression coefficients of the revised model are interpreted as follows. As the package weight increases by one pound and the distance shipped remains constant, the cost of shipment increases by approximately $0.0199 plus the product of $0.0078 and the current distance shipped. In other words, the increase in the cost of shipment per pound depends upon the distance shipped. As the distance shipped increases by one mile and the package weight remains constant, the cost of shipment increases by approximately $0.0062 plus the product of $0.0078 and the current package weight. In other words, the increase in the cost of shipment per mile depends upon the package weight. The interaction coefficient ($0.0078) therefore measures how much the cost of an extra pound rises for each additional mile shipped (equivalently, how much the cost of an extra mile rises for each additional pound).
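The interpretation in (E) amounts to two marginal-effect formulas: the per-pound effect is 0.0199 + 0.0078 × distance and the per-mile effect is 0.0062 + 0.0078 × weight. A minimal Python sketch of those formulas follows, using only the coefficients quoted in (E); the intercept is omitted because it does not affect marginal effects, and the function names and example inputs are hypothetical.

```python
# Coefficients quoted in (E) for the weight x distance interaction model.
B_WEIGHT = 0.0199       # X1: package weight (pounds)
B_DISTANCE = 0.0062     # X2: distance shipped (miles)
B_INTERACTION = 0.0078  # X1*X2 interaction


def effect_of_one_more_pound(distance_miles: float) -> float:
    """Change in predicted cost ($) for one extra pound, holding distance fixed."""
    return B_WEIGHT + B_INTERACTION * distance_miles


def effect_of_one_more_mile(weight_lb: float) -> float:
    """Change in predicted cost ($) for one extra mile, holding weight fixed."""
    return B_DISTANCE + B_INTERACTION * weight_lb


# Hypothetical inputs, chosen only to illustrate the formulas.
print(f"Extra pound at 100 miles: ${effect_of_one_more_pound(100):.4f}")
print(f"Extra mile at 5 pounds:   ${effect_of_one_more_mile(5):.4f}")
```

Because the model is linear in each variable once the other is fixed, these changes are exact for a one-unit step, not approximations; the "approximately" in (E) reflects only the rounding of the reported coefficients.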
(F) This revised model yields a higher coefficient of determination (R² = 98.53%) and thus fits the given data better than the original model in (B) (R² = 91.62%). Because R² can never decrease when a term is added, the adjusted R² values should also be compared; the interaction term appears to add to the overall explanatory power of the model.
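Parts (B) and (F) both turn on adjusted R², which penalizes R² for the number of explanatory terms: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). A small Python sketch follows, using n = 20 packages and the R² values quoted in (F); because those R² values are rounded, the results may differ slightly from the software output, and the function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k explanatory terms."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Shipping study: n = 20 packages.
adj_b = adjusted_r2(0.9162, n=20, k=2)  # (B): weight + distance
adj_d = adjusted_r2(0.9853, n=20, k=3)  # (D): weight + distance + interaction
print(f"(B) adjusted R^2 = {adj_b:.4f}")
print(f"(D) adjusted R^2 = {adj_d:.4f}")
```

As a sanity check, the same formula applied to the Gamma, Inc. output (R² = 0.6504, n = 28, k = 3) reproduces its reported Adj R-Square of 0.6067.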
Adjustors working for a large insurance agency are each given a company car which they use on the job to travel to
client locations to inspect damage to homes and automobiles that are covered by the agency. Although the cars are
owned by the agency, maintenance is currently left up to the discretion of the adjustors, who are reimbursed for any
costs they report. The agency believes that the lack of a maintenance policy has led to unnecessary maintenance
expenses. In particular, they believe that many agents wait too long to have maintenance performed on their company
cars, and that in such cases, maintenance expenses are inordinately high. The agency recently conducted a study to
investigate the relationship between the reported cost of maintenance visits for their company cars (Y) and the length of
time since the last maintenance service (X). The sample data are shown below:
92. (A) Estimate a simple linear regression model with Service Interval (X) and Maintenance Cost (Y). Interpret the slope coefficient of the least squares line as well as the computed value of R².
(B) Do you think this model proves the agency’s point about maintenance? Explain your answer.
(C) Obtain a residual plot vs. Service Interval. Does this affect your opinion of the validity of the model in (A)?
(D) Obtain a scatterplot of Maintenance Cost vs. Service Interval. Does this affect your opinion of the validity of the
model in (A)?
(E) Use what you have learned about transformations to fit an alternative model to the one in (A).
(F) Interpret the model you developed in (E). Does it help you assess the agency’s claim? What should the agency
conclude about the relationship between service interval and maintenance costs?
ANSWER: (A)
As the service interval increases by one day, the cost of maintenance increases by $2.30 on average.
This simple linear regression model explains 86.8% of the total variation in the maintenance cost.
(B) The model above shows a strong relationship between the variables, as shown by the R² of almost 87%. Maintenance costs do indeed increase with each day that passes, although that might be somewhat expected. What the agency is really trying to show, however, is that costs are higher than a straight line would predict for longer intervals, so this linear model alone does not prove its point. To support that claim, the agency would hope to see a better fit from a nonlinear model with an upward-curving function.
(C)
The residual plot shows positive residuals for large and small service intervals, and negative residuals for values in the middle of the range. This indicates that a linear fit may not be the best one for this data, even though the R² is relatively high.
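The agency's sample is not reproduced here, so the sketch below uses hypothetical, noise-free exponential data to show why a straight line fitted to upward-curving points leaves positive residuals at both ends of the range and negative residuals in the middle. The data and the helper name `fit_line` are assumptions for illustration only.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical upward-curving data (NOT the agency's sample).
xs = list(range(1, 11))
ys = [math.exp(0.3 * x) for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
for x, r in zip(xs, residuals):
    print(f"x = {x:2d}  residual = {r:+.3f}")
```

The same plus-minus-plus pattern appears for any convex trend fitted with a straight line, which is why the residual plot in (C) points toward a curved model even when R² is high.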
(D)
The scatterplot confirms what the residual plot indicated; a linear fit may not be the best one for this data.
As the agency suspects, there is an upward-curving trend in the data points.
(E)
The above output is for a regression of log(maintenance cost) on service interval. This model improves the R² to almost 95%, although we cannot compare it directly to the model in (A) because the dependent variable has changed. However, the residual plot shown below looks much more acceptable (a patternless distribution of residuals) than the one shown in (C).
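In a log-linear model, the slope b on service interval is read as a percentage change: with a natural-log response, each extra day multiplies predicted cost by exp(b), a change of 100(exp(b) - 1)%, which is close to 100b% when b is small. The sketch below assumes a natural log (the output does not say which base; with base-10 logs the factor would be 10**b instead) and uses hypothetical data that grow exactly 0.4% per day, so the fitted slope can be checked against a known answer.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical, noise-free data (NOT the agency's sample): cost grows 0.4% per day
# from an arbitrary base of 100.
days = [30, 60, 90, 120, 150, 180]
cost = [100 * 1.004 ** d for d in days]

# Regress the natural log of cost on the service interval.
a, b = fit_line(days, [math.log(c) for c in cost])

# Convert the log-scale slope to a percentage change per day.
pct_per_day = 100 * (math.exp(b) - 1)
print(f"slope = {b:.6f}, implied change = {pct_per_day:.3f}% per day")
```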
(F) The model above shows that maintenance costs increase by a constant 0.4% for each additional day in the service interval. Over time this can add up to significant maintenance expenses, as shown in the scatterplot in (D). The agency might indeed reduce maintenance costs if it can convince or require its adjustors to have their cars serviced at shorter, regular intervals.
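To see how a constant 0.4% per day "adds up," the growth compounds: lengthening the service interval by d days multiplies predicted cost by 1.004**d. A minimal sketch follows, assuming the 0.4% figure from (F) holds across the whole range; the function name and the example intervals are hypothetical.

```python
def cost_multiplier(extra_days: int, daily_rate: float = 0.004) -> float:
    """Factor by which predicted maintenance cost grows when the service interval
    is lengthened by `extra_days`, assuming a constant `daily_rate` increase per day."""
    return (1 + daily_rate) ** extra_days


# Hypothetical lengthening of the service interval, for illustration.
for extra in (30, 90, 180, 365):
    print(f"interval longer by {extra:3d} days: predicted cost x {cost_multiplier(extra):.2f}")
```

Under this assumption, stretching an interval by 90 days raises predicted cost by roughly 43%, and by 180 days roughly doubles it, which is the compounding effect behind the agency's concern.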