1.
Under the normality assumption, the estimator for β1 is a linear combination of normally distributed random variables.
Correct Answer
A. True
Explanation
Under the normality assumption, the errors, and therefore the responses Y_i = β0 + β1x_i + ε_i, are normally distributed. The least-squares estimator can be written as β̂1 = Σ c_i Y_i, with weights c_i = (x_i − x̄)/S_xx and S_xx = Σ (x_i − x̄)^2. Because the x_i are treated as fixed, the weights are constants, so β̂1 is a linear combination of normally distributed random variables and is itself normally distributed. Therefore, the statement is true.
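A minimal sketch in Python, using a small hypothetical dataset, that checks this representation numerically: the weighted sum Σ c_i y_i with c_i = (x_i − x̄)/S_xx reproduces the usual slope formula S_xy/S_xx. The function names are illustrative, not from any library.

```python
def slope_weights(x):
    """Weights c_i = (x_i - x_bar) / S_xx; they depend only on the fixed x values."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / s_xx for xi in x]


def slope_usual(x, y):
    """Textbook least-squares slope: S_xy / S_xx."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

c = slope_weights(x)
# beta1_hat as a linear combination of the responses with constant weights
beta1_as_linear_combo = sum(ci * yi for ci, yi in zip(c, y))
beta1_usual = slope_usual(x, y)
```

Because the weights `c` never involve `y`, any distributional property of the responses (here, normality) carries over directly to the slope estimator.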
2.
In the regression model, the variable of interest for study is the response variable.
Correct Answer
A. True
Explanation
In a regression model, the response variable is the variable of interest: the variable we are trying to understand, predict, or explain using the other variables in the model. It is also called the dependent variable or the outcome variable. Therefore, the statement "the variable of interest for study is the response variable" is true.
3.
The constant variance is diagnosed using the quantile-quantile normal plot.
Correct Answer
B. False
Explanation
The constant variance is not diagnosed using the quantile-quantile normal plot. The quantile-quantile normal plot is used to check the normality of the residuals in a statistical model. Constant variance is typically diagnosed using other diagnostic plots such as a plot of residuals against fitted values or a plot of residuals against a predictor variable. Therefore, the given statement is false.
4.
β1^ is an unbiased estimator for β0.
Correct Answer
B. False
5.
The estimator σ^2 is a fixed variable.
Correct Answer
B. False
Explanation
The statement "The estimator σ^2 is a fixed variable" is false. An estimator is a statistic computed from the sample data, so it is a random variable rather than a fixed quantity: the estimator σ̂^2 of the error variance takes different values for different samples. Only the true parameter σ^2 that it estimates is fixed.
6.
Only the log-transformation of the response variable can be used when the normality assumption does not hold.
Correct Answer
B. False
Explanation
The statement is false because there are other methods that can be used when the normality assumption does not hold. One alternative is to use non-parametric statistical tests, which do not rely on the assumption of normality. Additionally, transformations other than the log-transformation, such as square root or reciprocal transformations, can also be applied to the response variable to achieve normality.
7.
The only assumptions for a linear regression model are linearity, constant variance, and normality.
Correct Answer
B. False
Explanation
The statement is false because the assumptions for a linear regression model include linearity, constant variance, independence of errors, and normality of errors. In addition to linearity, constant variance, and normality, the assumption of independence of errors is also necessary for the model to be valid. This means that the errors or residuals should not be correlated with each other. Therefore, the given statement is incorrect as it does not include the assumption of independence of errors.
8.
A negative value of β1 is consistent with a direct relationship between x and Y.
Correct Answer
B. False
Explanation
A negative value of β1 is consistent with an *inverse* relationship between x and Y.
9.
In the simple linear regression model, we lose three degrees of freedom because of the estimation of the three model parameters, β0, β1, and σ^2.
Correct Answer
B. False
Explanation
The statement is false because only two degrees of freedom are lost, one for each of the regression coefficients β0 and β1. Estimating σ^2 does not cost an additional degree of freedom: σ^2 is estimated from the residuals, which must satisfy two linear constraints imposed by the estimation of β0 and β1 (they sum to zero and are orthogonal to x). Hence σ̂^2 = SSE/(n − 2) has n − 2 degrees of freedom. Therefore, the correct answer is False.
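A minimal sketch (hypothetical data, plain Python) that fits a simple linear regression by least squares and checks the two constraints the residuals must satisfy, which is where the two lost degrees of freedom come from.

```python
def fit_slr(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.8, 4.3, 5.9, 8.4, 9.7, 12.2]
b0, b1 = fit_slr(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# The two normal equations impose two linear constraints on the residuals
constraint_1 = sum(resid)                                  # ~0 up to rounding
constraint_2 = sum(xi * ei for xi, ei in zip(x, resid))    # ~0 up to rounding

# Only n - 2 residuals are free, so the SSE is divided by n - 2
n = len(x)
sse = sum(e ** 2 for e in resid)
sigma2_hat = sse / (n - 2)
```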
10.
The regression coefficient is used to measure the linear dependence between two variables.
Correct Answer
B. False
Explanation
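While this sounds close to the truth, the quantity that measures the strength of the linear dependence (linear association) between two variables is the correlation coefficient, which is unit-free and always lies between −1 and 1. The regression coefficient β1 is the expected change in the response per unit change in the predictor, so its value depends on the units in which the variables are measured. The two are related by β̂1 = r · s_y / s_x, where r is the sample correlation and s_x, s_y are the sample standard deviations.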
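A minimal sketch, assuming hypothetical height (x, in meters) and weight (y, in kilograms) data, showing the difference: rescaling x to centimeters divides the slope by 100 but leaves the correlation coefficient unchanged.

```python
import math


def slope_and_corr(x, y):
    """Return (least-squares slope, Pearson correlation) for paired data."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / s_xx, s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data
x_m = [1.50, 1.60, 1.70, 1.80, 1.90]
y_kg = [52.0, 60.0, 63.0, 72.0, 80.0]
x_cm = [100 * v for v in x_m]   # same heights, expressed in centimeters

slope_m, r_m = slope_and_corr(x_m, y_kg)
slope_cm, r_cm = slope_and_corr(x_cm, y_kg)
# The slope depends on the units; the correlation does not.
```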
11.
If the constant variance assumption in ANOVA does not hold, the inference on the equality of the means will not be reliable.
Correct Answer
A. True
Explanation
If the constant variance assumption in ANOVA does not hold, it means that the variability of the dependent variable is not the same across all groups or levels of the independent variable. This violates one of the key assumptions of ANOVA, which assumes equal variances among groups. If this assumption is violated, it can lead to unreliable and inaccurate conclusions about the equality of means between groups. Therefore, the statement that the inference on the equality of the means will not be reliable is true.
12.
If one confidence interval in the pairwise comparison includes zero, we conclude that the two means are plausibly equal.
Correct Answer
A. True
Explanation
If one confidence interval in the pairwise comparison includes zero, it means that the difference between the means is not statistically significant. In other words, there is a possibility that the two means are equal. Therefore, we can conclude that the two means are plausibly equal.
13.
The mean sum of square errors in ANOVA measures variability within groups.
Correct Answer
A. True
Explanation
The statement is true. The mean squared error (MSE) in ANOVA is the error sum of squares divided by its degrees of freedom: SSE = Σ_i Σ_j (y_ij − ȳ_i)^2 adds up the squared deviations of every observation from its own group mean, and MSE = SSE/(N − k) for N observations in k groups. Because it only measures how far observations fall from their own group mean, the MSE captures the variability within groups and estimates the common error variance σ^2.
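A minimal sketch in plain Python, assuming three small hypothetical groups. It computes SSE and MSE = SSE/(N − k) and shows that shifting one group's values by a constant, which changes that group's mean but not the spread within it, leaves the MSE unchanged.

```python
def anova_sse_mse(groups):
    """Return (SSE, MSE, df) for one-way ANOVA with k groups and N observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    sse = 0.0
    for g in groups:
        g_mean = sum(g) / len(g)
        # Squared deviations from each observation's own group mean
        sse += sum((v - g_mean) ** 2 for v in g)
    df = n_total - k
    return sse, sse / df, df


# Hypothetical data: three groups
groups = [[4.0, 5.0, 6.0], [7.0, 9.0, 8.0], [10.0, 12.0, 11.0, 11.0]]
sse, mse, df = anova_sse_mse(groups)

# Shift the third group by +100: its mean moves, its within-group spread does not
shifted = groups[:2] + [[v + 100.0 for v in groups[2]]]
_, mse_shifted, _ = anova_sse_mse(shifted)
```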
14.
The linear regression model with a qualitative predicting variable with k levels/classes will have k+1 parameters to estimate.
Correct Answer
A. True
Explanation
The statement is true. With an intercept in the model, a qualitative predicting variable with k levels/classes is represented by k−1 dummy (indicator) variables; the remaining level serves as the baseline and is absorbed into the intercept. The model therefore has k regression coefficients (the intercept β0 plus k−1 dummy coefficients), and the error variance σ^2 must also be estimated, for a total of k+1 parameters.
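A minimal sketch of the parameter count, assuming a hypothetical qualitative variable with k = 5 levels and level "a" chosen as the baseline. The helper `dummy_encode` is illustrative, not a library function.

```python
def dummy_encode(levels, baseline):
    """Encode a qualitative variable with k levels as k - 1 indicator columns.

    The baseline level gets no column; it is absorbed into the intercept.
    """
    categories = sorted(set(levels))
    non_baseline = [c for c in categories if c != baseline]
    rows = [[1 if v == c else 0 for c in non_baseline] for v in levels]
    return non_baseline, rows


levels = ["a", "b", "c", "d", "e", "a", "c"]   # hypothetical observations, k = 5
cols, X_dummy = dummy_encode(levels, baseline="a")

k = len(set(levels))
n_coefficients = 1 + len(cols)     # intercept beta0 + (k - 1) dummy coefficients
n_parameters = n_coefficients + 1  # + the error variance sigma^2
```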
15.
If one confidence interval in the pairwise comparison includes only positive values, we conclude that the difference in means is statistically significantly positive.
Correct Answer
A. True
Explanation
If the confidence interval in a pairwise comparison includes only positive values, it means that the lower limit of the interval is greater than zero. This indicates that there is a statistically significant difference between the means, and the difference is positive. Therefore, we can conclude that the difference in means is statistically significantly positive.
16.
The number of degrees of freedom of the χ2 (chi-square) distribution for the variance estimator is N−k+1 where k is the number of samples.
Correct Answer
B. False
Explanation
The correct answer is False. In ANOVA with N observations in k groups (samples), the variance estimator MSE = SSE/(N − k) satisfies (N − k)·MSE/σ^2 ~ χ2 with N − k degrees of freedom, not N − k + 1. One degree of freedom is lost for each of the k group means that must be estimated.
17.
For assessing the normality assumption of the ANOVA model, we can use the quantile-quantile normal plot and the histogram of the residuals.
Correct Answer
A. True
Explanation
The quantile-quantile normal plot and the histogram of the residuals are both graphical tools used to assess the normality assumption of the ANOVA model. The quantile-quantile normal plot compares the observed quantiles of the residuals to the quantiles of a normal distribution, and if the points fall approximately along a straight line, it suggests that the residuals are normally distributed. The histogram of the residuals provides a visual representation of the distribution of the residuals, and if it resembles a bell-shaped curve, it indicates normality. Therefore, the statement is true.
18.
We can assess the assumption of constant-variance by plotting the residuals against fitted values.
Correct Answer
A. True
Explanation
The statement is true because plotting the residuals against fitted values allows us to visually examine if there is a consistent pattern in the spread of the residuals. If the spread of the residuals appears to be relatively constant across all levels of the fitted values, it suggests that the assumption of constant variance is met. On the other hand, if there is a clear pattern or trend in the spread of the residuals, it indicates that the assumption of constant variance may be violated. Therefore, plotting the residuals against fitted values is a useful tool for assessing the assumption of constant variance.
19.
The ANOVA is a linear regression model with two qualitative predicting variables.
Correct Answer
B. False
Explanation
The statement is false. The (one-way) ANOVA model is a linear regression model with a single qualitative predicting variable with k levels (groups), which enters the model through k−1 dummy variables. It does not require two qualitative predicting variables.
20.
The sampling distribution for the variance estimator in ANOVA is χ2 (chi-square) regardless of the assumption of the data.
Correct Answer
B. False
Explanation
The statement is false because the sampling distribution of the (scaled) variance estimator in ANOVA, (N − k)·MSE/σ^2, is chi-square with N − k degrees of freedom only under the model assumptions: independent, normally distributed errors with constant variance across groups. If these assumptions are violated, the sampling distribution need not be chi-square.
21.
In a multiple linear regression model with 6 predicting variables but without intercept, there are 7 parameters to estimate.
Correct Answer
A. True
Explanation
The statement is true. Without an intercept, the model has one regression coefficient for each of the 6 predicting variables, giving 6 coefficients. The seventh parameter is the variance σ^2 of the error terms, which must also be estimated. Therefore, the total number of parameters to estimate is 6 + 1 = 7.
22.
The only objective of multiple linear regression is prediction.
Correct Answer
B. False
Explanation
The statement "The only objective of multiple linear regression is prediction" is false. While prediction is indeed one of the main objectives of multiple linear regression, it is not the only objective. Another important objective of multiple linear regression is to understand the relationships between the independent variables and the dependent variable. By analyzing the coefficients of the independent variables, we can determine the strength and direction of these relationships, which provides valuable insights for decision-making and understanding the underlying factors influencing the dependent variable.
23.
We can make causal inference in observational studies.
Correct Answer
B. False
Explanation
Causal inference in observational studies is generally more challenging compared to experimental studies. Observational studies do not involve random assignment of participants to different groups, which can introduce confounding variables and make it difficult to establish a cause-and-effect relationship. While observational studies can provide valuable insights and associations between variables, they cannot definitively establish causation. Therefore, the statement that we can make causal inference in observational studies is false.
24.
In order to make statistical inference on the regression coefficients, we need to estimate the variance of the error terms.
Correct Answer
A. True
Explanation
The statement is true. The sampling variances, and hence the standard errors, of the estimated regression coefficients depend on the error variance σ^2, which is unknown. To build confidence intervals or test hypotheses on the coefficients, we replace σ^2 with its estimate σ̂^2 computed from the residuals; this substitution is also why the t-distribution, rather than the normal distribution, is used for the inference.
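In simple linear regression the dependence on σ^2 can be written out explicitly. A short derivation sketch, assuming independent errors with constant variance σ^2, with β̂1 = Σ c_i Y_i, c_i = (x_i − x̄)/S_xx and S_xx = Σ (x_i − x̄)^2:

```latex
\operatorname{Var}(\hat\beta_1) = \sum_{i=1}^{n} c_i^2 \operatorname{Var}(Y_i)
  = \sigma^2 \sum_{i=1}^{n} \frac{(x_i-\bar x)^2}{S_{xx}^2}
  = \frac{\sigma^2}{S_{xx}},
\qquad
\widehat{\operatorname{se}}(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{S_{xx}}},
\qquad
\hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i-\hat\beta_0-\hat\beta_1 x_i\right)^2
```

The standard error cannot be computed without σ̂^2, which is the point of the statement.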
25.
We cannot estimate a multiple linear regression model if the predicting variables are linearly dependent.
Correct Answer
A. True
Explanation
The statement is true. If the predicting variables are linearly dependent, at least one of them can be written exactly as a linear combination of the others. The design matrix X then does not have full column rank, X^T X is singular (not invertible), and the least-squares equations have infinitely many solutions, so the regression coefficients cannot be uniquely estimated. Near (but not exact) linear dependence is called multicollinearity: the model can still be estimated, but the coefficient estimates become unstable, with large variances.
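A minimal sketch in plain Python, assuming hypothetical integer predictor columns and no intercept (to keep X^T X a 3×3 matrix). When the third column is exactly x1 + x2, the determinant of X^T X is zero, so X^T X cannot be inverted; with a third column that is not a linear combination of the first two, it is nonzero.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def gram(X):
    """Compute X^T X for a design matrix given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


# Hypothetical predictor columns
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
x3_dep = [a + b for a, b in zip(x1, x2)]  # exactly x1 + x2: linearly dependent
x3_ok = [1, 0, 2, 5, 3]                   # not a linear combination of x1, x2

X_dep = [list(r) for r in zip(x1, x2, x3_dep)]
X_ok = [list(r) for r in zip(x1, x2, x3_ok)]

det_dep = det3(gram(X_dep))  # zero: no unique least-squares solution
det_ok = det3(gram(X_ok))    # nonzero: coefficients are identifiable
```

Integer data keep the arithmetic exact, so the singular case gives exactly zero rather than a tiny floating-point value.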
26.
The estimated regression coefficients are unbiased estimators.
Correct Answer
A. True
Explanation
The estimated regression coefficients being unbiased estimators means that, on average, they provide accurate estimates of the true population regression coefficients. In other words, there is no systematic tendency for the estimated coefficients to consistently overestimate or underestimate the true coefficients. This is an important property in regression analysis, as it allows us to make reliable inferences about the relationships between variables in the population based on our sample data.
27.
Controlling variables used in multiple linear regression are used to control for bias in the sample.
Correct Answer
A. True
Explanation
Controlling variables in multiple linear regression is indeed used to control for bias in the sample. By including these variables in the regression model, we can account for their potential influence on the dependent variable and isolate the relationship between the independent variables and the dependent variable. This helps to minimize the impact of confounding factors and ensure that the estimated coefficients are more accurate and reliable. Therefore, the statement is true.
28.
We interpret the coefficient corresponding to one predictor in a regression with multiple predictors as the estimated expected change in the response variable associated with one unit of change in the corresponding predicting variable.
Correct Answer
B. False
Explanation
The statement is false because it omits a key condition. In a regression with multiple predictors, the coefficient of a predictor is the estimated expected change in the response variable associated with a one-unit change in that predictor, holding all other predictors in the model fixed.
29.
The error term in the multiple linear regression cannot be correlated.
Correct Answer
A. True
Explanation
The statement is true. In multiple linear regression, the error term represents the variability in the response variable that is not explained by the predicting variables, and one of the model assumptions is that the error terms are uncorrelated with each other (independent): the error for one observation carries no information about the error for another. This assumption is needed for the standard errors, confidence intervals, and hypothesis tests to be valid; correlated errors, for example in time-ordered data, violate it.
30.
The hypothesis test for whether a subset of regression coefficients are all equal to zero is a partial F-test.
Correct Answer
A. True
Explanation
The statement is true. A partial F-test compares the full model with a reduced model that omits the subset of predicting variables under test. The null hypothesis is that the corresponding regression coefficients are all equal to zero, and the test statistic measures how much the residual sum of squares increases when those predictors are dropped. It is commonly used to decide whether a group of predicting variables is jointly significant in explaining the response variable.
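A minimal sketch of the partial F statistic, assuming the residual sums of squares (RSS) of the reduced and full models have already been computed. With q coefficients tested and p predictors plus an intercept in the full model, F = [(RSS_reduced − RSS_full)/q] / [RSS_full/(n − p − 1)]. The RSS values in the usage line are hypothetical.

```python
def partial_f(rss_reduced, rss_full, q, n, p):
    """Partial F statistic for H0: a subset of q regression coefficients are all zero.

    rss_reduced: RSS of the model without the q predictors under test
    rss_full:    RSS of the full model with p predictors plus an intercept
    Under H0 and the usual assumptions, F follows an F(q, n - p - 1) distribution.
    Returns the statistic and its (numerator, denominator) degrees of freedom.
    """
    df_full = n - p - 1
    f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_full)
    return f_stat, (q, df_full)


# Hypothetical values, for illustration only
f_stat, dfs = partial_f(rss_reduced=150.0, rss_full=120.0, q=3, n=50, p=6)
```

The statistic is then compared with a critical value of the F(q, n − p − 1) distribution; a large value indicates that dropping the subset increases the RSS more than chance alone would explain.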
31.
The estimated regression coefficient corresponding to a predicting variable will likely be different in the model with only one predicting variable alone versus in a model with multiple predicting variables.
Correct Answer
A. True
Explanation
In a model with only one predicting variable, the estimated regression coefficient represents the relationship between that variable and the outcome variable in isolation. However, in a model with multiple predicting variables, the estimated regression coefficient represents the relationship between the predicting variable and the outcome variable while controlling for the effects of other variables. Therefore, it is likely that the estimated regression coefficient will be different in the two models.
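A minimal sketch with hypothetical, noise-free data in which x2 is correlated with x1 and y = x1 + 2·x2. The slope of x1 fitted alone absorbs part of the effect of x2, while the coefficient of x1 in the model with both predictors (and an intercept) recovers 1. The two-predictor formula is the standard closed-form least-squares solution written with centered sums of squares and cross-products.

```python
def centered_cross(u, v):
    """Sum of (u_i - u_bar) * (v_i - v_bar)."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


def slope_alone(x1, y):
    """Coefficient of x1 in y ~ 1 + x1."""
    return centered_cross(x1, y) / centered_cross(x1, x1)


def slope_with_x2(x1, x2, y):
    """Coefficient of x1 in y ~ 1 + x1 + x2 (closed form for two predictors)."""
    s11, s22 = centered_cross(x1, x1), centered_cross(x2, x2)
    s12 = centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


# Hypothetical data: x2 is correlated with x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [a + 2 * b for a, b in zip(x1, x2)]

b_alone = slope_alone(x1, y)        # mixes in part of the x2 effect
b_joint = slope_with_x2(x1, x2, y)  # effect of x1 holding x2 fixed
```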
32.
Analysis of variance (ANOVA) is a multiple regression model.
Correct Answer
A. True
Explanation
The statement is true. ANOVA can be written as a multiple linear regression model: the qualitative predicting variable with k levels enters the model through an intercept and k−1 dummy variables. With this coding, the intercept is the mean of the baseline group and each dummy coefficient is the difference between that group's mean and the baseline mean, so testing the equality of the k means is the same as testing that all dummy coefficients are zero.
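A minimal sketch, assuming three hypothetical groups with group 0 as the baseline. It sets the intercept to the baseline mean and each dummy coefficient to a difference in group means, then checks that the normal equations X^T(y − Xb) = 0 hold, which confirms these values are the least-squares solution of the dummy-variable regression.

```python
def anova_as_regression(groups):
    """Coefficients of y = b0 + sum_j b_j * D_j with group 0 as the baseline.

    For this coding: b0 = mean of the baseline group, b_j = mean_j - mean_0.
    """
    means = [sum(g) / len(g) for g in groups]
    return [means[0]] + [m - means[0] for m in means[1:]]


# Hypothetical data: three groups
groups = [[4.0, 5.0, 6.0], [7.0, 9.0, 8.0], [10.0, 12.0, 11.0, 11.0]]
coef = anova_as_regression(groups)

# Design matrix: intercept column plus one indicator per non-baseline group
X, y = [], []
for j, g in enumerate(groups):
    for v in g:
        X.append([1.0] + [1.0 if j == m else 0.0 for m in range(1, len(groups))])
        y.append(v)

fitted = [sum(c * xij for c, xij in zip(coef, row)) for row in X]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Normal equations: each column of X is orthogonal to the residuals
normal_eq = [sum(row[i] * e for row, e in zip(X, resid)) for i in range(len(coef))]
```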
33.
In multiple linear regression, we study the relationship between one response variable and both predicting quantitative and qualitative variables.
Correct Answer
A. True
Explanation
This statement is true. In multiple linear regression there is one response variable and multiple predicting variables, which can be quantitative, qualitative (entered through dummy variables), or a mix of both. The goal is to study how the response variable is related to the predicting variables.
34.
We need to assume normality of the response variable for making inference on the regression coefficients.
Correct Answer
A. True
Explanation
The statement is true. Inference on the regression coefficients (t-tests and confidence intervals) relies on the assumption that the response variable, conditional on the predicting variables, is normally distributed, which is equivalent to assuming normally distributed errors. Under this assumption the estimated coefficients are normally distributed and the standardized coefficients follow a t-distribution; without it, these sampling distributions hold at best approximately, and the validity of the inference may be compromised.
35.
We can use the normal test to test whether a regression coefficient is equal to zero.
Correct Answer
B. False
Explanation
The statement is false. The error variance σ^2 is unknown and must be replaced by its estimate σ̂^2, so the standardized coefficient β̂/se(β̂) follows a t-distribution with n − p − 1 degrees of freedom (p predictors plus an intercept) rather than a normal distribution. To test whether a regression coefficient is equal to zero, we therefore use a t-test, not a normal (z) test.
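A minimal sketch of the t-test for the slope in simple linear regression (p = 1, so n − 2 degrees of freedom), using hypothetical data. The function `slope_t_stat` is illustrative; the critical value 3.182 in the comment is the 97.5th percentile of the t-distribution with 3 degrees of freedom.

```python
import math


def slope_t_stat(x, y):
    """t statistic for H0: beta1 = 0 in simple linear regression.

    sigma^2 is unknown, so sigma2_hat = SSE / (n - 2) is plugged in and the
    statistic follows a t distribution with n - 2 df under H0.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    sigma2_hat = sse / (n - 2)
    se_b1 = math.sqrt(sigma2_hat / s_xx)
    return b1 / se_b1, n - 2


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t, df = slope_t_stat(x, y)
# Reject H0 at the 5% level (two-sided) if |t| exceeds t_{0.975, 3} ≈ 3.182
```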
36.
If a predicting variable is categorical with 5 categories in a linear regression model with intercept, we will include 5 dummy variables in the model.
Correct Answer
B. False
Explanation
In a linear regression model with intercept, if a predicting variable is categorical with 5 categories, we will include 4 dummy variables in the model. This is because we need to create a reference category, and then represent the remaining 4 categories using dummy variables. Each dummy variable represents one category and takes the value of 1 if the observation belongs to that category, and 0 otherwise. Therefore, the correct answer is False.
37.
Multiple linear regression captures the causation of a predicting variable to the response variable, conditional of other predicting variables in the model.
Correct Answer
B. False
Explanation
Multiple linear regression captures the association or relationship between the predicting variables and the response variable, not necessarily the causation. While it can help identify potential causal relationships, it cannot definitively establish causation.
38.
The error term variance estimator has a χ2 (chi-squared) distribution with n−11 degrees of freedom for a multiple regression model with 10 predictors.
Correct Answer
A. True
Explanation
Under the normality assumption, the scaled error term variance estimator (n − p − 1)σ̂^2/σ^2 in a multiple regression model with p predictors and an intercept follows a chi-squared distribution with n − p − 1 degrees of freedom, where n is the number of observations: one degree of freedom is lost for each of the p + 1 estimated regression coefficients. With 10 predictors, this gives n − 10 − 1 = n − 11 degrees of freedom. Therefore, the statement is true.
39.
The sampling distribution for estimating confidence intervals for the regression coefficients is a normal distribution.
Correct Answer
B. False
Explanation
The statement is false. Because the error variance σ^2 is unknown and is replaced by its estimate σ̂^2, the standardized regression coefficients follow a t-distribution with n − p − 1 degrees of freedom (under the normality assumption), and confidence intervals for the coefficients are built from this t-distribution rather than from the normal distribution. The t-distribution approaches the normal distribution only as the sample size grows.
40.
The estimated variance of the error terms is the sum of squared residuals divided by the sample size minus the number of predictors minus one.
Correct Answer
A. True
Explanation
The statement is true. The estimated error variance is σ̂^2 = SSE/(n − p − 1), where SSE is the sum of squared residuals, n is the sample size, and p is the number of predictors. The denominator subtracts one degree of freedom for each of the p + 1 estimated coefficients (the p slopes and the intercept), and dividing by n − p − 1 instead of n makes σ̂^2 an unbiased estimator of σ^2.
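The estimator and its sampling distribution in formula form, for a model with p predictors and an intercept; the chi-square result assumes independent, normally distributed errors with constant variance σ^2:

```latex
\hat\sigma^2 = \frac{\mathrm{SSE}}{n-p-1}
  = \frac{1}{n-p-1}\sum_{i=1}^{n}\left(y_i-\hat y_i\right)^2,
\qquad
\mathbb{E}\left[\hat\sigma^2\right] = \sigma^2,
\qquad
\frac{(n-p-1)\,\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p-1}
```

With p = 1 this gives the n − 2 degrees of freedom of simple linear regression, and with p = 10 it gives n − 11.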
41.
Assuming that the data are normally distributed, under the simple linear model, the estimated variance has the following sampling distribution:
Correct Answer
A. Chi-square with n-2 degrees of freedom
Explanation
In the simple linear model, under the normality assumption, the scaled estimated variance (n − 2)σ̂^2/σ^2 follows a chi-square distribution with n − 2 degrees of freedom. Two degrees of freedom are lost because two parameters, β0 and β1, must be estimated before the residuals used to compute σ̂^2 are available. The chi-square distribution is then used for hypothesis testing and confidence interval estimation involving σ^2.
42.
The fitted values are defined as:
Correct Answer
B. The regression line with parameters replaced with the estimated regression coefficients.
Explanation
The fitted values are calculated by replacing the parameters in the regression line with the estimated regression coefficients. These coefficients are estimated based on the observed data and represent the best-fit line that minimizes the sum of squared differences between the observed and predicted values. Therefore, the fitted values represent the predicted values of the response variable based on the estimated regression line.
43.
The estimators of the linear regression model are derived by:
Correct Answer
A. Minimizing the sum of squared differences between observed and expected values of the response variable.
Explanation
The estimators of the linear regression model are derived by minimizing the sum of squared differences between observed and expected values of the response variable. This is because the goal of linear regression is to find the line that best fits the data, and the sum of squared differences is a measure of how well the line fits the data. By minimizing this sum, we are finding the line that minimizes the overall error between the observed and expected values, resulting in the best fit line.
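A minimal sketch, assuming hypothetical data: compute the least-squares coefficients in closed form, then check that perturbing them in any direction increases the sum of squared differences between the observed values and the line.

```python
def sse(b0, b1, x, y):
    """Sum of squared differences between observed y and the line b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Closed-form least-squares estimates
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))
b0 = y_bar - b1 * x_bar

best = sse(b0, b1, x, y)
# Every perturbed pair of coefficients yields a larger SSE
perturbed = [sse(b0 + d0, b1 + d1, x, y)
             for d0 in (-0.1, 0.0, 0.1) for d1 in (-0.1, 0.0, 0.1)
             if (d0, d1) != (0.0, 0.0)]
```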
44.
The estimators for the regression coefficients are:
Correct Answer
D. Unbiased regardless of the distribution of the data.
Explanation
The correct answer is "Unbiased regardless of the distribution of the data." Unbiasedness of the least-squares estimators only requires that the model is correctly specified, so that the errors have mean zero; it does not require the errors to be normally distributed. In other words, the estimates are not systematically too high or too low on average. The distribution of the data does matter for the sampling distribution of the estimators, and hence for inference.
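A minimal simulation sketch, assuming hypothetical true parameters β0 = 2 and β1 = 0.5 and deliberately non-normal (uniform, mean-zero) errors. Averaged over many simulated samples, the least-squares slope is close to the true β1, as unbiasedness predicts. The seed and the number of replications are arbitrary choices.

```python
import random


def fit_slope(x, y):
    """Least-squares slope S_xy / S_xx."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx


random.seed(6414)
beta0, beta1 = 2.0, 0.5               # hypothetical "true" parameters
x = [float(i) for i in range(1, 21)]  # fixed design, n = 20

estimates = []
for _ in range(2000):
    # Uniform(-1, 1) errors: mean zero but clearly not normal
    y = [beta0 + beta1 * xi + random.uniform(-1.0, 1.0) for xi in x]
    estimates.append(fit_slope(x, y))

mean_estimate = sum(estimates) / len(estimates)
```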
45.
The assumption of normality:
Correct Answer
C. It is needed for the sampling distribution of the estimators of the regression coefficients and hence for inference.
Explanation
The assumption of normality is needed for the sampling distribution of the estimators of the regression coefficients, and therefore for inference (confidence intervals and hypothesis tests). It is not needed to compute the estimates themselves: the least-squares estimators are obtained, and are unbiased, without assuming normality.
46.
The estimated versus predicted regression line for a given x*:
Correct Answer
B. Have the same expectation
Explanation
At a given x*, the estimated mean response and the predicted new response are both computed from the same fitted line, β̂0 + β̂1x*, so they have the same expectation. They do not have the same variance: the prediction of a new observation also includes the variability of the new measurement's error term, so its variance is larger. Therefore, the correct answer is that they have the same expectation.
47.
The variability in the prediction comes from:
Correct Answer
C. The variability due to a new measurement and due to estimation.
Explanation
The correct answer is "The variability due to a new measurement and due to estimation." This means that the prediction can vary because of both the uncertainty in the new measurement taken and the inherent variability in the estimation process. Both factors contribute to the overall variability in the prediction.
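For simple linear regression, the two sources can be seen in the variances at a new value x*, assuming independent errors with constant variance σ^2 and S_xx = Σ (x_i − x̄)^2. The first expression is the variance of the estimated mean response β̂0 + β̂1x* (estimation only); the prediction error for a new observation Y* adds the variance σ^2 of the new measurement:

```latex
\operatorname{Var}\left(\hat\beta_0+\hat\beta_1 x^*\right)
  = \sigma^2\left(\frac{1}{n}+\frac{(x^*-\bar x)^2}{S_{xx}}\right),
\qquad
\operatorname{Var}\left(Y^*-\hat\beta_0-\hat\beta_1 x^*\right)
  = \sigma^2\left(1+\frac{1}{n}+\frac{(x^*-\bar x)^2}{S_{xx}}\right)
```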
48.
Which one is correct?
Correct Answer
C. Residual analysis can only be used to assess uncorrelated errors.
Explanation
Residual analysis can only be used to assess uncorrelated errors because residuals are the differences between the observed values and the predicted values. If the errors are correlated, it means that there is a systematic pattern in the residuals, indicating that the model is not capturing all the relevant information. Therefore, by examining the residuals, we can determine if there is any correlation present in the errors and assess the independence assumption. The other options, such as using residuals vs fitted values or normal probability plot, may be useful for different purposes, but they do not specifically address the assessment of correlated errors.
49.
We detect departure from the assumption of constant variance
Correct Answer
A. When the residuals vs fitted values are larger in the ends but smaller in the middle.
Explanation
When the residuals vs fitted values are larger in the ends but smaller in the middle, it suggests a departure from the assumption of constant variance. This pattern indicates heteroscedasticity, which means that the variability of the residuals is not constant across all levels of the predictor variable. In other words, the spread of the residuals is not the same throughout the range of the predicted values. This violation of the assumption can affect the reliability and accuracy of the regression model.
50.
Which one is correct?
Correct Answer
D. None of the above
Explanation
The given answer is "None of the above" because none of the three statements is correct. A departure from normality is usually addressed by transforming the response variable (for example, with a Box-Cox transformation), not the predicting variable. A departure from the independence assumption (correlated errors) is not fixed by transforming the response variable; it calls for a model that accounts for the correlation. The Box-Cox transformation targets normality and constant variance of the response, not the linearity assumption, which is usually addressed by transforming the predicting variables.