1.
How many coefficients do you need to estimate in a simple linear regression model (one independent variable)?
Correct Answer
A. 2
Explanation
In a simple linear regression model with one independent variable, you only need to estimate two coefficients. One coefficient represents the slope of the regression line, which indicates the relationship between the independent variable and the dependent variable. The other coefficient represents the intercept, which is the value of the dependent variable when the independent variable is zero. These two coefficients are sufficient to define the linear relationship between the variables in the model.
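As a minimal sketch of this point, the closed-form least-squares fit below returns exactly two numbers, the intercept and the slope. The helper name `fit_simple_linear_regression` and the data are hypothetical, chosen only for illustration.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on the line y = 3 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_linear_regression(x, y)
print(intercept, slope)  # prints: 3.0 2.0
```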
2.
Can the skewness of variables be calculated just using the mean and median?
Correct Answer
B. No
Explanation
Skewness is a measure of the asymmetry of a probability distribution. It is defined through the third standardized moment, which depends on how far every observation lies from the mean, so it reflects the tails of the distribution and the relative frequencies of extreme values. The mean and median describe only central tendency. Comparing them can hint at the direction of the skew, but it does not determine its value; even common approximations such as Pearson's second skewness coefficient also require the standard deviation. Therefore, it is not possible to calculate the skewness of variables solely using the mean and median.
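A small illustration, assuming the population (moment-based) definition of skewness: the two hypothetical data sets below share the same mean and the same median, yet their skewness differs. The function name `skewness` and the data are made up for this sketch.

```python
from statistics import mean, median


def skewness(data):
    """Population skewness: third central moment divided by the cubed standard deviation."""
    m = mean(data)
    n = len(data)
    m2 = sum((v - m) ** 2 for v in data) / n  # second central moment (variance)
    m3 = sum((v - m) ** 3 for v in data) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data sets: both have mean 0 and median 0.
symmetric = [-2, -1, 0, 1, 2]
left_skewed = [-3, 0, 0, 1, 2]

print(mean(symmetric), median(symmetric), skewness(symmetric))
print(mean(left_skewed), median(left_skewed), skewness(left_skewed))
```

The first data set has zero skewness and the second has negative skewness, even though the mean and median cannot tell them apart.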
3.
Correlated variables can have a zero correlation coefficient.
Correct Answer
A. True
Explanation
Related variables can have a zero correlation coefficient when the relationship between them is not linear. The correlation coefficient measures only the strength of a linear relationship, so a relationship that cannot be represented by a straight line, such as a symmetric U-shaped curve, can produce a coefficient at or close to zero even though one variable clearly depends on the other. A zero coefficient therefore indicates no linear relationship, not the absence of any relationship.
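As a sketch of this case, assuming the Pearson correlation coefficient is the one intended, the hypothetical data below satisfy y = x^2 exactly, yet the coefficient is zero. The function name `pearson_r` is an illustrative choice.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: y is fully determined by x, but not linearly.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(r)  # prints: 0.0
```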
4.
A correlation between a man's health and age is found to be -1.09. Which of these will be reported to the doctor?
Correct Answer
D. None of the above
Explanation
A correlation coefficient must lie between -1 and 1, so a reported value of -1.09 is not a valid correlation and points to an error in the calculation. No conclusion about the relationship between the man's health and his age can be drawn from it, including whether age is a good or poor predictor of health. Hence, the correct answer is "None of the above."
5.
Which of the following is used for predicting a continuous dependent variable?
Correct Answer
C. Linear regression
Explanation
Linear regression is used for predicting a continuous dependent variable. It is a statistical model that examines the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship between the dependent variable and the independent variables and uses this relationship to make predictions. In contrast, the mean is a measure of central tendency and is not a predictive model, the Gaussian distribution is a probability distribution, and logistic regression is used for predicting categorical dependent variables.
6.
To test the linear relationship between continuous variables y (dependent) and x (independent), which of the following plots is best suited?
Correct Answer
C. Scatter plot
Explanation
A scatter plot is best suited to test the linear relationship between two continuous variables, y and x. It displays the data points as individual dots on a graph, with the x-axis representing the independent variable and the y-axis representing the dependent variable. By examining the pattern of the dots on the scatter plot, we can determine if there is a linear relationship between the variables. A bar chart is used to compare categorical variables, a pictogram is used to represent data with pictures or symbols, and a histogram is used to display the distribution of a single variable.
7.
Which of the following describes Heteroskedasticity?
Correct Answer
D. Linear regression with varying error terms
Explanation
Heteroskedasticity refers to the situation in linear regression where the variability of the error terms is not constant across all levels of the independent variables. In other words, the spread of the residuals or errors is not the same for all values of the predictor variables. This violates one of the assumptions of linear regression, which assumes that the error terms have constant variance. Therefore, the correct answer is "Linear regression with varying error terms."
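The simulation below is a rough sketch of what varying error terms look like. It assumes, purely for illustration, that the error standard deviation grows in proportion to x, and it skips the model fit to look at the error terms directly. The seed and scale factor are arbitrary choices.

```python
import random
from statistics import pstdev

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical setup: the error standard deviation is 0.5 * x, so it grows with x.
x_values = list(range(1, 201))
errors = [random.gauss(0.0, 0.5 * x) for x in x_values]

# Compare the spread of the error terms for small x and for large x.
spread_low = pstdev(errors[:100])
spread_high = pstdev(errors[100:])
print(spread_low, spread_high)
```

Under constant variance the two spreads would be similar; here the spread for large x is markedly larger, which is the pattern heteroskedasticity describes.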
8.
Which of these is true of a linear regression model that perfectly fits the training data?
Correct Answer
D. None of the above
Explanation
A linear regression model that perfectly fits the training data has zero training error: every residual on the training set is zero. This does not guarantee zero test error. The test error may or may not be zero, depending on how noisy the data are and how well the training data represent unseen data. Therefore, the correct answer is "None of the above," as none of the options accurately describes the outcome of a linear regression model that perfectly fits the training data.
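A minimal sketch of this situation, using made-up numbers: a line fitted to only two training points always passes through both, so the training error is zero, while a held-out point can still be predicted wrongly. The helper `fit_line` and all data values are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares (intercept, slope) for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope


# Hypothetical training set with two points: any line fit is perfect.
x_train, y_train = [1.0, 2.0], [2.0, 4.5]
intercept, slope = fit_line(x_train, y_train)

# Mean squared error on the training data.
train_mse = mean((y - (intercept + slope * x)) ** 2
                 for x, y in zip(x_train, y_train))

# Hypothetical held-out observation that does not lie on the fitted line.
x_test, y_test = 3.0, 6.0
test_squared_error = (y_test - (intercept + slope * x_test)) ** 2

print(train_mse, test_squared_error)  # prints: 0.0 1.0
```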
9.
Predicting with trees evaluates ____ within each group of data.
Correct Answer
A. Homogeneity
Explanation
Predicting with trees evaluates homogeneity within each group of data. Homogeneity refers to the similarity or consistency of the data within a group. When predicting with trees, the algorithm splits the data into different groups based on certain conditions, and then evaluates the homogeneity of each group to make predictions. The goal is to create homogeneous groups where the data within each group is similar, as this allows for more accurate predictions.
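The explanation does not name a specific homogeneity measure. As one common choice for classification trees, the sketch below uses Gini impurity, where 0 means a perfectly homogeneous group. The function name `gini_impurity` and the labels are illustrative assumptions.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a group: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# Hypothetical groups produced by a split.
pure_group = ["yes", "yes", "yes", "yes"]
mixed_group = ["yes", "yes", "no", "no"]

print(gini_impurity(pure_group))   # prints: 0.0
print(gini_impurity(mixed_group))  # prints: 0.5
```

A tree-building algorithm would prefer the split that yields groups with lower impurity, that is, more homogeneous groups.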
10.
Which of the following is a statistical boosting method based on additive logistic regression?
Correct Answer
C. Gamboosting
Explanation
Gamboosting is a statistical boosting algorithm based on additive logistic regression. Boosting is a machine learning technique that combines multiple weak learners to create a strong learner. Gamboosting specifically uses generalized additive models (GAMs) as weak learners and combines them through boosting to improve the overall predictive accuracy. GAMs are a flexible and powerful class of models that can capture complex relationships between predictors and the response variable. Therefore, Gamboosting is the correct answer as it is a boosting algorithm based on additive logistic regression using GAMs.