1.
The most widely used metrics and tools to assess a classification model are:
Correct Answer
D. All of the above
Explanation
The correct answer is "All of the above" because all three options (Confusion Matrix, Cost-sensitive accuracy, and Area under the ROC curve) are commonly used to evaluate the performance of classification models. The Confusion Matrix breaks predictions down into true positives, true negatives, false positives, and false negatives. Cost-sensitive accuracy weights each type of misclassification by its cost, which matters when, for example, a false negative is far more expensive than a false positive. The Area under the ROC curve (AUC) measures the model's ability to distinguish between classes across all classification thresholds.
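The three tools named above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the labels, scores, 0.5 threshold, and cost values are hypothetical, and cost-sensitive accuracy uses one common formulation that multiplies the false positive and false negative counts by their assumed costs before computing accuracy. AUC is computed through its rank interpretation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one.

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def cost_sensitive_accuracy(tp, tn, fp, fn, cost_fp=1.0, cost_fn=1.0):
    # Assumed formulation: scale each error count by its cost, then compute accuracy.
    return (tp + tn) / (tp + tn + cost_fp * fp + cost_fn * fn)


def roc_auc(y_true, scores):
    # AUC = fraction of (positive, negative) pairs ranked correctly; ties count half.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_matrix(y_true, y_pred)
plain_acc = cost_sensitive_accuracy(tp, tn, fp, fn)
weighted_acc = cost_sensitive_accuracy(tp, tn, fp, fn, cost_fp=1.0, cost_fn=5.0)
auc = roc_auc(y_true, scores)
```

With a false negative assumed to cost five times as much as a false positive, the same predictions score lower on cost-sensitive accuracy than on plain accuracy, which is exactly what the metric is meant to expose. AUC, by contrast, does not depend on the chosen threshold at all.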
2.
Which of the following is a good test dataset characteristic?
Correct Answer
C. Both properties
Explanation
A good test dataset should be both large enough to yield meaningful results and representative of the dataset as a whole. A larger test set reduces the variance of the performance estimate, so the measured scores are more reliable. The test set should also reflect the class balance and feature distribution of the full dataset; otherwise the results are biased and will not generalize to the data the model will encounter in practice.
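One common way to keep a test set representative is a stratified split, which preserves each class's share of the data. The sketch below is a minimal stdlib version under assumed inputs: the function name `stratified_split`, the 20% test fraction, and the spam/ham labels are hypothetical. It only addresses representativeness; how large "large enough" is depends on the problem and is not captured by a single rule.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split example indices so each class keeps roughly its share in the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # randomize within each class
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Hypothetical dataset: 30% spam, 70% ham.
labels = ["spam"] * 30 + ["ham"] * 70
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
```

Here the 20-example test set contains 6 spam and 14 ham examples, the same 30/70 ratio as the full dataset.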
3.
Which of the following is an example of feature extraction?
Correct Answer
D. All of the above
Explanation
All of the given options involve feature extraction techniques. Constructing a bag of words vector from an email converts the text into a numerical representation that records how often each word occurs. Applying PCA (Principal Component Analysis) to a large, high-dimensional dataset projects the data onto a smaller set of components while preserving most of the important information. Removing stopwords from a sentence eliminates common words that carry little meaning, so the resulting features focus on informative terms. Therefore, all of the given options are examples of feature extraction.
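Two of the techniques above, stopword removal and bag of words construction, can be sketched together. This is a minimal illustration, assuming a simple lowercase regex tokenizer, a tiny hand-picked stopword list (not a standard one), and a fixed hypothetical vocabulary; PCA is not shown here.

```python
import re
from collections import Counter

# Illustrative stopword list only; real pipelines use much larger lists.
STOPWORDS = {"the", "a", "to", "is", "and", "of", "for"}


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes as tokens.
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text, vocabulary, remove_stopwords=True):
    """Return a count vector aligned with `vocabulary`."""
    tokens = tokenize(text)
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical email and vocabulary.
email = "Claim the prize now! The prize is waiting for you."
vocab = ["claim", "prize", "now", "waiting", "you", "the"]
features = bag_of_words(email, vocab)
raw_features = bag_of_words(email, vocab, remove_stopwords=False)
```

With stopword removal the count for "the" drops to zero, so the vector focuses on the informative words "claim" and "prize".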
4.
Which of the following is true about Naive Bayes?
Correct Answer
C. Both Properties
Explanation
Naive Bayes assumes that all the features in a dataset are equally important and independent. In other words, every feature contributes to the prediction on an equal footing, and the features are assumed to be conditionally independent given the class label: once the class is known, the value of one feature says nothing about the value of another. This "naive" assumption lets the probability of each class be computed as a simple product of per-feature probabilities, which makes the algorithm computationally efficient and effective for many classification problems, such as text classification, even when the assumption does not strictly hold.
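The independence assumption is what turns the class score into a plain product of per-feature terms. The sketch below shows this for binary features; the priors and likelihoods are hypothetical numbers rather than values estimated from data, and log probabilities are used only to avoid numerical underflow.

```python
import math


def naive_bayes_posterior(priors, likelihoods, features):
    """priors: {class: P(class)}
    likelihoods: {class: {feature: P(feature present | class)}}
    features: {feature: True/False}
    Returns normalized posterior probabilities per class."""
    log_scores = {}
    for c, prior in priors.items():
        log_score = math.log(prior)
        for f, present in features.items():
            p = likelihoods[c][f]
            # Independence given the class: each feature adds its own term,
            # and no feature is weighted more heavily than another.
            log_score += math.log(p if present else 1.0 - p)
        log_scores[c] = log_score

    # Normalize with the log-sum-exp trick.
    m = max(log_scores.values())
    total = sum(math.exp(s - m) for s in log_scores.values())
    return {c: math.exp(s - m) / total for c, s in log_scores.items()}


# Hypothetical model parameters.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.6, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}
post = naive_bayes_posterior(priors, likelihoods, {"free": True, "meeting": False})
```

Unnormalized, the spam score is 0.4 × 0.6 × 0.9 = 0.216 and the ham score is 0.6 × 0.1 × 0.5 = 0.03, so the message is classified as spam with a posterior of about 0.88.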
5.
Which of the following sentences is FALSE regarding regression?
Correct Answer
D. It discovers causal relationships
Explanation
Regression analysis is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It aims to predict the value of the dependent variable based on the values of the independent variables. While regression can describe the strength and direction of relationships between variables, it does not establish causality. It identifies associations between variables, not cause-and-effect relationships; establishing causation requires additional assumptions or a suitable experimental design. Therefore, the statement that regression discovers causal relationships is false.
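A quick way to see that a regression fit carries no causal direction is to fit it both ways. The sketch below uses ordinary least squares on hypothetical ice cream sales and drowning counts (a classic example where both are driven by a third factor, warm weather); regressing either variable on the other yields an equally good fit.

```python
def ols_fit(x, y):
    """Simple least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: both series rise together because of warm weather.
ice_cream_sales = [2, 4, 6, 8]
drownings = [1, 2, 3, 4]

slope_yx, intercept_yx = ols_fit(ice_cream_sales, drownings)  # drownings on sales
slope_xy, intercept_xy = ols_fit(drownings, ice_cream_sales)  # sales on drownings
```

Both fits are perfect (slopes 0.5 and 2.0), yet neither variable causes the other; the fitted model only records that they move together.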
6.
Factors that affect the performance of a learner system do not include
Correct Answer
D. Good data structures
Explanation
The factors that affect the performance of a learner system include the representation scheme used, the training scenario, and the type of feedback provided. However, good data structures are not considered a factor that directly affects the performance of a learner system. While good data structures can contribute to efficient data processing and organization, they do not directly impact the performance of the learner system in terms of its ability to learn and improve.
7.
Which of the following is a characteristic of the best machine learning method?
Correct Answer
D. All of the above
Explanation
The best machine learning method should possess all the characteristics mentioned in the options. It should be fast, so that it can process and analyze large amounts of data quickly. It should be accurate, producing correct predictions or classifications. It should also be scalable, handling growing amounts of data without compromising its performance. Therefore, the best machine learning method should have all of these qualities.
8.
A multiple regression model has
Correct Answer
C. More than one independent variable
Explanation
A multiple regression model has one dependent variable and more than one independent variable. It models the dependent variable as a function of several predictors, for example y = b0 + b1*x1 + b2*x2 + ... + bk*xk, and estimates how each independent variable is associated with the dependent variable while the others are held fixed. A model with more than one dependent variable is a multivariate regression model, not a multiple regression model, so "more than one dependent variable" is incorrect.
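A multiple regression model with a single dependent variable y and several independent variables can be fitted by least squares. The sketch below solves the normal equations (XᵀX)b = Xᵀy with plain Gauss-Jordan elimination to stay dependency-free; the function name and the data (generated exactly from y = 1 + 2*x1 + 3*x2) are hypothetical, and a real project would typically use a numerical library instead.

```python
def fit_multiple_regression(X, y):
    """Least squares for y = b0 + b1*x1 + ... + bk*xk.
    X: list of rows of independent variables; y: one dependent variable."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])

    # Augmented matrix [X^T X | X^T y] for the normal equations.
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]

    return [A[i][k] / A[i][i] for i in range(k)]  # [b0, b1, ..., bk]


# Hypothetical data: two independent variables (x1, x2), one dependent variable y.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coefficients = fit_multiple_regression(X, y)
```

The recovered coefficients are approximately [1, 2, 3]: one intercept plus one slope per independent variable, all describing a single dependent variable.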
9.
Another name for output attribute is
Correct Answer
D. Dependent Variable
Explanation
The correct answer is "Dependent Variable." An output attribute is the value a model tries to predict, which is why it is also called the dependent variable (or the target or response variable): its value depends on the input attributes. The input attributes play the role of independent variables; in an experiment they are the variables manipulated or controlled by the researcher, and in machine learning they are the features supplied to the model. The output attribute is the variable that is explained or predicted from those inputs, so it is not an independent variable.
10.
Logistic Regression is a _____ regression technique that is used to model data having a ______ outcome.
Correct Answer
D. Non-linear, Binary
Explanation
Logistic Regression is a non-linear regression technique that is used to model data having a binary outcome, that is, when the dependent variable is categorical with only two possible values. The model computes a linear combination of the independent variables and passes it through the logistic (sigmoid) function, which maps any real number to a probability between 0 and 1. Because of this transformation, the relationship between the inputs and the predicted probability is non-linear, although the log-odds of the outcome remain a linear function of the inputs. This makes logistic regression suitable for predicting binary outcomes such as yes/no, true/false, or success/failure.
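The prediction step described above can be sketched in a few lines. The weights, bias, and inputs below are hypothetical (a trained model would learn them from data, which is not shown), and the 0.5 decision threshold is an assumed default.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: maps any real z into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    z = bias + sum(w * xi for w, xi in zip(weights, x))  # linear score = log-odds
    return sigmoid(z)  # non-linear squashing into a probability


def predict_label(weights, bias, x, threshold=0.5):
    return 1 if predict_proba(weights, bias, x) >= threshold else 0


# Hypothetical parameters for a two-feature model.
weights, bias = [1.5, -2.0], 0.5
p_mid = predict_proba(weights, bias, [1, 1])   # linear score is exactly 0
p_high = predict_proba(weights, bias, [2, 0])  # linear score is 3.5
p_low = predict_proba(weights, bias, [0, 2])   # linear score is -3.5
```

Inverting the sigmoid with log(p / (1 - p)) recovers the linear score, which is the sense in which logistic regression is linear in the log-odds but non-linear in the predicted probability.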