Data Mining Course Quiz

1.

The problem of finding hidden structures in unlabeled data is called...

A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning

Correct Answer
B. Unsupervised learning

Explanation
Unsupervised learning is the task of finding hidden structure in unlabeled data. In supervised learning the data is labeled and the algorithm learns from those labels; in unsupervised learning the algorithm must discover patterns, relationships, and groupings from the inputs alone, with no target values to guide it. This makes it particularly useful for large datasets where labels are unavailable or manual labeling is impractical.

2.

The task of inferring a model from labeled training data is called...

A. Unsupervised learning
B. Supervised learning
C. Reinforcement learning

Correct Answer
B. Supervised learning

Explanation
Supervised learning refers to the process of inferring a model from labeled training data. In this approach, the training data consists of input-output pairs, where the desired output is known for each input. The goal is to learn a mapping function that can predict the correct output for new, unseen inputs. This differs from unsupervised learning, where the training data is unlabeled, and reinforcement learning, which involves learning through interactions with an environment and receiving feedback in the form of rewards or punishments.
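
As a minimal sketch of learning a mapping from input-output pairs, the following uses a 1-nearest-neighbor rule. The data, the labels "low" and "high", and the function names `fit` and `predict` are all invented for illustration; they are not part of the quiz.

```python
# Toy supervised learner: 1-nearest-neighbor on labeled (input, output) pairs.
# All data below is made up for illustration.

def fit(pairs):
    """'Training' for 1-NN simply stores the labeled examples."""
    return list(pairs)

def predict(model, x):
    """Return the label of the stored input closest to x."""
    _, nearest_label = min(model, key=lambda pair: abs(pair[0] - x))
    return nearest_label

# Each training example is (input, known output).
training_pairs = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
model = fit(training_pairs)

# Predictions for new, unseen inputs.
prediction_small = predict(model, 1.5)
prediction_large = predict(model, 7.0)
print(prediction_small, prediction_large)
```

The key point is that every training input comes with a known output; an unsupervised method would receive only the inputs.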

3.

A telecommunications company wants to segment its customers into distinct groups in order to send appropriate subscription offers. This is an example of...

A. Supervised learning
B. Data extraction
C. Seriation
D. Unsupervised learning

Correct Answer
D. Unsupervised learning

Explanation
Segmenting customers into distinct groups is a clustering task, which is a form of unsupervised learning. In unsupervised learning, the algorithm analyzes a dataset without any predetermined labels or target variables and looks for patterns, relationships, or groupings within the data itself. Here the company has no predefined customer categories; the groups must emerge from customer attributes, for example usage or spending, so unsupervised learning techniques are appropriate.
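
Customer segmentation of this kind could be sketched with k-means clustering. The example below is a simplified one-dimensional version; the monthly call minutes, the starting centroids, and the function name `kmeans_1d` are assumptions made for illustration, and k-means is only one of several clustering methods that could be used.

```python
# Minimal 1-D k-means sketch on invented customer data (monthly call minutes).

def kmeans_1d(values, centroids, iterations=10):
    """Alternate between assigning values to the nearest centroid
    and moving each centroid to the mean of its assigned values."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

monthly_minutes = [30, 45, 50, 400, 420, 450, 1200, 1250]  # no labels attached
centroids, clusters = kmeans_1d(monthly_minutes, centroids=[0, 500, 1000])
print(clusters)
```

No group labels are supplied anywhere; the three segments (light, medium, and heavy users in this toy data) come from the values alone.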

4.

A self-organizing map is an example of...

A. Unsupervised learning
B. Supervised learning
C. Reinforcement learning
D. Missing data imputation

Correct Answer
A. Unsupervised learning

Explanation
A self-organizing map is a type of artificial neural network that learns from unlabeled data, which makes it an example of unsupervised learning. In unsupervised learning, the algorithm tries to find patterns or relationships in the input data without any predefined labels or targets. Self-organizing maps use competitive learning to map the input data onto a low-dimensional (typically two-dimensional) grid of nodes so that similar data points end up on the same or neighboring nodes. This supports clustering and visualization of high-dimensional data, making the method useful for exploratory data analysis and pattern recognition.

5.

You are given data about seismic activity in Japan, and you want to predict the magnitude of the next earthquake. This is an example of...

A. Supervised learning
B. Unsupervised learning
C. Seriation
D. Dimensionality reduction

Correct Answer
A. Supervised learning

Explanation
The given scenario of using data about seismic activity in Japan to predict the magnitude of the next earthquake falls under the category of supervised learning. In supervised learning, a model is trained on labeled data, where the input features (seismic activity data) are accompanied by the corresponding output labels (earthquake magnitude). The model learns the relationship between the input and output variables and can then make predictions on new, unseen data. In this case, the model will use the historical seismic activity data to predict the magnitude of the next earthquake based on the patterns and relationships it has learned from the labeled data.

6.

Assume you want to use supervised learning to predict the number of newborns from the size of the stork population (http://www.brixtonhealth.com/storksBabies.pdf). This is an example of...

A. Classification
B. Regression
C. Clustering
D. Structural equation modeling

Correct Answer
B. Regression

Explanation
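This is an example of regression because the goal is to predict the number of newborns, a numerical quantity, from the size of the stork population. Regression is the type of supervised learning that predicts numerical values, whereas classification predicts discrete categories. Strictly speaking, the number of newborns is a count rather than a continuous variable, but it is routinely modeled as a numerical target.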
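
A regression of this kind can be sketched with ordinary least squares for a single feature. The numbers below are invented and deliberately exactly linear; they are not taken from the linked paper, and the names `fit_line`, `stork_pairs`, and `births_thousands` are hypothetical.

```python
# Simple linear regression (ordinary least squares, one feature) on invented data.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the squared prediction error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

stork_pairs = [100, 200, 300, 400]          # feature (made up)
births_thousands = [12.0, 22.0, 32.0, 42.0]  # outcome (made up, y = 0.1 * x + 2)

slope, intercept = fit_line(stork_pairs, births_thousands)
predicted_births = slope * 500 + intercept   # prediction for an unseen input
print(slope, intercept, predicted_births)
```

A good fit shows only that the outcome can be predicted from the feature; it does not show that storks cause births.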

7.

Discriminating between spam and ham e-mails is a classification task, true or false?

A. True
B. False

Correct Answer
A. True

Explanation
Discriminating between spam and ham emails is indeed a classification task. Classification involves categorizing data into different classes based on certain features or characteristics. In this case, the task is to classify emails as either spam or ham (non-spam). Various machine learning algorithms can be used to analyze the content, structure, and other attributes of emails to accurately classify them as spam or ham. Therefore, the correct answer is true.
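
One common way to classify e-mails by their content is a naive Bayes model over word counts. The sketch below assumes a tiny invented training set and hypothetical helper names `train` and `classify`; it is one possible approach, not the method the quiz has in mind.

```python
# Tiny multinomial naive Bayes spam/ham classifier on invented messages.
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label). Returns per-label word counts and message counts."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(model, text):
    """Return the label with the highest log-probability score."""
    word_counts, label_counts = model
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_messages)  # class prior
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)

training_messages = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
]
model = train(training_messages)
print(classify(model, "free prize offer"), classify(model, "team meeting tomorrow"))
```

The output is always one of a fixed set of categories, which is what makes this a classification task rather than regression.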

8.

In the example of predicting the number of babies based on storks' population size, the number of babies is...

A. Outcome
B. Feature
C. Attribute
D. Observation

Correct Answer
A. Outcome

Explanation
When predicting the number of babies from the size of the stork population, the number of babies is the outcome: the dependent variable that the model predicts. The stork population size is the feature (also called an attribute) used to make the prediction. "Outcome" is the common term in statistical modeling for the variable being predicted.

9.

It may be better to avoid the ROC curve as a metric because it can suffer from the accuracy paradox.

A. True
B. False

Correct Answer
B. False

Explanation
The statement is false. The accuracy paradox describes accuracy, not the ROC curve: on an imbalanced dataset, a model can reach high accuracy simply by always predicting the majority class while being useless on the minority class. The ROC curve plots the true positive rate against the false positive rate across classification thresholds. Each rate is computed within a single class, so the curve does not reward majority-class guessing, and it also helps in selecting an appropriate classification threshold.
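
The difference between accuracy and ROC-based evaluation can be illustrated on an imbalanced toy dataset. In this sketch the 5/95 class split and the helper names `accuracy`, `true_positive_rate`, and `auc` are assumptions for illustration; the area under the ROC curve is computed through its rank interpretation (the chance that a random positive is scored above a random negative, with ties counting one half).

```python
# Accuracy paradox vs. ROC AUC on an invented, imbalanced dataset.

def accuracy(labels, predictions):
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)

def true_positive_rate(labels, predictions):
    positives = sum(labels)
    hits = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 1)
    return hits / positives

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1] * 5 + [0] * 95    # 5 positives, 95 negatives
always_negative = [0] * 100    # classifier that ignores its input
constant_scores = [0.0] * 100  # the same classifier expressed as scores

majority_accuracy = accuracy(labels, always_negative)
majority_tpr = true_positive_rate(labels, always_negative)
majority_auc = auc(labels, constant_scores)
print(majority_accuracy, majority_tpr, majority_auc)
```

The always-negative classifier looks strong on accuracy, yet it detects no positives and its AUC is at chance level, which is why the paradox is attributed to accuracy rather than to the ROC curve.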
