1.
For the given set of points which of the following lines is most suitable to be the decision boundary?
Correct Answer
A. Option 1
2.
Which of the following is/are true about the Perceptron classifier? (Choose multiple options)
Correct Answer(s)
A. It can learn an OR function
B. It can learn an AND function
C. The obtained separating hyperplane depends on the order in which the points are presented in the training process.
Explanation
The Perceptron can learn both the OR and the AND functions, since both are linearly separable: in each case there exist weights and a bias that place the positive examples on one side of a hyperplane and the negative examples on the other. The separating hyperplane it finds, however, is not unique; it depends on the order in which the training points are presented (and on the initial weights), so different orderings can yield different decision boundaries. By the perceptron convergence theorem, if the data are linearly separable the algorithm converges after a finite number of updates from any initialization; non-convergence arises only when the data are not linearly separable, as with the XOR function.
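A minimal perceptron sketch (an illustrative implementation, not part of the question) that learns both truth tables; note that the final weights depend on the order in which the examples are presented:

```python
def train_perceptron(examples, epochs=20, lr=1.0):
    w, b = [0.0, 0.0], 0.0            # initial weights and bias
    for _ in range(epochs):
        for x, y in examples:         # presentation order affects the final w, b
            pred = 1 if w[0]*x[0] + w[1]*x[1] + b > 0 else 0
            err = y - pred            # perceptron update rule
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if w[0]*x[0] + w[1]*x[1] + b > 0 else 0

OR  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
for table in (OR, AND):
    w, b = train_perceptron(table)
    assert all(predict(w, b, x) == y for x, y in table)
```

On the non-separable XOR table the same loop would never settle, no matter the initialization.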
3.
Suppose you run the K-means clustering algorithm on a given dataset. On which of the following factors do the final clusters depend?
I. The value of K
II. The initial cluster seeds chosen
III. The distance function used
Correct Answer
D. I,II and III
Explanation
The final clusters in K-means depend on all three factors. The value of K fixes the number of clusters to be formed. The initial cluster seeds determine which local optimum of the objective the algorithm converges to, so different seeds can produce different final clusterings. The distance function defines how the similarity between points and centroids is measured, so changing it changes the cluster assignments.
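A toy K-means in pure Python (1-D data, Euclidean distance; an illustrative sketch, not a production implementation) showing that different initial seeds can converge to different final clusters:

```python
def kmeans(points, seeds, iters=100):
    centers = list(seeds)
    for _ in range(iters):
        # assignment step: each point goes to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

data = [1, 2, 3, 10, 11, 12, 20, 21, 22]
print(kmeans(data, seeds=[1, 10, 20]))   # -> [2.0, 11.0, 21.0]
print(kmeans(data, seeds=[1, 2, 3]))     # -> [1.0, 2.5, 16.0]: different local optimum
```

The same data, the same K, the same distance function; only the seeds changed, and the final clusters differ.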
4.
After training an SVM, we can discard all examples which are not support vectors and still classify new examples?
Correct Answer
A. True
Explanation
After training an SVM, we can discard all examples which are not support vectors and still classify new examples, because the decision function depends only on the support vectors: every other training example has a Lagrange multiplier of zero and contributes nothing to the prediction. The support vectors are the examples lying closest to the decision boundary (on the margin), and they alone determine it. Discarding the remaining examples therefore simplifies the model without changing its predictions.
5.
If g(z) is the sigmoid function, then its derivative with respect to z may be written in terms of g(z) as
Correct Answer
A. g(z)(1 - g(z))
Explanation
The sigmoid function is defined as g(z) = 1 / (1 + e^(-z)). Differentiating with the quotient (or chain) rule gives g'(z) = e^(-z) / (1 + e^(-z))^2. This factors as g'(z) = [1 / (1 + e^(-z))] * [e^(-z) / (1 + e^(-z))]. The first factor is g(z), and the second factor equals 1 - g(z), since 1 - g(z) = (1 + e^(-z) - 1) / (1 + e^(-z)) = e^(-z) / (1 + e^(-z)). Therefore g'(z) = g(z)(1 - g(z)), which matches the given answer.
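The identity can be verified numerically with a central finite difference (a quick sanity check, not part of the original question):

```python
import math

def g(z):
    return 1.0 / (1.0 + math.exp(-z))

# Check g'(z) == g(z) * (1 - g(z)) at a few points via (g(z+h) - g(z-h)) / 2h.
h = 1e-6
for z in (-2.0, 0.0, 1.5):
    numeric = (g(z + h) - g(z - h)) / (2 * h)
    analytic = g(z) * (1 - g(z))
    assert abs(numeric - analytic) < 1e-8
```

At z = 0, for instance, g(0) = 0.5 and the derivative is 0.5 * 0.5 = 0.25, the sigmoid's maximum slope.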
6.
The back-propagation learning algorithm applied to a two layer neural network
Correct Answer
B. Finds a locally optimal solution which may be globally optimal.
Explanation
Applied to a two-layer neural network, back-propagation performs gradient descent on a non-convex error surface, so it converges to a locally optimal set of weights. That local optimum may happen to be the global optimum, but there is no guarantee: the algorithm iteratively adjusts the weights to reduce the error between predicted and actual outputs, and it can get stuck in a local optimum or plateau depending on the weight initialization, the problem, and the network architecture.
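A toy illustration of this point (a 1-D "loss", not an actual neural network): gradient descent on the non-convex function L(w) = (w^2 - 1)^2 + 0.2w, which has local minima near w = +1 and w = -1. The starting point decides which minimum is reached, and only one of them is global, the same phenomenon back-propagation faces on a network's error surface.

```python
def L(w):
    return (w * w - 1) ** 2 + 0.2 * w

def dL(w):
    return 4 * w * (w * w - 1) + 0.2

def gradient_descent(w, lr=0.01, steps=2000):
    for _ in range(steps):
        w -= lr * dL(w)   # follow the negative gradient, as back-prop does
    return w

w_right = gradient_descent(2.0)    # ends near +0.97: a local, non-global minimum
w_left  = gradient_descent(-2.0)   # ends near -1.02: the global minimum
assert L(w_left) < L(w_right)      # initialization determined solution quality
```

Both runs are "locally optimal"; only the second happens to be globally optimal.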
7.
Which of the following is true?
Correct Answer
B. In batch gradient descent we update the weights and biases of our neural network
after forward pass over all the training examples.
Explanation
In batch gradient descent, the weights and biases are updated once per epoch, after the gradients have been computed over the entire training set (a forward and backward pass for every example). Each update therefore uses the average gradient over all training examples, giving an accurate, low-variance estimate of the true gradient of the training loss, though each individual update is more expensive than in stochastic gradient descent, which updates after every single example.
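A batch gradient descent sketch for 1-D linear regression (an illustrative example with made-up data): the weight and bias are updated once per epoch, using the gradient averaged over all training examples.

```python
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]   # generated by y = 2x + 1

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(2000):
    # accumulate gradients of (half) the mean squared error over the full set
    gw = gb = 0.0
    for x, y in data:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    n = len(data)
    w -= lr * gw / n      # single update per epoch, after seeing every example
    b -= lr * gb / n

assert abs(w - 2.0) < 1e-3 and abs(b - 1.0) < 1e-3
```

Moving the two update lines inside the inner loop would turn this into stochastic gradient descent: one update per example instead of one per epoch.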
8.
In a neural network, which one of the following techniques is NOT useful to reduce
overfitting?
Correct Answer
D. Adding more layers
Explanation
Adding more layers is not useful for reducing overfitting in a neural network. Overfitting occurs when a model becomes too complex and starts to memorize the training data instead of learning the underlying patterns; adding more layers increases the model's capacity and can therefore make overfitting worse. Dropout, regularization, and batch normalization, by contrast, are designed to combat overfitting: dropout reduces co-adaptation between neurons, regularization (e.g. an L2 weight penalty) constrains the magnitude of the weights, and batch normalization normalizes layer inputs, which also has a mild regularizing effect.
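A sketch of one of the techniques above, L2 regularization (weight decay), on a toy 1-D least-squares problem with hypothetical data: adding lambda * w^2 to the loss pulls the learned weight toward zero, constraining complexity, the opposite of what adding layers does.

```python
def fit(xs, ys, lam, lr=0.05, steps=5000):
    # gradient descent on (half) MSE plus an L2 penalty (lam/2) * w^2
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / n + lam * w
        w -= lr * grad
    return w

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]     # generated by y = 2x exactly
w_plain = fit(xs, ys, lam=0.0)                 # converges close to 2.0
w_decay = fit(xs, ys, lam=1.0)                 # shrunk toward zero
assert abs(w_plain) > abs(w_decay)
```

The closed form here is w = mean(xy) / (mean(x^2) + lambda), so any positive lambda strictly shrinks the weight.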
9.
For an image recognition problem (such as recognizing a cat in a photo), which
architecture of neural network has been found to be better suited for the tasks
Correct Answer
C. Convolutional neural network
Explanation
Convolutional neural networks (CNNs) have been found to be better suited for image recognition problems such as recognizing a cat in a photo. CNNs are specifically designed to process grid-like data, like images, and are able to automatically learn and extract features from the input data. They consist of multiple layers of interconnected neurons, including convolutional layers that apply filters to the input data, pooling layers that downsample the data, and fully connected layers for classification. This architecture allows CNNs to efficiently capture spatial hierarchies and patterns in images, making them highly effective for image recognition tasks.
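A minimal 2-D convolution in pure Python ("valid" padding, stride 1; an illustrative sketch of the core operation a convolutional layer applies). The same small filter is slid over every position of the image, so the layer detects a local pattern wherever it appears:

```python
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # element-wise product of the kernel with the image patch at (i, j)
            out[i][j] = sum(image[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out

# A vertical-edge filter responds where intensity changes left-to-right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))   # large responses only at the edge column
```

Because the kernel's weights are shared across positions, the layer needs far fewer parameters than a fully connected layer over the same image.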
10.
The Bayes Optimal Classifier
Correct Answer
B. Is an ensemble of all the hypotheses in the hypothesis space
Explanation
The Bayes Optimal Classifier is an ensemble of all the hypotheses in the hypothesis space: it classifies a new instance by combining the predictions of every hypothesis, each weighted by its posterior probability given the training data, and choosing the class with the greatest total posterior weight. On average, no other classifier using the same hypothesis space and prior knowledge can outperform it, which is why it is called optimal; in practice it is usually intractable, since it requires summing over the entire hypothesis space.
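A sketch of the Bayes optimal classification rule with hypothetical posterior values: every hypothesis votes, weighted by its posterior P(h | D), and the class with the largest total weight wins.

```python
def bayes_optimal(hypotheses, x):
    # hypotheses: list of (predict_fn, posterior) pairs covering the space
    totals = {}
    for h, posterior in hypotheses:
        label = h(x)
        totals[label] = totals.get(label, 0.0) + posterior
    return max(totals, key=totals.get)

# Three hypotheses with assumed posteriors 0.4, 0.3, 0.3: the single most
# probable hypothesis says "+", yet the ensemble's answer is "-",
# because the "-" hypotheses jointly carry more posterior weight.
hyps = [(lambda x: "+", 0.4),
        (lambda x: "-", 0.3),
        (lambda x: "-", 0.3)]
assert bayes_optimal(hyps, x=None) == "-"
```

This is why the Bayes optimal classifier can disagree with (and outperform) the single maximum-a-posteriori hypothesis.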