1.
Which of the following models is usually considered the gold standard for data analysis?
Correct Answer
A. Inferential
Explanation
The inferential model is usually considered the gold standard for data analysis because it allows researchers to make predictions and draw conclusions about a population based on a sample. This model involves using statistical techniques to analyze data and make inferences about a larger population. Descriptive analysis, on the other hand, focuses on summarizing and describing the data without making any predictions or inferences. Causal analysis is used to determine cause-and-effect relationships between variables, but it is not typically considered the gold standard for data analysis. Therefore, the correct answer is inferential.
2.
Which of the following are “Measures of Central Tendency”?
Correct Answer
C. Mode, Mean, Median
Explanation
Measures of central tendency are statistics that describe the center, or typical value, of a data set. The mode is the most frequently occurring value, the mean is the arithmetic average (the sum of all values divided by their count), and the median is the middle value when the data set is sorted (or the average of the two middle values when the count is even). Therefore, the correct answer is mode, mean, and median, as all three are measures of central tendency.
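The three measures can be checked on a small data set. The sketch below uses Python's standard `statistics` module; the `scores` list is made-up illustrative data, not taken from the quiz.

```python
import statistics

# Hypothetical data, chosen only for illustration.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # sum of values divided by their count
median = statistics.median(scores)  # even count: average of the two middle values (3 and 5)
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```

Note that the three measures differ here: the single large value (10) pulls the mean above the median.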
3.
Who is a data scientist?
Correct Answer
D. All of the above
Explanation
A data scientist is someone who possesses a combination of skills in mathematics, statistics, and software programming. They use these skills to analyze and interpret complex data sets, identify patterns and trends, and develop algorithms and models to solve problems and make data-driven decisions. By having expertise in all three areas, data scientists are able to handle the entire process of data analysis, from collecting and cleaning data to implementing and deploying analytical solutions. Therefore, the correct answer is "All of the above" as all three roles (mathematician, statistician, and software programmer) are encompassed within the field of data science.
4.
Which of the following is performed by a data scientist?
Correct Answer
C. Challenge results
Explanation
Data scientists perform the task of challenging results. This involves critically analyzing and evaluating the outcomes of data analysis and machine learning models. They assess the reliability and accuracy of the results, identify any limitations or biases, and determine if the findings align with the initial research question or hypothesis. By challenging results, data scientists ensure the validity and robustness of the conclusions drawn from the data analysis process.
5.
Which of the following is one of the key data science skills?
Correct Answer
B. Machine learning
Explanation
Machine learning is one of the key data science skills because it involves the use of algorithms and statistical models to enable computers to learn from and make predictions or decisions based on data. It is a crucial skill in data science as it allows for the development of models that can analyze and interpret large amounts of data, identify patterns, and make accurate predictions or classifications. Machine learning is widely used in various industries for tasks such as fraud detection, recommendation systems, image recognition, and natural language processing.
6.
Raw data should be processed only one time.
Correct Answer
B. False
Explanation
Processing raw data multiple times can be necessary in certain situations. For example, if new information or updates are received, the raw data may need to be processed again to incorporate these changes. Additionally, different analyses or calculations may require different processing methods, leading to the need for multiple processing steps. Therefore, the statement that raw data should be processed only one time is incorrect.
7.
Which of the following is a characteristic of processed data?
Correct Answer
D. None of the mentioned
Explanation
Processed data is data that has been organized, structured, or transformed to make it more useful and meaningful for analysis. It is the opposite of raw data, which is unprocessed and typically not ready for analysis. "None of the mentioned" is therefore the correct answer: processed data is ready for analysis, so any option describing it as unready or hard to analyze does not apply.
8.
Which of the following types of testing is concerned with making decisions using data?
Correct Answer
B. Hypothesis
Explanation
Hypothesis testing is concerned with making decisions using data. A researcher formulates a hypothesis about a population parameter and collects sample data to determine whether the evidence supports or contradicts it. The goal is to make an inference about the population from the sample. The decision takes the form of rejecting, or failing to reject, the null hypothesis based on the evidence in the data. Therefore, hypothesis testing is the correct answer.
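As a rough illustration of the decision step, the sketch below runs a two-sided one-sample z-test using only the standard library. The function name `two_sided_z_test`, the sample figures, and the 0.05 significance level are all assumptions made for this example; a z-test also assumes the population standard deviation is known.

```python
import math
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Hypothetical helper: test H0 'population mean == mu0' with known sigma."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Probability of a result at least this extreme if H0 were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_null = p_value < alpha
    return z, p_value, reject_null

# Made-up numbers: 36 observations averaging 52.0, tested against mu0 = 50.0.
z, p_value, reject_null = two_sided_z_test(sample_mean=52.0, mu0=50.0, sigma=5.0, n=36)
print(f"z={z:.2f}, p={p_value:.4f}, reject H0: {reject_null}")
```

With these inputs the p-value falls below 0.05, so the null hypothesis is rejected; with a smaller sample or a noisier population, the same difference in means might not be enough.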
9.
Which of the following properties of a random variable is a measure of spread?
Correct Answer
B. Standard deviation
Explanation
Standard deviation is a measure of spread for a random variable. It quantifies the dispersion or variability of the values around the mean: it is the square root of the average squared deviation from the mean. A higher standard deviation indicates a greater spread, while a lower standard deviation indicates a narrower spread. Therefore, the correct answer is standard deviation.
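The calculation can be written out step by step. This sketch computes the population standard deviation by hand and compares it with `statistics.pstdev`; the `data` list is illustrative only.

```python
import math
import statistics

# Hypothetical data, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
# Average squared deviation from the mean (population variance).
variance = sum((x - mean) ** 2 for x in data) / len(data)
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)
print(statistics.pstdev(data))  # library equivalent of the manual calculation
```

For a sample rather than a full population, `statistics.stdev` divides by n - 1 instead of n.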
10.
Which of the following techniques falls under practical machine learning?
Correct Answer
A. Decision Tree
Explanation
Decision Tree is a technique that falls under practical machine learning. It is a supervised learning algorithm that is used for both classification and regression tasks. It is practical because it is easy to understand and interpret, and it can handle both categorical and numerical data. Decision Tree builds a model by learning simple decision rules inferred from the data features, making it a widely used technique in various industries and applications. Data visualization and forecasting, though related to machine learning, are not specific techniques but rather tools or methods that can be used in conjunction with different machine learning algorithms.
11.
Which of the following is the definition of raw data?
Correct Answer
A. Set of Measurement on Recorded Values
Explanation
Raw data refers to unprocessed and unorganized data that is collected directly from various sources. It consists of measurements or recorded values in their original form, without any manipulation or analysis. Raw data serves as the foundation for data analysis and is typically transformed and processed to extract meaningful insights and patterns. Therefore, the definition "Set of Measurement on Recorded Values" accurately describes raw data.
12.
__________ is the standard deviation of a sampling distribution.
Correct Answer
D. Standard error
Explanation
Standard error is the correct answer because it represents the standard deviation of a sampling distribution. A sampling distribution is a distribution of statistics obtained from multiple samples of the same population. The standard error measures the variability or spread of these statistics, indicating how much they differ from the true population parameter. It is an important measure in inferential statistics as it helps estimate the precision of sample statistics and make inferences about the population.
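For the most common case, the standard error of the sample mean, the estimate is the sample standard deviation divided by the square root of the sample size. The sketch below assumes that case; the `sample` values are invented for illustration.

```python
import math
import statistics

# Hypothetical sample of 8 measurements.
sample = [12, 15, 11, 14, 13, 16, 10, 13]

s = statistics.stdev(sample)        # sample standard deviation (divides by n - 1)
sem = s / math.sqrt(len(sample))    # estimated standard error of the mean

print(f"s={s:.4f}, standard error={sem:.4f}")
```

Because of the square root in the denominator, quadrupling the sample size only halves the standard error.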
13.
Which of the following diagrams is used to view correlation?
Correct Answer
C. Corrgram
Explanation
A corrgram is a diagram used to view correlation. It displays a matrix of correlation coefficients between variables, usually represented by a grid of squares. Each square represents the correlation between two variables, with the color or shading indicating the strength and direction of the correlation. This diagram is useful for visually understanding the relationships between variables and identifying patterns or trends in the data.
14.
____________ is a multidisciplinary field that involves the extraction of knowledge from large volumes of structured or unstructured data.
Correct Answer
A. Data Science
Explanation
Data Science is the correct answer because it is a multidisciplinary field that involves the extraction of knowledge from large volumes of data, whether it is structured or unstructured. Data scientists use various techniques and tools to analyze and interpret data in order to gain insights and make informed decisions. This field combines elements of statistics, mathematics, computer science, and domain knowledge to extract valuable information from data.
15.
Pick the lazy algorithm
Correct Answer
C. KNN
Explanation
KNN stands for K-Nearest Neighbors, an algorithm used for classification and regression tasks. It is called lazy because it builds no model during training: it simply stores the training data and defers all computation until a prediction is requested. At that point it finds the k nearest neighbors of the query point in the feature space and predicts the majority class (classification) or the average value (regression) of those neighbors. KNN is a non-parametric algorithm, meaning it does not make any assumptions about the underlying data distribution. It is simple to implement and can be effective for small to medium-sized datasets. However, it can be computationally expensive for large datasets and may not perform well in the presence of irrelevant or noisy features.
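A minimal classification sketch makes the "lazy" behaviour visible: there is no fitting step, and all the work happens inside the prediction call. The function name `knn_predict`, the choice of Euclidean distance, and the toy points are assumptions for this example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Hypothetical helper: majority vote among the k nearest training points."""
    # No model is built in advance; distances are computed only at query time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data: two well-separated clusters labelled "a" and "b".
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(points, labels, query=(1.2, 1.4), k=3)
print(prediction)
```

Every prediction scans the whole training set, which is why the cost grows with dataset size.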
16.
The 3 V's of Big Data
Correct Answer
B. Volume, Velocity, Variety
Explanation
The correct answer is Volume, Velocity, Variety. These are the three main characteristics of big data. Volume refers to the large amount of data being generated and collected. Velocity refers to the speed at which data is generated and must be processed, often in or near real time. Variety refers to the different types and formats of data, including structured, unstructured, and semi-structured data. These three V's are essential for understanding and analyzing big data effectively.
17.
Positive Correlation:
Correct Answer
C. Above 0.8
Explanation
The correct answer is "Above 0.8". A positive correlation means that as one variable increases, the other also tends to increase; strictly, any correlation coefficient above 0 is positive. A coefficient above 0.8 indicates a strong positive correlation, that is, a high degree of linear relationship between the two variables. The answer "Above 0.8" therefore describes a strong positive correlation between the variables being studied.
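The coefficient in question is usually the Pearson correlation coefficient. The sketch below computes it from its definition; the helper name `pearson_r` and the `hours`/`score` values are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(var_x * var_y)

# Invented data: scores rise almost linearly with hours studied.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

r = pearson_r(hours, score)
print(f"r = {r:.3f}")
```

The result always lies between -1 and 1; here it is well above 0.8, so the relationship would be classed as a strong positive correlation.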
18.
Weighted Average is used in:
Correct Answer
C. Forecasting
Explanation
In forecasting, a weighted average combines historical data points using weights that reflect each point's importance, based on factors such as recency or reliability. Because more relevant observations (for example, the most recent ones) count more heavily, the forecast can track current trends more closely than a simple average would. Forecasting is therefore a specific application of the weighted average.
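One simple form of this idea is a weighted moving average, sketched below. The function name `weighted_average_forecast`, the weights, and the `sales` figures are assumptions for illustration; real forecasting methods choose weights more carefully.

```python
def weighted_average_forecast(history, weights):
    """Hypothetical helper: forecast the next value from the last len(weights) points.

    Weights are ordered oldest to newest, so a larger final weight
    emphasizes the most recent observation.
    """
    if len(history) < len(weights):
        raise ValueError("history must contain at least len(weights) values")
    recent = history[-len(weights):]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)

# Invented monthly sales; the most recent month gets the largest weight.
sales = [100, 110, 120, 130]
weights = [1, 2, 3]

forecast = weighted_average_forecast(sales, weights)
print(round(forecast, 2))
```

With equal weights the function reduces to the plain average of the most recent values.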
19.
Sequential modelling is done with
Correct Answer
C. RNN
Explanation
Sequential modeling is a technique used to analyze and predict sequential data, such as time series or natural language. Recurrent Neural Networks (RNN) are particularly suitable for sequential modeling as they have a feedback loop that allows information to persist and be processed over time. Therefore, RNN is the correct answer as it is specifically designed for sequential modeling tasks. CNN (Convolutional Neural Networks) are mainly used for image and video analysis, KNN (K-Nearest Neighbors) is a non-parametric algorithm for classification and regression, and ANN (Artificial Neural Networks) is a general term that can refer to any type of neural network model.
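The feedback loop can be illustrated with a single scalar recurrent unit, where each hidden state depends on both the current input and the previous hidden state. This is a deliberately tiny sketch, not a trainable network: the function name `rnn_hidden_states` and the fixed weights are assumptions chosen for illustration.

```python
import math

def rnn_hidden_states(inputs, w_in=0.5, w_rec=0.8, bias=0.0):
    """Hypothetical helper: run one scalar recurrent unit over a sequence."""
    hidden = 0.0
    states = []
    for x in inputs:
        # The previous hidden state feeds back into the current step.
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
        states.append(hidden)
    return states

# A single input pulse followed by zeros.
states = rnn_hidden_states([1.0, 0.0, 0.0])
print([round(h, 4) for h in states])
```

The hidden state stays nonzero after the input has returned to zero, which is the sense in which information persists over time. Setting `w_rec=0.0` removes the feedback, and the later states drop to zero.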
20.
Why Machine Learning in Data Science?
Correct Answer
B. For Prediction
Explanation
Machine learning is used in data science for prediction because it allows the development of models that can analyze patterns and make accurate predictions based on historical data. By training these models with known data, they can learn to recognize patterns and relationships, and then apply that knowledge to make predictions on new, unseen data. This prediction capability is valuable in various fields, such as finance, healthcare, and marketing, where accurate predictions can help in decision-making and improving outcomes.
21.
Tableau can create worksheet-specific filters.
Correct Answer
A. True
Explanation
Tableau has the capability to create filters that are specific to individual worksheets. This means that users can apply filters to a particular worksheet without affecting the data displayed in other worksheets. By using worksheet-specific filters, users can easily analyze and visualize data based on specific criteria, allowing for more focused and targeted insights. This feature enhances the flexibility and customization options available to users when working with Tableau.
22.
What is the order of execution of filters in Tableau? 1) Context 2) Traditional 3) Custom 4) Show Me
Correct Answer
C. 3,1,2,4
Explanation
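According to this answer key, the filters execute in the order 3) Custom, 1) Context, 2) Traditional, and 4) Show Me: custom filters are applied first, followed by context filters, then traditional filters, and finally Show Me filters. Tableau's own order-of-operations documentation uses different category names (extract, data source, context, dimension, measure, and table calculation filters), so the labels here follow the quiz's wording.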
23.
Will filters work when we do data blending?
Correct Answer
A. True
Explanation
When we do data blending, filters will still work. Data blending is a technique used to combine data from multiple sources or tables into a single view. Filters are used to narrow down the data based on specific criteria. Even when data blending is performed, filters can still be applied to limit the data being displayed or analyzed. Thus, filters will continue to work effectively during data blending.
24.
Point out the correct statement:
Correct Answer
A. Machine learning focuses on prediction, based on known properties learned from the training data
Explanation
The correct answer is "Machine learning focuses on prediction, based on known properties learned from the training data." This statement accurately describes the main objective of machine learning, which is to make predictions or decisions based on patterns and relationships learned from a set of training data. Machine learning algorithms analyze the training data to identify these patterns and use them to make predictions on new, unseen data.
25.
Which of the following can be considered a random variable?
Correct Answer
D. All of the Mentioned
Explanation
All of the mentioned options can be considered as random variables. A random variable is a variable whose value is determined by the outcome of a random event. In this case, the outcome from the roll of a die, the outcome of a flip of a coin, and the outcome of an exam are all determined by random events. Therefore, all of these options can be considered as random variables.