1.
Point out the correct statement:
Correct Answer
A. Raw data is the original source of data
Explanation
Raw data is the unprocessed, unorganized information collected directly from the source, before any manipulation or analysis. Preprocessed data, by contrast, has been cleaned, transformed, and organized for further analysis. The statement "Raw data is the original source of data" is therefore correct.
2.
Which of the following is performed by a data scientist?
Correct Answer
D. All of the Mentioned
Explanation
Data scientists perform all of the mentioned tasks. They define the question or problem they are trying to solve, create reproducible code to analyze and manipulate data, and challenge the results to ensure accuracy and reliability. By doing all of these tasks, data scientists are able to extract insights and make data-driven decisions.
3.
Point out the wrong statement:
Correct Answer
B. Data visualization is the organization of information according to preset specifications
Explanation
The wrong statement is "Data visualization is the organization of information according to preset specifications." Data visualization is the representation of data in graphical or visual form to provide insights and communicate patterns or trends in the data. It is not the organization of information according to preset specifications.
4.
Which of the following approaches should be used to ask a data analysis question?
Correct Answer
B. Find out the question which is to be answered
Explanation
The correct approach to ask a Data Analysis question is to first identify the question that needs to be answered. This involves understanding the problem at hand and determining what specific information or insights are required from the dataset. Once the question is clearly defined, appropriate analysis techniques can be applied to find the answer. The other options mentioned, such as finding only one solution or directly extracting the answer from the dataset without asking a question, do not align with the systematic approach of data analysis.
5.
Which of the following is one of the key data science skills?
Correct Answer
D. All of the Mentioned
Explanation
All of the mentioned options are key data science skills. Statistics is essential for analyzing and interpreting data, Machine Learning is crucial for building predictive models and making data-driven decisions, and Data Visualization is important for effectively communicating insights and patterns from data. Therefore, all of these skills are fundamental in the field of data science.
6.
Which of the following is the most important language for Data Science?
Correct Answer
C. R
Explanation
R is the most important language for Data Science because it is specifically designed for statistical analysis and data manipulation. It has a wide range of packages and libraries that make it easy to perform complex data analysis tasks. R also has a large and active community of users, which means there is a wealth of resources and support available for those working in Data Science. Additionally, R integrates well with other programming languages and tools commonly used in Data Science, making it a versatile and powerful language for this field.
7.
A salesman offers you a choice of three boxes, one containing a million dollars and two containing fifty dollars and tells you to pick one. He then shows you fifty dollars in one of the other two boxes and asks you if you want to change your choice to the remaining box that you have neither picked nor seen inside. What do you do?
Correct Answer
A. Change to the other box
Explanation
The correct answer is to change to the other box. This is known as the Monty Hall problem. Initially, there is a 1/3 chance of picking the box with a million dollars and a 2/3 chance of picking one with fifty dollars. Assuming the salesman knows the contents and always reveals a fifty-dollar box, his reveal does not change the 1/3 chance that your first pick was right. The remaining 2/3 probability therefore falls entirely on the one unopened box you did not pick, so switching doubles your chance of winning.
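The 1/3 versus 2/3 split can be checked empirically. The following is a minimal simulation sketch, assuming the salesman knows where the money is and always opens a fifty-dollar box; the function names `play` and `win_rate` are illustrative and not part of the original quiz.

```python
import random

def play(switch, rng):
    """Play one round; return True if the player ends with the million-dollar box."""
    boxes = [0, 1, 2]
    prize = rng.choice(boxes)
    pick = rng.choice(boxes)
    # Assumption: the salesman knowingly opens a box that is neither
    # the player's pick nor the million-dollar box.
    opened = rng.choice([b for b in boxes if b != pick and b != prize])
    if switch:
        # Move to the only box that is neither picked nor opened.
        pick = next(b for b in boxes if b != pick and b != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the win probability for a fixed strategy over many rounds."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print(f"stay:   {win_rate(False):.3f}")
print(f"switch: {win_rate(True):.3f}")
```

With enough trials, the "stay" rate should settle near 1/3 and the "switch" rate near 2/3.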
8.
Which of the following is preferred for text analytics?
Correct Answer
A. R
Explanation
R is preferred for text analytics because it has a wide range of packages and libraries specifically designed for natural language processing and text mining tasks. These packages provide various functionalities such as tokenization, stemming, sentiment analysis, and topic modeling. R also has robust visualization capabilities, making it easier to analyze and interpret textual data. Additionally, R has a strong community support and a vast number of resources available online, making it a popular choice for text analytics tasks.
9.
______ is the simplest class of analytics:
Correct Answer
A. Descriptive
Explanation
Descriptive analytics is the simplest class of analytics because it focuses on analyzing historical data to understand what has happened in the past. It involves summarizing and interpreting data to gain insights and identify patterns and trends. Descriptive analytics does not involve making predictions or prescribing actions for the future, unlike predictive and prescriptive analytics. Instead, it provides a foundation for further analysis and decision-making by providing a clear understanding of past events and their implications.
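As a rough illustration of what "summarizing historical data" can look like, here is a small sketch using Python's standard `statistics` module. The `monthly_sales` figures are invented for the example and do not come from the quiz.

```python
import statistics

# Hypothetical monthly sales figures, invented for illustration only.
monthly_sales = [120, 135, 128, 150, 142, 160]

# Descriptive analytics: summarize what already happened, with no prediction.
summary = {
    "count": len(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}
print(summary)
```

Nothing here forecasts next month's sales or recommends an action; that would be the job of predictive or prescriptive analytics.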
10.
Your company is attempting to build a Big Data environment. The vendors you are working with tell you that an additional $1m of capital expenditure is needed on top of the $10m made so far. You are worried that the existing environment will not provide all the capability you need, however. Do you:
Correct Answer
B. Pause work while you consider what would be needed to gain the extra capability you need
Explanation
Pausing work while considering what would be needed to gain the extra capability is the most logical choice in this situation. The concern about the existing environment not providing all the necessary capability indicates that further evaluation and planning are required before making a decision. By pausing work, the company can assess the feasibility of meeting their requirements with the additional $1m investment and determine if any adjustments or changes need to be made to ensure the success of the Big Data environment project.
11.
You are operating a public health screening post at an airport and 200 people with a disease are identified. Three quarters of these are young, and two-thirds of all young people are diseased. There are as many non-diseased old people as there are young people in total. You now screen a new previously unseen individual – what is the chance they are old?
Correct Answer
C. 55%
Explanation
Of the 200 diseased people, three quarters (150) are young and the remaining 50 are old. Because two-thirds of all young people are diseased, the 150 diseased young people imply 150 / (2/3) = 225 young people in total. There are as many non-diseased old people as there are young people in total, so 225 old people are not diseased, and the old group numbers 50 + 225 = 275. The screened population is therefore 225 + 275 = 500, and the chance that a new individual is old is 275 / 500 = 55%.
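The 55% figure can be verified step by step from the numbers in the question. The sketch below uses exact fractions to avoid rounding; the variable names are illustrative.

```python
from fractions import Fraction

diseased = 200
diseased_young = Fraction(3, 4) * diseased    # three quarters of the diseased are young
diseased_old = diseased - diseased_young      # the rest of the diseased are old

# Two-thirds of all young people are diseased, so scale up to the full young group.
young_total = diseased_young / Fraction(2, 3)

# As many non-diseased old people as young people in total.
healthy_old = young_total
old_total = diseased_old + healthy_old
total = young_total + old_total

p_old = old_total / total
print(float(p_old))  # 0.55
```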
12.
Data by itself is not useful unless:
Correct Answer
B. It is processed to obtain information
Explanation
Data by itself is raw and unorganized information. In order to derive any meaningful insights or make informed decisions, the data needs to be processed and analyzed to extract valuable information. Processing the data involves organizing, cleaning, and transforming it into a more structured format. This allows for the identification of patterns, trends, and relationships within the data, enabling the generation of useful information that can be used for various purposes. Therefore, processing the data is essential to make it useful and meaningful.
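To make the step from raw data to information concrete, here is a minimal sketch. The `raw` readings and the `clean` helper are hypothetical, chosen only to show cleaning followed by a simple summary.

```python
# Hypothetical raw readings as they might arrive from a source:
# strings with stray whitespace, blanks, and non-numeric placeholders.
raw = ["21.5", " 22.0", "", "n/a", "23.5", "22.0 "]

def clean(values):
    """Keep only entries that parse as numbers, after stripping whitespace."""
    cleaned = []
    for v in values:
        try:
            cleaned.append(float(v.strip()))
        except ValueError:
            continue  # drop blanks and non-numeric entries
    return cleaned

readings = clean(raw)
average = sum(readings) / len(readings)
print(readings, average)
```

The raw list cannot be averaged as it stands; only after cleaning does it yield a usable piece of information.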
13.
For making decisions, data must be:
Correct Answer
C. Processed correctly
Explanation
To make informed decisions, it is crucial that the data is processed correctly. Processing data correctly involves ensuring that it is organized, cleaned, and transformed in a way that eliminates errors and inconsistencies. By processing data correctly, one can derive meaningful insights and make accurate conclusions. Without proper processing, the data may be unreliable and lead to incorrect decisions. Accuracy, massiveness, and diverse sources are important aspects, but processing the data correctly is the key to utilizing these factors effectively.
14.
Point out the correct statement:
Correct Answer
B. Hadoop stores data in HDFS and supports data compression/decompression
Explanation
Hadoop stores large volumes of data in its distributed file system, HDFS, and can compress and decompress that data. Compression matters in big data processing because it reduces storage space and the amount of data that has to be read from disk or moved across the network.
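The storage benefit of compression can be illustrated in general terms. The sketch below uses Python's standard `gzip` module, not any Hadoop API, and the log-like sample data is invented for the example.

```python
import gzip

# Hypothetical log-like data; repetitive text tends to compress well.
data = ("2024-01-01,INFO,request served\n" * 1000).encode("utf-8")

compressed = gzip.compress(data)
restored = gzip.decompress(compressed)

# Compare sizes before and after compression.
print(len(data), len(compressed))

# Decompression must give back exactly the original bytes.
assert restored == data
```

The round trip is lossless, which is what allows a storage layer to keep data compressed on disk and decompress it only when it is processed.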