1.
What skills are required in Data Science?
Correct Answer
D. All the above
Explanation
The correct answer is "All the above". Data Science requires a combination of statistics/mathematics skills, coding/hacking skills, and domain/business knowledge. Statistics and mathematics skills are essential for analyzing and interpreting data. Coding and hacking skills are necessary for programming and manipulating data. Domain and business knowledge is important for understanding the context and making informed decisions. Therefore, all of these skills are required in Data Science.
2.
Which of the following best describes the principal goal of data science?
Correct Answer
B. To mine and analyze large amounts of data to uncover information that can be leveraged for operational improvements and business gains.
Explanation
The principal goal of data science is to mine and analyze large amounts of data to uncover information that can be leveraged for operational improvements and business gains. This involves using techniques and tools to extract valuable insights from data, which can then be used to make informed decisions and drive business growth. Collecting and archiving data sets for record keeping purposes and preparing data for analysts are important steps in the data science process, but they are not the ultimate goal.
3.
Which of the following is performed by a data scientist?
Correct Answer
B. Define the question
Explanation
A data scientist is responsible for defining the question or problem that needs to be solved using data analysis. They need to understand the business objectives and formulate the right questions to guide their analysis. Creating reproducible code and challenging results are also important tasks for a data scientist, but they are not exclusive to their role. Other individuals involved in data analysis may also perform these tasks. Therefore, the correct answer is defining the question.
4.
Which of the following is the most widely used language for data science?
Correct Answer
C. Python
Explanation
Python is the most widely used language for data science due to its simplicity, versatility, and extensive libraries such as NumPy, Pandas, and Scikit-learn. These libraries provide powerful tools for data manipulation, analysis, and machine learning. Python's syntax is easy to understand, and its large community of users contributes to its popularity by sharing code and resources. Its integration with other languages and frameworks also makes it a preferred choice for data scientists.
5.
Which of the following is not a step in data analysis?
Correct Answer
D. Securing data
Explanation
Securing data is not a step in data analysis because it is a separate process that focuses on protecting data from unauthorized access, disclosure, alteration, or destruction. While it is essential to secure data to maintain its integrity and confidentiality, it is not directly involved in the analysis of data. The other options, EDA (Exploratory Data Analysis), obtaining data, and cleaning data, are all crucial steps in the data analysis process. EDA involves exploring and understanding the data, obtaining data involves collecting relevant data sources, and cleaning data involves removing errors, inconsistencies, and outliers from the dataset.
6.
Which of the following steps is performed by a data scientist after acquiring the data?
Correct Answer
B. Data Cleansing
Explanation
After acquiring the data, one of the steps performed by a data scientist is data cleansing. This involves identifying and removing any errors, inconsistencies, or irrelevant information from the dataset. Data cleansing ensures that the data is accurate, complete, and suitable for analysis. It may involve tasks such as removing duplicate records, filling in missing values, correcting inconsistencies, and standardizing formats. By performing data cleansing, the data scientist ensures that the data is of high quality and can be effectively used for further analysis and modeling.
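The cleansing tasks listed above (standardizing formats, removing duplicates, filling in missing values) can be sketched in plain Python. This is only an illustration: the records, the field names, and the choice to fill a missing age with the rounded mean of the known ages are all assumptions, not part of the quiz.

```python
# Hypothetical raw records; names and ages are made up for illustration.
raw_rows = [
    {"name": " alice ", "age": "34"},
    {"name": "Bob", "age": ""},        # missing age
    {"name": "Alice", "age": "34"},    # duplicate once formats are standardized
    {"name": "carol", "age": "30"},
]

def clean(rows):
    """Standardize formats, drop duplicates, then fill missing ages."""
    # 1. Standardize: trim whitespace, title-case names, parse ages.
    standardized = []
    for row in rows:
        name = row["name"].strip().title()
        age = int(row["age"]) if row["age"].strip() else None
        standardized.append({"name": name, "age": age})

    # 2. Remove duplicate records, keeping the first occurrence.
    unique, seen = [], set()
    for record in standardized:
        key = (record["name"], record["age"])
        if key not in seen:
            seen.add(key)
            unique.append(record)

    # 3. Fill missing ages with the rounded mean of the known ages (an assumed policy).
    known = [r["age"] for r in unique if r["age"] is not None]
    fill_value = round(sum(known) / len(known))
    for record in unique:
        if record["age"] is None:
            record["age"] = fill_value
    return unique

cleaned_rows = clean(raw_rows)
```

In practice a library such as Pandas would usually handle these steps, but the order of operations is the same.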
7.
What is the simplest class of analytics?
Correct Answer
A. Descriptive
Explanation
Descriptive analytics is the simplest class of analytics as it focuses on summarizing and interpreting historical data to provide insights into past events and trends. It involves organizing and presenting data in a way that is easy to understand, such as through charts, graphs, and reports. Descriptive analytics helps in understanding what has happened in the past, but it does not involve making predictions or prescribing actions for the future like predictive and prescriptive analytics do.
8.
Point out the correct statement:
Correct Answer
A. Raw data is the original source of data
Explanation
The correct statement is that raw data is the original source of data. Raw data refers to the unprocessed and unorganized data that is collected directly from the source without any modifications or transformations. It is the initial data that is collected before any processing steps are applied to it. Preprocessed data, on the other hand, refers to the data that has undergone some form of cleaning, transformation, or organization to make it more suitable for analysis or use. Therefore, the correct answer is that raw data is the original source of data.
9.
The below image is an example of:
Correct Answer
C. Column heading
Explanation
The image provided shows the labels at the top of each column in a spreadsheet. These labels indicate the specific data or information contained within each column. Therefore, the correct answer is "Column heading."
10.
The below image is an example of
Correct Answer
D. Formula bar
Explanation
The given image is an example of a formula bar. The formula bar is a feature in spreadsheet software that displays the contents of the active cell and allows users to enter or edit formulas and data. It is typically located at the top of the spreadsheet interface and provides a convenient way to input and manipulate data in cells.
11.
Which is the most suitable chart for discrete data?
Correct Answer
A. Bar chart
Explanation
A bar chart is the most suitable chart for discrete data because it displays data in separate bars, with each bar representing a specific category or group. This allows for easy comparison between different categories or groups, making it ideal for displaying discrete data. A line chart, on the other hand, is more suitable for continuous data, where the data points are connected by lines to show trends over time.
12.
Which is the most suitable chart for continuous data?
Correct Answer
A. Line chart
Explanation
A line chart is the most suitable chart for continuous data because it connects successive data points and so shows how a value changes across a continuous interval, most often time. It is ideal for displaying trends, patterns, and changes in data over a continuous period. In contrast, a pie chart is more suitable for displaying categorical data and comparing parts of a whole.
13.
What is the Excel feature that quickly allows us to show trend information in a single cell?
Correct Answer
C. Sparklines
Explanation
Sparklines is the correct answer because it is the Excel feature that shows trend information in a single cell. Sparklines are small, condensed charts that sit inside a cell and give a visual representation of data trends. Excel offers three types: line, column, and win/loss. They are useful for quickly analyzing and understanding data patterns without creating a separate chart or graph.
14.
How can you easily perform a summary over a large detailed data set?
Correct Answer
B. Pivot table
Explanation
A pivot table is a useful tool for summarizing large detailed data sets. It allows you to quickly and easily analyze and summarize data by creating a table with rows and columns that can be rearranged and manipulated. You can easily group and aggregate data, perform calculations, and generate summaries such as totals, averages, and percentages. This makes it efficient and convenient to get an overview of the data and identify patterns or trends without having to manually sift through and analyze each individual data point.
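The grouping and aggregation a pivot table performs can be sketched outside Excel as well. The example below is a minimal illustration in plain Python, assuming made-up sales rows of (region, product, amount); `pivot_sum` is a hypothetical helper name, not an Excel or library function.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, product, amount).
sales = [
    ("North", "Widget", 120),
    ("North", "Gadget", 80),
    ("South", "Widget", 200),
    ("North", "Widget", 30),
    ("South", "Gadget", 50),
]

def pivot_sum(rows):
    """Group by region (rows) and product (columns), summing the amounts."""
    table = defaultdict(lambda: defaultdict(int))
    for region, product, amount in rows:
        table[region][product] += amount
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {region: dict(products) for region, products in table.items()}

summary = pivot_sum(sales)
for region, products in summary.items():
    print(region, products, "total:", sum(products.values()))
```

Swapping the sum for a count or an average changes the aggregation without changing the grouping, which mirrors how a pivot table's value field setting works.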
15.
Which Excel feature can you use to ensure that users do not enter irrelevant data?
Correct Answer
C. Data validation
Explanation
Data validation is the correct answer because it is an Excel feature that allows users to set specific criteria for data entry. By using data validation, users can restrict the type of data that can be entered in a cell, such as numbers within a certain range, dates, or specific text. This helps to ensure that irrelevant or incorrect data is not entered, improving the accuracy and reliability of the data in the spreadsheet.
16.
Which of the following is the correct formula to add the values in cell A1 to A3?
Correct Answer
B. =SUM(A1:A3)
Explanation
The correct formula to add the values in cell A1 to A3 is =SUM(A1:A3). This formula uses the SUM function in Excel to add the values within the specified range, which in this case is A1 to A3. This formula is the most efficient and concise way to add multiple values in Excel.
17.
In what follows, S is the sample space of the experiment in question and E is the event of interest. n(S) is the number of elements in the sample space S and n(E) is the number of elements in the event E.
A die is rolled. Find the probability that an even number is obtained.
Correct Answer
A. 1/2
Explanation
The sample space S consists of all possible outcomes when rolling a die, which are the numbers 1, 2, 3, 4, 5, and 6. The event E consists of the outcomes that are even numbers, which are 2, 4, and 6. Therefore, n(S) = 6 and n(E) = 3. The probability of event E occurring is given by n(E)/n(S) = 3/6 = 1/2.
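The n(E)/n(S) calculation can be checked by enumerating the sample space. The sketch below assumes a fair six-sided die; the helper name `probability` is introduced only for this illustration.

```python
from fractions import Fraction

def probability(sample_space, is_in_event):
    """Return n(E)/n(S) for equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if is_in_event(outcome)]
    return Fraction(len(favorable), len(sample_space))

die = list(range(1, 7))                          # S = {1, 2, 3, 4, 5, 6}
p_even = probability(die, lambda x: x % 2 == 0)  # E = {2, 4, 6}
print(p_even)
```

Using `Fraction` keeps the result exact and automatically reduces 3/6 to lowest terms.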
18.
Two coins are tossed. Find the probability that two heads are obtained.
Note: Each coin has two possible outcomes, H (heads) and T (tails).
Correct Answer
C. 1/4
Explanation
When two coins are tossed, there are a total of four possible outcomes: HH, HT, TH, and TT. Since we are interested in the probability of obtaining two heads, there is only one favorable outcome (HH) out of the four possible outcomes. Therefore, the probability of obtaining two heads is 1 out of 4, which can be expressed as 1/4.
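The four outcomes can also be generated rather than listed by hand. This is a small sketch assuming two fair, independent coins.

```python
from fractions import Fraction
from itertools import product

# Every ordered pair of results for two coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# The event of interest: both coins show heads.
favorable = [o for o in outcomes if o == ("H", "H")]

p_two_heads = Fraction(len(favorable), len(outcomes))
print(outcomes, p_two_heads)
```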
19.
Which of these numbers cannot be a probability?
Correct Answer
C. 1.0001
Explanation
A probability must be a number between 0 and 1, inclusive. 1.0001 is greater than 1, which is outside the range of possible probabilities.
20.
What is the probability of the shaded sector from the spinner below?
Correct Answer
D. 2/5
Explanation
The spinner is divided into 5 equal sectors, and the shaded sector occupies 2 of these sectors. Therefore, the probability of landing on the shaded sector is 2/5.
21.
A jar contains 3 red marbles, 7 green marbles and 10 white marbles. If a marble is drawn from the jar at random, what is the probability that this marble is white?
Correct Answer
A. 1/2
Explanation
The probability of drawing a white marble can be found by dividing the number of white marbles by the total number of marbles in the jar. In this case, there are 10 white marbles and a total of 20 marbles in the jar. Therefore, the probability of drawing a white marble is 10/20, which simplifies to 1/2.
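The same division can be written out with the jar's contents stored as counts. This is a minimal sketch; `draw_probability` is a name chosen for this example.

```python
from collections import Counter
from fractions import Fraction

# Contents of the jar as stated in the question.
jar = Counter(red=3, green=7, white=10)

def draw_probability(jar, color):
    """Probability of drawing one marble of the given color at random."""
    return Fraction(jar[color], sum(jar.values()))

p_white = draw_probability(jar, "white")
print(p_white)
```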
22.
The blood groups of 200 people are distributed as follows: 50 have type A blood, 65 have B blood type, 70 have O blood type and 15 have type AB blood. If a person from this group is selected at random, what is the probability that this person has O blood type?
Correct Answer
C. 70/200
Explanation
The probability of selecting a person with O blood type can be calculated by dividing the number of people with O blood type (70) by the total number of people (200). Therefore, the probability is 70/200.
23.
A die is rolled. Find the probability that the number obtained is greater than 4.
Correct Answer
A. 1/3
Explanation
The probability of rolling a number greater than 4 on a die can be determined by counting the favorable outcomes (numbers 5 and 6) and dividing it by the total number of possible outcomes (numbers 1 to 6). In this case, there are 2 favorable outcomes (5 and 6) out of 6 possible outcomes. Therefore, the probability is 2/6, which simplifies to 1/3.
24.
The sample space S of this experiment is a standard deck of 52 cards.
A card is drawn at random from a deck of cards. Find the probability of getting a queen.
Correct Answer
D. 1/13
Explanation
The sample space S represents all the possible outcomes of the experiment, which is drawing a card at random from a deck of cards. The sample space consists of 52 cards. The event of interest is drawing any one of the four queens in the deck. Therefore, the probability of getting a queen is 4/52, which simplifies to 1/13.
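The count of 4 queens out of 52 cards can be confirmed by building the deck. The sketch assumes a standard deck with 13 ranks and 4 suits and no jokers.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]

# The sample space: one (rank, suit) pair per card.
deck = list(product(ranks, suits))

# The event: the drawn card is a queen of any suit.
queens = [card for card in deck if card[0] == "Q"]

p_queen = Fraction(len(queens), len(deck))
print(len(deck), len(queens), p_queen)
```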
25.
The expected value or _______ of a random variable is the center of its distribution.
Correct Answer
C. Mean
Explanation
The expected value of a random variable is the center of its distribution. It represents the average value that the random variable is expected to take on over a large number of trials. The mean is calculated by summing up all the possible values of the random variable, each multiplied by their respective probabilities. It is a measure of central tendency and provides a measure of the typical value of the random variable.
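For a discrete random variable, the probability-weighted sum described above can be written directly. The fair die used here is an assumed example, not part of the question.

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of each value times its probability, for a discrete distribution."""
    return sum(value * prob for value, prob in distribution.items())

# Assumed example: a fair six-sided die, each face with probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

mean_roll = expected_value(fair_die)
print(mean_roll)
```

The result, 7/2, is not a value the die can actually show, which illustrates that the expected value is the center of the distribution rather than a likely outcome.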
26.
Which of the following of a random variable is a measure of spread?
Correct Answer
A. Variance
Explanation
Variance is a measure of spread because it quantifies how much the values of a random variable vary from the mean. It calculates the average of the squared differences between each value and the mean, providing a measure of the overall dispersion or spread of the data. A higher variance indicates a greater spread of values, while a lower variance indicates a more concentrated or narrow distribution. Therefore, variance is a commonly used statistical measure to understand the variability or spread of a random variable.
27.
The square root of the variance is called the ________ deviation.
Correct Answer
D. Standard
Explanation
The square root of the variance is called the standard deviation. This is a commonly used measure of the amount of variation or dispersion in a set of data. It tells us how spread out the data points are from the mean. By taking the square root of the variance, we obtain the standard deviation, which is expressed in the same units as the original data.
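The relationship between variance and standard deviation can be sketched for a discrete random variable. As an assumption for illustration, the example uses a fair six-sided die; the function names are chosen for this example only.

```python
from fractions import Fraction
from math import sqrt

def mean(distribution):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in distribution.items())

def variance(distribution):
    """Probability-weighted average of squared deviations from the mean."""
    mu = mean(distribution)
    return sum(p * (x - mu) ** 2 for x, p in distribution.items())

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

var_roll = variance(fair_die)   # exact fraction
sd_roll = sqrt(var_roll)        # standard deviation, in the same units as the faces
print(var_roll, sd_roll)
```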
28.
The following questions will be based on this set of numbers:
20, 24, 25, 36, 25, 22, 23
The mode?
Correct Answer
B. 25
Explanation
The mode is the number that appears most frequently in a set of numbers. In this set, the number 25 appears twice, which is more than any other number. Therefore, the mode of this set of numbers is 25.
29.
The following questions will be based on this set of numbers:
20, 24, 25, 36, 25, 22, 23
The mean?
Correct Answer
C. 25
Explanation
The correct answer is 25. To find the mean, you add up all the numbers in the set and then divide by the total number of values. In this case, the sum of the numbers is 175. There are a total of 7 numbers in the set. So, when you divide 175 by 7, you get 25. Therefore, the mean of the set of numbers is 25.
30.
The following questions will be based on this set of numbers:
20, 24, 25, 36, 25, 22, 23
The median?
Correct Answer
D. 24
Explanation
The median is the middle value in a set of numbers when they are arranged in ascending order. In this case, when the numbers are arranged in ascending order, they become 20, 22, 23, 24, 25, 25, 36. The middle value is 24, which is the correct answer.
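The mode, mean, and median of this set can be checked with Python's standard `statistics` module, as a quick sketch:

```python
import statistics

data = [20, 24, 25, 36, 25, 22, 23]

mode_value = statistics.mode(data)      # most frequent value
mean_value = statistics.mean(data)      # sum divided by count
median_value = statistics.median(data)  # middle value of the sorted data

print(sorted(data))
print(mode_value, mean_value, median_value)
```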
31.
The following questions will be based on this set of numbers:
20, 24, 25, 36, 25, 22, 23
The standard deviation?
(approximate value)
Correct Answer
D. 5.164
Explanation
The correct answer is 5.164. The standard deviation measures the amount of variation or dispersion in a set of numbers: it indicates how spread out the numbers are from the mean. It is the square root of the variance. Here the sample variance is 160/6 ≈ 26.667, so the standard deviation is √26.667 ≈ 5.164.
32.
The following questions will be based on this set of numbers:
20, 24, 25, 36, 25, 22, 23
The variance?
(approximate value)
Correct Answer
B. 26.666
Explanation
The correct answer is 26.666. First find the mean: the numbers sum to 175, and 175/7 = 25. Next, subtract the mean from each number and square the result, which gives 25, 1, 0, 121, 0, 9, and 4 (in the order the numbers are listed), for a total of 160. Dividing by n - 1 = 6 gives the sample variance, 160/6 ≈ 26.667. Dividing by n = 7 instead would give the population variance, about 22.857, so the listed answer uses the sample formula.
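The arithmetic can be verified with the standard `statistics` module, which provides both the sample (n - 1) and population (n) versions. This is a short check, not a derivation:

```python
import statistics

data = [20, 24, 25, 36, 25, 22, 23]

mean_value = statistics.mean(data)
squared_deviations = [(x - mean_value) ** 2 for x in data]

sample_variance = statistics.variance(data)        # divides by n - 1
population_variance = statistics.pvariance(data)   # divides by n
sample_std_dev = statistics.stdev(data)            # square root of the sample variance

print(sum(squared_deviations), sample_variance, population_variance, sample_std_dev)
```

The quiz answers for the variance and the standard deviation both match the sample versions.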
33.
Which of the following gave rise to the need for graphs in data analysis?
Correct Answer
D. All of the above
Explanation
The need for graphs in data analysis arose due to various reasons, including data visualization, communicating results, and decision making. Data visualization helps in representing complex data in a visual format, making it easier to understand and interpret. Communicating results through graphs allows for effective presentation and sharing of information. Graphs also aid in decision making by providing a clear and concise representation of data, enabling better analysis and informed decision making. Therefore, all of the mentioned reasons contributed to the need for graphs in data analysis.
34.
Which of the following graphs can be used for simple summarization of data?
Correct Answer
C. Bar Plot
Explanation
A bar plot can be used for simple summarization of data because it visually represents the frequency or count of different categories or groups. It consists of rectangular bars where the length of each bar corresponds to the quantity or value it represents. This type of graph allows for easy comparison between different categories and is particularly useful for displaying categorical data. It provides a clear and concise summary of the data by showing the distribution and relative frequencies of each category.
35.
In a Statistical data graph, a ____ is a representation of frequency distribution by means of the four-sided figure whose width represents class intervals and whose areas are directly proportional to the corresponding frequencies.
Correct Answer
B. Histogram
Explanation
A histogram is a representation of a frequency distribution in a statistical data graph. It uses rectangles whose widths represent class intervals and whose areas are directly proportional to the corresponding frequencies. This gives a visual representation of the distribution of the data, making it easier to identify patterns and trends. A histogram is commonly used to display continuous data and is particularly useful for analyzing large data sets.
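The counting behind a histogram can be sketched in a few lines. The example reuses the number set from questions 28 to 32 and assumes equal-width class intervals of 5 starting at 20; both choices are illustrative, and with equal widths the bar heights are proportional to the areas.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall into each class interval [low, low + width)."""
    counts = [0] * n_bins
    for v in values:
        index = (v - start) // width
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

data = [20, 24, 25, 36, 25, 22, 23]
counts = histogram_counts(data, start=20, width=5, n_bins=4)

# Text rendering: one row per class interval, one '#' per observation.
for i, count in enumerate(counts):
    low = 20 + 5 * i
    print(f"[{low}, {low + 5}): {'#' * count}")
```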
36.
Which of the following pieces of information is not given by a box plot?
Correct Answer
A. Mode
Explanation
The mode is not given from a box plot. A box plot displays the minimum, first quartile, median, third quartile, and maximum values of a dataset. The mode, however, represents the most frequently occurring value in the dataset and is not represented in a box plot. Therefore, the mode is not given from a box plot.
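The five values a box plot displays can be computed as below. This sketch reuses the number set from questions 28 to 32 and assumes the "inclusive" quartile method of Python's `statistics.quantiles`; other quartile conventions can give slightly different Q1 and Q3 values.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

data = [20, 24, 25, 36, 25, 22, 23]
minimum, q1, median, q3, maximum = five_number_summary(data)
print(minimum, q1, median, q3, maximum)
```

Note that the mode does not appear anywhere in this summary, which is why a box plot cannot show it.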
37.
Color and shape can be used to add dimensions to graph data.
Correct Answer
A. True
Explanation
Color and shape can be used to add dimensions to graph data. By assigning different colors and shapes to different data points, additional information can be conveyed in the graph. For example, different colors can represent different categories or groups, while different shapes can represent different variables or conditions. This helps to visually differentiate and distinguish the data points, making it easier for the viewer to interpret and analyze the graph. Therefore, the statement "Color and shape can be used to add dimensions to graph data" is true.
38.
Which of the following dimension types of graph is related to the table below?
Bar Plot
Box plot
Density Plot
Histogram
Correct Answer
B. Two-dimensional
Explanation
The correct answer is two-dimensional. This is because a two-dimensional graph, such as a bar plot, box plot, density plot, or histogram, is commonly used to represent data from a table. These types of graphs allow for the visualization of data in two dimensions, typically with one variable on the x-axis and another variable on the y-axis.
39.
Point out the wrong statement:
Correct Answer
A. Plots are created with multiple functions only
Explanation
Statement A is the wrong statement. Plots can be created with either a single function call or multiple function calls. A single call is enough when the necessary arguments and data are passed to the plotting function. Additional function calls may be used to add elements or customize the plot further, but they are not required to create a plot. The accurate version of the statement is "Plots are created with both single and multiple function calls."
40.
The most heavily used summarization visualization is the ______, which measures the correlation between every pair of values in a dataset and plots a result in color.
Correct Answer
C. Correlation Plot
Explanation
A correlation plot is a type of visualization that measures the correlation between every pair of values in a dataset and represents it using colors. This plot helps in understanding the relationship between variables and identifying patterns or trends in the data. It is widely used for summarizing and analyzing large datasets to gain insights into the strength and direction of the relationships between variables.
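The numbers behind a correlation plot form a matrix of pairwise correlations, which the plot then maps to colors. The sketch below computes Pearson correlations in plain Python; the three columns are made-up data, and `pearson` and `correlation_matrix` are names chosen for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Correlation between every pair of columns, as a nested dict."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical dataset with three numeric columns.
columns = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # rises exactly with x
    "z": [5, 3, 4, 1, 2],    # tends to fall as x rises
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

A plotting library would then color each cell by its value, typically on a scale from -1 to 1.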