1.
For Pearson's correlation: when X increases, Y increases, but when X decreases, the behavior of Y is unknown. Which of the values below should Pearson's r be close to?
Correct Answer
C. R=0
Explanation
Pearson's correlation coefficient (r) measures the strength and direction of the linear relationship between two variables. Here Y rises when X rises, which on its own suggests a positive relationship, but nothing is known about how Y behaves when X falls. Because the relationship is not consistent across the whole range of X, no overall linear relationship can be established, so the expected answer is an r close to 0, indicating a weak or no linear relationship between the variables.
2.
Suppose all salaries in a company are normally distributed, with a mean of $70,000 and a standard deviation of $10,000. If all salaries are doubled, what are the new mean and standard deviation?
Correct Answer
A. Mean = $140,000 and std = $20,000
Explanation
If all salaries in a company are doubled, it will affect both the mean and the standard deviation. Let's calculate the new mean and standard deviation:
Original Mean (μ) = $70,000
Original Standard Deviation (σ) = $10,000
When all salaries are doubled, the new mean (μ') will be:
New Mean (μ') = 2 * Original Mean
New Mean (μ') = 2 * $70,000 = $140,000
The new standard deviation (σ') will also be affected. When salaries are multiplied by a constant (in this case, 2), the standard deviation is also multiplied by that constant. So:
New Standard Deviation (σ') = 2 * Original Standard Deviation
New Standard Deviation (σ') = 2 * $10,000 = $20,000
So, after doubling all the salaries, the new mean is $140,000, and the new standard deviation is $20,000.
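The scaling rule can be checked numerically. The sketch below uses a small hypothetical list of salaries (not data from the question) only to show that doubling every value doubles both the mean and the standard deviation.

```python
import statistics

# Hypothetical salaries, chosen only for illustration.
salaries = [55_000, 62_000, 70_000, 78_000, 85_000]
doubled = [2 * s for s in salaries]

# Ratio of the new statistic to the old one.
mean_ratio = statistics.mean(doubled) / statistics.mean(salaries)
std_ratio = statistics.pstdev(doubled) / statistics.pstdev(salaries)

print(mean_ratio, std_ratio)  # both ratios are 2, up to floating-point rounding
```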
3.
Consider a ball that is kicked a mean distance of 10 feet to the right, with a standard deviation of 1 foot. It is then kicked back in the opposite direction, toward where it started, by a mean of 5 feet with a standard deviation of 0.5 feet. What are the mean and standard deviation of the resulting Gaussian distribution of the distance?
Correct Answer
D. Mean = 5, std= 1.118
Explanation
The mean of the new Gaussian distribution is 10 - 5 = 5 feet, because the ball travels 10 feet out and is then kicked 5 feet back toward where it started. The standard deviations do not add directly. Assuming the two kicks are independent, their variances add, even though the second kick is subtracted. The new variance is 1^2 + 0.5^2 = 1.25, so the standard deviation is the square root of 1.25, which is about 1.118.
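The arithmetic can be reproduced in a few lines. This is a minimal sketch that assumes the two kicks are independent Gaussians; the variable names are illustrative.

```python
import math

mean_out, std_out = 10.0, 1.0    # first kick, away from the start
mean_back, std_back = 5.0, 0.5   # second kick, back toward the start

# Means subtract; variances add (independence is assumed).
new_mean = mean_out - mean_back
new_std = math.sqrt(std_out**2 + std_back**2)

print(new_mean, round(new_std, 3))  # 5.0 1.118
```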
4.
Covariance indicates the strength of the linear relationship between variables.
Correct Answer
B. False
Explanation
Covariance measures the extent to which two variables vary together. Its sign (positive, negative, or zero) indicates the direction of the linear relationship, but its magnitude depends on the units and scale of the variables, so it does not by itself indicate the strength of the relationship. To measure the strength of the linear relationship between variables, use the correlation coefficient, which divides the covariance by the product of the two standard deviations.
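A small illustration of this scale dependence, as a sketch: the data and the helper functions `covariance` and `correlation` below are hypothetical and written out by hand. Rescaling one variable by 100 multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import math

def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]                # hypothetical data
y_scaled = [100 * v for v in y]    # same data expressed in different units

print(covariance(x, y), covariance(x, y_scaled))    # covariance grows 100-fold
print(correlation(x, y), correlation(x, y_scaled))  # correlation is unchanged
```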
5.
Correlation measures both the strength and direction of the non-linear relationship between two variables.
Correct Answer
B. False
Explanation
Correlation measures the strength and direction of the linear relationship between two variables, not the non-linear relationship.
6.
IQ is distributed with a mean of 100 and a variance of 225. What is the standard score for an IQ of 130?
Correct Answer
A. 2
Explanation
The standard score, also known as the z-score, measures how many standard deviations an individual's IQ score is above or below the mean. To calculate the z-score, we subtract the mean from the IQ score and divide it by the standard deviation. In this case, the standard deviation is the square root of the variance, which is 15. Therefore, the z-score for an IQ of 130 would be (130-100)/15 = 2.
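The same calculation as a short sketch, with illustrative variable names:

```python
import math

mean, variance, score = 100, 225, 130

std = math.sqrt(variance)   # standard deviation is the square root of the variance
z = (score - mean) / std    # number of standard deviations above the mean

print(std, z)  # 15.0 2.0
```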
7.
Is the below relationship Linear and Exact?
Correct Answer
B. False
Explanation
The relationship shown is not both linear and exact. In a linear relationship, there is a constant rate of change between the variables, while an exact relationship has no error or uncertainty around that line. An answer of False therefore indicates that the relationship is non-linear, or that some degree of error or uncertainty is present.
8.
Given a Summer/Winter classification problem:
Winter is 165 days and Summer is 200 days. The temperature is uniformly distributed between 5 and 25 degrees in Winter and between 22 and 24 degrees in Summer. What is the classification of a day whose temperature is 23 degrees?
Correct Answer
A. Summer
Explanation
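To classify a day with a temperature of 23 degrees, we need to consider both how common each season is and how likely each season is to produce that temperature. Note that 23 degrees lies inside both temperature ranges, so the ranges alone do not decide the answer.
In Winter, the temperature is uniform over 5 to 25 degrees, so the density at 23 degrees is 1/20 = 0.05, and the prior probability of Winter is 165/365.
In Summer, the temperature is uniform over 22 to 24 degrees, so the density at 23 degrees is 1/2 = 0.5, and the prior probability of Summer is 200/365.
Multiplying prior by likelihood gives (165/365) * 0.05 ≈ 0.023 for Winter and (200/365) * 0.5 ≈ 0.274 for Summer. The Summer value is about 12 times larger, so the classification of the day with a temperature of 23 degrees is "Summer."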
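One way to make this classification explicit is a small Bayes calculation: weight each season's prior (its share of the year) by the uniform density at 23 degrees. The sketch below is an illustration under that assumption; `uniform_density` is a hypothetical helper, not something from the quiz.

```python
def uniform_density(x, low, high):
    """Density of a uniform distribution on [low, high], evaluated at x."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

temperature = 23
days = {"Winter": 165, "Summer": 200}
ranges = {"Winter": (5, 25), "Summer": (22, 24)}
total_days = sum(days.values())

# Unnormalized posterior for each season: prior * likelihood.
scores = {
    season: (days[season] / total_days) * uniform_density(temperature, *ranges[season])
    for season in days
}
evidence = sum(scores.values())
posteriors = {season: s / evidence for season, s in scores.items()}
prediction = max(posteriors, key=posteriors.get)

print(prediction, round(posteriors["Summer"], 3))  # Summer 0.924
```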
9.
You are given a revolver with six slots. There are two adjacent bullets. You have to shoot twice and are given the chance to rotate the cylinder randomly in-between. How do you maximize your chance of survival?
Correct Answer
B. Do not rotate cylinder
Explanation
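Two of the six chambers are loaded, so the first shot fires a bullet with probability 2/6, and you survive it with probability 4/6 whichever strategy you choose. The choice only matters for the second shot, given that the first chamber was empty. If you rotate the cylinder, the second chamber is random again and you survive with probability 4/6 = 2/3. If you do not rotate, the cylinder simply advances to the next chamber. Because the two bullets are adjacent, only one of the four empty chambers is immediately followed by a bullet, so you survive with probability 3/4. Since 3/4 is greater than 2/3, not rotating gives the maximum chance of survival: overall 4/6 * 3/4 = 1/2, versus 4/6 * 4/6 = 4/9 with rotation.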
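The two strategies can be compared by enumerating chamber positions. This sketch assumes the cylinder advances by exactly one chamber after each shot when it is not spun; the chamber labels are arbitrary.

```python
from fractions import Fraction

SLOTS = 6
bullets = {0, 1}  # two adjacent loaded chambers

# Starting chambers for which the first shot is empty.
safe_first = [s for s in range(SLOTS) if s not in bullets]

# No spin: the next chamber in sequence comes up (assumption stated above).
survive_no_spin = Fraction(
    sum((s + 1) % SLOTS not in bullets for s in safe_first), len(safe_first)
)

# Spin: the second chamber is uniformly random again.
survive_spin = Fraction(SLOTS - len(bullets), SLOTS)

print(survive_no_spin, survive_spin)  # 3/4 2/3
```

Both values are conditional on surviving the first shot, which has the same probability under either strategy.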
10.
Set S consists of the numbers 4, 10, 12, 7, 19, 10, 5, and x. For what value of x will the mode, the median, and the mean all be equal?
Correct Answer
A. 13
Explanation
To find the value of x that makes the mode, median, and mean all equal, start with the known values. Set S has 8 numbers including x. Among the 7 known values, 10 appears twice and every other value appears once, so the mode is 10 (provided x does not create a competing mode). For the mean to equal 10, the 8 numbers must sum to 8 * 10 = 80. The known values sum to 4 + 10 + 12 + 7 + 19 + 10 + 5 = 67, so x = 80 - 67 = 13. Checking the median: in ascending order the set is 4, 5, 7, 10, 10, 12, 13, 19, and the two middle values are both 10, so the median is 10. Therefore, with x = 13, the mode, median, and mean all equal 10.
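The value x = 13 can be verified with the standard library. This is a minimal check; the variable names are illustrative.

```python
import statistics

known = [4, 10, 12, 7, 19, 10, 5]
target = 10  # the mode of the known values

# The mean of all 8 values must equal target: (sum(known) + x) / 8 = target.
x = target * (len(known) + 1) - sum(known)

full_set = sorted(known + [x])
mode = statistics.mode(full_set)
median = statistics.median(full_set)
mean = statistics.mean(full_set)

print(x, mode, median, mean)  # x is 13; mode, median, and mean are all 10
```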
11.
If the variance of a dataset is 50 and all data points are increased by 100% then what will be the variance?
Correct Answer
C. 200
Explanation
When all data points in a dataset are increased by 100%, each data point is doubled. Doubling every value doubles each deviation from the mean, so the standard deviation doubles. Variance is the average of the squared deviations, so it is multiplied by the square of the constant: 2^2 = 4. Therefore, the new variance is 4 * 50 = 200, which is four times the original variance of 50.
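A quick numerical check of the 50 to 200 result, using a hypothetical dataset chosen so that its population variance is exactly 50:

```python
import statistics

data = [0, 10, 10, 20]            # hypothetical data with population variance 50
doubled = [2 * v for v in data]   # every value increased by 100%

var_before = statistics.pvariance(data)
var_after = statistics.pvariance(doubled)

print(var_before, var_after)  # the variance goes from 50 to 200
```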
12.
Suppose you have a dataset with n observations and mean m. What will the new mean be if you add 5 to each data point?
Correct Answer
B. M + 5
Explanation
Adding 5 to each data point will increase the value of each observation by 5. Since the mean is calculated by summing up all the observations and dividing by the number of observations, adding 5 to each observation will increase the sum of all the observations by 5 multiplied by the number of observations. Dividing this new sum by the number of observations will give us the new mean, which is m + 5.
13.
Given the following distribution
Which of the following statements is true?
Correct Answer
C. Mode < Median < Mean
Explanation
The mode is the value that appears most frequently in a distribution. The median is the middle value when the data is arranged in ascending or descending order. The mean is the average of all the values in the distribution. In this case, the mode is less than the median and the median is less than the mean. Therefore, the correct statement is "Mode < Median < Mean."
14.
What is the number of observations in a dataset with variance 5 if the sum of squared distances from the mean is 20?
Correct Answer
A. 4
Explanation
The sum of squared distances from the mean is the numerator of the variance. In this case, the variance is given as 5 and the sum of squared distances is given as 20. The formula for the (population) variance is the sum of squared distances divided by the number of observations. So, if we let the number of observations be x, we can set up the equation 20/x = 5. Solving for x, we find that x = 4. Therefore, the number of observations in the dataset is 4.
15.
Rank the correlation coefficients below from lowest to highest.
Correct Answer
A. B > A > C > D
Explanation
The given answer is B > A > C > D. This means that the correlation coefficient for B is the highest, followed by A, then C, and finally D. The ranking is based on the strength of the correlation between the variables being compared. B has the strongest correlation, A has a weaker correlation than B but stronger than C, and C has a weaker correlation than A but stronger than D. D has the lowest correlation coefficient among all the options.
16.
The probability of rain on any given day is 0.2. What is the probability of having rain on at least 2 days during the week?
Correct Answer
C. 0.42
Explanation
The number of rainy days in a 7-day week follows a binomial distribution with n = 7 and p = 0.2, assuming the days are independent. The probability of rain on exactly k days is C(7, k) * (0.2)^k * (0.8)^(7-k), where C(7, k) counts the ways to choose which k days are rainy. The probability of rain on at least 2 days is the sum of these terms for k = 2 through 7, or more simply the complement of having 0 or 1 rainy days: 1 - (0.8)^7 - 7 * (0.2) * (0.8)^6 = 1 - 0.2097 - 0.3670 ≈ 0.42.
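The result can be reproduced with a short binomial calculation. This sketch assumes a 7-day week with independent days; `binom_pmf` is an illustrative helper.

```python
from math import comb

p, n = 0.2, 7  # daily rain probability, days in the week

def binom_pmf(k):
    """Probability of rain on exactly k of the n days."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Complement of "0 or 1 rainy days".
p_at_least_two = 1 - binom_pmf(0) - binom_pmf(1)

# Direct sum over k = 2..7, as a cross-check.
p_direct = sum(binom_pmf(k) for k in range(2, n + 1))

print(round(p_at_least_two, 4))  # 0.4233
```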
17.
Which statistical measurement is affected by outliers the most?
Correct Answer
B. Mean
Explanation
The mean, or average, is calculated by summing all values in a dataset and dividing by the number of values. Outliers, which are extreme values that deviate significantly from the rest of the data, can disproportionately influence the sum, thus pulling the mean towards their extreme value.
18.
You have n numbers that must sum to 10. How many degrees of freedom are there?
Correct Answer
C. N - 1
Explanation
When you have n numbers that must sum to 10, there is only one constraint - the sum of the numbers must be 10. This means that you have n-1 degrees of freedom, as you can freely choose the values of n-1 numbers and the value of the nth number will be determined by the constraint of the sum.
19.
Subtracting two Gaussian Distributions results in:
Correct Answer
B. Mean is subtracted and variance is added
Explanation
When subtracting two Gaussian distributions, the mean of the resulting distribution is obtained by subtracting the mean of the second distribution from the mean of the first, because expectation is linear. The variance of the resulting distribution, on the other hand, is obtained by adding the variances of the two distributions: for independent random variables, Var(X - Y) = Var(X) + Var(Y), since subtracting a random quantity adds uncertainty rather than removing it. Therefore, the correct answer is that the mean is subtracted and the variance is added.
20.
A bag contains 8 red marbles, 4 blue marbles, and 5 green marbles. If you randomly draw 2 marbles from the bag without replacement, what is the probability that both marbles are red?
Correct Answer
D. 7/34
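The answer follows from multiplying the probability that the first marble is red (8/17) by the probability that the second is red given the first was (7/16). A minimal sketch of that arithmetic with exact fractions:

```python
from fractions import Fraction

red, blue, green = 8, 4, 5
total = red + blue + green  # 17 marbles in the bag

# Without replacement: one red marble is gone before the second draw.
p_both_red = Fraction(red, total) * Fraction(red - 1, total - 1)

print(p_both_red)  # 7/34
```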