# Data Science and Machine Learning: Mathematical and Statistical Methods Quiz Questions and Answers

### {1,2,3,3,4,5,5,5,6,6,7} What is the 3rd quartile of this set?

• 6

Explanation :

The first step is to find the median, which is 5, since it is the middle (6th) number of the 11-number set. To find the 3rd quartile, find the middle number of the values above the median. For this set those values are {5,5,6,6,7}, whose middle number is 6, so the 3rd quartile is 6.
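The median-of-halves method described above can be sketched in Python as follows. The function name `third_quartile` is hypothetical, and the sketch assumes the convention used in this answer: for an odd count, the overall median is excluded from the upper half.

```python
from statistics import median

def third_quartile(values):
    """Q3 as the median of the values above the overall median (median-of-halves).

    For an odd count the middle value is skipped; for an even count the top half is used.
    """
    data = sorted(values)
    n = len(data)
    upper = data[(n + 1) // 2:]  # n = 11 -> data[6:] == [5, 5, 6, 6, 7]
    return median(upper)

q3 = third_quartile([1, 2, 3, 3, 4, 5, 5, 5, 6, 6, 7])
print(q3)  # 6
```

Quartile conventions differ between tools: for example, NumPy's `np.percentile(data, 75)` with its default linear interpolation returns 5.5 for this set, so the quiz answer of 6 assumes the median-of-halves method.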

### Which of the following options is/are true for K-fold cross-validation? 1. An increase in K will increase the time required to cross-validate the result. 2. Higher values of K will result in higher confidence in the cross-validation result as compared

• 1,2 and 3

Explanation :

A larger k means less bias towards overestimating the true expected error (as the training folds will be closer in size to the total dataset) and a higher running time (as you are getting closer to the limiting case: Leave-One-Out CV). We also need to consider the var
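The two effects of a larger k (more model fits, and training folds closer to the full dataset) can be made concrete with a minimal pure-Python sketch of contiguous, unshuffled k-fold splitting. The helpers `k_fold_indices` and `summarize` are hypothetical names for illustration, not a library API; the sample count of 100 is an arbitrary assumption.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous folds (no shuffling)."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

def summarize(n_samples, k):
    """Return (number of model fits, average training-fold size)."""
    folds = list(k_fold_indices(n_samples, k))
    n_fits = len(folds)  # one model is trained per fold, so cost grows with k
    avg_train = sum(len(train) for train, _ in folds) / n_fits
    return n_fits, avg_train

for k in (5, 10, 100):  # k = n_samples is Leave-One-Out CV
    print(k, summarize(100, k))
```

With 100 samples, k = 5 trains 5 models on 80 samples each, while k = 100 (Leave-One-Out) trains 100 models on 99 samples each: more computation, but training sets that better resemble the full dataset.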

### For which of the following hyperparameters is a higher value better for the decision tree algorithm? 1. Number of samples used for split 2. Depth of tree 3. Samples for leaf

• Can’t say

Explanation :

For all three options (1, 2 and 3), increasing the value of the parameter does not necessarily improve performance. For example, if the depth of the tree is very high, the resulting tree may overfit the data and would not generalize well.
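To see why neither direction is uniformly better, here is a minimal sketch of a greedy one-dimensional regression tree in pure Python. It is a hypothetical illustration, not scikit-learn's implementation, although the parameter names `max_depth` and `min_samples_leaf` mirror the ones scikit-learn uses. The synthetic data (a noisy sine curve) is an assumption. Only training error is measured: an unlimited depth drives it to zero by memorizing the noise (a sign of overfitting), while a very large `min_samples_leaf` collapses the tree to one constant prediction (underfitting).

```python
import math
import random

def sse(ys):
    """Sum of squared errors around the mean (the regression-tree impurity)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def build_tree(points, max_depth=None, min_samples_leaf=1, depth=0):
    """Greedy tree on (x, y) points sorted by x.

    Returns a float (leaf prediction) or a tuple (threshold, left, right).
    """
    ys = [y for _, y in points]
    leaf = sum(ys) / len(ys)
    if (max_depth is not None and depth >= max_depth) or len(points) < 2 * min_samples_leaf:
        return leaf
    best_cost, best_i = sse(ys), None
    # i is the size of the left child; both children keep >= min_samples_leaf points.
    for i in range(min_samples_leaf, len(points) - min_samples_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # no threshold fits between equal x values
        cost = sse(ys[:i]) + sse(ys[i:])
        if cost < best_cost:
            best_cost, best_i = cost, i
    if best_i is None:
        return leaf  # no split reduces the error
    threshold = (points[best_i - 1][0] + points[best_i][0]) / 2
    left = build_tree(points[:best_i], max_depth, min_samples_leaf, depth + 1)
    right = build_tree(points[best_i:], max_depth, min_samples_leaf, depth + 1)
    return (threshold, left, right)

def predict(tree, x):
    """Walk from the root to a leaf."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree

def mse(tree, points):
    """Mean squared error of the tree on the given points."""
    return sum((predict(tree, x) - y) ** 2 for x, y in points) / len(points)

random.seed(0)
xs = [random.uniform(0, 6) for _ in range(40)]
data = sorted((x, math.sin(x) + random.gauss(0, 0.3)) for x in xs)

deep = build_tree(data)                         # no depth limit: fits every noisy point
shallow = build_tree(data, max_depth=3)         # at most 8 leaves
coarse = build_tree(data, min_samples_leaf=40)  # a leaf must hold all 40 points: no split

for name, tree in [("deep", deep), ("shallow", shallow), ("coarse", coarse)]:
    print(name, round(mse(tree, data), 4))
```

A zero training error for `deep` says nothing good about unseen data; a held-out set (or cross-validation) is what tells you which setting generalizes best.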

### Which of these is not a tool to describe variation in product units?

• Acceptance Sampling

Explanation :

The box plot, the histogram, and the stem-and-leaf plot are all used to illustrate variation among product units. Acceptance sampling, by contrast, is a procedure for deciding whether to accept or reject a lot based on an inspected sample; it is not a tool for describing variation.

### Skewness of Normal distribution is ___________.

• 0

Explanation :

Since the normal curve is symmetric about its mean, its skewness is zero. Skewness is the standardized third central moment, and for a symmetric distribution the contributions of deviations above and below the mean cancel exactly.
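A short derivation, using the standard definition of skewness as the standardized third central moment and the substitution $Z = (X - \mu)/\sigma \sim N(0, 1)$:

$$
\gamma_1 = \mathbb{E}\!\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right]
= \mathbb{E}\left[Z^{3}\right]
= \int_{-\infty}^{\infty} z^{3}\,\varphi(z)\,dz = 0,
\qquad \varphi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2}.
$$

The integral vanishes because $\varphi(z)$ is even, so $z^{3}\varphi(z)$ is an odd, integrable function and its positive and negative halves cancel.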

### Given the following set of data, what is twice the interquartile range? 25,32,49,21,37,43,27,45,31

• 36

Explanation :

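Order the data from least to greatest: 21, 25, 27, 31, 32, 37, 43, 45, 49. The median is the middle (5th) value, 32. The lower half {21, 25, 27, 31} has median (25 + 27)/2 = 26, and the upper half {37, 43, 45, 49} has median (43 + 45)/2 = 44. The IQR is the difference between the upper and lower medians, 44 - 26 = 18, so twice the IQR is 36.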
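The same steps in a minimal Python sketch, assuming the median-of-halves convention in which the overall median is excluded from both halves when the count is odd. The function name `iqr_median_of_halves` is hypothetical.

```python
from statistics import median

def iqr_median_of_halves(values):
    """Interquartile range as (median of upper half) - (median of lower half)."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]         # values below the median
    upper = data[(n + 1) // 2:]    # values above the median (median skipped if n is odd)
    return median(upper) - median(lower)

data = [25, 32, 49, 21, 37, 43, 27, 45, 31]
iqr = iqr_median_of_halves(data)
print(iqr, 2 * iqr)  # 18.0 36.0
```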

### Normal Distribution is applied for ___________.

• Continuous Random Distribution

Explanation :

The normal distribution is a continuous probability distribution: its density is defined over the whole real line, so it models continuous random variables, and probabilities are computed as areas under the curve over intervals rather than for individual values.
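What "continuous" means in practice can be sketched with Python's standard-library `statistics.NormalDist`. The standard normal and the narrow normal with `sigma=0.1` are arbitrary choices for illustration. The sketch shows that probabilities come from intervals via the CDF, that the probability of a shrinking interval around a single point goes to zero, and that density values are not probabilities (they can exceed 1).

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=1.0)  # standard normal, chosen for illustration

# Probabilities are areas under the density over an interval: P(a < X < b) = F(b) - F(a).
p_within_one_sigma = X.cdf(1.0) - X.cdf(-1.0)  # about 0.6827

# Shrinking the interval around x = 1 drives the probability toward 0:
# a continuous variable assigns zero probability to any single value.
shrinking = [X.cdf(1.0 + h) - X.cdf(1.0 - h) for h in (0.1, 0.01, 0.001)]

# The density itself is not a probability: a narrow normal has pdf values above 1.
narrow_peak = NormalDist(mu=0.0, sigma=0.1).pdf(0.0)  # about 3.99

print(round(p_within_one_sigma, 4), [round(p, 5) for p in shrinking], round(narrow_peak, 2))
```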

### Which of the following hyper parameter(s), when increased may cause random forest to over fit the data? 1. Number of Trees 2. Depth of Tree 3. Learning Rate

• 2 (Depth of Tree)

Explanation :

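Usually, increasing the depth of the trees causes overfitting. Learning rate is not a hyperparameter of random forest. Increasing the number of trees does not cause overfitting (or underfitting): averaging more trees reduces the variance of the combined prediction, and performance typically levels off as trees are added, at the cost of more computation.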
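One way to see why adding trees does not overfit is the standard variance formula for the average of $B$ identically distributed predictions, each with variance $\sigma^2$ and pairwise correlation $\rho$: $\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$. The sketch below treats the trees of a forest in this idealized way; the values $\sigma^2 = 1$ and $\rho = 0.3$ are illustrative assumptions, not measurements from any real forest, and `averaged_variance` is a hypothetical helper.

```python
def averaged_variance(sigma2, rho, n_trees):
    """Variance of the mean of n_trees identically distributed predictions,
    each with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_trees

sigma2, rho = 1.0, 0.3  # illustrative values only
tree_counts = [1, 10, 100, 1000]
variances = [averaged_variance(sigma2, rho, b) for b in tree_counts]

for b, v in zip(tree_counts, variances):
    print(f"{b:>5} trees -> variance {v:.4f}")
```

The variance never increases with more trees; it falls toward the floor $\rho\sigma^2$, which is why extra trees mainly cost computation rather than accuracy. Tree depth, by contrast, affects each tree's own fit and can make every tree overfit.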