# Machine Learning with Python Quiz Questions and Answers

### Identify what type of problem this is. You have the past data of two cricket teams on the performance of the teams based on different parameters and the match results. You have to predict which team will win.

• Supervised Learning

Explanation :

You have the data of the past few years to train your model on. Since you know the results of different games based on different performance parameters, it would be a supervised learning problem — more specifically, a classification problem since your output variable (i.e. the name of the team) is categorical.

### In the simple linear regression model between TV and sales, the accuracy, or the 'model fit', as measured by R-squared was about 0.81. But, when you brought in the radio and the newspaper variables along with TV, the R-squared increased to 0.91 and 0.83, respectively. Do you think the R-squared value will always increase (or at least remain the same) when you add more variables?

• Yes

Explanation :

The R-squared (computed on the data used to fit the model) will always either increase or remain the same when you add more variables. The model already has the predictive power of the previous variables, so the R-squared value cannot go down. A new variable, no matter how insignificant, cannot decrease the R-squared.
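A minimal sketch of this behaviour, assuming made-up data: the names `tv`, `noise_feature` and `sales` and all the numbers are invented for illustration and are not the data set from the question. It fits ordinary least squares with an intercept through the normal equations, once with one predictor and once with an extra predictor that has nothing to do with the target.

```python
import random

def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(columns, y):
    """Fit OLS with an intercept on the predictor columns; return R-squared."""
    n = len(y)
    # Design matrix: a leading 1.0 for the intercept, then one value per predictor.
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    preds = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = sum(y) / n
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Synthetic data (assumption): sales depends on tv only.
random.seed(0)
n = 50
tv = [random.uniform(0, 100) for _ in range(n)]
noise_feature = [random.gauss(0, 1) for _ in range(n)]  # unrelated to sales
sales = [5 + 0.1 * t + random.gauss(0, 2) for t in tv]

r2_one = r_squared([tv], sales)
r2_two = r_squared([tv, noise_feature], sales)
print(r2_one, r2_two)
```

Even though `noise_feature` carries no information about `sales`, the second R-squared is not lower than the first. This is why adjusted R-squared, which penalises extra variables, is often reported alongside it.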

### A Singapore-based startup Healin launched an app called JustShakeIt that enables a user to send an emergency alert to emergency contacts and/or caregivers simply by shaking the phone with one hand. The program uses a machine learning algorithm to distinguish between actual emergency shakes and everyday jostling, using data with labels to distinguish between everyday jostling and emergency shaking. What kind of problem is this?

• Classification

Explanation :

The algorithm has to distinguish between actual emergency shakes and everyday jostling. Here, your output variable has pre-defined labels (shake/jostle), which are categorical in nature. So, this is a supervised learning problem, specifically a classification problem.

### The coefficients of the least squares regression line are determined by the Ordinary Least Squares method — which basically means minimising the sum of the squares of the:

• y-coordinates of actual data - y-coordinates of predicted data

Explanation :

The Ordinary Least Squares method has the criterion of the minimisation of the sum of squares of residuals. Residuals are defined as the difference between the y-coordinates of actual data and the y-coordinates of predicted data.
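As a rough illustration, the sketch below computes the least squares line from the closed-form formulas for simple linear regression and then checks that nudging the intercept or slope only makes the residual sum of squares larger. The data points and the helper names `fit_line` and `rss` are made up for this example.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least squares line for y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def rss(x, y, intercept, slope):
    """Residual sum of squares: sum of (actual y - predicted y) squared."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Toy data (assumption), roughly y = 2x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
best_rss = rss(x, y, b0, b1)

# Any other line gives a larger residual sum of squares.
perturbed = [rss(x, y, b0 + d0, b1 + d1)
             for d0, d1 in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]]
print(b0, b1, best_rss, perturbed)
```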

### In order to determine whether the coefficient in a simple linear regression model is significant or not, which Null Hypothesis do we propose?

• β1 = 0

Explanation :

The Null Hypothesis is set up this way because rejecting it lets you conclude that β1 is not zero, i.e. the coefficient is significant. If you fail to reject the Null Hypothesis, the coefficient is deemed insignificant.
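One way to see the test in action is to compute the t-statistic for the slope by hand, t = b1 / SE(b1), and compare it with a critical value. The sketch below assumes toy data with ten points, so the test has n - 2 = 8 degrees of freedom; the value 2.306 is the standard two-sided 5% critical value of Student's t for 8 degrees of freedom. The function name `slope_t_statistic` is hypothetical.

```python
import math

def slope_t_statistic(x, y):
    """Return (slope, standard error of slope, t-statistic) for H0: slope = 0."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    residual_ss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (n - 2)   # estimate of the error variance
    se_b1 = math.sqrt(sigma2 / sxx)  # standard error of the slope
    return b1, se_b1, b1 / se_b1

# Toy data (assumption): ten points with a clear upward trend
x = list(range(1, 11))
y = [2.3, 4.1, 6.2, 7.9, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]

b1, se_b1, t_stat = slope_t_statistic(x, y)

T_CRIT = 2.306  # two-sided 5% critical value, Student's t, 8 degrees of freedom
reject_null = abs(t_stat) > T_CRIT
print(b1, se_b1, t_stat, reject_null)
```

Here the t-statistic is far beyond the critical value, so the Null Hypothesis β1 = 0 is rejected. In practice, a library such as statsmodels reports the same statistic together with its p-value in the model summary.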

### Which of the following is indicative of a strong relationship between X and y?

• The correlation coefficient between X and y is 0.95

Explanation :

The correlation coefficient specifies how strong the linear relationship between two variables is. In this case, the value is 0.95, which is close to 1 and indicates a strong relationship between X and y.
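For reference, here is a small sketch of how the Pearson correlation coefficient is computed. The data and the helper name `pearson_r` are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Toy data (assumption): y rises almost linearly with x
x = [1, 2, 3, 4, 5, 6]
y = [52, 55, 61, 64, 70, 71]

r = pearson_r(x, y)
r_flipped = pearson_r(x, [-v for v in y])  # same strength, opposite direction
print(r, r_flipped)
```

The coefficient always lies between -1 and 1. The sign gives the direction of the relationship and the magnitude gives its strength, so a value near -1 is just as strong as a value near 1.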

### What will be the effect of the error terms not being homoscedastic in nature?

• The inferences made on the model would be unreliable.

Explanation :

You can still fit a line through the data, but the inferences you draw from the model become unreliable. The quantities used to make those inferences, such as standard errors, t-statistics and p-values (which you will study in later segments), can no longer be trusted.

### Why do you add a constant to the train set using the sm.add_constant() command, when you’re fitting a line using statsmodels?

• statsmodels fits a line passing through the origin by default.

Explanation :

By default, statsmodels fits a line passing through the origin, i.e. it doesn't fit an intercept. Hence, you need to use sm.add_constant(), which adds a column of ones to the data, so that it also fits an intercept.
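The effect of the missing intercept can be sketched without statsmodels itself. The plain-Python example below, using made-up data, compares a fit forced through the origin with a fit that includes an intercept. In statsmodels terms, the first corresponds to `sm.OLS(y, X)` and the second to `sm.OLS(y, sm.add_constant(X))`. The helper names are hypothetical.

```python
def fit_through_origin(x, y):
    """Least squares slope for y = slope * x (no intercept term)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def fit_with_intercept(x, y):
    """Least squares (intercept, slope) for y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

# Toy data (assumption), roughly y = 10 + x
x = [1, 2, 3, 4, 5]
y = [11.0, 12.1, 12.9, 14.2, 15.0]

slope_origin = fit_through_origin(x, y)
intercept, slope = fit_with_intercept(x, y)
print(slope_origin, intercept, slope)
```

Without an intercept, the slope has to absorb the offset of about 10 and ends up far from the true rate of change. With the constant column included, the fit recovers an intercept near 10 and a slope near 1.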

### Identify what type of problem this is. You feed a large collection of spam emails to the learning model to identify the different sub-groups of these spam mails. No labels are present in the data set.

• Unsupervised Learning

Explanation :

This can be addressed using unsupervised learning because there are no labels assigned to your data set, so the model has to identify the sub-groups on its own.