The Machine Learning Pipeline on AWS Quiz Questions and Answers

A company wants to boost its sales with a Machine Learning-driven solution and has contracted a Machine Learning Specialist to create a recommendation system. The system should predict which products a customer is likely to buy based on that customer’s similarity to other customers’ product preferences. What action should the ML Specialist take?

Answer :
  • Use Apache Spark on Amazon EMR to create a collaborative filtering recommendation engine.
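
To make the idea of collaborative filtering concrete, here is a minimal single-machine sketch in plain Python. It is not the Spark-on-EMR solution itself: it only illustrates scoring products by the preferences of similar customers. The customer names, purchase counts, and the helper names `cosine_similarity` and `recommend` are all hypothetical.

```python
from math import sqrt

# Hypothetical purchase counts: customer -> {product: quantity}
purchases = {
    "alice": {"laptop": 1, "mouse": 2, "keyboard": 1},
    "bob": {"laptop": 1, "mouse": 1, "monitor": 1},
    "carol": {"novel": 3, "bookmark": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse purchase vectors."""
    shared = set(a) & set(b)
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(customer, data, top_n=3):
    """Rank products the customer has not bought, weighted by the
    similarity of the other customers who did buy them."""
    own = data[customer]
    scores = {}
    for other, items in data.items():
        if other == customer:
            continue
        sim = cosine_similarity(own, items)
        if sim <= 0:
            continue  # ignore customers with no overlap
        for product, qty in items.items():
            if product not in own:
                scores[product] = scores.get(product, 0.0) + sim * qty
    # Highest score first; ties broken alphabetically for determinism
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_n]]


print(recommend("alice", purchases))
```

With this toy data, only "bob" overlaps with "alice", so the one product "bob" bought that "alice" did not is the recommendation. At data-lake scale the same principle would be applied with a distributed implementation rather than nested Python loops.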

A Machine Learning Specialist has collected a large training dataset of different cat breeds. The Specialist intends to use it to build a neural network model that can identify the breed of a cat in a given image, and wants to take advantage of a pre-trained model through transfer learning. How should the Specialist re-train the network with the new training data?

Answer :
  • Initialize the network with pre-trained weights in all layers except for the output layer. Initialize the output layer with random weights.
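
The initialization scheme in this answer can be sketched without any deep learning framework. The example below assumes a network's weights are stored as a plain dictionary of nested lists, which is a simplification. The layer names, the shapes, and the function `init_for_transfer` are hypothetical.

```python
import random


def init_for_transfer(pretrained, output_layer, n_features, n_classes, seed=0):
    """Copy pre-trained weights for every layer except the output layer,
    then give the output layer fresh small random weights."""
    rng = random.Random(seed)
    weights = {
        name: [row[:] for row in layer]  # copy so the original is untouched
        for name, layer in pretrained.items()
        if name != output_layer
    }
    # New output layer sized for the new task (one row per class)
    weights[output_layer] = [
        [rng.gauss(0.0, 0.01) for _ in range(n_features)]
        for _ in range(n_classes)
    ]
    return weights


# Hypothetical pre-trained network whose output layer has 3 classes
pretrained = {
    "conv1": [[0.1, 0.2], [0.3, 0.4]],
    "fc1": [[0.5, 0.6], [0.7, 0.8]],
    "output": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
}

# Re-target the network to 5 cat breeds
cat_net = init_for_transfer(pretrained, "output", n_features=2, n_classes=5)
print(len(cat_net["output"]), "output rows for the new task")
```

The earlier layers keep the general-purpose features learned on the original dataset, while the output layer starts from scratch because its classes differ from those of the pre-trained model.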

A Machine Learning Specialist is proposing a model that detects fraudulent transactions. However, the training data is inadequate because the company has recorded only a small number of fraudulent activities. Which is the most effective approach to generate additional fraud cases?

Answer :
  • Apply Synthetic Minority Oversampling Technique (SMOTE) on the fraudulent cases
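
A minimal sketch of the idea behind SMOTE, assuming purely numeric features: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the sample values are illustrative only. A real pipeline would more likely use an established library implementation.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by squared Euclidean distance
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical fraud cases described by two numeric features
fraud_cases = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (0.9, 1.9)]
new_cases = smote(fraud_cases, n_synthetic=10)
print(len(new_cases), "synthetic fraud cases generated")
```

Because every synthetic point lies between two real fraud cases, the new samples stay inside the region already occupied by the minority class instead of simply duplicating existing records.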

A Machine Learning Specialist has an Amazon S3-based data lake that contains gigabyte-scale training data and its associated metadata. The Specialist needs to run ad hoc queries on the metadata to inspect the dataset and wants the solution that requires the least effort. How can the Specialist achieve this goal?

Answer :
  • Search through the metadata using Amazon Athena

A Machine Learning Specialist wishes to monitor an Amazon SageMaker instance deployed in a production environment. The Specialist wants to be able to view different performance metrics, create alarms, and automate remediation responses to changes in the traffic. Which AWS service should the Specialist use?

Answer :
  • Amazon CloudWatch

A Machine Learning Specialist is developing an image classification model in Amazon SageMaker using 25 epochs. During training, the Specialist observes that the validation loss starts to increase from the 15th epoch onwards. This results in poor model performance and in slow, expensive training. Which method can the Specialist use to prevent this issue in the future?

Answer :
  • Enable the “Early Stopping” option
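
The logic behind early stopping can be sketched in a few lines. The example below is not SageMaker's implementation. It assumes a simple patience rule, and the loss values are simulated to mirror the scenario in the question (validation loss improves through epoch 14 and rises from epoch 15). The function name and the `patience` parameter are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Simulated validation loss for 25 epochs: falls for 14 epochs, then rises
val_losses = [1.0 - 0.05 * e for e in range(14)]
val_losses += [0.35 + 0.02 * (e + 1) for e in range(11)]

stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stopped at epoch {stop_epoch}, best epoch was {best_epoch}")
```

With a patience of 3, training halts at epoch 17 instead of running all 25 epochs, and the weights from epoch 14 would be the ones to keep.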

A Machine Learning Specialist has trained an Apache MXNet model using Amazon SageMaker. The Specialist wants to accelerate his inference workloads without having to pay for expensive GPU-based instances. Which is the most cost-effective solution for this problem?

Answer :
  • Use Amazon Elastic Inference

A Machine Learning Specialist is designing an ETL pipeline that will read files in different formats, preprocess them, and write them back into an Amazon S3 bucket. The ML Specialist would like to use Amazon EMR to handle this ETL task. Which framework is the most suitable for this use case?

Answer :
  • Apache Spark

A Machine Learning Specialist is examining two variables from a training dataset. The two variables are graphed in a scatter plot. The graph shows that as the X variable decreases, the Y variable decreases as well. The calculated correlation coefficient is 0.9. What type of correlation does this indicate?

Answer :
  • Positive correlation
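
For reference, the Pearson correlation coefficient can be computed directly from its definition. The sketch below uses made-up data points, and the function name `pearson_r` is illustrative. A coefficient above 0 means the two variables tend to move in the same direction, which is what the question describes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: Y tends to fall when X falls
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
r = pearson_r(xs, ys)
print(f"r = {r:.2f}")  # roughly 0.85, a strong positive correlation
```

A value near +1 indicates a strong positive correlation, a value near -1 a strong negative one, and a value near 0 little or no linear relationship.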

A Machine Learning Specialist has access to a collection of unprocessed data obtained from IoT devices. The Specialist needs to store this data in a centralized and highly available repository as part of an ML Pipeline. How should the Specialist implement the solution?

Answer :
  • Use Amazon S3 to create a data lake for unprocessed data