DP-100T01: Designing and Implementing a Data Science Solution on Azure Quiz Questions and Answers

Answer :
  • Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.

Explanation :

Specify the path. Example: the following code gets the existing workspace and the desired datastore by name, then passes the datastore and file locations to the path parameter to create a new TabularDataset, weather_ds.

    from azureml.core import Workspace, Datastore, Dataset

    datastore_name = 'your datastore name'

    # get existing workspace
    workspace = Workspace.from_config()

    # retrieve an existing datastore in the workspace by name
    datastore = Datastore.get(workspace, datastore_name)

    # create a TabularDataset from 3 file paths in datastore
    datastore_paths = [(datastore, 'weather/2018/11.csv'),
                       (datastore, 'weather/2018/12.csv'),
                       (datastore, 'weather/2019/*.csv')]
    weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)

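The monthly registration described in the answer could be sketched roughly as follows. This is a minimal illustration, not the exam's reference solution: the helper names monthly_paths and register_monthly_version are hypothetical, the 'mm-yyyy' tag format is an assumption, and the SDK call only works with the azureml-core package installed and a real workspace and datastore passed in.

```python
from datetime import date

DATASET_NAME = "sales_dataset"  # dataset name taken from the answer above


def monthly_paths(first, last):
    """Return 'sales/mm-yyyy/sales.csv' paths for every month from first to last."""
    paths = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        paths.append(f"sales/{month:02d}-{year}/sales.csv")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return paths


def register_monthly_version(workspace, datastore, first, last):
    """Register a new dataset version covering first..last, tagged with the last month."""
    # Imported lazily so the path helper can be used without the SDK installed.
    from azureml.core import Dataset

    # Each file is listed explicitly, as the answer requires.
    files = [(datastore, p) for p in monthly_paths(first, last)]
    dataset = Dataset.Tabular.from_delimited_files(path=files)
    return dataset.register(
        workspace=workspace,
        name=DATASET_NAME,
        tags={"month": f"{last.month:02d}-{last.year}"},  # assumed tag format
        create_new_version=True,  # same name, new version each month
    )


# The path helper runs on its own, without a workspace.
print(monthly_paths(date(2019, 11, 1), date(2020, 1, 1)))
```

An experiment could then look up the version whose month tag matches and load it with Dataset.get_by_name(workspace, DATASET_NAME, version=...).
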
Answer :
  • No

Explanation :

Instead, use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.

Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.

Incorrect Answers: Common data tasks for the Scale and Reduce sampling mode include clipping, binning, and normalizing numerical values.

Reference:
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/data-transformation-scale-and-reduce

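To show why SMOTE differs from duplicating rare cases, here is a small standard-library sketch of the core idea: each synthetic point is interpolated between a minority sample and one of its nearest minority neighbours. The function name smote, its parameters, and the sample data are illustrative assumptions; this is not the implementation used by the Studio module.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class samples with two numeric features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Because every new point lies on a line segment between two existing minority samples, the rare class is enlarged with varied cases rather than exact copies.
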
Answer :
  • resource_group
  • subscription_id

Explanation :

To use the same workspace in multiple environments, create a JSON configuration file. The configuration file saves your subscription (subscription_id), resource group (resource_group), and workspace name so that it can be easily loaded. The following sample shows how to create a workspace.

    from azureml.core import Workspace

    ws = Workspace.create(name='myworkspace',
                          subscription_id='< azure-subscription-id >',
                          resource_group='myresourcegroup',
                          create_resource_group=True,
                          location='eastus2')

Reference: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace

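The configuration file itself is plain JSON holding the three values named above. The sketch below writes one with the standard library only; the helper name write_workspace_config and the sample values are hypothetical, and the .azureml/config.json location is assumed to be one of the places Workspace.from_config() searches.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(project_dir, subscription_id, resource_group, workspace_name):
    """Write .azureml/config.json with the three values that identify a workspace."""
    config_path = Path(project_dir) / ".azureml" / "config.json"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path.write_text(json.dumps(config, indent=4))
    return config_path


# Placeholder values; a temporary directory stands in for a project folder.
project_dir = tempfile.mkdtemp()
path = write_workspace_config(
    project_dir, "<azure-subscription-id>", "myresourcegroup", "myworkspace"
)
print(path.read_text())
```

With the SDK installed and real values in the file, calling Workspace.from_config(path=project_dir) should load the same workspace from any environment that has a copy of this file.
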
Answer :
  • Azure Machine Learning Service

Explanation :

The Data Science Virtual Machine (DSVM) is a customized VM image on Microsoft's Azure cloud built specifically for doing data science. Caffe2 and Chainer are supported by DSVM. DSVM integrates with Azure Machine Learning.

Incorrect Answers:
B: Use Machine Learning Studio when you want to experiment with machine learning models quickly and easily, and the built-in machine learning algorithms are sufficient for your solutions.

References: https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview

Answer :
  • Split Rows with the Randomized split parameter set to true

Explanation :

Split Rows: Use this option if you just want to divide the data into two parts. You can specify the percentage of data to put in each split, but by default, the data is divided 50-50.

Incorrect Answers:
B: Regular Expression Split: Choose this option when you want to divide your dataset by testing a single column for a value.
C: Relative Expression Split: Use this option whenever you want to apply a condition to a number column.

Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data
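
The behaviour of Split Rows with a randomized split can be approximated in a few lines of plain Python. This is only a sketch of the idea under assumed names (split_rows, fraction, randomized, seed); it is not the Split Data module's code.

```python
import random


def split_rows(rows, fraction=0.5, randomized=True, seed=0):
    """Divide rows into two parts; the first part gets roughly `fraction` of them."""
    rows = list(rows)
    if randomized:
        # Shuffle first so each part is a random sample rather than a contiguous block.
        random.Random(seed).shuffle(rows)
    cut = round(len(rows) * fraction)
    return rows[:cut], rows[cut:]


# Ten hypothetical rows, 70% in the first output and 30% in the second.
first, second = split_rows(range(10), fraction=0.7)
print(len(first), len(second))
```

With randomized set to False the rows keep their original order, so the first part is simply the top of the dataset.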