AWS Certified Data Analytics - Specialty (Big Data on AWS) Quiz Questions and Answers

A company is currently using Amazon DynamoDB as the database for a user support application. The company is developing a new version of the application that will store a PDF file for each support case ranging in size from 1–10 MB. The file should be retrievable whenever the case is accessed in the application. How can the company store the file in the most cost-effective manner?

Answer :
  • Store the file in Amazon S3 and store the object key as an attribute in the DynamoDB table.
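The pattern behind this answer can be sketched in a few lines of Python. This is a minimal illustration, not the application's actual code: the bucket name, key layout, and attribute names are hypothetical, and the actual S3 upload (e.g., via boto3's `put_object`) is omitted.

```python
def build_case_record(case_id: str, bucket: str = "support-case-files") -> dict:
    """Return the S3 object key for the case's PDF and the DynamoDB item
    that references it. The file itself lives in S3; DynamoDB stores only
    the pointer, which is far cheaper than storing 1-10 MB items."""
    object_key = f"cases/{case_id}/attachment.pdf"
    item = {
        "case_id": {"S": case_id},
        # Attribute holding the pointer to the file in S3.
        "pdf_object_key": {"S": object_key},
    }
    return {"bucket": bucket, "object_key": object_key, "item": item}

record = build_case_record("CASE-1001")
```

When the application loads a case, it reads `pdf_object_key` from the item and fetches the object from S3 (or hands the client a pre-signed URL).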

A marketing firm wants to enhance its business intelligence and reporting capabilities. During the planning phase, the organization interviewed key stakeholders and learned the following:
  • The operations team generates hourly reports on the current month's data.
  • The sales team wants numerous Amazon QuickSight dashboards that provide a rolling view of the last 30 days per category, with immediate access to the data as it reaches the reporting backend.
  • The finance team runs daily reports on the previous month's data and a monthly report on the previous 24 months' data.
Currently, the system holds 400 TB of data, with a projected monthly addition of 100 TB. The organization is seeking the most cost-effective solution. Which option best fulfills the needs of the business?

Answer :
  • Store the last 2 months of data in Amazon Redshift and the rest of the months in Amazon S3. Set up an external schema and table for Amazon Redshift Spectrum. Configure Amazon QuickSight with Amazon Redshift as the data source.
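The Redshift Spectrum side of this answer can be sketched as DDL. All names below are hypothetical placeholders: the `spectrum` schema, the `reports` Glue Data Catalog database, the IAM role ARN, the table columns, and the S3 location.

```python
# Illustrative Redshift Spectrum DDL, held as strings. Schema name, catalog
# database, IAM role ARN, columns, and S3 path are all placeholders.
CREATE_EXTERNAL_SCHEMA = """
CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG DATABASE 'reports'
IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
"""

CREATE_EXTERNAL_TABLE = """
CREATE EXTERNAL TABLE spectrum.sales_history (
    sale_date  DATE,
    category   VARCHAR(64),
    revenue    DECIMAL(12, 2)
)
STORED AS PARQUET
LOCATION 's3://example-reporting-bucket/sales-history/';
"""
```

Queries can then join `spectrum.sales_history` (the older months in S3) with the native tables holding the last two months, and QuickSight sees a single Amazon Redshift data source either way.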

A retail company has 15 stores across 6 cities in the United States. Once a month, the sales team requests a visualization in Amazon QuickSight that provides the ability to easily identify revenue trends across cities and stores. The visualization also helps identify outliers that need to be examined with further analysis. Which visual type in QuickSight meets the sales team’s requirements?

Answer :
  • Geospatial chart

A company ingests a large set of clickstream data in nested JSON format from different sources and stores it in Amazon S3. Data Analysts need to analyze this data in combination with data stored in an Amazon Redshift cluster. Data Analysts want to build a cost-effective and automated solution for this need. Which solution meets these requirements?

Answer :
  • Use the Relationalize transform in an AWS Glue ETL job to flatten the nested JSON and write the data back to Amazon S3. Use Amazon Redshift Spectrum to create external tables and join them with the internal tables.
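What Relationalize does to nested JSON can be illustrated in pure Python. This is a conceptual sketch only: the real Glue transform operates on DynamicFrames and also pivots nested arrays out into separate tables, while the function below only flattens nested objects into dotted column names.

```python
# Pure-Python illustration of the flattening that Glue's Relationalize
# transform performs on nested JSON. The real transform works on
# DynamicFrames and additionally splits arrays into separate tables.

def flatten(record: dict, prefix: str = "") -> dict:
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested objects
        else:
            flat[name] = value
    return flat

event = {"user": {"id": 42, "geo": {"city": "Boston"}}, "page": "/home"}
# flatten(event) -> {"user.id": 42, "user.geo.city": "Boston", "page": "/home"}
```

Once flattened and written back to S3 (typically as Parquet), the clickstream data has a tabular shape that Redshift Spectrum external tables can query and join against internal Redshift tables.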

A Publisher website captures user activity and sends clickstream data to Amazon Kinesis Data Streams. The Publisher wants to design a cost-effective solution to process the data to create a timeline of user activity within a session. The solution must be able to scale depending on the number of active sessions. Which solution meets these requirements?

Answer :
  • Include a session identifier in the clickstream data from the Publisher website and use it as the partition key for the stream. Use the Kinesis Client Library (KCL) in the consumer application to retrieve the data from the stream and perform the processing. Deploy the consumer application on Amazon EC2 instances in an EC2 Auto Scaling group. Use an AWS Lambda function to reshard the stream based upon Amazon CloudWatch alarms.
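Why the session identifier works as the partition key can be sketched with Kinesis's hashing scheme: Kinesis takes the MD5 hash of the partition key as a 128-bit integer and routes the record to the shard whose hash key range contains it, so all events for one session stay in order on one shard. The `shard_for` function below is an illustrative simplification (it assumes shards evenly split the hash key space), not a Kinesis API.

```python
import hashlib

def hash_key(partition_key: str) -> int:
    """128-bit MD5 hash of the partition key, as Kinesis computes it."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")

def shard_for(partition_key: str, shard_count: int) -> int:
    """Illustrative shard routing, assuming shards evenly split
    the 0 .. 2**128 - 1 hash key space."""
    return hash_key(partition_key) * shard_count // 2**128

# The same session ID always maps to the same shard, so a session's
# clickstream events can be reassembled into a timeline by one consumer.
assert shard_for("session-abc", 8) == shard_for("session-abc", 8)
```

In the real producer, the session ID is simply passed as `PartitionKey` when calling `PutRecord`/`PutRecords`; the routing above happens inside the service.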

An online retail company wants to perform analytics on data in large Amazon S3 objects using Amazon EMR. An Apache Spark job repeatedly queries the same data to populate an analytics dashboard. The Analytics team wants to minimize the time to load the data and create the dashboard. Which approaches could improve the performance? (Select two)

Answer :
  • Load the data into Spark DataFrames.
  • Use Amazon S3 Select to retrieve the data necessary for the dashboards from the S3 objects.
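The S3 Select part of this answer can be sketched as the parameters one might pass to boto3's `select_object_content`; retrieving only the needed columns and rows reduces the bytes transferred before the data is loaded into cached Spark DataFrames. The bucket, key, and query below are hypothetical, and the actual API call is omitted.

```python
# Hypothetical S3 Select request parameters (for boto3's
# select_object_content). Bucket, key, and SQL expression are placeholders.
select_params = {
    "Bucket": "retail-analytics-data",
    "Key": "clicks/2023/01/events.csv",
    "ExpressionType": "SQL",
    "Expression": (
        "SELECT s.product_id, s.revenue "
        "FROM s3object s WHERE s.region = 'US'"
    ),
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
    "OutputSerialization": {"JSON": {}},
}
```

On the Spark side, the repeatedly queried DataFrame can be kept in memory with `df.cache()` (or `persist()`), so each dashboard refresh avoids re-reading the S3 objects.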

A real estate company is receiving new property listing data from its agents through .csv files every day and storing these files in Amazon S3. The Data Analytics team created an Amazon QuickSight visualization report that uses a dataset imported from the S3 files. The Data Analytics team wants the visualization report to reflect the current data up to the previous day. How can a Data Analyst meet these requirements?

Answer :
  • Schedule the dataset to refresh daily.

A financial company uses Amazon EMR for its analytics workloads. During the company’s annual security audit, the Security team determined that none of the EMR clusters’ root volumes are encrypted. The Security team recommends the company encrypt its EMR clusters’ root volume as soon as possible. Which solution would meet these requirements?

Answer :
  • Specify local disk encryption in a security configuration. Re-create the cluster using the newly created security configuration.
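A security configuration enabling at-rest local disk encryption might look like the fragment below; on EMR release 5.24.0 and later, setting `EnableEbsEncryption` also encrypts the EBS root volume. The KMS key ARN is a placeholder, and `EnableInTransitEncryption` is shown disabled only to keep the sketch minimal.

```json
{
  "EncryptionConfiguration": {
    "EnableInTransitEncryption": false,
    "EnableAtRestEncryption": true,
    "AtRestEncryptionConfiguration": {
      "LocalDiskEncryptionConfiguration": {
        "EncryptionKeyProviderType": "AwsKms",
        "AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID",
        "EnableEbsEncryption": true
      }
    }
  }
}
```

Because a security configuration is applied only at cluster creation, the existing clusters must be re-created with this configuration attached; it cannot be added to a running cluster.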

A business has an application that reads records from a Kinesis data stream using the Amazon Kinesis Client Library (KCL). The application saw a considerable rise in usage after a successful marketing campaign. As a result, a data analyst had to split some of the data shards. After the shards were split, the application began intermittently throwing ExpiredIteratorExceptions. What should the data analyst do to resolve this?

Answer :
  • Increase the provisioned write capacity units assigned to the stream's Amazon DynamoDB checkpoint (lease) table that the KCL maintains.

A mortgage firm maintains a microservice for accepting payments. This microservice encrypts sensitive data with the Amazon DynamoDB Encryption Client and AWS KMS managed keys before writing it to DynamoDB. The finance team needs to import this data into Amazon Redshift and aggregate the values in the sensitive fields. Data analysts from other business units share the Amazon Redshift cluster. Which actions should a data analyst take to accomplish this task securely and effectively?

Answer :
  • Create an AWS Lambda function to process the DynamoDB stream. Save the output to a restricted S3 bucket for the finance team. Create a finance table in Amazon Redshift that is accessible to the finance team only. Use the COPY command with the IAM role that has access to the KMS key to load the data from S3 to the finance table.
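The Lambda step in this answer can be sketched as a handler that consumes DynamoDB stream records. This is a hedged illustration: the attribute names are hypothetical, the S3 upload to the restricted finance bucket is omitted, and the client-side decryption with the KMS key (needed because the data was encrypted with the DynamoDB Encryption Client) is not shown.

```python
import json

def handler(event, context=None):
    """Sketch of a Lambda handler for a DynamoDB stream: collects newly
    written items as JSON lines that would then be uploaded to the finance
    team's restricted S3 bucket (upload and decryption omitted here)."""
    lines = []
    for record in event.get("Records", []):
        if record.get("eventName") in ("INSERT", "MODIFY"):
            image = record["dynamodb"].get("NewImage", {})
            # Unwrap the DynamoDB attribute-value format, e.g. {"S": "x"} -> "x".
            row = {k: next(iter(v.values())) for k, v in image.items()}
            lines.append(json.dumps(row, sort_keys=True))
    return "\n".join(lines)

sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"NewImage": {"payment_id": {"S": "p-1"},
                                   "amount": {"N": "125.00"}}}}
    ]
}
```

Once the files land in the restricted bucket, a Redshift `COPY` issued with an IAM role that has access to the KMS key loads them into the finance-only table, keeping the plaintext out of reach of analysts in other business units.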