AWS Certified Data Analytics - Specialty (Big Data on AWS) Quiz Questions and Answers

A company is currently using Amazon DynamoDB as the database for a user support application. The company is developing a new version of the application that will store a PDF file of 1–10 MB for each support case. The file must be retrievable whenever the case is accessed in the application. How can the company store the file in the most cost-effective manner?

Answer :
  • Store the file in Amazon S3 and store the object key as an attribute in the DynamoDB table.
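The S3-plus-pointer pattern behind this answer can be sketched as follows. The key layout, bucket, and table names are hypothetical, and the boto3 calls are shown in comments only:

```python
# Sketch: store each support-case PDF in Amazon S3 and keep only a pointer
# (bucket + object key) in DynamoDB. Names below are placeholders.

def s3_key_for_case(case_id: str) -> str:
    """Deterministic object key for a case's PDF (hypothetical layout)."""
    return f"support-cases/{case_id}.pdf"

def build_case_item(case_id: str, bucket: str) -> dict:
    """DynamoDB item: small metadata plus a pointer to the PDF in S3,
    rather than the 1-10 MB file itself (DynamoDB items are capped at 400 KB)."""
    return {
        "CaseId": {"S": case_id},
        "PdfBucket": {"S": bucket},
        "PdfKey": {"S": s3_key_for_case(case_id)},
    }

# With boto3, the write path would look roughly like:
#   s3.put_object(Bucket=bucket, Key=s3_key_for_case(case_id), Body=pdf_bytes)
#   dynamodb.put_item(TableName="SupportCases", Item=build_case_item(case_id, bucket))
```

When the case is opened, the application reads the item, then fetches the PDF from S3 using the stored bucket and key.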

A retail company has 15 stores across 6 cities in the United States. Once a month, the sales team requests a visualization in Amazon QuickSight that provides the ability to easily identify revenue trends across cities and stores. The visualization also helps identify outliers that need to be examined with further analysis. Which visual type in QuickSight meets the sales team’s requirements?

Answer :
  • Heat map

A marketing company has data in Salesforce, MySQL, and Amazon S3. The company wants to use data from these three locations and create mobile dashboards for its users. The company is unsure how it should create the dashboards and needs a solution with the least possible customization and coding. Which solution meets these requirements?

Answer :
  • Use the existing data sources in Amazon QuickSight to generate the mobile dashboards. (QuickSight connects natively to Salesforce, MySQL, and Amazon S3, requiring the least customization and coding.)

A company has an application that uses the Amazon Kinesis Client Library (KCL) to read records from a Kinesis data stream. After a successful marketing campaign, the application experienced a significant increase in usage. As a result, a data analyst had to split some shards in the data stream. When the shards were split, the application started sporadically throwing ExpiredIteratorException errors. What should the data analyst do to resolve this?

Answer :
  • Increase the provisioned write capacity units (WCUs) assigned to the KCL application’s Amazon DynamoDB lease table.
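The fix can be sketched with a helper that builds the DynamoDB UpdateTable parameters. The table name is a placeholder (by default the KCL names its lease table after the application), and the boto3 call is shown as a comment:

```python
# Sketch: raise the provisioned throughput of the KCL lease table so shard-split
# checkpointing is no longer throttled. Table name and capacity values are
# placeholders for illustration.

def build_wcu_update(table_name: str, read_units: int, write_units: int) -> dict:
    """Parameters for DynamoDB UpdateTable; both RCUs and WCUs must be
    supplied when updating provisioned throughput."""
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }

# With boto3:
#   dynamodb.update_table(**build_wcu_update("my-kcl-app", 50, 100))
```

Throttled writes to the lease table prevent the KCL from checkpointing and renewing leases in time, which is what surfaces as ExpiredIteratorException after a shard split.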

A transport company wants to track vehicular movements by capturing geolocation records. The records are 10 bytes in size and up to 10,000 records are captured each second. Data transmission delays of a few minutes are acceptable, considering unreliable network conditions. The transport company decided to use Amazon Kinesis Data Streams to ingest the data. The company is looking for a reliable mechanism to send data to Kinesis Data Streams while maximizing the throughput efficiency of the Kinesis shards. Which solution will meet the company’s requirements?

Answer :
  • Kinesis Producer Library (KPL)
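The KPL itself is a Java library; its throughput gain comes from aggregating many small user records into a single Kinesis record. A minimal Python sketch of the packing idea only (not the KPL's actual protobuf-based aggregation format):

```python
# Sketch of KPL-style aggregation: pack many 10-byte geolocation records into
# larger payloads so each shard PUT carries thousands of records instead of one.
# Payload size here is illustrative; the real KPL handles this automatically.

RECORD_SIZE = 10          # bytes per geolocation record
MAX_PAYLOAD = 25 * 1024   # well under the 1 MB Kinesis record limit

def aggregate(records: list) -> list:
    """Concatenate fixed-size records into payloads of at most MAX_PAYLOAD bytes."""
    per_payload = MAX_PAYLOAD // RECORD_SIZE
    payloads = []
    for i in range(0, len(records), per_payload):
        payloads.append(b"".join(records[i : i + per_payload]))
    return payloads

# One second of traffic collapses into a handful of PUTs:
batch = [b"0123456789"] * 10_000
payloads = aggregate(batch)  # 4 payloads instead of 10,000 individual PUTs
```

Because the company tolerates a few minutes of delay, the KPL's buffering and aggregation fit naturally, and far fewer shards are needed than with one PUT per 10-byte record.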

An Amazon Redshift database holds sensitive user data. Logging is required to comply with regulatory obligations. Database authentication attempts, connections, and disconnections must be recorded in the logs. Additionally, the logs must include a record of each query executed against the database and the database user who executed each query. Which actions will result in the creation of the relevant logs?

Answer :
  • Enable audit logging for Amazon Redshift using the AWS Management Console or the AWS CLI.
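The equivalent API call can be sketched with a helper that builds the parameters for Redshift's EnableLogging action; the cluster identifier, bucket, and prefix are placeholders:

```python
# Sketch: parameters for the Redshift EnableLogging API, which delivers the
# connection and user logs to S3. Names below are placeholders.

def build_enable_logging(cluster_id: str, bucket: str, prefix: str) -> dict:
    return {
        "ClusterIdentifier": cluster_id,
        "BucketName": bucket,
        "S3KeyPrefix": prefix,
    }

# With boto3:
#   redshift.enable_logging(**build_enable_logging("prod-dw", "audit-log-bucket", "redshift/"))
#
# To also capture every query and the user who ran it (the user activity log),
# the cluster parameter enable_user_activity_logging must be set to true in
# the cluster's parameter group.
```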

A data analyst is using Amazon QuickSight for data visualization across multiple datasets generated by applications. Each application stores files within a separate Amazon S3 bucket. AWS Glue Data Catalog is used as a central catalog across all application data in Amazon S3. A new application stores its data within a separate S3 bucket. After updating the catalog to include the new application data source, the data analyst created a new Amazon QuickSight data source from an Amazon Athena table, but the import into SPICE failed. How should the data analyst resolve the issue?

Answer :
  • Edit the permissions for the new S3 bucket from within the Amazon QuickSight console.

A company has a data warehouse in Amazon Redshift that is approximately 500 TB in size. New data is imported every few hours and read-only queries are run throughout the day and evening. There is a particularly heavy load with no writes for several hours each morning on business days. During those hours, some queries are queued and take a long time to execute. The company needs to optimize query execution and avoid any downtime. What is the MOST cost-effective solution?

Answer :
  • Enable concurrency scaling in the workload management (WLM) queue.
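Concurrency scaling is turned on per WLM queue through the cluster parameter group's `wlm_json_configuration` parameter. A minimal sketch, with illustrative queue settings:

```python
import json

# Sketch: a WLM JSON configuration enabling concurrency scaling on a queue.
# During the morning read-only peak, queued eligible queries burst to
# transient scaling clusters instead of waiting. Settings are illustrative.

def build_wlm_config() -> str:
    queues = [
        {
            "query_concurrency": 5,
            "concurrency_scaling": "auto",  # route eligible queued queries to scaling clusters
        }
    ]
    return json.dumps(queues)

# With boto3, this JSON would be applied as the wlm_json_configuration
# parameter via redshift.modify_cluster_parameter_group(...).
```

Because scaling clusters are added and removed automatically and billed only while in use, this handles the morning peak without resizing the main cluster or incurring downtime.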

A company analyzes its data in an Amazon Redshift data warehouse, which currently has a cluster of three dense storage nodes. Due to a recent business acquisition, the company needs to load an additional 4 TB of user data into Amazon Redshift. The engineering team will combine all the user data and apply complex calculations that require I/O intensive resources. The company needs to adjust the cluster’s capacity to support the change in analytical and storage requirements. Which solution meets these requirements?

Answer :
  • Resize the cluster using elastic resize with dense storage nodes.
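The call behind this answer can be sketched as a parameter builder for Redshift's ResizeCluster API; cluster name, node type, and node count are placeholders:

```python
# Sketch: parameters for the Redshift ResizeCluster API. Classic=False requests
# an elastic resize, which adds same-family nodes in minutes with only a brief
# read-only window, rather than the hours a classic resize can take.

def build_elastic_resize(cluster_id: str, node_type: str, nodes: int) -> dict:
    return {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "NumberOfNodes": nodes,
        "Classic": False,  # elastic resize instead of classic resize
    }

# With boto3:
#   redshift.resize_cluster(**build_elastic_resize("analytics-dw", "ds2.xlarge", 5))
```

Keeping dense storage nodes satisfies the added 4 TB of storage, while the extra nodes supply the I/O-intensive compute the engineering team's calculations need.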

A company’s marketing team has asked for help identifying a high-performing long-term storage service for its data, based on the following requirements:
  • The data size is approximately 32 TB uncompressed.
  • There is a low volume of single-row inserts each day.
  • There is a high volume of aggregation queries each day.
  • Multiple complex joins are performed.
  • The queries typically involve a small subset of the columns in a table.
Which storage service will provide the MOST performant solution?

Answer :
  • Amazon Redshift