Answer :
- Load the data into Spark DataFrames.
- Use Amazon S3 Select to retrieve the data necessary for the dashboards from the S3 objects.
Explanation :
Part of Apache Spark's speed advantage comes from loading data into immutable DataFrames, which can be cached and accessed repeatedly in memory. A Spark DataFrame organizes distributed data into named columns, which makes summaries and aggregates much quicker to calculate. In addition, instead of loading an entire large Amazon S3 object, use Amazon S3 Select to load only the data the dashboards need. Keeping the data in S3 also avoids copying the large dataset into HDFS.
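
To make the S3 Select part concrete, here is a minimal sketch in Python that uses boto3's `select_object_content` call to pull only selected columns and rows from a CSV object. The bucket name, object key, column names, and the helper names `build_select_request` and `fetch_rows` are hypothetical, and the sketch assumes the object is an uncompressed CSV file with a header row.

```python
def build_select_request(bucket, key, columns, where=None):
    """Build the keyword arguments for boto3's S3 select_object_content call.

    Assumes an uncompressed CSV object whose first line is a header row.
    """
    # Project only the requested columns instead of reading every field.
    projection = ", ".join(f's."{column}"' for column in columns)
    expression = f"SELECT {projection} FROM S3Object s"
    if where:
        expression += f" WHERE {where}"
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": "NONE",
        },
        "OutputSerialization": {"CSV": {}},
    }


def fetch_rows(request):
    """Run the S3 Select query and return the matching rows as CSV text."""
    # Imported lazily so the request builder works without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    response = s3.select_object_content(**request)
    chunks = []
    # The response payload is an event stream; only Records events carry data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


# Hypothetical bucket, key, and column names, for illustration only.
request = build_select_request(
    "example-bucket",
    "sales/2023.csv",
    ["region", "revenue"],
    where="s.\"region\" = 'EMEA'",
)
print(request["Expression"])
```

Calling `fetch_rows(request)` requires boto3 and valid AWS credentials. Only the filtered rows and columns leave S3, and the resulting CSV text can then be loaded into a DataFrame for the repeated in-memory aggregation described above.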