Cloudera Data Scientist Course Overview

Enroll in our 4-day Cloudera Data Scientist training from Koenig Solutions, accredited by Cloudera. In this course you will learn enterprise data science and machine learning using Apache Spark in Cloudera Data Science Workbench (CDSW).

Through a blend of hands-on labs and interactive lectures, you will learn to use Spark SQL to load, explore, cleanse, join, and analyze data and Spark MLlib to specify, train, evaluate, tune, and deploy machine learning pipelines. You will dive into the foundations of the Spark architecture and execution model necessary to effectively configure, monitor, and tune your Spark applications. You will also learn how Spark integrates with key components of the Cloudera platform such as HDFS, YARN, Hive, Impala, and Hue, as well as your favorite Python or R packages.

Target Audience:

This course is intended for data scientists, data engineers, data analysts, developers, and solution architects.

Learning Objectives:

After completing this course, you will be able to:

  • Use Apache Spark to run data science and machine learning workflows at scale
  • Use Spark SQL and DataFrames to work with structured data
  • Use MLlib, Spark’s machine learning library
  • Use PySpark, Spark’s Python API
  • Use sparklyr, a dplyr-compatible R interface to Spark
  • Use Cloudera Data Science Workbench (CDSW)
  • Use other Cloudera platform components including HDFS, Hive, Impala, and Hue

 

This is a rare course, and it can take up to 3 weeks to arrange the training.

The 1-on-1 Advantage

Methodology

Flexible Dates

  • Choose Start Date
  • Reschedule After Booking
  • Weekend / Evening Option

4-Hour Sessions

You will learn:

Module 1: Data Science Overview
  • What Data Scientists Do
  • What Process Data Scientists Use
  • What Tools Data Scientists Use
  • How Cloudera Data Science
  • How to Use Cloudera Data Science
  • Entering Code
  • Getting Help
  • Accessing the Linux Command Line
  • Working with Python Packages
  • Formatting Session Output
  • DuoCar
  • How DuoCar Works
  • DuoCar Datasets
  • DuoCar Business Goals
  • DuoCar Data Science Platform
  • DuoCar Cloudera EDH Cluster
  • HDFS
  • Apache Spark
  • Apache Hive
  • Apache Impala
  • Hue
  • YARN
  • DuoCar Cluster Architecture
  • Apache Spark
  • How Spark Works
  • The Spark Stack
  • Spark SQL
  • DataFrames
  • File Formats in Apache Spark
  • Text File Formats
  • Parquet File Format
  • Summarizing Data with Aggregate Functions
  • Grouping Data
  • Pivoting Data
  • Introduction to Window Functions
  • Creating a Window Specification
  • Aggregating over a Window Specification
  • Possible Workflows for Big Data
  • Exploring a Single Variable
  • Exploring a Categorical Variable
  • Exploring a Continuous Variable
  • Exploring a Pair of Variables
  • Categorical-Categorical Pair
  • Categorical-Continuous Pair
  • Continuous-Continuous Pair
  • DataFrame Operations
  • Input Splits
  • Narrow Operations
  • Wide Operations
  • Stages and Tasks
  • Shuffle
  • Introduction to Topic Models
  • Scenario
  • Extracting and Transforming Features
  • Parsing Text Data
  • Removing Common (Stop) Words
  • Counting the Frequency of Words
  • Specifying a Topic Model
  • Training a topic model using Latent Dirichlet Allocation (LDA)
  • Assessing the Topic Model Fit
  • Examining a Topic Model
  • Applying a Topic Model
  • Introduction to Recommender Models
  • Scenario
  • Preparing Data for a Recommender Model
  • Specifying a Recommender Model
  • Spark Interface Languages
  • PySpark
  • Data Science with PySpark
  • sparklyr
  • dplyr and sparklyr
  • Comparison of PySpark and sparklyr
  • How sparklyr Works with dplyr
  • sparklyr DataFrame and MLlib Functions
  • When to Use PySpark and sparklyr
  • Overview
  • Starting a Spark Application
  • Reading Data into a Spark SQL DataFrame
  • Examining the Schema of a DataFrame
  • Computing the Number of Rows and
  • Examining Rows of a DataFrame
  • Stopping a Spark Application
  • Overview
  • Inspecting a DataFrame
  • Inspecting a DataFrame Column
  • Inspecting a Primary Key Variable
  • Inspecting a Categorical Variable
  • Inspecting a Numerical Variable
  • Inspecting a Date and Time Variable
  • Spark SQL DataFrames
  • Working with Columns
  • Selecting Columns
  • Dropping Columns
  • Specifying Columns
  • Adding Columns
  • Changing the Column Name
  • Changing the Column Type
  • Monitoring Spark Applications
  • Persisting DataFrames
  • Partitioning DataFrames
  • Configuring the Spark Environment
  • Machine Learning
  • Underfitting and Overfitting
  • Model Validation
  • Hyperparameters
  • Supervised and Unsupervised Learning
  • Machine Learning Algorithms
  • Machine Learning Libraries
  • Apache Spark MLlib
  • Introduction to Regression Models
  • Scenario
  • Preparing the Regression Data
  • Assembling the Feature Vector
  • Creating a Train and Test Set
  • Specifying a Linear Regression Model
  • Training a Linear Regression Model
  • Examining the Model Parameters
  • Examining Various Model Performance Measures
  • Examining Various Model Diagnostics
  • Applying the Linear Regression Model to the Test Data
  • Evaluating the Linear Regression Model on the Test Data
  • Plotting the Linear Regression Model
  • Training a Recommender Model using Alternating Least Squares
  • Examining a Recommender Model
  • Applying a Recommender Model
  • Evaluating a Recommender Model
  • Generating Recommendations
  • Specifying Pipeline Stages
  • Specifying a Pipeline
  • Training a Pipeline Model
  • Querying a Pipeline Model
  • Applying a Pipeline Model
  • Saving and Loading Pipelines and Pipeline Models in Python
  • Loading Pipelines and Pipeline Models in Scala
  • Working with Rows
  • Ordering Rows
  • Selecting a Fixed Number of Rows
  • Selecting Distinct Rows
  • Filtering Rows
  • Sampling Rows
  • Working with Missing Values
  • Spark SQL Data Types
  • Working with Numerical Columns
  • Working with String Columns
  • Working with Date and Timestamp Columns
  • Working with Boolean Columns
  • Complex Collection Data Types
  • Arrays
  • Maps
  • Structs
  • User-Defined Functions
  • Defining a Python Function
  • Registering a Python Function as a User-Defined Function
  • Applying a User-Defined Function
  • Reading and Writing Data
  • Working with Delimited Text Files
  • Working with Text Files
  • Working with Parquet Files
  • Working with Hive Tables
  • Working with Object Stores
  • Working with pandas DataFrames
  • Joining DataFrames
  • Cross Join
  • Inner Join
  • Left Semi Join
  • Left Anti Join
  • Left Outer Join
  • Right Outer Join
  • Full Outer Join
  • Applying Set Operations to DataFrames
  • Splitting a DataFrame
  • Introduction to Classification Models
  • Scenario
  • Preprocessing the Modeling Data
  • Generate a Label
  • Extract, Transform, and Select Features
  • Create Train and Test Sets
  • Specify a Logistic Regression Model
  • Train the Logistic Regression Model
  • Examine the Logistic Regression Model
  • Evaluate Model Performance on the Test Set
  • Requirements for Hyperparameter Tuning
  • Specifying the Estimator
  • Specifying the Hyperparameter Grid
  • Specifying the Evaluator
  • Tuning Hyperparameters using Holdout Cross-validation
  • Tuning Hyperparameters using K-fold Cross-validation
  • Introduction to Clustering
  • Scenario
  • Preprocessing the Data
  • Extracting, Transforming, and Selecting Features
  • Specifying a Gaussian Mixture Model
  • Training a Gaussian Mixture Model
  • Examining the Gaussian Mixture Model
  • Plotting the Clusters
  • Exploring the Cluster Profiles
  • Saving and Loading the Gaussian Mixture Model
  • Connecting to Spark
  • Reading Data
  • Inspecting Data
  • Transforming Data Using dplyr Verbs
  • Using SQL Queries
  • Spark DataFrames Functions
  • Visualizing Data from Spark
  • Machine Learning with MLlib
  • Collaboration
  • Jobs
  • Experiments
  • Models
  • Applications
Live Online Training (Duration: 32 Hours) Fee On Request
We Offer:
  • 1-on-1 Public - Select your own start date. Other students can be merged.
  • 1-on-1 Private - Select your own start date. You will be the only student in the class.

1-On-1 Training is Guaranteed to Run (GTR)
Group Training
Date On Request
Course Prerequisites

Participants should have a basic understanding of Python or R and some experience exploring and analyzing data and developing statistical or machine learning models. Knowledge of Hadoop or Spark is not required.

 


FAQs


In both, you choose the schedule. In Public, other participants can join; in Private, no other participants can join.
Yes, courses that require practical work include hands-on labs.
You can buy online from the page by clicking on "Buy Now". You can view alternate payment methods on the payment options page.
Yes, you can pay from the course page and flexi page.
Yes, the site is secured with Secure Sockets Layer (SSL) technology, which encrypts sensitive information during online transactions. We use the highest assurance SSL/TLS certificate, which ensures that no unauthorized person can access your sensitive payment data over the web.
We use the best standards in Internet security. Any data retained is not shared with third parties.
You can request a refund if you do not wish to enroll in the course.
To receive an acknowledgment of your online payment, you should have a valid email address. When you enter your name, card details, and other data, you have the option of entering your email address. If you choose to enter it, confirmation of your payment will be emailed to you.
After you submit your payment, you will land on the payment confirmation screen, which contains your payment confirmation message. You will also get a confirmation email after your transaction is submitted.
We accept all major credit cards from Visa, Mastercard, American Express, and Discover.
Credit card transactions normally take 48 hours to settle. Approval is given right away; however, it takes 48 hours for the money to be moved.
Yes, we do accept partial payments. You may use one payment method for part of the transaction and another payment method for the rest.
Yes, if we have an office in your city.
Yes, we do offer corporate training.
Yes, we do.
Yes, we also offer weekend classes.
Yes, Koenig follows a BYOL (Bring Your Own Laptop) policy.
It is recommended but not mandatory. Being acquainted with the basic course material will enable you and the trainer to move at a desired pace during classes. You can access courseware for most vendors.
The Buy-Now, Pay-Later option is available using a credit card in the USA and India only.
You will receive the digital certificate after completing the training, via the learning enhancement tool, after registration.
Yes you can.
Yes, we do. For details go to flexi
You can pay through debit/credit card or bank wire transfer.
Dubai, Goa, Delhi, Bangalore.
Yes, you can request this from your customer experience manager.
Yes, fee excludes local taxes.
Yes, we do.
The Fee includes:
  • Courseware
Yes, Koenig Solutions is a Cloudera Learning Partner
Schedule for Group Training is decided by Koenig. Schedule for 1-on-1 is decided by you.
In 1-on-1 you select your own schedule; other students can be merged into your class, but the schedule remains yours. Choose 1-on-1 if the published schedule does not meet your requirements. If you also want a private session, opt for 1-on-1 Private.
Yes.
No, it is not included.

Prices & Payments

Yes of course.
Yes, we are.

Travel and Visa

Yes, we do, after your registration for the course.

Food and Beverages

Yes.

Others

Says our CEO, Rohit Aggarwal (Founder and CEO, Koenig Solutions):
“It is an interesting story and dates back half a century. My father started a manufacturing business in India in the 1960s for import substitute electromechanical components such as microswitches. German and Japanese goods were held in high esteem, so he named his company Essen Deinki (Essen is a well-known industrial town in Germany and Deinki is Japanese for electric company). His products were of very good quality, and the fact that they sounded German and Japanese also helped. He did quite well. In the 1970s he branched out into electronic products and again looked for a German name. This time he chose Koenig, and Koenig Electronics was born. In the 1990s, after graduating from college, I was looking for a name for my company, and Koenig Solutions sounded just right. Initially we had marketed under the brand of Digital Equipment Corporation, but DEC went out of business and we switched to the Koenig name. Koenig is difficult to pronounce, and marketeers said it is not a good choice for a B2C brand. But it has proven lucky for us.”
All our trainers are fluent in English. The majority of our customers are from outside India, and our trainers speak in a neutral accent that is easily understood by students of all nationalities. Our money-back guarantee also covers the accent of the trainer.
Medical services in India are on par with the rest of the world and cost a fraction of what they do in Europe and the USA. A number of our students have scheduled cosmetic, dental, and ocular procedures during their stay in India. We can provide advice about this on request.
Yes, if you send 4 participants, we can offer an exclusive training for them which can be started from Any Date™ suitable for you.