Spark Fundamentals WA2490 - Course Book

Delivery Information:

You will receive the required software setup for installation 48 hours from the time of purchase.

Version 1.0

Product Type: Courseware
Level: Foundation
Duration: 3 Days

Participants should have general programming knowledge and experience working in Unix-like environments (e.g., running shell commands).

Language: English (en-US)
Delivery Format: eBook

Delivery Information

Delivered as a voucher. You can access and assign vouchers from Active Vouchers on myLeapest, or use the Classes function to assign vouchers to a group of learners.

Product Content

This product contains the following items. Upon purchasing, you will get access to all available versions prior to the latest version.

Course Outline :

Chapter 1.

  • Introduction to Functional Programming
  • What is Functional Programming (FP)?
  • Terminology: First-Class and Higher-Order Functions
  • Terminology: Lambda vs Closure
  • A Short List of Languages that Support FP
  • FP with Java
  • FP With JavaScript
  • Imperative Programming in JavaScript
  • The JavaScript map (FP) Example
  • The JavaScript reduce (FP) Example
  • Using reduce to Flatten an Array of Arrays (FP) Example
  • The JavaScript filter (FP) Example
  • Common Higher-Order Functions in Python
  • Common Higher-Order Functions in Scala
  • Elements of FP in R
  • Summary

Chapter 2.

  • Introduction to Apache Spark
  • What is Spark?
  • A Short History of Spark
  • Where to Get Spark?
  • The Spark Platform
  • Spark Logo
  • Common Spark Use Cases
  • Languages Supported by Spark
  • Running Spark on a Cluster
  • The Driver Process
  • Spark Applications
  • Spark Shell
  • The spark-submit Tool
  • The spark-submit Tool Configuration
  • The Executor and Worker Processes
  • The Spark Application Architecture
  • Interfaces with Data Storage Systems
  • Limitations of Hadoop's MapReduce
  • Spark vs MapReduce
  • Spark as an Alternative to Apache Tez
  • The Resilient Distributed Dataset (RDD)
  • Spark Streaming (Micro-batching)
  • Spark SQL
  • Example of Spark SQL
  • Spark Machine Learning Library
  • GraphX
  • Spark vs R
  • Summary

Chapter 3.

  • Hadoop Distributed File System Overview
  • Hadoop Distributed File System (HDFS)
  • HDFS High Availability
  • HDFS 'Fine Print'
  • Storing Raw Data in HDFS
  • Hadoop Security
  • HDFS Rack-awareness
  • Data Blocks
  • Data Block Replication Example
  • HDFS NameNode Directory Diagram
  • Accessing HDFS
  • Examples of HDFS Commands
  • Other Supported File Systems
  • WebHDFS
  • Examples of WebHDFS Calls
  • Client Interactions with HDFS for the Read Operation
  • Read Operation Sequence Diagram
  • Client Interactions with HDFS for the Write Operation
  • Communication inside HDFS
  • Summary

Chapter 4.

  • The Spark Shell
  • The Spark Shell
  • The Spark Shell UI
  • Spark Shell Options
  • Getting Help
  • The Spark Context (sc) and SQL Context (sqlContext)
  • The Shell Spark Context
  • Loading Files
  • Saving Files
  • Basic Spark ETL Operations
  • Summary

Chapter 5.

  • Spark RDDs
  • The Resilient Distributed Dataset (RDD)
  • Ways to Create an RDD
  • Custom RDDs
  • Supported Data Types
  • RDD Operations
  • RDDs are Immutable
  • Spark Actions
  • RDD Transformations
  • Other RDD Operations
  • Chaining RDD Operations
  • RDD Lineage
  • The Big Picture
  • What May Go Wrong
  • Checkpointing RDDs
  • Local Checkpointing
  • Parallelized Collections
  • More on parallelize() Method
  • The Pair RDD
  • Where do I use Pair RDDs?
  • Example of Creating a Pair RDD with Map
  • Example of Creating a Pair RDD with keyBy
  • Miscellaneous Pair RDD Operations
  • RDD Caching
  • RDD Persistence
  • The Tachyon Storage
  • Summary

Chapter 6.

  • Shared Variables in Spark
  • Shared Variables in Spark
  • Broadcast Variables
  • Creating and Using Broadcast Variables
  • Example of Using Broadcast Variables
  • Accumulators
  • Creating and Using Accumulators
  • Example of Using Accumulators
  • Custom Accumulators
  • Summary

Chapter 7.

  • Parallel Data Processing with Spark
  • Running Spark on a Cluster
  • Spark Stand-alone Option
  • The High-Level Execution Flow in Stand-alone Spark Cluster
  • Data Partitioning
  • Data Partitioning Diagram
  • Single Local File System RDD Partitioning
  • Multiple File RDD Partitioning
  • Special Cases for Small-sized Files
  • Parallel Data Processing of Partitions
  • Spark Application, Jobs, and Tasks
  • Stages and Shuffles
  • The 'Big Picture'
  • Summary

Chapter 8.

  • Introduction to Spark SQL
  • What is Spark SQL?
  • Uniform Data Access with Spark SQL
  • Hive Integration
  • Hive Interface
  • Integration with BI Tools
  • Spark SQL is No Longer Experimental Developer API!
  • What is a DataFrame?
  • The SQLContext Object
  • The SQLContext API
  • Changes from Spark SQL 1.3 to 1.4
  • Example of Spark SQL (Scala Example)
  • Example of Working with a JSON File
  • Example of Working with a Parquet File
  • Using JDBC Sources
  • JDBC Connection Example
  • Performance & Scalability of Spark SQL
  • Summary

Chapter 9.

  • Graph Processing with GraphX
  • What is GraphX?
  • Supported Languages
  • Vertices and Edges
  • Graph Terminology
  • Example of Property Graph
  • The GraphX API
  • The GraphX Views
  • The Triplet View
  • Graph Algorithms
  • Graphs and RDDs
  • Constructing Graphs
  • Graph Operators
  • Example of Using GraphX Operators
  • GraphX Performance Optimization
  • The PageRank Algorithm
  • GraphX Support for PageRank
  • Summary

Chapter 10.

  • Machine Learning Algorithms
  • Supervised vs Unsupervised Machine Learning
  • Supervised Machine Learning Algorithms
  • Unsupervised Machine Learning Algorithms
  • Choose the Right Algorithm
  • Life-cycles of Machine Learning Development
  • Classifying with k-Nearest Neighbors (SL)
  • k-Nearest Neighbors Algorithm
  • The Error

Target Audience :

Developers, Business Analysts, and IT Architects

Learning Objectives :

This high-octane Spark training course covers the theoretical and technical aspects of Spark programming. The course teaches developers Spark fundamentals, APIs, common programming idioms, and more. It is supplemented by hands-on labs that help attendees reinforce the material and quickly get up to speed on using Spark for data exploration.

Course Agenda :

  • Elements of functional programming
  • Spark Shell
  • RDDs
  • Parallel processing in Spark
  • Spark SQL
  • ETL with Spark
  • MLlib Machine Learning Library
  • Graph Processing with GraphX
  • Spark Streaming
