SkillSoft Explore Course – Lambers, Inc.

Aspire Data Analyst to Data Scientist Data Science Track 1: Data Analyst

Explore the basics of Apache Spark, an analytics engine used for big data processing. It is an open source cluster computing framework that can run on top of Hadoop. Discover how it allows operations on data with both its own library methods and SQL, while delivering strong performance. Learn the characteristics, components, and functions of Spark, Hadoop, RDDs, the Spark Session, and master and worker nodes. Install PySpark. Then initialize a Spark Context and create a Spark DataFrame from the contents of an RDD. Configure a DataFrame with column headers by applying the map() function to an RDD. Retrieve and transform data. Finally, convert Spark DataFrames to Pandas DataFrames and vice versa.

Asset ID	it_dsadskdj_01_enus
Course Type	Video Course
Course Category	Data Analyst to Data Scientist

Objectives

Accessing Data with Spark: An Introduction to Spark

Course Overview
recognize where Spark fits in with Hadoop and its components
describe Spark RDDs and their characteristics, including what makes them resilient and distributed
identify the types of operations which are permitted on an RDD and describe how RDD transformations are lazily evaluated
distinguish between RDDs and DataFrames and describe the relationship between the two
list the crucial components of Spark and the relationships between them and recognize the functions of the Spark Session, Master and Worker nodes
install PySpark and initialize a Spark Context
create and load data into an RDD
initialize a Spark DataFrame from the contents of an RDD
work with Spark DataFrames containing both primitive and structured data types
define the contents of a DataFrame using the SQLContext
apply the map() function on an RDD to configure a DataFrame with column headers
retrieve required data from within a DataFrame and define and apply transformations on a DataFrame
convert Spark DataFrames to Pandas DataFrames and vice versa
describe basic Spark concepts
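
One of the objectives above is to describe how RDD transformations are lazily evaluated. The idea can be sketched in plain Python without installing Spark. The class below, LazyRDD, is a hypothetical toy model written for illustration and is not part of PySpark; only the method names map, filter, and collect are borrowed from the real RDD API. It assumes a single in-memory list and ignores partitioning, distribution, and fault tolerance.

```python
class LazyRDD:
    """Toy stand-in for an RDD (hypothetical, not the PySpark class)."""

    def __init__(self, data, steps=()):
        self._data = list(data)
        self._steps = tuple(steps)  # recorded transformations, not yet applied

    def map(self, fn):
        # Transformation: record the step and return a new object; compute nothing.
        return LazyRDD(self._data, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also only recorded.
        return LazyRDD(self._data, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: replay every recorded step and materialize the result.
        items = iter(self._data)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []  # records each element the mapping function actually sees


def double(x):
    calls.append(x)
    return x * 2


pipeline = LazyRDD([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = list(calls)  # still empty: no work has been done yet

result = pipeline.collect()  # the action triggers the recorded steps
print(calls_before_action, result)
```

In this sketch, building the pipeline runs no user code; the mapping function is called only once collect() is invoked. Real Spark follows the same broad pattern, with transformations building up a lineage that an action later executes across the cluster.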