SkillSoft Explore Course – Lambers, Inc.

Spark is an open-source, massively parallel, in-memory solution that allows you to run big data analytics pipelines at high speed. Use this course to learn how Apache Spark works and gain an understanding of its architecture.
As you progress, investigate the industry-leading examples of Uber and Alibaba to recognize how Spark can add business value to data across many industries.
Moving along, compare the functionality of Spark and Hadoop across different use cases to identify when Spark is most advantageous. Finally, explore fundamental Spark characteristics, optimization techniques, and best practices.
When you've completed this course, you'll have a solid theoretical understanding of how and when to use Apache Spark for specific big data analytics tasks.

Asset ID	it_dlbdadj_02_enus
Course Type	Video Course
Course Category	Enterprise Database Systems Solution Area

Objectives

Big Data Analytics: Spark for High-speed Big Data Analytics

discover the key concepts covered in this course
recognize how Spark offers an open-source, scalable, massively parallel, in-memory solution for analytics applications
outline the two main components of the Spark architecture: the Resilient Distributed Dataset (RDD) and the Directed Acyclic Graph (DAG)
describe how Spark is providing business value to Uber
describe how Spark is providing business value to Alibaba
describe how Spark is providing business value to the healthcare industry
compare and name the main differences between Spark and Hadoop with respect to ease of use, latency, security, and cost
specify the scenarios and conditions in which Spark is a better choice than its alternatives
list the main features of Spark, such as loading behavior, file formats, parallelism, caching, and data skew
name the most important performance optimization techniques in Apache Spark, such as file format selection, level of parallelism, and API selection
name simple best practices for using Spark, such as starting small or resolving data skew
summarize the key concepts covered in this course
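The objectives above list resolving data skew as a best practice. One commonly used technique is key salting: a suffix is appended to a heavily repeated ("hot") key so that its rows spread over several partitions, and results are then merged in a second stage. The sketch below is plain Python with no Spark cluster involved. It simulates hash partitioning with a toy hash function, and the helpers `toy_partition`, `salt`, `partition_sizes`, and `two_stage_count`, along with the partition and salt counts, are hypothetical illustrations, not part of the Spark API or of the course material.

```python
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count for the illustration
NUM_SALTS = 4       # assumed number of salt values; heavier skew usually needs more


def toy_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic toy stand-in for a hash partitioner (not Spark's real hash):
    the sum of the key's character codes modulo the partition count."""
    return sum(ord(c) for c in key) % num_partitions


def salt(key: str, row_index: int, num_salts: int = NUM_SALTS) -> str:
    """Append a salt suffix so rows sharing one hot key map to several distinct keys."""
    return f"{key}#{row_index % num_salts}"


def partition_sizes(keys) -> Counter:
    """Count how many rows each partition receives."""
    return Counter(toy_partition(k) for k in keys)


def two_stage_count(keys, num_salts: int = NUM_SALTS) -> Counter:
    """Count rows per key in two stages so the hot key's work is split across salts."""
    # Stage 1: partial counts per salted key.
    partial = Counter(salt(k, i, num_salts) for i, k in enumerate(keys))
    # Stage 2: strip the salt and merge the partial counts per original key.
    final = Counter()
    for salted_key, n in partial.items():
        final[salted_key.rsplit("#", 1)[0]] += n
    return final


# Skewed input: 90 rows share the key "hot", 10 rows have distinct keys.
rows = ["hot"] * 90 + list("abcdefghij")

before = partition_sizes(rows)
after = partition_sizes(salt(k, i) for i, k in enumerate(rows))

print("largest partition before salting:", max(before.values()))  # 92
print("largest partition after salting:", max(after.values()))    # 28
print("counts preserved:", two_stage_count(rows) == Counter(rows))  # True
```

Salting trades one extra aggregation stage for more evenly sized partitions. The number of salt values is a tuning choice that depends on how skewed the data is.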