SkillSoft Explore Course

Spark is an open-source, massively parallel, in-memory solution that allows you to run big data analytics pipelines at high speed. Use this course to learn how Apache Spark works and gain an understanding of its architecture.
As you progress, examine how industry leaders such as Uber and Alibaba use Spark, and recognize how it can add business value to data across many industries.
Moving along, compare the functionality of Spark and Hadoop in relation to use cases, identifying when using Spark is most advantageous. Finally, explore fundamental Spark characteristics, optimization techniques, and best practices.
When you've completed this course, you'll have a solid theoretical understanding of how and when to use Apache Spark for specific big data analytics tasks.

Objectives

Big Data Analytics: Spark for High-speed Big Data Analytics

  • discover the key concepts covered in this course
  • recognize how Spark offers an open-source, scalable, massively parallel, in-memory solution for analytics applications
  • outline the two main components of the Spark architecture: Resilient Distributed Dataset and Directed Acyclic Graph
  • describe how Spark is providing business value to Uber
  • describe how Spark is providing business value to Alibaba
  • describe how Spark is providing business value to the healthcare industry
  • compare Spark and Hadoop and name their main differences with respect to ease of use, latency, security, and cost
  • specify in which scenarios and conditions Spark is a better choice than its alternatives
  • list the main features of Spark, such as loading behavior, file formats, parallelism, caching, and data skew
  • name the most important performance optimization techniques in Apache Spark, such as file format selection, level of parallelism, and API selection
  • name simple best practices when using Spark, like starting small or resolving skewness
  • summarize the key concepts covered in this course