SkillSoft Explore Course

Dataproc offers several ways to run big data workloads. This course continues the study of Dataproc implementations with Spark and Hadoop using Cloud Shell, and introduces BigQuery and the PySpark REPL package.

Objectives

Implementation using Dataproc

  • start the course
  • describe the various Spark and Hadoop processes that can be performed with Dataproc
  • recognize the benefits of separating storage and compute services using Cloud Dataproc
  • recall the process of monitoring and logging Dataproc jobs
  • demonstrate the process of using an SSH tunnel to connect to the master and worker nodes in a cluster
  • define the Spark REPL package and describe how it's used in Linux

Implementation using Cloud Shell

  • describe the compute and storage processes, the benefits of separating them, and the virtualized distribution of Hadoop
  • define BigQuery and its benefits for large-scale analytics
  • describe the MapReduce programming model
  • demonstrate the process of submitting multiple jobs with Dataproc
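
As a rough illustration of the MapReduce programming model named in the objectives above, the sketch below counts words in plain Python. It is not course material and does not use Hadoop, Spark, or Dataproc; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only to mirror the three stages of the model.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into one count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three stages in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["spark and hadoop", "hadoop on dataproc"]
    print(word_count(sample))
```

In a real cluster the map and reduce stages run in parallel across worker nodes and the shuffle moves data between them; here all three run in a single process so the data flow is easy to follow.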

Practice: Dataproc Implementations

  • recognize the various Dataproc and Cloud Shell job operations and implementations