Please contact us to receive the materials and the virtual machine needed to prepare for this certification.

Overall knowledge required to pass this certification:

+ Understand Apache Hadoop architecture and data management

+ Master data analysis with Hive and Impala

+ Ability to use Flume, Kafka and Sqoop

+ Ability to use HBase and Kudu

+ Understand Apache Spark architecture and data management

+ Use basic Apache Spark functionality with Python:

                           – Extract Transform Load (ETL) with pyspark

                           – Spark SQL

                           – Scalable Data Science

                           – Machine learning (basic notions) with MLlib and ML

Detailed plan of preparation:

+ Hadoop Architecture and MapReduce

+ Running SQL Statements using Hive Shell, Beeline and Impala Shell

+ Using Beeline and Impala Shell in Non-Interactive Mode

+ Using Hive and Impala in Scripts and Applications

+ Browsing Tables in the Metastore

+ Browsing Files in HDFS

+ Creating Databases and Tables

+ Managing Existing Tables

+ Apache Hive and Apache Impala Interoperability

+ Data and File Types

+ Loading Files into HDFS

+ Using Sqoop to Load Data from Relational Databases

+ Using Hive and Impala to Load Data into Tables

+ Table Partitioning
+ Apache HBase and Apache Kudu
+ Streaming with Apache Flume and/or Kafka
+ Apache Spark scalability
+ Apache Spark architecture
+ Resilient Distributed Dataset (RDD) and DataFrame
+ Spark SQL
+ Extract Transform Load with Spark
+ Basic notions of machine learning: supervised learning (example: decision tree) and unsupervised learning (example: K-means)
  • Machine Learning with RDD (MLlib with pyspark)
  • Machine Learning with DataFrame (ML with pyspark)
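
As a first taste of the unsupervised-learning topic, the K-means example mentioned above can be sketched in plain Python. This is only an illustration of the idea (assign each point to its nearest centroid, then move each centroid to the mean of its points); it is not the MLlib or ML implementation covered in the course. The function name `kmeans`, the sample points and the choice to seed the centroids with the first k points are all assumptions made for this sketch.

```python
def kmeans(points, k, iterations=10):
    """Basic K-means (Lloyd's algorithm) on a list of numeric tuples.

    Hypothetical helper for illustration: the first k points are used
    as the initial centroids, which keeps the result deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up sample data: two visually separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, 2)
print(centroids)
print(clusters)
```

On this sample the two groups end up in separate clusters. With pyspark the same idea is applied to distributed data, where the assignment and update steps run across the cluster rather than in a single Python loop.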