BIG DATA MASTERY CERTIFICATION GADET149175

Please contact us at contact@gadet.world-evergreen.online to receive the materials and the virtual machine needed to prepare for this certification.

Knowledge required to pass this certification:

+ Understand Apache Hadoop architecture and data management

+ Master data analysis with Hive and Impala

+ Use Flume, Kafka and Sqoop

+ Use HBase and Kudu

+ Understand Apache Spark architecture and data management

+ Use basic Apache Spark functionality with Python:

– Extract, Transform, Load (ETL) with PySpark

– Spark SQL

– Scalable Data Science

– Machine learning (basic notions) with MLlib and ML

Detailed preparation plan:

+ Hadoop Architecture and MapReduce

+ Running SQL Statements using Hive Shell, Beeline and Impala Shell

+ Using Beeline and Impala Shell in Non-Interactive Mode

+ Using Hive and Impala in Scripts and Applications

+ Browsing Tables in the Metastore

+ Browsing Files in HDFS

+ Creating Databases and Tables

+ Managing Existing Tables

+ Apache Hive and Apache Impala Interoperability

+ Data and File Types

+ Loading Files into HDFS

+ Using Sqoop to Load Data from Relational Databases

+ Using Hive and Impala to Load Data into Tables

+ Table Partitioning

+ Apache HBase and Apache Kudu

+ Streaming with Apache Flume and/or Apache Kafka

+ Apache Spark scalability

+ Apache Spark architecture

+ Resilient Distributed Dataset (RDD) and DataFrame

+ Spark SQL

+ Extract, Transform, Load (ETL) with Spark

+ Basic notions of machine learning: supervised learning (example: decision tree) and unsupervised learning (example: K-means)
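
As an illustration of the first item of the plan (Hadoop Architecture and MapReduce), the map, shuffle and reduce phases can be sketched in plain Python. This is only a single-process teaching sketch, not Hadoop code and not part of the certification materials: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a file stored in HDFS.
    sample = ["big data big ideas", "data moves fast"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run in parallel on different nodes and the shuffle would move data across the network; here everything runs in one process so that the flow of the data is easy to follow.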