Big Data Analytics with Hadoop - Level 2


Duration: 40 hrs

Course Validity: Valid for 6 months post activation

Course Fee: This is a paid course.

  • 1,800/- 599/-


Big data analytics refers to the use of advanced analytics methods to extract value from large data sets, both structured and unstructured. As large data sets become available, tools are needed to analyse them computationally and reveal the patterns, trends, and associations that support meaningful decisions.


Module 1: Apache Hive and HiveQL

What is Hive, Hive DDL - Create/Show database, Hive DDL - Create/Show/Drop tables, Hive DML – Load files & Insert data, Hive SQL - Select, Filter, Join, Group By, Hive architecture & components, Difference between Hive and RDBMS

Module 2: Advanced HiveQL

Multi-Table Inserts, Joins, Grouping Sets, Cubes, Rollups, Custom Map and Reduce scripts, Hive SerDe, Hive UDF, Hive UDAF.

Module 3: Apache Flume, Sqoop and Oozie

Sqoop - How Sqoop works, Sqoop architecture, Flume complex flow - Multiplexing, Oozie - Simple/Complex flow, Oozie service/Scheduler, Use cases - Time and data triggers.

Module 4: NoSQL Databases

CAP theorem, RDBMS vs NoSQL, Key-value stores: Memcached, Riak, Redis, DynamoDB, Column family: Cassandra, HBase, Graph store: Neo4j, Document store: MongoDB, CouchDB.

Module 5: Apache HBase

When/Why to use HBase, HBase architecture/Storage, HBase data model, HBase families/column families, HBase master, HBase vs RDBMS, Access HBase data.

Module 6: Apache ZooKeeper

What is ZooKeeper, ZooKeeper data model, Znode types, Sequential znodes, Installing and configuring, Running ZooKeeper, ZooKeeper use cases.

Module 7: Hadoop 2.0

YARN, MRv2, MapReduce limitations, HDFS 2: Architecture, HDFS 2: High availability, HDFS 2: Federation, YARN architecture, Classic vs YARN, YARN multitenancy, YARN capacity scheduler.

Learning Outcomes

  • Understand the concept of Big Data
  • Understand the internals of MapReduce and YARN
  • Understand the concept of Hadoop
  • Understand the different modes and distribution of Hadoop
  • Create a single-node Hadoop cluster
  • Write a MapReduce job for word count

Who Should Attend?

  • Engineering and IT students
  • Graduates with a programming background

Job Prospects

  • Data Analyst
  • Java Developer
  • Hadoop Developer
  • Business Analyst
  • Software Developer
  • SAS Analyst


After completing this course and successfully passing the certification examination, the student will be awarded the “Big Data Analytics with Hadoop - Level 2” certification.

If a learner chooses not to take the examination, they will still receive a 'Participation Certificate'.