Big Data Hadoop Course

The Oxford Global Academy of Excellence Big Data Hadoop course lets you master the concepts of the Hadoop framework, Big Data tools, and methodologies. Earning a Big Data Hadoop certification prepares you for success as a Big Data Developer. This Big Data and Hadoop training helps you understand how the various components of the Hadoop ecosystem fit into the Big Data processing lifecycle. Take this Big Data and Hadoop online training to explore Spark applications, parallel processing, and functional programming.

Syllabus

Module 1

Architecture of a Hadoop Cluster:

What are High Availability and Federation?
How to set up a production cluster?
Various shell commands in Hadoop
Installing a single node cluster with Cloudera Manager
Understanding Spark, Scala, Sqoop, Pig, and Flume

Introduction to Big Data Hadoop and Understanding HDFS and MapReduce:

Introducing Big Data and Hadoop
What is Big Data and where does Hadoop fit in?
Two important Hadoop ecosystem components, namely, MapReduce and HDFS
In-depth Hadoop Distributed File System: replication, block size, Secondary NameNode, and High Availability; in-depth YARN: ResourceManager and NodeManager

Hands-on Exercise:

HDFS working mechanism
Data replication process
How to determine the block size?
Understanding the DataNode and NameNode
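To make the block-size and replication topics above concrete, here is a minimal plain-Python sketch of the arithmetic involved. It assumes the common Hadoop 2.x-and-later defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); a real cluster may be configured differently, and the function name is hypothetical.

```python
import math

# Assumed defaults (Hadoop 2.x and later): 128 MB blocks, 3 replicas per block
BLOCK_SIZE_MB = 128
REPLICATION = 3


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, last_block_mb, block_replicas, raw_storage_mb) for one file."""
    if file_size_mb <= 0:
        return 0, 0, 0, 0
    # A file is split into fixed-size blocks; only the last block may be smaller
    blocks = math.ceil(file_size_mb / block_size_mb)
    last_block_mb = file_size_mb - (blocks - 1) * block_size_mb
    # Every block is stored `replication` times, on different DataNodes
    block_replicas = blocks * replication
    # A partially filled last block uses only the space it needs,
    # so raw storage is simply file size times replication
    raw_storage_mb = file_size_mb * replication
    return blocks, last_block_mb, block_replicas, raw_storage_mb


print(hdfs_footprint(500))  # → (4, 116, 12, 1500)
```

A 500 MB file therefore needs four blocks (three full ones plus a 116 MB tail) and consumes about 1.5 GB of raw disk across the cluster.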

Deep Dive into MapReduce:

Learning the working mechanism of MapReduce
Understanding the map and reduce stages in MapReduce
MapReduce terminology such as InputFormat, OutputFormat, partitioners, combiners, and shuffle and sort

Hands-on Exercise:

How to write a WordCount program in MapReduce?
How to write a Custom Partitioner?
What is a MapReduce Combiner?
How to run a job with the local job runner
Writing and running a unit test
What are map-side and reduce-side joins?
What is ToolRunner?
How to use counters and join datasets with map-side and reduce-side joins?
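The WordCount, combiner, and custom partitioner exercises above all follow one data flow: map, combine, partition, shuffle and sort, and reduce. The sketch below models that flow in plain Python on a single machine; it is not Hadoop's Java API. The `zlib.crc32` hash stands in for the key's `hashCode()` used by Hadoop's default `HashPartitioner`, and the function names are illustrative.

```python
import zlib
from collections import defaultdict


def map_phase(line):
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    return [(word, 1) for word in line.lower().split()]


def combine(pairs):
    """Combiner: pre-sum one mapper's output locally to shrink shuffle traffic."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def partition(word, num_reducers):
    """Partitioner: send each key to exactly one reducer (stable hash modulo reducers)."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def word_count(lines, num_reducers=2):
    # Shuffle: route every combined (word, count) pair to its reducer's bucket
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line plays the role of one mapper's input split
        for word, count in combine(map_phase(line)):
            buckets[partition(word, num_reducers)][word].append(count)
    # Sort + reduce: each reducer walks its keys in sorted order and sums the counts
    result = {}
    for bucket in buckets:
        for word in sorted(bucket):
            result[word] = sum(bucket[word])
    return result


print(sorted(word_count(["the cat sat", "The cat ran"]).items()))
# → [('cat', 2), ('ran', 1), ('sat', 1), ('the', 2)]
```

Because the partitioner is deterministic, every occurrence of a word reaches the same reducer, which is what makes the final per-word sum correct.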

Introduction to Hive:

Introducing Hadoop Hive
Detailed architecture of Hive
Comparing Hive with Pig and RDBMS
Working with Hive Query Language
Creation of a database, table, group by and other clauses
Various types of Hive tables, HCatalog
Storing Hive results, Hive partitioning, and bucketing
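Hive partitioning and bucketing can be pictured as two different ways of placing rows: a partition becomes a `key=value` subdirectory under the table's directory, while a bucket is chosen by hashing a column modulo the number of buckets. The sketch below is a simplified model; the table and column names are hypothetical, and Hive's real hash function depends on the column type and bucketing version.

```python
def partition_path(table_dir, partition_cols):
    """Build the directory a row lands in, e.g. sales/dt=2024-01-01/country=IN."""
    # Each partition column adds one key=value level, in the declared column order
    return "/".join([table_dir] + [f"{col}={val}" for col, val in partition_cols])


def bucket_for(value_hash, num_buckets):
    """Pick a bucket: non-negative hash modulo the bucket count (simplified)."""
    # Masking with 0x7FFFFFFF keeps the result non-negative for 32-bit hashes
    return (value_hash & 0x7FFFFFFF) % num_buckets


# Hypothetical rows: an integer user_id is used directly as its hash in this model
print(partition_path("sales", [("dt", "2024-01-01"), ("country", "IN")]))
print([bucket_for(user_id, 4) for user_id in [3, 8, 10, 15]])  # → [3, 0, 2, 3]
```

Partitioning lets queries that filter on `dt` or `country` skip whole directories, while bucketing spreads rows with the same `user_id` into a fixed number of files.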

Module 2

Advanced Hive and Impala:

Indexing in Hive
Map-side join in Hive
Working with complex data types
The Hive user-defined functions
Introduction to Impala
Comparing Hive with Impala

Hands-on Exercise:

How to work with Hive queries?
Joining tables and creating indexes
External table and sequence table deployment
Data storage in a different table

Introduction to Pig:

Apache Pig introduction and its various features
Various data types and schemas in Pig
The available functions in Pig; Pig bags, tuples, and fields

Hands-on Exercise:

Working with Pig in MapReduce and local mode
Loading of data
Limiting data to 4 rows
Storing the data into files and working with GROUP BY, FILTER BY, DISTINCT, CROSS, and SPLIT in Pig

Flume, Sqoop and HBase:

Apache Sqoop introduction
Importing and exporting data
Performance improvement with Sqoop
Sqoop limitations
Introduction to Flume and understanding the architecture of Flume
What is HBase and the CAP theorem?

Hands-on Exercise:

Using Flume to generate sequence numbers and consume them
Using a Flume agent to consume Twitter data
Using Avro to create a Hive table
Using Avro with Pig
Creating a table in HBase
Running the disable, scan, and enable table commands

Writing Spark Applications Using Scala:

Using Scala for writing Apache Spark applications
Detailed study of Scala
The need for Scala
The concept of object-oriented programming
Executing the Scala code
Classes in Scala: getters, setters, constructors, abstract classes, extending objects, and overriding methods
The Java and Scala interoperability
The concept of functional programming and anonymous functions
Bobsrockets package and comparing the mutable and immutable collections
Scala REPL, Lazy Values, Control Structures in Scala, Directed Acyclic Graph (DAG), first Spark application using SBT/Eclipse, Spark Web UI, Spark in Hadoop ecosystem.

Hands-on Exercise:

Writing Spark application using Scala
Understanding the robustness of Scala for Spark real-time analytics operations

Module 3

Use Case Bobsrockets Package:

Introduction to Scala packages and imports
The selective imports
The Scala test classes
Introduction to JUnit test class
JUnit interface via JUnit 3 suite for Scala test
Packaging of Scala applications in the directory structure
Examples of Spark Split and Spark Scala

Introduction to Spark:

Introduction to Spark
How Spark overcomes the drawbacks of MapReduce
Understanding in-memory MapReduce
Interactive operations on MapReduce
Spark stack, fine-grained vs. coarse-grained updates, Spark on Hadoop YARN, HDFS revision, and YARN revision
An overview of Spark and how it compares with Hadoop MapReduce
Deploying Spark without Hadoop
Spark history server and Cloudera distribution

Spark Basics:

Spark installation guide
Spark configuration
Memory management
Executor memory vs. driver memory
Working with Spark Shell
The concept of resilient distributed datasets (RDD)
Learning to do functional programming in Spark
The architecture of Spark

Working with RDDs in Spark:

Spark RDD
Creating RDDs
RDD partitioning
Operations and transformation in RDD
Deep dive into Spark RDDs
The RDD general operations
Read-only partitioned collection of records
Using the concept of RDD for faster and efficient data processing
RDD actions such as collect, count, collectAsMap, and saveAsTextFile, and pair RDD functions

Module 4

Aggregating Data with Pair RDDs:

Understanding the concept of key-value pair in RDDs
Learning how Spark makes MapReduce operations faster
Various operations of RDD
MapReduce interactive operations
Fine-grained vs. coarse-grained updates
Spark stack
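One reason pair-RDD aggregation can beat a naive approach is map-side combining. The plain-Python model below (not the Spark API) compares a groupByKey-style aggregation, where every pair crosses the shuffle, with a reduceByKey-style one, which first combines values inside each partition; each inner list stands for one hypothetical partition.

```python
from collections import defaultdict

# Each inner list models one RDD partition of (key, value) pairs
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]


def group_by_key(parts):
    """groupByKey-style: shuffle every pair, then collect values per key."""
    shuffled = [pair for part in parts for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)  # (result, records shuffled)


def reduce_by_key(parts, func):
    """reduceByKey-style: combine within each partition, then merge after the shuffle."""
    shuffled = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())  # only one record per key per partition moves
    merged = {}
    for key, value in shuffled:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged, len(shuffled)


print(group_by_key(partitions))                       # → ({'a': [1, 1, 1], 'b': [1, 1, 1]}, 6)
print(reduce_by_key(partitions, lambda x, y: x + y))  # → ({'a': 3, 'b': 3}, 4)
```

Both give the same totals, but the reduceByKey-style version moves fewer records across the network, and the gap grows as keys repeat more often within a partition.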

Writing and Deploying Spark Applications:

Comparing the Spark applications with Spark Shell
Creating a Spark application using Scala or Java
Deploying a Spark application
Building a Scala application
Creating mutable lists, sets and set operations, lists, tuples, and list concatenation
Creating an application using SBT
Deploying an application using Maven
The web user interface of Spark application
A real-world example of Spark
Configuring Spark

Project Solution Discussion and Cloudera Certification Tips and Tricks:

Working towards a solution for the Hadoop project
The project's problem statements and the possible solution outcomes
Preparing for the Cloudera certifications
Points to focus on to score the highest marks
Tips for cracking Hadoop interview questions

Hands-on Exercise:

A project building a real-world, high-value Big Data Hadoop application
Getting the right solution based on the criteria set by the Intellipaat team

Parallel Processing:

Learning about Spark parallel processing
Deploying on a cluster
Introduction to Spark partitions
File-based partitioning of RDDs
Understanding HDFS and data locality
Mastering the technique of parallel operations
Comparing repartition and coalesce
RDD actions
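The difference between `coalesce` and `repartition` can be sketched in plain Python. In this simplified model, coalescing merges whole neighbouring partitions without moving individual records (and, like Spark's default `coalesce`, cannot increase the partition count), whereas repartitioning shuffles every record. Round-robin placement is an assumption for illustration; Spark also randomises the starting partition.

```python
def coalesce(partitions, n):
    """Merge whole neighbouring partitions; the count can shrink but never grow."""
    n = min(n, len(partitions))
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Partition i joins group i * n // len(partitions); its records stay together
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Full shuffle: every record is reassigned (round-robin here), so n can grow."""
    shuffled = [[] for _ in range(n)]
    position = 0
    for part in partitions:
        for record in part:
            shuffled[position % n].append(record)
            position += 1
    return shuffled


partitions = [[1, 2], [3], [4, 5, 6], [7]]
print(coalesce(partitions, 2))     # → [[1, 2, 3], [4, 5, 6, 7]]
print(repartition(partitions, 3))  # → [[1, 4, 7], [2, 5], [3, 6]]
```

Coalescing is cheaper because no record is moved on its own, but it can leave partitions uneven; repartitioning costs a shuffle and in return gives balanced partitions and can raise the partition count.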

Module 5

Spark RDD Persistence:

The execution flow in Spark
Understanding the RDD persistence overview
Spark terminology
Distributed shared memory vs. RDDs
RDD limitations
Spark shell arguments
Distributed persistence
RDD lineage
Key-value pair operations available through implicit conversions, such as countByKey, reduceByKey, sortByKey, and aggregateByKey
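aggregateByKey differs from reduceByKey in that its result type can differ from the value type: it takes a zero value, a function that folds values within a partition, and a function that merges partial results across partitions. The plain-Python model below (not the Spark API) uses this pattern to compute a per-key average; the data is hypothetical.

```python
def aggregate_by_key(partitions, zero, seq_op, comb_op):
    """Model of aggregateByKey: fold within each partition, then merge across partitions."""
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            # seq_op folds one value into this partition's accumulator for the key,
            # starting from the zero value the first time the key is seen
            acc[key] = seq_op(acc.get(key, zero), value)
        partials.append(acc)
    result = {}
    for acc in partials:
        for key, partial in acc.items():
            # comb_op merges accumulators for the same key from different partitions
            result[key] = comb_op(result[key], partial) if key in result else partial
    return result


# Hypothetical (key, value) pairs split across two partitions
partitions = [[("x", 10), ("y", 4)], [("x", 20), ("y", 6), ("y", 8)]]
sums_counts = aggregate_by_key(
    partitions,
    zero=(0, 0),  # (running sum, running count); immutable, so safe to reuse
    seq_op=lambda acc, v: (acc[0] + v, acc[1] + 1),
    comb_op=lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
averages = {k: s / c for k, (s, c) in sums_counts.items()}
print(averages)  # → {'x': 15.0, 'y': 6.0}
```

Carrying a (sum, count) pair instead of a running average is what makes the per-partition results safe to merge.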

Spark MLlib:

Introduction to Machine Learning
Types of Machine Learning
Introduction to MLlib
Various ML algorithms supported by MLlib
Linear regression, logistic regression, decision tree, random forest, and K-means clustering techniques

Hands-on Exercise:

Building a Recommendation Engine

Integrating Apache Flume and Apache Kafka:

What is Kafka and why use it?
Kafka architecture
Kafka workflow
Configuring a Kafka cluster
Kafka operations
Kafka monitoring tools
Integrating Apache Flume and Apache Kafka

Hands-on Exercise:

Configuring Single Node Single Broker Cluster
Configuring Single Node Multi Broker Cluster
Producing and consuming messages
Integrating Apache Flume and Apache Kafka
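Producing and consuming messages rests on two ideas: keyed messages always go to the same partition, and each partition is an append-only log that consumers read by offset. The plain-Python model below illustrates both; it is not the Kafka client API, the class name is hypothetical, and `zlib.crc32` stands in for the murmur2 hash used by Kafka's default partitioner.

```python
import zlib


class Topic:
    """Simplified in-memory model of a Kafka topic: each partition is an append-only log."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Messages with the same key always land in the same partition,
        # which preserves their relative order
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset of this message)

    def consume(self, partition, offset, max_records=10):
        # Reading does not remove messages; the consumer just remembers its next offset
        records = self.partitions[partition][offset:offset + max_records]
        return records, offset + len(records)


topic = Topic(num_partitions=3)
p1, o1 = topic.produce("user-1", "login")
p2, o2 = topic.produce("user-1", "logout")
records, next_offset = topic.consume(p1, 0)
print(p1 == p2, o1, o2, records, next_offset)
```

Because consumers track offsets themselves, several independent consumers can read the same partition at their own pace.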

Spark Streaming:

Introduction to Spark Streaming
Features of Spark Streaming
Spark Streaming workflow
Initializing Streaming Context, discretized Streams (DStreams), input DStreams and Receivers
Transformations on DStreams, output operations on DStreams, windowed operators and why they are useful
Important windowed operators and stateful operators

Hands-on Exercise:

Twitter sentiment analysis
Streaming using a Netcat server
Kafka–Spark streaming
Spark–Flume streaming
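Windowed operators such as `countByWindow` or `reduceByKeyAndWindow` take a window length and a slide interval, both of which must be multiples of the batch interval. The plain-Python sketch below (not the Spark Streaming API) counts hashtags over a sliding window, measuring both parameters in whole batches; the hashtag data is made up for illustration.

```python
from collections import Counter


def windowed_counts(batches, window_len, slide):
    """Count items over a sliding window; batches[i] holds the records of batch i."""
    results = []
    # A window is emitted every `slide` batches and covers the last `window_len`
    # batches (the earliest windows simply contain fewer batches)
    for end in range(slide, len(batches) + 1, slide):
        start = max(0, end - window_len)
        results.append(Counter(tag for batch in batches[start:end] for tag in batch))
    return results


batches = [["#spark"], ["#spark", "#kafka"], ["#kafka"], ["#spark"]]
for window in windowed_counts(batches, window_len=2, slide=1):
    print(dict(window))
```

With a slide shorter than the window, consecutive windows overlap, so each batch contributes to several results; a stateful operator would instead keep a running total across all batches.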

Module 6

Improving Spark Performance:

Introduction to shared variables in Spark, such as broadcast variables
Learning about accumulators
The common performance issues
Troubleshooting the performance problems

Spark SQL and Data Frames:

Learning about Spark SQL
SQLContext in Spark for structured data processing
JSON support in Spark SQL
Working with XML data
Parquet files
Creating a Hive context
Writing a DataFrame to Hive
Understanding the data frames in Spark
Creating Data Frames
Manual inferring of schema
Working with CSV files
Reading JDBC tables
Writing a DataFrame to JDBC
User-defined functions in Spark SQL
Shared variables and accumulators
Learning to query and transform data in data frames
How DataFrames combine the benefits of Spark RDDs and Spark SQL
Deploying Hive on Spark as the execution engine

Scheduling/Partitioning:

Learning about the scheduling and partitioning in Spark
Hash partition
Range partition
Scheduling within and across applications
Static partitioning, dynamic sharing, and fair scheduling
mapPartitionsWithIndex, zip, and groupByKey
Spark master high availability, standby masters with ZooKeeper, single-node recovery with the local file system, and higher-order functions
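Hash and range partitioning differ in what they preserve: hashing spreads keys evenly but scatters neighbouring keys, while range partitioning keeps keys ordered across partitions. The plain-Python sketch below models both; `zlib.crc32` stands in for the key's `hashCode()` used by Spark's `HashPartitioner`, and the boundaries are hard-coded here, whereas Spark's `RangePartitioner` derives them by sampling the data.

```python
import bisect
import zlib


def hash_partition(key, num_partitions):
    """Hash partitioning: spreads keys evenly, but neighbouring keys scatter."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def range_partition(key, upper_bounds):
    """Range partitioning: partition i holds keys <= upper_bounds[i]; the last holds the rest."""
    # upper_bounds must be sorted; len(upper_bounds) + 1 partitions result
    return bisect.bisect_left(upper_bounds, key)


keys = [5, 10, 15, 25, 40]
print([range_partition(k, [10, 20, 30]) for k in keys])  # → [0, 0, 1, 2, 3]
print([hash_partition(k, 4) for k in keys])
```

Range partitioning suits sorting and range queries, since each partition covers a contiguous key interval; hash partitioning suits joins and aggregations where only key equality matters.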

Hadoop Administration – Multi-node Cluster Setup Using Amazon EC2:

Creating a 4-node Hadoop cluster
Running MapReduce jobs on the Hadoop cluster
Successfully running the MapReduce code
Working with the Cloudera Manager setup

Hands-on Exercise:

Building a multi-node Hadoop cluster using Amazon EC2 instances
Working with the Cloudera Manager

Hadoop Administration – Cluster Configuration:

Overview of Hadoop configuration
The importance of Hadoop configuration files
The various parameters and values of configuration
The HDFS parameters and MapReduce parameters
Setting up the Hadoop environment
The Include and Exclude configuration files
The administration and maintenance of name node, data node directory structures, and files
What is the file system image (fsimage)?
Understanding the edit log
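The relationship between the file system image (fsimage) and the edit log can be modelled as a snapshot plus a replay log: at a checkpoint, the logged edits are applied on top of the last snapshot to produce a new one, and the log starts afresh. The plain-Python sketch below is a simplified model; the operation names are hypothetical and do not match the real HDFS edit-log opcodes.

```python
def apply_edit(namespace, edit):
    """Apply one logged namespace change to a {path: size} mapping."""
    op = edit[0]
    if op == "create":
        _, path, size = edit
        namespace[path] = size
    elif op == "delete":
        _, path = edit
        namespace.pop(path, None)
    elif op == "rename":
        _, src, dst = edit
        namespace[dst] = namespace.pop(src)
    else:
        raise ValueError(f"unknown edit: {op}")


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the fsimage; return (new_fsimage, empty_log)."""
    namespace = dict(fsimage)  # never mutate the previous checkpoint
    for edit in edit_log:
        apply_edit(namespace, edit)
    return namespace, []


fsimage = {"/data/a.txt": 10}
edits = [
    ("create", "/data/b.txt", 5),
    ("rename", "/data/a.txt", "/data/c.txt"),
    ("delete", "/data/b.txt"),
]
print(checkpoint(fsimage, edits))  # → ({'/data/c.txt': 10}, [])
```

Without periodic checkpoints the edit log keeps growing, and replaying it on NameNode startup takes correspondingly longer.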

Hands-on Exercise:

The process of performance tuning in MapReduce

Module 7

Hadoop Administration – Maintenance, Monitoring and Troubleshooting:

Introduction to the checkpoint procedure and NameNode failure
Recovery procedures, Safe Mode, metadata and data backup, potential problems and their solutions, what to look for, and how to add and remove nodes

Hands-on Exercise:

How to ensure file system recovery in different scenarios
JMX monitoring of the Hadoop cluster
How to use logs and stack traces for monitoring and troubleshooting
Using the job scheduler to schedule jobs in the same cluster
Understanding the MapReduce job submission flow
FIFO scheduler
Getting to know the Fair Scheduler and its configuration
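The difference between the FIFO scheduler and the Fair Scheduler can be illustrated with a toy allocation of cluster capacity. The sketch below is a simplified model: FIFO serves jobs strictly in submission order, while the fair version uses max-min fair sharing (water-filling) within a single pool, ignoring the weights, minimum shares, and preemption that the real Fair Scheduler supports. The job demands are hypothetical.

```python
def fifo_allocation(capacity, demands):
    """Give each job, in submission order, as much as it asks for until capacity runs out."""
    allocation = []
    for demand in demands:
        granted = min(demand, capacity)
        allocation.append(granted)
        capacity -= granted
    return allocation


def fair_allocation(capacity, demands):
    """Max-min fair share: no job gets more than it needs, leftovers go to the others."""
    allocation = [0.0] * len(demands)
    remaining = capacity
    # Visit jobs from smallest to largest demand so any unused share is redistributed
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for position, job in enumerate(order):
        equal_share = remaining / (len(demands) - position)
        allocation[job] = float(min(demands[job], equal_share))
        remaining -= allocation[job]
    return allocation


demands = [10, 2, 6]  # hypothetical container requests, in submission order
print(fifo_allocation(12, demands))  # → [10, 2, 0]
print(fair_allocation(12, demands))  # → [5.0, 2.0, 5.0]
```

Under FIFO the third job starves until the first finishes; under fair sharing every job makes progress, and the small job gets everything it asked for.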

ETL Connectivity with Hadoop Ecosystem (Self-Paced):

How do ETL tools work in the Big Data industry?
Introduction to ETL and data warehousing
Working with prominent use cases of Big Data in the ETL industry
An end-to-end ETL PoC showing Big Data integration with an ETL tool

Hands-on Exercise:

Connecting to HDFS from an ETL tool
Moving data from the local system to HDFS
Moving data from a DBMS to HDFS
Working with Hive from an ETL tool
Creating a MapReduce job in an ETL tool

Hadoop Application Testing:

Importance of testing
Unit testing, Integration testing, Performance testing, Diagnostics, Nightly QA test, Benchmark and end-to-end tests, Functional testing, Release certification testing, Security testing, Scalability testing, Commissioning and Decommissioning of data nodes testing, Reliability testing, and Release testing

Roles and Responsibilities of Hadoop Testing Professional:

Understanding the Requirement
Preparation of the Testing Estimation
Test cases, test data, test bed creation, test execution, defect reporting, defect retesting, daily status report delivery, and test completion; ETL testing at every stage (HDFS, Hive, and HBase) while loading the input (logs, files, records, etc.) using Sqoop/Flume, including but not limited to data verification, reconciliation, and user authorization and authentication testing (groups, users, privileges, etc.); reporting defects to the development team or manager and driving them to closure
Consolidating all the defects and creating defect reports
Validating new features and issues in core Hadoop

Module 8

Framework Called MRUnit for Testing of MapReduce Programs:

Reporting defects to the development team or manager and driving them to closure
Consolidating all the defects and creating defect reports
Using the MRUnit framework to test MapReduce programs

Unit Testing:

Automated testing using Oozie
Data validation using the QuerySurge tool

Test Execution:

Test plan for HDFS upgrade
Test automation and result

Test Plan Strategy and Writing Test Cases for Testing Hadoop Application:

Test, install and configure

Career Opportunities

Hadoop/Big Data Developer
Hadoop Administrator
Data Engineer
Big Data Architect
Machine Learning Engineer
Software Development Engineer
Big Data Engineer
Big Data Consultant

Entry Qualification

Candidates will be admitted on the basis of interviews and / or group discussions.
20% of the total seats will be reserved for SC, ST, and OBC candidates. If the reserved seats are not filled within the specified period, the vacant seats will be offered to general candidates.

Course Features

Instructor: Industry-experienced trainer
Rating: 4.9 (Google Review)
Study Mode: Offline & Online
Duration: 6 months
Language: English, Bengali, Hindi
100% Job Assistance: Yes
Internship: Free & Paid

Our Students' Testimonials

Oxford Global Academy of Excellence Courses

MD Kashid Hossain

I am MD Kasid Hossain. I am a student of Oxford Global Academy of Excellence, Kolkata, where I am taking the Spoken English class. Oxford Global Academy of Excellence is a very advantageous platform for spoken English, computer courses, and more. The sirs and madams there are very helpful. They always support and guide us. I always enjoy my classes.
