Big Data & Hadoop Training

8700 Learners


Course Objective:

On completion of the training, participants will be able to analyze records stored in a wide range of file formats using the appropriate Apache tools, set up a Hadoop cluster, and store and process real-time streaming data. This kind of data analysis serves many purposes, such as business intelligence, forecasting across industries, healthcare, cyber crime, and credit card and loan fraud detection in banks, and it can be done quickly and with ease.

Learn Big Data and Hadoop: a course for beginners and professionals


Course Content:

Introduction to Big Data
  • Introduction to Big Data
  • Different Sources of Big Data
  • Challenges in Big Data
  • Data
  • Data Storage and Analysis
  • Comparison with Other Systems
  • RDBMS
  • Grid Computing
  • Volunteer Computing
  • A Brief History of Hadoop
  • Apache Hadoop and the Hadoop Ecosystem
  • Hadoop Releases
  • Why companies are working with Big Data
  • V’s of Big Data
  • Why Big Data is a problem and how it can be addressed
  • Introduction to Hadoop and why it came into existence
HDFS Architecture
  • HDFS Architecture (how data is stored in a distributed environment)
  • The Design of HDFS
  • HDFS Concepts
  • Blocks
  • Namenodes and Datanodes
  • HDFS Federation
  • HDFS High-Availability
HDFS Admin
  • The Command-Line Interface
  • Basic File system Operations
  • Hadoop File systems
  • Interfaces
  • The Java Interface
  • Reading Data from a Hadoop URL
  • Reading Data Using the FileSystem API (see the sketch at the end of this module)
  • Writing Data
  • Directories
  • Querying the File system
  • Deleting Data
  • Data Flow
  • Anatomy of a File Read
  • Anatomy of a File Write
  • Coherency Model
  • Parallel Copying with distcp
  • Hadoop I/O
  • Data Integrity
  • Data Integrity in HDFS
  • Local File System
  • Checksum File System
  • Compression
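
A short Java sketch can tie the FileSystem API topics above together. It reads a file from HDFS and prints it to standard output; the NameNode URI and file path are placeholder assumptions, not part of the course material, so adjust them for your own cluster.

    import java.io.InputStream;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    // Minimal sketch: read a file from HDFS and copy it to stdout.
    // The URI below is an assumption; point it at your own NameNode and file.
    public class HdfsCat {
        public static void main(String[] args) throws Exception {
            String uri = "hdfs://namenode:9000/user/demo/sample.txt";
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(uri), conf);
            InputStream in = null;
            try {
                in = fs.open(new Path(uri));                    // returns an FSDataInputStream
                IOUtils.copyBytes(in, System.out, 4096, false); // stream the contents to stdout
            } finally {
                IOUtils.closeStream(in);
            }
        }
    }

The same FileSystem handle also exposes create(), delete(), and listStatus(), which cover the writing, deleting, and querying topics listed above.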
Configure Hadoop Using a Hadoop Distribution
  • Hadoop installation
Hadoop Admin Commands
  • Hadoop Commands (Developer, Admin)
MapReduce Programming
  • Concept of MapReduce
  • A Weather Dataset
  • Data Format
  • Analyzing the Data with Unix Tools
  • Analyzing the Data with Hadoop
  • Map and Reduce
  • Java Map Reduce
  • Scaling Out
  • Data Flow
  • Combiner Functions
  • Running a Distributed Map Reduce Job
  • Hadoop Streaming
  • Compiling and Running
  • Practicals on MapReduce (see the sketch below)
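
As a preview of the Java MapReduce topics above, here is a minimal max-temperature job in the spirit of the weather-dataset example. It assumes a simplified input of one year<TAB>temperature pair per line (real NCDC records need more parsing), so treat it as a sketch rather than the exact classroom exercise.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MaxTemperature {

        // Mapper: parse "year<TAB>temperature" lines and emit (year, temperature).
        public static class TempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] parts = value.toString().split("\t");
                if (parts.length == 2) {
                    context.write(new Text(parts[0]),
                                  new IntWritable(Integer.parseInt(parts[1].trim())));
                }
            }
        }

        // Reducer: keep the maximum temperature seen for each year.
        public static class MaxReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int max = Integer.MIN_VALUE;
                for (IntWritable v : values) {
                    max = Math.max(max, v.get());
                }
                context.write(key, new IntWritable(max));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "max temperature");
            job.setJarByClass(MaxTemperature.class);
            job.setMapperClass(TempMapper.class);
            job.setCombinerClass(MaxReducer.class); // max is associative, so the reducer doubles as a combiner
            job.setReducerClass(MaxReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a jar, the job runs with hadoop jar against an input and an output directory, and the combiner line shows where combiner functions fit into the data flow described above.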
Pig Programming
  • Introduction to Apache Pig
  • Installing and Running Pig
  • Execution Types
  • Running Pig Programs (see the sketch after this module)
  • Grunt
  • Pig Latin Editors
  • An Example
  • Generating Examples
  • Comparison with Databases
  • Pig Latin
  • Structure
  • Statements
  • Expressions
  • Types
  • Schemas
  • Functions
  • Macros
  • User-Defined Functions
  • A Filter UDF
  • An Eval UDF
  • A Load UDF
  • Data Processing Operators
  • Loading and Storing Data
  • Filtering Data
  • Grouping and Joining Data
  • Sorting Data
  • Combining and Splitting Data
  • Extending Pig
  • Adding Flexibility with Parameters
  • Macros and Imports
  • UDFs
  • Contributed Functions
  • Using Other Languages to Process Data with Pig
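
The sketch below shows one way the Pig Latin statements above can be run programmatically, using Pig's embedded PigServer API in local mode. The input file, its schema, and the filter condition are made-up examples.

    import java.util.Iterator;

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;
    import org.apache.pig.data.Tuple;

    // Minimal sketch: embed Pig and run a load/filter/group pipeline in local mode.
    // The input file and its (name, dept, salary) schema are hypothetical.
    public class PigSketch {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer(ExecType.LOCAL);
            pig.registerQuery("emp = LOAD 'employees.txt' USING PigStorage('\\t') "
                    + "AS (name:chararray, dept:chararray, salary:int);");
            pig.registerQuery("high = FILTER emp BY salary > 50000;");
            pig.registerQuery("by_dept = GROUP high BY dept;");
            pig.registerQuery("counts = FOREACH by_dept GENERATE group, COUNT(high);");

            // Iterating over the final alias triggers execution and prints each tuple.
            Iterator<Tuple> it = pig.openIterator("counts");
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }

The same statements can of course be typed interactively in Grunt or saved as a .pig script and run with the pig command.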
UDFs in Pig
  • Hands-On Exercise: Extending Pig with Streaming and UDFs (see the Java UDF sketch below)
  • Practicals on Apache Pig
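
Pig eval UDFs are usually written in Java by extending EvalFunc. The following is a minimal sketch that upper-cases the first field of its input tuple; the class name is illustrative.

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Minimal eval UDF sketch: upper-case the first field of the input tuple.
    public class UpperCase extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().toUpperCase();
        }
    }

In Grunt it would be registered and called along the lines of REGISTER myudfs.jar; followed by B = FOREACH A GENERATE UpperCase(name);, where the jar and relation names are hypothetical.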
Introduction to Hive
  • Hive and its concepts
  • Installing Hive
  • The Hive Shell
  • An Example
  • Running Hive
  • Configuring Hive
  • Hive Services
  • Comparison with Traditional Databases
  • Schema on Read Versus Schema on Write
  • Updates, Transactions, and Indexes
  • HiveQL
  • Data Types
  • Operators and Functions
  • Tables
  • Managed Tables and External Tables
  • Partitions and Buckets
  • Storage Formats
  • Importing Data
  • Altering Tables
  • Dropping Tables
  • Querying Data
  • Sorting and Aggregating
  • Map Reduce Scripts
  • Joins
  • Subqueries
Hive Programming
  • Relational Data Analysis with Hive
  • Hive Databases and Tables
  • Basic HiveQL Syntax
  • Joining Data Sets
  • Common Built-in Functions
  • Practicals based on Hive (see the HiveQL sketch below)
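
To make the basic HiveQL syntax and joins concrete, the sketch below runs a join with aggregation from Java through the HiveServer2 JDBC driver. The connection URL and the employees/departments tables are assumptions; the same statements can be typed directly into the Hive shell or Beeline.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Minimal sketch: run a HiveQL join through the HiveServer2 JDBC driver.
    // The URL and the employees/departments tables are hypothetical.
    public class HiveQuerySketch {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
                 Statement stmt = conn.createStatement()) {

                ResultSet rs = stmt.executeQuery(
                    "SELECT d.dept_name, COUNT(*) AS emp_count, AVG(e.salary) AS avg_salary "
                  + "FROM employees e JOIN departments d ON e.dept_id = d.dept_id "
                  + "GROUP BY d.dept_name");

                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2) + "\t" + rs.getDouble(3));
                }
            }
        }
    }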
Advanced Hive
  • Hive Optimization
  • Understanding Query Performance
  • Controlling Job Execution Plan
  • Partitioning
  • Bucketing
  • Indexing Data
  • Extending Hive
  • SerDes
  • Data Transformation with Custom Scripts
  • User-Defined Functions (see the sketch after this module)
  • Parameterized Queries
  • Hands-On Exercise: Data Transformation with Hive
  • Choosing the Best Tool for the Job
  • Comparing Map Reduce, Pig, Hive and Relational Databases
  • Which to Choose?
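
Extending Hive with a user-defined function looks much like the Pig UDF earlier. The sketch below uses the simple UDF base class to trim whitespace from a string column; the names are illustrative, and newer Hive releases favour GenericUDF for anything beyond simple cases.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Minimal Hive UDF sketch: trim leading/trailing whitespace from a string column.
    // Uses the simple UDF base class; GenericUDF is the more modern alternative.
    public class TrimUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim());
        }
    }

It would be wired into a session with ADD JAR and CREATE TEMPORARY FUNCTION before being used in a query; the jar and function names are again hypothetical.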
RDBMS Connection Using Sqoop
  • Sqoop and its concepts
  • Practicals based on Sqoop
Data Streaming Connection Using Flume
  • Flume and its concepts
  • Practicals on Flume (see the sketch below)
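
Most Flume work happens in the agent's properties file, but events can also be pushed to an agent from Java through the Flume client SDK, which is handy for the practicals. The host and port below are assumptions and must match the agent's Avro source configuration.

    import java.nio.charset.StandardCharsets;

    import org.apache.flume.Event;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    // Minimal sketch using the Flume client SDK: send one event to an Avro source.
    // Host and port are assumptions; they must match the agent's source configuration.
    public class FlumeSender {
        public static void main(String[] args) throws Exception {
            RpcClient client = RpcClientFactory.getDefaultInstance("flume-agent-host", 41414);
            try {
                Event event = EventBuilder.withBody("hello from the client SDK", StandardCharsets.UTF_8);
                client.append(event); // delivers the event to the agent's source
            } finally {
                client.close();
            }
        }
    }

A matching agent configuration would define an Avro source on that port, a channel, and a sink such as HDFS.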
Installation of Hadoop
  • Setting Up a Hadoop Cluster
  • Cluster Specification
  • Network Topology
  • Cluster Setup and Installation
  • Installing Java
  • Creating a Hadoop User
  • Installing Hadoop
  • Testing the Installation
  • SSH Configuration
  • Hadoop Configuration
  • Configuration Management
  • Environment Settings
  • Important Hadoop Daemon Properties
  • Hadoop Daemon Addresses and Ports
  • Other Hadoop Properties
  • User Account Creation
  • YARN Configuration
  • Important YARN Daemon Properties
  • YARN Daemon Addresses and Ports
  • Security
Doubt Session & Revision
Q & A

Key Features

  • Gain the skills and competencies required by industry, taught by experts.
  • Work on real-time projects relevant to the course you select.
  • Students work in a professional corporate environment.
  • Get a globally recognized certificate from WebTek with our partner logos.
  • Global brand recognition for placements.

Includes

  • Course Duration: 4 – 6 Weeks
  • Regular Batches: Online / Offline
