Big Data & Hadoop

10,000 Learners


Course Objective:

On completion of the training, participants will be able to analyse records stored in a wide range of file formats using the appropriate Apache tools, set up a Hadoop cluster, and store and process real-time streaming data. Such data analysis serves many purposes, including business intelligence, forecasting across industries, healthcare, cyber-crime investigation, and detection of credit-card and loan fraud in banks, and Hadoop makes it possible in less time and with ease.


Course Content:

Lesson: 1
  • Introduction to Big Data
  • Different Sources of Big Data
  • Challenges in Big Data
  • Data
  • Data Storage and Analysis
  • Comparison with Other Systems
  • RDBMS
  • Grid Computing
  • Volunteer Computing
  • A Brief History of Hadoop
  • Apache Hadoop and the Hadoop Ecosystem
  • Hadoop Releases
  • Why companies are working with Big Data
  • V’s of Big Data
  • Why Big Data is a problem and how it can be resolved
  • Introduction to Hadoop and why it came into existence
Lesson: 2
  • HDFS Architecture (how data is stored in a distributed environment)
  • The Design of HDFS
  • HDFS Concepts
  • Blocks
  • Namenodes and Datanodes
  • HDFS Federation
  • HDFS High-Availability
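
A minimal Java sketch of the block and DataNode ideas in Lesson 2: it asks the NameNode where each block of one file is stored. The file path /user/demo/sample.txt is a placeholder chosen for the example, not part of the course material.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Lists the block locations of one HDFS file: one BlockLocation per block,
    // each naming the DataNodes that hold a replica.
    public class BlockReport {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();        // picks up core-site.xml
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/demo/sample.txt");   // placeholder path
            FileStatus status = fs.getFileStatus(file);

            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
            fs.close();
        }
    }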
Lesson: 3
  • The Command-Line Interface
  • Basic File system Operations
  • Hadoop File systems
  • Interfaces
  • The Java Interface
  • Reading Data from a Hadoop URL
  • Reading Data Using the File System API
  • Writing Data
  • Directories
  • Querying the File system
  • Deleting Data
  • Data Flow
  • Anatomy of a File Read
  • Anatomy of a File Write
  • Coherency Model
  • Parallel Copying with distcp
  • Hadoop I/O
  • Data Integrity
  • Data Integrity in HDFS
  • Local File System
  • Checksum File System
  • Compression
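
As a companion to the "Reading Data Using the File System API" topic above, here is a minimal sketch that copies an HDFS file to standard output. The input path is a placeholder; any readable HDFS file works.

    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    // Reads an HDFS file through the FileSystem API and writes it to stdout.
    public class HdfsCat {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            InputStream in = null;
            try {
                in = fs.open(new Path("/user/demo/input.txt"));   // placeholder path
                // Copy in 4 KB buffers; the stream is closed explicitly below.
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                IOUtils.closeStream(in);
            }
        }
    }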
Lesson: 4
  • Hadoop installation
Lesson: 5
  • Hadoop Commands (Developer, Admin)
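
The same file-system commands covered here from the command line (ls, mkdir, put, and so on) can also be driven from Java. A small sketch, assuming the standard FsShell and ToolRunner classes shipped with Hadoop; the paths are placeholders for the example.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FsShell;
    import org.apache.hadoop.util.ToolRunner;

    // Runs "hdfs dfs"-style commands programmatically through FsShell.
    public class ShellFromJava {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            int mkdir = ToolRunner.run(conf, new FsShell(),
                    new String[] {"-mkdir", "-p", "/user/demo"});   // placeholder path
            int ls = ToolRunner.run(conf, new FsShell(),
                    new String[] {"-ls", "/user"});

            System.out.println("mkdir exit code: " + mkdir + ", ls exit code: " + ls);
        }
    }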
Lesson: 6
  • Concept of MapReduce
  • A Weather Dataset
  • Data Format
  • Analyzing the Data with Unix Tools
  • Analyzing the Data with Hadoop
  • Map and Reduce
  • Java MapReduce
  • Scaling Out
  • Data Flow
  • Combiner Functions
  • Running a Distributed MapReduce Job
  • Hadoop Streaming
  • Compiling and Running
  • Practicals on MapReduce
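
A minimal MapReduce sketch for the weather example in Lesson 6: maximum temperature per year. To keep it self-contained, it assumes a simplified input of one year<TAB>temperature pair per line rather than the raw dataset used in class.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Maximum temperature per year over "year<TAB>temp" lines (simplified input format).
    public class MaxTemperature {

        public static class TempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] parts = value.toString().split("\t");
                if (parts.length == 2) {
                    context.write(new Text(parts[0]), new IntWritable(Integer.parseInt(parts[1])));
                }
            }
        }

        public static class MaxReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text year, Iterable<IntWritable> temps, Context context)
                    throws IOException, InterruptedException {
                int max = Integer.MIN_VALUE;
                for (IntWritable t : temps) {
                    max = Math.max(max, t.get());
                }
                context.write(year, new IntWritable(max));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "max temperature");
            job.setJarByClass(MaxTemperature.class);
            job.setMapperClass(TempMapper.class);
            job.setCombinerClass(MaxReducer.class);   // max is associative, so the reducer also works as a combiner
            job.setReducerClass(MaxReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }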
Lesson: 7
  • Introduction to Apache Pig
  • Installing and Running Pig
  • Execution Types
  • Running Pig Programs
  • Grunt
  • Pig Latin Editors
  • An Example
  • Generating Examples
  • Comparison with Databases
  • Pig Latin
  • Structure
  • Statements
  • Expressions
  • Types
  • Schemas
  • Functions
  • Macros
  • User-Defined Functions
  • A Filter UDF
  • An Eval UDF
  • A Load UDF
  • Data Processing Operators
  • Loading and Storing Data
  • Filtering Data
  • Grouping and Joining Data
  • Sorting Data
  • Combining and Splitting Data
  • Extending Pig
  • Adding Flexibility with Parameters
  • Macros and Imports
  • UDFs
  • Contributed Functions
  • Using Other Languages to Process Data with Pig
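
A short sketch of the "An Eval UDF" topic in Lesson 7: a Java UDF that upper-cases the first field of each tuple. The package, class and jar names are examples only.

    package com.example.pig;                      // example package name

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Example Eval UDF. In Pig Latin it would be used roughly as:
    //   REGISTER myudfs.jar;
    //   B = FOREACH A GENERATE com.example.pig.UpperCase($0);
    public class UpperCase extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;          // returning null simply yields null for that row
            }
            return input.get(0).toString().toUpperCase();
        }
    }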
Lesson: 8
  • Hands-On Exercise: Extending Pig with Streaming and UDFs
  • Practicals on Apache Pig
Lesson: 9
  • Hive and its concepts
  • Installing Hive
  • The Hive Shell
  • An Example
  • Running Hive
  • Configuring Hive
  • Hive Services
  • Comparison with Traditional Databases
  • Schema on Read Versus Schema on Write
  • Updates, Transactions, and Indexes
  • HiveQL
  • Data Types
  • Operators and Functions
  • Tables
  • Managed Tables and External Tables
  • Partitions and Buckets
  • Storage Formats
  • Importing Data
  • Altering Tables
  • Dropping Tables
  • Querying Data
  • Sorting and Aggregating
  • MapReduce Scripts
  • Joins
  • Subqueries
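
A minimal sketch of querying Hive from Java over JDBC, one of the Hive services covered above. The host, port, database and table names are placeholders for the example.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Runs a HiveQL query against HiveServer2 and prints the result rows.
    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://localhost:10000/default", "", "");   // placeholder host/port
                 Statement stmt = conn.createStatement()) {

                // Schema on read: the table definition is applied when the query runs.
                ResultSet rs = stmt.executeQuery(
                        "SELECT year, MAX(temperature) FROM records GROUP BY year");  // placeholder table
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getInt(2));
                }
            }
        }
    }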
Lesson: 10
  • Relational Data Analysis with Hive
  • Hive Databases and Tables
  • Basic HiveQL Syntax
  • Joining Data Sets
  • Common Built-in Functions
  • Practicals based on Hive
Lesson: 11
  • Hive Optimization
  • Understanding Query Performance
  • Controlling Job Execution Plan
  • Partitioning
  • Bucketing
  • Indexing Data
  • Extending Hive
  • SerDes
  • Data Transformation with Custom Scripts
  • User-Defined Functions
  • Parameterized Queries
  • Hands-On Exercise: Data Transformation with Hive
  • Choosing the Best Tool for the Job
  • Comparing MapReduce, Pig, Hive and Relational Databases
  • Which to Choose?
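
A small sketch of the User-Defined Functions topic from this lesson: a Hive UDF that trims whitespace from a string column. Package, jar and table names are examples only.

    package com.example.hive;                     // example package name

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Example Hive UDF. In HiveQL it would be wired up roughly as:
    //   ADD JAR myudfs.jar;
    //   CREATE TEMPORARY FUNCTION strip AS 'com.example.hive.Strip';
    //   SELECT strip(name) FROM customers;
    public class Strip extends UDF {
        private final Text result = new Text();

        public Text evaluate(Text input) {
            if (input == null) {
                return null;                      // pass NULLs through, as the built-ins do
            }
            result.set(input.toString().trim());
            return result;
        }
    }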
Lesson: 12
  • Sqoop and its concepts
  • Practicals based on Sqoop
Lesson: 13
  • Flume and its concepts
  • Practicals on Flume
Lesson: 14
  • Setting Up a Hadoop Cluster
  • Cluster Specification
  • Network Topology
  • Cluster Setup and Installation
  • Installing Java
  • Creating a Hadoop User
  • Installing Hadoop
  • Testing the Installation
  • SSH Configuration
  • Hadoop Configuration
  • Configuration Management
  • Environment Settings
  • Important Hadoop Daemon Properties
  • Hadoop Daemon Addresses and Ports
  • Other Hadoop Properties
  • User Account Creation
  • YARN Configuration
  • Important YARN Daemon Properties
  • YARN Daemon Addresses and Ports
  • Security
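
A small sketch that reads back some of the daemon properties discussed in Lesson 14 through the Configuration API, one way to verify a cluster's settings from client code. The /etc/hadoop/conf paths are placeholders, and the literal values in the second argument are only fallbacks for the example.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // Prints a few of the important Hadoop and YARN daemon properties.
    public class ShowConfig {
        public static void main(String[] args) {
            Configuration conf = new Configuration();    // loads core-site.xml from the classpath
            conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));   // placeholder path
            conf.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));   // placeholder path

            System.out.println("fs.defaultFS                  = " + conf.get("fs.defaultFS", "file:///"));
            System.out.println("dfs.replication               = " + conf.get("dfs.replication", "3"));
            System.out.println("dfs.blocksize                 = " + conf.get("dfs.blocksize", "134217728"));
            System.out.println("yarn.resourcemanager.hostname = " + conf.get("yarn.resourcemanager.hostname", "0.0.0.0"));
        }
    }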
Doubt session & Revision
Q & A

Key Features

  • Gain the skills and competencies required in industry, taught by experts.
  • Work on real-time projects, depending upon the course you select.
  • Students work in a professional corporate environment.
  • Get a globally recognized certificate from WebTek with our partner logos.
  • Global brand recognition for placements.

Includes

  • 45 - 60 hrs
  • Regular Batches: 1st Yr / 2nd Yr / 3rd Yr / 4th Yr B.Tech. / Diploma / MCA / BCA students