BIG DATA ANALYSIS USING PYTHON AND R


₹ 10500

    

80

Given the heavy demand for Big Data and Hadoop skills, a thorough knowledge of the subject is valuable, and its real-world applications are hard to ignore. This course is designed to help learners work with these tools in practice. Big Data and Hadoop are among the most important subjects of this era, as their applications and the demand for them keep growing every day. Hadoop is widely used for its cost-effectiveness, flexibility, and resilience to failure, among other features. The course covers key modules including an introduction to Hadoop and its installation, Linux, working with HDFS, Hive and Sqoop, Pig, Spark with Scala, and Flume. Each module is taught with reference to real-world problems.

Prerequisites: Basic knowledge of RDBMS and SQL

Syllabus

1. Big Data
i. Characteristics of Big Data
ii. Importance of Big Data in the corporate world
iii. Importance of data governance for managing Big Data
iv. Examples of Big Data usage in different areas

2. Introduction to Hadoop
i. Hadoop products developed by various vendors
ii. Key Components of Hadoop (HDFS and MapReduce)
iii. Identifying the various Hadoop processes/daemons
iv. Introduction to other components of Hadoop ecosystem
v. Master the concepts of Hadoop Distributed File System and MapReduce framework
vi. Understanding NameNode
vii. Understanding DataNode
viii. Secondary NameNode

3. Linux
i. Linux introduction
ii. Linux Installation
iii. Basic Commands of Linux

4. Hadoop Installation
i. Hadoop installation and configuration
ii. Hadoop 2 installation on Linux
iii. Setting up a two-node Hadoop cluster
iv. Commands for Hadoop HDFS
v. Verifying failover in a Hadoop cluster
vi. Using the Hadoop web UI
vii. Working with various configuration files
viii. Hands-on exercises

5. Working with HDFS
i. Basic file commands
ii. Reading & Writing to HDFS
iii. Copying, moving, and removing files in HDFS from different platforms
iv. Web-based user interface
v. Viewing jobs in the web UI
vi. Hands-on exercises

6. Hive
i. Introduction to Hive
ii. Hive installation in Hadoop
iii. Details of Hive commands and HQL
iv. Processing data in Hive
v. Hive in client-server mode
vi. Hands-on exercises

7. Sqoop
i. Introduction to Sqoop
ii. Sqoop installation in Hadoop
iii. Importing RDBMS data into Hadoop and exporting it back with Sqoop
iv. Commands to process data with Sqoop
v. Advanced Sqoop commands
vi. Sqoop job creation
vii. Hands-on exercises

8. Pig
i. Introduction to Pig
ii. Pig installation in Hadoop
iii. Basic data processing with Pig
iv. Hands-on exercises

9. MapReduce
i. Introduction to MapReduce
ii. Examples of MapReduce programming
iii. Hadoop Streaming
iv. Executing a MapReduce program using PHP
v. Executing a MapReduce program using Java
vi. Hands-on exercises

10. Flume
i. Introduction to Flume
ii. Flume installation and configuration in Hadoop
iii. Processing log files using Flume
iv. Maintaining Flume with Hive
v. Hands-on exercises

11. Spark with Scala
i. Spark and Scala
ii. Introduction to Spark
iii. Concept of Spark Architecture
iv. Limitations of MapReduce in Hadoop
v. Spark vs. Hadoop
vi. Introduction to Scala
vii. Writing Scala programs
viii. Installation of Spark in a Hadoop cluster
ix. Spark web UI
x. Writing MapReduce jobs in Spark
xi. Introduction to RDD
xii. Creating RDD
xiii. RDD operations and methods in Spark
xiv. Introduction to Spark SQL

12. Project discussion
i. Data analysis project using Big Data and Hadoop