BIG DATA ANALYTICS

Unit - I Introduction to Big Data Analytics and Data Architecture

1.1 Classification of Data : Structured, Semi-structured and Unstructured

1.2 Introduction : Big Data Definitions, Need for Big Data

1.3 Big Data Characteristics : Volume, Velocity, Variety, Veracity

1.4 Big Data Types

1.5 Big Data Processing Architecture Design

1.6 Big Data Analytics : Data Analytics Definitions, Phases in Analytics

1.7 Big Data Analytics Applications : Big Data in Marketing and Sales, Big Data and Healthcare, Big Data in Medicine, Big Data in Advertising

Unit - II Introduction to Hadoop and MapReduce

2.1 Introduction to Hadoop

2.2 Hadoop and its Ecosystem : Hadoop Core Components, Features of Hadoop, Hadoop Ecosystem Components

2.3 Hadoop Distributed File System : HDFS Data Storage, HDFS Commands for Interacting with Files in HDFS

2.4 MapReduce Framework and Programming Model : Hadoop MapReduce Framework, MapReduce Programming Model

2.5 Hadoop YARN : Hadoop 2 Execution Model

2.6 MapReduce : Map Tasks, Key-Value Pairs, Grouping by Key, Partitioning, Combiners, Reduce Tasks, Details of MapReduce Processing Steps

Unit - III NoSQL Databases and Big Data Management

3.1 Introduction to NoSQL in Big Data

3.2 NoSQL Data Store : NoSQL, CAP Theorem, Schema-less Models

3.3 NoSQL Data Architecture Patterns : Key-Value Store, Document Store, Tabular Data, Object Data Store, Graph Database

3.4 Using NoSQL to Manage Big Data

3.5 MongoDB Database

Unit - IV Hive and Pig

4.1 Introduction to Hive : Hive Characteristics, Limitations

4.2 Hive Architecture

4.3 Hive Data Types and File Formats

4.4 Hive Integration and Workflow Steps

4.5 Hive Built-in Functions

4.6 HiveQL : HiveQL DDL, HiveQL DML, HiveQL for Querying the Data

4.7 Introduction to Pig : Applications of Apache Pig, Features of Pig, Comparison of Pig with SQL, MapReduce and Hive

4.8 Pig Architecture

4.9 Pig Latin Data Model

Unit - V Spark and Real-Time Analytics

5.1 Introduction to the Big Data Tool Spark : Main Components of Spark Architecture, Features of Spark, Spark Software Stack

5.2 Introduction to Data Analysis with Spark : Spark SQL

5.3 Programming with RDDs and Machine Learning with MLlib

5.4 Data ETL (Extract, Transform and Load) Process : Composing Spark Program Steps for ETL

5.5 Analytics, Reporting and Visualization

5.6 Apache Spark Streaming Platform : Spark Streaming Architecture, Spark Streaming vs. Structured Streaming, Internal Working of Spark Streaming

5.7 Spark Streaming Characteristics : Scalability, Fault Tolerance and Load Balancing
