Hadoop for Business Analysts Training
Section 1: Introduction to Hadoop
Hadoop history, concepts
ecosystem
distributions
high level architecture
Hadoop myths
Hadoop challenges
hardware / software
labs: first look at Hadoop
Section 2: HDFS Overview
concepts (horizontal scaling, replication, data locality, rack awareness)
architecture (NameNode, Secondary NameNode, DataNode)
data integrity
future of HDFS: NameNode HA, Federation
labs: interacting with HDFS
Section 3: MapReduce Overview
MapReduce concepts
daemons: JobTracker / TaskTracker
phases: driver, mapper, shuffle/sort, reducer
thinking in MapReduce
future of MapReduce (YARN)
labs: running a MapReduce program
Section 4: Pig
Pig vs. Java MapReduce
Pig Latin language
user defined functions
understanding Pig job flow
basic data analysis with Pig
complex data analysis with Pig
working with multiple datasets in Pig
advanced concepts
labs: writing Pig scripts to analyze / transform data
Section 5: Hive
Hive concepts
architecture
SQL support in Hive
data types
table creation and queries
Hive data management
partitions & joins
text analytics
labs (multiple): creating Hive tables and running queries, joins, using partitions, using text analytics functions
Section 6: BI Tools for Hadoop
BI tools and Hadoop
Overview of current BI tools landscape
Choosing the best tool for the job