Getting Started with HDFS
Learning to work with the Hadoop Distributed File System (HDFS) is a baseline skill for anyone administering or developing in the Hadoop ecosystem. In this course, you will learn how to work with HDFS, Hive, Pig, Sqoop, and HBase from the command line.
Getting Started with Hadoop Distributed File System (HDFS) is designed to give you everything you need to use HDFS to read, store, and remove files. In addition to working with files in Hadoop, you will learn how to take data from relational databases and import it into HDFS using Sqoop. Once the data is inside HDFS, you will learn how to query it with Pig and Hive. Building on those HDFS skills, you will then look at how to use HBase for near real-time data processing. Whether you are a developer, administrator, or data analyst, the concepts in this course are essential to getting started with HDFS.
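As a taste of the command-line work covered in the course, the basic HDFS file operations look like the following. This is an illustrative sketch: it assumes a running Hadoop cluster, an `hdfs` client on the PATH, and hypothetical paths such as `/user/demo/input` and `localfile.txt`.

```shell
# Create a directory in HDFS (paths here are hypothetical examples)
hdfs dfs -mkdir -p /user/demo/input

# Store: copy a local file into HDFS
hdfs dfs -put localfile.txt /user/demo/input

# Read: print the file's contents from HDFS
hdfs dfs -cat /user/demo/input/localfile.txt

# Remove: delete the file from HDFS
hdfs dfs -rm /user/demo/input/localfile.txt
```

The `hdfs dfs` subcommands deliberately mirror familiar Unix file commands (`mkdir`, `cat`, `rm`), which is why command-line fluency transfers quickly to HDFS.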
Author Name: Thomas Henson
Author Description:
Thomas is a Senior Software Engineer and Certified ScrumMaster. During his career he has been involved in many projects, from building web applications to setting up Hadoop clusters. Thomas's specialization is in the Hortonworks Data Platform and Agile software development. Thomas is a proud alumnus of the University of North Alabama, where he received his BBA in Computer Information Systems and his MBA in Information Systems. He currently resides in north Alabama with his wife and daughter, where …
Table of Contents
- Understanding HDFS (19 mins)
- Creating, Manipulating, and Retrieving HDFS Files (47 mins)
- Transferring Relational Data to HDFS Using Sqoop (22 mins)
- Querying Data with Pig and Hive (36 mins)
- Processing Sparse Data with HBase (24 mins)
- Automating Basic HDFS Operations (18 mins)