Go beyond the basics and master the next generation of Hadoop data processing platforms
Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you wish to enhance your knowledge of Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries running slower than you expect? Are you looking to understand the benefits of upgrading Hadoop? If you answered yes to any of these questions, this book is for you. It assumes novice-level familiarity with Hadoop.
Hadoop is synonymous with Big Data processing. Its simple programming model, “code once and deploy at any scale” paradigm, and ever-growing ecosystem make Hadoop an all-encompassing platform for programmers with different levels of expertise.
This book explores industry guidelines for optimizing MapReduce jobs and higher-level abstractions such as Pig and Hive in Hadoop 2.0. It then dives deep into Hadoop 2.0-specific features such as YARN and HDFS Federation.
This book is a step-by-step guide that focuses on advanced Hadoop concepts and aims to take your Hadoop knowledge and skill set to the next level. The data processing flow dictates the order of the concepts in each chapter, and each chapter is illustrated with code fragments or schematic diagrams.
Chapter 1: Hadoop 2.X
Chapter 2: Advanced MapReduce
Chapter 3: Advanced Pig
Chapter 4: Advanced Hive
Chapter 5: Serialization and Hadoop I/O
Chapter 6: YARN – Bringing Other Paradigms to Hadoop
Chapter 7: Storm on YARN – Low Latency Processing in Hadoop
Chapter 8: Hadoop on the Cloud
Chapter 9: HDFS Replacements
Chapter 10: HDFS Federation
Chapter 11: Hadoop Security
Chapter 12: Analytics Using Hadoop
Appendix: Hadoop for Microsoft Windows