Analyze, Architect, and Innovate with Databricks Lakehouse
Key Features
Description
The Databricks Lakehouse is a groundbreaking technology that simplifies data storage, processing, and analysis. This cookbook offers a clear and practical guide to building and optimizing your Lakehouse so that you can make data-driven decisions and drive impactful results.
This definitive guide walks you through the entire Lakehouse journey, from setting up your environment and connecting to storage to creating Delta tables, building data models, and ingesting and transforming data. We start by discussing how to ingest data into Bronze, then refine it to produce Silver. Next, we discuss how to create Gold tables and the data modeling techniques often applied in the Gold layer. You will learn how to leverage Spark SQL and PySpark for efficient data manipulation, apply Delta Live Tables for real-time data processing, and implement machine learning and data science workflows with MLflow, Feature Store, and AutoML. The book also delves into advanced topics such as graph analysis, data governance, and visualization, equipping you with the knowledge needed to solve complex data challenges.
By the end of this cookbook, you will be a confident Lakehouse expert, capable of designing, building, and managing robust data-driven solutions.
What you will learn
Who this book is for
This book is meant for Data Engineers, Data Analysts, Data Scientists, Business Intelligence professionals, and Architects who want to take their data engineering to the next level by building Lakehouses on the Databricks platform.
Table of Contents
1. Introduction to Databricks Lakehouse
2. Setting Up a Databricks Workspace
3. Connecting to Storage
4. Creating Delta Tables
5. Data Profiling and Modeling in the Lakehouse
6. Extracting from Source and Loading to Bronze
7. Transforming to Create Silver
8. Transforming to Create Gold for Business Purposes
9. Machine Learning and Data Science
10. SQL Analysis
11. Graph Analysis
12. Visualizations
13. Governance
14. Operations
15. Tips, Tricks, Troubleshooting, and Best Practices