Apache Iceberg
Master Apache Iceberg to manage large-scale data lakes with reliability, performance, and open table formats.

What You Will Learn
- Understand the core principles of Apache Iceberg.
- Learn how Iceberg solves the limitations of Hive and Parquet-only datasets.
- Use schema evolution, partitioning, and time travel.
- Run queries with Spark, Flink, Trino, and Presto.
- Manage ACID transactions in a data lakehouse.
- Integrate Iceberg with cloud storage (S3, GCS, ADLS).
- Apply best practices for scaling and performance.
Who This Course Is For
- Data engineers managing data lakes and lakehouses.
- Analytics engineers working with Spark, Flink, and Trino.
- Data scientists needing reliable and consistent data for ML.
- DevOps engineers deploying scalable storage systems.
- Students and professionals learning modern big data architectures.
- Enterprises migrating from Hive tables to Iceberg.
How to Use This Course
- Start with Iceberg basics: architecture and motivation.
- Experiment with small datasets and Iceberg tables in Spark.
- Progress to schema evolution, time travel, and partitioning.
- Integrate with query engines like Flink and Trino.
- Deploy Iceberg with cloud-native storage.
- Revisit modules for optimization and best practices.
By completing this course, learners will:
- Deploy and configure Apache Iceberg.
- Create and manage Iceberg tables.
- Implement schema evolution and partitioning.
- Use time travel and rollback features.
- Integrate Iceberg with modern query engines.
- Operate Iceberg at scale in a data lakehouse environment.
Course Syllabus
Module 1: Introduction to Apache Iceberg
- What is Apache Iceberg?
- Iceberg vs Hive/Delta/Parquet tables
- Installing and setting up Iceberg
Module 2: Core Architecture
- Table format design
- Metadata layers (snapshots, manifests)
- ACID transactions in Iceberg
- Partitioning strategies
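The metadata layers covered in this module follow a simple hierarchy: table metadata points at a current snapshot, each snapshot references manifests, and each manifest lists data files. The toy model below sketches that idea in plain Python; the class and field names are illustrative, not the real Iceberg API.

```python
from dataclasses import dataclass, field

# Toy model of Iceberg's metadata hierarchy:
# table metadata -> snapshot -> manifests -> data files.
# All names here are hypothetical, for illustration only.

@dataclass
class DataFile:
    path: str
    record_count: int

@dataclass
class Manifest:
    data_files: list

@dataclass
class Snapshot:
    snapshot_id: int
    manifests: list

@dataclass
class TableMetadata:
    snapshots: list = field(default_factory=list)
    current_snapshot_id: int = -1

    def commit(self, snapshot):
        # Appending a snapshot and moving the pointer means readers
        # see either the old table state or the new one, never a mix.
        self.snapshots.append(snapshot)
        self.current_snapshot_id = snapshot.snapshot_id

    def current_files(self):
        snap = next(s for s in self.snapshots
                    if s.snapshot_id == self.current_snapshot_id)
        return [f for m in snap.manifests for f in m.data_files]

table = TableMetadata()
table.commit(Snapshot(1, [Manifest([DataFile("a.parquet", 100)])]))
table.commit(Snapshot(2, [Manifest([DataFile("a.parquet", 100)]),
                          Manifest([DataFile("b.parquet", 50)])]))
print(len(table.current_files()))  # 2 data files visible after the second commit
```

Because old snapshots are retained, the same lookup against snapshot 1 instead of the current pointer is what enables time travel.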
Module 3: Table Operations
- Creating Iceberg tables
- Inserting and updating data
- Deletes and incremental updates
- Time travel and rollback
Module 4: Schema Evolution
- Adding, renaming, and deleting columns
- Partition evolution
- Managing table versions
- Handling large-scale schema changes
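A key idea behind safe schema evolution is that Iceberg tracks columns by field ID rather than by name, so renaming a column edits only the schema, never the data files. The sketch below illustrates that principle with plain Python dictionaries; the structures are hypothetical simplifications, not the Iceberg API.

```python
# Toy illustration of id-based column tracking: stored values are keyed
# by field ID, and the schema maps names to IDs. Renaming a column
# changes only the schema mapping; old data files stay readable.
# Purely illustrative, not the real Iceberg metadata format.

schema_v1 = {"id": 1, "email": 2}          # column name -> field ID
data_row = {1: 42, 2: "a@example.com"}     # field ID -> stored value

# Rename "email" -> "contact_email": only the schema changes.
schema_v2 = {"id": 1, "contact_email": 2}

def read(row, schema):
    # Resolve names through field IDs, so files written under either
    # schema version produce consistent results.
    return {name: row[fid] for name, fid in schema.items()}

print(read(data_row, schema_v2)["contact_email"])  # a@example.com
```

Dropping or adding a column works the same way: IDs for dropped columns are simply no longer mapped, and new columns get fresh IDs that old files do not contain.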
Module 5: Query Engines Integration
- Iceberg with Apache Spark
- Iceberg with Apache Flink
- Iceberg with Trino and Presto
- SQL queries and analytics
Module 6: Cloud & Storage Integration
- Iceberg with Amazon S3
- Iceberg with Google Cloud Storage (GCS)
- Iceberg with Azure Data Lake Storage (ADLS)
- On-premises Hadoop compatibility
Module 7: Deployment & Scaling
- Deploying Iceberg in production
- Optimizing query performance
- Compaction and garbage collection
- Monitoring and observability
Module 8: Real-World Projects
- Building a data lakehouse with Iceberg and Spark
- Streaming ingestion with Flink + Iceberg
- Time travel analytics for business intelligence
- Multi-engine queries with Trino + Iceberg
Module 9: Best Practices & Future Trends
- Iceberg vs Delta Lake vs Apache Hudi
- Cost optimization strategies
- Data governance and compliance
- The future of open table formats
Apache Iceberg skills prepare learners for roles such as:
- Data Engineer (big data pipelines)
- Analytics Engineer (BI + data lakes)
- Cloud Data Engineer (AWS/GCP/Azure)
- Big Data Architect (lakehouse systems)
- Machine Learning Engineer (data preparation at scale)
Iceberg is being rapidly adopted by companies like Netflix, Apple, and Adobe to power modern data platforms, making it a valuable and in-demand skill.
Learners will receive a Certificate of Completion from Uplatz, validating their expertise in Apache Iceberg and modern data lakehouse technologies. This certification demonstrates readiness for roles in data engineering, analytics, and big data platform development.
Frequently Asked Questions
1. What is Apache Iceberg?
An open table format for large-scale analytics datasets, enabling schema evolution, time travel, and ACID transactions in data lakes.
2. How does Iceberg differ from Hive tables?
Iceberg supports schema evolution, partitioning, and ACID operations, while Hive tables are rigid and lack transactional reliability.
3. What are Iceberg’s key features?
- Schema evolution
- Time travel queries
- Hidden partitioning
- ACID transactions
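Hidden partitioning means the table stores a partition transform (such as day(ts)) and derives partition values automatically, so users filter on the raw column without knowing the layout. The sketch below mimics that idea in plain Python; the function and variable names are hypothetical, not Iceberg's real transform API.

```python
from datetime import datetime

# Toy sketch of hidden partitioning: a declared transform (day of the
# timestamp) buckets rows on write, and a predicate on the raw column
# prunes whole partitions on read. Illustrative only.

def day_transform(ts: datetime) -> str:
    return ts.strftime("%Y-%m-%d")

rows = [
    {"ts": datetime(2024, 3, 1, 9), "value": 10},
    {"ts": datetime(2024, 3, 1, 17), "value": 20},
    {"ts": datetime(2024, 3, 2, 8), "value": 30},
]

# Writer side: bucket rows by the derived partition value.
partitions = {}
for row in rows:
    partitions.setdefault(day_transform(row["ts"]), []).append(row)

# Reader side: a filter on ts maps to a partition, skipping the rest.
target = day_transform(datetime(2024, 3, 1, 12))
print(len(partitions[target]))  # 2 rows fall in the 2024-03-01 partition
```

The point of hiding the transform in table metadata is that queries never mention the partition column directly, so the layout can evolve without breaking existing queries.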
4. What query engines support Iceberg?
Spark, Flink, Trino, Presto, and Hive.
5. What is time travel in Iceberg?
The ability to query past versions of a dataset using snapshots.
6. How does Iceberg achieve ACID transactions?
Through atomic snapshot replacement and metadata layers that track changes safely.
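The atomic swap described above is typically paired with optimistic concurrency: a writer prepares a new snapshot against the state it read, and the commit succeeds only if the table pointer has not moved since. A minimal sketch of that compare-and-swap idea, with hypothetical names (real catalogs perform this check atomically on the server side):

```python
# Toy sketch of optimistic commits behind Iceberg-style ACID writes:
# the pointer advances only if no other writer committed first, so a
# stale writer must re-read the table and retry. Illustrative only.

class Table:
    def __init__(self):
        self.current_snapshot = 0

    def commit(self, expected_snapshot: int, new_snapshot: int) -> bool:
        # Compare-and-swap: succeed only against the state the writer saw.
        if self.current_snapshot != expected_snapshot:
            return False  # conflict: caller must re-read and retry
        self.current_snapshot = new_snapshot
        return True

table = Table()
seen = table.current_snapshot
assert table.commit(seen, seen + 1)      # first writer wins
assert not table.commit(seen, seen + 2)  # second writer, stale view, must retry
print(table.current_snapshot)  # 1
```

Because readers always resolve the current snapshot pointer first, they observe either the fully committed old state or the fully committed new one, which is what gives readers isolation without locks.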
7. What storage systems work with Iceberg?
Amazon S3, Google Cloud Storage, Azure Data Lake, and HDFS.
8. What are the benefits of Iceberg?
- Reliability at scale
- Performance with big data
- Compatibility with multiple engines
- Open source and vendor-neutral
9. What are challenges with Iceberg?
- More complex setup than Hive
- Relatively new ecosystem
- Requires expertise in Spark/Flink integration
10. Where is Apache Iceberg being adopted?
By enterprises and tech leaders like Netflix, Apple, Adobe, and others modernizing their data lakehouse architectures.