Apache Iceberg

About us

Apache Iceberg is a cloud-native, open table format for building open data lakehouses.

Website
https://iceberg.apache.org/
Industry
Software Development
Company size
1 employee
Headquarters
California
Type
Nonprofit

Updates

  • Apache Iceberg (14,683 followers)

    Looking for a way to start your training path to learn #Iceberg? Alex Merced provides a list of high quality content 👇

    Alex Merced

    Best Selling Co-Author of “Apache Iceberg: The Definitive Guide” | Senior Tech Evangelist at Dremio (Data Lakehouse Evangelist) | Tech Content Creator

    RECENT DATA ARCHITECTURE/ENGINEERING/ANALYTICS CONTENT

    — Apache Iceberg —
    > What is a Data Lakehouse Table Format? https://lnkd.in/eE3F_Gvq
    > Comparing Iceberg to Other Lakehouse Solutions https://lnkd.in/eDKXA4es
    > Iceberg Migration Guide https://lnkd.in/eYxTTsTz
    > Hands-on with Managed Polaris Catalog https://lnkd.in/e_YvxXrg
    > Hands-on with Self-Managed Polaris https://lnkd.in/eB_3aBks

    — Hybrid Lakehouse —
    > 3 Dremio Use Cases for On-Prem Data Lakes https://lnkd.in/ek-YS_jb
    > Hybrid Lakehouse Solution: NetApp https://lnkd.in/eUwsxQ_4
    > Hybrid Lakehouse Solution: MinIO https://lnkd.in/egknhrHH
    > Hybrid Lakehouse Solution: Vast Data https://lnkd.in/eE4WuQ-b
    > Hybrid Lakehouse Solution: Pure Storage https://lnkd.in/enrMw2di

    — Unified Analytics —
    > Analysts Guide to JDBC/ODBC, REST, and Arrow Flight https://lnkd.in/ePt9J9ZF
    > Unified Lakehouse https://lnkd.in/eyv5Rt2S

    #DataEngineering #DataLakehouse #DataScience #DataAnalytics #DataArchitecture

  • Apache Iceberg reposted this

    Rafal Mitula

    AWS Hero | Cloud Data Architect | Follow for Cloud & Data Engineering insights

    🚨 That's huge! Processing streaming data and saving it into a Data Lakehouse just got much easier!

    Any real-time data stream that we push through Amazon Kinesis Data Streams or directly into Amazon Kinesis Firehose can now (in preview) be saved in Apache Iceberg format. Apache Iceberg adds ACID (atomicity, consistency, isolation, and durability) transactions, snapshots, time travel, schema evolution and more. It is designed to provide efficient and scalable data storage and analytics capabilities, particularly for big data workloads.

    Previously you could save streaming data on S3 only in plain file formats (like JSON, Parquet) under a specified partition (or dynamic partition), which often resulted in a huge number of small files and made queries on that data really inefficient. Now, being able to write in Apache Iceberg, we can overcome these issues with the AWS Glue Data Catalog's automatic compaction of Iceberg tables.

    AWS took a bet on Apache Iceberg as a leading table format that they now spread across many services (Amazon Security Lake, AWS HealthLake...), and to be honest I really like that!

    Remember: it's still in preview (since August 1st), so it's not recommended yet for production workloads. Fingers crossed for the Kinesis team for a quick GA!

    ♻️ Recycle this post with a like or share

    #aws #awscommunity #awshero #data #dataengineering
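    To make the end state concrete, here is a minimal PySpark sketch of querying a Firehose-delivered Iceberg table through the AWS Glue Data Catalog. The catalog name, warehouse path, database and table names, and library versions below are placeholders, not details from the announcement.

    ```python
    # Minimal sketch: query an Iceberg table registered in the AWS Glue Data Catalog.
    # All names (catalog "glue", bucket, database, table) and versions are illustrative.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("query-firehose-iceberg")
        # Iceberg Spark runtime + AWS bundle on the classpath (versions are examples)
        .config("spark.jars.packages",
                "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2,"
                "org.apache.iceberg:iceberg-aws-bundle:1.5.2")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        # Expose the Glue Data Catalog as an Iceberg catalog named "glue"
        .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
        .config("spark.sql.catalog.glue.warehouse", "s3://my-bucket/warehouse/")
        .config("spark.sql.catalog.glue.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
        .getOrCreate()
    )

    # Query the table the stream is delivered into; Iceberg's metadata tables
    # (e.g. snapshots) make the time-travel history visible with plain SQL.
    spark.sql("SELECT count(*) FROM glue.streaming_db.clickstream_events").show()
    spark.sql("SELECT * FROM glue.streaming_db.clickstream_events.snapshots").show(truncate=False)
    ```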

  • Apache Iceberg (14,683 followers)

    Lakehouse Interoperability with Apache XTable (Incubating)

    Dipankar Mazumdar, M.Sc 🥑

    Staff Data Engineering Advocate @Onehouse.ai | Apache Hudi, Iceberg Contributor | Distributed Systems | Technical Author

    Breaking down Apache XTable's Architecture.

    Apache XTable (Incubating) is an omni-directional translation layer on top of open table formats such as Apache Hudi, Apache Iceberg & Delta Lake. It is NOT ❌ a new table format!

    Essentially what we are doing is this:

    SOURCE ---> (read metadata) ---> XTable's Model ---> write into TARGET

    We read the metadata from the SOURCE table format, put it into a unified representation & write the metadata in the TARGET format.

    * Note that with XTable we are only touching metadata, not the actual data files (such as #Parquet).

    Let's break down its architecture. XTable's architecture consists of three key components:

    1. Conversion Source:
    ✅ These are table-format-specific modules responsible for reading metadata from the source
    ✅ They extract information like schema, transactions, partitions & translate it into XTable's unified internal representation

    2. Conversion Logic:
    ✅ This is the central processing unit of XTable
    ✅ It orchestrates the entire translation process, including initializing all components and managing sources and targets, among other critical things

    3. Conversion Target:
    ✅ These mirror the source readers
    ✅ They take the internal representation of the metadata & map it to the target format's metadata structure

    Blog in comments for a detailed read.

    #dataengineering #softwareengineering
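    A purely conceptual Python sketch of that three-part flow is below. The class and method names are hypothetical and do not mirror XTable's actual Java API; they only illustrate how a source reader, the conversion logic, and a set of target writers fit together around a unified metadata model.

    ```python
    # Conceptual sketch only: hypothetical names, not Apache XTable's real API.
    from dataclasses import dataclass, field

    @dataclass
    class UnifiedTableMetadata:
        """XTable-style internal representation of a table's metadata."""
        schema: dict
        partitions: list = field(default_factory=list)
        snapshots: list = field(default_factory=list)

    class ConversionSource:
        """Reads schema, transactions, and partitions from the source format."""
        def extract(self, table_path: str) -> UnifiedTableMetadata:
            raise NotImplementedError

    class ConversionTarget:
        """Maps the unified representation onto the target format's metadata."""
        def write(self, metadata: UnifiedTableMetadata, table_path: str) -> None:
            raise NotImplementedError

    class ConversionLogic:
        """Orchestrates the whole translation: one source, one or more targets."""
        def __init__(self, source: ConversionSource, targets: list[ConversionTarget]):
            self.source, self.targets = source, targets

        def sync(self, table_path: str) -> None:
            unified = self.source.extract(table_path)   # 1. read source metadata
            for target in self.targets:                 # 2. fan out to each target format
                target.write(unified, table_path)       # 3. write target metadata only,
                                                        #    data files are never rewritten
    ```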

  • Apache Iceberg (14,683 followers)

    Upsolver (5,532 followers)

    Polaris Catalog is now open source and available for users to self-host or use as a managed service provided by Snowflake! 🔥

    ✨ We at Upsolver, the optimized Apache Iceberg lakehouse platform, are excited to announce support for using Polaris Catalog with Upsolver! ✨

    𝐖𝐡𝐚𝐭 𝐢𝐬 𝐏𝐨𝐥𝐚𝐫𝐢𝐬 𝐂𝐚𝐭𝐚𝐥𝐨𝐠?
    Polaris Catalog enables more open, secure lakehouse architectures with broad read-and-write interoperability and cross-engine access controls. Polaris Catalog is open source under the Apache 2.0 license and now available on GitHub: https://lnkd.in/grcGySDn. Snowflake’s managed service for Polaris Catalog is now available in public preview.

    𝐖𝐡𝐲 𝐮𝐬𝐞 𝐏𝐨𝐥𝐚𝐫𝐢𝐬 𝐰𝐢𝐭𝐡 𝐔𝐩𝐬𝐨𝐥𝐯𝐞𝐫?
    Users of Upsolver can now configure Polaris Catalog as their default Iceberg Lakehouse catalog and begin to ingest data from databases, streams and files into a high-performance Iceberg lake with only a few clicks. Take advantage of Upsolver’s 𝑨𝒅𝒂𝒑𝒕𝒊𝒗𝒆 𝑶𝒑𝒕𝒊𝒎𝒊𝒛𝒆𝒓 to maximize your lakehouse query performance and cost savings.

    Learn more about configuring Polaris Catalog with Upsolver: https://lnkd.in/gHGSch_p
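    Because Polaris Catalog implements the Iceberg REST catalog specification, any REST-capable client can talk to it directly. Below is a minimal PyIceberg sketch under that assumption; the endpoint URI, warehouse name, credential, and table name are placeholders, not values from the announcement.

    ```python
    # Sketch: connect to a Polaris endpoint as an Iceberg REST catalog.
    # URI, warehouse, credential, and table name are placeholders.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(
        "polaris",
        **{
            "type": "rest",
            "uri": "https://<polaris-host>/api/catalog",      # placeholder endpoint
            "warehouse": "my_polaris_catalog",                 # placeholder catalog name
            "credential": "<client_id>:<client_secret>",       # OAuth2 client credentials
            "scope": "PRINCIPAL_ROLE:ALL",
        },
    )

    # Browse namespaces and load a table that another engine has been writing to.
    print(catalog.list_namespaces())
    table = catalog.load_table("analytics.web_events")         # placeholder table
    print(table.schema())
    ```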

  • Apache Iceberg reposted this

    Vinija Jain

    Brand partnership

    🧑💻 Hands-on Workshop: Build Real-Time AI Apps on Apache Iceberg

    🔷 Apache Iceberg is a high-performance format for managing massive analytic tables. It simplifies big data management, enabling engines like Apache Spark, Trino, Flink, Presto, Hive, and Impala to work safely with the same tables. It supports easy data merging, updates, and targeted deletes, ensuring flexible and efficient data handling.

    🔷 Iceberg serves as an open table format for massive analytic datasets, acting as a middle layer between the computing layer (Flink, Spark) and the storage layer (ORC, Parquet, Avro). This setup allows high-performance SQL table functionality within these engines.

    🔷 Coupled with a local deployment via a virtual private cloud (VPC) solution, Iceberg enables secure, reliable, real-time analytics, providing subsecond analytics and powering low-latency applications while ensuring top-notch security and data integrity.

    📅 Workshop Details:
    - Date: Tuesday, July 30
    - Time: 10-11am PT / 10:30-11:30pm IST
    🔗 https://lnkd.in/gqgXuPK6

    What You'll Learn:
    🔷 Unfreeze your lakehouse for low-latency applications with native Apache Iceberg integration.
    🔷 Faster vector search and improved full-text search techniques.
    🔷 Strategies to scale your apps and reduce complexity with Autoscaling.
    🔷 How to deploy SingleStore in your own VPC.

    ✅ Includes a live demo and code-share session. Register to receive the code and webinar recording even if you can't attend live.
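    As a small illustration of the row-level operations mentioned above (merges, updates, targeted deletes), here is a minimal PySpark sketch against a local Iceberg catalog; the catalog, table, and column names are invented for the example, and the library version is illustrative.

    ```python
    # Illustrative sketch of Iceberg row-level operations via Spark SQL.
    # Catalog "demo", table/column names, and versions are made up for the example.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("iceberg-row-level-ops")
        .config("spark.jars.packages",
                "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        # Local Hadoop-catalog warehouse, just for the demo
        .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.demo.type", "hadoop")
        .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-demo-warehouse")
        .getOrCreate()
    )

    spark.sql("""
        CREATE TABLE IF NOT EXISTS demo.db.sensor_readings
        (device_id STRING, ts TIMESTAMP, value DOUBLE) USING iceberg
    """)

    # A tiny batch of "new" readings acting as the merge source
    spark.sql("""
        SELECT 'sensor-001' AS device_id, current_timestamp() AS ts, 21.5 AS value
    """).createOrReplaceTempView("updates")

    # Upsert: update matching rows, insert the rest
    spark.sql("""
        MERGE INTO demo.db.sensor_readings t
        USING updates u
        ON t.device_id = u.device_id AND t.ts = u.ts
        WHEN MATCHED THEN UPDATE SET t.value = u.value
        WHEN NOT MATCHED THEN INSERT *
    """)

    # Targeted delete without rewriting the whole table
    spark.sql("DELETE FROM demo.db.sensor_readings WHERE device_id = 'sensor-042'")
    ```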

  • Apache Iceberg reposted this

    Dipankar Mazumdar, M.Sc 🥑

    Staff Data Engineering Advocate @Onehouse.ai | Apache Hudi, Iceberg Contributor | Distributed Systems | Technical Author

    Breaking down a Lakehouse Architecture - [New Blog]

    Lakehouse architectures are gaining a lot of traction. The promise is:
    - the reliability & performance of data warehouses
    - the scalability & cost-effectiveness of data lakes.

    Most importantly though, it is the open foundation & flexibility offered by table formats such as Apache Hudi, Apache Iceberg & Delta Lake on top of file formats like #Parquet.

    Since your data is stored in an 'independent data tier' in a lakehouse, it is open to pretty much any compute engine that is compatible with the formats. So, depending on the type of workload (BI, ML, streaming), you can bring your compute engine to the 'independent data tier'.

    This is actually the opposite of how we have been working traditionally until this point with analytical workloads. We load data into a proprietary database (OLAP/warehouse) & can use only that database's compute engine to process the data. What if we need to use any other compute in such an environment? Well, we make data copies, i.e. export a subset of it and then load it into, say, data lakes (cloud storage) to hand over to other data teams.

    While lakehouses are becoming ubiquitous, there is also the hype and marketing jargon that comes with them. We often use different terms to refer to the same thing. Or make a completely different thing out of it. In the Hadoop world, this was the same with Apache Hive: there is a Hive query engine, a Hive table format & a Metastore (HMS). Nebulous, right?

    In this new blog, I go back to the basics and explain what a lakehouse is and how it works. Things covered:
    ✅ Evolution of data systems
    ✅ Lakehouse components & advantages
    ✅ Implementation guide
    ✅ Use cases/applications
    ✅ Real-world examples
    ✅ Open Table Formats
    ✅ Vendor Platforms
    ✅ Future of lakehouse

    Check out the link in comments & reach out for any questions/clarifications.

    #dataengineering #softwareengineering
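    To make the "bring your compute to the data" point concrete, here is a small Python sketch in which two different engines consume the same Iceberg table through a shared catalog. The catalog URI, table name, and columns are placeholders, not details from the blog.

    ```python
    # Sketch of the "independent data tier" idea: one Iceberg table, two engines.
    # The REST catalog URI and table/column names are placeholders.
    import duckdb
    from pyiceberg.catalog import load_catalog

    # Any Iceberg REST-compatible catalog would work here (Polaris, Nessie, ...)
    catalog = load_catalog("shared", **{"type": "rest", "uri": "http://localhost:8181"})

    # Engine #1: PyIceberg plans the scan from Iceberg metadata and returns Arrow data.
    page_views = catalog.load_table("analytics.page_views").scan().to_arrow()

    # Engine #2: DuckDB runs SQL directly over that Arrow table (picked up by variable
    # name via a replacement scan), with no copy into a proprietary warehouse.
    print(duckdb.sql("SELECT page, count(*) AS views FROM page_views GROUP BY page").df())
    ```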

