Apache Iceberg

About us

Apache Iceberg is a cloud-native, open table format for building open data lakehouses.

Website
https://iceberg.apache.org/
Industry
Software Development
Company size
1 employee
Headquarters
California
Type
Nonprofit

Updates

  • Apache Iceberg (14,683 followers)

    Looking for a way to start your training path to learn #Iceberg? Alex Merced provides a list of high quality content 👇

    Alex Merced

    Best Selling Co-Author of “Apache Iceberg: The Definitive Guide” | Senior Tech Evangelist at Dremio (Data Lakehouse Evangelist) | Tech Content Creator

    RECENT DATA ARCHITECTURE/ENGINEERING/ANALYTICS CONTENT

    — Apache Iceberg —
    > What is a Data Lakehouse Table Format? https://lnkd.in/eE3F_Gvq
    > Comparing Iceberg to Other Lakehouse Solutions https://lnkd.in/eDKXA4es
    > Iceberg Migration Guide https://lnkd.in/eYxTTsTz
    > Hands-on with Managed Polaris Catalog https://lnkd.in/e_YvxXrg
    > Hands-on with Self-Managed Polaris https://lnkd.in/eB_3aBks

    — Hybrid Lakehouse —
    > 3 Dremio Use Cases for On-Prem Data Lakes https://lnkd.in/ek-YS_jb
    > Hybrid Lakehouse Solution: NetApp https://lnkd.in/eUwsxQ_4
    > Hybrid Lakehouse Solution: MinIO https://lnkd.in/egknhrHH
    > Hybrid Lakehouse Solution: Vast Data https://lnkd.in/eE4WuQ-b
    > Hybrid Lakehouse Solution: Pure Storage https://lnkd.in/enrMw2di

    — Unified Analytics —
    > Analysts Guide to JDBC/ODBC, REST, and Arrow Flight https://lnkd.in/ePt9J9ZF
    > Unified Lakehouse https://lnkd.in/eyv5Rt2S

    #DataEngineering #DataLakehouse #DataScience #DataAnalytics #DataArchitecture

  • Apache Iceberg reposted this

    Rafal Mitula

    AWS Hero | Cloud Data Architect | Follow for Cloud & Data Engineering insights

    🚨 That's huge! Processing streaming data and saving it into a Data Lakehouse just got much easier!

    Any real-time data stream that we push through Amazon Kinesis Data Streams or directly into Amazon Kinesis Firehose can now (in preview) be saved in Apache Iceberg format. Apache Iceberg adds ACID (atomicity, consistency, isolation, and durability) transactions, snapshots, time travel, schema evolution and more. It is designed to provide efficient and scalable data storage and analytics capabilities, particularly for big data workloads.

    Previously you could save streaming data on S3 only in plain file formats (like JSON, Parquet) under a specified partition (or dynamic partition), which often resulted in a huge number of small files and made queries on that data really inefficient. Now, being able to write in Apache Iceberg, we can overcome these issues with the AWS Glue Data Catalog's automatic compaction of Iceberg tables.

    AWS took a bet on Apache Iceberg as a leading table format that they now spread across many services (Amazon Security Lake, AWS HealthLake...), and to be honest I really like that!

    Remember: it's still in preview (since August 1st), so it's not recommended yet for production workloads. Fingers crossed for the Kinesis team for a quick GA!

    ♻️ Recycle this post with a like or share

    #aws #awscommunity #awshero #data #dataengineering
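    To make the end state concrete, here is a minimal PySpark sketch of querying a Firehose-delivered Iceberg table through the AWS Glue Data Catalog. The catalog name, warehouse path, database and table names, and library versions below are placeholders, not details from the announcement.

    ```python
    # Minimal sketch: query an Iceberg table registered in the AWS Glue Data Catalog.
    # All names (catalog "glue", bucket, database, table) and versions are illustrative.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("query-firehose-iceberg")
        # Iceberg Spark runtime + AWS bundle on the classpath (versions are examples)
        .config("spark.jars.packages",
                "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2,"
                "org.apache.iceberg:iceberg-aws-bundle:1.5.2")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        # Expose the Glue Data Catalog as an Iceberg catalog named "glue"
        .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
        .config("spark.sql.catalog.glue.warehouse", "s3://my-bucket/warehouse/")
        .config("spark.sql.catalog.glue.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
        .getOrCreate()
    )

    # Query the table the stream is delivered into; Iceberg's metadata tables
    # (e.g. snapshots) make the time-travel history visible with plain SQL.
    spark.sql("SELECT count(*) FROM glue.streaming_db.clickstream_events").show()
    spark.sql("SELECT * FROM glue.streaming_db.clickstream_events.snapshots").show(truncate=False)
    ```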

  • Apache Iceberg (14,683 followers)

    Lakehouse Interoperability with Apache XTable (Incubating)

    Dipankar Mazumdar, M.Sc 🥑

    Staff Data Engineering Advocate @Onehouse.ai | Apache Hudi, Iceberg Contributor | Distributed Systems | Technical Author

    Breaking down Apache XTable's Architecture.

    Apache XTable (Incubating) is an omni-directional translation layer on top of open table formats such as Apache Hudi, Apache Iceberg & Delta Lake. It is NOT ❌ a new table format!

    Essentially what we are doing is this:

    SOURCE ---> (read metadata) ---> XTable's Model ---> write into TARGET

    We read the metadata from the SOURCE table format, put it into a unified representation & write the metadata in the TARGET format.

    * Note that with XTable we are only touching metadata, not the actual data files (such as #Parquet).

    Let's break down its architecture. XTable's architecture consists of three key components:

    1. Conversion Source:
    ✅ These are table-format-specific modules responsible for reading metadata from the source
    ✅ They extract information like schema, transactions, partitions & translate it into XTable's unified internal representation

    2. Conversion Logic:
    ✅ This is the central processing unit of XTable
    ✅ It orchestrates the entire translation process, including initializing all components and managing sources and targets, among other critical things

    3. Conversion Target:
    ✅ These mirror the source readers
    ✅ They take the internal representation of the metadata & map it to the target format's metadata structure

    Blog in comments for a detailed read.

    #dataengineering #softwareengineering
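    A purely conceptual Python sketch of that three-part flow is below. The class and method names are hypothetical and do not mirror XTable's actual Java API; they only illustrate how a source reader, the conversion logic, and a set of target writers fit together around a unified metadata model.

    ```python
    # Conceptual sketch only: hypothetical names, not Apache XTable's real API.
    from dataclasses import dataclass, field

    @dataclass
    class UnifiedTableMetadata:
        """XTable-style internal representation of a table's metadata."""
        schema: dict
        partitions: list = field(default_factory=list)
        snapshots: list = field(default_factory=list)

    class ConversionSource:
        """Reads schema, transactions, and partitions from the source format."""
        def extract(self, table_path: str) -> UnifiedTableMetadata:
            raise NotImplementedError

    class ConversionTarget:
        """Maps the unified representation onto the target format's metadata."""
        def write(self, metadata: UnifiedTableMetadata, table_path: str) -> None:
            raise NotImplementedError

    class ConversionLogic:
        """Orchestrates the whole translation: one source, one or more targets."""
        def __init__(self, source: ConversionSource, targets: list[ConversionTarget]):
            self.source, self.targets = source, targets

        def sync(self, table_path: str) -> None:
            unified = self.source.extract(table_path)   # 1. read source metadata
            for target in self.targets:                 # 2. fan out to each target format
                target.write(unified, table_path)       # 3. write target metadata only,
                                                        #    data files are never rewritten
    ```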

  • Apache Iceberg (14,683 followers)

    Upsolver (5,532 followers)

    Polaris Catalog is now open source and available for users to self-host or use as a managed service provided by Snowflake! 🔥

    ✨ We at Upsolver, the optimized Apache Iceberg lakehouse platform, are excited to announce support for using Polaris Catalog with Upsolver! ✨

    𝐖𝐡𝐚𝐭 𝐢𝐬 𝐏𝐨𝐥𝐚𝐫𝐢𝐬 𝐂𝐚𝐭𝐚𝐥𝐨𝐠?
    Polaris Catalog enables more open, secure lakehouse architectures with broad read-and-write interoperability and cross-engine access controls. Polaris Catalog is open source under the Apache 2.0 license and now available on GitHub: https://lnkd.in/grcGySDn. Snowflake’s managed service for Polaris Catalog is now available in public preview.

    𝐖𝐡𝐲 𝐮𝐬𝐞 𝐏𝐨𝐥𝐚𝐫𝐢𝐬 𝐰𝐢𝐭𝐡 𝐔𝐩𝐬𝐨𝐥𝐯𝐞𝐫?
    Users of Upsolver can now configure Polaris Catalog as their default Iceberg Lakehouse catalog and begin to ingest data from databases, streams and files into a high-performance Iceberg lake with only a few clicks. Take advantage of Upsolver’s 𝑨𝒅𝒂𝒑𝒕𝒊𝒗𝒆 𝑶𝒑𝒕𝒊𝒎𝒊𝒛𝒆𝒓 to maximize your lakehouse query performance and cost savings.

    Learn more about configuring Polaris Catalog with Upsolver: https://lnkd.in/gHGSch_p
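    Because Polaris Catalog implements the Iceberg REST catalog specification, any REST-capable client can talk to it directly. Below is a minimal PyIceberg sketch under that assumption; the endpoint URI, warehouse name, credential, and table name are placeholders, not values from the announcement.

    ```python
    # Sketch: connect to a Polaris endpoint as an Iceberg REST catalog.
    # URI, warehouse, credential, and table name are placeholders.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(
        "polaris",
        **{
            "type": "rest",
            "uri": "https://<polaris-host>/api/catalog",      # placeholder endpoint
            "warehouse": "my_polaris_catalog",                 # placeholder catalog name
            "credential": "<client_id>:<client_secret>",       # OAuth2 client credentials
            "scope": "PRINCIPAL_ROLE:ALL",
        },
    )

    # Browse namespaces and load a table that another engine has been writing to.
    print(catalog.list_namespaces())
    table = catalog.load_table("analytics.web_events")         # placeholder table
    print(table.schema())
    ```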

  • Apache Iceberg reposted this

    Vinija Jain

    Brand partnership

    🧑💻 Hands-on Workshop: Build Real-Time AI Apps on Apache Iceberg

    🔷 Apache Iceberg is a high-performance format for managing massive analytic tables. It simplifies big data management, enabling engines like Apache Spark, Trino, Flink, Presto, Hive, and Impala to work safely with the same tables. It supports easy data merging, updates, and targeted deletes, ensuring flexible and efficient data handling.

    🔷 Iceberg serves as an open table format for massive analytic datasets, acting as a middle layer between the computing layer (Flink, Spark) and the storage layer (ORC, Parquet, Avro). This setup allows high-performance SQL table functionality within these engines.

    🔷 Coupled with a local deployment via a virtual private cloud (VPC) solution, Iceberg enables secure, reliable, real-time analytics, providing subsecond analytics and powering low-latency applications while ensuring top-notch security and data integrity.

    📅 Workshop Details:
    - Date: Tuesday, July 30
    - Time: 10-11am PT / 10:30-11:30pm IST
    🔗 https://lnkd.in/gqgXuPK6

    What You'll Learn:
    🔷 Unfreeze your lakehouse for low-latency applications with native Apache Iceberg integration.
    🔷 Faster vector search and improved full-text search techniques.
    🔷 Strategies to scale your apps and reduce complexity with Autoscaling.
    🔷 How to deploy SingleStore in your own VPC.

    ✅ Includes a live demo and code-share session. Register to receive the code and webinar recording even if you can't attend live.
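    As a small illustration of the row-level operations mentioned above (merges, updates, targeted deletes), here is a minimal PySpark sketch against a local Iceberg catalog; the catalog, table, and column names are invented for the example, and the library version is illustrative.

    ```python
    # Illustrative sketch of Iceberg row-level operations via Spark SQL.
    # Catalog "demo", table/column names, and versions are made up for the example.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("iceberg-row-level-ops")
        .config("spark.jars.packages",
                "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        # Local Hadoop-catalog warehouse, just for the demo
        .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.demo.type", "hadoop")
        .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-demo-warehouse")
        .getOrCreate()
    )

    spark.sql("""
        CREATE TABLE IF NOT EXISTS demo.db.sensor_readings
        (device_id STRING, ts TIMESTAMP, value DOUBLE) USING iceberg
    """)

    # A tiny batch of "new" readings acting as the merge source
    spark.sql("""
        SELECT 'sensor-001' AS device_id, current_timestamp() AS ts, 21.5 AS value
    """).createOrReplaceTempView("updates")

    # Upsert: update matching rows, insert the rest
    spark.sql("""
        MERGE INTO demo.db.sensor_readings t
        USING updates u
        ON t.device_id = u.device_id AND t.ts = u.ts
        WHEN MATCHED THEN UPDATE SET t.value = u.value
        WHEN NOT MATCHED THEN INSERT *
    """)

    # Targeted delete without rewriting the whole table
    spark.sql("DELETE FROM demo.db.sensor_readings WHERE device_id = 'sensor-042'")
    ```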

  • Apache Iceberg reposted this

    Dipankar Mazumdar, M.Sc 🥑

    Staff Data Engineering Advocate @Onehouse.ai | Apache Hudi, Iceberg Contributor | Distributed Systems | Technical Author

    Breaking down a Lakehouse Architecture - [New Blog]

    Lakehouse architectures are gaining a lot of traction. The promise is:
    - the reliability & performance of data warehouses
    - the scalability & cost-effectiveness of data lakes.

    Most importantly though, it is the open foundation & flexibility offered by table formats such as Apache Hudi, Apache Iceberg & Delta Lake on top of file formats like #Parquet.

    Since your data is stored in an 'independent data tier' in a lakehouse, it is open to pretty much any compute engine that is compatible with the formats. So, depending on the type of workload (BI, ML, streaming), you can bring your compute engine to the 'independent data tier'.

    This is actually the opposite of how we have been working traditionally until this point with analytical workloads. We load data into a proprietary database (OLAP/warehouse) & can use only that database's compute engine to process the data. What if we need to use any other compute in such an environment? Well, we make data copies, i.e. export a subset of it and then load it into, say, data lakes (cloud storage) to hand over to other data teams.

    While lakehouses are becoming ubiquitous, there is also the hype and marketing jargon that comes with them. We often use different terms to refer to the same thing. Or make a completely different thing out of it. In the Hadoop world, this was the same with Apache Hive: there is a Hive query engine, a Hive table format & a Metastore (HMS). Nebulous, right?

    In this new blog, I go back to the basics and explain what a lakehouse is and how it works. Things covered:
    ✅ Evolution of data systems
    ✅ Lakehouse components & advantages
    ✅ Implementation guide
    ✅ Use cases/applications
    ✅ Real-world examples
    ✅ Open Table Formats
    ✅ Vendor Platforms
    ✅ Future of lakehouse

    Check out the link in comments & reach out for any questions/clarifications.

    #dataengineering #softwareengineering
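    To make the "bring your compute to the data" point concrete, here is a small Python sketch in which two different engines consume the same Iceberg table through a shared catalog. The catalog URI, table name, and columns are placeholders, not details from the blog.

    ```python
    # Sketch of the "independent data tier" idea: one Iceberg table, two engines.
    # The REST catalog URI and table/column names are placeholders.
    import duckdb
    from pyiceberg.catalog import load_catalog

    # Any Iceberg REST-compatible catalog would work here (Polaris, Nessie, ...)
    catalog = load_catalog("shared", **{"type": "rest", "uri": "http://localhost:8181"})

    # Engine #1: PyIceberg plans the scan from Iceberg metadata and returns Arrow data.
    page_views = catalog.load_table("analytics.page_views").scan().to_arrow()

    # Engine #2: DuckDB runs SQL directly over that Arrow table (picked up by variable
    # name via a replacement scan), with no copy into a proprietary warehouse.
    print(duckdb.sql("SELECT page, count(*) AS views FROM page_views GROUP BY page").df())
    ```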

