Got some time to look at Apache XTable this weekend; here is a short summary.
🌟Apache XTable: A New Era of Interoperability Between Open Table Formats 🌟
What is Apache XTable?
Apache XTable is an open-source tool designed for seamless interoperability between lakehouse table formats such as Apache Hudi, Apache Iceberg, and Delta Lake. It lets you write data in one format and expose it in other target formats without duplicating the underlying data files, since only table metadata is translated. Apache XTable simplifies the complexities of data management in a lakehouse environment, making it a valuable addition for businesses leveraging big data analytics.
Key Features:
Real-time Replication: Achieve transparent and real-time replication in any direction.
Accurate and Lossless: Conversions preserve table metadata accurately, with no loss of information.
Extensibility: Designed to be flexible and extensible to support future formats and versions.
Community-driven: Built by a neutral and inclusive community of vendors, cloud providers, and users.
How It Works:
1. Setup and Configuration:
- Download and install Apache XTable from the official repository.
- Create a configuration file (e.g., `datasetConfig.yaml`) specifying your source and target formats.
""sourceFormat: delta
targetFormats:
- iceberg
sourcePath: s3://my-bucket/delta-table""
2. Running XTable:
- Execute the XTable command using the Java binary, pointing to your configuration file.
""java -jar xtable.jar --config datasetConfig.yaml""
3. Data Synchronization:
- XTable supports incremental and full synchronization modes. Incremental mode is preferred for efficiency: it syncs only the commits added to the source since the last run, falling back to a full sync when that is not possible.
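The incremental-versus-full decision can be sketched generically. This is an illustrative Python snippet, not XTable's actual implementation; the commit-timeline structure and checkpoint handling here are assumptions for the sake of the example:

```python
def commits_to_sync(source_commits, last_synced):
    """Pick which source commits a sync run must translate.

    source_commits: commit timestamps in the source table's timeline,
    oldest first. last_synced: the checkpoint recorded by the previous
    run, or None if this is the first sync.
    """
    if last_synced is None or last_synced not in source_commits:
        # Full sync: no checkpoint exists (or it was cleaned up from
        # the timeline), so translate the entire current snapshot.
        return list(source_commits)
    # Incremental sync: translate only commits after the checkpoint.
    idx = source_commits.index(last_synced)
    return source_commits[idx + 1:]

timeline = ["t1", "t2", "t3", "t4"]
print(commits_to_sync(timeline, "t2"))   # incremental: ['t3', 't4']
print(commits_to_sync(timeline, None))   # full: ['t1', 't2', 't3', 't4']
```

The key efficiency win is the last branch: a steady-state run touches only the handful of new commits rather than re-reading the whole table's metadata.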
4. Metadata Management:
- XTable maintains metadata for each target format, ensuring schema updates and table statistics are accurately reflected after every sync.
5. Integration:
- XTable can be integrated into data pipelines, for example as a task in Apache Airflow, so conversion runs automatically after each write to the source table.
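One simple way to wire this into a pipeline is to wrap the CLI call in a small function that a scheduler task (e.g. an Airflow `BashOperator` or `PythonOperator`) can invoke. A minimal sketch; the jar filename and config path are placeholders, not fixed names from the project:

```python
import subprocess

def build_xtable_command(jar_path, config_path):
    """Assemble the CLI invocation for one XTable sync run."""
    # --datasetConfig points at the YAML file describing the
    # source format, target formats, and dataset paths.
    return ["java", "-jar", jar_path, "--datasetConfig", config_path]

def run_sync(jar_path="xtable-utilities-bundled.jar",
             config_path="datasetConfig.yaml"):
    """Run one sync; raise on failure so the scheduler can retry."""
    subprocess.run(build_xtable_command(jar_path, config_path), check=True)
```

Failing loudly (`check=True`) matters in a pipeline: a silent conversion failure would leave the target-format metadata stale while the source keeps advancing.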
Benefits:
High Throughput: Handles large volumes of structured data efficiently.
Flexibility: Supports multiple formats, enhancing data accessibility.
Cost-Effective: Avoids data duplication, reducing storage costs.
Current Status:
Supported formats include Apache Hudi, Apache Iceberg, and Delta Lake.
Compatible with platforms like Apache Spark, Trino, Microsoft Fabric, Databricks, BigQuery, Snowflake, and Redshift.
Features like on-demand incremental conversion, copy-on-write, and catalog integration are already in place.
#ApacheXTable #DataInteroperability #DataLakehouse #OpenSource #BigData #DataManagement #JoinTheRevolution