📢 We are thrilled to announce the release of Apache Doris 2.1.0! For our long-term supportive users, allow me to re-introduce Apache Doris with its amazing new features and substantially improved data writing and query performance. For those who are new to Apache Doris, this is great timing for a proof of concept to see how it performs in your use case! Buckle up and get ready for:
🚶♂️ 100% faster out-of-the-box performance, proven by TPC-DS benchmark tests
🚶♀️ Improved data lake analytics capabilities: 4~6 times faster than Trino and Spark
🏃♂️ Solid support for semi-structured data analysis
🏃♀️ Materialized views across multiple tables to accelerate multi-table joins
💃 Enhanced real-time writing efficiency powered by the AUTO_INCREMENT column, AUTO PARTITION, MemTable forwarding, and Group Commit
🕺 Better workload management for higher performance stability
https://lnkd.in/gjVXD6gQ #database #dataengineering #analytics #bigdata #opensource
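Since Doris speaks the MySQL protocol, the write-side features above are easy to try from Python. Below is a minimal sketch of the AUTO_INCREMENT column and Group Commit, assuming a local frontend on port 9030; the database, table, and column names are made up for illustration, and the exact DDL may vary across 2.1.x releases, so check the official docs.

# Hedged sketch: exercising Doris 2.1's AUTO_INCREMENT column and Group Commit
# over the MySQL protocol. Connection details and names are hypothetical.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=9030, user="root",
                       password="", autocommit=True)
with conn.cursor() as cur:
    cur.execute("CREATE DATABASE IF NOT EXISTS demo")
    # AUTO_INCREMENT generates unique ids so writers don't have to.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS demo.events (
            id BIGINT NOT NULL AUTO_INCREMENT,
            payload VARCHAR(255)
        )
        DUPLICATE KEY(id)
        DISTRIBUTED BY HASH(id) BUCKETS 10
        PROPERTIES ("replication_num" = "1")
    """)
    # Group Commit batches frequent small inserts into fewer internal
    # write transactions (a session-level setting in Doris 2.1).
    cur.execute("SET group_commit = async_mode")
    cur.execute("INSERT INTO demo.events (payload) VALUES ('hello doris')")
conn.close()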
More Relevant Posts
-
Ever wonder why you're moving data from lakes to warehouses for quicker queries? 🤔 It's a common fix but comes with its own set of challenges. Our article explores how data lakes have transformed to provide data warehouse-level performance, enabling in-place analysis and eliminating the need for expensive data transfers and complex governance. Check out our insights: https://hubs.la/Q02q-FQ_0 Keen to learn more or have questions? Join our live webinar tomorrow on how to achieve better lakehouse performance with Apache Iceberg and StarRocks: https://hubs.la/Q02q-zb80 #DataAnalytics #DataEngineering #DataLakeAnalytics #DataLake #DataLakeHouse
How to Seamlessly Accelerate Data Lake Queries
celerdata.wistia.com
-
Apache Hudi. Apache Iceberg. Delta Lake. Wouldn’t it be great to worry less about the open table format choices for your data lakehouse so you can focus more on driving business value with data analytics? Apache XTable (formerly OneTable) is an open-source project that will enable the next level of interoperability in your data lakehouse. Read more: https://lnkd.in/g-JKZHMq #DataLakehouse
-
📢 Check out my latest article where I'm introducing Apache Parquet, a powerhouse in data storage and analytics. 📊 Discover its benefits like efficient storage, fast query performance, and schema evolution. 🏠 Plus, learn how to get started with Parquet using PySpark, converting data to Parquet format, and even tackling advanced use cases like house price prediction. 🏡 Join me on this data-driven journey as we unlock the secrets of efficient data processing and storage. Read the article now and level up your data game! #ApacheParquet #DataAnalytics #PySpark #DataScience #LinkedInArticle
The Power of Apache Parquet: Unlocking Efficiency and Performance
link.medium.com
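For anyone who wants to try the conversion before reading, here is a minimal PySpark sketch of the CSV-to-Parquet workflow the article walks through. The file paths and column names are hypothetical stand-ins.

from pyspark.sql import SparkSession

# Minimal sketch: convert a CSV dataset to Parquet and read it back.
spark = SparkSession.builder.appName("parquet-demo").getOrCreate()

# Read CSV, letting Spark infer column types.
df = spark.read.csv("/tmp/houses.csv", header=True, inferSchema=True)

# Write as Parquet: columnar, compressed, and carrying its own schema.
df.write.mode("overwrite").parquet("/tmp/houses.parquet")

# Reading Parquet back needs no schema inference; it is stored in the files.
# Column names here are placeholders for whatever your CSV contains.
spark.read.parquet("/tmp/houses.parquet").select("price", "bedrooms").show(5)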
-
A data lakehouse is, in effect, a data lake with an additional layer on top (Delta Lake is one option, but not the only one on the market; there are also Apache Iceberg and Apache Hudi). The lakehouse came to market around 2020, and it solves the separation between the data lake and the RDW (Relational Data Warehouse). It adds an RDW-like layer above the data lake in the Delta Lake format (Parquet files in folders plus a transaction log that keeps track of all changes), which gives you many features, one of them being the ability to run DML commands (operations a plain data lake doesn't natively support). #dataarchitecture #decipheringdataarchitectures
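A minimal PySpark sketch of that layering, assuming the delta-spark package is available (the version pin and table path below are illustrative): the save step produces exactly what the post describes, Parquet data files plus a _delta_log/ transaction log, and that log is what makes DML possible.

from pyspark.sql import SparkSession

# Sketch: Delta Lake = Parquet files + a _delta_log/ transaction log.
spark = (
    SparkSession.builder.appName("delta-demo")
    # Version pin is illustrative; match it to your Spark version.
    .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.1.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Writing creates Parquet data files plus /tmp/orders/_delta_log/.
df = spark.createDataFrame([(1, "open"), (2, "open")], ["order_id", "status"])
df.write.format("delta").mode("overwrite").save("/tmp/orders")

# DML on the lake: the UPDATE is appended to the transaction log,
# something a plain Parquet data lake doesn't support natively.
spark.sql("UPDATE delta.`/tmp/orders` SET status = 'closed' WHERE order_id = 1")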
-
Lead Data Engineer (Spark | Python | Big Data | Kafka | AWS | 2x Databricks | Data Architect | SAS Viya | Technical Writer)
𝐓𝐫𝐚𝐧𝐬𝐢𝐭𝐢𝐨𝐧 𝐟𝐫𝐨𝐦 𝐃𝐚𝐭𝐚 𝐰𝐚𝐫𝐞𝐡𝐨𝐮𝐬𝐞 𝐭𝐨 𝐃𝐚𝐭𝐚 𝐋𝐚𝐤𝐞 We know there are different open table formats, such as Apache Iceberg and Apache Hudi. But if you plan to move from a traditional data warehouse to a data lake, which one should you choose? In my latest Medium article, I cover when to choose which. Please read and share your comments! https://lnkd.in/gpJDxBYj #DataLakes #ApacheIceberg #ApacheHudi #DataWarehousing #BigData #DataEngineering
Choosing the Right Data Lake: Iceberg vs. Hudi for Transitioning from Data Warehouses to Data Lakes
medium.com
-
🌊 At the recent Apache Iceberg Summit, Ryan Nowakoski, Principal Data Engineer at Demandbase, presented key insights into overcoming technical hurdles with Apache Iceberg and highlighted the role and benefits of StarRocks/CelerData in their unified data platform. Watch the video to learn more: https://lnkd.in/gEP-MZj7 #DataAnalytics #DataEngineering #DataLakeAnalytics #DataLake #DataLakeHouse
Building a scalable, open source application data platform using Apache Iceberg (DemandBase)
https://www.youtube.com/
-
As much as we say Apache Doris is an all-in-one data platform that is capable of various analytics workloads, it is always more compelling to show a use case.🦄 See how Apache Doris speeds up data reporting, tagging, and data lake analytics.🧐 #database https://lnkd.in/gp5ZrSwU
-
In a world where vast volumes of data fuel insights, the strategic selection of a data storage format is crucial. When it comes to storing and querying large amounts of data, there are many storage options available. Apache Parquet and Apache Iceberg are popular general-purpose big data file formats, while Apache Druid segments are more specific to analytics. As the data landscape continues to evolve, the choice between these technologies will largely depend on specific use case requirements. So which one is right for you? For scenarios prioritizing efficient storage and complex read operations, Parquet may be an ideal solution. When dealing with large, dynamic datasets requiring robust data management features, Iceberg stands out. For applications where real-time analytics and immediate data availability are crucial, Druid's segmentation model offers unparalleled advantages. In Rick's new blog, he gives a detailed overview of each choice. He covers key features, benefits, and defining characteristics, and provides a table comparing the formats. Dive in and explore the characteristics of #ApacheParquet, #ApacheIceberg, and #ApacheDruid segments here: https://lnkd.in/eesDXRAN
-
In our latest discussion with TFiR, Sida Shen highlighted the problems associated with heavily relying on complex data pipelines as a solution for database challenges. While these pipelines can address certain issues, they often introduce new problems such as increased costs, added complexity, and governance challenges. 🔎 Case in Point: 🌊 Data Lake Analytics: Despite innovations like Apache Hudi and Apache Iceberg, data lakes still struggle with efficient low-latency queries. This gap forces businesses to replicate data into separate systems, resulting in inefficiencies and elevated costs. To delve deeper into this issue: https://lnkd.in/gdSRCjwu 📊 Multi-Table Joins: A significant obstacle in real-time analytics, where the cost and complexity of executing multi-table joins at scale are substantial. The common workaround? Creating extensive pre-joined tables, which is not only resource-intensive but also adds to the rigidity and maintenance burden. Learn how to go pipeline-free with real-time analytics: https://lnkd.in/eJfza5b6 #DataAnalytics #DataEngineering #DataLakeAnalytics #DataLake #DataLakeHouse
-
More data, more problems. Efficiently managing large datasets is more critical than ever. This afternoon, Alex Merced presented on Apache Iceberg, an interesting tool that solves several challenges I’ve run into myself.
- How do you manage schema changes at scale?
- Where do you store table-level and file-level metadata?
- How do you integrate that metadata into execution engines?
Apache Iceberg is a metadata format, plus SDKs and APIs for communicating and managing that metadata. The metadata includes partitioning schemes, table schemas, and various statistics on a per-file basis. This capability simplifies schema evolution. Traditionally, schema changes required rewriting, and perhaps re-ETLing, entire datasets. Iceberg enables schema modifications without impacting existing data. This means you can seamlessly add columns or make other schema adjustments while ensuring that queries adapt to the schema of each data file accordingly. For example, if a new column is added, Iceberg allows queries to ignore this column in older files where it doesn't exist, ensuring data integrity and query accuracy. For those interested in exploring further, I highly recommend looking into Alex Merced's talk and the wealth of resources available on Apache Iceberg. If you already use it, I would love to chat with you about it. How do you manage schema evolution in your data lake or warehouse? I’m keen to hear about the strategies and tools you've found effective. https://lnkd.in/eSvNPu7Z #apacheiceberg
Data Engineer's Lunch 107: Exploring the Apache Iceberg Ecosystem
https://www.youtube.com/
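For a concrete feel of the schema-evolution behavior described above, here is a small PySpark sketch using a local Iceberg catalog; the runtime version, warehouse path, and table names are assumptions for illustration.

from pyspark.sql import SparkSession

# Sketch: an Iceberg schema change is a metadata-only operation.
spark = (
    SparkSession.builder.appName("iceberg-demo")
    # Runtime version is illustrative; match it to your Spark version.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE TABLE local.db.events (id BIGINT, name STRING) USING iceberg")
spark.sql("INSERT INTO local.db.events VALUES (1, 'signup')")

# Adding a column rewrites no data files, only table metadata.
spark.sql("ALTER TABLE local.db.events ADD COLUMN country STRING")

# Rows written before the change simply read back NULL for the new column.
spark.sql("SELECT id, name, country FROM local.db.events").show()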
Meet the amazing Apache Doris developers and users on Slack: https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-1t3wfymur-0soNPATWQ~gbU8xutFOLog