📢 We are thrilled to announce the release of Apache Doris 2.1.0! For our long-term supportive users, allow me to re-introduce Apache Doris with its amazing new features and substantially improved data writing and query performance. For those who are new to Apache Doris, this is great timing for a proof of concept to see how it performs in your use case! Buckle up and get ready for:
🚶♂️ 100% faster out-of-the-box performance, proven by TPC-DS benchmark tests
🚶♀️ Improved data lake analytics capabilities: 4~6 times faster than Trino and Spark
🏃♂️ Solid support for semi-structured data analysis
🏃♀️ Materialized views across multiple tables to accelerate multi-table joins
💃 Enhanced real-time writing efficiency powered by the AUTO_INCREMENT column, AUTO PARTITION, MemTable forwarding, and Group Commit
🕺 Better workload management for higher performance stability
https://lnkd.in/gjVXD6gQ #database #dataengineering #analytics #bigdata #opensource
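Since Doris speaks the MySQL protocol, the write-side features above are easy to try from Python. Below is a minimal sketch of the AUTO_INCREMENT column and Group Commit, assuming a local frontend on port 9030; the database, table, and column names are made up for illustration, and the exact DDL may vary across 2.1.x releases, so check the official docs.

# Hedged sketch: exercising Doris 2.1's AUTO_INCREMENT column and Group Commit
# over the MySQL protocol. Connection details and names are hypothetical.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=9030, user="root",
                       password="", autocommit=True)
with conn.cursor() as cur:
    cur.execute("CREATE DATABASE IF NOT EXISTS demo")
    # AUTO_INCREMENT generates unique ids so writers don't have to.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS demo.events (
            id BIGINT NOT NULL AUTO_INCREMENT,
            payload VARCHAR(255)
        )
        DUPLICATE KEY(id)
        DISTRIBUTED BY HASH(id) BUCKETS 10
        PROPERTIES ("replication_num" = "1")
    """)
    # Group Commit batches frequent small inserts into fewer internal
    # write transactions (a session-level setting in Doris 2.1).
    cur.execute("SET group_commit = async_mode")
    cur.execute("INSERT INTO demo.events (payload) VALUES ('hello doris')")
conn.close()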
More Relevant Posts
-
Ever wonder why you're moving data from lakes to warehouses for quicker queries? 🤔 It's a common fix but comes with its own set of challenges. Our article explores how data lakes have transformed to provide data warehouse-level performance, enabling in-place analysis and eliminating the need for expensive data transfers and complex governance. Check out our insights: https://hubs.la/Q02q-FQ_0 Keen to learn more or have questions? Join our live webinar tomorrow on how to achieve better lakehouse performance with Apache Iceberg and StarRocks: https://hubs.la/Q02q-zb80 #DataAnalytics #DataEngineering #DataLakeAnalytics #DataLake #DataLakeHouse
How to Seamlessly Accelerate Data Lake Queries
celerdata.wistia.com
-
Apache Hudi. Apache Iceberg. Delta Lake. Wouldn’t it be great to worry less about the open table format choices for your data lakehouse so you can focus more on driving business value with data analytics? Apache XTable (formerly OneTable) is an open-source project that will enable the next level of interoperability in your data lakehouse. Read more: https://lnkd.in/g-JKZHMq #DataLakehouse
-
📢 Check out my latest article where I'm introducing Apache Parquet, a powerhouse in data storage and analytics. 📊 Discover its benefits like efficient storage, fast query performance, and schema evolution. 🏠 Plus, learn how to get started with Parquet using PySpark, converting data to Parquet format, and even tackling advanced use cases like house price prediction. 🏡 Join me on this data-driven journey as we unlock the secrets of efficient data processing and storage. Read the article now and level up your data game! #ApacheParquet #DataAnalytics #PySpark #DataScience #LinkedInArticle
The Power of Apache Parquet: Unlocking Efficiency and Performance
link.medium.com
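For anyone who wants to try the conversion before reading, here is a minimal PySpark sketch of the CSV-to-Parquet workflow the article walks through. The file paths and column names are hypothetical stand-ins.

from pyspark.sql import SparkSession

# Minimal sketch: convert a CSV dataset to Parquet and read it back.
spark = SparkSession.builder.appName("parquet-demo").getOrCreate()

# Read CSV, letting Spark infer column types.
df = spark.read.csv("/tmp/houses.csv", header=True, inferSchema=True)

# Write as Parquet: columnar, compressed, and carrying its own schema.
df.write.mode("overwrite").parquet("/tmp/houses.parquet")

# Reading Parquet back needs no schema inference; it is stored in the files.
# Column names here are placeholders for whatever your CSV contains.
spark.read.parquet("/tmp/houses.parquet").select("price", "bedrooms").show(5)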
-
A data lakehouse is, in effect, a data lake with an additional layer on top (Delta Lake is one option, but not the only one on the market; there are also Apache Iceberg and Apache Hudi). The lakehouse came to market around 2020, and it solves the separation between the data lake and the RDW (Relational Data Warehouse). It adds an RDW-like layer above the data lake in the Delta Lake format (Parquet files in folders plus a transaction log that keeps track of all changes), which gives you many features, one of them being the ability to run DML commands (operations a plain data lake doesn't natively support). #dataarchitecture #decipheringdataarchitectures
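A minimal PySpark sketch of that layering, assuming the delta-spark package is available (the version pin and table path below are illustrative): the save step produces exactly what the post describes, Parquet data files plus a _delta_log/ transaction log, and that log is what makes DML possible.

from pyspark.sql import SparkSession

# Sketch: Delta Lake = Parquet files + a _delta_log/ transaction log.
spark = (
    SparkSession.builder.appName("delta-demo")
    # Version pin is illustrative; match it to your Spark version.
    .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.1.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Writing creates Parquet data files plus /tmp/orders/_delta_log/.
df = spark.createDataFrame([(1, "open"), (2, "open")], ["order_id", "status"])
df.write.format("delta").mode("overwrite").save("/tmp/orders")

# DML on the lake: the UPDATE is appended to the transaction log,
# something a plain Parquet data lake doesn't support natively.
spark.sql("UPDATE delta.`/tmp/orders` SET status = 'closed' WHERE order_id = 1")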
-
Lead Data Engineer (Spark | Python | Big Data | Kafka | AWS | 2x Databricks | Data Architect | SAS Viya | Technical Writer)
𝐓𝐫𝐚𝐧𝐬𝐢𝐭𝐢𝐨𝐧 𝐟𝐫𝐨𝐦 𝐃𝐚𝐭𝐚 𝐰𝐚𝐫𝐞𝐡𝐨𝐮𝐬𝐞 𝐭𝐨 𝐃𝐚𝐭𝐚 𝐋𝐚𝐤𝐞 We know there are different open table formats, such as Apache Iceberg and Apache Hudi. But if you plan to move from a traditional data warehouse to a data lake, which one should you choose? In my latest Medium article, I cover when to choose which. Please read and share your comments! https://lnkd.in/gpJDxBYj #DataLakes #ApacheIceberg #ApacheHudi #DataWarehousing #BigData #DataEngineering
Choosing the Right Data Lake: Iceberg vs. Hudi for Transitioning from Data Warehouses to Data Lakes
medium.com
-
🌊 At the recent Apache Iceberg Summit, Ryan Nowakoski, Principal Data Engineer at Demandbase, presented key insights into overcoming technical hurdles with Apache Iceberg and highlighted the role and benefits of StarRocks/CelerData in their unified data platform. Watch the video to learn more: https://lnkd.in/gEP-MZj7 #DataAnalytics #DataEngineering #DataLakeAnalytics #DataLake #DataLakeHouse
Building a scalable, open source application data platform using Apache Iceberg (DemandBase)
https://www.youtube.com/
-
As much as we say Apache Doris is an all-in-one data platform that is capable of various analytics workloads, it is always more compelling to show a use case.🦄 See how Apache Doris speeds up data reporting, tagging, and data lake analytics.🧐 #database https://lnkd.in/gp5ZrSwU
-
In a world where vast volumes of data fuel insights, the strategic selection of a data storage format is crucial. When it comes to storing and querying large amounts of data, there are many storage options available. Apache Parquet and Apache Iceberg are popular general-purpose big data file formats, while Apache Druid segments are more specific to analytics. As the data landscape continues to evolve, the choice between these technologies will largely depend on specific use case requirements. So which one is right for you? For scenarios prioritizing efficient storage and complex read operations, Parquet may be an ideal solution. When dealing with large, dynamic datasets requiring robust data management features, Iceberg stands out. For applications where real-time analytics and immediate data availability are crucial, Druid's segmentation model offers unparalleled advantages. In Rick's new blog, he gives a detailed overview of each choice. He covers key features, benefits, and defining characteristics, and provides a table comparing the formats. Dive in and explore the characteristics of #ApacheParquet, #ApacheIceberg, and #ApacheDruid segments here: https://lnkd.in/eesDXRAN
-
In our latest discussion with TFiR, Sida Shen highlighted the problems associated with heavily relying on complex data pipelines as a solution for database challenges. While these pipelines can address certain issues, they often introduce new problems such as increased costs, added complexity, and governance challenges. 🔎 Case in Point: 🌊 Data Lake Analytics: Despite innovations like Apache Hudi and Apache Iceberg, data lakes still struggle with efficient low-latency queries. This gap forces businesses to replicate data into separate systems, resulting in inefficiencies and elevated costs. To delve deeper into this issue: https://lnkd.in/gdSRCjwu 📊 Multi-Table Joins: A significant obstacle in real-time analytics, where the cost and complexity of executing multi-table joins at scale are substantial. The common workaround? Creating extensive pre-joined tables, which is not only resource-intensive but also adds to the rigidity and maintenance burden. Learn how to go pipeline-free with real-time analytics: https://lnkd.in/eJfza5b6 #DataAnalytics #DataEngineering #DataLakeAnalytics #DataLake #DataLakeHouse
-
More data, more problems. Efficiently managing large datasets is more critical than ever. This afternoon, Alex Merced presented on Apache Iceberg, an interesting tool that solves several challenges I’ve run into myself.
- How do you manage schema changes at scale?
- Where do you store table-level and file-level metadata?
- How do you integrate that metadata into execution engines?
Apache Iceberg is a metadata format, plus SDKs and APIs for communicating and managing that metadata. The metadata includes partitioning schemes, table schemas, and various statistics on a per-file basis. This capability simplifies schema evolution. Traditionally, schema changes required rewriting, and perhaps re-ETLing, entire datasets. Iceberg enables schema modifications without impacting existing data. This means you can seamlessly add columns or make other schema adjustments while ensuring that queries adapt to the schema of each data file accordingly. For example, if a new column is added, Iceberg allows queries to ignore this column in older files where it doesn't exist, ensuring data integrity and query accuracy. For those interested in exploring further, I highly recommend looking into Alex Merced's talk and the wealth of resources available on Apache Iceberg. If you already use it, I would love to chat with you about it. How do you manage schema evolution in your data lake or warehouse? I’m keen to hear about the strategies and tools you've found effective. https://lnkd.in/eSvNPu7Z #apacheiceberg
Data Engineer's Lunch 107: Exploring the Apache Iceberg Ecosystem
https://www.youtube.com/
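For a concrete feel of the schema-evolution behavior described above, here is a small PySpark sketch using a local Iceberg catalog; the runtime version, warehouse path, and table names are assumptions for illustration.

from pyspark.sql import SparkSession

# Sketch: an Iceberg schema change is a metadata-only operation.
spark = (
    SparkSession.builder.appName("iceberg-demo")
    # Runtime version is illustrative; match it to your Spark version.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE TABLE local.db.events (id BIGINT, name STRING) USING iceberg")
spark.sql("INSERT INTO local.db.events VALUES (1, 'signup')")

# Adding a column rewrites no data files, only table metadata.
spark.sql("ALTER TABLE local.db.events ADD COLUMN country STRING")

# Rows written before the change simply read back NULL for the new column.
spark.sql("SELECT id, name, country FROM local.db.events").show()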
Meet the amazing Apache Doris developers and users on Slack: https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-1t3wfymur-0soNPATWQ~gbU8xutFOLog