Apache Doris

Software Development

San Francisco, California 2,347 followers

Apache Doris is an open-source real-time data warehouse based on MPP architecture.

About us

Apache Doris is an open-source real-time data warehouse based on MPP architecture, known for its speed and ease of use. It supports real-time data ingestion and real-time query responses in both high-concurrency point-query and high-throughput analysis scenarios, so users can process and analyze large datasets with very low latency. In June 2022, Apache Doris graduated from the Apache Incubator and became a top-level project of the Apache Software Foundation (ASF). It has accumulated nearly 600 contributors, and more than 20,000 developers use it today. Doris runs in production at more than 2,000 companies around the world and is trusted by AWS, Fuse, JD.com, Lenovo, OPPO, Shopee, TikTok, Tencent, Vivo, Xiaomi, and others. We welcome more open-source enthusiasts to join the Apache Doris community and explore its possibilities together!
Learn more about Apache Doris on GitHub: https://github.com/apache/doris
Join the Apache Doris community on Slack: https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2gmq5o30h-455W226d79zP3L96ZhXIoQ

Website
https://doris.apache.org/
Industry
Software Development
Company size
201-500 employees
Headquarters
San Francisco, California
Type
Nonprofit
Founded
2018

Updates

  • Apache Doris

    📢 We are thrilled to announce the release of Apache Doris 2.1.0! For our long-time users, allow us to reintroduce Apache Doris with its new features and substantially improved data writing and query performance. For those who are new to Apache Doris, this is a great time to run a proof of concept and see how it performs in your use case. Buckle up and get ready for:

    🚶 100% faster out-of-the-box performance, as shown by TPC-DS benchmark tests
    🚶 Improved data lake analytics capabilities: 4 to 6 times faster than Trino and Spark
    🏃 Solid support for semi-structured data analysis
    🏃 Materialized views across multiple tables to accelerate multi-table joins
    💃 Enhanced real-time writing efficiency powered by the AUTO_INCREMENT column, AUTO PARTITION, forward placement of MemTable, and Group Commit
    🕺 Better workload management for higher performance stability

    https://lnkd.in/gjVXD6gQ
    #database #dataengineering #analytics #bigdata #opensource

    Another big leap: Apache Doris 2.1.0 is released - Apache Doris

    doris.apache.org

  • Apache Doris

    The design of the compute-storage decoupled mode of Apache Doris, available in the upcoming version 3.0, highlights the transformation of the FE's in-memory metadata model into a shared metadata service. This approach offers a globally consistent state view, allowing any node to directly submit writes without needing to go through the FE for publishing. During write operations, data is stored in shared storage, while metadata is managed by the metadata service. This effectively controls the number of small files in shared storage. Meanwhile, the real-time write performance for individual tables is nearly on par with that in the compute-storage coupled mode. The system's overall write capacity is no longer limited by the processing power of a single FE node.

  • PRINCIPAL (reposted by Apache Doris)

    🎯 Summer with data: Savings with open source at PRINCIPAL

    Even in summer, our #PRINCIPAL data lab does not sit idle! Here is the latest work our team has been busy with. Our colleague Tomas Novotny, who is behind our most recent tests, explains:

    "The goal of our data lab is to assemble a stable solution from open-source components that saves our customers money on their data platform. If a customer is shocked by an excessively high bill for a cloud data warehouse, we at #PRINCIPAL have a solution. For exactly this purpose we are testing, for example, #Apache Doris, an MPP database with a philosophy similar to that of #Teradata or #Snowflake. Doris consumes data from Kafka on its own and uses it to fill the landing layer of the data warehouse. The database is well suited as a platform for an ODS (operational data store), or wherever #Elastic and #Kibana are being considered for that purpose. For testing we use, among other things, freely available data on traffic density on selected road segments, which we combine, for example, with the results of scoring webcam photos using AI models from #HuggingFace. The resulting dashboards are built in #Grafana, which shows us the current traffic situation at 10-second intervals. We are finding out for ourselves how much money can be saved by using well-tamed open-source software in an on-premises or hybrid setup."

    What is your experience with using open-source tools in your data infrastructure?

  • Apache Doris

    How does Apache Doris make life easier for data engineers?

    🦉 Self-adaptive query optimizer
    A query optimizer that delivers fast performance for most use cases without any manual fine-tuning. https://lnkd.in/gFZiieCH

    🦉 Auto Partition
    Supports partitioning data by RANGE or by LIST, which adds flexibility to data sharding. https://lnkd.in/gHmmqJg5

    🦉 Auto Increment
    A useful feature for dictionary encoding, primary key generation, data updates, and pagination. https://lnkd.in/gr6tvnu2

    🦉 SQL Dialect Converter
    Doris supports multiple SQL dialects, including Presto, Trino, Hive, PostgreSQL, Spark, Oracle, and ClickHouse. Users can query data in Doris directly using these dialects. https://lnkd.in/gMXK5s9k

    🦉 X2Doris
    X2Doris is a visual tool that facilitates data migration from other OLAP systems to Doris. https://lnkd.in/gqTg-D2M
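To make Auto Partition and Auto Increment more concrete, here is a minimal sketch that only builds a CREATE TABLE statement as a string. The table name, column names, and the helper `events_table_ddl` are hypothetical, and the DDL syntax is an assumption based on the Doris 2.1 documentation (the exact AUTO PARTITION form has varied between releases), so check it against the docs for your version before running it.

```python
# Illustrative subset of date_trunc units; see the Doris docs for the full list.
PARTITION_UNITS = ("hour", "day", "week", "month", "year")


def events_table_ddl(table: str, unit: str = "month", buckets: int = 10) -> str:
    """Build CREATE TABLE DDL for a hypothetical events table.

    The statement combines an AUTO_INCREMENT value column with
    AUTO PARTITION BY RANGE on a truncated timestamp. The syntax is assumed,
    not verified against a running cluster.
    """
    if unit not in PARTITION_UNITS:
        raise ValueError(f"unsupported partition unit: {unit!r}")
    if buckets < 1:
        raise ValueError("buckets must be a positive integer")
    return "\n".join([
        f"CREATE TABLE {table} (",
        "    event_time DATETIME NOT NULL,",
        "    id BIGINT NOT NULL AUTO_INCREMENT,",
        "    payload STRING",
        ")",
        "DUPLICATE KEY(event_time)",
        # Partitions are meant to be created on demand as new time ranges arrive.
        f"AUTO PARTITION BY RANGE (date_trunc(event_time, '{unit}')) ()",
        f"DISTRIBUTED BY HASH(event_time) BUCKETS {buckets}",
        'PROPERTIES ("replication_num" = "1");',
    ])


if __name__ == "__main__":
    print(events_table_ddl("demo.events", unit="day"))
```

Since Doris speaks the MySQL protocol, the resulting statement could be sent through any MySQL-compatible client; that step is left out here so the sketch stays self-contained.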

  • Apache Doris

    A use case: Apache Doris supporting the farm-to-fork journey of a listed fresh agricultural produce enterprise.

    Before Apache Doris, the company combined a real-time data warehouse based on HBase with an offline one based on Hive. With Apache Doris, it has integrated its real-time data streams and batch processing pipelines to simplify its data architecture.

    💡 Benefits:
    ☑️ 3X personnel efficiency: With SQL alone, the team can complete real-time report development quickly, and can change the computation logic simply by modifying the SQL.
    ☑️ $1 million in cost savings: The data compression and computing performance of Doris reduce storage costs and labor input.
    ☑️ 30X computing efficiency compared to Hive.
    ☑️ Convenient data ingestion: Real-time data ingestion requires only configuration on a web page, avoiding the tedious work of modifying and uploading Flink JAR packages.

  • Apache Doris

    Apache Doris provides native support for several key features of Apache Iceberg:

    1️⃣ Multiple Iceberg Catalog types, including Hive Metastore, Hadoop, REST, Glue, Google Dataproc Metastore, and DLF.
    2️⃣ Iceberg V1/V2 table formats, including reading Position Delete and Equality Delete files.
    3️⃣ Querying the snapshot history of Iceberg tables through table functions.
    4️⃣ Time Travel.
    5️⃣ Iceberg table engine: users can create and manage Iceberg tables and write data to them directly from Apache Doris.
    6️⃣ Partition Transform functions, enabling hidden partitioning and schema evolution.
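As an illustration of the snapshot-history and Time Travel items, the sketch below builds the corresponding query strings. The helper names and the example catalog, database, and table names are hypothetical. The `iceberg_meta` table function and the `FOR VERSION AS OF` / `FOR TIME AS OF` clauses are assumptions based on the Doris Iceberg documentation, so confirm them for your version.

```python
from datetime import datetime
from typing import Optional


def snapshot_history_sql(catalog: str, database: str, table: str) -> str:
    """Query the snapshot history of an Iceberg table via a table function."""
    qualified = f"{catalog}.{database}.{table}"
    return (
        "SELECT * FROM iceberg_meta("
        f'"table" = "{qualified}", "query_type" = "snapshots")'
    )


def time_travel_sql(
    table: str,
    *,
    snapshot_id: Optional[int] = None,
    as_of: Optional[datetime] = None,
) -> str:
    """Read a table as of one snapshot ID or one point in time."""
    # Exactly one of the two selectors must be given.
    if (snapshot_id is None) == (as_of is None):
        raise ValueError("pass exactly one of snapshot_id or as_of")
    if snapshot_id is not None:
        return f"SELECT * FROM {table} FOR VERSION AS OF {int(snapshot_id)}"
    stamp = as_of.strftime("%Y-%m-%d %H:%M:%S")
    return f'SELECT * FROM {table} FOR TIME AS OF "{stamp}"'


if __name__ == "__main__":
    print(snapshot_history_sql("iceberg", "nyc", "taxis"))
    print(time_travel_sql("taxis", as_of=datetime(2022, 10, 7, 17, 20, 37)))
```

The table name is interpolated as-is, so a real application should validate identifiers before building statements this way.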

  • Apache Doris

    Apache Doris has evolved into a mature data lakehouse solution.

    🚗 V0.15
    1️⃣ Introduced Hive and Iceberg external tables.

    🚄 V1.2
    1️⃣ Introduced Multi-Catalog, achieving automatic metadata mapping and data access for various data sources.

    ✈️ V2.1
    1️⃣ Enhanced the lakehouse architecture with better read and write capabilities for mainstream data lake formats (Hudi, Iceberg, Paimon, etc.).
    2️⃣ Introduced compatibility with multiple SQL dialects.
    3️⃣ Enabled seamless migration from existing systems to Apache Doris.
    4️⃣ Integrated the Arrow Flight high-speed reading interface, improving data transfer efficiency 100X.

    As we publish more guides on the data lakehouse capabilities of Apache Doris, here is a quick start for querying data from Apache Paimon with Doris. https://lnkd.in/gFQPAyVn

    Apache Doris & Paimon Quick Start - Apache Doris

    doris.apache.org
