Doris Roadmap 2023 #16392

Closed
morningman opened this issue Feb 3, 2023 · 14 comments
Labels
Discuss kind/community Issues or PRs related to Doris community

Comments

@morningman
Contributor

morningman commented Feb 3, 2023

This is Apache Doris Roadmap 2023.

The plan is currently under discussion, so if you have comments or suggestions on any aspect of the plan or beyond, please feel free to leave a comment or send an email to dev@doris.apache.org.

We encourage developers to discuss anything on the dev mailing list. To subscribe, please refer to How to subscribe.

We will gradually create issues for each direction of the plan to describe and track progress in detail. Developers who wish to contribute are also welcome to create issues directly and associate them with this plan (just leave a comment).

Roadmap 2022

Our Main Focus

  • Blazing fast OLAP
    • Reporting
    • Ad-hoc query
    • Customer- or user-facing analytics (high-concurrency)
  • Blazing fast query engine for datalake and lakehouse
    • Query acceleration for Hive
    • Query acceleration for open table format (Iceberg, Hudi, DeltaLake)
  • Semi-structured data storage and analysis
    • Log storage, retrieval, and analysis
    • Time series data storage, retrieval, and analysis
  • High-speed data processing (data engineering)
    • ETL/ELT acceleration
    • Streaming data warehouse

Release Schedule

We plan to release Apache Doris at the following pace:

Month   V 1.2.x   V 2.0.x        V 2.1.x       V 2.2.x
Jan.    1.2.1     -              -             -
Feb.    1.2.2     -              -             -
Mar.    1.2.3     2.0.0 alpha    -             -
Apr.    1.2.4     2.0.0 beta1    -             -
May     1.2.5     2.0.0 beta2    -             -
Jun.    1.2.6     2.0.0          -             -
Jul.    -         2.0.1          -             -
Aug.    -         2.0.2          -             -
Sept.   -         2.0.3          2.1.0 alpha   -
Oct.    -         2.0.4          2.1.0 beta    -
Nov.    -         -              2.1.0         -
Dec.    -         -              2.1.1         -

Features

We plan to develop or continuously optimize these features:

Hybrid Workloads

  • Query execution engine
    • Pipeline task parallelism
    • CodeGen
    • Adaptive execution enhancement
  • Spill To Disk
    • Sort Node
    • HashJoin Node
    • Aggregation Node
    • Sort Merge Join
    • Sort Aggregation
    • Spill To Disk optimizations such as compression, encryption, and spill disk management
    • New query management framework by using Spill To Disk
  • Workload manager for hybrid workloads
    • Resource isolation based on pipeline engine (CPU, Memory, IO)
    • Resource queue
    • Async execution
    • Query priority
    • Query scheduler

Semi-Structured Data Analysis

  • Complex Data Type
    • Array data type & functions
    • Jsonb data type & functions
    • Map data type & functions
    • Struct data type & functions
    • IPv4 & IPv6 data type & functions
    • GEO data type & functions
  • Index Enhancement
    • Ngram bloom filter index
    • Full-Text index for string/number/date
    • BKD numeric index for string/number/date
    • Full-Text & BKD index for Array
    • Full-Text & BKD index for Map
    • Full-Text & BKD index for Struct
    • BKD index for IPv4 & IPv6
    • BKD index for GEO
  • Dynamic Schema Table
    • Dynamic Schema Table syntax
    • Dynamic Schema Table write and read
    • Dynamic Schema Table index

Lakehouse & Data Integration

  • Query acceleration for datalake and lakehouse
    • Parquet, CSV, ORC file
    • Iceberg
    • Hudi COW
    • Hudi MOR
    • DeltaLake
    • Paimon (Flink Table Store)
  • Catalog & Cloud Storage integration
    • Hive Meta Store
    • AWS Glue
    • Alibaba Cloud DLF
    • Object Storage of AWS, Azure, GCP, Alibaba Cloud, Tencent Cloud, Huawei Cloud
  • Managed lake engine
    • Parquet writer
    • ORC writer
    • Doris Catalog for Iceberg
    • Managed Iceberg lake engine
  • Data Security
    • Kerberos
    • KMS
    • Apache Ranger integration
    • Public Cloud (Alibaba Cloud, AWS) IAM Role
  • New Spark/Flink Load
    • Writing Doris data format file externally.
    • Refactor the framework of Spark/Flink Load to support batch load.
  • Hive/Presto/Spark function compatibility
  • Graph database federated query support

New Optimizer (Nereids)

  • Features
    • Full feature support, or replacement of the old query optimizer
    • DML (insert, update, merge)
    • Query cache
  • Performance
    • Optimize the time consumption of the plan stage
    • RBO Rules enhancement
    • CBO Rules enhancement, inline CTE, etc.
  • Support for hybrid workloads
    • Optimize rules for datalake engine
    • Adaptive query plan
    • Adaptive sort/agg algorithm
  • Statistics enhancement
    • Statistics derivation optimization, improve accuracy, support complex expressions
    • Richer statistics to support non-uniform distribution data
    • Optimize statistics persistence and caching mechanism
    • Auto collect statistics
    • Optimize the cost model to be more adaptable to distributed scenarios

Cost Efficiency & Performance

  • Cloud Native
    • Cold & Hot Data Separation
    • Elastic Compute Node
  • Low-latency, high-concurrency point query
  • Aggregating index & projection
  • Performance Self-Tuning
  • Multi-Table Materialized View
    • Automatic Incremental refresh
    • Automatic query rewriting

Data Modeling & Storage Engine

  • Cross Cluster Replication (CCR) & Binlog
    • CCR to enable higher HA
    • Binlog to enable streaming computing
  • Unique Key Constraint
    • Merge-on-Write (MoW) Unique Key Table
    • Partial Column Update on MoW UNIQUE Key Table
  • DDL Simplification
    • Support functions in partitioning
    • Auto Bucket Number
  • Unified Data Model
  • General Delete, Update, Merge Support
  • Light Schema Change
    • Does not affect historical data; applies only to newer data

Ecosystem

  • Enhance BI tools compatibility
    • Metabase
    • Superset
    • Tableau
  • Enhance doris-dbt
  • Enhance Doris-Airbyte
  • Enhance integration with cloud data integration tools

Utility & Stability

  • RBAC (Role-Based Access Control) enhancement
  • Support column-level authorization
  • Profiling / Tracing enhancement
  • Doris Manager enhancement
  • Multi-language UDF
  • More Fuzzy tests
  • All HTTP APIs support HTTPS and authorization
  • Full support for K8s deployment
@morningman morningman added kind/community Issues or PRs related to Doris community Discuss labels Feb 3, 2023
@morningman morningman pinned this issue Feb 3, 2023
@emerkfu
Contributor

emerkfu commented Feb 3, 2023

Feature request: dynamic target table updates for the flink-doris-connector sink.

The current flink-doris-connector supports writing data to an existing table well. However, to add a new target table to the same Flink job, the job must be stopped and the target table names reloaded. If flink-doris-connector could be configured to read the value of a field in the stream data and use it to determine the target table dynamically, the Flink job would not need to be stopped, and operations and maintenance would become easier.
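
To make the request concrete, here is a minimal Python sketch of the routing idea: each record carries the name of its target table in a field, and the sink groups records into per-table batches. This is not flink-doris-connector code; the function name `route_by_table_field` and the field name `target_table` are hypothetical and used only for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def route_by_table_field(
    records: Iterable[Dict[str, Any]],
    table_field: str = "target_table",  # hypothetical routing field name
) -> Dict[str, List[Dict[str, Any]]]:
    """Group records into per-table batches using a field carried in each record."""
    batches: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        table = record.get(table_field)
        if not table:
            raise ValueError(f"record has no {table_field!r} value: {record!r}")
        # Drop the routing field so only payload columns reach the table.
        payload = {k: v for k, v in record.items() if k != table_field}
        batches[table].append(payload)
    return dict(batches)


if __name__ == "__main__":
    stream = [
        {"target_table": "db.orders", "id": 1},
        {"target_table": "db.users", "id": 2},
        {"target_table": "db.orders", "id": 3},
    ]
    for table, rows in route_by_table_field(stream).items():
        print(table, rows)
```

With this shape, a new target table needs no job restart: it shows up as a new key in the batch map the first time a record names it.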

@wangbo
Contributor

wangbo commented Feb 4, 2023

I think CCR (Cross Cluster Replication) is a nice idea.
If we can clone a cluster partially or completely from the production environment, then we can do many things with the cloned cluster:
1. Replay online queries against the cloned cluster to find online bugs.
2. Verify major version upgrades, such as upgrading the cloned cluster from 1.x to 2.x, to find potential problems.

To achieve the above goals, we still need some tools, such as a query replay/load/DDL tool.
This would undoubtedly be very helpful for improving stability and enabling rapid development.

@yangzhg
Member

yangzhg commented Feb 6, 2023

Column-level authorization is the capability of controlling access to specific columns or fields within a database or table, rather than just the entire table. This type of authorization allows fine-grained control over data access and can enhance security and privacy by preventing unauthorized users from accessing sensitive information. So I think this is a very attractive feature.
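
As a rough illustration of what such a check could look like, here is a minimal Python sketch. It is not Doris code: the in-memory `grants` mapping and the function name `check_column_access` are assumptions made purely for the example.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical grant store: (user, table) -> columns that user may read.
Grants = Dict[Tuple[str, str], Set[str]]


def check_column_access(
    grants: Grants, user: str, table: str, columns: Iterable[str]
) -> None:
    """Raise PermissionError if the query touches any column the user was not granted."""
    allowed = grants.get((user, table), set())
    denied = sorted(set(columns) - allowed)
    if denied:
        raise PermissionError(
            f"user {user!r} may not read {denied} on table {table!r}"
        )


if __name__ == "__main__":
    grants: Grants = {("alice", "orders"): {"id", "amount"}}
    check_column_access(grants, "alice", "orders", ["id", "amount"])  # passes
    try:
        check_column_access(grants, "alice", "orders", ["id", "card_no"])
    except PermissionError as exc:
        print(exc)
```

The point of the sketch is that the check runs per referenced column, so a user can query a table while sensitive columns stay off limits.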

@siriume
Contributor

siriume commented Feb 7, 2023

Multiple group_concat(distinct xxx order by xx) is a function we urgently need now. Our current workaround is to join multiple subqueries, but because so many rows are scanned, it is very slow. Is there any plan to add this function in the future?
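
For readers unfamiliar with the request, here is a small Python sketch of the intended semantics: several distinct, ordered concatenations per group computed in a single pass over the rows, rather than one subquery per column. It is only an illustration and makes two assumptions: each column is ordered by its own values, and NULLs (`None`) are skipped as SQL aggregates do. The function name `multi_group_concat_distinct` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Optional, Sequence, Set


def multi_group_concat_distinct(
    rows: Iterable[Dict[str, Any]],
    group_key: str,
    value_columns: Sequence[str],
    separator: str = ",",
) -> Dict[Any, Dict[str, Optional[str]]]:
    """Compute a distinct, sorted concatenation for several columns in one scan."""
    # group value -> column name -> set of distinct non-null values
    seen: Dict[Any, Dict[str, Set[Any]]] = defaultdict(
        lambda: {col: set() for col in value_columns}
    )
    for row in rows:
        per_group = seen[row[group_key]]
        for col in value_columns:
            value = row.get(col)
            if value is not None:  # skip NULLs
                per_group[col].add(value)
    return {
        group: {
            # An all-NULL column yields None rather than an empty string.
            col: separator.join(str(v) for v in sorted(values)) if values else None
            for col, values in cols.items()
        }
        for group, cols in seen.items()
    }


if __name__ == "__main__":
    rows = [
        {"uid": 1, "city": "b", "tag": "x"},
        {"uid": 1, "city": "a", "tag": "x"},
        {"uid": 1, "city": "b", "tag": "y"},
        {"uid": 2, "city": "c", "tag": None},
    ]
    print(multi_group_concat_distinct(rows, "uid", ["city", "tag"]))
```

The rows are scanned once regardless of how many columns are concatenated, which is the cost the subquery-join workaround cannot avoid.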

@wangbo
Contributor

wangbo commented Feb 12, 2023

How about linking related PRs/issues to the items in the list?

@subkanthi

I'm interested in contributing to some of the work. How do I go about doing it? Should I create a separate issue and just add it here in the comments? Please let me know.

@luzhijing
Contributor

> I'm interested in contributing to some of the work. How do I go about doing it? Should I create a separate issue and just add it here in the comments? Please let me know.

That's great! Welcome to the Apache Doris community! Feel free to exchange any ideas in the comments.

@wangshisan

wangshisan commented Mar 29, 2023

> Writing Doris data format file externally

What's the status of this task, any update?

@morningman
Contributor Author

> > Writing Doris data format file externally
>
> What's the status of this task, any update?

Still a work in progress; it may be released around the middle of this year.

@wangshisan

> > > Writing Doris data format file externally
> >
> > What's the status of this task, any update?
>
> Still a work in progress; it may be released around the middle of this year.

May I know how you would implement this? By implementing a full Java writer, or by using JNI to invoke the existing C++ code in the backend?

@morningman
Contributor Author

By JNI, possibly

@fakeyanss

> Full support for K8s deployment

When did this start, or what's the current status?

@HunterPan

Will time series data be supported, like YMatrix or Greenplum? I hope so.

@luzhijing luzhijing unpinned this issue Oct 15, 2023
@hqx871
Contributor

hqx871 commented Dec 7, 2023

Hi team, any update about MergeSortJoin?
