"While going through some data platform designs shared by companies like Uber and Netflix can be both exciting and intimidating, becoming a data platform engineer is actually much more accessible than these articles might suggest." This is a guide on building a simple data platform with three basic components: 🚢 MySQL 🚢 Apache Doris 🚢 Apache Flink Kudos to Mohamed Amine Turki for the hands-on instructions. This would make a great stepping stone for someone who's new to data engineering. #dataengineering #database #MySQL #ApacheFlink #opensource https://lnkd.in/gYq2JbKQ
Apache Doris’ Post
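To make the three components concrete, here is a minimal Flink SQL sketch of the kind of pipeline such a platform runs: change data captured from MySQL and streamed into Doris. All connector options below (hostnames, ports, credentials, database and table names) are illustrative placeholders, not taken from the guide.

-- Source: MySQL change data capture via the Flink CDC connector.
CREATE TABLE orders_src (
  order_id INT,
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector'     = 'mysql-cdc',
  'hostname'      = 'mysql',
  'port'          = '3306',
  'username'      = 'flink',
  'password'      = 'secret',
  'database-name' = 'shop',
  'table-name'    = 'orders'
);

-- Sink: an Apache Doris table via the Flink Doris connector.
CREATE TABLE orders_sink (
  order_id INT,
  amount   DECIMAL(10, 2)
) WITH (
  'connector'        = 'doris',
  'fenodes'          = 'doris-fe:8030',
  'table.identifier' = 'shop.orders',
  'username'         = 'root',
  'password'         = ''
);

-- Continuously replicate MySQL changes into Doris.
INSERT INTO orders_sink SELECT order_id, amount FROM orders_src;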
More Relevant Posts
-
Co-founder at Codefex | Building Digital Products for startups | Implementing AI/ML/DL use cases | Data Infrastructure
Let's connect to set up your data warehouse.
"While going through some data platform designs shared by companies like Uber and Netflix can be both exciting and intimidating, becoming a data platform engineer is actually much more accessible than these articles might suggest." This is a guide on building a simple data platform with three basic components: 🚢 MySQL 🚢 Apache Doris 🚢 Apache Flink Kudos to Mohamed Amine Turki for the hands-on instructions. This would make a great stepping stone for someone who's new to data engineering. #dataengineering #database #MySQL #ApacheFlink #opensource https://lnkd.in/gYq2JbKQ
Build Your First Data Platform: A Beginner’s Guide
dataplatformhub.medium.com
-
👩‍💻 Hands-On with Catalogs in Flink SQL 🔧 In this second post in the series, Robin Moffatt shows how to use Flink SQL with catalogs, including #apacheHive, #JDBC, and #apacheIceberg. It also takes a closer look at the data structures within the Hive Metastore. https://lnkd.in/daF6zy-R #dataEngineering #streamingData #openSource #SQL
Catalogs in Flink SQL—Hands On
decodable.co
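For a taste of what the post walks through, here is a minimal sketch of registering and using a Hive catalog in Flink SQL. The catalog name and the hive-conf-dir path are assumptions for illustration.

-- Register a Hive catalog so table definitions persist across
-- sessions instead of living in Flink's default in-memory catalog.
CREATE CATALOG hive_catalog WITH (
  'type'          = 'hive',
  'hive-conf-dir' = '/opt/hive/conf'
);

USE CATALOG hive_catalog;

-- This table's metadata is stored in the Hive Metastore.
CREATE TABLE pageviews (url STRING, viewed_at TIMESTAMP(3));
SHOW TABLES;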
-
Tuple shuffling: Postgres CTEs for Moving and Deleting Table Data
crunchydata.com
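The core trick in the article is a DELETE ... RETURNING wrapped in a CTE that feeds an INSERT, so rows move between tables in a single atomic statement. A minimal sketch, with the table and column names invented for illustration:

-- Move rows older than 90 days from a hot table to an archive.
-- The CTE deletes the rows and hands them to the INSERT, so the
-- move happens atomically within one statement.
WITH moved AS (
  DELETE FROM events
  WHERE created_at < now() - interval '90 days'
  RETURNING *
)
INSERT INTO events_archive
SELECT * FROM moved;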
-
My main job currently revolves around analytics, mainly from a BI perspective, but a big portion of my life I've spent with productized analytics 🤓 And I often say: "you can choose any database as long as it's postgres", and I do think for most applications it's the right choice. 🤠 If you run with postgres, check out the content from Supabase; they have many amazing posts. The latest I checked was about table partitioning 🤓 Had my share of massive partitioned postgres tables in my previous job working with youtube video _metadata_. Fun times 😄 I do know that postgres does not fit every scenario and there are often better alternatives for OLAP load patterns. But does anyone know any good HTAP database? Or is that still a completely unsolved problem? https://lnkd.in/d_RXh8vr #databases #postgres
Dynamic Table Partitioning in Postgres
supabase.com
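For anyone who hasn't used it, declarative range partitioning in Postgres looks like the sketch below; names are made up for illustration, and the Supabase post covers how to create partitions like these dynamically.

-- Parent table, partitioned by a timestamp range.
CREATE TABLE video_metadata (
  id         bigserial,
  title      text,
  created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- One partition per month; queries that filter on created_at
-- only scan the partitions they need.
CREATE TABLE video_metadata_2024_01
  PARTITION OF video_metadata
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');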
-
In this article, we're comparing commonly used Go ORMs to help you decide which one is suitable for... https://lnkd.in/dSWEXKph #golang #sql #orms #databases
Go ORMs Compared
dev.to
-
Software Engineer 💻 | Crafting future-focused Solutions @MorganStanley | Expert in FinTech, Java, Python, Go, and Typescript | Mitacs Research Scholar | System Design Maven | Tech Blogger & Engaging Speaker
Curious how pg_analytics can transform your PostgreSQL database into an OLAP powerhouse, enabling complex analytics on massive datasets in seconds? 🤔

𝐏𝐫𝐨𝐛𝐥𝐞𝐦
Developers who store billions of data points in Postgres struggle with slow query times and poor data compression. Even with database tuning, complex analytical queries (e.g. counts, window functions, string aggregations) can take anywhere from minutes to hours. Many organizations turn to an external analytical data store like Elasticsearch as a result. This increases operational complexity, as data becomes isolated and engineers must learn a new database.

𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧
By speeding up analytical queries directly inside Postgres, pg_analytics enables analytics in Postgres without the need to extract, transform, and load (ETL) data into another system.

𝐇𝐨𝐰 𝐢𝐭 𝐰𝐨𝐫𝐤𝐬
Regular Postgres tables, known as heap tables, organize data by row. While this makes sense for operational data, it is inefficient for analytical queries, which often scan a large amount of data from a subset of the columns in a table. ParadeDB introduces a new kind of table called the 𝐝𝐞𝐥𝐭𝐚𝐥𝐚𝐤𝐞 table. 𝐝𝐞𝐥𝐭𝐚𝐥𝐚𝐤𝐞 tables behave like regular Postgres tables but use a column-oriented layout via 𝐀𝐩𝐚𝐜𝐡𝐞 𝐀𝐫𝐫𝐨𝐰 and leverage 𝐀𝐩𝐚𝐜𝐡𝐞 𝐃𝐚𝐭𝐚𝐅𝐮𝐬𝐢𝐨𝐧, a query engine optimized for column-oriented data. This means users can choose between row- and column-oriented storage at table creation time.

Arrow and DataFusion are integrated with Postgres via two features of the Postgres API:
1️⃣ Table access method
2️⃣ Executor hooks

The table access method registers deltalake tables with the Postgres catalog and handles data manipulation language (DML) statements like inserts. Executor hooks intercept and reroute queries to DataFusion, which parses the query, constructs an optimal query plan, executes it, and returns the results to Postgres.

Data is persisted to disk with 𝐏𝐚𝐫𝐪𝐮𝐞𝐭, a highly compressed file format for column-oriented data. Thanks to Parquet, ParadeDB compresses data 5x more compactly than both regular Postgres and Elasticsearch.

#database #postgres #postgresql #apache #dataanalytics #olap #elasticsearch #parquet #systemdesign #technews
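In practice, the row-versus-column choice rides on Postgres' USING clause for table access methods. A hedged sketch of what that looks like, with the table and query invented for illustration (exact syntax and options may differ between pg_analytics versions):

CREATE EXTENSION pg_analytics;

-- A deltalake table: same SQL surface as a heap table, but stored
-- column-oriented (Arrow in memory, Parquet on disk).
CREATE TABLE sensor_readings (
  device_id   int,
  reading     numeric,
  recorded_at timestamptz
) USING deltalake;

-- Executor hooks reroute analytical queries like this to DataFusion.
SELECT device_id, count(*) AS n, avg(reading) AS mean
FROM sensor_readings
GROUP BY device_id;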
-
Thanks to UUIDv7, we can do time-based partitioning of Postgres tables fully on the Postgres side, without any client-side support needed! In E98 of "5mins of Postgres" we talk about partitioning Postgres tables by timestamp-based UUIDs. We look at Chris O'Brien's great write-up on how to do time-based partitioning with ULID, and are quite excited about being able to do the same with UUIDv7.

We could see a big benefit in utilizing this technique when most of the data you're accessing is data you created recently, with recent IDs, because then you can essentially guarantee that table is going to be kept in cache! Let's imagine you have a bunch of older data. You could then, over time, repackage it and move it to an archive, but still keep it for lookup. If you have a given id, you'll know which of the partitions to look in. That makes all the difference!

We are really excited about this. https://lnkd.in/dxyqR-d9
How to partition Postgres tables by timestamp based UUIDs
pganalyze.com
-
I had some fun yesterday testing out the UUIDv7 patch currently being discussed on the Postgres mailing lists: partitioning a table by UUID, but split up by the day the ID was assigned, thanks to the timestamp prefix in UUIDv7. This is one of those techniques one wouldn't think of at first (kudos to Chris O'Brien for thinking of it in the context of ULIDs), but it will become easier to use once we have the building blocks directly in core #Postgres.
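Because Postgres compares uuid values bytewise and UUIDv7 places a big-endian millisecond timestamp in the first 48 bits, ordinary range partitioning on the id column falls out naturally. A sketch with hand-computed day boundaries; the bound UUIDs encode midnight UTC of 2024-01-01 and 2024-01-02 and are purely illustrative:

CREATE TABLE events (
  id      uuid NOT NULL,  -- UUIDv7: first 48 bits are a Unix ms timestamp
  payload jsonb,
  PRIMARY KEY (id)
) PARTITION BY RANGE (id);

-- 1704067200000 ms (2024-01-01T00:00:00Z) = 0x018cc251f400
-- 1704153600000 ms (2024-01-02T00:00:00Z) = 0x018cc7785000
CREATE TABLE events_2024_01_01 PARTITION OF events
  FOR VALUES FROM ('018cc251-f400-0000-0000-000000000000')
              TO ('018cc778-5000-0000-0000-000000000000');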