"While going through some data platform designs shared by companies like Uber and Netflix can be both exciting and intimidating, becoming a data platform engineer is actually much more accessible than these articles might suggest." This is a guide on building a simple data platform with three basic components: 🚢 MySQL 🚢 Apache Doris 🚢 Apache Flink Kudos to Mohamed Amine Turki for the hands-on instructions. This would make a great stepping stone for someone who's new to data engineering. #dataengineering #database #MySQL #ApacheFlink #opensource https://lnkd.in/gYq2JbKQ
Apache Doris’ Post
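To make the three components concrete, here is a minimal Flink SQL sketch of the kind of pipeline such a platform runs: change data captured from MySQL and streamed into Doris. All connector options below (hostnames, ports, credentials, database and table names) are illustrative placeholders, not taken from the guide.

-- Source: MySQL change data capture via the Flink CDC connector.
CREATE TABLE orders_src (
  order_id INT,
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector'     = 'mysql-cdc',
  'hostname'      = 'mysql',
  'port'          = '3306',
  'username'      = 'flink',
  'password'      = 'secret',
  'database-name' = 'shop',
  'table-name'    = 'orders'
);

-- Sink: an Apache Doris table via the Flink Doris connector.
CREATE TABLE orders_sink (
  order_id INT,
  amount   DECIMAL(10, 2)
) WITH (
  'connector'        = 'doris',
  'fenodes'          = 'doris-fe:8030',
  'table.identifier' = 'shop.orders',
  'username'         = 'root',
  'password'         = ''
);

-- Continuously replicate MySQL changes into Doris.
INSERT INTO orders_sink SELECT order_id, amount FROM orders_src;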
More Relevant Posts
-
Co-founder at Codefex | Building Digital Products for startups | Implementing AI/ML/DL use cases | Data Infrastructure
Let's connect to set up your data warehouse.
"While going through some data platform designs shared by companies like Uber and Netflix can be both exciting and intimidating, becoming a data platform engineer is actually much more accessible than these articles might suggest." This is a guide on building a simple data platform with three basic components: 🚢 MySQL 🚢 Apache Doris 🚢 Apache Flink Kudos to Mohamed Amine Turki for the hands-on instructions. This would make a great stepping stone for someone who's new to data engineering. #dataengineering #database #MySQL #ApacheFlink #opensource https://lnkd.in/gYq2JbKQ
Build Your First Data Platform: A Beginner’s Guide
dataplatformhub.medium.com
-
👩‍💻 Hands-On with Catalogs in Flink SQL 🔧 In this second post in the series, Robin Moffatt shows how to use Flink SQL with catalogs, including #apacheHive, #JDBC, and #apacheIceberg. It also takes a closer look at the data structures within the Hive Metastore. https://lnkd.in/daF6zy-R #dataEngineering #streamingData #openSource #SQL
Catalogs in Flink SQL—Hands On
decodable.co
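For a taste of what the post walks through, here is a minimal sketch of registering and using a Hive catalog in Flink SQL. The catalog name and the hive-conf-dir path are assumptions for illustration.

-- Register a Hive catalog so table definitions persist across
-- sessions instead of living in Flink's default in-memory catalog.
CREATE CATALOG hive_catalog WITH (
  'type'          = 'hive',
  'hive-conf-dir' = '/opt/hive/conf'
);

USE CATALOG hive_catalog;

-- This table's metadata is stored in the Hive Metastore.
CREATE TABLE pageviews (url STRING, viewed_at TIMESTAMP(3));
SHOW TABLES;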
-
Tuple shuffling: Postgres CTEs for Moving and Deleting Table Data
crunchydata.com
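The core trick in the article is a DELETE ... RETURNING wrapped in a CTE that feeds an INSERT, so rows move between tables in a single atomic statement. A minimal sketch, with the table and column names invented for illustration:

-- Move rows older than 90 days from a hot table to an archive.
-- The CTE deletes the rows and hands them to the INSERT, so the
-- move happens atomically within one statement.
WITH moved AS (
  DELETE FROM events
  WHERE created_at < now() - interval '90 days'
  RETURNING *
)
INSERT INTO events_archive
SELECT * FROM moved;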
-
My main job currently revolves around analytics, mainly from a BI perspective, but a big portion of my life I've spent with productized analytics 🤓 And I often say: "you can choose any database as long as it's postgres", and I do think for most applications it's the right choice. 🤠 If you run with postgres, check out the content from Supabase; they have many amazing posts. The latest I checked was about table partitioning 🤓 Had my share of massive partitioned postgres tables in my previous job working with youtube video _metadata_. Fun times 😄 I do know that postgres does not fit every scenario and there are often better alternatives for OLAP load patterns. But does anyone know any good HTAP database? Or is that still a completely unsolved problem? https://lnkd.in/d_RXh8vr #databases #postgres
Dynamic Table Partitioning in Postgres
supabase.com
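For anyone who hasn't used it, declarative range partitioning in Postgres looks like the sketch below; names are made up for illustration, and the Supabase post covers how to create partitions like these dynamically.

-- Parent table, partitioned by a timestamp range.
CREATE TABLE video_metadata (
  id         bigserial,
  title      text,
  created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- One partition per month; queries that filter on created_at
-- only scan the partitions they need.
CREATE TABLE video_metadata_2024_01
  PARTITION OF video_metadata
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');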
-
In this article, we're comparing commonly used Go ORMs to help you decide which one is suitable for... https://lnkd.in/dSWEXKph #golang #sql #orms #databases
Go ORMs Compared
dev.to
-
Software Engineer 💻 | Crafting future-focused Solutions @MorganStanley | Expert in FinTech, Java, Python, Go, and Typescript | Mitacs Research Scholar | System Design Maven | Tech Blogger & Engaging Speaker
Curious how pg_analytics can transform your PostgreSQL database into an OLAP powerhouse, enabling complex analytics on massive datasets in seconds? 🤔

𝐏𝐫𝐨𝐛𝐥𝐞𝐦
Developers who store billions of data points in Postgres struggle with slow query times and poor data compression. Even with database tuning, complex analytical queries (e.g. counts, window functions, string aggregations) can take anywhere from minutes to hours. Many organizations turn to an external analytical data store like Elasticsearch as a result. This increases operational complexity, as data becomes isolated and engineers must learn a new database.

𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧
By speeding up analytical queries directly inside Postgres, pg_analytics enables analytics in Postgres without the need to extract, transform, and load (ETL) data into another system.

𝐇𝐨𝐰 𝐢𝐭 𝐰𝐨𝐫𝐤𝐬
Regular Postgres tables, known as heap tables, organize data by row. While this makes sense for operational data, it is inefficient for analytical queries, which often scan a large amount of data from a subset of the columns in a table. ParadeDB introduces a new kind of table called the 𝐝𝐞𝐥𝐭𝐚𝐥𝐚𝐤𝐞 table. 𝐝𝐞𝐥𝐭𝐚𝐥𝐚𝐤𝐞 tables behave like regular Postgres tables but use a column-oriented layout via 𝐀𝐩𝐚𝐜𝐡𝐞 𝐀𝐫𝐫𝐨𝐰 and leverage 𝐀𝐩𝐚𝐜𝐡𝐞 𝐃𝐚𝐭𝐚𝐅𝐮𝐬𝐢𝐨𝐧, a query engine optimized for column-oriented data. This means users can choose between row- and column-oriented storage at table creation time.

Arrow and DataFusion are integrated with Postgres via two features of the Postgres API:
1️⃣ Table access method
2️⃣ Executor hooks

The table access method registers deltalake tables with the Postgres catalog and handles data manipulation language (DML) statements like inserts. Executor hooks intercept and reroute queries to DataFusion, which parses the query, constructs an optimal query plan, executes it, and returns the results to Postgres.

Data is persisted to disk with 𝐏𝐚𝐫𝐪𝐮𝐞𝐭, a highly compressed file format for column-oriented data. Thanks to Parquet, ParadeDB compresses data 5x more compactly than both regular Postgres and Elasticsearch.

#database #postgres #postgresql #apache #dataanalytics #olap #elasticsearch #parquet #systemdesign #technews
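In practice, the row-versus-column choice rides on Postgres' USING clause for table access methods. A hedged sketch of what that looks like, with the table and query invented for illustration (exact syntax and options may differ between pg_analytics versions):

CREATE EXTENSION pg_analytics;

-- A deltalake table: same SQL surface as a heap table, but stored
-- column-oriented (Arrow in memory, Parquet on disk).
CREATE TABLE sensor_readings (
  device_id   int,
  reading     numeric,
  recorded_at timestamptz
) USING deltalake;

-- Executor hooks reroute analytical queries like this to DataFusion.
SELECT device_id, count(*) AS n, avg(reading) AS mean
FROM sensor_readings
GROUP BY device_id;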
-
Thanks to UUIDv7, we can do time-based partitioning of Postgres tables fully on the Postgres side, without any client-side support needed! In E98 of "5mins of Postgres" we talk about partitioning Postgres tables by timestamp-based UUIDs. We look at Chris O'Brien's great write-up on how to do time-based partitioning with ULID, and are quite excited about being able to do the same with UUIDv7.

We could see a big benefit in utilizing this technique when most of the data you're accessing is data you created recently, with recent IDs, because then you can essentially guarantee that table is going to be kept in cache! Let's imagine you have a bunch of older data. You could then, over time, repackage it and move it to an archive, but still keep it for lookup. If you have a given id, you'll know which of the partitions to look in. That makes all the difference!

We are really excited about this. https://lnkd.in/dxyqR-d9
How to partition Postgres tables by timestamp based UUIDs
pganalyze.com
-
I had some fun yesterday testing out the UUIDv7 patch currently being discussed on the Postgres mailing lists: partitioning a table by UUID, but split up by the day the ID was assigned, thanks to the timestamp prefix in UUIDv7. This is one of those techniques one wouldn't think of at first (kudos to Chris O'Brien for thinking of it in the context of ULIDs), but it will become easier to use once we have the building blocks directly in core #Postgres.
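Because Postgres compares uuid values bytewise and UUIDv7 places a big-endian millisecond timestamp in the first 48 bits, ordinary range partitioning on the id column falls out naturally. A sketch with hand-computed day boundaries; the bound UUIDs encode midnight UTC of 2024-01-01 and 2024-01-02 and are purely illustrative:

CREATE TABLE events (
  id      uuid NOT NULL,  -- UUIDv7: first 48 bits are a Unix ms timestamp
  payload jsonb,
  PRIMARY KEY (id)
) PARTITION BY RANGE (id);

-- 1704067200000 ms (2024-01-01T00:00:00Z) = 0x018cc251f400
-- 1704153600000 ms (2024-01-02T00:00:00Z) = 0x018cc7785000
CREATE TABLE events_2024_01_01 PARTITION OF events
  FOR VALUES FROM ('018cc251-f400-0000-0000-000000000000')
              TO ('018cc778-5000-0000-0000-000000000000');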