Indexing is a powerful tool for maximizing query performance in databases, but a traditional maxim of indexing is that it should be done sparingly, and only on frequently queried columns. That’s not how Hydrolix works. Hydrolix takes a novel approach in which every column is indexed at the partition level, even if a table has thousands of columns. This indexing approach is part of the reason users see sub-second query performance on trillion-row datasets, all while using commodity object storage. In this post, you’ll learn how and why Hydrolix indexes every column by default. This post will cover:
+ How Hydrolix partitions and prunes data by time (see the sketch below)
+ Writing and reading indexes at the partition level
+ The benefits of indexing everything
https://lnkd.in/eRegh3jB #bigdata #data #devops #cloud #database
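To make the time-pruning idea concrete, here is a minimal sketch in Python. It is not Hydrolix's implementation; the `Partition` structure, field names, and object-store paths are all hypothetical, but they illustrate how per-partition min/max timestamps let a query skip most of a table without reading it:

```python
from dataclasses import dataclass

@dataclass
class Partition:
    path: str     # object-store location of the partition (hypothetical layout)
    min_ts: int   # earliest event timestamp in the partition (epoch seconds)
    max_ts: int   # latest event timestamp in the partition

def prune_by_time(partitions: list[Partition], start: int, end: int) -> list[Partition]:
    """Keep only partitions whose [min_ts, max_ts] overlaps the query window;
    everything else is skipped without ever being fetched from object storage."""
    return [p for p in partitions if p.max_ts >= start and p.min_ts <= end]

# Three hour-sized partitions; a query over one hour touches only the one that covers it.
catalog = [
    Partition("s3://bucket/tbl/p0", 1_700_000_000, 1_700_003_599),
    Partition("s3://bucket/tbl/p1", 1_700_003_600, 1_700_007_199),
    Partition("s3://bucket/tbl/p2", 1_700_007_200, 1_700_010_799),
]
print([p.path for p in prune_by_time(catalog, 1_700_003_600, 1_700_007_000)])
# -> ['s3://bucket/tbl/p1']
```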
-
⚙️ Tech Post - How do you bring ingestion data loss to 0% and reduce source load by up to 30%?
❓ Problem: Imagine losing data after reading it from a source, or repeatedly pulling the same data from (legacy) sources instead of reading it once and safely archiving it in the cloud.
🎯 Solution: Meet Dataphos Persistor, a resilient backup and resubmission system. Persistor organizes data as it arrives, improving data quality and agility, and its real-time indexing and resubmission options replace manual pipeline rerouting. A generic sketch of the pattern follows the link below.
#Syntio #Persistor #DataManagement #DataSolution #PersistentStorage
Data Orchestration with Dataphos Persistor: Technical Overview
https://www.syntio.net
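For illustration, here is a minimal Python sketch of the persist-once/resubmit pattern the post describes. It is not Dataphos Persistor's API; the local `archive/` directory stands in for a cloud bucket, and the topic and message names are made up:

```python
import json, time
from pathlib import Path

# Stand-in for object storage (GCS/S3/Azure Blob in a real deployment).
ARCHIVE = Path("archive")

def persist(topic: str, payload: dict) -> Path:
    """Archive each message exactly once, keyed by topic and arrival time,
    so it never has to be pulled from the legacy source again."""
    key = ARCHIVE / topic / f"{time.time_ns()}.json"
    key.parent.mkdir(parents=True, exist_ok=True)
    key.write_text(json.dumps(payload))
    return key

def resubmit(topic: str):
    """Replay everything archived for a topic -- no second read of the source."""
    for f in sorted((ARCHIVE / topic).glob("*.json")):
        yield json.loads(f.read_text())

persist("orders", {"id": 1, "total": 9.99})
print(list(resubmit("orders")))  # [{'id': 1, 'total': 9.99}]
```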
-
When working with cloud object stores, we should rethink how we organize data rather than defaulting to partitioning. In this post, we discuss how data skipping and Z-Order reduce the amount of data that needs to be scanned, which translates to significant query performance improvements and cost reductions when using clustering techniques with Delta Lake. https://lnkd.in/gVktxYsC Delta Lake, #DeltaLake, #Clustering, #ZOrder, #dataskipping, #performance
Optimize by Clustering not Partitioning Data with Delta Lake
http://dennyglee.com
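As a quick illustration, this is roughly what Z-ordering a Delta table looks like from PySpark. It assumes the delta-spark package is installed and a Delta table already exists; the table path and column names are placeholders:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Spark session with the Delta Lake extensions enabled (requires delta-spark).
spark = (SparkSession.builder
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

events = DeltaTable.forPath(spark, "s3://bucket/events")  # placeholder path

# Rewrite the table's files so rows with nearby values of these columns land in
# the same files; file-level min/max stats then let queries skip whole files.
events.optimize().executeZOrderBy("device_id", "event_date")
```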
-
New Databricks feature: Lakehouse Federation. A new killer feature 👑 that is almost hard to believe. Do you know what happens when you enable it? Imagine this scenario: you create a wormhole between Databricks and any SQL source, on-prem or cloud, allowing you to explore and query that source directly and instantly, as if you were inside it while actually being in Databricks. At the same time, the Databricks catalog mirrors the source catalog one-to-one in the left sidebar. There is no need for any data movement (extract or ingest) whatsoever. Sound both crazy and incredible? A rough sketch of the setup follows the link below. Blog post: Introducing Lakehouse Federation Capabilities in Unity Catalog https://lnkd.in/dGub7fmx 6-minute video: Lakehouse Federation is Here https://lnkd.in/dFsfYgdp #databricks #lakehouse #datadiscovery #dataexploration #dataengineering #sql #etl
Introducing Lakehouse Federation Capabilities in Unity Catalog
databricks.com
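For reference, setting this up is roughly two SQL statements, shown here from a Databricks notebook (where `spark` is predefined). The connection, catalog, secret-scope, and table names are placeholders, so treat this as a sketch rather than copy-paste:

```python
# 1. Register the external SQL source as a Unity Catalog connection.
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS pg_conn TYPE postgresql
  OPTIONS (host 'db.example.com', port '5432',
           user secret('my_scope', 'pg_user'),
           password secret('my_scope', 'pg_pass'))
""")

# 2. Mirror the remote database as a foreign catalog.
spark.sql("""
  CREATE FOREIGN CATALOG IF NOT EXISTS pg_catalog
  USING CONNECTION pg_conn OPTIONS (database 'appdb')
""")

# 3. Query the remote table in place -- no extract or ingest step.
spark.sql("SELECT * FROM pg_catalog.public.orders LIMIT 10").show()
```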
-
There are a lot of questions about Hive-style partitioning and its initial roots. Here I go back in time to discuss the roots of data partitioning: optimal disk layout to improve disk I/O throughput. Today, cloud object stores abstract away the physical location of the data. Partitioning can still be useful for organizing data, but not for its original disk-layout reasons. https://lnkd.in/dHA6S79Y cc #DeltaLake #partitioning #ApacheSpark #ApacheHive #S3
Organizing Data: Partition by Disk to Hive-style Partitioning
http://dennyglee.com
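For anyone who hasn't seen the layout in question, this is what Hive-style partitioning produces. A short PySpark sketch with placeholder paths and columns:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("2024-01-01", "a", 1), ("2024-01-02", "b", 2)],
    ["event_date", "key", "value"],
)

# Hive-style partitioning encodes the partition column in the directory name:
#   /tmp/events/event_date=2024-01-01/part-....parquet
#   /tmp/events/event_date=2024-01-02/part-....parquet
# On local disks this layout let readers target whole directories at once; on
# an object store the "directories" are just key prefixes, so the benefit is
# organizational (and partition pruning) rather than physical I/O layout.
df.write.mode("overwrite").partitionBy("event_date").parquet("/tmp/events")
```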
-
A successful #data strategy in the modern era requires 2 things:
1️⃣ A multimodel #database to bring together the differing shapes of data into a single model
2️⃣ A cloud database provider that significantly reduces the time spent managing the database operations
Read more from Ted Neward and The New Stack ✍️ https://bit.ly/43RwhOk
Reducing Complexity with a Multimodel Database
https://thenewstack.io
-
Modern cloud data platforms like Hydrolix use commodity object storage that’s pay-as-you-go and horizontally scalable, giving you long-term, cost-effective storage. Hydrolix maximizes the strengths of object storage and uses massive parallelism, high-density compression, and advanced query features such as micro-indexing to ensure that ingest and query are performant at any scale. https://lnkd.in/eAymscvs #data #bigdata #cloud
Maximizing Query Performance for 100+ Billion Row Data Sets
https://hydrolix.io
-
Data platform migration stories are always fascinating. No matter what the source and target databases are, each migration brings new challenges to solve. 5 major factors make every data migration unique:
✅ Databases involved
✅ Infrastructure environment
 ✔️ Completely on-prem
 ✔️ Completely on cloud
 ✔️ On-prem to cloud
 ✔️ Cloud to on-prem
✅ Management objectives of the migration
✅ End-user use cases
✅ Programming languages used in managing the data
Here is a very high-level data migration story. Spoiler alert: this article doesn't go deep into the technical aspects and is written by the Snowflake team. #data #datamigration #datawarehouse #snowflake https://lnkd.in/geEG8jDV
Vertica to Snowflake Migration: Lessons Learned
david-ruthven.medium.com
-
A successful #data strategy in the modern era requires 2 things:
1. A multimodel #database to bring together the differing shapes of data into a single model
2. A cloud database provider that significantly reduces the time spent managing the database operations
Read more from Ted Neward and The New Stack at https://bit.ly/43RwhOk #datastrategy #datamodernization #couchbase
Reducing Complexity with a Multimodel Database
https://thenewstack.io
-
Great article from Tomer Shiran on the rise of Apache Iceberg as a Data Lakehouse format and its adoption by major players in the data industry. Extremely relevant, especially if you are hosting your data platform on Amazon Web Services (AWS) or GCP and using Snowflake or Confluent Kafka. #apacheiceberg #aws #gcp #snowflake #confluent https://lnkd.in/dhNYk59r
How Iceberg Became the Industry Standard for Data Lakehouse Platforms
medium.com
-
Over the weekend, I put all my thoughts together in a coherent way. Here is my talk with fewer jokes, covering in a slightly more serious way what I presented at PyCon Lithuania. I feel Write-Audit-Publish (WAP), as well as other good things like indexes and check constraints, will be added to our beloved cloud DWHs, and we could have the best of both worlds (RDBMS and cloud DWH). A bare-bones sketch of the pattern follows the link below. You can read the full blog post here: https://lnkd.in/djDK4eDB #pycon #pipelines #dataengineering #Iceberg #Spark
Write-Audit-Publish Pattern in Modern Data Pipelines
uncledata.substack.com
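For readers new to WAP, here is a minimal, generic Python sketch of the three steps. It is not the author's implementation; the local directories stand in for staging and production tables, and the audit checks are made up:

```python
import shutil
from pathlib import Path

STAGING, PUBLISHED = Path("staging/orders"), Path("published/orders")

def write(rows: list[str]) -> None:
    # 1. WRITE: land new data where consumers never read from.
    STAGING.mkdir(parents=True, exist_ok=True)
    (STAGING / "batch.csv").write_text("\n".join(rows))

def audit() -> bool:
    # 2. AUDIT: run quality checks against the staged data only.
    rows = (STAGING / "batch.csv").read_text().splitlines()
    return len(rows) > 0 and all("," in r for r in rows)  # non-empty, well-formed

def publish() -> None:
    # 3. PUBLISH: atomically swap audited data into the consumer-facing location.
    PUBLISHED.parent.mkdir(parents=True, exist_ok=True)
    if PUBLISHED.exists():
        shutil.rmtree(PUBLISHED)
    shutil.move(str(STAGING), str(PUBLISHED))

write(["1,widget", "2,gadget"])
if audit():
    publish()  # consumers only ever see data that passed the audit
```

With Iceberg or Spark, the same shape shows up as writing to a staging branch or table, validating it, then swapping it in atomically.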