Indexing is a powerful tool for maximizing query performance in databases, but a traditional maxim of indexing is that it should be done sparingly, and only on frequently queried columns. That’s not how Hydrolix works. Hydrolix takes a novel approach in which every column is indexed at the partition level, even if a table has thousands of columns. This indexing approach is part of the reason users see sub-second query performance on trillion-row datasets, all while using commodity object storage. In this post, you’ll learn how and why Hydrolix indexes every column by default. This post will cover:
+ How Hydrolix partitions and prunes data by time (see the sketch below)
+ Writing and reading indexes at the partition level
+ The benefits of indexing everything
https://lnkd.in/eRegh3jB #bigdata #data #devops #cloud #database
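To make the time-pruning idea concrete, here is a minimal sketch in Python. It is not Hydrolix's implementation; the `Partition` structure, field names, and object-store paths are all hypothetical, but they illustrate how per-partition min/max timestamps let a query skip most of a table without reading it:

```python
from dataclasses import dataclass

@dataclass
class Partition:
    path: str     # object-store location of the partition (hypothetical layout)
    min_ts: int   # earliest event timestamp in the partition (epoch seconds)
    max_ts: int   # latest event timestamp in the partition

def prune_by_time(partitions: list[Partition], start: int, end: int) -> list[Partition]:
    """Keep only partitions whose [min_ts, max_ts] overlaps the query window;
    everything else is skipped without ever being fetched from object storage."""
    return [p for p in partitions if p.max_ts >= start and p.min_ts <= end]

# Three hour-sized partitions; a query over one hour touches only the one that covers it.
catalog = [
    Partition("s3://bucket/tbl/p0", 1_700_000_000, 1_700_003_599),
    Partition("s3://bucket/tbl/p1", 1_700_003_600, 1_700_007_199),
    Partition("s3://bucket/tbl/p2", 1_700_007_200, 1_700_010_799),
]
print([p.path for p in prune_by_time(catalog, 1_700_003_600, 1_700_007_000)])
# -> ['s3://bucket/tbl/p1']
```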
-
⚙️ Tech Post - How do you bring ingestion data loss to 0% and reduce source load by up to 30%?
❓ Problem: Imagine losing data after reading it from a source, or repeatedly pulling the same data from (legacy) sources instead of reading it once and safely archiving it in the cloud.
🎯 Solution: Meet Dataphos Persistor, a resilient backup and resubmission system. Persistor organizes data as it arrives, improving data quality and agility, and its real-time indexing and resubmission options replace manual pipeline rerouting. A generic sketch of the pattern follows the link below.
#Syntio #Persistor #DataManagement #DataSolution #PersistentStorage
Data Orchestration with Dataphos Persistor: Technical Overview
https://www.syntio.net
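For illustration, here is a minimal Python sketch of the persist-once/resubmit pattern the post describes. It is not Dataphos Persistor's API; the local `archive/` directory stands in for a cloud bucket, and the topic and message names are made up:

```python
import json, time
from pathlib import Path

# Stand-in for object storage (GCS/S3/Azure Blob in a real deployment).
ARCHIVE = Path("archive")

def persist(topic: str, payload: dict) -> Path:
    """Archive each message exactly once, keyed by topic and arrival time,
    so it never has to be pulled from the legacy source again."""
    key = ARCHIVE / topic / f"{time.time_ns()}.json"
    key.parent.mkdir(parents=True, exist_ok=True)
    key.write_text(json.dumps(payload))
    return key

def resubmit(topic: str):
    """Replay everything archived for a topic -- no second read of the source."""
    for f in sorted((ARCHIVE / topic).glob("*.json")):
        yield json.loads(f.read_text())

persist("orders", {"id": 1, "total": 9.99})
print(list(resubmit("orders")))  # [{'id': 1, 'total': 9.99}]
```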
-
When working with cloud object stores, we should rethink how we organize data rather than defaulting to partitioning. In this post, we discuss how data skipping and Z-Order reduce the amount of data that needs to be scanned, which translates to significant query performance improvements and cost reductions when using clustering techniques with Delta Lake. https://lnkd.in/gVktxYsC Delta Lake, #DeltaLake, #Clustering, #ZOrder, #dataskipping, #performance
Optimize by Clustering not Partitioning Data with Delta Lake
http://dennyglee.com
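As a quick illustration, this is roughly what Z-ordering a Delta table looks like from PySpark. It assumes the delta-spark package is installed and a Delta table already exists; the table path and column names are placeholders:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Spark session with the Delta Lake extensions enabled (requires delta-spark).
spark = (SparkSession.builder
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

events = DeltaTable.forPath(spark, "s3://bucket/events")  # placeholder path

# Rewrite the table's files so rows with nearby values of these columns land in
# the same files; file-level min/max stats then let queries skip whole files.
events.optimize().executeZOrderBy("device_id", "event_date")
```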
-
New Databricks feature: Lakehouse Federation. A new killer feature 👑 that is almost hard to believe. Do you know what happens when you enable it? Imagine this scenario: you create a wormhole between Databricks and any SQL source, on-prem or cloud, allowing you to explore and query that source directly and instantly, as if you were inside it while actually being in Databricks. At the same time, the Databricks catalog mirrors the source catalog one-to-one in the left sidebar. There is no need for any data movement (extract or ingest) whatsoever. Sound both crazy and incredible? A rough sketch of the setup follows the link below. Blog post: Introducing Lakehouse Federation Capabilities in Unity Catalog https://lnkd.in/dGub7fmx 6-minute video: Lakehouse Federation is Here https://lnkd.in/dFsfYgdp #databricks #lakehouse #datadiscovery #dataexploration #dataengineering #sql #etl
Introducing Lakehouse Federation Capabilities in Unity Catalog
databricks.com
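For reference, setting this up is roughly two SQL statements, shown here from a Databricks notebook (where `spark` is predefined). The connection, catalog, secret-scope, and table names are placeholders, so treat this as a sketch rather than copy-paste:

```python
# 1. Register the external SQL source as a Unity Catalog connection.
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS pg_conn TYPE postgresql
  OPTIONS (host 'db.example.com', port '5432',
           user secret('my_scope', 'pg_user'),
           password secret('my_scope', 'pg_pass'))
""")

# 2. Mirror the remote database as a foreign catalog.
spark.sql("""
  CREATE FOREIGN CATALOG IF NOT EXISTS pg_catalog
  USING CONNECTION pg_conn OPTIONS (database 'appdb')
""")

# 3. Query the remote table in place -- no extract or ingest step.
spark.sql("SELECT * FROM pg_catalog.public.orders LIMIT 10").show()
```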
-
There are a lot of questions about Hive-style partitioning and its initial roots. Here I go back in time to discuss the roots of data partitioning: optimal disk layout to improve disk I/O throughput. Today, cloud object stores abstract away the physical location of the data. Partitioning can still be useful for organizing data, but not for its original disk-layout reasons. https://lnkd.in/dHA6S79Y cc #DeltaLake #partitioning #ApacheSpark #ApacheHive #S3
Organizing Data: Partition by Disk to Hive-style Partitioning
http://dennyglee.com
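For anyone who hasn't seen the layout in question, this is what Hive-style partitioning produces. A short PySpark sketch with placeholder paths and columns:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("2024-01-01", "a", 1), ("2024-01-02", "b", 2)],
    ["event_date", "key", "value"],
)

# Hive-style partitioning encodes the partition column in the directory name:
#   /tmp/events/event_date=2024-01-01/part-....parquet
#   /tmp/events/event_date=2024-01-02/part-....parquet
# On local disks this layout let readers target whole directories at once; on
# an object store the "directories" are just key prefixes, so the benefit is
# organizational (and partition pruning) rather than physical I/O layout.
df.write.mode("overwrite").partitionBy("event_date").parquet("/tmp/events")
```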
-
A successful #data strategy in the modern era requires 2 things:
1️⃣ A multimodel #database to bring together the differing shapes of data into a single model
2️⃣ A cloud database provider that significantly reduces the time spent managing the database operations
Read more from Ted Neward and The New Stack ✍️ https://bit.ly/43RwhOk
Reducing Complexity with a Multimodel Database
https://thenewstack.io
-
Modern cloud data platforms like Hydrolix use commodity object storage that’s pay-as-you-go and horizontally scalable, giving you long-term, cost-effective storage. Hydrolix maximizes the strengths of object storage and uses massive parallelism, high-density compression, and advanced query features such as micro-indexing to ensure that ingest and query are performant at any scale. https://lnkd.in/eAymscvs #data #bigdata #cloud
Maximizing Query Performance for 100+ Billion Row Data Sets
https://hydrolix.io
-
Data platform migration stories are always fascinating. No matter what the source and target databases are, each migration brings new challenges to solve. 5 major factors make every data migration unique:
✅ Databases involved
✅ Infrastructure environment
 ✔️ Completely on-prem
 ✔️ Completely on cloud
 ✔️ On-prem to cloud
 ✔️ Cloud to on-prem
✅ Management objectives of the migration
✅ End-user use cases
✅ Programming languages used in managing the data
Here is a very high-level data migration story. Spoiler alert: this article doesn't go deep into the technical aspects and is written by the Snowflake team. #data #datamigration #datawarehouse #snowflake https://lnkd.in/geEG8jDV
Vertica to Snowflake Migration: Lessons Learned
david-ruthven.medium.com
-
A successful #data strategy in the modern era requires 2 things:
1. A multimodel #database to bring together the differing shapes of data into a single model
2. A cloud database provider that significantly reduces the time spent managing the database operations
Read more from Ted Neward and The New Stack at https://bit.ly/43RwhOk #datastrategy #datamodernization #couchbase
Reducing Complexity with a Multimodel Database
https://thenewstack.io
-
Great article from Tomer Shiran on the rise of Apache Iceberg as a Data Lakehouse format and its adoption by major players in the data industry. Extremely relevant, especially if you are hosting your data platform on Amazon Web Services (AWS) or GCP and using Snowflake or Confluent Kafka. #apacheiceberg #aws #gcp #snowflake #confluent https://lnkd.in/dhNYk59r
How Iceberg Became the Industry Standard for Data Lakehouse Platforms
medium.com
-
Over the weekend, I put all my thoughts together in a coherent way. Here is my talk with fewer jokes, covering in a slightly more serious way what I presented at PyCon Lithuania. I feel Write-Audit-Publish (WAP), as well as other good things like indexes and check constraints, will be added to our beloved cloud DWHs, and we could have the best of both worlds (RDBMS and cloud DWH). A bare-bones sketch of the pattern follows the link below. You can read the full blog post here: https://lnkd.in/djDK4eDB #pycon #pipelines #dataengineering #Iceberg #Spark
Write-Audit-Publish Pattern in Modern Data Pipelines
uncledata.substack.com
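For readers new to WAP, here is a minimal, generic Python sketch of the three steps. It is not the author's implementation; the local directories stand in for staging and production tables, and the audit checks are made up:

```python
import shutil
from pathlib import Path

STAGING, PUBLISHED = Path("staging/orders"), Path("published/orders")

def write(rows: list[str]) -> None:
    # 1. WRITE: land new data where consumers never read from.
    STAGING.mkdir(parents=True, exist_ok=True)
    (STAGING / "batch.csv").write_text("\n".join(rows))

def audit() -> bool:
    # 2. AUDIT: run quality checks against the staged data only.
    rows = (STAGING / "batch.csv").read_text().splitlines()
    return len(rows) > 0 and all("," in r for r in rows)  # non-empty, well-formed

def publish() -> None:
    # 3. PUBLISH: atomically swap audited data into the consumer-facing location.
    PUBLISHED.parent.mkdir(parents=True, exist_ok=True)
    if PUBLISHED.exists():
        shutil.rmtree(PUBLISHED)
    shutil.move(str(STAGING), str(PUBLISHED))

write(["1,widget", "2,gadget"])
if audit():
    publish()  # consumers only ever see data that passed the audit
```

With Iceberg or Spark, the same shape shows up as writing to a staging branch or table, validating it, then swapping it in atomically.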