Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(hash): Support hash field expiration #2402

Open
wants to merge 21 commits into
base: unstable
Choose a base branch
from

Conversation

jjz921024
Copy link
Contributor

@jjz921024 jjz921024 commented Jul 11, 2024

This pr want to close #2269

  1. Encoding
    To implement hash expiration by field level, we encode the expiration time into the first 8 bytes of value.
    For example:

                         +-------------------------+
    key|version|field => | expire (8 byte) | value |
                         +-------------------------+
    

    And we use the second bit of the flags in metadata to indicate whether encode the expiration into value.
    If all the fields in the hash have no encode expiration time, the flags will be 1 0 0 0 | 0 0 1 0.
    If any field in the hash is encoded into the expiration time, the flags will be 1 1 0 0 | 0 0 1 0.

  2. Flags conversion
    For the HEXPIRE command, it will set an expiration on one or more fields.
    If we execute hexpire command on a hash key that has not been set to expire, this will set the secode bit of flags and encode expiration for all fields (fields that not appear in the command will be set to 0)

  3. Add commands
    According to https://redis.io/docs/latest/commands add a series of related commands:

    • HEXPIRE , HPEXPIRE , HEXPIREAT , HPEXPIREAT
    • HEXPIRETIME , HPEXPIRETIME , HTTL , HPTTL
    • HPERSIST
@jjz921024 jjz921024 changed the title Support hash field expiration Jul 11, 2024
@jjz921024 jjz921024 marked this pull request as ready for review July 11, 2024 12:36
@PragmaTwice
Copy link
Member

PragmaTwice commented Jul 11, 2024

And we use the second bit of the flags in metadata to indicate whether encode the expiration into value.

As we specified in https://kvrocks.apache.org/community/data-structure-on-rocksdb#encoding-version-and-data-type, the first and second (and maybe more bits from the MSB) of flags is used to represent the "encoding version". flags 1 1 0 0 | 0 0 1 0 indicates that the encoding version is 3.

So it should not be used for field expiration. (Also, not all data types need the field expiration feature.)

Instead, you can add a new enum value of data type, or add a new field in the metadata of hash.

@git-hulk
Copy link
Member

@jjz921024 Thanks for your contributions. As @PragmaTwice mentioned, we cannot put the expired flag in the version field and we also need to adopt the SubkeyFilter in compact_filter.cc to recycle expired subkeys.

@jjz921024
Copy link
Contributor Author

jjz921024 commented Jul 12, 2024

we also need to adopt the SubkeyFilter in compact_filter.cc to recycle expired subkeys.

@git-hulk Thanks for you reminding. I have added this feature in latest commit.

About encoding, my previous thought was to use each bit for a different meaning, rather than as an incremental version.
For example:

  • For all type, the first bit indicate use 64bit or 32bit encoding
  • For hash type, the second bit of the flags to indicate whether encode the expiration into value (not affect for other types)

I see this issues #2292. Maybe we should finish it first?

@PragmaTwice
Copy link
Member

PragmaTwice commented Jul 12, 2024

For all type, the first bit indicate use 64bit or 32bit encoding
For hash type, the second bit of the flags to indicate whether encode the expiration into value (not affect for other types)

No. It's too ad hoc and will introduce additional maintanance effort. (the encoding of HASH will be inconsistent with other types, which brings huge understanding, maintanance cost and additional code logic)

And also, it breaks our whole design of "encoding version", e.g. HASH cannot have encoding version larger than 1.

You can consider these options, as I mentioned before.

Instead, you can add a new enum value of data type, or add a new field in the metadata of hash.


-1 for this change if no further commits.

@jjz921024
Copy link
Contributor Author

@PragmaTwice Got it, i will change it.

src/commands/cmd_hash.cc Outdated Show resolved Hide resolved
src/types/redis_hash.cc Outdated Show resolved Hide resolved
src/storage/redis_db.cc Outdated Show resolved Hide resolved
src/storage/redis_metadata.h Outdated Show resolved Hide resolved
src/storage/redis_db.cc Outdated Show resolved Hide resolved
@torwig
Copy link
Contributor

torwig commented Jul 14, 2024

@ jjz921024 Please let me know if I'm wrong. If the data structure contains at least 1 key that can be expired, we set a special bit to true and later check it.

src/types/redis_hash.cc Outdated Show resolved Hide resolved
src/types/redis_hash.cc Outdated Show resolved Hide resolved
src/types/redis_hash.h Outdated Show resolved Hide resolved
src/types/redis_hash.h Outdated Show resolved Hide resolved
src/types/redis_hash.h Outdated Show resolved Hide resolved

private:
rocksdb::Status GetMetadata(Database::GetOptions get_options, const Slice &ns_key, HashMetadata *metadata);
static rocksdb::Status decodeFieldValue(const HashMetadata &metadata, std::string *value, uint64_t &expire);
static rocksdb::Status encodeValueExpire(std::string *value, uint64_t expire);
static bool isMeetCondition(HashFieldExpireType type, uint64_t new_expire, uint64_t old_expire);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the name of this function to describe more clearly what it checks? What question does it answer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with what @torwig mentioned, for:

decodeFieldAndTTL should be separated into two methods: decodeExpireFromValue and checkIfFieldExpired, and encodeFieldAndTTL can rename to encodeExpireToValue. @torwig What do you think?

@jjz921024
Copy link
Contributor Author

@ jjz921024 Please let me know if I'm wrong. If the data structure contains at least 1 key that can be expired, we set a special bit to true and later check it.

Yes. Due to the expiration time is encoded in the field, the time complexity of some commands changes from O(1) to O(n), such as type and hlen. If we can check this flag before execution, then the some commands complexity will remains the same when
this feature is not used

@jjz921024
Copy link
Contributor Author

jjz921024 commented Jul 16, 2024

For all type, the first bit indicate use 64bit or 32bit encoding
For hash type, the second bit of the flags to indicate whether encode the expiration into value (not affect for other types)

No. It's too ad hoc and will introduce additional maintanance effort. (the encoding of HASH will be inconsistent with other types, which brings huge understanding, maintanance cost and additional code logic)

And also, it breaks our whole design of "encoding version", e.g. HASH cannot have encoding version larger than 1.

You can consider these options, as I mentioned before.

Instead, you can add a new enum value of data type, or add a new field in the metadata of hash.

-1 for this change if no further commits.

@PragmaTwice Hi, I fix it about encoding version. and add a new field in the metadata of hash to indicate whether the expiration time of field is encoded. please review. thank

@jjz921024
Copy link
Contributor Author

jjz921024 commented Jul 19, 2024

@git-hulk Hi, I had renamed the decodeFieldAndTTL to decodeExpireFromValue, encodeFieldAndTTL to encodeExpireToValue. And rewrite the IsFieldExpired() to make it clearer. please review.

}
};

class CommandHPExpireTime : public CommandFieldExpireBase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command HExpireTime/HTTL/HPTTL doesn't support NX/XX/LT/GT flags. By the way, it should be good to remove the ttlExpireExecute/expireFieldExecute and only add a reply helper function like redis::IntArray. @torwig @PragmaTwice What do you think?

src/storage/redis_db.cc Outdated Show resolved Hide resolved
src/types/redis_hash.cc Outdated Show resolved Hide resolved
return storage_->Write(storage_->DefaultWriteOptions(), batch->GetWriteBatch());
}

rocksdb::Status Hash::PersistFields(const Slice &user_key, const std::vector<Slice> &fields, std::vector<int8_t> *ret) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not quite understand why we need to copy ExpireFields to implement the persist fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand what you meat. Persist a field is equivalent to setting the expiration of the field to 0.

src/types/redis_hash.cc Outdated Show resolved Hide resolved
@jjz921024
Copy link
Contributor Author

jjz921024 commented Jul 27, 2024

Modification:

  1. In hlen command, count the number of fields at hash expiration enable, replace the previous method to avoid loading all of fields into memory.
  2. For most Generic type commands (eg: ttl, type, copy, expire, scan...). If the type is hash, we still need to check each fields. If all of fields was expired, the GetMetadata should return not found.
  3. In del command, if delete a hash object that all of fields expired, so this hash object should be treated as empty and should not affect the deleted_cnt.
Co-authored-by: Twice <twice@apache.org>
Co-authored-by: Twice <twice@apache.org>
Copy link
Member

@PragmaTwice PragmaTwice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned before, we can add a config option (e.g. hash-field-expiration yes/no) for users to decide if they want to enable field expiration.

Also our testing can benefit from this option since we can cover the old hash encoding.

@jjz921024
Copy link
Contributor Author

As I mentioned before, we can add a config option (e.g. hash-field-expiration yes/no) for users to decide if they want to enable field expiration.

Also our testing can benefit from this option since we can cover the old hash encoding.

@PragmaTwice Thank for you tip, I have finished this feature. please review.

Copy link
Member

@PragmaTwice PragmaTwice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments : )

@jjz921024
Copy link
Contributor Author

jjz921024 commented Aug 10, 2024

@git-hulk @PragmaTwice @torwig Sorry to bother your. I have changed the code as review required. May I ask your review it again? I think this is a very useful feature and hope it can be merge to kvrocks.
If there are any changes, please let me know. Thank.

@jjz921024
Copy link
Contributor Author

I had fixed the gofmt error and passed all CI in my fork repo.

https://github.com/jjz921024/kvrocks/actions/runs/10337719500

@@ -167,6 +167,9 @@ struct Config {
int json_max_nesting_depth = 1024;
JsonStorageFormat json_storage_format = JsonStorageFormat::JSON;

// hash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name is self-explanatory so you can remove this comment. Otherwise, it would be best if you elaborated on the goal of this variable/option.

@PragmaTwice
Copy link
Member

Seems we need to solve the conflict first?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants