PIP-175: Extend time based release process #15966

Closed
merlimat opened this issue Jun 7, 2022 · 5 comments

merlimat commented Jun 7, 2022

Motivation

In PIP-47 (https://github.com/apache/pulsar/wiki/PIP-47:-Time-Based-Release-Plan), we have adopted a time-based release plan. This was the first attempt at establishing a new principle on how releases should b

The two main benefits of this approach have been:

  1. Clarity for users and developers on when to expect a release
  2. Breaking the hard link between features and releases: a feature is included in a release if it is completed in time; otherwise, it moves to the next release.

The motivation for the current proposal is to extend the existing process to address the issues that we have seen and that were left out of the scope of PIP-47.

Summary of existing issues in the process

Short maintenance cycles for releases

Since we follow a 3-month release cycle, we end up with 4 releases per year (in practice, closer to 3).

Maintaining many old releases, backporting bug fixes, and shipping security patches is costly. In general, we actively support the last 3 minor releases while developing the next one. E.g., 2.8, 2.9, and 2.10 are supported while 2.11 is under development.

As a result, a user adopting a particular release is forced to upgrade within less than a year to stay on a supported release. This timeframe is too short for many users: it imposes frequent forced upgrades that they may not have the time or resources to prepare for.

Live Upgrade/Downgrade compatibility path

In Pulsar, we guarantee that users have a way to do live upgrades and downgrades with zero downtime.

This is very powerful because it gives them the freedom to upgrade to a new release with the assurance of being able to roll back to the previous release in case any functional or performance regressions are encountered.

Today, this compatibility is guaranteed between consecutive minor versions. E.g., I can do 2.7 -> 2.8 -> 2.7 as a live upgrade and rollback.

What is not guaranteed is "skipping" releases. E.g., 2.7 -> 2.9 might or might not work. In that case, an intermediate upgrade is required: 2.7 -> 2.8 -> 2.9.
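As a minimal sketch of the current rule, assuming versions are modeled as (major, minor) tuples, the check below treats a live upgrade/downgrade as guaranteed only between consecutive minor versions and computes the chain of intermediate releases needed instead of a "skip" upgrade. The function names are hypothetical; this is not a Pulsar tool.

```python
# Sketch of the current (PIP-47) live upgrade/downgrade guarantee.
# Versions are (major, minor) tuples; all names here are hypothetical.

Version = tuple[int, int]


def is_guaranteed_hop(src: Version, dst: Version) -> bool:
    """Guaranteed only between consecutive minor versions, e.g. 2.7 <-> 2.8."""
    return src[0] == dst[0] and abs(src[1] - dst[1]) == 1


def upgrade_path(src: Version, dst: Version) -> list[Version]:
    """Releases to pass through, one minor version at a time."""
    if src[0] != dst[0]:
        raise ValueError("this sketch only handles a single major line")
    step = 1 if dst[1] >= src[1] else -1
    return [(src[0], m) for m in range(src[1], dst[1] + step, step)]


path = upgrade_path((2, 7), (2, 9))
print(path)  # [(2, 7), (2, 8), (2, 9)]
# Every hop along the path is individually guaranteed.
print(all(is_guaranteed_hop(a, b) for a, b in zip(path, path[1:])))  # True
```

The same function handles the rollback direction: `upgrade_path((2, 9), (2, 7))` walks back through 2.8.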

A "skip" upgrade might not work for several reasons:

  1. An upgrade of a dependency (e.g., ZooKeeper) to a version that is not compatible with the older one.
  2. Adoption of a new metadata format or data format on disk.
    Every time we introduce an incompatible format change (other than a regular Protobuf field addition), we do it in two steps:
    • In a new release, we introduce the new feature/format, disabled by default. The new release can read both the old and the new format, but it keeps writing the old format by default.
    • In a subsequent release, we change the default to the new format. Skipping the intermediate release would therefore mean rolling back from a version that writes the new format to one that cannot read it.
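The two-step format change can be sketched as follows. The "v1"/"v2" tags, the JSON envelope, and the `write_new_format` flag are hypothetical stand-ins for a real metadata or on-disk layout change and its configuration switch; they are not actual Pulsar settings.

```python
import json

# Hypothetical format tags standing in for a real metadata/on-disk change.
OLD, NEW = "v1", "v2"


def encode(record: dict, write_new_format: bool = False) -> bytes:
    """Release N ships with write_new_format=False (still writes the old
    format); a later release flips the default to True."""
    fmt = NEW if write_new_format else OLD
    return json.dumps({"format": fmt, "payload": record}).encode()


def decode(data: bytes) -> dict:
    """Release N and later: can read both the old and the new format."""
    envelope = json.loads(data)
    if envelope["format"] not in (OLD, NEW):
        raise ValueError(f"unknown format {envelope['format']!r}")
    return envelope["payload"]


def decode_old_only(data: bytes) -> dict:
    """A release from before N: understands only the old format."""
    envelope = json.loads(data)
    if envelope["format"] != OLD:
        raise ValueError(f"cannot read format {envelope['format']!r}")
    return envelope["payload"]
```

Here `decode_old_only` plays the role of a release from before the new format was introduced: it reads anything release N writes by default, but fails on data written once the default has flipped, which is why an upgrade that skips the intermediate release cannot be rolled back safely.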

Note that this consideration is separate from compatibility between clients and brokers, which we never break. The oldest available Pulsar client can still talk to the newest Pulsar broker and, vice versa, a new client works fine with an older broker (except that new features won't be available).

Releases getting delayed

Another problem we have been experiencing is that release cycles have been stretching considerably. Part of the reason is that we reach the end of the release window, prepare a candidate, and then take a long time to fix all the issues found at the last minute in the new release.

We need a date set in stone for delivering the release to users.

Proposal

The proposal to address the above issues is composed of 2 parts.

1. Establish Long Term Support releases

We need to provide a way for users to quickly understand the expected lifecycle of a given release, and that lifecycle must be long enough that users are not constantly forced to upgrade.

At the same time, we need to ensure that maintainers do not spend all their time maintaining a long list of old releases.

For that, we can use the established concept of "Long Term Support" (LTS) releases.

We will perform LTS releases at a fixed cadence every 18 months, and we will continue to ship regular feature releases every 3 months, as we do today.

LTS releases will be identified by a .0 version. For example:

  • 3.0 -> LTS
  • 3.1 -> regular release
  • 3.2 -> regular release
  • 4.0 -> LTS
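Under this scheme the release type can be read directly from the version string. A minimal sketch, assuming patch releases use a third component (e.g. `3.0.1`) and inherit the type of their minor line; the function name is hypothetical:

```python
def release_type(version: str) -> str:
    """Classify a version under the proposed scheme: X.0 is an LTS release,
    X.Y with Y > 0 is a regular feature release. A patch suffix (X.Y.Z) is
    assumed to inherit the type of its X.Y line."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least MAJOR.MINOR, got {version!r}")
    return "LTS" if int(parts[1]) == 0 else "feature"


for v in ["3.0", "3.1", "3.2", "4.0"]:
    print(v, "->", release_type(v))
```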

The major version bump will not carry any special meaning in terms of "big features" included in the release or breaking API changes. Instead, it will simply signal the type of release.

Compatibility between releases

Live upgrade/downgrade will be guaranteed between one LTS release and the next.

For example:

  • 3.0 -> 4.0 -> 3.0 : OK
  • 3.2 -> 4.0 -> 3.2 : OK
  • 3.2 -> 4.4 -> 3.2 : OK
  • 3.2 -> 5.0 : Not OK
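One reading of these examples, stated here as an assumption rather than a rule from the proposal, is that a live upgrade/downgrade is guaranteed whenever the two versions' major numbers differ by at most one (same LTS line or adjacent LTS lines). A sketch of that check, with hypothetical function names:

```python
def major(version: str) -> int:
    return int(version.split(".")[0])


def live_upgrade_guaranteed(src: str, dst: str) -> bool:
    """Assumed rule inferred from the examples: guaranteed when both versions
    are in the same LTS line or in adjacent LTS lines. The check is symmetric,
    so it covers the downgrade (rollback) leg as well."""
    return abs(major(src) - major(dst)) <= 1


# The examples listed in the proposal.
for src, dst in [("3.0", "4.0"), ("3.2", "4.0"), ("3.2", "4.4"), ("3.2", "5.0")]:
    status = "OK" if live_upgrade_guaranteed(src, dst) else "Not OK"
    print(f"{src} -> {dst} -> {src} : {status}")
```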

Release support expectation

We will publish clear guidelines on the Pulsar website stating how long each release is supported and when new feature and LTS releases will be available.

The support model will be:

  • LTS
    • Released every 18 months
    • Support for 24 months
    • Security patches for 36 months
  • Feature releases
    • Released every 3 months
    • Support for 6 months
    • Security patches for 6 months
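The support model translates into concrete end dates per release. The sketch below assumes that "N months" means the same calendar day N months after the release date (clamped to the end of shorter months); the function names and the release dates in the usage note are hypothetical.

```python
from datetime import date

# Support model from the proposal, in months: (support, security patches).
SUPPORT_MONTHS = {"LTS": (24, 36), "feature": (6, 6)}


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the length of the
    target month (e.g. Jan 31 + 1 month -> last day of February)."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    # Days in the target month = first day of next month minus first day.
    next_first = date(year + month // 12, month % 12 + 1, 1)
    days_in_month = (next_first - date(year, month, 1)).days
    return date(year, month, min(d.day, days_in_month))


def support_windows(kind: str, released: date) -> dict[str, date]:
    """End dates of regular support and of security patches for a release."""
    support, security = SUPPORT_MONTHS[kind]
    return {
        "support_until": add_months(released, support),
        "security_until": add_months(released, security),
    }
```

For a hypothetical LTS released on 2023-01-15, `support_windows("LTS", date(2023, 1, 15))` gives support until 2025-01-15 and security patches until 2026-01-15; a feature release on 2023-04-15 is covered for both until 2023-10-15.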

This can be translated into:

  • We support the last 2 LTS releases and the last 2 feature releases
  • Security patches are provided for the past 3 LTS releases and 2 feature releases

Users are therefore encouraged to stay on an LTS release until they are ready to move to the next LTS, unless they need features included in the latest feature releases.

2. Introduce a code-freeze period in the release cycle

To address the problem with delayed release cycles, we are introducing a code freeze period that will give us time to stabilize the release code while not blocking new changes from being merged into master for the subsequent version.

This code-freeze will only be adopted for LTS/feature releases, not for any patch release.

In a 3-month release cycle, the last 3 weeks will be a code-freeze period. The release manager will create a release branch from master and will be responsible for selecting the changes to cherry-pick into it.

After the code freeze, to minimize the risk of delaying the release, only fixes for regressions compared to a previous release should be allowed. Occasional exceptions will be possible after closer scrutiny of the change.
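The cherry-pick rule can be expressed as a simple filter. This is a sketch only: the `Change` fields and the titles in the test are hypothetical metadata, and in practice the release manager applies this judgment by hand.

```python
from dataclasses import dataclass


@dataclass
class Change:
    # Hypothetical metadata for a candidate cherry-pick; field names are
    # illustrative, not taken from any Pulsar tooling.
    title: str
    is_bug_fix: bool
    fixes_regression: bool  # behavior is worse than in a previous release
    exception_approved: bool = False  # passed the closer-scrutiny review


def allowed_after_code_freeze(change: Change) -> bool:
    """Regression fixes are allowed; anything else needs an explicit exception."""
    return (change.is_bug_fix and change.fixes_regression) or change.exception_approved


def select_cherry_picks(changes: list[Change]) -> list[str]:
    """Titles of the changes that may go into the release branch."""
    return [c.title for c in changes if allowed_after_code_freeze(c)]
```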

At the moment of the code freeze, the release manager will also prepare a release candidate in the same way we are doing today. Committers, contributors, and users will test this RC to detect issues as early as possible.

A formal vote by the PMC will not be required at this stage (though any disagreement should be raised as soon as possible).

After 1 week, if there are any changes, the release manager will prepare a new RC that the community will test again.

After 1 more week, if there are further changes, a third RC will be prepared and submitted to a PMC vote. If no issues are found, the vote will instead be held on the earlier RC.

The final week will be used for the vote, for updating the Pulsar website, and for the blog post announcing the release, which should (hopefully) happen on the scheduled day.
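Putting the timeline together, here is a sketch of the 3-week code-freeze schedule for a given release date. It assumes that each testing week with cherry-picked changes yields one new RC, that the PMC vote covers the latest RC, and that the vote opens at the start of the final week; the function name and the release date in the usage note are hypothetical.

```python
from datetime import date, timedelta


def release_schedule(release_day: date, changes_in_week: list[bool]) -> dict:
    """Sketch of the 3-week code-freeze timeline ending on release_day.

    changes_in_week[0] and changes_in_week[1] say whether fixes were
    cherry-picked during the first and second week of RC testing.
    """
    freeze = release_day - timedelta(weeks=3)
    schedule = {"rc1": freeze}  # RC1 is cut at the code freeze
    rc = 1
    for week, changed in enumerate(changes_in_week[:2], start=1):
        if changed:  # a week with changes produces a new RC
            rc += 1
            schedule[f"rc{rc}"] = freeze + timedelta(weeks=week)
    # Assumption: the PMC vote on the latest RC opens at the start of the last week.
    schedule["vote_opens"] = freeze + timedelta(weeks=2)
    schedule["voted_rc"] = rc
    schedule["release"] = release_day
    return schedule
```

For a hypothetical release day of 2022-09-30 with changes in both testing weeks, the code freeze and RC1 fall on 2022-09-09, RC2 on 2022-09-16, and RC3 (the one put to vote) on 2022-09-23. If neither week brings changes, the vote is held on RC1.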

merlimat self-assigned this on Jun 7, 2022
merlimat changed the title from "PIP-175: Extend time release process" on Jun 7, 2022
github-actions bot commented Jul 8, 2022

The issue had no activity for 30 days, mark with Stale label.

github-actions bot added the Stale label on Jul 8, 2022
dave2wave removed the Stale label on Jul 11, 2022
dave2wave (Member) commented

@merlimat Are you ready to make this official?

github-actions bot commented

The issue had no activity for 30 days, mark with Stale label.

cbornet commented

This comment was marked as resolved.

tisonkun (Member) commented

Closed as accepted.
