[08:17:30] serviceops, Data-Engineering, Data-Engineering-Radar, Observability-Logging, and 2 others: Fix Kafka replicas skew - https://phabricator.wikimedia.org/T407185#11418283 (brouberol) I've generated a rebalancing plan on `kafka-main2008` using ` brouberol@kafka-main2008:~/T407185$ topicmappr rebalanc...
[08:36:01] serviceops, observability: Create a visual representation of where each service is active from, any given time - https://phabricator.wikimedia.org/T327663#11418347 (MLechvien-WMF) That's nice and should solve the initial need captured by @Clement_Goubert @jijiki when incidents occur. My understanding...
[09:02:08] serviceops, Growth-Team, PageViewInfo, Content-Transform-Team (Work In Progress), and 3 others: Determine the source of internal requests going through the API gateway. - https://phabricator.wikimedia.org/T410198#11418382 (akosiaris) >>! In T410198#11417623, @daniel wrote: >>>! In T410198#1141603...
[10:18:55] serviceops: Proof of Concept: SquareOne Dashboards - https://phabricator.wikimedia.org/T411202#11418728 (jijiki)
[11:49:11] Hi all! I'm struggling with the version number in Chart.yaml files when rebasing patches. On at least one occasion, the version bump vanished in a rebase, and caused confusion during the deployment, because helmfile assumed that the chart didn't change. I'd like to propose a way to mitigate this issue:
[11:50:18] Instead of using plain semver strings like 1.20.2, we add a suffix that describes the latest change and matches the topic tag on the gerrit change, e.g. 1.20.2-add-jwt-support.
[11:50:29] What do you think of that idea? Any objections?
[11:51:26] The Helm docs say that full semver syntax is supported, including suffixes. I tested it locally with minikube, it works fine. I didn't test with helmfile, but I expect helmfile just relies on helm for handling the chart versions.
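The proposal at 11:50:18 takes the suffix from the Gerrit topic, but a topic only works if it is also a valid SemVer pre-release string. SemVer 2.0.0 allows only ASCII letters, digits, and hyphens in dot-separated, non-empty identifiers, with no leading zeros on numeric ones, so a topic containing an underscore or a slash would not qualify. The Python below is a minimal sketch of such a check. `suffixed_version` is a hypothetical helper, not part of Helm, helmfile, or the deployment-charts tooling, and it follows the SemVer grammar rather than Helm's own parser, which may be more lenient.

```python
import re

# Hypothetical helper (not part of Helm/helmfile): SemVer 2.0.0 grammar only.
# Each pre-release identifier is [0-9A-Za-z-]+, non-empty, and a purely
# numeric identifier must not have leading zeros.
_IDENT = r"(?:0|[1-9][0-9]*|[0-9]*[A-Za-z-][0-9A-Za-z-]*)"
_PRERELEASE = re.compile(rf"{_IDENT}(?:\.{_IDENT})*")
_CORE = re.compile(r"(?:0|[1-9][0-9]*)\.(?:0|[1-9][0-9]*)\.(?:0|[1-9][0-9]*)")


def suffixed_version(base: str, topic: str) -> str:
    """Return '<base>-<topic>' if it is a valid SemVer 2.0.0 version, else raise."""
    if not _CORE.fullmatch(base):
        raise ValueError(f"base version {base!r} is not MAJOR.MINOR.PATCH")
    if not _PRERELEASE.fullmatch(topic):
        raise ValueError(f"topic {topic!r} is not a valid pre-release suffix")
    return f"{base}-{topic}"


print(suffixed_version("1.20.2", "add-jwt-support"))  # 1.20.2-add-jwt-support
```

`fullmatch` is used instead of `^...$` anchors because `$` would also accept a trailing newline.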
[11:53:01] Including the suffix would 1) avoid confusion during manual rebases, 2) avoid vanishing bumps during "clean" rebases, and 3) make it obvious in `helm list` which change was deployed.
[11:56:17] duesen: IIUC adding a "dash suffix" makes the version a pre-release per the semver 2 definition. That probably adds some confusion, as 1.20.2 is "newer" than 1.20.2-add-jwt-support, given the latter is considered a pre-release.
[12:00:46] jayme: hm... that wouldn't be a problem if all releases had suffixes. But I can see that it might become a problem if it is accidentally omitted. But... does anything actually rely on the ordering of versions?
[12:02:32] helmfile will order the versions to check if there is a new version of the chart to deploy; not sure if it even considers pre-release versions without extra flags/config.
[12:03:47] jayme: `helm install` seems to just install whatever version I point it to, newer or older. What command would take the version into account?
[12:04:18] I think `helmfile apply` does, but I'm not 100% sure.
[12:05:45] ok thanks.
[12:05:51] hnowlan: --^
[12:06:33] * duesen is afk for an hour or so
[13:31:14] serviceops, Data-Engineering, Machine-Learning-Team: Enable ChangeProp to consume mediawiki.page_content_change.v1 - https://phabricator.wikimedia.org/T409469#11419269 (achou)
[13:31:44] serviceops, Data-Engineering, Machine-Learning-Team: Enable ChangeProp to consume mediawiki.page_content_change.v1 - https://phabricator.wikimedia.org/T409469#11419273 (achou) Open→Declined
[14:21:42] jayme, hnowlan: I realized that the "pre-release" semantics shouldn't be a problem, since we would *also* bump the patch version. So there wouldn't be two chart versions that differ only in the dash suffix. At least that's how I envisioned us using this.
[14:22:46] Also, as far as I could find out, helmfile only uses the "semantics" of semver to find the latest version of a chart when running `helmfile dep`.
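jayme's point at 11:56:17 and duesen's reply at 14:21:42 both come down to SemVer 2.0.0 precedence: a pre-release sorts below the release with the same MAJOR.MINOR.PATCH, but MAJOR.MINOR.PATCH is compared first, so a patch bump outranks any suffix. The sketch below expresses that ordering as a Python sort key. It illustrates the spec; it is not the code Helm or helmfile actually run, and it assumes well-formed input. Whether helmfile considers pre-releases at all without extra configuration was left open in the conversation.

```python
def semver_key(version: str):
    """Sort key following SemVer 2.0.0 precedence rules (spec item 11).

    Build metadata after '+' is ignored. A release sorts after every
    pre-release that shares its MAJOR.MINOR.PATCH.
    """
    version = version.split("+", 1)[0]
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # (1,) compares greater than any (0, ...) tuple, so a release wins ties.
        return (major, minor, patch, (1,))
    identifiers = tuple(
        # Numeric identifiers compare as integers and sort before alphanumeric
        # ones; alphanumeric identifiers compare in ASCII order.
        (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
        for ident in prerelease.split(".")
    )
    return (major, minor, patch, (0, identifiers))


versions = ["1.20.3-add-jwt-support", "1.20.2", "1.20.2-add-jwt-support"]
print(sorted(versions, key=semver_key))
# ['1.20.2-add-jwt-support', '1.20.2', '1.20.3-add-jwt-support']
```

If every change also bumps the patch version, the suffix never decides the order; it only matters when two versions share MAJOR.MINOR.PATCH.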
[14:33:03] serviceops, DNS, SRE, Traffic, Language codes: Redirect legacy language codes for Toki Pona to tok.wikipedia.org - https://phabricator.wikimedia.org/T404507#11419541 (taavi) Open→Resolved a:taavi
[15:29:29] serviceops, Infrastructure-Foundations, OKR-Work: rest gateway: Record x-trusted-request and x-provenance headers in access logs - https://phabricator.wikimedia.org/T411250#11419797 (CDanis) > Question: are there any restrictions about recording this information in logs for some time (e.g. 90 days)?...
[15:29:50] serviceops, OKR-Work: rest gateway: Record x-trusted-request and x-provenance headers in access logs - https://phabricator.wikimedia.org/T411250#11419798 (CDanis)
[17:00:25] serviceops: Draft hCaptcha SLOs, document SLIs - https://phabricator.wikimedia.org/T411256#11420246 (Raine) p:Triage→Medium
[17:01:59] serviceops: Draft hCaptcha SLOs, document SLIs - https://phabricator.wikimedia.org/T411256#11420268 (Raine)
[17:52:05] rzl: what do you think of the idea of adding suffixes to chart versions? (See my message from about 6 hours ago.)
[17:52:46] mmm I agree with you about the problem, but I'm not sure about the solution
[17:53:41] I think you can still get into confusing versioning semantics with a race condition if 0.0.1 is live, I write 0.0.2-rzl, and you write 0.0.2-duesen
[17:54:39] if we don't do that, i.e. if I write 0.0.2-rzl and you write 0.0.3-duesen, then I think we'd get the same results as with just 0.0.2 and 0.0.3, right?
[17:57:21] broadly though, I'd like us to find some way to move away from having a chart version at all, although that requires a bit more engineering -- the fact is we already have a timestamped linear progression of chart versions because we keep them in version control, so we shouldn't have to also manage this by hand! Like I say, though, it's a bigger project
[17:57:46] semver is nice in principle, it's good to distinguish between major, minor, and patch versions -- but in practice I think it costs us more in bookkeeping than it gets us in value
[17:58:03] ^
[17:58:28] for dependencies between our own charts, we have a monorepo and should treat it as such
[18:01:00] (afk briefly for an errand, but still interested)
[18:05:52] maybe it's just the charts I've worked on, but I don't think I've seen a single chart that consistently does proper semver, which is not exactly unusual for semver-aligned things in general, but it's very prominent in Chart.yaml
[18:07:16] heh, we have 76 charts with a major and minor of 0
[18:13:14] yeah
[18:13:32] and I also don't think that's losing us anything, really
[18:27:39] ^
[18:27:56] I don't know of a case where the version bump hasn't just been added friction
[18:56:56] serviceops, Patch-For-Review: hcaptcha-proxy: update service catalog - https://phabricator.wikimedia.org/T411148#11420884 (Raine) Open→Declined Not needed with the current setup.
[19:06:55] we use the suffix and a changelog for opensearch-cluster; it helps us keep track of the differences from upstream https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/charts/opensearch-operator/CHANGELOG.md
[19:10:53] I don't know if it's worth the effort for anyone else though
[19:12:23] yeah, if you're using an upstream chart and tracking its version number, that's the reasonable exception I think
[19:12:37] (does that version number get you some value that the changelog alone wouldn't?)
[19:13:07] doubtful ;) I think we change it because otherwise `helmfile apply` doesn't notice changes
[19:13:56] got it, so that part's the same
[19:15:12] serviceops: Allow dash-suffixes for chart versions - https://phabricator.wikimedia.org/T411411 (daniel) NEW
[19:15:47] ... also in that changelog the upstream version is 2.0.0 but our fork is 0.0.9-wmf?
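rzl's race at 17:53:41 (0.0.1 live, then 0.0.2-rzl and 0.0.2-duesen written concurrently) is exactly the case that breaks the invariant from 14:21:42 that no two chart versions differ only in their suffix. A minimal Python sketch of a check for that invariant follows. `suffix_collisions` is a hypothetical helper, and where the list of versions would come from (for instance, the Chart.yaml version on each open change for one chart) is an assumption outside the discussion.

```python
from collections import defaultdict


def suffix_collisions(versions):
    """Hypothetical check: map each MAJOR.MINOR.PATCH that appears more than
    once to the versions sharing it, i.e. versions differing only in suffix."""
    by_core = defaultdict(list)
    for version in versions:
        # Drop build metadata, then everything from the first hyphen onwards.
        core = version.split("+", 1)[0].split("-", 1)[0]
        by_core[core].append(version)
    return {core: group for core, group in by_core.items() if len(group) > 1}


print(suffix_collisions(["0.0.1", "0.0.2-rzl", "0.0.2-duesen"]))
# {'0.0.2': ['0.0.2-rzl', '0.0.2-duesen']}
print(suffix_collisions(["0.0.1", "0.0.2-rzl", "0.0.3-duesen"]))
# {}
```

Under SemVer precedence, 0.0.2-rzl would sort above 0.0.2-duesen regardless of which change merged last, because pre-release identifiers compare in ASCII order and "duesen" < "rzl". With 0.0.2-rzl and 0.0.3-duesen, as suggested at 17:54:39, the check finds nothing and the ordering matches plain 0.0.2 and 0.0.3.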
[19:16:10] rzl, hnowlan, jayme, cdanis: I belatedly realized this discussion should be on phab. I made a ticket: https://phabricator.wikimedia.org/T411411
[19:16:10] Could you summarize your thoughts there?
[19:17:22] duesen: briefly, after reading the subsequent conversation, do you still want to do it? As long as the answer is yes, I can take the time to write that up :)
[19:18:14] yeah, we started from zero when we forked. No idea if that is best practice. Eventually we will fork from a newer version of upstream; I guess that will make the changelog a bit more interesting ;p
[19:18:59] inflatador: that's fine, but if you're numbering back in time, I no longer think it's a good example of a time when semantic chart versioning is adding value :)
[19:19:24] nothing wrong with doing it anyway, of course
[19:20:03] I thought you meant that 2.0.0-wmf was the wmf fork of upstream's 2.0.0
[19:22:40] rzl: nah, that would probably have been a better idea. But to your point: we aren't really following semver
[20:30:14] rzl, hnowlan, jayme, cdanis: ...or I can just copy our conversation from here.
[20:30:28] in the middle of doing that now :)
[20:30:53] ok thanks :) There's no rush, I'm just about to end my day.
[20:38:18] serviceops: Allow dash-suffixes for chart versions - https://phabricator.wikimedia.org/T411411#11421223 (RLazarus) The conversation in #wikimedia-serviceops when this was raised: mmm I agree with you about the problem but I'm not sure about the solution I think you can still get into confusing v...
[21:28:47] serviceops, Connection-Team: MediaWiki periodic job campaignevents-aggregateanswers-metawiki failed - https://phabricator.wikimedia.org/T411417#11421409 (Daimona) Previously: T410748, T411331, T411383. Like last time (T411383#11420846), I'm not seeing failed run logs in logstash for the last hour, and th...