[01:31:01] Project mediawiki-core-doxygen build #18505: FAILURE in 12 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/18505/
[01:41:40] Continuous-Integration-Infrastructure, Release-Engineering-Team, collaboration-services, Patch-For-Review: setup 2 contint machines for jenkins - https://phabricator.wikimedia.org/T418521#11704978 (Dzahn) The following is a known issue that we have when puppet runs for the very first time on a ma...
[02:31:22] Yippee, build fixed!
[02:31:22] Project mediawiki-core-doxygen build #18506: FIXED in 13 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/18506/
[08:59:15] Beta-Cluster-Infrastructure: Cannot login to beta: "There was an unexpected error logging in" - https://phabricator.wikimedia.org/T419946 (dom_walden) NEW
[09:13:53] !log upgrade kafka-jumbo and kafka-main to Confluent 7.7 (pre-requisite before being able to upgrade to Trixie)
[09:13:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:16:13] elukey: wrong channel?
[09:18:37] taavi: nono I just forgot to add "deployment-prep" :D
[09:19:00] fixed the sal
[09:19:31] it is all prep work to upgrade production at some point, but I won't do it on a Friday :D
[09:19:37] boooring
[09:52:23] (CR) Hashar: "recheck" [integration/quibble] - https://gerrit.wikimedia.org/r/1250584 (https://phabricator.wikimedia.org/T419683) (owner: Hashar)
[10:21:06] Release-Engineering-Team (Doing 😎), Catalyst (Luka Ijo Pimeja Jan), Essential-Work: Migration to MariaDB operator: Shared environment DB - https://phabricator.wikimedia.org/T408115#11705657 (jnuche) Open→Resolved
[11:16:26] Beta-Cluster-Infrastructure, MediaWiki-Core-AuthManager, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team: Cannot login to beta: "There was an unexpected error logging in" - https://phabricator.wikimedia.org/T419946#11705977 (A_smart_kitten) Can repro (tried on beta-enwiki in a private b...
[12:14:27] Beta-Cluster-Infrastructure, MediaWiki-Core-AuthManager, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team: Cannot login to beta: "There was an unexpected error logging in" - https://phabricator.wikimedia.org/T419946#11706148 (Tgr) Logstash: https://beta-logs.wmcloud.org/goto/fe9ab61fa8f3...
[12:43:38] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for Spandan1104 - https://phabricator.wikimedia.org/T419977 (Spandan1104) NEW
[13:04:01] !log re-create kafka-logging-01 in deployment-prep on trixie and Kafka 3.7 (was running on buster)
[13:04:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:04:28] FIRING: InstanceDown: Project deployment-prep instance deployment-kafka-logging01 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:04:37] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-kafka-logging01 is down - https://phabricator.wikimedia.org/T419981 (wmcs-alerts) NEW
[13:09:28] RESOLVED: InstanceDown: Project deployment-prep instance deployment-kafka-logging01 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:39:28] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[13:39:36] Beta-Cluster-Infrastructure: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://phabricator.wikimedia.org/T419988 (wmcs-alerts) NEW
[13:41:28] FIRING: InstanceDown: Project deployment-prep instance deployment-kafka-logging01 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:46:28] RESOLVED: InstanceDown: Project deployment-prep instance deployment-kafka-logging01 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:49:28] RESOLVED: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates -
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[14:18:54] Release-Engineering-Team (Doing 😎), Catalyst (Luka Ijo Pimeja Jan), Essential-Work: Put a limit on demos created by ci - https://phabricator.wikimedia.org/T417304#11706820 (jnuche) As an interesting note, here's an histogram of Catalyst environment usage by CI over time. We saw a significant jump ea...
[14:31:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance deployment-kafka-logging01 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[14:31:34] Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-kafka-logging01 on project deployment-prep - https://phabricator.wikimedia.org/T420001 (wmcs-alerts) NEW
[14:52:21] (PS1) Jforrester: Zuul: Temporarily make wikilambda-catalyst-end-to-end non-voting again [integration/config] - https://gerrit.wikimedia.org/r/1251377
[14:52:26] (PS2) Jforrester: Zuul: Temporarily make wikilambda-catalyst-end-to-end non-voting again [integration/config] - https://gerrit.wikimedia.org/r/1251377
[15:00:48] Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Release, Train Deployments: 1.46.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T413810#11707104 (thcipriani) Open→Resolved
[15:13:21] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Bitu, CAS-SSO, and 3 others: Update basedn in CAS - https://phabricator.wikimedia.org/T371930#11707220 (Arendpieter) Open→Resolved
[15:15:06] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Bitu, CAS-SSO, and 3 others: Update basedn in CAS - https://phabricator.wikimedia.org/T371930#11707227 (taavi) @Arendpieter AFAICS the patch here is still open, how is this Resolved?
[15:15:54] Continuous-Integration-Infrastructure, Bitu, CAS-SSO, Infrastructure-Foundations, Patch-For-Review: Update basedn in CAS - https://phabricator.wikimedia.org/T371930#11707230 (Arendpieter) Resolved→Open
[15:15:58] Continuous-Integration-Infrastructure, Bitu, CAS-SSO, Infrastructure-Foundations, Patch-For-Review: Update basedn in CAS - https://phabricator.wikimedia.org/T371930#11707232 (taavi)
[15:16:07] Continuous-Integration-Infrastructure, Bitu, CAS-SSO, Infrastructure-Foundations, Patch-For-Review: Update basedn in CAS - https://phabricator.wikimedia.org/T371930#11707233 (Arendpieter) Sorry, my bad.
[16:02:30] Beta-Cluster-Infrastructure, MediaWiki-Core-AuthManager, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team: Cannot login to beta: "There was an unexpected error logging in" - https://phabricator.wikimedia.org/T419946#11707459 (bd808) Possibly related to my reboot of deployment-sessionstor...
[16:10:49] Beta-Cluster-Infrastructure, MediaWiki-Core-AuthManager, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team: Cannot login to beta: "There was an unexpected error logging in" - https://phabricator.wikimedia.org/T419946#11707526 (bd808) `lang=shell-session bd808@deployment-sessionstore06:~$...
[16:48:36] Beta-Cluster-Infrastructure, MediaWiki-Core-AuthManager, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team: Cannot login to beta: "There was an unexpected error logging in" - https://phabricator.wikimedia.org/T419946#11707708 (bd808) https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/in...
[16:51:08] Beta-Cluster-Infrastructure, MediaWiki-Core-AuthManager, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team: Cannot login to beta: "There was an unexpected error logging in" - https://phabricator.wikimedia.org/T419946#11707711 (bd808) Open→Resolved a:bd808 I am able to log...
[16:52:34] dwalden: I think that I got beta logins working again.
[16:55:29] Beta-Cluster-Infrastructure, MediaWiki-Core-AuthManager, MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team: Cannot login to beta: "There was an unexpected error logging in" - https://phabricator.wikimedia.org/T419946#11707750 (A_smart_kitten) Confirming I can now also log in/out :]
[17:16:28] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:16:33] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420033 (wmcs-alerts) NEW
[17:18:01] Beta-Cluster-Infrastructure: deployment-kafka-logging01 is down for maintenance - https://phabricator.wikimedia.org/T420034 (elukey) NEW p:Triage→High
[17:21:28] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:22:35] Beta-Cluster-Infrastructure: deployment-kafka-logging01 is down for maintenance because Trixie is not yet well supported - https://phabricator.wikimedia.org/T420034#11707956 (bd808)
[17:23:35] Project beta-code-update-eqiad build #591483: FAILURE in 35 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/591483/
[17:23:55] Project mediawiki-core-doxygen build #18525: FAILURE in 32 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/18525/
[17:24:31] FIRING: [2x] ProbeDown: Service gerrit2003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit2003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:24:32] Project mediawiki-core-doxygen build #18526: STILL FAILING in 36 sec:
https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/18526/
[17:24:43] Release-Engineering-Team, collaboration-services: ProbeDown - https://phabricator.wikimedia.org/T420037 (phaultfinder) NEW
[17:27:36] Beta-Cluster-Infrastructure: deployment-kafka-logging01 is down for maintenance because Trixie is not yet well supported - https://phabricator.wikimedia.org/T420034#11708015 (bd808)
[17:27:38] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-kafka-logging01 is down - https://phabricator.wikimedia.org/T419981#11708018 (bd808) →Duplicate dup:T420034
[17:29:31] RESOLVED: [2x] ProbeDown: Service gerrit2003:443 has failed probes (http_gerrit_tls_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#gerrit2003:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:32:43] MediaWiki-Releasing, MediaWiki-extensions-CodeMirror: Bundle Extension:CodeMirror with MediaWiki core - https://phabricator.wikimedia.org/T391926#11708037 (MusikAnimal)
[17:35:12] Yippee, build fixed!
[17:35:12] Project beta-code-update-eqiad build #591484: FIXED in 2 min 12 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/591484/
[17:40:35] (update) jforrester: Provide initial trixie images [repos/releng/dev-images] - https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/81
[17:41:28] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:46:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:47:47] Phabricator, Release-Engineering-Team, Wikibase Cloud, Wikimedia-Phabricator-Extensions: Create Phabricator CustomField for GitHub PRs - https://phabricator.wikimedia.org/T415903#11708088 (Aklapper) @Tarrow, @Ollie.Shotton_WMDE and @aklapper had an in-person conversation (with additional input fr...
[17:54:57] brennen (and FYI thcipriani): Hej, WMDE folks at the Hackathon tricked me into discussing "listing open GitHub patches in a Phab task". They're interested in working on some prototype code here and now. Would appreciate another brain and pair of eyes take a quick look at https://phabricator.wikimedia.org/T415903#11708088 so they're not completely off and don't waste time. TIA <3
[17:56:44] hrm
[17:57:23] sorry, I wouldn't ping if they weren't like "we want to code on this" and I wasn't like "I want another opinion on this" :-/
[17:57:33] yeah, just thinking it over
[17:57:49] what _kind_ of thing do people want to work on?
[17:58:33] in my understanding, "maybe poking some prototype code" kind of
[17:58:48] but yeah it's a bit of a chicken and egg problem, also because which approach to take
[17:58:49] i am still pretty wary of yet another thing in the page load loop that talks to a (third party) API but i get what you're saying about standing up a whole service feeling like a big lift
[17:59:04] yeah
[17:59:19] doing it right versus getting something done :-/
[18:00:19] maybe the "bot to write into a custom field" approach could be an interesting try...
[18:00:44] what about: roll us a library that very definitely gets the results you want from github?
[18:01:16] e.g. here's a chunk of code that talks to that API and hands back all the stuff you want to see in phab.
[18:02:08] typing things in the task here: my big worry is rate limits
[18:02:20] and then we can decide later whether that makes the most sense to attach to phorge directly or to encapsulate in some service.
[18:02:23] hmm, I guess that's what I kinda expected in the bot code itself, but maybe could also work
[18:03:17] we all agreed that push (on-demand) would be much nicer than regular pull, but not sure how to switch to that model
[18:04:02] Phabricator, Release-Engineering-Team, Wikibase Cloud, Wikimedia-Phabricator-Extensions: Create Phabricator CustomField for GitHub PRs - https://phabricator.wikimedia.org/T415903#11708135 (thcipriani) >>! In T415903#11708088, @Aklapper wrote: > @Tarrow, @Ollie.Shotton_WMDE and @aklapper had an in...
[18:04:30] i see mention of a couple of bots
[18:05:23] in any case, I should finally get dinner, and again sorry to drop this on a Friday but those motivated people here somehow managed to track me down, and I was "I want further thoughts on this" :)
[18:05:56] push on demand is interesting. Some kind of phab bot as a github action.
[18:07:10] it would be nice if like...
there were some sort of background service that phorge used to collect stuff in a rate-limited way, and a push from various services could prompt it to put things in its queue
[18:07:18] or something like that
[18:09:39] running out of battery on this laptop but going to stick around here on my phone; in any case thanks for any thoughts on that ticket! <3
[18:09:44] I mean, seems possible to add some phd daemon, though I'm unaware of anyone having done that
[18:09:53] enjoy dinner, andre!
[18:10:04] will put some stuff on the ticket
[18:10:19] like that's what handles git repos already, could handle external forges, too
[18:10:25] enjoy dinner!
[18:10:42] yeah, i don't know what that would entail.
[18:14:35] Phabricator, Release-Engineering-Team, Wikibase Cloud, Wikimedia-Phabricator-Extensions: Create Phabricator CustomField for GitHub PRs - https://phabricator.wikimedia.org/T415903#11708162 (thcipriani) other random thought: I wonder if this is a good use-case for a [[https://we.phorge.it/book/phor...
[18:15:30] yeah, same, threw it out on ^. Hackathon might be a good time to explore it ¯\_(ツ)_/¯
[18:16:14] or maybe it's a cursed idea. I have many of those.
[18:24:38] Phabricator, Release-Engineering-Team, Wikibase Cloud, Wikimedia-Phabricator-Extensions: Create Phabricator CustomField for GitHub PRs - https://phabricator.wikimedia.org/T415903#11708177 (brennen) Prompted by @Aklapper, who's currently at a hackathon with folks wanting to work on this, some Frid...
[18:30:59] Yippee, build fixed!
[18:30:59] Project mediawiki-core-doxygen build #18527: FIXED in 12 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/18527/
[18:40:04] Phabricator, Release-Engineering-Team, Wikibase Cloud, Wikimedia-Phabricator-Extensions: Create Phabricator CustomField for GitHub PRs - https://phabricator.wikimedia.org/T415903#11708196 (brennen) One other thought that just came to mind.
We could: - Only show this stuff to logged-in users. - O...
[19:07:42] Continuous-Integration-Config, Release-Engineering-Team (Priority Backlog 📥), Browser Test Platform, MediaWiki-Core-Tests, and 9 others: Reduce runtime of MW shared gate Jenkins jobs to 5 min - https://phabricator.wikimedia.org/T225730#11708317 (hashar)
[19:10:50] Continuous-Integration-Config, Release-Engineering-Team (Priority Backlog 📥), Browser Test Platform, MediaWiki-Core-Tests, and 9 others: Reduce runtime of MW shared gate Jenkins jobs to 5 min - https://phabricator.wikimedia.org/T225730#11708340 (hashar) When looking at T418369, I found GrowthExpe...
[19:25:37] (PS3) Hashar: Split BrowserTests duration reports [integration/quibble] - https://gerrit.wikimedia.org/r/1250584 (https://phabricator.wikimedia.org/T419683)
[19:26:38] (CR) Hashar: [C:+2] "Thank you JAmes!" [integration/quibble] - https://gerrit.wikimedia.org/r/1250608 (https://phabricator.wikimedia.org/T419675) (owner: Hashar)
[19:42:21] (Merged) jenkins-bot: build: replace flake8-logging-format by flake8-logging [integration/quibble] - https://gerrit.wikimedia.org/r/1250608 (https://phabricator.wikimedia.org/T419675) (owner: Hashar)
[19:54:11] Continuous-Integration-Config, Continuous-Integration-Infrastructure: Web Task Creation Form - https://phabricator.wikimedia.org/T420050 (Jdlrobson-WMF) NEW
[19:54:16] Continuous-Integration-Config, Continuous-Integration-Infrastructure: Support relative size testing in bundlesize test - https://phabricator.wikimedia.org/T420050#11708494 (Jdlrobson-WMF)
[19:57:19] Continuous-Integration-Config, Continuous-Integration-Infrastructure: Support relative size testing in bundlesize test - https://phabricator.wikimedia.org/T420050#11708497 (taavi)
[20:11:28] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:16:55]
Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420033#11708563 (bd808) →Duplicate dup:T415021
[20:16:56] Beta-Cluster-Infrastructure: Caassandra killed by oom-killer and prometheus scrapes failing intermittently on deployment-sessionstore06 - https://phabricator.wikimedia.org/T415021#11708561 (bd808)
[20:19:45] Beta-Cluster-Infrastructure: Caassandra killed by oom-killer and prometheus scrapes failing intermittently on deployment-sessionstore06 - https://phabricator.wikimedia.org/T415021#11708585 (bd808) `lang=shell-session bd808@deployment-sessionstore06:~$ w 20:18:51 up 22:55, 2 users, load average: 32.13, 31....
[20:21:28] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:22:03] Beta-Cluster-Infrastructure, Cassandra, Data-Persistence: Caassandra killed by oom-killer and prometheus scrapes failing intermittently on deployment-sessionstore06 - https://phabricator.wikimedia.org/T415021#11708593 (bd808)
[20:23:47] Beta-Cluster-Infrastructure, Patch-For-Review: deployment-kafka-logging01 is down for maintenance because Trixie is not yet well supported - https://phabricator.wikimedia.org/T420034#11708604 (bd808)
[20:23:49] Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-kafka-logging01 on project deployment-prep - https://phabricator.wikimedia.org/T420001#11708606 (bd808) →Duplicate dup:T420034
[20:27:53] Beta-Cluster-Infrastructure: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://phabricator.wikimedia.org/T419988#11708613 (bd808) Open→Invalid No stale certs found by `clean-stale-puppet-certs`.
[20:39:05] Phabricator, Release-Engineering-Team, Wikibase Cloud, Wikimedia-Phabricator-Extensions: Create Phabricator CustomField for GitHub PRs - https://phabricator.wikimedia.org/T415903#11708685 (Tarrow) > I'd bet the current approach with Gerrit/GitLab would get us caught up in rate limits (I certainly...
[20:48:12] Phabricator: Phabricator GitLab widget gives a lot of attention to gerritlab generated branch names - https://phabricator.wikimedia.org/T420059 (taavi) NEW
[20:50:13] Beta-Cluster-Infrastructure: Cassandra on deployment-sessionstore06 trying to log to deployment-logstash03 - https://phabricator.wikimedia.org/T420061 (bd808) NEW
[20:52:26] Beta-Cluster-Infrastructure: Cassandra on deployment-sessionstore06 trying to log to deployment-logstash03 - https://phabricator.wikimedia.org/T420061#11708794 (bd808) Open→In progress a:bd808 There are several hiera settings that need to be updated: `lang=shell-session bd808@mbp03:~/projects/wmf...
[20:52:39] Beta-Cluster-Infrastructure: Cassandra on deployment-sessionstore06 trying to log to deployment-logstash03 - https://phabricator.wikimedia.org/T420061#11708800 (bd808)
[20:52:41] Beta-Cluster-Infrastructure, Scap, Observability-Logging: Setup service name for Beta Cluster access to logstash service in logging project - https://phabricator.wikimedia.org/T409363#11708801 (bd808)
[21:08:08] Beta-Cluster-Infrastructure: Cassandra on deployment-sessionstore06 trying to log to deployment-logstash03 - https://phabricator.wikimedia.org/T420061#11708833 (bd808) https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/6221767781effd315894de5c1d1df4d14fdf05b9%5E%21/#F0 ` diff --git a/depl...
[21:20:08] Beta-Cluster-Infrastructure: Cassandra on deployment-sessionstore06 trying to log to deployment-logstash03 - https://phabricator.wikimedia.org/T420061#11708885 (bd808) All good except for the fact that `logs-api.svc.logging.eqiad1.wikimedia.cloud` is the actual service name. :facepalm: Will fix.
[21:26:34] Beta-Cluster-Infrastructure: Cassandra on deployment-sessionstore06 trying to log to deployment-logstash03 - https://phabricator.wikimedia.org/T420061#11708902 (bd808) https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/863a7bf4a1be4aaa724c436b92c66266ae49efbc%5E%21/#F0 ` diff --git a/depl...
[21:30:14] Phabricator, Release-Engineering-Team, Wikibase Cloud, Wikimedia-Phabricator-Extensions: Create Phabricator CustomField for GitHub PRs - https://phabricator.wikimedia.org/T415903#11708903 (Ollie.Shotton_WMDE) I was also having some discussions over dinner. I had originally ruled out using webhook...
[21:33:32] FIRING: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:33:38] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420064 (wmcs-alerts) NEW
[21:43:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[21:54:39] Beta-Cluster-Infrastructure, Scap, Observability-Logging: Setup service name for Beta Cluster access to logstash service in logging project - https://phabricator.wikimedia.org/T409363#11708970 (bd808)
[21:59:54] Beta-Cluster-Infrastructure: Cassandra on deployment-sessionstore06 trying to log to deployment-logstash03 - https://phabricator.wikimedia.org/T420061#11708985 (bd808) More old cruft: `lang=shell-session bd808@mbp03:~/projects/wmf/cloud/instance-puppet/deployment-prep$ git grep
deployment-logstash deployment...
[22:02:07] Beta-Cluster-Infrastructure: Delete orphaned host-specific and unused prefix hiera settings to reduce confusion about valid and active config - https://phabricator.wikimedia.org/T420068 (bd808) NEW
[22:03:33] Beta-Cluster-Infrastructure: Delete orphaned host-specific and unused prefix hiera settings to reduce confusion about valid and active config - https://phabricator.wikimedia.org/T420068#11709006 (bd808) I knew this sounded familiar: {T409989}
[22:04:38] (CR) C. Scott Ananian: [C:+2] Non-MFE: Render images as transparent PNG to support better diffing [integration/visualdiff] - https://gerrit.wikimedia.org/r/1249079 (owner: Subramanya Sastry)
[22:06:06] (Merged) jenkins-bot: Non-MFE: Render images as transparent PNG to support better diffing [integration/visualdiff] - https://gerrit.wikimedia.org/r/1249079 (owner: Subramanya Sastry)
[22:10:03] Beta-Cluster-Infrastructure: Cassandra on deployment-sessionstore06 trying to log to deployment-logstash03 - https://phabricator.wikimedia.org/T420061#11709018 (bd808) In progress→Resolved
[22:12:07] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420064#11709031 (bd808) →Duplicate dup:T415021
[22:12:08] Beta-Cluster-Infrastructure, Cassandra, Data-Persistence: Caassandra killed by oom-killer and prometheus scrapes failing intermittently on deployment-sessionstore06 - https://phabricator.wikimedia.org/T415021#11709029 (bd808)
[22:53:55] GitLab (Integrations), Phabricator: Phabricator GitLab widget gives a lot of attention to gerritlab generated branch names - https://phabricator.wikimedia.org/T420059#11709110 (A_smart_kitten)
[23:07:39] (PS1) Umherirrender: Zuul: [mediawiki/extensions/HoneyPot] Add quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/1251572
[23:11:32] FIRING: InstanceDown: Project deployment-prep
instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[23:11:42] Beta-Cluster-Infrastructure: Project deployment-prep instance deployment-sessionstore06 is down - https://phabricator.wikimedia.org/T420071 (wmcs-alerts) NEW
[23:21:32] RESOLVED: InstanceDown: Project deployment-prep instance deployment-sessionstore06 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown