[00:59:59] (PS3) Krinkle: build: Use disableProcessTimeout() for serve commands only [integration/docroot] - https://gerrit.wikimedia.org/r/850647
[01:00:01] (PS2) Krinkle: zuul: remove symlink indirection [integration/docroot] - https://gerrit.wikimedia.org/r/850648
[01:00:03] (PS2) Krinkle: zuul: Remove unused code, simplify logic, apply code conventions [integration/docroot] - https://gerrit.wikimedia.org/r/850649
[01:00:05] (PS1) Krinkle: zuul: Convert from Bootstrap to WMUI [integration/docroot] - https://gerrit.wikimedia.org/r/850675
[01:00:07] (PS1) Krinkle: zuul: Improve separation of concerns in HTML formatting code [integration/docroot] - https://gerrit.wikimedia.org/r/850676
[01:00:40] (CR) CI reject: [V: -1] zuul: Remove unused code, simplify logic, apply code conventions [integration/docroot] - https://gerrit.wikimedia.org/r/850649 (owner: Krinkle)
[01:00:48] (CR) CI reject: [V: -1] zuul: Convert from Bootstrap to WMUI [integration/docroot] - https://gerrit.wikimedia.org/r/850675 (owner: Krinkle)
[02:36:02] (PS3) Krinkle: zuul: Remove unused code, simplify logic, apply code conventions [integration/docroot] - https://gerrit.wikimedia.org/r/850649
[02:36:04] (PS2) Krinkle: zuul: Convert from Bootstrap to WMUI [integration/docroot] - https://gerrit.wikimedia.org/r/850675
[02:36:06] (PS2) Krinkle: zuul: Improve separation of concerns in HTML formatting code [integration/docroot] - https://gerrit.wikimedia.org/r/850676
[02:36:37] (CR) CI reject: [V: -1] zuul: Convert from Bootstrap to WMUI [integration/docroot] - https://gerrit.wikimedia.org/r/850675 (owner: Krinkle)
[02:39:22] (CR) Krinkle: "recheck" [integration/docroot] - https://gerrit.wikimedia.org/r/850675 (owner: Krinkle)
[02:54:02] (PS3) Krinkle: zuul: Convert from Bootstrap to WMUI [integration/docroot] - https://gerrit.wikimedia.org/r/850675
[02:54:04] (PS3) Krinkle: zuul: Improve separation of concerns in HTML formatting code [integration/docroot] - https://gerrit.wikimedia.org/r/850676
[03:10:17] Beta-Cluster-Infrastructure: File on betacommons shows usage on production and links nonexistent beta project - https://phabricator.wikimedia.org/T301997 (matmarex) There used to be a beta ptwikibooks until 2013: 0d109905e08cf3ec48c8df00067976a6895ff666 It wasn't documented until 2017 that when deleting a w...
[04:35:41] Continuous-Integration-Config, PHP 8.1 support, Patch-For-Review: Make PHP 8.1 voting on MW master - https://phabricator.wikimedia.org/T316078 (tstarling) >>! In T316078#8341694, @Jdforrester-WMF wrote: > CI has 8.1 voting for MediaWiki core itself and MediaWiki vendor, but it's not running for exten...
[11:34:35] (PS6) FNegri: tox-poetry-buster: upgrade poetry install script and upgrade to 1.2.2 [integration/config] - https://gerrit.wikimedia.org/r/850515 (https://phabricator.wikimedia.org/T321915) (owner: David Caro)
[12:46:05] Continuous-Integration-Infrastructure, CheckUser, Browser-Tests, User-zeljkofilipin: Allow the jenkins selenium tests to grant the test user the checkuser group/right for testing - https://phabricator.wikimedia.org/T321965 (zeljkofilipin)
[12:49:53] Continuous-Integration-Infrastructure, CheckUser, Browser-Tests, User-zeljkofilipin: Allow the jenkins selenium tests to grant the test user the checkuser group/right for testing - https://phabricator.wikimedia.org/T321965 (Dreamy_Jazz) In theory the maintenance script "createAndPromote.php" coul...
[14:02:37] Hi there! The Campaigns team would like to deploy the CampaignEvents extension to the test wikis this week, and according to the documentation we need a dedicated deployment window. We were thinking of scheduling that for Wednesday at 15:00 UTC. Would there be some deployer who's willing to help and could deploy our patches? We can also agree on a different time. Pinging thcipriani too, as suggested in the docs. Thanks in advance!
[14:14:34] And sorry, I forgot to link the task: https://phabricator.wikimedia.org/T318592
[15:17:03] Daimona: thanks for reaching out! Wednesday at 15:00 is the tech department all-hands meeting, so that might be tricky for us. What are you deploying in that window, specifically?
[15:18:37] Whoops. As I said, we're flexible with the time :-) We would like to first create the extension schema, and this is something we can do ourselves (but we would appreciate some guidance). Then there are the two config patches at https://phabricator.wikimedia.org/T318592
[15:20:57] the config patches look standard enough that I'd say we could fit these into our backport window if you're ok with that arrangement? For the schema changes, I'd recommend getting with data persistence/dba folks---they're much more expert than we are and may spot things we miss.
[15:21:21] s/may/will/ :D
[15:23:45] Yes, a "standard" window works for us! I'll add the patches to one that works for us, then. Re schema, sure, I'll coordinate with DBAs on phab. Thank you!
[15:24:07] thank you for reaching out.
[15:33:19] Krinkle: I'm open to ideas on where to move that meeting. It's tricky. Originally it was on Wednesday, pre-train (sometimes), and that didn't give us much signal about the train. Current train-log-triage is in a nice timeslot, except that this is such a tempting timeslot for everything else :\
[15:44:13] Continuous-Integration-Config: Re-enable PHP 8.1 CI on RemexHTML - https://phabricator.wikimedia.org/T311450 (Jdforrester-WMF)
[15:44:43] (PS1) Thcipriani: Calendar: empty schedule should be ok [tools/release] - https://gerrit.wikimedia.org/r/851096
[15:45:41] (CR) Thcipriani: [C: +2] Calendar: empty schedule should be ok [tools/release] - https://gerrit.wikimedia.org/r/851096 (owner: Thcipriani)
[15:45:50] (PS1) Jforrester: Zuul: [mediawiki/libs/RemexHtml] Re-nable PHP 8.1 CI [integration/config] - https://gerrit.wikimedia.org/r/851097 (https://phabricator.wikimedia.org/T311450)
[15:46:26] (Merged) jenkins-bot: Calendar: empty schedule should be ok [tools/release] - https://gerrit.wikimedia.org/r/851096 (owner: Thcipriani)
[15:47:42] (PS2) Jforrester: Zuul: [mediawiki/libs/RemexHtml] Re-enable PHP 8.1 CI [integration/config] - https://gerrit.wikimedia.org/r/851097 (https://phabricator.wikimedia.org/T311450)
[15:47:45] (CR) Jforrester: [C: +2] Zuul: [mediawiki/libs/RemexHtml] Re-enable PHP 8.1 CI [integration/config] - https://gerrit.wikimedia.org/r/851097 (https://phabricator.wikimedia.org/T311450) (owner: Jforrester)
[15:47:47] deployment-calendar-bot fixed \o/
[15:49:31] (Merged) jenkins-bot: Zuul: [mediawiki/libs/RemexHtml] Re-enable PHP 8.1 CI [integration/config] - https://gerrit.wikimedia.org/r/851097 (https://phabricator.wikimedia.org/T311450) (owner: Jforrester)
[15:50:05] !log Zuul: [mediawiki/libs/RemexHtml] Re-enable PHP 8.1 CI for T311450
[15:50:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:50:07] T311450: Re-enable PHP 8.1 CI on RemexHTML - https://phabricator.wikimedia.org/T311450
[15:51:22] Continuous-Integration-Config, Patch-For-Review: Re-enable PHP 8.1 CI on RemexHTML - https://phabricator.wikimedia.org/T311450 (Jdforrester-WMF) Stalled→Resolved a:Jdforrester-WMF
[15:51:27] Continuous-Integration-Config, PHP 8.1 support, Patch-For-Review: Address PHP 8.1 job failures on various PHP libs - https://phabricator.wikimedia.org/T307506 (Jdforrester-WMF)
[15:51:36] Continuous-Integration-Config, PHP 8.1 support, Patch-For-Review: Address PHP 8.1 job failures on various PHP libs - https://phabricator.wikimedia.org/T307506 (Jdforrester-WMF)
[15:51:38] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.40.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T320515 (thcipriani) p:Triage→Medium a:brennen
[15:52:04] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.40.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T320517 (thcipriani) p:Triage→Medium a:dancy
[15:52:34] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.40.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T320518 (thcipriani) p:Triage→Medium a:demon
[15:53:08] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.40.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T320519 (thcipriani) p:Triage→Medium a:hashar
[15:53:32] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Stretch Deprecation): Cloud VPS "deployment-prep" project Stretch deprecation - https://phabricator.wikimedia.org/T306068 (Andrew)
[15:54:27] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Stretch Deprecation): Cloud VPS "deployment-prep" project Stretch deprecation - https://phabricator.wikimedia.org/T306068 (Andrew)
[15:56:37] Continuous-Integration-Config, PHP 8.1 support, Patch-For-Review: Make PHP 8.1 voting on MW master - https://phabricator.wikimedia.org/T316078 (Jdforrester-WMF) >>! In T316078#8355578, @tstarling wrote: >>>! In T316078#8341694, @Jdforrester-WMF wrote: >> CI has 8.1 voting for MediaWiki core itself an...
[15:57:02] Continuous-Integration-Config, PHP 8.1 support, Patch-For-Review: Make PHP 8.1 voting on development (master) branch of MW ecosystem (core, extensions, skins, libraries) - https://phabricator.wikimedia.org/T316078 (Jdforrester-WMF)
[16:01:51] thcipriani: ack, Monday or Wednesday might work, agreed that more group1 exposure is useful but catching stuff from group2 of prior week is useful too, and the regular administration or the board/triage keeping it up, better than not doing it for a week maybe
[16:04:36] hrm, Monday might be useful for train hand-off as well. Although that's the day where most holidays seem to land. I'll squint at the calendar today and try to get a sense of whether/how much that improves things.
[16:18:04] (CR) Hashar: [C: +2] Zuul: [mediawiki/extensions/HTMLPurifier] Add [integration/config] - https://gerrit.wikimedia.org/r/850544 (owner: Umherirrender)
[16:19:58] (Merged) jenkins-bot: Zuul: [mediawiki/extensions/HTMLPurifier] Add [integration/config] - https://gerrit.wikimedia.org/r/850544 (owner: Umherirrender)
[16:21:00] (CR) Hashar: [C: +2] "Deployed" [integration/config] - https://gerrit.wikimedia.org/r/850544 (owner: Umherirrender)
[16:23:10] (CR) Hashar: [C: +2] build: Use disableProcessTimeout() for serve commands only [integration/docroot] - https://gerrit.wikimedia.org/r/850647 (owner: Krinkle)
[16:23:56] (Merged) jenkins-bot: build: Use disableProcessTimeout() for serve commands only [integration/docroot] - https://gerrit.wikimedia.org/r/850647 (owner: Krinkle)
[16:55:06] PROBLEM - Host contint1001 is DOWN: PING CRITICAL - Packet loss = 100%
[17:02:22] oh, this is not good but the better part is that 2001 is production
[17:04:03] thcipriani: contint1001 died. luckily contint2001 is prod
[17:04:27] not sure if this should be a more obvious alert though
[17:04:32] or an email
[17:04:37] died!? eek.
[17:04:43] 16:55 < icinga-wm> PROBLEM - Host contint1001 is DOWN: PING CRITICAL - Packet loss = 100%
[17:04:46] just went away
[17:05:10] I can try mgmt console
[17:05:47] yes please
[17:07:38] as long as gearman is alive on contint1001, I think CI should be fine, zuul merger on 1001 will presumably stop claiming jobs, but CI will be slower with just one merger.
[17:07:40] ack, trying
[17:07:56] er, as long as gearman is alive on contint2001, that is
[17:10:52] got on mgmt, joined console, no output except:
[17:10:56] [*
[17:11:00] can't leave console :p
[17:11:09] connecting again, resetting console to get on
[17:11:13] to run powercycle
[17:12:34] ERROR: The syntax of the command specified is not correct.
[17:12:50] wtf, it's always been the same
[17:13:31] Server power operation successful
[17:13:43] admin1-> racadm serveraction powercycle
[17:14:14] seeing BIOS / bootup
[17:15:09] * thcipriani holds on to butts
[17:15:46] Debian GNU/Linux 10 contint1001 ttyS1
[17:15:46] contint1001 login:
[17:16:08] RECOVERY - Host contint1001 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[17:16:11] thcipriani: I can ssh again
[17:16:23] same
[17:18:02] thcipriani: it's one of those where you can't do much and there is nothing in logs
[17:18:15] syslog is just like "yea, normal..and then reboot"
[17:18:28] and this is "happens sometimes". especially on older hosts
[17:19:04] it's happened before.. host just goes offline, mgmt console is frozen, you powercycle.. and it's like it never happened
[17:19:39] now we can check if that log for hardware issues has anything in DRAC
[17:19:51] if it does.. we can ask dcops/Dell. but if that also has nothing it's just shrug
[17:20:58] :\
[17:21:04] yeah, digging in logs I'm also coming up empty
[17:21:26] ah, it's RAM
[17:21:32] we can ask for a replacement DIMM
[17:21:39] Date/Time: 10/31/2022 16:37:12
[17:21:39] Source: system
[17:21:39] Severity: Critical
[17:21:41] Description: Multi-bit memory errors detected on a memory device at location(s) DIMM_A1.
[17:22:00] checks warranty status
[17:22:43] nope, too old
[17:23:19] where did we land with: https://phabricator.wikimedia.org/T294276 ?
[17:23:39] thcipriani: yea, best that can be done is if that would be prioritized
[17:23:51] what I know is on the ticket
[17:25:45] Release-Engineering-Team (Seen), serviceops-collab: contint hardware refresh - https://phabricator.wikimedia.org/T294276 (Dzahn) contint1001 went down today unexpectedly. just went offline with 100% packet loss. We were able to bring it back with a powercycle via mgmt console. `racadm getsel` shows it...
[17:26:46] Release-Engineering-Team (Seen), serviceops-collab: contint hardware refresh - https://phabricator.wikimedia.org/T294276 (Dzahn) 16:55 < icinga-wm> PROBLEM - Host contint1001 is DOWN: PING CRITICAL - Packet loss = 100% 17:02 < mutante> oh, this is not good but the better part is that 2001 is production ....
[20:20:36] Do we support running puppet CI tests on bullseye? I have a test which I know will fail on buster but work on bullseye... despite my attempts I keep getting buster-specific failure notices. https://integration.wikimedia.org/ci/job/operations-puppet-tests-buster-docker/53519/console
[20:26:47] it doesn't look like it, grepping around the integration/config repo. "buster-docker" is the only test there. And the image we build for CI is based on buster, too: https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/refs/heads/master/dockerfiles/operations-puppet/Dockerfile.template#1
[21:55:25] hm, so I can choose between making new puppet classes just for testing or removing the tests. Neither will be popular :(
[21:57:14] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.40.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T320512 (thcipriani) Open→Resolved
[21:59:38] we could update the tests, if production is ready for that. SRE have mostly been keepers of puppet tests. It seems like we'll want to upgrade to test image bullseye soon. But I don't know the current production status if I'm honest.