[08:00:59] <_joe_> ori: patch merged, now let's see if we managed to break production like in the old times
[08:03:17] <_joe_> jayme: are you aware of this error?
[08:03:19] <_joe_> kubernetes1006 rsyslogd[2833055]: error during parsing file /etc/rsyslog.d/30-output-kafka.conf, on or before line 105: queue directory '/var/spool/rsyslog' and file name prefix 'output_kafka_json' already used. This is not possible. Please make it unique. [v8.2102.0 try https://www.rsyslog.com/e/2207 ]
[08:58:21] hmm, no
[09:02:31] I wonder if it's "true" :-D
[09:03:42] <_joe_> oh I can tell you, rsyslog will still load in presence of such an error
[09:03:46] <_joe_> and not fail
[09:03:51] obviously...
[09:04:05] <_joe_> it just ignores that stanza
[09:04:50] I meant more like there is an if/else block around those actions, so only one can actually trigger
[09:04:57] unrelated, the base version of rsyslog in component/rsyslog-k8s looks odd? The changelog entry which precedes the +wmf1 build mentions an ipv6 patch, but the 8.2102.0-2+deb11u1 in the archive is actually the latest security upload: https://tracker.debian.org/news/1328926/accepted-rsyslog-821020-2deb11u1-source-into-stable-security-embargoed-stable-security/
[09:05:08] but well...what do I know about rsyslog...
[09:06:17] moritzm: there is a mysterious fog around the rsyslog version on kubernetes nodes
[09:06:52] I think it just materialized out of that
[09:10:35] I think that's the most reasonable explanation we can find here
[09:13:03] moritzm: is the subtext of your previous comment "go, update your stuff"? :) In that case, I'll venture into the mists again
[09:14:43] https://phabricator.wikimedia.org/T277739 is what we have on this matter IIRC
[09:17:25] buster can be ignored for k8s or is there anything left? for bullseye I think I'll simply apply the latest security fixes from DSA-5150-1 on top of your existing 8.2102.0-2+deb11u1+wmf1 package, then
[09:22:12] FWIW I'm hoping we can converge on >= 8.2102.0-2 on the fleet over time so we won't need our custom enablement of mmkubernetes
[09:22:43] moritzm: correct. We don't have k8s nodes on buster anymore
[09:24:49] but 8.2102.0-2 still needs the custom mmkubernetes patch? that's what is getting applied in 8.2102.0-2+deb11u1+wmf1
[09:26:51] according to the changelog mmkubernetes is enabled starting 8.2112.0-2, I'm reading https://tracker.debian.org/media/packages/r/rsyslog/changelog-8.2204.1-1
[09:27:12] though I think we were carrying some patches on top of that too
[09:28:26] ok, but then we should converge to >= 8.2112.0-2 not >= 8.2102.0-2 :-)
[09:28:38] * moritzm disappears into the rsyslog fog again
[09:37:50] gah, yes sorry! it must be this damn fog
[09:37:57] * godog waves hand in front of face
[12:14:43] serviceops, GitLab (Infrastructure), Patch-For-Review: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142 (Jelto) Checklist for today's gitlab-replica migration from `gitlab2001` to `gitlab1003`: **Preparations before downtime:** [x] register second service...
[12:39:41] serviceops, Infrastructure-Foundations, Scap, Release-Engineering-Team (Priority Backlog 📥): Use scap to deploy itself to scap targets - https://phabricator.wikimedia.org/T303559 (jnuche)
[16:07:05] The developer.wikimedia.org service is ready to be connected to the CDN edge -- https://gerrit.wikimedia.org/r/c/operations/puppet/+/800181
[16:16:04] <_joe_> bd808: If you're asking for a review, #-traffic is probably the right place
[16:29:58] already reviewed :)
[16:31:44] <_joe_> bd808: want me to deploy the change?
[16:32:19] <_joe_> vgutierrez: <3
[16:32:30] serviceops, GitLab (Infrastructure), Patch-For-Review: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142 (Jelto) Checklist for gitlab migration from `gitlab1001` to `gitlab1004`: **Preparations before downtime:** [x] register second service IPs for `gitlab...
[16:43:44] _joe_: oh I can deploy the change as well
[16:44:01] <_joe_> vgutierrez: I assumed you were gone by now :)
[16:44:19] let me proceed then
[16:48:03] thanks both. I'm stuck in a meeting right now :)
[16:51:33] ran puppet manually on cp6016 and it's looking good
[17:12:56] * bd808 is seeing the site live and is very happy about that
[17:12:58] <_joe_> bd808: I can see https://developer.wikimedia.org/ fine in europe, but I'd wait another 10 minutes
[17:13:07] <_joe_> ahah well if someone sees it from the wrong cache
[17:13:26] <_joe_> echo 'https://developer.wikimedia.org' | mwscript purgeList.php from mwmaint1002
[17:15:36] *nod* we are not announcing widely for about 2 more weeks so a bit of edge stale is fine. My urgency was mostly that I am leaving for vacation at the end of the week and the 'official' launch will happen while I'm offline.
[17:16:12] I will spend some time this afternoon trying to verify that things really work :)
[17:17:18] thank you very much for the +2 and rollout vgutierrez!
[17:17:52] serviceops, Observability-Logging, WMF-General-or-Unknown, Patch-For-Review, and 2 others: Ingest logs from scheduled maintenance scripts at WMF in Logstash - https://phabricator.wikimedia.org/T285896 (Tgr) The patch does not work, see @Joe's [[https://gerrit.wikimedia.org/r/c/operations/puppet/+...
[17:17:57] serviceops, Release-Engineering-Team, SRE, Scap: Deploy Scap version 4.8.0 - https://phabricator.wikimedia.org/T309116 (thcipriani)
[17:18:15] no problem bd808
[17:18:49] serviceops, Release-Engineering-Team (GitLab-a-thon 🦊): Debianize releng/jwt-authorizer - https://phabricator.wikimedia.org/T309646 (dduvall)
[17:43:03] serviceops, Wikimedia-Developer-Portal, Goal, Service-deployment-requests: New Service Request: developer-portal - https://phabricator.wikimedia.org/T297140 (bd808)
[17:46:02] serviceops, SRE, Toolhub, Patch-For-Review, Service-deployment-requests: New Service Request Toolhub - https://phabricator.wikimedia.org/T280881 (bd808)
[17:50:11] serviceops, Wikimedia-Developer-Portal, Goal, Service-deployment-requests: New Service Request: developer-portal - https://phabricator.wikimedia.org/T297140 (bd808)
[18:45:14] serviceops, MW-on-K8s, Performance-Team (Radar): mw-on-k8s apache config missing cache-control for /static/ files - https://phabricator.wikimedia.org/T309358 (Krinkle) Open→Resolved
[22:24:00] serviceops, Release-Engineering-Team (GitLab-a-thon 🦊): Debianize releng/jwt-authorizer - https://phabricator.wikimedia.org/T309646 (Dzahn) We will need an upstream tarball that includes a ./vendor/ directory with all the needed artifacts. "go mod vendor" created such a directory for me when I tried. It...
[23:32:14] serviceops, Data-Persistence-Backup, GitLab (Infrastructure), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Dzahn) Also, after manually starting the "backup-restore" service on gitlab2001, which was still alerting in Icinga, we now have: 23:2...
[23:36:59] serviceops, GitLab (Infrastructure), Patch-For-Review: gitlab-restore: version detection fail / restore fail - https://phabricator.wikimedia.org/T308089 (Dzahn) After T274463#7966543 ff we now have a working backup-restore service again: <+icinga-wm> RECOVERY - Check systemd state on gitlab2001 is O...
[23:37:21] serviceops, GitLab (Infrastructure), Patch-For-Review: gitlab-restore: version detection fail / restore fail - https://phabricator.wikimedia.org/T308089 (Dzahn) In progress→Resolved