[00:06:36] <wikibugs>	 (CR) Cwhite: [C:+1] service::catalog: add 'team' attribute [puppet] - https://gerrit.wikimedia.org/r/1214473 (https://phabricator.wikimedia.org/T399807) (owner: Filippo Giunchedi)
[00:21:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[00:24:44] <wikibugs>	 (CR) Cwhite: Followup I81a2c4de77: Verify stats label values are not empty (1 comment) [core] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1214647 (https://phabricator.wikimedia.org/T411585) (owner: Jforrester)
[00:30:39] <wikibugs>	 (CR) Catrope: [C:+1] OATHAuth: Remove wmgOATHAuthDisableRight [mediawiki-config] - https://gerrit.wikimedia.org/r/1214659 (https://phabricator.wikimedia.org/T399664) (owner: Mstyles)
[00:31:12] <icinga-wm>	 PROBLEM - dump of s5 in codfw on backupmon1001 is CRITICAL: dump for s5 at codfw (db2201) taken more than a week ago: Most recent backup 2025-11-25 00:00:05 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[00:31:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[00:40:11] <jinxer-wm>	 FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[00:40:20] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1214682
[00:40:20] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1214682 (owner: TrainBranchBot)
[00:49:02] <wikibugs>	 (CR) Cwhite: [C:+1] Blackbox/check: strengthen suffix matching regex in generated rules [puppet] - https://gerrit.wikimedia.org/r/1208365 (https://phabricator.wikimedia.org/T410745) (owner: Tiziano Fogli)
[00:49:38] <wikibugs>	 (CR) Cwhite: [C:+1] sre: multi-team ProbeDown [alerts] - https://gerrit.wikimedia.org/r/1214478 (https://phabricator.wikimedia.org/T399807) (owner: Filippo Giunchedi)
[00:51:27] <wikibugs>	 (CR) Dzahn: [C:+1] Add astein to analytics-privatedata-users. [puppet] - https://gerrit.wikimedia.org/r/1214658 (https://phabricator.wikimedia.org/T411679) (owner: Andrea Denisse)
[00:51:42] <wikibugs>	 (CR) Andrea Denisse: [C:+2] Add astein to analytics-privatedata-users. [puppet] - https://gerrit.wikimedia.org/r/1214658 (https://phabricator.wikimedia.org/T411679) (owner: Andrea Denisse)
[00:53:19] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1214682 (owner: TrainBranchBot)
[00:56:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[01:00:40] <logmsgbot>	 !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
[01:06:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[01:10:52] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1214690
[01:10:52] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1214690 (owner: TrainBranchBot)
[01:18:28] <logmsgbot>	 !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 17m 47s)
[01:20:11] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release mw-script/utk6lsuw on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[01:23:14] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
[01:23:22] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1157 (T410589)', diff saved to https://phabricator.wikimedia.org/P86394 and previous config saved to /var/cache/conftool/dbconfig/20251204-012321-ladsgroup.json
[01:23:25] <stashbot>	 T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589
[01:32:06] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1214690 (owner: TrainBranchBot)
[01:35:11] <jinxer-wm>	 FIRING: [8x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage  - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[01:41:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[01:46:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[01:49:20] <wikibugs>	 (PS1) Pppery: Remove old list of translated languages [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1214701
[01:56:46] <wikibugs>	 (PS1) Pppery: Add .gitreview [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1214702
[01:57:04] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, December 04 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - https://gerrit.wikimedia.org/r/1214659 (https://phabricator.wikimedia.org/T399664) (owner: Mstyles)
[01:57:51] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:xe-3/2/1 (Transport: cr1-esams:xe-0/0/7 (Colt, 445419311 80ms 10Gbps wave) {#2013}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[01:58:10] <jinxer-wm>	 FIRING: [2x] BFDdown: BFD session down between cr2-eqiad and 185.15.59.149 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[01:58:39] <jinxer-wm>	 FIRING: [3x] CoreBGPDown: Core BGP session down between cr1-esams and cr2-eqiad (185.15.59.148) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[01:58:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[02:02:51] <jinxer-wm>	 RESOLVED: [2x] CoreRouterInterfaceDown: Core router interface down - cr1-esams:xe-0/0/7 (Transport: cr2-eqiad:xe-3/2/1 (Colt, 445419311 80ms 10Gbps wave) {#30385}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[02:03:10] <jinxer-wm>	 RESOLVED: [2x] BFDdown: BFD session down between cr2-eqiad and 185.15.59.149 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[02:03:39] <jinxer-wm>	 RESOLVED: [4x] CoreBGPDown: Core BGP session down between cr1-esams and cr2-eqiad (185.15.59.148) - group Confed_eqiad - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[02:03:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[02:10:22] <wikibugs>	 (PS1) Pppery: Replace "libphutil" with "Arcanist" [phabricator/translations] (wmf/stable) - https://gerrit.wikimedia.org/r/1214708
[02:29:42] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:29:42] <icinga-wm>	 PROBLEM - OSPF status on cr2-magru is CRITICAL: OSPFv2: 1/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:31:39] <jinxer-wm>	 FIRING: CoreBGPDown: Core BGP session down between cr2-eqdfw and cr2-magru (2a02:ec80:700:fe0b::2) - group Confed_magru - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=codfw&var-device=cr2-eqdfw:9804&var-bgp_group=Confed_magru&var-bgp_neighbor=cr2-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[02:31:42] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:31:42] <icinga-wm>	 RECOVERY - OSPF status on cr2-magru is OK: OSPFv2: 2/2 UP : OSPFv3: 2/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:36:39] <jinxer-wm>	 RESOLVED: [2x] CoreBGPDown: Core BGP session down between cr2-eqdfw and cr2-magru (195.200.68.153) - group Confed_magru - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://grafana.wikimedia.org/d/ed8da087-4bcb-407d-9596-d158b8145d45/bgp-neighbors-detail?orgId=1&var-site=codfw&var-device=cr2-eqdfw:9804&var-bgp_group=Confed_magru&var-bgp_neighbor=cr2-magru - https://alerts.wikimedia.org/?q=alertname%3DCoreBGPDown
[02:55:11] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[03:20:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[03:25:53] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[03:30:11] <jinxer-wm>	 FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[03:51:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[04:01:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[04:33:10] <jinxer-wm>	 FIRING: BFDdown: BFD session down between cr2-magru and fe80::ee38:73ff:fee8:9c58 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[04:38:10] <jinxer-wm>	 RESOLVED: BFDdown: BFD session down between cr2-magru and fe80::ee38:73ff:fee8:9c58 - https://wikitech.wikimedia.org/wiki/Network_monitoring#BFD_status - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-magru:9804 - https://alerts.wikimedia.org/?q=alertname%3DBFDdown
[04:40:11] <jinxer-wm>	 FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[04:56:12] <icinga-wm>	 PROBLEM - snapshot of s3 in eqiad on backupmon1001 is CRITICAL: Last snapshot for s3 at eqiad (db1150) taken on 2025-12-04 04:06:37 is 872 GiB, but the previous one was 1145 GiB, a change of -23.8 % https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Rerun_a_failed_backup
[05:10:00] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:18:37] <wikibugs>	 (PS1) Jdlrobson: Filter another client adding noise [puppet] - https://gerrit.wikimedia.org/r/1214759
[05:20:11] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release mw-script/utk6lsuw on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[05:35:00] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:35:11] <jinxer-wm>	 FIRING: [8x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage  - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[05:51:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[05:56:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[05:58:20] <wikibugs>	 (PS1) Marostegui: installserver: Add UEFI recipe to future clouddb* [puppet] - https://gerrit.wikimedia.org/r/1214777
[06:00:00] <wikibugs>	 ops-eqiad, SRE, DC-Ops, cloud-services-team (Hardware): Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#11431559 (Marostegui)
[06:08:10] <wikibugs>	 ops-eqiad, SRE, DBA, DC-Ops: db1229 crashed - Broken memory module at B7 - https://phabricator.wikimedia.org/T411652#11431564 (Marostegui) p:Triage→Medium
[06:12:17] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs1014:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:25:38] <wikibugs>	 (CR) Tiziano Fogli: [C:+1] service::catalog: add 'team' attribute [puppet] - https://gerrit.wikimedia.org/r/1214473 (https://phabricator.wikimedia.org/T399807) (owner: Filippo Giunchedi)
[06:27:19] <wikibugs>	 SRE, Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts - https://phabricator.wikimedia.org/T411568#11431570 (RKemper) an-worker* partially done. made https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1214664 to al...
[06:28:25] <wikibugs>	 SRE, Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts - https://phabricator.wikimedia.org/T411568#11431572 (RKemper) Oh, with respect to the patch, we should also get https://gerrit.wikimedia.org/r/c/operations/cookboo...
[06:30:08] <wikibugs>	 (CR) Tiziano Fogli: [C:+1] sre: multi-team ProbeDown [alerts] - https://gerrit.wikimedia.org/r/1214478 (https://phabricator.wikimedia.org/T399807) (owner: Filippo Giunchedi)
[06:53:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[06:55:11] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[06:57:17] <jinxer-wm>	 FIRING: [5x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:59:02] <jinxer-wm>	 FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[07:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T0700)
[07:00:05] <jouncebot>	 marostegui, Amir1, and federico3: #bothumor I � Unicode. All rise for Primary database switchover deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T0700).
[07:02:17] <jinxer-wm>	 FIRING: [22x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:03:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[07:08:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[07:09:02] <jinxer-wm>	 FIRING: [3x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2010:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[07:13:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[07:14:02] <jinxer-wm>	 FIRING: [7x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2010:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[07:19:02] <jinxer-wm>	 FIRING: [9x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2007:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[07:25:00] <jinxer-wm>	 FIRING: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[07:27:21] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Remove obsolete Hiera entries [puppet] - https://gerrit.wikimedia.org/r/1214556 (https://phabricator.wikimedia.org/T311407) (owner: Muehlenhoff)
[07:30:11] <jinxer-wm>	 FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[07:35:56] <wikibugs>	 (PS1) Arthur taylor: Enable the MEX / wbui2025 beta feature on wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/1214986 (https://phabricator.wikimedia.org/T403015)
[07:36:40] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good and verified out of band" [puppet] - https://gerrit.wikimedia.org/r/1214665 (https://phabricator.wikimedia.org/T411730) (owner: Brennen Bearnes)
[07:36:43] <wikibugs>	 (CR) Muehlenhoff: [C:+2] admin: add fido backed ssh key for brennen [puppet] - https://gerrit.wikimedia.org/r/1214665 (https://phabricator.wikimedia.org/T411730) (owner: Brennen Bearnes)
[07:40:32] <wikibugs>	 (PS2) Arthur taylor: Enable the MEX / wbui2025 beta feature on wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/1214986 (https://phabricator.wikimedia.org/T403015)
[07:43:27] <jinxer-wm>	 FIRING: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2012:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[07:46:15] <wikibugs>	 SRE, SRE-Unowned, Maps, Patch-For-Review: Setup a maps staging DB - https://phabricator.wikimedia.org/T409528#11431623 (MoritzMuehlenhoff) The initial imposm catchup sync after the PBF import has just completed.
[07:48:27] <jinxer-wm>	 RESOLVED: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2012:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[07:51:48] <wikibugs>	 (PS2) Giuseppe Lavagetto: kubernetes::deployment_server: add files for configuring conftool [puppet] - https://gerrit.wikimedia.org/r/1214524
[07:55:58] <jinxer-wm>	 FIRING: [2x] SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2007:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[08:00:05] <jouncebot>	 Amir1, Urbanecm, and awight: OwO what's this, a deployment window?? UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T0800). nyaa~
[08:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[08:02:17] <jinxer-wm>	 FIRING: [22x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:07:17] <jinxer-wm>	 FIRING: [16x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:09:02] <jinxer-wm>	 FIRING: [9x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2007:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[08:09:14] <wikibugs>	 (PS1) Muehlenhoff: Remove platform-engineering POSIX group [puppet] - https://gerrit.wikimedia.org/r/1215058
[08:10:00] <jinxer-wm>	 RESOLVED: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[08:10:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[08:10:54] <wikibugs>	 (PS1) Gehel: Druid: open firewall access to Druid from the FRTech network [puppet] - https://gerrit.wikimedia.org/r/1215059 (https://phabricator.wikimedia.org/T411740)
[08:10:57] <jinxer-wm>	 RESOLVED: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2007:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[08:12:17] <jinxer-wm>	 FIRING: [20x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:13:21] <wikibugs>	 (PS1) Zoranzoki21: Add Serbian Latin draft namespace and talk namespace aliases [mediawiki-config] - https://gerrit.wikimedia.org/r/1215060 (https://phabricator.wikimedia.org/T411750)
[08:14:53] <wikibugs>	 (PS1) Muehlenhoff: Remove piwik-roots POSIX group [puppet] - https://gerrit.wikimedia.org/r/1215061
[08:14:54] <Kizule>	 Hi, is someone here to deploy one mediawiki-config patch, as the backport window is ongoing right now?
[08:14:59] <Kizule>	 Or I should add it for the next one?
[08:16:20] <Kizule>	 urbanecm?
[08:16:24] <Kizule>	 urbanecm: ?
[08:16:37] <wikibugs>	 (PS2) Gehel: Druid: open firewall access to Druid from the FRTech network [puppet] - https://gerrit.wikimedia.org/r/1215059 (https://phabricator.wikimedia.org/T411740)
[08:16:58] <Kizule>	 Tag is not working for some reason.. Nevermind, I'll add it for the next deployment window..
[08:17:17] <jinxer-wm>	 FIRING: [20x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:17:21] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, December 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - https://gerrit.wikimedia.org/r/1215060 (https://phabricator.wikimedia.org/T411750) (owner: Zoranzoki21)
[08:17:54] <wikibugs>	 (CR) Gehel: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1215059 (https://phabricator.wikimedia.org/T411740) (owner: Gehel)
[08:18:28] <wikibugs>	 (PS1) Slyngshede: data.yaml: Offboarding arinaigum [puppet] - https://gerrit.wikimedia.org/r/1215063
[08:19:02] <jinxer-wm>	 FIRING: [9x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2007:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[08:19:13] <wikibugs>	 (CR) CI reject: [V:-1] data.yaml: Offboarding arinaigum [puppet] - https://gerrit.wikimedia.org/r/1215063 (owner: Slyngshede)
[08:21:38] <wikibugs>	 (CR) Jelto: "I was not aware of Ie50e2f89b0dddd62e7206dff185545e0242fa6a5, we can use your patch and add the ssh service later on." [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[08:21:44] <wikibugs>	 (Abandoned) Jelto: service::catalog: add gerrit-https and gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[08:22:17] <jinxer-wm>	 FIRING: [20x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:22:55] <wikibugs>	 (03PS2) 10Slyngshede: data.yaml: Offboarding arinaigum [puppet] - 10https://gerrit.wikimedia.org/r/1215063
[08:23:29] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove notebook-users POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215065
[08:23:39] <wikibugs>	 (03CR) 10CI reject: [V:04-1] data.yaml: Offboarding arinaigum [puppet] - 10https://gerrit.wikimedia.org/r/1215063 (owner: 10Slyngshede)
[08:23:53] <wikibugs>	 (03PS3) 10Gehel: Druid: open firewall access to Druid from the FRTech network [puppet] - 10https://gerrit.wikimedia.org/r/1215059 (https://phabricator.wikimedia.org/T411740)
[08:24:02] <jinxer-wm>	 FIRING: [9x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2007:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[08:24:11] <wikibugs>	 (03CR) 10Gehel: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1215059 (https://phabricator.wikimedia.org/T411740) (owner: 10Gehel)
[08:25:36] <wikibugs>	 (03PS3) 10Slyngshede: data.yaml: Offboarding arinaigum [puppet] - 10https://gerrit.wikimedia.org/r/1215063
[08:27:02] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove labnet-users POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215067
[08:27:16] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] Phabricator: Allow users to link Phabricator and developer accounts [software/bitu] - 10https://gerrit.wikimedia.org/r/1196919 (https://phabricator.wikimedia.org/T406495) (owner: 10Slyngshede)
[08:27:17] <wikibugs>	 (03CR) 10Jelto: service: add gerrit-https service to service catalog (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1202842 (https://phabricator.wikimedia.org/T408532) (owner: 10Dzahn)
[08:27:49] <jinxer-wm>	 FIRING: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs2021:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[08:28:09] <wikibugs>	 (03CR) 10Slyngshede: [C:03+1] Remove platform-engineering POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215058 (owner: 10Muehlenhoff)
[08:28:34] <wikibugs>	 (03CR) 10Slyngshede: [C:03+1] Remove piwik-roots POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215061 (owner: 10Muehlenhoff)
[08:30:18] <wikibugs>	 (03Merged) 10jenkins-bot: Phabricator: Allow users to link Phabricator and developer accounts [software/bitu] - 10https://gerrit.wikimedia.org/r/1196919 (https://phabricator.wikimedia.org/T406495) (owner: 10Slyngshede)
[08:30:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[08:30:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[08:31:10] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove eventbus-admins POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215068
[08:32:17] <jinxer-wm>	 FIRING: [18x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:32:43] <jinxer-wm>	 RESOLVED: BlazegraphFreeAllocatorsDecreasingRapidly: Blazegraph instance wdqs2021:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[08:33:57] <jinxer-wm>	 FIRING: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:34:02] <jinxer-wm>	 FIRING: [7x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2007:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[08:34:54] <icinga-wm>	 PROBLEM - Check unit status of statograph_post on alert1002 is CRITICAL: CRITICAL: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[08:35:00] <jinxer-wm>	 FIRING: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:35:16] <tappof>	 ^^ looking at thanos
[08:35:38] <hnowlan>	 thanks 
[08:35:40] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/1215063 (owner: 10Slyngshede)
[08:35:48] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove gpu-testers POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215069
[08:36:20] <jayme>	 hnowlan: o/ sup
[08:36:52] <hnowlan>	 yoyo
[08:37:10] <jayme>	 hmm, looks like my bouncer is broken and not replaying messages :/
[08:37:25] <jayme>	 was the p.age just a fart?
[08:38:06] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove platform-engineering POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215058 (owner: 10Muehlenhoff)
[08:38:57] <jinxer-wm>	 RESOLVED: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:38:59] <hnowlan>	 not sure, still looking - I definitely see a big increase in most metrics on the thanos hosts
[08:39:15] <hnowlan>	 so kinda looks like a big query? 
[08:39:43] <wikibugs>	 (03CR) 10Slyngshede: [C:03+1] Remove labnet-users POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215067 (owner: 10Muehlenhoff)
[08:39:57] <tappof>	 I've depooled titan2001 
[08:40:00] <jinxer-wm>	 RESOLVED: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:40:11] <wikibugs>	 (03CR) 10Slyngshede: [C:03+1] Remove eventbus-admins POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215068 (owner: 10Muehlenhoff)
[08:40:11] <jinxer-wm>	 FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[08:40:36] <wikibugs>	 (03CR) 10Slyngshede: [C:03+2] data.yaml: Offboarding arinaigum [puppet] - 10https://gerrit.wikimedia.org/r/1215063 (owner: 10Slyngshede)
[08:40:50] <tappof>	 The disk is probably full due to the compactor.
[08:41:01] <jayme>	 tappof: would that cause extra load on thanos hosts?
[08:41:06] <hnowlan>	 tappof: fwiw it looks like the issue was in eqiad 
[08:41:29] <hnowlan>	 https://grafana.wikimedia.org/goto/XDjaRsWvR?orgId=1
[08:41:34] <jayme>	 yeah, right
[08:41:47] <wikibugs>	 (03CR) 10Slyngshede: [C:03+1] Remove gpu-testers POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215069 (owner: 10Muehlenhoff)
[08:42:02] <wikibugs>	 (03PS1) 10Muehlenhoff: Stop applying the os-installers group on cumin* and cloudcumin* nodes [puppet] - 10https://gerrit.wikimedia.org/r/1215073 (https://phabricator.wikimedia.org/T358361)
[08:42:17] <jinxer-wm>	 FIRING: [14x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:43:54] <tappof>	 Well, I need to check, but IIRC, the Query Frontend is spreading queries across all the instances, including titan2001, which has its /srv partition 100% full. I'm checking..
[08:44:02] <jinxer-wm>	 FIRING: [4x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2007:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[08:44:13] <hnowlan>	 tappof: ah, okay
[08:44:27] <jayme>	 but cross-dc?
[08:44:54] <icinga-wm>	 RECOVERY - Check unit status of statograph_post on alert1002 is OK: OK: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[08:46:10] <hnowlan>	 either way that disk free is worrying yeah
[08:47:04] <hnowlan>	 given that the page has resolved I'm going afk for a little bit, but I'm nearby so message if needed 
[08:47:12] <jayme>	 ack
[08:54:29] <wikibugs>	 (03CR) 10Arnaudb: "yes both are useful, I will not merge 1211551 until we are successfully switched over and this one will be merged right before" [puppet] - 10https://gerrit.wikimedia.org/r/1196792 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb)
[08:55:44] <wikibugs>	 (03CR) 10Arnaudb: "indeed! I'll add that to our team meeting agenda. thanks for raising that concern!" [puppet] - 10https://gerrit.wikimedia.org/r/1211549 (https://phabricator.wikimedia.org/T338470) (owner: 10Arnaudb)
[08:58:29] <hashar>	 I am looking at the backend error logs before running the train
[08:58:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[09:00:04] <jouncebot>	 hashar and jnuche: Deploy window MediaWiki train - Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T0900)
[09:02:43] <wikibugs>	 (03CR) 10Hashar: Followup I81a2c4de77: Verify stats label values are not empty (031 comment) [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1214647 (https://phabricator.wikimedia.org/T411585) (owner: 10Jforrester)
[09:04:05] <wikibugs>	 (03PS1) 10Hashar: REST: add explicit cast to sitemapSize calcuation to avoid warning [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1215078 (https://phabricator.wikimedia.org/T411580)
[09:04:49] <wikibugs>	 (03CR) 10Hashar: [C:03+2] "That is trivial enough for a deployment and will cut log spam 😊" [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1215078 (https://phabricator.wikimedia.org/T411580) (owner: 10Hashar)
[09:04:58] <wikibugs>	 (03CR) 10Hashar: [C:03+2] Followup I81a2c4de77: Verify stats label values are not empty [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1214647 (https://phabricator.wikimedia.org/T411585) (owner: 10Jforrester)
[09:05:19] <hashar>	 some backports to cut on the log spam
[09:06:04] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by hashar@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1215078 (https://phabricator.wikimedia.org/T411580) (owner: 10Hashar)
[09:06:05] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by hashar@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1214647 (https://phabricator.wikimedia.org/T411585) (owner: 10Jforrester)
[09:06:13] <hashar>	 hmm
[09:06:25] <hashar>	 I guess it does not matter to have multiple CR+2
[09:06:43] <wikibugs>	 (03CR) 10Slyngshede: [C:03+1] Remove notebook-users POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215065 (owner: 10Muehlenhoff)
[09:08:56] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] Remove labnet-users POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215067 (owner: 10Muehlenhoff)
[09:09:35] <wikibugs>	 (03CR) 10Majavah: [C:03+2] openstack: puppet: Remove support for X-Enc-Edit-Git [puppet] - 10https://gerrit.wikimedia.org/r/1214490 (owner: 10Majavah)
[09:10:51] <wikibugs>	 (03PS1) 10Brouberol: growthbook-next: add stub OIDC client secret [labs/private] - 10https://gerrit.wikimedia.org/r/1215081 (https://phabricator.wikimedia.org/T411752)
[09:12:08] <wikibugs>	 (03PS1) 10Brouberol: growthbook: setup OIDC for both the production and next instance [puppet] - 10https://gerrit.wikimedia.org/r/1215082 (https://phabricator.wikimedia.org/T411752)
[09:13:46] <wikibugs>	 (03CR) 10Jelto: [C:03+1] gerrit: re-enable backups on gerrit2003 [puppet] - 10https://gerrit.wikimedia.org/r/1211551 (https://phabricator.wikimedia.org/T387833) (owner: 10Arnaudb)
[09:15:12] <wikibugs>	 (03PS1) 10Brouberol: growthbook: grant frontend access to the IDP servers [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215083 (https://phabricator.wikimedia.org/T411752)
[09:15:50] <wikibugs>	 (03Abandoned) 10Slyngshede: C:tomcat10 hide stacktrace and server info [puppet] - 10https://gerrit.wikimedia.org/r/1207874 (owner: 10Slyngshede)
[09:16:05] <tappof>	 > but cross-dc? 
[09:16:25] <tappof>	 Yes jayme hnowlan, for the Ruler component. Anyway, I found a "query of death" that started at 8:15, requesting 45 days of data for 4,400 series.
[09:17:08] <jayme>	 tappof: ok, thanks!
[09:17:49] <wikibugs>	 (03PS2) 10Brouberol: growthbook: grant frontend access to the IDP servers [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215083 (https://phabricator.wikimedia.org/T411752)
[09:19:06] <wikibugs>	 (03Merged) 10jenkins-bot: REST: add explicit cast to sitemapSize calcuation to avoid warning [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1215078 (https://phabricator.wikimedia.org/T411580) (owner: 10Hashar)
[09:19:12] <wikibugs>	 (03Merged) 10jenkins-bot: Followup I81a2c4de77: Verify stats label values are not empty [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1214647 (https://phabricator.wikimedia.org/T411585) (owner: 10Jforrester)
[09:19:39] <logmsgbot>	 !log slyngshede@cumin1003 DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Arinaigum out of all services on: 2419 hosts
[09:20:05] <arnoldokoth>	 !log upgrade envoyproxy on vrts T405808
[09:20:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:08] <stashbot>	 T405808: Upgrade Envoy to v1.32.12 - https://phabricator.wikimedia.org/T405808
[09:20:11] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release mw-script/utk6lsuw on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[09:20:30] <logmsgbot>	 !log hashar@deploy2002 Started scap sync-world: Backport for [[gerrit:1215078|REST: add explicit cast to sitemapSize calcuation to avoid warning (T411580)]], [[gerrit:1214647|Followup I81a2c4de77: Verify stats label values are not empty (T411585)]]
[09:20:34] <stashbot>	 T411580: SitemapFileHandler: PHP Deprecated: Implicit conversion from float 33333.333333333336 to int loses precision - https://phabricator.wikimedia.org/T411580
[09:20:35] <stashbot>	 T411585: PHP Warning: Stats: (action_api_modules_hit_total): Stats: (action_api_modules_hit_total) Cannot associate label keys with label values - Not all initialized labels have an assigned value. - https://phabricator.wikimedia.org/T411585
[09:21:29] <tappof>	 jayme: hnowlan The overlapping alert for disk saturation was just a matter of unlucky timing: I tried depooling titan2001 because I was blind, since neither Grafana nor Thanos was working. A few seconds later, it started replying again, so I put the blame on titan2001… but the outage was definitely due to a “query of death” in eqiad.
[09:21:51] <hnowlan>	 tappof: good to know, thanks! 
[09:22:16] <arnoldokoth>	 !log upgrade envoyproxy on lists T405808
[09:22:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:23:00] <logmsgbot>	 !log hashar@deploy2002 jforrester, hashar: Backport for [[gerrit:1215078|REST: add explicit cast to sitemapSize calcuation to avoid warning (T411580)]], [[gerrit:1214647|Followup I81a2c4de77: Verify stats label values are not empty (T411585)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[09:26:19] <logmsgbot>	 !log hashar@deploy2002 jforrester, hashar: Continuing with sync
[09:26:37] <wikibugs>	 (03PS11) 10AOkoth: vrts: add high inode usage alert [alerts] - 10https://gerrit.wikimedia.org/r/1214034 (https://phabricator.wikimedia.org/T411452)
[09:27:06] <wikibugs>	 (03CR) 10AOkoth: vrts: add high inode usage alert (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/1214034 (https://phabricator.wikimedia.org/T411452) (owner: 10AOkoth)
[09:29:02] <jinxer-wm>	 FIRING: [2x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2010:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[09:30:19] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#11431822 (10ayounsi) From that comment : T410989#11429115 cloudcephosd1052 still needs to be migrated.  Both interfaces are still doing significant traffic : https://libre...
[09:30:29] <logmsgbot>	 !log hashar@deploy2002 Finished scap sync-world: Backport for [[gerrit:1215078|REST: add explicit cast to sitemapSize calcuation to avoid warning (T411580)]], [[gerrit:1214647|Followup I81a2c4de77: Verify stats label values are not empty (T411585)]] (duration: 09m 59s)
[09:30:33] <stashbot>	 T411580: SitemapFileHandler: PHP Deprecated: Implicit conversion from float 33333.333333333336 to int loses precision - https://phabricator.wikimedia.org/T411580
[09:30:34] <stashbot>	 T411585: PHP Warning: Stats: (action_api_modules_hit_total): Stats: (action_api_modules_hit_total) Cannot associate label keys with label values - Not all initialized labels have an assigned value. - https://phabricator.wikimedia.org/T411585
[09:30:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[09:31:22] <wikibugs>	 (03CR) 10AOkoth: "Yeah, I think we can." [alerts] - 10https://gerrit.wikimedia.org/r/1214034 (https://phabricator.wikimedia.org/T411452) (owner: 10AOkoth)
[09:31:44] <hashar>	 I am doing the train now
[09:32:07] <wikibugs>	 (03PS1) 10TrainBranchBot: group2 to 1.46.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215085 (https://phabricator.wikimedia.org/T408275)
[09:32:10] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Initiated by hashar@deploy2002" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215085 (https://phabricator.wikimedia.org/T408275) (owner: 10TrainBranchBot)
[09:32:58] <wikibugs>	 (03Merged) 10jenkins-bot: group2 to 1.46.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215085 (https://phabricator.wikimedia.org/T408275) (owner: 10TrainBranchBot)
[09:33:30] <wikibugs>	 (03CR) 10Ayounsi: [C:03+2] Tox: remove old python support [cookbooks] - 10https://gerrit.wikimedia.org/r/1214532 (owner: 10Ayounsi)
[09:33:38] <wikibugs>	 (03CR) 10Ayounsi: [C:03+2] sre.network.tls: add timeout to get_server_certificate [cookbooks] - 10https://gerrit.wikimedia.org/r/1161337 (owner: 10Ayounsi)
[09:35:11] <jinxer-wm>	 FIRING: [8x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage  - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[09:35:34] <moritzm>	 !log cleanup lingering sessions of offboarded user T389324
[09:35:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:35:37] <stashbot>	 T389324: /etc/wikimedia/logout.d/50-systemdlogoutd sometimes fails to terminate user session on stat hosts - https://phabricator.wikimedia.org/T389324
[09:37:04] <wikibugs>	 (03CR) 10Jelto: [C:03+1] "lgtm" [alerts] - 10https://gerrit.wikimedia.org/r/1214034 (https://phabricator.wikimedia.org/T411452) (owner: 10AOkoth)
[09:37:45] <wikibugs>	 (03PS1) 10Elukey: Move ml-serve1013 to a ML k8s worker [puppet] - 10https://gerrit.wikimedia.org/r/1215088 (https://phabricator.wikimedia.org/T403697)
[09:38:43] <wikibugs>	 (03Merged) 10jenkins-bot: Tox: remove old python support [cookbooks] - 10https://gerrit.wikimedia.org/r/1214532 (owner: 10Ayounsi)
[09:38:54] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Remove gpu-testers POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215069 (owner: 10Muehlenhoff)
[09:39:00] <wikibugs>	 (03Merged) 10jenkins-bot: sre.network.tls: add timeout to get_server_certificate [cookbooks] - 10https://gerrit.wikimedia.org/r/1161337 (owner: 10Ayounsi)
[09:39:21] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Remove piwik-roots POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215061 (owner: 10Muehlenhoff)
[09:39:33] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Remove notebook-users POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215065 (owner: 10Muehlenhoff)
[09:39:47] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Remove labnet-users POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215067 (owner: 10Muehlenhoff)
[09:40:00] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Remove eventbus-admins POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215068 (owner: 10Muehlenhoff)
[09:43:01] <logmsgbot>	 !log hashar@deploy2002 rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.5  refs T408275
[09:43:05] <stashbot>	 T408275: 1.46.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T408275
[09:43:22] <wikibugs>	 (03CR) 10AOkoth: [C:03+2] vrts: re-enable cache cleanup timer [puppet] - 10https://gerrit.wikimedia.org/r/1214129 (https://phabricator.wikimedia.org/T411452) (owner: 10AOkoth)
[09:43:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[09:43:54] <wikibugs>	 (03PS1) 10Elukey: services: add maps-next.w.o as FQDN for kartotherian staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215089 (https://phabricator.wikimedia.org/T409528)
[09:44:02] <jinxer-wm>	 RESOLVED: [2x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2010:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[09:46:33] <wikibugs>	 06SRE, 06Infrastructure-Foundations: /etc/wikimedia/logout.d/50-systemdlogoutd sometimes fails to terminate user session on stat hosts - https://phabricator.wikimedia.org/T389324#11431872 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff
[09:48:56] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] service::catalog: add 'team' attribute [puppet] - 10https://gerrit.wikimedia.org/r/1214473 (https://phabricator.wikimedia.org/T399807) (owner: 10Filippo Giunchedi)
[09:48:58] <moritzm>	 !log upgrade Envoy on an-launcher T405808
[09:49:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:01] <stashbot>	 T405808: Upgrade Envoy to v1.32.12 - https://phabricator.wikimedia.org/T405808
[09:50:09] <hashar>	 it looks quiet
[09:50:33] <hashar>	 there is some warning raised but that is more or less the same as T411585
[09:50:34] <stashbot>	 T411585: PHP Warning: Stats: (action_api_modules_hit_total): Stats: (action_api_modules_hit_total) Cannot associate label keys with label values - Not all initialized labels have an assigned value. - https://phabricator.wikimedia.org/T411585
[09:50:43] <hashar>	 PHP Warning: Stats: (action_api_modules_latency) Cannot add labels to a metric containing samples
[09:50:52] <hashar>	 I'll update the task after a coffee break
[09:50:55] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove piwik-roots POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215061
[09:53:33] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove piwik-roots POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215061 (owner: 10Muehlenhoff)
[09:53:58] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove notebook-users POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215065
[09:55:33] <wikibugs>	 (03PS1) 10Elukey: services: add maps-next.w.o as FQDN for kartotherian staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215098 (https://phabricator.wikimedia.org/T409528)
[09:57:20] <wikibugs>	 (03Abandoned) 10Elukey: services: enable ingress for Kartotherian [deployment-charts] - 10https://gerrit.wikimedia.org/r/1133389 (https://phabricator.wikimedia.org/T391457) (owner: 10Elukey)
[09:57:47] <wikibugs>	 (03Abandoned) 10Elukey: services: add maps-next.w.o as FQDN for kartotherian staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1214526 (https://phabricator.wikimedia.org/T409528) (owner: 10Elukey)
[10:00:21] <wikibugs>	 (03CR) 10Klausman: [C:03+1] Remove gpu-testers POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215069 (owner: 10Muehlenhoff)
[10:01:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[10:04:25] <wikibugs>	 (03CR) 10Btullis: [C:03+1] growthbook-next: add stub OIDC client secret [labs/private] - 10https://gerrit.wikimedia.org/r/1215081 (https://phabricator.wikimedia.org/T411752) (owner: 10Brouberol)
[10:04:52] <wikibugs>	 (03CR) 10Filippo Giunchedi: "Chatted with Tobias, below my recommendation:" [puppet] - 10https://gerrit.wikimedia.org/r/1214530 (https://phabricator.wikimedia.org/T394778) (owner: 10Klausman)
[10:05:08] <wikibugs>	 (03CR) 10Btullis: [C:03+1] growthbook: setup OIDC for both the production and next instance [puppet] - 10https://gerrit.wikimedia.org/r/1215082 (https://phabricator.wikimedia.org/T411752) (owner: 10Brouberol)
[10:05:33] <wikibugs>	 (03CR) 10Btullis: [C:03+1] growthbook: grant frontend access to the IDP servers [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215083 (https://phabricator.wikimedia.org/T411752) (owner: 10Brouberol)
[10:05:39] <wikibugs>	 (03CR) 10Cathal Mooney: [C:03+1] "LGTM!" [homer/public] - 10https://gerrit.wikimedia.org/r/1214537 (https://phabricator.wikimedia.org/T407959) (owner: 10Ayounsi)
[10:08:47] <wikibugs>	 (03CR) 10JMeybohm: services: add maps-next.w.o as FQDN for kartotherian staging (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215098 (https://phabricator.wikimedia.org/T409528) (owner: 10Elukey)
[10:09:25] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] growthbook-next: add stub OIDC client secret [labs/private] - 10https://gerrit.wikimedia.org/r/1215081 (https://phabricator.wikimedia.org/T411752) (owner: 10Brouberol)
[10:09:27] <wikibugs>	 (03CR) 10Brouberol: [V:03+2 C:03+2] growthbook-next: add stub OIDC client secret [labs/private] - 10https://gerrit.wikimedia.org/r/1215081 (https://phabricator.wikimedia.org/T411752) (owner: 10Brouberol)
[10:09:44] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] growthbook: grant frontend access to the IDP servers [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215083 (https://phabricator.wikimedia.org/T411752) (owner: 10Brouberol)
[10:09:52] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] growthbook: setup OIDC for both the production and next instance [puppet] - 10https://gerrit.wikimedia.org/r/1215082 (https://phabricator.wikimedia.org/T411752) (owner: 10Brouberol)
[10:12:54] <wikibugs>	 (03CR) 10Elukey: services: add maps-next.w.o as FQDN for kartotherian staging (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215098 (https://phabricator.wikimedia.org/T409528) (owner: 10Elukey)
[10:14:41] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] sre: multi-team ProbeDown [alerts] - 10https://gerrit.wikimedia.org/r/1214478 (https://phabricator.wikimedia.org/T399807) (owner: 10Filippo Giunchedi)
[10:19:28] <wikibugs>	 (03PS1) 10Jelto: sre.gitlab.upgrade: mask ldap group sync during upgrades [cookbooks] - 10https://gerrit.wikimedia.org/r/1215111 (https://phabricator.wikimedia.org/T411240)
[10:20:49] <wikibugs>	 (03CR) 10Gehel: [C:04-1] "Probably a bad idea given that this would open all of druid." [puppet] - 10https://gerrit.wikimedia.org/r/1215059 (https://phabricator.wikimedia.org/T411740) (owner: 10Gehel)
[10:21:35] <wikibugs>	 (03PS1) 10Bartosz Wójtowicz: ml-services: Deploy experimental CPU-only revise-tone-task-generator. [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215112 (https://phabricator.wikimedia.org/T411758)
[10:22:38] <wikibugs>	 (03CR) 10Muehlenhoff: installserver: Add UEFI recipe to future clouddb* (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1214777 (owner: 10Marostegui)
[10:29:28] <wikibugs>	 (03CR) 10Marostegui: installserver: Add UEFI recipe to future clouddb* (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1214777 (owner: 10Marostegui)
[10:30:45] <wikibugs>	 (03PS1) 10Filippo Giunchedi: hieradata: enable paging for labweb-ssl service and route to wmcs [puppet] - 10https://gerrit.wikimedia.org/r/1215114 (https://phabricator.wikimedia.org/T411470)
[10:33:44] <wikibugs>	 (03PS2) 10Marostegui: installserver: Add UEFI recipe to future clouddb* [puppet] - 10https://gerrit.wikimedia.org/r/1214777
[10:34:28] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/1214777 (owner: 10Marostegui)
[10:35:31] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove notebook-users POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215065 (owner: 10Muehlenhoff)
[10:35:38] <wikibugs>	 (03CR) 10Marostegui: [C:03+2] installserver: Add UEFI recipe to future clouddb* [puppet] - 10https://gerrit.wikimedia.org/r/1214777 (owner: 10Marostegui)
[10:36:21] <wikibugs>	 (03PS1) 10Isabelle Hurbain-Palatin: Activate postprocessing cache on testwiki, test2wiki, officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215115 (https://phabricator.wikimedia.org/T348255)
[10:37:21] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Late, but thanks." [puppet] - 10https://gerrit.wikimedia.org/r/1215065 (owner: 10Muehlenhoff)
[10:39:47] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove labnet-users POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215067
[10:41:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[10:42:13] <wikibugs>	 (03PS3) 10Tchanders: Enable temporary accounts on enwikinews and ptwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1214489 (https://phabricator.wikimedia.org/T411618)
[10:43:30] <wikibugs>	 (03CR) 10Tiziano Fogli: [C:03+2] Blackbox/check: strengthen suffix matching regex in generated rules [puppet] - 10https://gerrit.wikimedia.org/r/1208365 (https://phabricator.wikimedia.org/T410745) (owner: 10Tiziano Fogli)
[10:44:13] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Stop applying the os-installers group on cumin* and cloudcumin* nodes [puppet] - 10https://gerrit.wikimedia.org/r/1215073 (https://phabricator.wikimedia.org/T358361) (owner: 10Muehlenhoff)
[10:45:20] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, December 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1214489 (https://phabricator.wikimedia.org/T411618) (owner: 10Tchanders)
[10:47:04] <wikibugs>	 (03CR) 10FNegri: "I have never seen this file before, where is it parsed? I see that the "team:" annotation is not used for any other service, is it going t" [puppet] - 10https://gerrit.wikimedia.org/r/1215114 (https://phabricator.wikimedia.org/T411470) (owner: 10Filippo Giunchedi)
[10:49:16] <wikibugs>	 (03CR) 10STran: [C:03+1] Enable temporary accounts on enwikinews and ptwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1214489 (https://phabricator.wikimedia.org/T411618) (owner: 10Tchanders)
[10:51:13] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#11432052 (10fgiunchedi) I took a look at why cloudcephosd1052 still has second nic up, currently:  ` 4: ens1f1np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq stat...
[10:51:52] <wikibugs>	 (03PS1) 10Federico Ceratto: clone.py: Accept both hostname and FQDN [cookbooks] - 10https://gerrit.wikimedia.org/r/1215116 (https://phabricator.wikimedia.org/T391581)
[10:54:51] <wikibugs>	 (03CR) 10Marostegui: "Can you test it with some hosts?" [cookbooks] - 10https://gerrit.wikimedia.org/r/1215116 (https://phabricator.wikimedia.org/T391581) (owner: 10Federico Ceratto)
[10:55:11] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[10:56:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[10:59:13] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
[11:00:04] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T1100)
[11:00:26] <logmsgbot>	 !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
[11:01:37] <wikibugs>	 (03PS1) 10Vgutierrez: cache::haproxy: Get rid of http-request after use_backend warning [puppet] - 10https://gerrit.wikimedia.org/r/1215119
[11:01:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[11:02:11] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove labnet-users POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215067 (owner: 10Muehlenhoff)
[11:02:31] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove eventbus-admins POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215068
[11:04:28] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove eventbus-admins POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215068 (owner: 10Muehlenhoff)
[11:04:46] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove gpu-testers POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215069
[11:05:59] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Remove gpu-testers POSIX group [puppet] - 10https://gerrit.wikimedia.org/r/1215069 (owner: 10Muehlenhoff)
[11:06:09] <wikibugs>	 (03CR) 10Arnaudb: [C:03+1] "lgtm, small question inline" [cookbooks] - 10https://gerrit.wikimedia.org/r/1215111 (https://phabricator.wikimedia.org/T411240) (owner: 10Jelto)
[11:07:02] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+2] Stop applying the os-installers group on cumin* and cloudcumin* nodes [puppet] - 10https://gerrit.wikimedia.org/r/1215073 (https://phabricator.wikimedia.org/T358361) (owner: 10Muehlenhoff)
[11:10:00] <wikibugs>	 (03CR) 10Arnaudb: [C:03+1] "lgtm" [alerts] - 10https://gerrit.wikimedia.org/r/1214034 (https://phabricator.wikimedia.org/T411452) (owner: 10AOkoth)
[11:10:50] <wikibugs>	 (03CR) 10Jelto: sre.gitlab.upgrade: mask ldap group sync during upgrades (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1215111 (https://phabricator.wikimedia.org/T411240) (owner: 10Jelto)
[11:14:09] <wikibugs>	 (03CR) 10Filippo Giunchedi: "service::catalog is used in various bits of the infra to configure e.g. the load balancers and alerting. It is going to work as per https:" [puppet] - 10https://gerrit.wikimedia.org/r/1215114 (https://phabricator.wikimedia.org/T411470) (owner: 10Filippo Giunchedi)
[11:14:49] <wikibugs>	 (03CR) 10Isabelle Hurbain-Palatin: [C:04-2] "OR: let's not do that just yet, I think there's a bug in the previous patch." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215115 (https://phabricator.wikimedia.org/T348255) (owner: 10Isabelle Hurbain-Palatin)
[11:17:41] <wikibugs>	 (03PS1) 10Hashar: Add banner for the 2025 developer survey [software/gerrit] (deploy/wmf/stable-3.10) - 10https://gerrit.wikimedia.org/r/1215120
[11:20:25] <wikibugs>	 (03CR) 10Majavah: [C:03+1] "lgtm, but see inline" [puppet] - 10https://gerrit.wikimedia.org/r/1215114 (https://phabricator.wikimedia.org/T411470) (owner: 10Filippo Giunchedi)
[11:20:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[11:21:44] <moritzm>	 !log rebuild software RAIDs on T410743
[11:21:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:49] <stashbot>	 T410743: Degraded RAID on ganeti1039 - https://phabricator.wikimedia.org/T410743
[11:24:58] <wikibugs>	 (03CR) 10Elukey: [C:03+1] Drop use of MW_APPSERVER_NETWORKS for ircstream now that mw* servers are gone [puppet] - 10https://gerrit.wikimedia.org/r/1214094 (https://phabricator.wikimedia.org/T411508) (owner: 10Muehlenhoff)
[11:26:26] <wikibugs>	 (03CR) 10Filippo Giunchedi: hieradata: enable paging for labweb-ssl service and route to wmcs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1215114 (https://phabricator.wikimedia.org/T411470) (owner: 10Filippo Giunchedi)
[11:26:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[11:27:15] <wikibugs>	 (03PS1) 10FNegri: P:toolforge:prometheus: scrape mariadb metrics [puppet] - 10https://gerrit.wikimedia.org/r/1215121 (https://phabricator.wikimedia.org/T410505)
[11:27:17] <jinxer-wm>	 FIRING: [21x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:27:31] <wikibugs>	 06SRE, 10observability: thanos-store OOMing on titan eqiad - https://phabricator.wikimedia.org/T411343#11432122 (10hnowlan) I think the worst of this trend has been reversed by the revert of setting cutoff days to 1: https://grafana.wikimedia.org/goto/rwrkdsWDg?orgId=1  {F70853481}
[11:29:30] <wikibugs>	 (03PS2) 10Vgutierrez: cache::haproxy: Get rid of http-request after use_backend warning [puppet] - 10https://gerrit.wikimedia.org/r/1215119
[11:30:11] <jinxer-wm>	 FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[11:30:47] <wikibugs>	 (03CR) 10Clément Goubert: "LGTM" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1214633 (https://phabricator.wikimedia.org/T410379) (owner: 10Daniel Kinzler)
[11:30:55] <wikibugs>	 (03CR) 10Majavah: [C:03+1] hieradata: enable paging for labweb-ssl service and route to wmcs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1215114 (https://phabricator.wikimedia.org/T411470) (owner: 10Filippo Giunchedi)
[11:31:31] <moritzm>	 !log installing net-snmp security updates
[11:31:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:31:43] <jinxer-wm>	 FIRING: [2x] ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[11:32:02] <jinxer-wm>	 FIRING: [4x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[11:32:17] <jinxer-wm>	 FIRING: [22x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:33:35] <wikibugs>	 (03Abandoned) 10Elukey: services: add maps-next.w.o as FQDN for kartotherian staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215089 (https://phabricator.wikimedia.org/T409528) (owner: 10Elukey)
[11:37:02] <jinxer-wm>	 FIRING: [10x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[11:37:17] <jinxer-wm>	 FIRING: [23x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:37:22] <wikibugs>	 (03CR) 10Brouberol: [C:03+2] Update documentation for rdf_functions.sh path in dumpwikibaserdf.sh [dumps] - 10https://gerrit.wikimedia.org/r/1204598 (https://phabricator.wikimedia.org/T408800) (owner: 10Itamar Givon)
[11:42:02] <jinxer-wm>	 FIRING: [13x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[11:42:17] <jinxer-wm>	 FIRING: [26x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:45:00] <jinxer-wm>	 FIRING: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[11:47:17] <jinxer-wm>	 FIRING: [26x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:50:22] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[11:52:02] <jinxer-wm>	 FIRING: [13x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[11:52:41] <wikibugs>	 (03CR) 10Filippo Giunchedi: hieradata: enable paging for labweb-ssl service and route to wmcs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1215114 (https://phabricator.wikimedia.org/T411470) (owner: 10Filippo Giunchedi)
[11:53:44] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+2] hieradata: enable paging for labweb-ssl service and route to wmcs [puppet] - 10https://gerrit.wikimedia.org/r/1215114 (https://phabricator.wikimedia.org/T411470) (owner: 10Filippo Giunchedi)
[11:56:43] <jinxer-wm>	 FIRING: [2x] ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[11:57:02] <jinxer-wm>	 FIRING: [14x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[11:57:17] <jinxer-wm>	 FIRING: [24x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:57:56] <wikibugs>	 06SRE, 06DC-Ops: Remove second network connection for cloudcephosd hosts with single uplink - https://phabricator.wikimedia.org/T410989#11432247 (10cmooney) 05Resolved→03Open Thanks @VRiley-WMF.  I'm gonna re-open this as we still have to deal with  cloudcephosd1052.
[11:58:56] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Cloudcephosd: migrate to single network uplink - https://phabricator.wikimedia.org/T399180#11432253 (10cmooney) >>! In T399180#11432052, @fgiunchedi wrote: > I think the easiest would be to: >  > * Remove the spurious `enp13s0f1np1` config, run puppet to verify...
[12:00:01] <jinxer-wm>	 RESOLVED: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[12:01:38] <wikibugs>	 (03CR) 10Jelto: [C:03+2] sre.gitlab.upgrade: mask ldap group sync during upgrades [cookbooks] - 10https://gerrit.wikimedia.org/r/1215111 (https://phabricator.wikimedia.org/T411240) (owner: 10Jelto)
[12:01:43] <jinxer-wm>	 RESOLVED: [2x] ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[12:02:02] <jinxer-wm>	 FIRING: [14x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[12:02:12] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[12:02:12] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[12:02:13] <wikibugs>	 06SRE, 10SRE-Access-Requests: Yubikey-SSH-FIDO for Hugh Nowlan (hnowlan) - https://phabricator.wikimedia.org/T411365#11432264 (10hnowlan) This is resolved, thank you!
[12:02:17] <jinxer-wm>	 FIRING: [23x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:02:25] <wikibugs>	 06SRE, 10SRE-Access-Requests: Yubikey-SSH-FIDO for Hugh Nowlan (hnowlan) - https://phabricator.wikimedia.org/T411365#11432265 (10hnowlan) 05Open→03Resolved a:03andrea.denisse
[12:07:26] <wikibugs>	 (03Merged) 10jenkins-bot: sre.gitlab.upgrade: mask ldap group sync during upgrades [cookbooks] - 10https://gerrit.wikimedia.org/r/1215111 (https://phabricator.wikimedia.org/T411240) (owner: 10Jelto)
[12:09:12] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Tue 03 Feb 2026 07:30:03 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[12:12:02] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 55267 bytes in 0.073 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[12:12:02] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 9234 bytes in 0.165 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[12:12:02] <jinxer-wm>	 FIRING: [14x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[12:15:01] <jinxer-wm>	 FIRING: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[12:16:13] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): "The config change looks good to me, but IIUC Product should confirm that we’re ready for deployment before this is deployed." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1214986 (https://phabricator.wikimedia.org/T403015) (owner: 10Arthur taylor)
[12:17:17] <jinxer-wm>	 FIRING: [22x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:20:56] <wikibugs>	 (03PS1) 10Vgutierrez: admin: Add backup FIDO key for vgutierrez [puppet] - 10https://gerrit.wikimedia.org/r/1215134
[12:22:17] <jinxer-wm>	 FIRING: [23x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:22:29] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] releases: delete now pointless classes for deprecated user groups [puppet] - 10https://gerrit.wikimedia.org/r/1214612 (owner: 10Dzahn)
[12:23:31] <wikibugs>	 (03PS2) 10Vgutierrez: admin: Add backup FIDO key for vgutierrez [puppet] - 10https://gerrit.wikimedia.org/r/1215134
[12:30:46] <wikibugs>	 (03PS6) 10Slyngshede: C:mtail update trafficserver_backend_requests_seconds [puppet] - 10https://gerrit.wikimedia.org/r/1214531 (https://phabricator.wikimedia.org/T411584)
[12:30:51] <wikibugs>	 (03CR) 10Slyngshede: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/1214531 (https://phabricator.wikimedia.org/T411584) (owner: 10Slyngshede)
[12:31:56] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good and verified out of band" [puppet] - 10https://gerrit.wikimedia.org/r/1215134 (owner: 10Vgutierrez)
[12:39:20] <wikibugs>	 (03PS1) 10Gehel: query_service: only alert when individual servers are down for > 2h [puppet] - 10https://gerrit.wikimedia.org/r/1215144 (https://phabricator.wikimedia.org/T411772)
[12:40:11] <jinxer-wm>	 FIRING: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[12:40:19] <wikibugs>	 (03CR) 10Klausman: [C:03+1] Move ml-serve1013 to a ML k8s worker [puppet] - 10https://gerrit.wikimedia.org/r/1215088 (https://phabricator.wikimedia.org/T403697) (owner: 10Elukey)
[12:42:17] <jinxer-wm>	 FIRING: [22x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:43:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[12:45:36] <moritzm>	 !log installing postgresql-15 security updates
[12:45:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:45:54] <wikibugs>	 (03PS7) 10Slyngshede: C:mtail update trafficserver_backend_requests_seconds [puppet] - 10https://gerrit.wikimedia.org/r/1214531 (https://phabricator.wikimedia.org/T411584)
[12:48:06] <wikibugs>	 (03CR) 10CI reject: [V:04-1] C:mtail update trafficserver_backend_requests_seconds [puppet] - 10https://gerrit.wikimedia.org/r/1214531 (https://phabricator.wikimedia.org/T411584) (owner: 10Slyngshede)
[12:50:01] <jinxer-wm>	 RESOLVED: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[12:51:22] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[12:52:14] <wikibugs>	 (03PS7) 10Dpogorzelski: ml-build: define new machine name/type [puppet] - 10https://gerrit.wikimedia.org/r/1213972 (https://phabricator.wikimedia.org/T394778)
[12:53:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[12:57:17] <jinxer-wm>	 FIRING: [22x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:57:42] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to ops-limited for JavierMonton - https://phabricator.wikimedia.org/T411774 (10JMonton-WMF) 03NEW
[13:00:04] <jouncebot>	 Deploy window Mobileapps/RESTBase/Wikifeeds (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T1300)
[13:00:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[13:02:12] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:02:12] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:02:27] <jinxer-wm>	 FIRING: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2022:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:05:00] <jinxer-wm>	 FIRING: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[13:07:42] <moritzm>	 !log installing waitress security updates
[13:07:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:28] <wikibugs>	 (03CR) 10Daniel Kinzler: [C:03+2] rest gateway: use the new x-trusted-request header [deployment-charts] - 10https://gerrit.wikimedia.org/r/1214633 (https://phabricator.wikimedia.org/T410379) (owner: 10Daniel Kinzler)
[13:12:27] <jinxer-wm>	 RESOLVED: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2022:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:13:10] <wikibugs>	 (03Merged) 10jenkins-bot: rest gateway: use the new x-trusted-request header [deployment-charts] - 10https://gerrit.wikimedia.org/r/1214633 (https://phabricator.wikimedia.org/T410379) (owner: 10Daniel Kinzler)
[13:13:27] <jinxer-wm>	 FIRING: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2022:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:14:30] <logmsgbot>	 !log daniel@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[13:14:30] <wikibugs>	 (03PS1) 10JavierMonton: topic: ops-limited access [puppet] - 10https://gerrit.wikimedia.org/r/1215152 (https://phabricator.wikimedia.org/T411774)
[13:15:12] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Tue 03 Feb 2026 07:30:03 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:15:19] <logmsgbot>	 !log daniel@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[13:15:54] <logmsgbot>	 !log daniel@deploy2002 helmfile [staging] START helmfile.d/services/rest-gateway: apply
[13:15:55] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to ops-limited for JavierMonton - https://phabricator.wikimedia.org/T411774#11432492 (10JMonton-WMF) In case this is approved, I created the patch I believe is needed, to help with the process.  https://gerrit.wikimedia.org/r/c/operations/pu...
[13:16:34] <logmsgbot>	 !log daniel@deploy2002 helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
[13:17:02] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 55267 bytes in 0.082 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:17:02] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 9234 bytes in 0.195 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:18:27] <jinxer-wm>	 RESOLVED: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2022:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:18:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[13:19:01] <logmsgbot>	 !log daniel@deploy2002 helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
[13:19:25] <logmsgbot>	 !log daniel@deploy2002 helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
[13:20:11] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release mw-script/utk6lsuw on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[13:21:45] <wikibugs>	 10ops-eqiad, 06SRE, 06DBA, 06DC-Ops: db1229 crashed - Broken memory module at B7 - https://phabricator.wikimedia.org/T411652#11432505 (10Jclark-ctr) Memory should be delivered today.  Is this server still in service or can it be replaced any time?
[13:21:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[13:22:13] <logmsgbot>	 !log daniel@deploy2002 helmfile [codfw] START helmfile.d/services/rest-gateway: apply
[13:22:36] <logmsgbot>	 !log daniel@deploy2002 helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
[13:24:00] <wikibugs>	 (CR) Vgutierrez: [C:+2] admin: Add backup FIDO key for vgutierrez [puppet] - https://gerrit.wikimedia.org/r/1215134 (owner: Vgutierrez)
[13:25:39] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T410589)', diff saved to https://phabricator.wikimedia.org/P86402 and previous config saved to /var/cache/conftool/dbconfig/20251204-132539-ladsgroup.json
[13:25:43] <stashbot>	 T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589
[13:27:02] <jinxer-wm>	 FIRING: [11x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1015:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[13:28:54] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to ops-limited for JavierMonton - https://phabricator.wikimedia.org/T411774#11432534 (MoritzMuehlenhoff) ops-limited is very broad access, it grants access to any of our 2400 server, including some very sensitive ones. But if this access i...
[13:29:08] <wikibugs>	 (CR) Majavah: [C:+1] P:toolforge:prometheus: scrape mariadb metrics (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1215121 (https://phabricator.wikimedia.org/T410505) (owner: FNegri)
[13:31:49] <wikibugs>	 ops-eqiad, SRE, DBA, DC-Ops: db1229 crashed - Broken memory module at B7 - https://phabricator.wikimedia.org/T411652#11432550 (MoritzMuehlenhoff) It's depooled and monitoring disabled, you can replace any time
[13:32:02] <jinxer-wm>	 FIRING: [13x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[13:32:27] <jinxer-wm>	 FIRING: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2022:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:33:47] <wikibugs>	 (PS1) Clément Goubert: mediawiki: Keep cronjobs for a week after completion [deployment-charts] - https://gerrit.wikimedia.org/r/1215155
[13:35:11] <jinxer-wm>	 FIRING: [8x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage  - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[13:35:36] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to ops-limited for JavierMonton - https://phabricator.wikimedia.org/T411774#11432568 (BTullis) >>! In T411774#11432534, @MoritzMuehlenhoff wrote: > ops-limited is very broad access, it grants access to any of our 2400 server, including som...
[13:37:02] <jinxer-wm>	 FIRING: [14x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[13:37:27] <jinxer-wm>	 RESOLVED: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2022:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:39:04] <wikibugs>	 (PS1) Btullis: Add a growthbook system user and grant it access to private data [puppet] - https://gerrit.wikimedia.org/r/1215156 (https://phabricator.wikimedia.org/T406593)
[13:40:47] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P86403 and previous config saved to /var/cache/conftool/dbconfig/20251204-134046-ladsgroup.json
[13:41:17] <wikibugs>	 (PS1) Muehlenhoff: Create a new access group for access to Jumbo Kafka brokers [puppet] - https://gerrit.wikimedia.org/r/1215157 (https://phabricator.wikimedia.org/T411774)
[13:41:27] <jinxer-wm>	 FIRING: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2022:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:42:02] <jinxer-wm>	 FIRING: [14x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[13:44:39] <wikibugs>	 (PS2) Muehlenhoff: Create a new access group for access to Jumbo Kafka brokers [puppet] - https://gerrit.wikimedia.org/r/1215157 (https://phabricator.wikimedia.org/T411774)
[13:45:35] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1215157 (https://phabricator.wikimedia.org/T411774) (owner: Muehlenhoff)
[13:48:38] <wikibugs>	 (CR) Bearloga: [C:+1] Add a growthbook system user and grant it access to private data [puppet] - https://gerrit.wikimedia.org/r/1215156 (https://phabricator.wikimedia.org/T406593) (owner: Btullis)
[13:48:45] <wikibugs>	 (PS2) FNegri: P:toolforge:prometheus: scrape mariadb metrics [puppet] - https://gerrit.wikimedia.org/r/1215121 (https://phabricator.wikimedia.org/T410505)
[13:49:22] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[13:51:27] <jinxer-wm>	 RESOLVED: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2022:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[13:52:02] <jinxer-wm>	 FIRING: [13x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs1011:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[13:52:32] <wikibugs>	 (CR) FNegri: P:toolforge:prometheus: scrape mariadb metrics (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1215121 (https://phabricator.wikimedia.org/T410505) (owner: FNegri)
[13:53:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[13:55:28] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: lvs1018: remove cross-rack link to asw2-c2-eqiad xe-2/0/13 - https://phabricator.wikimedia.org/T411781 (cmooney) NEW p:Triage→Medium
[13:55:55] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P86404 and previous config saved to /var/cache/conftool/dbconfig/20251204-135554-ladsgroup.json
[13:56:05] <wikibugs>	 (PS1) Elukey: service.py: add the team field in the Service's definition [software/spicerack] - https://gerrit.wikimedia.org/r/1215162 (https://phabricator.wikimedia.org/T399807)
[13:57:47] <wikibugs>	 (CR) Elukey: [C:+2] Move ml-serve1013 to a ML k8s worker [puppet] - https://gerrit.wikimedia.org/r/1215088 (https://phabricator.wikimedia.org/T403697) (owner: Elukey)
[14:00:05] <jouncebot>	 Lucas_WMDE, Urbanecm, and TheresNoTime: #bothumor I � Unicode. All rise for UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T1400).
[14:00:05] <jouncebot>	 Kizule and Tchanders: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:14] <Tchanders>	 o/
[14:00:16] <Lucas_WMDE>	 o/
[14:00:30] <Lucas_WMDE>	 I need to run afk, can someone else deploy? otherwise I should be able to later in the window
[14:01:02] <wikibugs>	 (PS1) Jforrester: RevisionStore: Catch ParameterAssertionException too [core] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215164 (https://phabricator.wikimedia.org/T351953)
[14:01:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[14:02:03] <Tchanders>	 I'll get started on mine...
[14:02:12] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:02:12] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:02:18] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by tchanders@deploy2002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1214489 (https://phabricator.wikimedia.org/T411618) (owner: Tchanders)
[14:03:13] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: lvs1018: remove cross-rack link to asw2-c2-eqiad xe-2/0/13 - https://phabricator.wikimedia.org/T411781#11432684 (Vgutierrez) the assessment is OK and the link can be removed safely
[14:03:22] <wikibugs>	 (Merged) jenkins-bot: Enable temporary accounts on enwikinews and ptwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/1214489 (https://phabricator.wikimedia.org/T411618) (owner: Tchanders)
[14:03:28] <jinxer-wm>	 FIRING: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2022:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:03:42] <logmsgbot>	 !log tchanders@deploy2002 Started scap sync-world: Backport for [[gerrit:1214489|Enable temporary accounts on enwikinews and ptwikibooks (T411618)]]
[14:03:45] <stashbot>	 T411618: Deploy Temporary accounts to the two remaining former LQT wikis - https://phabricator.wikimedia.org/T411618
[14:04:06] <wikibugs>	 (CR) CI reject: [V:-1] service.py: add the team field in the Service's definition [software/spicerack] - https://gerrit.wikimedia.org/r/1215162 (https://phabricator.wikimedia.org/T399807) (owner: Elukey)
[14:05:28] <wikibugs>	 (CR) Bking: [C:+1] query_service: only alert when individual servers are down for > 2h [puppet] - https://gerrit.wikimedia.org/r/1215144 (https://phabricator.wikimedia.org/T411772) (owner: Gehel)
[14:05:53] <wikibugs>	 (PS1) D3r1ck01: Revert "User: Log where the data was loaded when CAS update failed" [core] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215165 (https://phabricator.wikimedia.org/T410652)
[14:06:08] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: lvs1020: move primary uplink from asw2-d7-eqiad to lsw1-d7-eqiad and remove link to asw2-c2-eqiad - https://phabricator.wikimedia.org/T405609#11432701 (Jclark-ctr)
[14:06:10] <wikibugs>	 (PS1) D3r1ck01: Revert "User: Log where the data was loaded when CAS update failed" [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1215166 (https://phabricator.wikimedia.org/T410652)
[14:06:13] <logmsgbot>	 !log tchanders@deploy2002 tchanders: Backport for [[gerrit:1214489|Enable temporary accounts on enwikinews and ptwikibooks (T411618)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:06:43] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, December 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [core] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215165 (https://phabricator.wikimedia.org/T410652) (owner: D3r1ck01)
[14:06:56] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, December 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#depl" [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1215166 (https://phabricator.wikimedia.org/T410652) (owner: D3r1ck01)
[14:07:02] <Tchanders>	 testing...
[14:07:10] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 55267 bytes in 8.261 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:07:10] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 9234 bytes in 8.369 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:07:14] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Tue 03 Feb 2026 07:30:03 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[14:07:29] <xSavitar>	 o/
[14:07:53] <Tchanders>	 Looks good, continuing..
[14:08:03] <xSavitar>	 Tchanders, please poke me when you're done. I have some backports to deploy.
[14:08:06] <logmsgbot>	 !log tchanders@deploy2002 tchanders: Continuing with sync
[14:08:19] <Tchanders>	 OK, will do!
[14:08:28] <jinxer-wm>	 FIRING: [2x] SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2021:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:09:19] <wikibugs>	 (PS1) D3r1ck01: Fetch user object from primary DB (for writes) not replica DB [extensions/EmailAuth] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215167 (https://phabricator.wikimedia.org/T410652)
[14:09:34] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Traffic: lvs1018: decom links to asw2-c2-eqiad and asw2-d7-eqiad - https://phabricator.wikimedia.org/T410661#11432726 (Jclark-ctr) Open→Resolved a:Jclark-ctr duplicate to T411781
[14:09:36] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Traffic: lvs1018: decom links to asw2-c2-eqiad and asw2-d7-eqiad - https://phabricator.wikimedia.org/T410661#11432732 (cmooney) Resolved→Declined Duplicate task made in error, will use T411781
[14:10:22] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: lvs1018: remove cross-rack link to asw2-c2-eqiad xe-2/0/13 - https://phabricator.wikimedia.org/T411781#11432739 (Jclark-ctr) https://netbox.wikimedia.org/dcim/interfaces/29150/trace/ https://netbox.wikimedia.org/dcim/interfaces/29151...
[14:11:02] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1157 (T410589)', diff saved to https://phabricator.wikimedia.org/P86405 and previous config saved to /var/cache/conftool/dbconfig/20251204-141101-ladsgroup.json
[14:11:05] <stashbot>	 T410589: Optimize all core tables, late 2025 - https://phabricator.wikimedia.org/T410589
[14:11:13] <Amir1>	 I can take over once you're done
[14:11:17] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting a new group allowing shell access to kafka-jumbo servers - with membership for JavierMonton - https://phabricator.wikimedia.org/T411774#11432740 (BTullis)
[14:11:17] <logmsgbot>	 !log ladsgroup@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[14:11:25] <logmsgbot>	 !log ladsgroup@cumin1003 dbctl commit (dc=all): 'Depooling db1166 (T410589)', diff saved to https://phabricator.wikimedia.org/P86406 and previous config saved to /var/cache/conftool/dbconfig/20251204-141124-ladsgroup.json
[14:11:50] <jinxer-wm>	 FIRING: KubernetesCalicoDown: ml-serve1013.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-mlserve&var-instance=ml-serve1013.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[14:13:04] <xSavitar>	 Amir1, ack! Will ping you.
[14:13:28] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2021:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:14:08] * Lucas_WMDE back
[14:14:18] <logmsgbot>	 !log tchanders@deploy2002 Finished scap sync-world: Backport for [[gerrit:1214489|Enable temporary accounts on enwikinews and ptwikibooks (T411618)]] (duration: 10m 36s)
[14:14:21] <stashbot>	 T411618: Deploy Temporary accounts to the two remaining former LQT wikis - https://phabricator.wikimedia.org/T411618
[14:14:36] <Tchanders>	 I'm finished - over to you @Amir1
[14:15:13] <xSavitar>	 Oh, seems Amir1 wants to go first?
[14:15:26] <xSavitar>	 Lucas_WMDE o/
[14:16:12] <Lucas_WMDE>	 IIUC Amir1 was volunteering to do the backport for Kizule?
[14:16:37] <Lucas_WMDE>	 who isn’t around yet, so I’d say xSavitar go ahead with your deployments if nobody objects
[14:17:07] * Lucas_WMDE watches logspam-watch be unusually slow to load o_O
[14:17:16] <logmsgbot>	 !log gehel@cumin2002 conftool action : set/pooled=yes; selector: service=druid-public-coordinator
[14:17:30] <logmsgbot>	 !log gehel@cumin2002 conftool action : set/weight=10; selector: service=druid-public-coordinator
[14:17:31] <xSavitar>	 Okay, I'll go ahead, thanks!
[14:17:47] <Lucas_WMDE>	 oh dear, 90k errors in logstash
[14:17:48] <Lucas_WMDE>	 twice
[14:17:50] <Lucas_WMDE>	 in the past 15 minutes
[14:18:09] <Lucas_WMDE>	 and logspam-watch watches 1h by default, so it would see… 720k messages
[14:18:13] <Lucas_WMDE>	 yeah makes sense that that’s slow
[14:18:14] * Lucas_WMDE searches phab
[14:18:56] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by derick@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215165 (https://phabricator.wikimedia.org/T410652) (owner: D3r1ck01)
[14:18:57] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by derick@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1215166 (https://phabricator.wikimedia.org/T410652) (owner: D3r1ck01)
[14:18:57] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by derick@deploy2002 using scap backport" [extensions/EmailAuth] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215167 (https://phabricator.wikimedia.org/T410652) (owner: D3r1ck01)
[14:19:50] <taavi>	 seems like T411585?
[14:19:50] <stashbot>	 T411585: PHP Warning: Stats: (action_api_modules_hit_total): Stats: (action_api_modules_hit_total) Cannot associate label keys with label values - Not all initialized labels have an assigned value. - https://phabricator.wikimedia.org/T411585
[14:20:01] <jinxer-wm>	 RESOLVED: [2x] ConfdResourceFailed: confd resource _srv_config-master_pybal_eqiad_druid-public-coordinator.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[14:20:31] <Lucas_WMDE>	 taavi: yup, left a comment there
[14:20:47] <Lucas_WMDE>	 oh, now logspam-watch loaded
[14:21:00] <Lucas_WMDE>	 not that it’ll be very useful if it only refreshes once every five minutes or so
[14:21:27] <jinxer-wm>	 FIRING: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2021:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:23:20] <wikibugs>	 (Merged) jenkins-bot: Revert "User: Log where the data was loaded when CAS update failed" [core] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215165 (https://phabricator.wikimedia.org/T410652) (owner: D3r1ck01)
[14:23:25] <wikibugs>	 (Merged) jenkins-bot: Revert "User: Log where the data was loaded when CAS update failed" [core] (wmf/1.46.0-wmf.4) - https://gerrit.wikimedia.org/r/1215166 (https://phabricator.wikimedia.org/T410652) (owner: D3r1ck01)
[14:23:28] <wikibugs>	 (Merged) jenkins-bot: Fetch user object from primary DB (for writes) not replica DB [extensions/EmailAuth] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215167 (https://phabricator.wikimedia.org/T410652) (owner: D3r1ck01)
[14:23:50] <logmsgbot>	 !log derick@deploy2002 Started scap sync-world: Backport for [[gerrit:1215165|Revert "User: Log where the data was loaded when CAS update failed" (T410652)]], [[gerrit:1215166|Revert "User: Log where the data was loaded when CAS update failed" (T410652)]], [[gerrit:1215167|Fetch user object from primary DB (for writes) not replica DB (T410652)]]
[14:23:54] <stashbot>	 T410652: RuntimeException: CAS update failed on user_touched. The version of the user to be saved is older than the current version. - https://phabricator.wikimedia.org/T410652
[14:26:07] <logmsgbot>	 !log derick@deploy2002 d3r1ck01, derick: Backport for [[gerrit:1215165|Revert "User: Log where the data was loaded when CAS update failed" (T410652)]], [[gerrit:1215166|Revert "User: Log where the data was loaded when CAS update failed" (T410652)]], [[gerrit:1215167|Fetch user object from primary DB (for writes) not replica DB (T410652)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:27:16] <xSavitar>	 Nothing to test for now. Verifying it works means errors no longer show up in Logstash.
[14:27:27] <logmsgbot>	 !log derick@deploy2002 d3r1ck01, derick: Continuing with sync
[14:27:58] <Lucas_WMDE>	 good luck looking for errors in logstash at the moment :S
[14:28:21] <xSavitar>	 :(
[14:29:12] <xSavitar>	 https://logstash.wikimedia.org/goto/71373cec585746d63094e22d911053b4
[14:29:20] <xSavitar>	 That's what I'm eyeing
[14:32:11] <xSavitar>	 Facing an issue
[14:32:16] * Lucas_WMDE nods
[14:32:18] <Lucas_WMDE>	 oh?
[14:32:20] <xSavitar>	 14:29:10 Waiting 20 seconds for canary traffic...
[14:32:20] <xSavitar>	 14:29:31 Logstash checker Counted 162 error(s) in the last 20 seconds. The threshold is 10.
[14:32:20] <xSavitar>	 14:29:31 Top 3 errors:
[14:32:20] <xSavitar>	 [81 hits] PHP Warning: Stats: (action_api_modules_latency): Stats: (action_api_modules_latency) Cannot associate label keys with label values - Not all initialized labels have an assigned value.
[14:32:20] <xSavitar>	 [80 hits] PHP Warning: Stats: (action_api_modules_hit_total): Stats: (action_api_modules_hit_total) Cannot associate label keys with label values - Not all initialized labels have an assigned value.
[14:32:20] <xSavitar>	 [1 hits] Wikimedia\RequestTimeout\RequestTimeoutException: The maximum execution time of {limit} seconds was exceeded
[14:32:34] <Lucas_WMDE>	 hm
[14:32:40] <Lucas_WMDE>	 I would suggest retrying once
[14:32:41] <wikibugs>	 (CR) Gehel: [C:+1] Add Guillaume as appprover for analytics-search-admins [puppet] - https://gerrit.wikimedia.org/r/1212061 (https://phabricator.wikimedia.org/T276465) (owner: Muehlenhoff)
[14:32:51] <xSavitar>	 Okay
[14:32:54] <Lucas_WMDE>	 the two errors with 80/81 hits are T411585, okay to ignore
[14:32:55] <stashbot>	 T411585: PHP Warning: Stats: (action_api_modules_hit_total): Stats: (action_api_modules_hit_total) Cannot associate label keys with label values - Not all initialized labels have an assigned value. - https://phabricator.wikimedia.org/T411585
[14:33:00] <Lucas_WMDE>	 the 1 hit timeout is a bit more concerning
[14:33:23] <wikibugs>	 (CR) Gehel: [C:+1] Add Guillaume as approver for two more analytics groups [puppet] - https://gerrit.wikimedia.org/r/1212168 (https://phabricator.wikimedia.org/T276465) (owner: Muehlenhoff)
[14:33:24] <Lucas_WMDE>	 I think now it’s okay to proceed
[14:33:33] <Lucas_WMDE>	 but I’ll leave a comment on the task that it’s soft-blocking deployments
[14:33:41] <xSavitar>	 Okay, thanks!
[14:33:46] <wikibugs>	 (CR) Gehel: [C:+2] Hive: alert when query rate is too high (1 comment) [alerts] - https://gerrit.wikimedia.org/r/1207790 (https://phabricator.wikimedia.org/T410528) (owner: Gehel)
[14:34:04] <wikibugs>	 (PS2) Gehel: query_service: only alert when individual servers are down for > 2h [puppet] - https://gerrit.wikimedia.org/r/1215144 (https://phabricator.wikimedia.org/T411772)
[14:35:56] <wikibugs>	 (CR) Gehel: [C:+2] query_service: only alert when individual servers are down for > 2h [puppet] - https://gerrit.wikimedia.org/r/1215144 (https://phabricator.wikimedia.org/T411772) (owner: Gehel)
[14:36:27] <jinxer-wm>	 FIRING: [2x] SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2021:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:37:14] <logmsgbot>	 !log derick@deploy2002 Finished scap sync-world: Backport for [[gerrit:1215165|Revert "User: Log where the data was loaded when CAS update failed" (T410652)]], [[gerrit:1215166|Revert "User: Log where the data was loaded when CAS update failed" (T410652)]], [[gerrit:1215167|Fetch user object from primary DB (for writes) not replica DB (T410652)]] (duration: 13m 24s)
[14:37:18] <stashbot>	 T410652: RuntimeException: CAS update failed on user_touched. The version of the user to be saved is older than the current version. - https://phabricator.wikimedia.org/T410652
[14:38:05] <xSavitar>	 Amir1, over to you if you want to deploy. Not sure if Kizule is around though.
[14:38:12] <xSavitar>	 Lucas_WMDE, thanks for the assistance.
[14:38:15] <Lucas_WMDE>	 np
[14:38:19] <Amir1>	 thanks
[14:38:23] <Lucas_WMDE>	 yeah they don’t seem to be around AFAICT :/
[14:39:30] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by ladsgroup@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215164 (https://phabricator.wikimedia.org/T351953) (owner: Jforrester)
[14:40:24] <Lucas_WMDE>	 it’s a SpiderPig! \o/
[14:40:40] <Amir1>	 :D
[14:41:27] <jinxer-wm>	 FIRING: [2x] SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2021:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:44:09] <wikibugs>	 (CR) Clément Goubert: [C:+2] mediawiki: Keep cronjobs for a week after completion [deployment-charts] - https://gerrit.wikimedia.org/r/1215155 (owner: Clément Goubert)
[14:45:35] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: lvs1020: move primary uplink from asw2-d7-eqiad to lsw1-d7-eqiad and remove link to asw2-c2-eqiad - https://phabricator.wikimedia.org/T405609#11432921 (Jclark-ctr) Open→Resolved a:cmooney→Jclark-ctr
[14:47:05] <wikibugs>	 (Merged) jenkins-bot: mediawiki: Keep cronjobs for a week after completion [deployment-charts] - https://gerrit.wikimedia.org/r/1215155 (owner: Clément Goubert)
[14:49:36] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [codfw] START helmfile.d/services/mw-cron: apply
[14:50:46] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
[14:52:16] <wikibugs>	 (Merged) jenkins-bot: RevisionStore: Catch ParameterAssertionException too [core] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215164 (https://phabricator.wikimedia.org/T351953) (owner: Jforrester)
[14:52:35] <logmsgbot>	 !log ladsgroup@deploy2002 Started scap sync-world: Backport for [[gerrit:1215164|RevisionStore: Catch ParameterAssertionException too (T351953)]]
[14:52:39] <stashbot>	 T351953: Various old revisions are encoded as Windows-1252 rather than UTF-8, causing "RuntimeException: PCRE failure" when viewing them - https://phabricator.wikimedia.org/T351953
[14:53:21] <wikibugs>	 (CR) Clément Goubert: [C:+2] wikikube-staging: Bump calico memory requests [deployment-charts] - https://gerrit.wikimedia.org/r/1214125 (owner: Clément Goubert)
[14:53:41] <wikibugs>	 (CR) Michael Große: [C:+1] "The list of wikis where this is being enabled matches the list in T410469" [mediawiki-config] - https://gerrit.wikimedia.org/r/1214570 (https://phabricator.wikimedia.org/T410469) (owner: Urbanecm)
[14:53:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[14:54:39] <logmsgbot>	 !log ladsgroup@deploy2002 jforrester, ladsgroup: Backport for [[gerrit:1215164|RevisionStore: Catch ParameterAssertionException too (T351953)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[14:55:04] <logmsgbot>	 !log ladsgroup@deploy2002 jforrester, ladsgroup: Continuing with sync
[14:55:12] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[14:56:27] <jinxer-wm>	 RESOLVED: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2021:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[14:58:40] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
[14:59:24] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[14:59:35] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
[14:59:54] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
[15:01:00] <wikibugs>	 (Merged) jenkins-bot: wikikube-staging: Bump calico memory requests [deployment-charts] - https://gerrit.wikimedia.org/r/1214125 (owner: Clément Goubert)
[15:01:11] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [eqiad] START helmfile.d/admin 'apply'.
[15:01:43] <wikibugs>	 (CR) Michael Große: [C:+1] [Growth] Sort the list of Add Link wikis alphabetically [mediawiki-config] - https://gerrit.wikimedia.org/r/1214571 (https://phabricator.wikimedia.org/T410469) (owner: Urbanecm)
[15:01:57] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Add Guillaume as appprover for analytics-search-admins [puppet] - https://gerrit.wikimedia.org/r/1212061 (https://phabricator.wikimedia.org/T276465) (owner: Muehlenhoff)
[15:02:01] <logmsgbot>	 !log ladsgroup@deploy2002 Finished scap sync-world: Backport for [[gerrit:1215164|RevisionStore: Catch ParameterAssertionException too (T351953)]] (duration: 09m 26s)
[15:02:05] <stashbot>	 T351953: Various old revisions are encoded as Windows-1252 rather than UTF-8, causing "RuntimeException: PCRE failure" when viewing them - https://phabricator.wikimedia.org/T351953
[15:02:12] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[15:02:16] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: lvs1018: remove cross-rack links to rows A, C and D - https://phabricator.wikimedia.org/T411781#11433019 (cmooney)
[15:02:32] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [codfw] START helmfile.d/admin 'apply'.
[15:02:42] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: lvs1018: remove cross-rack links to rows A, C and D - https://phabricator.wikimedia.org/T411781#11433025 (cmooney)
[15:02:43] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Traffic: Eqiad row C/D switch refresh: LVS changes to support migration - https://phabricator.wikimedia.org/T405602#11433026 (cmooney)
[15:03:06] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[15:03:12] <wikibugs>	 SRE, Infrastructure Security, Infrastructure-Foundations, Patch-For-Review: puppet admin module: Assign approvers to unix groups - https://phabricator.wikimedia.org/T276465#11433027 (MoritzMuehlenhoff)
[15:03:32] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[15:03:53] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C:+1] "Not deployed today because nobody showed up for the window, but the change looks good to me and should be okay to deploy some other time." [mediawiki-config] - https://gerrit.wikimedia.org/r/1215060 (https://phabricator.wikimedia.org/T411750) (owner: Zoranzoki21)
[15:03:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[15:03:59] <Lucas_WMDE>	 !log UTC afternoon backport+config window done
[15:04:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:56] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[15:06:23] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[15:06:45] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
[15:06:50] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
[15:08:08] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: lvs1018: remove cross-rack links to rows A, C and D - https://phabricator.wikimedia.org/T411781#11433060 (cmooney)
[15:08:14] <logmsgbot>	 !log cgoubert@deploy2002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[15:09:39] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host conf1007.eqiad.wmnet
[15:10:00] <jinxer-wm>	 RESOLVED: [8x] CalicoHighMemoryUsage: Calico container calico-node-2rrk2:calico-node is consistently using three times its memory request - https://wikitech.wikimedia.org/wiki/Calico#Resource_Usage  - https://alerts.wikimedia.org/?q=alertname%3DCalicoHighMemoryUsage
[15:10:01] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:10:08] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch conf1007 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1214557 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[15:11:41] <wikibugs>	 (PS1) Slyngshede: P:idm configuration for Phabricator linking [puppet] - https://gerrit.wikimedia.org/r/1215186 (https://phabricator.wikimedia.org/T411775)
[15:14:53] <wikibugs>	 (CR) Slyngshede: [C:+1] Create a new access group for access to Jumbo Kafka brokers [puppet] - https://gerrit.wikimedia.org/r/1215157 (https://phabricator.wikimedia.org/T411774) (owner: Muehlenhoff)
[15:15:26] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host conf1007.eqiad.wmnet
[15:16:28] <wikibugs>	 (CR) Slyngshede: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7790/co" [puppet] - https://gerrit.wikimedia.org/r/1215186 (https://phabricator.wikimedia.org/T411775) (owner: Slyngshede)
[15:16:56] <wikibugs>	 ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11433083 (cmooney) >>! In T408892#11330727, @cmooney wrote: > Additionally for the rebuild we should aim to: >  > # Convert the existing ganeti hosts to rout...
[15:20:29] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host conf1008.eqiad.wmnet
[15:20:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[15:21:51] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch conf1008 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1214558 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[15:21:52] <wikibugs>	 (PS26) Arnaudb: gerrit: rsync logic extraction from failover [cookbooks] - https://gerrit.wikimedia.org/r/1214466 (https://phabricator.wikimedia.org/T387833)
[15:21:52] <wikibugs>	 (CR) Arnaudb: "The output of:" [cookbooks] - https://gerrit.wikimedia.org/r/1214466 (https://phabricator.wikimedia.org/T387833) (owner: Arnaudb)
[15:22:10] <wikibugs>	 (PS2) Slyngshede: P:idm configuration for Phabricator linking [puppet] - https://gerrit.wikimedia.org/r/1215186 (https://phabricator.wikimedia.org/T411775)
[15:22:56] <wikibugs>	 (CR) Slyngshede: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7791/co" [puppet] - https://gerrit.wikimedia.org/r/1215186 (https://phabricator.wikimedia.org/T411775) (owner: Slyngshede)
[15:24:27] <jinxer-wm>	 FIRING: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2022:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[15:24:52] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting a new group allowing shell access to kafka-jumbo servers - with membership for JavierMonton - https://phabricator.wikimedia.org/T411774#11433108 (elukey) @JMonton-WMF Hi! I have used the kafka tools like topic mapper in the past and if not handle...
[15:26:52] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host conf1008.eqiad.wmnet
[15:27:44] <wikibugs>	 (PS3) Slyngshede: P:idm configuration for Phabricator linking [puppet] - https://gerrit.wikimedia.org/r/1215186 (https://phabricator.wikimedia.org/T411775)
[15:28:34] <wikibugs>	 (CR) Slyngshede: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7792/co" [puppet] - https://gerrit.wikimedia.org/r/1215186 (https://phabricator.wikimedia.org/T411775) (owner: Slyngshede)
[15:28:46] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host conf1009.eqiad.wmnet
[15:29:11] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch conf1009 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1214561 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[15:30:04] <jouncebot>	 Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T1530)
[15:30:11] <jinxer-wm>	 FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[15:30:55] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host dse-k8s-worker2003.codfw.wmnet
[15:32:34] <wikibugs>	 (CR) Scott French: "Thank you for catching this, Valentin!" [puppet] - https://gerrit.wikimedia.org/r/1215119 (owner: Vgutierrez)
[15:33:12] <wikibugs>	 (PS4) Slyngshede: P:idm configuration for Phabricator linking [puppet] - https://gerrit.wikimedia.org/r/1215186 (https://phabricator.wikimedia.org/T411775)
[15:33:29] <wikibugs>	 (PS2) Volans: service.py: add the team field in the Service's definition [software/spicerack] - https://gerrit.wikimedia.org/r/1215162 (https://phabricator.wikimedia.org/T399807) (owner: Elukey)
[15:33:50] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host conf1009.eqiad.wmnet
[15:33:54] <wikibugs>	 (CR) Slyngshede: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7794/co" [puppet] - https://gerrit.wikimedia.org/r/1215186 (https://phabricator.wikimedia.org/T411775) (owner: Slyngshede)
[15:34:27] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2021:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[15:35:01] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:35:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[15:35:58] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host dse-k8s-worker2003.codfw.wmnet
[15:36:11] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host conf2004.codfw.wmnet
[15:36:44] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch conf2004 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1214553 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[15:37:28] <jinxer-wm>	 FIRING: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2021:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[15:38:11] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
[15:38:16] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
[15:39:16] <wikibugs>	 (CR) Thcipriani: [C:-1] "2024 -> 2025" [software/gerrit] (deploy/wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1215120 (owner: Hashar)
[15:41:01] <wikibugs>	 (PS2) Hashar: Add banner for the 2025 developer survey [software/gerrit] (deploy/wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1215120
[15:41:11] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host conf2004.codfw.wmnet
[15:41:28] <wikibugs>	 (PS3) Volans: service.py: add the team field in the Service's definition [software/spicerack] - https://gerrit.wikimedia.org/r/1215162 (https://phabricator.wikimedia.org/T399807) (owner: Elukey)
[15:41:30] <wikibugs>	 (CR) Hashar: [C:+2] Add banner for the 2025 developer survey (1 comment) [software/gerrit] (deploy/wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1215120 (owner: Hashar)
[15:41:50] <jinxer-wm>	 FIRING: [2x] KubernetesCalicoDown: dse-k8s-worker2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:42:06] <wikibugs>	 (PS4) Volans: service.py: add the team field in the Service's definition [software/spicerack] - https://gerrit.wikimedia.org/r/1215162 (https://phabricator.wikimedia.org/T399807) (owner: Elukey)
[15:42:13] <wikibugs>	 (Merged) jenkins-bot: Add banner for the 2025 developer survey [software/gerrit] (deploy/wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1215120 (owner: Hashar)
[15:42:20] <wikibugs>	 SRE, SRE-Access-Requests, Patch-For-Review: Requesting a new group allowing shell access to kafka-jumbo servers - with membership for JavierMonton - https://phabricator.wikimedia.org/T411774#11433192 (JMonton-WMF) Hi @elukey!  We don't need this often to be honest, maybe it's more about being able to...
[15:42:27] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2021:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[15:43:01] <wikibugs>	 (CR) Volans: "@elukey, I took the liberty to mangle a bit the patch, it should pass CI, as for the default value I think empty string is fine to represe" [software/spicerack] - https://gerrit.wikimedia.org/r/1215162 (https://phabricator.wikimedia.org/T399807) (owner: Elukey)
[15:43:13] <logmsgbot>	 !log hashar@deploy2002 Started deploy [gerrit/gerrit@774e2ff]: Ease configuration of the motd banner && Add banner for the 2025 developer survey
[15:43:28] <logmsgbot>	 !log hashar@deploy2002 Finished deploy [gerrit/gerrit@774e2ff]: Ease configuration of the motd banner && Add banner for the 2025 developer survey (duration: 00m 15s)
[15:44:35] <wikibugs>	 (CR) Elukey: "Really nice thanks, I was checking the CI failures at the moment :)" [software/spicerack] - https://gerrit.wikimedia.org/r/1215162 (https://phabricator.wikimedia.org/T399807) (owner: Elukey)
[15:44:55] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.puppet.migrate-host for host conf2005.codfw.wmnet
[15:45:17] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host dse-k8s-worker2003.codfw.wmnet
[15:45:20] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host dse-k8s-worker2003.codfw.wmnet
[15:45:27] <jinxer-wm>	 FIRING: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2021:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[15:45:34] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Switch conf2005 to Puppet 7 [puppet] - https://gerrit.wikimedia.org/r/1214550 (https://phabricator.wikimedia.org/T349619) (owner: Muehlenhoff)
[15:46:50] <jinxer-wm>	 FIRING: [2x] KubernetesCalicoDown: dse-k8s-worker2003.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:47:17] <jinxer-wm>	 FIRING: [22x] ProbeDown: Service wdqs1014:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:48:55] <wikibugs>	 (PS1) Hashar: Remove duplicate [DISMISS] button [software/gerrit] (deploy/wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1215197
[15:49:13] <wikibugs>	 (CR) CI reject: [V:-1] Remove duplicate [DISMISS] button [software/gerrit] (deploy/wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1215197 (owner: Hashar)
[15:50:22] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host conf2005.codfw.wmnet
[15:50:27] <jinxer-wm>	 FIRING: [2x] SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2021:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[15:50:33] <logmsgbot>	 !log dpogorzelski@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ml-lab1001.eqiad.wmnet with reason: decommission
[15:50:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[15:51:29] <logmsgbot>	 !log dpogorzelski@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-lab1001.eqiad.wmnet with reason: decommission
[15:52:17] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service wdqs2022:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2022:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[15:52:46] <wikibugs>	 (PS2) Hashar: Remove duplicate [DISMISS] button [software/gerrit] (deploy/wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1215197
[15:53:06] <wikibugs>	 (CR) Hashar: [C:+2] Remove duplicate [DISMISS] button [software/gerrit] (deploy/wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1215197 (owner: Hashar)
[15:53:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[15:53:55] <wikibugs>	 (Merged) jenkins-bot: Remove duplicate [DISMISS] button [software/gerrit] (deploy/wmf/stable-3.10) - https://gerrit.wikimedia.org/r/1215197 (owner: Hashar)
[15:54:16] <wikibugs>	 (CR) Volans: [C:+1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/1215162 (https://phabricator.wikimedia.org/T399807) (owner: Elukey)
[15:55:01] <jinxer-wm>	 RESOLVED: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag   - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[15:55:25] <logmsgbot>	 !log hashar@deploy2002 Started deploy [gerrit/gerrit@121bd1c]: Remove duplicate [DISMISS] button
[15:55:27] <jinxer-wm>	 RESOLVED: SystemdUnitCrashLoop: prometheus-blazegraph-exporter-wdqs-blazegraph.service crashloop on wdqs2021:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop
[15:55:37] <logmsgbot>	 !log hashar@deploy2002 Finished deploy [gerrit/gerrit@121bd1c]: Remove duplicate [DISMISS] button (duration: 00m 11s)
[16:00:05] <jouncebot>	 hashar and jnuche: Deploy window Train log triage (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T1600)
[16:02:02] <jinxer-wm>	 FIRING: [10x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2007:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[16:05:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[16:07:02] <jinxer-wm>	 FIRING: [10x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2007:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[16:10:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[16:10:57] <wikibugs>	 (PS1) Gehel: query_service: relax alerting on WDQS lag [alerts] - https://gerrit.wikimedia.org/r/1215201 (https://phabricator.wikimedia.org/T411772)
[16:12:10] <wikibugs>	 (CR) CI reject: [V:-1] query_service: relax alerting on WDQS lag [alerts] - https://gerrit.wikimedia.org/r/1215201 (https://phabricator.wikimedia.org/T411772) (owner: Gehel)
[16:12:57] <wikibugs>	 (CR) Elukey: [C:+2] service.py: add the team field in the Service's definition [software/spicerack] - https://gerrit.wikimedia.org/r/1215162 (https://phabricator.wikimedia.org/T399807) (owner: Elukey)
[16:13:37] <wikibugs>	 (PS1) Muehlenhoff: Remove puppetmaster1003 from active Puppet 5 servers [puppet] - https://gerrit.wikimedia.org/r/1215202 (https://phabricator.wikimedia.org/T365798)
[16:14:02] <wikibugs>	 (PS2) Gehel: query_service: relax alerting on WDQS lag [alerts] - https://gerrit.wikimedia.org/r/1215201 (https://phabricator.wikimedia.org/T411772)
[16:15:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[16:15:43] <wikibugs>	 (CR) CI reject: [V:-1] query_service: relax alerting on WDQS lag [alerts] - https://gerrit.wikimedia.org/r/1215201 (https://phabricator.wikimedia.org/T411772) (owner: Gehel)
[16:15:51] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "Looks good and verified out of band" [puppet] - https://gerrit.wikimedia.org/r/1213588 (owner: Jasmine)
[16:17:02] <jinxer-wm>	 RESOLVED: [10x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2007:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
[16:18:37] <wikibugs>	 (CR) JHathaway: [C:+1] Remove puppetmaster1003 from active Puppet 5 servers [puppet] - https://gerrit.wikimedia.org/r/1215202 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[16:31:43] <wikibugs>	 (CR) Elukey: [C:+1] "Can we also remove puppetmaster2002 ?" [puppet] - https://gerrit.wikimedia.org/r/1215202 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[16:33:58] <wikibugs>	 (CR) Muehlenhoff: "Sure, but that's for a separate patch, I'll be coupling these patches with running the decom script." [puppet] - https://gerrit.wikimedia.org/r/1215202 (https://phabricator.wikimedia.org/T365798) (owner: Muehlenhoff)
[16:40:46] <wikibugs>	 (PS3) Vgutierrez: cache::haproxy: Get rid of http-request after use_backend warning [puppet] - https://gerrit.wikimedia.org/r/1215119
[16:41:27] <wikibugs>	 (CR) Vgutierrez: cache::haproxy: Get rid of http-request after use_backend warning (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1215119 (owner: Vgutierrez)
[16:42:31] <wikibugs>	 (CR) CDanis: [C:+2] Filter another client adding noise [puppet] - https://gerrit.wikimedia.org/r/1214759 (owner: Jdlrobson)
[16:47:28] <wikibugs>	 (PS2) Elukey: services: add maps-next.w.o as FQDN for kartotherian staging [deployment-charts] - https://gerrit.wikimedia.org/r/1215098 (https://phabricator.wikimedia.org/T409528)
[16:48:22] <wikibugs>	 (PS3) Elukey: services: add maps-next.w.o as FQDN for kartotherian staging [deployment-charts] - https://gerrit.wikimedia.org/r/1215098 (https://phabricator.wikimedia.org/T409528)
[16:49:22] <icinga-wm>	 PROBLEM - mailman list info ssl expiry on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:49:57] <wikibugs>	 (CR) CI reject: [V:-1] services: add maps-next.w.o as FQDN for kartotherian staging [deployment-charts] - https://gerrit.wikimedia.org/r/1215098 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[16:51:55] <wikibugs>	 (PS3) Cathal Mooney: lvs1019: move row D vlans to primary and add new C/D per-rack vlans [puppet] - https://gerrit.wikimedia.org/r/1207891 (https://phabricator.wikimedia.org/T405628)
[16:52:21] <wikibugs>	 (PS4) Elukey: services: add maps-next.w.o as FQDN for kartotherian staging [deployment-charts] - https://gerrit.wikimedia.org/r/1215098 (https://phabricator.wikimedia.org/T409528)
[16:53:52] <wikibugs>	 (CR) CI reject: [V:-1] services: add maps-next.w.o as FQDN for kartotherian staging [deployment-charts] - https://gerrit.wikimedia.org/r/1215098 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[16:57:12] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:57:12] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[16:58:45] <wikibugs>	 SRE-SLO: Sloth: onboard subset of existing SLOs to pilot - https://phabricator.wikimedia.org/T409310#11433516 (herron)
[16:59:15] <wikibugs>	 SRE-SLO: Sloth: onboard subset of existing SLOs to pilot - https://phabricator.wikimedia.org/T409310#11433517 (herron) onboarded wikifunctions today as well with config:  ` # This example shows a simple service level by implementing a single SLO without alerts. # It disables page (critical) and ticket (warni...
[17:00:05] <jouncebot>	 jhathaway and rzl: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Puppet request window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T1700).
[17:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[17:03:48] <wikibugs>	 ops-eqiad, SRE, DBA, DC-Ops: db1229 crashed - Broken memory module at B7 - https://phabricator.wikimedia.org/T411652#11433548 (Jclark-ctr) Replaced the failed DIMM @MoritzMuehlenhoff. I swapped A7 and B7 after replacing B7 so it’s easier to troubleshoot later if the issue comes back.
[17:05:33] <wikibugs>	 (PS5) Elukey: services: add maps-next.w.o as FQDN for kartotherian staging [deployment-charts] - https://gerrit.wikimedia.org/r/1215098 (https://phabricator.wikimedia.org/T409528)
[17:05:56] <wikibugs>	 ops-eqiad, SRE, DBA, DC-Ops: db1229 crashed - Broken memory module at B7 - https://phabricator.wikimedia.org/T411652#11433551 (Jclark-ctr) Open→Resolved
[17:06:00] <logmsgbot>	 !log brett@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1019.eqiad.wmnet with reason: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - T405628
[17:06:04] <stashbot>	 T405628: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628
[17:06:05] <topranks>	 !log disable BGP to lvs1019 on eqiad core routers ahead of switch migration T405628
[17:06:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:06:50] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 3 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11433554 (BCornwall)
[17:07:02] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 55267 bytes in 0.064 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:07:02] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 9234 bytes in 0.172 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:07:08] <wikibugs>	 (CR) CI reject: [V:-1] services: add maps-next.w.o as FQDN for kartotherian staging [deployment-charts] - https://gerrit.wikimedia.org/r/1215098 (https://phabricator.wikimedia.org/T409528) (owner: Elukey)
[17:07:12] <icinga-wm>	 RECOVERY - mailman list info ssl expiry on lists1004 is OK: OK - Certificate lists.wikimedia.org will expire on Tue 03 Feb 2026 07:30:03 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[17:08:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[17:11:04] <wikibugs>	 (CR) BCornwall: [V:+1 C:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7795/co" [puppet] - https://gerrit.wikimedia.org/r/1207891 (https://phabricator.wikimedia.org/T405628) (owner: Cathal Mooney)
[17:14:48] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 3 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11433580 (BCornwall)
[17:15:26] <wikibugs>	 (CR) Cathal Mooney: [C:+2] lvs1019: move row D vlans to primary and add new C/D per-rack vlans [puppet] - https://gerrit.wikimedia.org/r/1207891 (https://phabricator.wikimedia.org/T405628) (owner: Cathal Mooney)
[17:16:40] <wikibugs>	 (PS6) Elukey: services: add maps-next.w.o as FQDN for kartotherian staging [deployment-charts] - https://gerrit.wikimedia.org/r/1215098 (https://phabricator.wikimedia.org/T409528)
[17:17:50] <logmsgbot>	 !log vriley@cumin1003 START - Cookbook sre.dns.netbox
[17:20:12] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release mw-script/utk6lsuw on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[17:20:28] <logmsgbot>	 !log vriley@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:20:53] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[17:21:10] <logmsgbot>	 !log vriley@cumin1003 START - Cookbook sre.network.configure-switch-interfaces for host franio1004
[17:21:12] <wikibugs>	 (PS1) Santiago Faci: ext.wikimediaEvents: Add xLab impactTest experiment-specific instrument [extensions/WikimediaEvents] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215214 (https://phabricator.wikimedia.org/T407570)
[17:21:14] <logmsgbot>	 !log vriley@cumin1003 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host franio1004
[17:22:43] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 3 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11433608 (cmooney)
[17:28:50] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 3 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11433621 (BCornwall)
[17:29:57] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11433626 (RobH)
[17:30:02] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: lvs1018: remove cross-rack links to rows A, C and D - https://phabricator.wikimedia.org/T411781#11433627 (RobH)
[17:30:29] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.reimage for host lvs1019.eqiad.wmnet with OS bullseye
[17:30:47] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11433629 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@c...
[17:31:35] <wikibugs>	 (CR) Clare Ming: [C:+1] ext.wikimediaEvents: Add xLab impactTest experiment-specific instrument [extensions/WikimediaEvents] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215214 (https://phabricator.wikimedia.org/T407570) (owner: Santiago Faci)
[17:32:03] <wikibugs>	 (PS2) Isabelle Hurbain-Palatin: Activate postprocessing cache on testwiki, test2wiki, officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1215115 (https://phabricator.wikimedia.org/T348255)
[17:32:09] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11433638 (RobH) Day 13 Update:  * all hosts in rows C and D migrated ** lvs1018 in row B has links into C and D need removal via T411781 before we can kill...
[17:33:07] <wikibugs>	 (CR) Isabelle Hurbain-Palatin: "I'm removing my own -2 because I think it's not CRITICAL to not merge this, BUT I'd really like I806fa84d5d7837b21709ce8997c2b02a8b9548e2 " [mediawiki-config] - https://gerrit.wikimedia.org/r/1215115 (https://phabricator.wikimedia.org/T348255) (owner: Isabelle Hurbain-Palatin)
[17:38:37] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, December 09 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [extensions/WikimediaEvents] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215214 (https://phabricator.wikimedia.org/T407570) (owner: Santiago Faci)
[17:38:51] <jinxer-wm>	 FIRING: SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-eqiad:xe-0/0/32 (Transport: lvs1019:enp94s0f0np0 (Equinix, 21989994) {#20220411}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[17:43:51] <jinxer-wm>	 RESOLVED: SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-eqiad:xe-0/0/32 (Transport: lvs1019:enp94s0f0np0 (Equinix, 21989994) {#20220411}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[17:45:06] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1019.eqiad.wmnet with reason: host reimage
[17:45:45] <wikibugs>	 (CR) Brouberol: [C:+1] Add a growthbook system user and grant it access to private data [puppet] - https://gerrit.wikimedia.org/r/1215156 (https://phabricator.wikimedia.org/T406593) (owner: Btullis)
[17:47:29] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops: eqiad: rows C/D Upgrade Tracking - https://phabricator.wikimedia.org/T404609#11433701 (RobH)
[17:47:47] <urbanecm>	 jouncebot: nowandnext
[17:47:47] <jouncebot>	 For the next 0 hour(s) and 12 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T1700)
[17:47:47] <jouncebot>	 In 0 hour(s) and 12 minute(s): Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T1800)
[17:47:47] <jouncebot>	 In 0 hour(s) and 12 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T1800)
[17:48:05] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1019.eqiad.wmnet with reason: host reimage
[17:59:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:00:05] <jouncebot>	 bd808: #bothumor My software never has bugs. It just develops random features. Rise for Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T1800).
[18:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T1800)
[18:01:54] <bd808>	 nothing for my window this week
[18:04:00] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11433761 (Jclark-ctr)
[18:05:54] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1019.eqiad.wmnet with OS bullseye
[18:06:01] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11433764 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin...
[18:09:08] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
[18:09:12] <logmsgbot>	 !log bking@deploy2002 helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
[18:13:28] <wikibugs>	 (CR) Dzahn: [C:+2] admin/releases: deprecate shell user group releasers-mwcli [puppet] - https://gerrit.wikimedia.org/r/1213587 (owner: Dzahn)
[18:15:02] <wikibugs>	 (CR) Dzahn: [C:+2] releases: delete now pointless classes for deprecated user groups [puppet] - https://gerrit.wikimedia.org/r/1214612 (owner: Dzahn)
[18:15:08] <wikibugs>	 (PS2) Dzahn: releases: delete now pointless classes for deprecated user groups [puppet] - https://gerrit.wikimedia.org/r/1214612
[18:16:10] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11433805 (cmooney)
[18:16:55] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11433810 (cmooney)
[18:17:25] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11433811 (Jclark-ctr)
[18:17:36] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: lvs1020: move primary uplink from asw2-d7-eqiad to lsw1-d7-eqiad and remove link to asw2-c2-eqiad - https://phabricator.wikimedia.org/T405609#11433812 (cmooney)
[18:18:46] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11433817 (cmooney) Open→Resolved
[18:18:54] <wikibugs>	 (PS1) Isabelle Hurbain-Palatin: kartotherian: Bump image to latest version [deployment-charts] - https://gerrit.wikimedia.org/r/1215223 (https://phabricator.wikimedia.org/T383328)
[18:18:54] <wikibugs>	 (CR) Dzahn: [C:+2] releases: delete now pointless classes for deprecated user groups [puppet] - https://gerrit.wikimedia.org/r/1214612 (owner: Dzahn)
[18:21:12] <logmsgbot>	 !log brett@cumin2002 START - Cookbook sre.hosts.remove-downtime for lvs1019.eqiad.wmnet
[18:21:14] <logmsgbot>	 !log brett@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1019.eqiad.wmnet
[18:21:59] <wikibugs>	 ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, and 2 others: lvs1019: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - https://phabricator.wikimedia.org/T405628#11433829 (BCornwall)
[18:23:13] <wikibugs>	 (PS1) Jforrester: CdxDialog: use-close-button prop needs to be set to true [extensions/WikiLambda] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215224 (https://phabricator.wikimedia.org/T411655)
[18:23:30] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, December 04 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [extensions/WikiLambda] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215224 (https://phabricator.wikimedia.org/T411655) (owner: Jforrester)
[18:24:20] <wikibugs>	 (PS6) Dzahn: service: add gerrit-https service to service catalog [puppet] - https://gerrit.wikimedia.org/r/1202842 (https://phabricator.wikimedia.org/T408532)
[18:24:24] <wikibugs>	 (CR) Dzahn: service: add gerrit-https service to service catalog (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1202842 (https://phabricator.wikimedia.org/T408532) (owner: Dzahn)
[18:24:39] <wikibugs>	 (CR) Dzahn: "thank you! great" [puppet] - https://gerrit.wikimedia.org/r/1211549 (https://phabricator.wikimedia.org/T338470) (owner: Arnaudb)
[18:25:13] <wikibugs>	 (CR) Dzahn: "cool :)" [alerts] - https://gerrit.wikimedia.org/r/1214034 (https://phabricator.wikimedia.org/T411452) (owner: AOkoth)
[18:25:30] <wikibugs>	 (CR) Dzahn: [C:+1] vrts: add high inode usage alert [alerts] - https://gerrit.wikimedia.org/r/1214034 (https://phabricator.wikimedia.org/T411452) (owner: AOkoth)
[18:29:28] <wikibugs>	 (CR) Scott French: [C:+1] "Thanks, Valentin!" [puppet] - https://gerrit.wikimedia.org/r/1215119 (owner: Vgutierrez)
[18:31:56] <wikibugs>	 ops-eqiad, SRE, DC-Ops, fundraising-tech-ops: Q2:rack/setup/install franio1004 - https://phabricator.wikimedia.org/T405980#11433840 (VRiley-WMF)
[18:33:07] <wikibugs>	 ops-eqiad, SRE, DC-Ops, fundraising-tech-ops: Q2:rack/setup/install franio1004 - https://phabricator.wikimedia.org/T405980#11433856 (VRiley-WMF) Set the IP address for iDRAC, enabled IPMI, and user config information.
[18:33:36] <wikibugs>	 ops-eqiad, SRE, DC-Ops, fundraising-tech-ops: Q2:rack/setup/install franio1004 - https://phabricator.wikimedia.org/T405980#11433857 (VRiley-WMF) a:VRiley-WMF→Jgreen
[18:34:32] <wikibugs>	 (PS1) Dzahn: miscweb: add wikipedia25.org to extra SANs [deployment-charts] - https://gerrit.wikimedia.org/r/1215225 (https://phabricator.wikimedia.org/T408592)
[18:35:26] <wikibugs>	 (PS2) Dzahn: miscweb: add wikipedia25.org to extra SANs [deployment-charts] - https://gerrit.wikimedia.org/r/1215225 (https://phabricator.wikimedia.org/T408592)
[18:37:09] <wikibugs>	 (CR) Jgiannelos: [C:+1] kartotherian: Bump image to latest version [deployment-charts] - https://gerrit.wikimedia.org/r/1215223 (https://phabricator.wikimedia.org/T383328) (owner: Isabelle Hurbain-Palatin)
[18:37:14] <wikibugs>	 (CR) Jgiannelos: [C:+2] kartotherian: Bump image to latest version [deployment-charts] - https://gerrit.wikimedia.org/r/1215223 (https://phabricator.wikimedia.org/T383328) (owner: Isabelle Hurbain-Palatin)
[18:38:29] <wikibugs>	 (PS1) Clare Ming: Test Kitchen UI: Deploying v1.1.3 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1215226
[18:38:49] <wikibugs>	 (PS5) Aaron Schulz: Remove /data-parsoid/ endpoint per T393557 [mediawiki-config] - https://gerrit.wikimedia.org/r/1214143 (https://phabricator.wikimedia.org/T411517)
[18:39:02] <wikibugs>	 (Merged) jenkins-bot: kartotherian: Bump image to latest version [deployment-charts] - https://gerrit.wikimedia.org/r/1215223 (https://phabricator.wikimedia.org/T383328) (owner: Isabelle Hurbain-Palatin)
[18:39:24] <wikibugs>	 (PS1) Clare Ming: Test Kitchen UI: Deploying v1.1.3 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1215230
[18:39:31] <wikibugs>	 (PS1) Dzahn: miscweb: add wikipedia25 release (WIP) [deployment-charts] - https://gerrit.wikimedia.org/r/1215231 (https://phabricator.wikimedia.org/T408592)
[18:40:11] <wikibugs>	 (PS6) Aaron Schulz: Remove /data-parsoid/ endpoint from specs per T393557 [mediawiki-config] - https://gerrit.wikimedia.org/r/1214143 (https://phabricator.wikimedia.org/T411517)
[18:40:32] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, December 04 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - https://gerrit.wikimedia.org/r/1214143 (https://phabricator.wikimedia.org/T411517) (owner: Aaron Schulz)
[18:41:32] <wikibugs>	 (CR) Santiago Faci: [C:+2] Test Kitchen UI: Deploying v1.1.3 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1215226 (owner: Clare Ming)
[18:41:42] <wikibugs>	 (CR) Santiago Faci: [C:+2] Test Kitchen UI: Deploying v1.1.3 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1215230 (owner: Clare Ming)
[18:43:22] <wikibugs>	 (Merged) jenkins-bot: Test Kitchen UI: Deploying v1.1.3 release to staging [deployment-charts] - https://gerrit.wikimedia.org/r/1215226 (owner: Clare Ming)
[18:43:23] <wikibugs>	 (Merged) jenkins-bot: Test Kitchen UI: Deploying v1.1.3 release to production [deployment-charts] - https://gerrit.wikimedia.org/r/1215230 (owner: Clare Ming)
[18:45:57] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
[18:46:37] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
[18:50:15] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
[18:50:48] <logmsgbot>	 !log cjming@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
[18:55:12] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[18:57:01] <wikibugs>	 (CR) Andrew Bogott: [C:+1] openstack: puppet: Do not commit empty role fiels [puppet] - https://gerrit.wikimedia.org/r/1214491 (owner: Majavah)
[19:02:00] <wikibugs>	 (PS5) CDanis: tcpproxy: include profile::lvs::realserver in role [puppet] - https://gerrit.wikimedia.org/r/1203157 (https://phabricator.wikimedia.org/T408532) (owner: Dzahn)
[19:02:29] <wikibugs>	 (PS1) Kosta Harlan: hCaptcha: Persist the captcha consequence in the user session [extensions/ConfirmEdit] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215234 (https://phabricator.wikimedia.org/T410657)
[19:03:01] <wikibugs>	 (PS6) Dzahn: tcpproxy: include profile::lvs::realserver in role [puppet] - https://gerrit.wikimedia.org/r/1203157 (https://phabricator.wikimedia.org/T408532)
[19:03:19] <wikibugs>	 (PS7) CDanis: tcpproxy: include profile::lvs::realserver in role [puppet] - https://gerrit.wikimedia.org/r/1203157 (https://phabricator.wikimedia.org/T408532) (owner: Dzahn)
[19:03:20] <wikibugs>	 (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1203157 (https://phabricator.wikimedia.org/T408532) (owner: Dzahn)
[19:04:56] <kostajh>	 I'm going to backport a patch to wmf.5, unless someone else is deploying now
[19:05:47] <wikibugs>	 (CR) Dzahn: [C:+2] conftool-data: add tcp-proxy gerrit service [puppet] - https://gerrit.wikimedia.org/r/1214454 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:06:02] <wikibugs>	 (PS2) Jelto: conftool-data: add tcp-proxy gerrit service [puppet] - https://gerrit.wikimedia.org/r/1214454 (https://phabricator.wikimedia.org/T365259)
[19:06:27] <wikibugs>	 (CR) Dzahn: [C:+2] conftool-data: add tcp-proxy gerrit service [puppet] - https://gerrit.wikimedia.org/r/1214454 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:06:34] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/ConfirmEdit] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215234 (https://phabricator.wikimedia.org/T410657) (owner: Kosta Harlan)
[19:09:02] <wikibugs>	 (CR) Dzahn: [C:+2] service: add gerrit-https service to service catalog [puppet] - https://gerrit.wikimedia.org/r/1202842 (https://phabricator.wikimedia.org/T408532) (owner: Dzahn)
[19:10:10] <wikibugs>	 (Restored) Dzahn: service::catalog: add gerrit-https and gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:11:43] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to analytics_privatedata_users for Thiemo Kreuz (WMDE) - https://phabricator.wikimedia.org/T411612#11434251 (andrea.denisse) Open→In progress a:andrea.denisse
[19:12:42] <logmsgbot>	 !log jgiannelos@deploy2002 helmfile [staging] START helmfile.d/services/kartotherian: apply
[19:13:22] <logmsgbot>	 !log jgiannelos@deploy2002 helmfile [staging] DONE helmfile.d/services/kartotherian: apply
[19:18:38] <wikibugs>	 (Merged) jenkins-bot: hCaptcha: Persist the captcha consequence in the user session [extensions/ConfirmEdit] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215234 (https://phabricator.wikimedia.org/T410657) (owner: Kosta Harlan)
[19:19:00] <logmsgbot>	 !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1215234|hCaptcha: Persist the captcha consequence in the user session (T410657)]]
[19:19:03] <stashbot>	 T410657: hCaptcha: Improve support for SiteKey verification - https://phabricator.wikimedia.org/T410657
[19:19:43] <wikibugs>	 (PS2) Dzahn: service::catalog: add gerrit-https and gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:20:02] <wikibugs>	 (CR) CI reject: [V:-1] service::catalog: add gerrit-https and gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:21:02] <logmsgbot>	 !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1215234|hCaptcha: Persist the captcha consequence in the user session (T410657)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[19:21:43] <wikibugs>	 (PS1) CDanis: WIP [puppet] - https://gerrit.wikimedia.org/r/1215240
[19:21:50] <wikibugs>	 (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1215240 (owner: CDanis)
[19:22:23] <wikibugs>	 (PS3) Dzahn: service::catalog: add gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:24:29] <logmsgbot>	 !log kharlan@deploy2002 kharlan: Continuing with sync
[19:26:07] <wikibugs>	 (PS4) Dzahn: service::catalog: add gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:27:25] <wikibugs>	 (PS5) Dzahn: service::catalog: add gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:27:56] <wikibugs>	 (CR) CI reject: [V:-1] service::catalog: add gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:28:19] <wikibugs>	 (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1215240 (owner: CDanis)
[19:29:08] <wikibugs>	 (PS6) Dzahn: service::catalog: add gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:29:24] <wikibugs>	 (CR) Dzahn: service::catalog: add gerrit-ssh (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:29:38] <wikibugs>	 (CR) CI reject: [V:-1] service::catalog: add gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:30:12] <jinxer-wm>	 FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[19:30:15] <logmsgbot>	 !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1215234|hCaptcha: Persist the captcha consequence in the user session (T410657)]] (duration: 11m 16s)
[19:30:19] <stashbot>	 T410657: hCaptcha: Improve support for SiteKey verification - https://phabricator.wikimedia.org/T410657
[19:30:42] <wikibugs>	 (PS2) CDanis: WIP [puppet] - https://gerrit.wikimedia.org/r/1215240
[19:30:47] <wikibugs>	 (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1215240 (owner: CDanis)
[19:32:04] <wikibugs>	 (PS7) Dzahn: service::catalog: add gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:32:33] <wikibugs>	 (CR) CI reject: [V:-1] service::catalog: add gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:32:48] <wikibugs>	 (CR) Dzahn: service::catalog: add gerrit-ssh (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:35:07] <wikibugs>	 (CR) CDanis: service::catalog: add gerrit-ssh (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:35:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[19:36:21] <wikibugs>	 (PS1) Andrew Bogott: admin data: update yubikey pubkey for Andrew Bogott [puppet] - https://gerrit.wikimedia.org/r/1215242
[19:37:02] <wikibugs>	 (PS8) Dzahn: service::catalog: add gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:37:18] <wikibugs>	 (CR) Andrew Bogott: [C:+2] admin data: update yubikey pubkey for Andrew Bogott [puppet] - https://gerrit.wikimedia.org/r/1215242 (owner: Andrew Bogott)
[19:38:18] <wikibugs>	 (PS3) CDanis: WIP [puppet] - https://gerrit.wikimedia.org/r/1215240
[19:38:24] <wikibugs>	 (CR) Dzahn: service::catalog: add gerrit-ssh (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:38:26] <wikibugs>	 (CR) CDanis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1215240 (owner: CDanis)
[19:39:00] <wikibugs>	 (CR) CDanis: [C:+1] service::catalog: add gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:39:21] <wikibugs>	 (PS8) CDanis: tcpproxy: include profile::lvs::realserver in role [puppet] - https://gerrit.wikimedia.org/r/1203157 (https://phabricator.wikimedia.org/T408532) (owner: Dzahn)
[19:41:40] <wikibugs>	 (CR) CDanis: [C:-1] "Please use instead: Ic8dc08993269f666b1360defd95abd7fb26813fb" [puppet] - https://gerrit.wikimedia.org/r/1203157 (https://phabricator.wikimedia.org/T408532) (owner: Dzahn)
[19:44:37] <wikibugs>	 (CR) CDanis: [C:-1] WIP (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1215240 (owner: CDanis)
[19:45:00] <wikibugs>	 (Abandoned) Dzahn: tcpproxy: include profile::lvs::realserver in role [puppet] - https://gerrit.wikimedia.org/r/1203157 (https://phabricator.wikimedia.org/T408532) (owner: Dzahn)
[19:45:44] <wikibugs>	 (CR) Dzahn: service::catalog: add gerrit-ssh (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:45:45] <wikibugs>	 (CR) Dzahn: [C:+2] service::catalog: add gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1214453 (https://phabricator.wikimedia.org/T365259) (owner: Jelto)
[19:46:50] <jinxer-wm>	 FIRING: KubernetesCalicoDown: ml-serve1013.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-mlserve&var-instance=ml-serve1013.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[19:47:14] <wikibugs>	 (Abandoned) Ryan Kemper: elastic: reboot should check uptime not jvm start time [cookbooks] - https://gerrit.wikimedia.org/r/1207280 (https://phabricator.wikimedia.org/T410577) (owner: Ryan Kemper)
[19:47:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[19:51:55] <wikibugs>	 (PS1) Superpes15: [tokwiki] Allow sysops to grant/remove confirmed status [mediawiki-config] - https://gerrit.wikimedia.org/r/1215251 (https://phabricator.wikimedia.org/T411683)
[19:52:56] <wikibugs>	 (PS1) Dzahn: service::catalog: fix conftool cluster name and disable paging for gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1215252 (https://phabricator.wikimedia.org/T365259)
[19:53:12] <wikibugs>	 (PS2) Dzahn: service::catalog: fix conftool cluster name and disable paging for gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1215252 (https://phabricator.wikimedia.org/T365259)
[19:53:14] <wikibugs>	 (CR) CI reject: [V:-1] service::catalog: fix conftool cluster name and disable paging for gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1215252 (https://phabricator.wikimedia.org/T365259) (owner: Dzahn)
[19:54:14] <wikibugs>	 ops-eqiad, SRE, DC-Ops: Inbound errors on interface ssw1-e1-eqiad:xe-0/0/32 (Transport: lvs1020:enp94s0f0np0 (Equinix, 21996479) {#21989994}) - https://phabricator.wikimedia.org/T411684#11434420 (Jclark-ctr) Open→Resolved a:Jclark-ctr Closing ticket. this look like on the grafna page...
[19:54:17] <wikibugs>	 (CR) CDanis: [C:+1] service::catalog: fix conftool cluster name and disable paging for gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1215252 (https://phabricator.wikimedia.org/T365259) (owner: Dzahn)
[19:54:27] <wikibugs>	 (CR) Dzahn: [C:+2] service::catalog: fix conftool cluster name and disable paging for gerrit-ssh [puppet] - https://gerrit.wikimedia.org/r/1215252 (https://phabricator.wikimedia.org/T365259) (owner: Dzahn)
[19:59:44] <wikibugs>	 (PS1) Kosta Harlan: Use a separate right for Special:SuggestedInvestigations [extensions/CheckUser] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215258 (https://phabricator.wikimedia.org/T411557)
[20:00:22] <kostajh>	 and syncing another patch, unless there are any objections
[20:00:58] <brett>	 !log import libvmod-netmapper 1.10-1~deb13+wmf1 into trixie-wikimedia - T401832
[20:01:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:01:01] <stashbot>	 T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[20:01:26] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/CheckUser] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215258 (https://phabricator.wikimedia.org/T411557) (owner: Kosta Harlan)
[20:01:44] <kostajh>	 this one will take a while, as it has i18n changes
[20:03:18] <wikibugs>	 (PS1) Jforrester: Followup Ie40b9e59a4: Fortify unified metrics method [core] (wmf/1.46.0-wmf.5) - https://gerrit.wikimedia.org/r/1215259 (https://phabricator.wikimedia.org/T411793)
[20:08:57] <wikibugs>	 (PS1) Superpes15: [ukwiki] Limit thanks for newbie to 3 per hour [mediawiki-config] - https://gerrit.wikimedia.org/r/1215262 (https://phabricator.wikimedia.org/T411588)
[20:09:41] <jinxer-wm>	 FIRING: [7x] ConfdResourceFailed: confd resource _srv_config-master_pybal_codfw_gerrit-ssh.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[20:10:12] <wikibugs>	 (PS2) Superpes15: [ukwiki] Limit thanks for newbie to 3 per hour [mediawiki-config] - https://gerrit.wikimedia.org/r/1215262 (https://phabricator.wikimedia.org/T411588)
[20:10:17] <wikibugs>	 (PS1) Ejegg: Shorten 'close' cookie wait period for enwiki banners [mediawiki-config] - https://gerrit.wikimedia.org/r/1215263 (https://phabricator.wikimedia.org/T411800)
[20:12:50] <wikibugs>	 (CR) Greg Grossmeier: [C:+1] "This was discussed in a call with Sam, Elliott, and myself (and a few others) and we agree to push this change out for now to save some of" [mediawiki-config] - https://gerrit.wikimedia.org/r/1215263 (https://phabricator.wikimedia.org/T411800) (owner: Ejegg)
[20:13:26] <brett>	 !log import libvmod-querysort 0.4~deb13+wmf1 into trixie-wikimedia - T401832
[20:13:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:13:30] <stashbot>	 T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[20:13:37] <wikibugs>	 (03Merged) 10jenkins-bot: Use a separate right for Special:SuggestedInvestigations [extensions/CheckUser] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1215258 (https://phabricator.wikimedia.org/T411557) (owner: 10Kosta Harlan)
[20:13:56] <logmsgbot>	 !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1215258|Use a separate right for Special:SuggestedInvestigations (T411557)]]
[20:14:19] <wikibugs>	 (03PS3) 10Superpes15: [ukwiki] Limit thanks for newbies to 3 per hour [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215262 (https://phabricator.wikimedia.org/T411588)
[20:14:41] <jinxer-wm>	 FIRING: [14x] ConfdResourceFailed: confd resource _srv_config-master_pybal_codfw_gerrit-ssh.toml has errors - https://wikitech.wikimedia.org/wiki/Confd#Monitoring - https://grafana.wikimedia.org/d/OUJF1VI4k/confd - https://alerts.wikimedia.org/?q=alertname%3DConfdResourceFailed
[20:15:03] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, December 04 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215263 (https://phabricator.wikimedia.org/T411800) (owner: 10Ejegg)
[20:16:14] <wikibugs>	 (03Abandoned) 10Andriy.v: Limit thanks for new users at uk.wikipedia to 3 per hour [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1214636 (owner: 10Andriy.v)
[20:21:31] <wikibugs>	 (03PS3) 10Andrea Denisse: Add Thiemo Kreuz to analytics_privatedata_users [puppet] - 10https://gerrit.wikimedia.org/r/1215264 (https://phabricator.wikimedia.org/T411612)
[20:24:48] <wikibugs>	 (03CR) 10A smart kitten: [C:03+1] "Code LGTM :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215251 (https://phabricator.wikimedia.org/T411683) (owner: 10Superpes15)
[20:27:57] <wikibugs>	 (03CR) 10Dzahn: Add Thiemo Kreuz to analytics_privatedata_users (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/1215264 (https://phabricator.wikimedia.org/T411612) (owner: 10Andrea Denisse)
[20:28:03] <wikibugs>	 (03CR) 10A smart kitten: "question: You know more than me here so I'll defer to you, but has there been enough time for the community discussion to take place befor" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215262 (https://phabricator.wikimedia.org/T411588) (owner: 10Superpes15)
[20:28:06] <wikibugs>	 (03CR) 10Dzahn: [C:03+1] "lgtm, one nitpick inline" [puppet] - 10https://gerrit.wikimedia.org/r/1215264 (https://phabricator.wikimedia.org/T411612) (owner: 10Andrea Denisse)
[20:28:19] <brett>	 !log Delete libvmod-netmapper 1.10-1~deb13+wmf1, import libvmod-netmapper 1.10~deb13+wmf1 into trixie-wikimedia - T401832
[20:28:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:22] <stashbot>	 T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[20:28:58] <wikibugs>	 (03PS4) 10Andrea Denisse: Add Thiemo Kreuz to analytics_privatedata_users [puppet] - 10https://gerrit.wikimedia.org/r/1215264 (https://phabricator.wikimedia.org/T411612)
[20:29:29] <wikibugs>	 (03CR) 10Dzahn: [C:03+1] Add Thiemo Kreuz to analytics_privatedata_users [puppet] - 10https://gerrit.wikimedia.org/r/1215264 (https://phabricator.wikimedia.org/T411612) (owner: 10Andrea Denisse)
[20:29:30] <wikibugs>	 (03CR) 10Andrea Denisse: Add Thiemo Kreuz to analytics_privatedata_users (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1215264 (https://phabricator.wikimedia.org/T411612) (owner: 10Andrea Denisse)
[20:29:41] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Add Thiemo Kreuz to analytics_privatedata_users [puppet] - 10https://gerrit.wikimedia.org/r/1215264 (https://phabricator.wikimedia.org/T411612) (owner: 10Andrea Denisse)
[20:29:48] <wikibugs>	 (03CR) 10Dzahn: [C:03+1] Add Thiemo Kreuz to analytics_privatedata_users (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1215264 (https://phabricator.wikimedia.org/T411612) (owner: 10Andrea Denisse)
[20:30:27] <wikibugs>	 (03PS5) 10Andrea Denisse: Add Thiemo Kreuz to analytics_privatedata_users [puppet] - 10https://gerrit.wikimedia.org/r/1215264 (https://phabricator.wikimedia.org/T411612)
[20:32:53] <wikibugs>	 (03CR) 10Superpes15: "Consensus seems clear and change shouldn't harm anyone, but since you've some concerns I'll wait and will schedule this during the next we" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215262 (https://phabricator.wikimedia.org/T411588) (owner: 10Superpes15)
[20:35:40] <wikibugs>	 (03CR) 10Superpes15: "For total clarity, this concerns LTA activity, it should be a temporary patch, so I thought that wait was more harmful to the project than" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215262 (https://phabricator.wikimedia.org/T411588) (owner: 10Superpes15)
[20:38:57] <wikibugs>	 (03CR) 10A smart kitten: "Thank you :) Yeah, I imagine it might not change anything; but at least then folks who don't check the wiki every day will get the chance " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215262 (https://phabricator.wikimedia.org/T411588) (owner: 10Superpes15)
[20:40:11] <wikibugs>	 (03CR) 10A smart kitten: "(I replied before seeing your most recent message here. To be clear, if you think it's better to deploy it today then please feel free to " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215262 (https://phabricator.wikimedia.org/T411588) (owner: 10Superpes15)
[20:40:29] <wikibugs>	 (03CR) 10Andrea Denisse: [C:03+2] Add Thiemo Kreuz to analytics_privatedata_users [puppet] - 10https://gerrit.wikimedia.org/r/1215264 (https://phabricator.wikimedia.org/T411612) (owner: 10Andrea Denisse)
[20:47:29] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics_privatedata_users for Thiemo Kreuz (WMDE) - https://phabricator.wikimedia.org/T411612#11434569 (10andrea.denisse) 05In progress→03Resolved
[20:50:12] <brett>	 !log import libvmod-wmfuniq 0.2.0~deb13+wmf1 into trixie-wikimedia - T401832
[20:50:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:15] <stashbot>	 T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[20:50:17] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for medelius - https://phabricator.wikimedia.org/T411543#11434576 (10andrea.denisse)
[20:56:51] <wikibugs>	 (03CR) 10Superpes15: "Naah, no rush, you're right and we should wait ;)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215262 (https://phabricator.wikimedia.org/T411588) (owner: 10Superpes15)
[20:57:23] <logmsgbot>	 !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1215258|Use a separate right for Special:SuggestedInvestigations (T411557)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[20:58:53] <logmsgbot>	 !log kharlan@deploy2002 kharlan: Continuing with sync
[21:00:04] <jouncebot>	 RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: Time to snap out of that daydream and deploy UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T2100).
[21:00:05] <jouncebot>	 maryum, James_F, AaronSchulz, Superpes, and ejegg: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[21:00:16] <James_F>	 Heya.
[21:00:28] <James_F>	 I see kostajh is still deploying.
[21:00:30] <Superpes>	 o/
[21:00:42] <Superpes>	 Yep 
[21:01:25] <ejegg>	 i'm here
[21:01:48] <logmsgbot>	 !log taavi@deploy2002 mwscript-k8s job started: initEditCount --wiki=tokwiki
[21:02:41] <kostajh>	 it's syncing out now 
[21:02:48] <kostajh>	 should be done in a few minutes
[21:03:02] <kostajh>	 https://spiderpig.wikimedia.org/jobs/1043
[21:03:10] * James_F nods.
[21:03:26] <cjming>	 if anyone needs a deployer, happy to help -- otherwise self-deployers can self-queue/organize
[21:03:31] <James_F>	 I can deploy if no-one else volunteers.
[21:03:34] <brett>	 !log import varnishkafka 1.2.0~deb13+wmf1 into trixie-wikimedia - T401832
[21:03:34] <James_F>	 Hah, snap.
[21:03:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:03:37] <stashbot>	 T401832: Upgrade Traffic hosts to trixie - https://phabricator.wikimedia.org/T401832
[21:04:37] <Superpes>	 Just FTR my patch is very simple, it can be merged together with any other patch, at your own discretion :)
[21:04:49] <James_F>	 Superpes: Yeah, I'll do yours first anyway.
[21:04:56] <wikibugs>	 (03PS1) 10Andrea Denisse: Add Caro Medelius to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/1215267 (https://phabricator.wikimedia.org/T411543)
[21:04:57] <wikibugs>	 (03CR) 10Andrea Denisse: "Waiting for manager's explicit approval before merging." [puppet] - 10https://gerrit.wikimedia.org/r/1215267 (https://phabricator.wikimedia.org/T411543) (owner: 10Andrea Denisse)
[21:05:48] * cjming thanks James_F
[21:05:52] * James_F drums fingers waiting for sync.
[21:05:57] <wikibugs>	 (03CR) 10Superpes15: [C:04-1] "Just waiting a few other days to achieve a clear consensus" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215262 (https://phabricator.wikimedia.org/T411588) (owner: 10Superpes15)
[21:06:24] <Superpes>	 Thanks @James_F :)
[21:09:02] <wikibugs>	 06SRE, 10SRE-Access-Requests: Update SSH key for kamila - https://phabricator.wikimedia.org/T411404#11434612 (10Raine) I am leaving this open as a reminder to delete the old key, but I'm currently unable to do that (blocked by T411816). If it's in the way, feel free to close it.
[21:09:24] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-privatedata-users for medelius - https://phabricator.wikimedia.org/T411543#11434615 (10VPuffetMichel) @andrea.denisse I approve this access for Caro. Thank you!
[21:11:29] <James_F>	 Okie-dokie, let's do maryum, Superpes, ejegg, and AaronSchulz's patches together.
[21:11:38] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-privatedata-users for medelius - https://phabricator.wikimedia.org/T411543#11434622 (10andrea.denisse)
[21:11:41] <logmsgbot>	 !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1215258|Use a separate right for Special:SuggestedInvestigations (T411557)]] (duration: 57m 45s)
[21:11:55] <wikibugs>	 10ops-eqiad, 06DC-Ops: Inbound errors on interface ssw1-e1-eqiad:xe-0/0/32 (Transport: lvs1020:enp94s0f0np0 (Equinix, 21996479) {#21989994}) - https://phabricator.wikimedia.org/T411818 (10phaultfinder) 03NEW
[21:11:56] <wikibugs>	 (03CR) 10Andrea Denisse: [C:03+2] Add Caro Medelius to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/1215267 (https://phabricator.wikimedia.org/T411543) (owner: 10Andrea Denisse)
[21:12:14] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jforrester@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215251 (https://phabricator.wikimedia.org/T411683) (owner: 10Superpes15)
[21:12:14] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jforrester@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1214659 (https://phabricator.wikimedia.org/T399664) (owner: 10Mstyles)
[21:12:15] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jforrester@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1214143 (https://phabricator.wikimedia.org/T411517) (owner: 10Aaron Schulz)
[21:12:15] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jforrester@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215263 (https://phabricator.wikimedia.org/T411800) (owner: 10Ejegg)
[21:13:27] <wikibugs>	 (03Merged) 10jenkins-bot: [tokwiki] Allow sysops to grant/remove confirmed status [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215251 (https://phabricator.wikimedia.org/T411683) (owner: 10Superpes15)
[21:13:35] <wikibugs>	 (03Merged) 10jenkins-bot: OATHAuth: Remove wmgOATHAuthDisableRight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1214659 (https://phabricator.wikimedia.org/T399664) (owner: 10Mstyles)
[21:13:38] <wikibugs>	 (03Merged) 10jenkins-bot: Remove /data-parsoid/ endpoint from specs per T393557 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1214143 (https://phabricator.wikimedia.org/T411517) (owner: 10Aaron Schulz)
[21:13:42] <wikibugs>	 (03Merged) 10jenkins-bot: Shorten 'close' cookie wait period for enwiki banners [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215263 (https://phabricator.wikimedia.org/T411800) (owner: 10Ejegg)
[21:13:44] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, December 04 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215115 (https://phabricator.wikimedia.org/T348255) (owner: 10Isabelle Hurbain-Palatin)
[21:14:03] <logmsgbot>	 !log jforrester@deploy2002 Started scap sync-world: Backport for [[gerrit:1215251|[tokwiki] Allow sysops to grant/remove confirmed status (T411683)]], [[gerrit:1214659|OATHAuth: Remove wmgOATHAuthDisableRight (T399664)]], [[gerrit:1214143|Remove /data-parsoid/ endpoint from specs per T393557 (T411517)]], [[gerrit:1215263|Shorten 'close' cookie wait period for enwiki banners (T411800)]]
[21:14:10] <cscott>	 i'm going to slip into the tail end of this window if possible.
[21:14:17] <stashbot>	 T411683: Allow tokwiki admins to grant and remove 'confirmed' - https://phabricator.wikimedia.org/T411683
[21:14:17] <stashbot>	 T399664: Expand 2FA Opt-In Privileges - https://phabricator.wikimedia.org/T399664
[21:14:17] <stashbot>	 T393557: Block external traffic to RESTBase /page/data-parsoid endpoint and investigate internal usage - https://phabricator.wikimedia.org/T393557
[21:14:18] <stashbot>	 T411517: Clean up Math API OpenAPI specs and remove data-parsoid route specs - https://phabricator.wikimedia.org/T411517
[21:14:18] <stashbot>	 T411800: CentralNotice code changes to show a banner to a reader with the 'waitdate: close' status - https://phabricator.wikimedia.org/T411800
[21:14:33] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-privatedata-users for medelius - https://phabricator.wikimedia.org/T411543#11434640 (10andrea.denisse) 05In progress→03Resolved Closing as resolved, please let me know if there's anything else I can assist with.
[21:14:34] <James_F>	 cscott: First I've got mine and Moriel's backports, but sure.
[21:14:44] <wikibugs>	 (03PS1) 10Andrea Denisse: Add new SSH key for Zoe. [puppet] - 10https://gerrit.wikimedia.org/r/1215270 (https://phabricator.wikimedia.org/T411506)
[21:15:05] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics_privatedata_users for Thiemo Kreuz (WMDE) - https://phabricator.wikimedia.org/T411612#11434671 (10andrea.denisse) Closing as resolved, please let me know if there's anything else I can assist with.
[21:15:13] <wikibugs>	 (03CR) 10Andrea Denisse: [C:03+2] "I confirmed with Zoe that this is her key." [puppet] - 10https://gerrit.wikimedia.org/r/1215270 (https://phabricator.wikimedia.org/T411506) (owner: 10Andrea Denisse)
[21:17:52] <ejegg>	 thanks James_F, i'm seeing the new value at least on the test servers
[21:18:01] <James_F>	 Cool.
[21:18:11] <James_F>	 Or at least, most of the test servers.
[21:18:17] <logmsgbot>	 !log jforrester@deploy2002 mstyles, aaron, superpes, jforrester, ejegg: Backport for [[gerrit:1215251|[tokwiki] Allow sysops to grant/remove confirmed status (T411683)]], [[gerrit:1214659|OATHAuth: Remove wmgOATHAuthDisableRight (T399664)]], [[gerrit:1214143|Remove /data-parsoid/ endpoint from specs per T393557 (T411517)]], [[gerrit:1215263|Shorten 'close' cookie wait period for enwiki banners (T411800)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:18:18] <James_F>	 Two still haven't synced.
[21:18:20] <James_F>	 Aha.
[21:18:31] <Superpes>	 @James_F Mine works fine too :)
[21:18:36] <James_F>	 Excellent.
[21:19:32] <logmsgbot>	 !log jforrester@deploy2002 mstyles, aaron, superpes, jforrester, ejegg: Continuing with sync
[21:20:12] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release mw-script/utk6lsuw on k8s@codfw in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=codfw&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[21:20:57] <wikibugs>	 (03PS1) 10Andrea Denisse: Remove unused SSH key for Zoe [puppet] - 10https://gerrit.wikimedia.org/r/1215273 (https://phabricator.wikimedia.org/T411506)
[21:20:58] <wikibugs>	 (03CR) 10Andrea Denisse: "To merge once she ensures access with her previous key." [puppet] - 10https://gerrit.wikimedia.org/r/1215273 (https://phabricator.wikimedia.org/T411506) (owner: 10Andrea Denisse)
[21:23:31] <AaronSchulz>	 James_F: thanks
[21:23:50] <James_F>	 Of course. Thanks for getting rid of old API endpoints. :-)
[21:24:07] <logmsgbot>	 !log jforrester@deploy2002 Finished scap sync-world: Backport for [[gerrit:1215251|[tokwiki] Allow sysops to grant/remove confirmed status (T411683)]], [[gerrit:1214659|OATHAuth: Remove wmgOATHAuthDisableRight (T399664)]], [[gerrit:1214143|Remove /data-parsoid/ endpoint from specs per T393557 (T411517)]], [[gerrit:1215263|Shorten 'close' cookie wait period for enwiki banners (T411800)]] (duration: 10m 04s)
[21:24:16] <stashbot>	 T411683: Allow tokwiki admins to grant and remove 'confirmed' - https://phabricator.wikimedia.org/T411683
[21:24:16] <stashbot>	 T399664: Expand 2FA Opt-In Privileges - https://phabricator.wikimedia.org/T399664
[21:24:17] <stashbot>	 T393557: Block external traffic to RESTBase /page/data-parsoid endpoint and investigate internal usage - https://phabricator.wikimedia.org/T393557
[21:24:17] <stashbot>	 T411517: Clean up Math API OpenAPI specs and remove data-parsoid route specs - https://phabricator.wikimedia.org/T411517
[21:24:17] <stashbot>	 T411800: CentralNotice code changes to show a banner to a reader with the 'waitdate: close' status - https://phabricator.wikimedia.org/T411800
[21:24:30] <maryum>	 I'm here for my deploy!
[21:24:32] <maryum>	 is it too late?
[21:24:37] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by jforrester@deploy2002 using scap backport" [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1215259 (https://phabricator.wikimedia.org/T411793) (owner: 10Jforrester)
[21:24:41] <Superpes>	 Many thanks for your assistance @James_F :3
[21:24:43] <maryum>	 I can also deploy on my own with spiderpig
[21:24:46] <James_F>	 maryum: Already done. All good.
[21:24:52] <maryum>	 yay thank you so much!
[21:24:58] <James_F>	 Superpes: Happy to help.
[21:25:04] <James_F>	 maryum: Of course! Have a good Thursday.
[21:25:16] <maryum>	 James_F you too!
[21:25:17] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Riku Silvola - https://phabricator.wikimedia.org/T411624#11434692 (10andrea.denisse) 05Open→03In progress a:03andrea.denisse
[21:25:28] <wikibugs>	 (03CR) 10Jforrester: [C:03+2] CdxDialog: use-close-button prop needs to be set to true [extensions/WikiLambda] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1215224 (https://phabricator.wikimedia.org/T411655) (owner: 10Jforrester)
[21:26:20] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Riku Silvola - https://phabricator.wikimedia.org/T411624#11434694 (10andrea.denisse)
[21:30:20] * James_F drums fingers again.
[21:37:28] <wikibugs>	 (03Merged) 10jenkins-bot: Followup Ie40b9e59a4: Fortify unified metrics method [core] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1215259 (https://phabricator.wikimedia.org/T411793) (owner: 10Jforrester)
[21:37:31] <wikibugs>	 (03Merged) 10jenkins-bot: CdxDialog: use-close-button prop needs to be set to true [extensions/WikiLambda] (wmf/1.46.0-wmf.5) - 10https://gerrit.wikimedia.org/r/1215224 (https://phabricator.wikimedia.org/T411655) (owner: 10Jforrester)
[21:37:48] <logmsgbot>	 !log jforrester@deploy2002 Started scap sync-world: Backport for [[gerrit:1215259|Followup Ie40b9e59a4: Fortify unified metrics method (T411793)]]
[21:37:51] <stashbot>	 T411793: Fortify new API metrics method - https://phabricator.wikimedia.org/T411793
[21:37:53] <James_F>	 Finally.
[21:40:02] <logmsgbot>	 !log jforrester@deploy2002 jforrester: Backport for [[gerrit:1215259|Followup Ie40b9e59a4: Fortify unified metrics method (T411793)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:40:59] <logmsgbot>	 !log jforrester@deploy2002 jforrester: Continuing with sync
[21:43:26] <wikibugs>	 (03CR) 10Kamila Součková: [C:03+1] "yay, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1214509 (owner: 10Majavah)
[21:45:04] <logmsgbot>	 !log jforrester@deploy2002 Finished scap sync-world: Backport for [[gerrit:1215259|Followup Ie40b9e59a4: Fortify unified metrics method (T411793)]] (duration: 07m 16s)
[21:45:08] <stashbot>	 T411793: Fortify new API metrics method - https://phabricator.wikimedia.org/T411793
[21:45:09] <James_F>	 cscott: Over to you.
[21:45:25] <cscott>	 ok, fingers crossed on this one ;)
[21:46:10] <James_F>	 cscott: The last possible deploy slot just before everyone travels and there's no train next week; what could possibly go wrong? ;-)
[21:46:18] <cscott>	 exactly!
[21:47:53] <cscott>	 i'm only touching officewiki and test/test2 wiki though, so i hope the damage i can do is limited
[21:47:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[21:49:22] <wikibugs>	 (03PS3) 10C. Scott Ananian: Activate postprocessing cache on testwiki, test2wiki, officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215115 (https://phabricator.wikimedia.org/T348255) (owner: 10Isabelle Hurbain-Palatin)
[21:51:03] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] "Approved by cscott@deploy2002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215115 (https://phabricator.wikimedia.org/T348255) (owner: 10Isabelle Hurbain-Palatin)
[21:52:01] <wikibugs>	 (03Merged) 10jenkins-bot: Activate postprocessing cache on testwiki, test2wiki, officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215115 (https://phabricator.wikimedia.org/T348255) (owner: 10Isabelle Hurbain-Palatin)
[21:52:22] <logmsgbot>	 !log cscott@deploy2002 Started scap sync-world: Backport for [[gerrit:1215115|Activate postprocessing cache on testwiki, test2wiki, officewiki (T348255)]]
[21:52:26] <stashbot>	 T348255: Parser cache infrastructure for OutputTransform - https://phabricator.wikimedia.org/T348255
[21:54:27] <logmsgbot>	 !log cscott@deploy2002 ihurbain, cscott: Backport for [[gerrit:1215115|Activate postprocessing cache on testwiki, test2wiki, officewiki (T348255)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[21:59:40] <jinxer-wm>	 FIRING: SystemdUnitFailed: send_tile_invalidations.service on maps1011:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:59:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[22:00:05] <jouncebot>	 Deploy window Web Team deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251204T2200)
[22:00:56] <cscott>	 finishing up, just testing still
[22:02:40] <sbassett>	 Hey all - would like to get a couple of security patches out if backports are wrapping up and the Web Team won’t be using their window.
[22:02:41] <logmsgbot>	 !log cscott@deploy2002 ihurbain, cscott: Continuing with sync
[22:04:20] <wikibugs>	 06SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for Riku Silvola - https://phabricator.wikimedia.org/T411624#11434841 (10andrea.denisse)
[22:04:25] <cscott>	 yup, i'm just wrapping up (from backports), can't speak for web team.
[22:05:14] <greg-g>	 thanks James_F for always being lovely and a wonderful deployer :)
[22:05:34] <James_F>	 greg-g: Always.
[22:05:50] <James_F>	 greg-g: Thanks to FRT for being wonderful people doing great work.
[22:05:55] <greg-g>	 we try!
[22:06:45] <logmsgbot>	 !log cscott@deploy2002 Finished scap sync-world: Backport for [[gerrit:1215115|Activate postprocessing cache on testwiki, test2wiki, officewiki (T348255)]] (duration: 14m 23s)
[22:06:49] <stashbot>	 T348255: Parser cache infrastructure for OutputTransform - https://phabricator.wikimedia.org/T348255
[22:07:26] <wikibugs>	 (03PS1) 10Hubaishan: [config] arwiktionary: add 2 namespaces with talks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215280 (https://phabricator.wikimedia.org/T411819)
[22:08:13] <wikibugs>	 (03CR) 10CI reject: [V:04-1] [config] arwiktionary: add 2 namespaces with talks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215280 (https://phabricator.wikimedia.org/T411819) (owner: 10Hubaishan)
[22:09:01] <wikibugs>	 (03PS1) 10Andrea Denisse: Add Riku Silvola to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/1215281 (https://phabricator.wikimedia.org/T411624)
[22:11:08] <logmsgbot>	 !log ryankemper@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stat[1008-1011].eqiad.wmnet with reason: T411568
[22:11:13] <stashbot>	 T411568: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts - https://phabricator.wikimedia.org/T411568
[22:11:15] <wikibugs>	 06SRE, 10SRE-Access-Requests, 13Patch-For-Review: Requesting access to analytics-privatedata-users for Riku Silvola - https://phabricator.wikimedia.org/T411624#11434872 (10andrea.denisse) 05In progress→03Resolved Closing as resolved, please let me know if there's anything else I can assist with.
[22:12:32] <wikibugs>	 06SRE, 06collaboration-services, 06Traffic, 06Release-Engineering-Team (Radar): Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11434875 (10Dzahn) We have to switch these hosts from nftables back to ferm as firewall provider.  Reason: liberica does not support nftables yet.
[22:14:05] <cscott>	 i'm done, thanks James_F for letting me stretch the window. ;)
[22:14:43] <cscott>	 i learned fun new things about FlaggedRevisions and what's a day like without some surprising new thing to learn?
[22:16:04] <wikibugs>	 (03PS1) 10Dzahn: tcpproxy: switch firewall provider from nftables to ferm [puppet] - 10https://gerrit.wikimedia.org/r/1215284 (https://phabricator.wikimedia.org/T408532)
[22:18:16] <sbassett>	 Ok, starting on the 2 sec deploys unless there are any objections...
[22:18:28] <wikibugs>	 (03PS2) 10Hubaishan: [config] arwiktionary: add 2 namespaces with talks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215280 (https://phabricator.wikimedia.org/T411819)
[22:18:31] <cscott>	 none from me
[22:19:49] <wikibugs>	 (03CR) 10ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, December 08 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1215280 (https://phabricator.wikimedia.org/T411819) (owner: 10Hubaishan)
[22:20:44] <ryankemper>	 !log T411568 Rebooting `stat*`
[22:20:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:20:48] <stashbot>	 T411568: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts - https://phabricator.wikimedia.org/T411568
[22:22:16] <logmsgbot>	 !log dzahn@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: T408532
[22:22:20] <stashbot>	 T408532: Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532
[22:23:08] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "puppet disabled on all - downtimed all - .. switching a single one first" [puppet] - 10https://gerrit.wikimedia.org/r/1215284 (https://phabricator.wikimedia.org/T408532) (owner: 10Dzahn)
[22:28:25] <sbassett>	 !log Deployed security fix for T408135
[22:28:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:29:54] <jinxer-wm>	 RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[22:30:41] <wikibugs>	 06SRE, 10SRE-Access-Requests, 06Fundraising-Backlog: Requesting access to analytics-privatedata-users for astein - https://phabricator.wikimedia.org/T411679#11434945 (10andrea.denisse) 05In progress→03Resolved Closing as resolved, feel free to reopen if there's anything else I can assist with.
[22:31:26] <wikibugs>	 06SRE, 10SRE-Access-Requests: Add FIDO-backed SSH key for brennen - https://phabricator.wikimedia.org/T411730#11434954 (10andrea.denisse) Hi folks, the patch for this task is merged. Can we close it as resolved?
[22:35:23] <logmsgbot>	 !log ryankemper@cumin2002 START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
[22:36:26] <wikibugs>	 06SRE, 06Data-Platform-SRE (2025.11.07 - 2025.11.28), 13Patch-For-Review: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts - https://phabricator.wikimedia.org/T411568#11434969 (10RKemper) Stat host reboots completed.  Shifting gears to rebooting `an-test*`. Note there's still lots of `an-...
[22:37:04] <wikibugs>	 06SRE, 10SRE-Access-Requests, 10LDAP-Access-Requests: Grant Access to analytics-privatedata-users for Silvia G - https://phabricator.wikimedia.org/T411436#11434970 (10andrea.denisse)
[22:37:10] <sbassett>	 !log Deployed security fix for T409226
[22:37:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:41:23] <wikibugs>	 06SRE, 10SRE-Access-Requests, 10LDAP-Access-Requests: Grant Access to analytics-privatedata-users for Silvia G - https://phabricator.wikimedia.org/T411436#11434984 (10andrea.denisse) 05In progress→03Stalled >>! In T411436#11426873, @andrea.denisse wrote: >>>! In T411436#11425341, @SEgt-WMF wrote: >> In c...
[22:42:47] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-cluster
[22:42:47] <logmsgbot>	 !log dzahn@cumin2002 END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
[22:44:52] <sbassett>	 Sec deploys done, thanks.
[22:46:14] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy1002.eqiad.wmnet
[22:47:13] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy2001.codfw.wmnet
[22:48:08] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy2002.codfw.wmnet
[22:48:20] <wikibugs>	 14SRE-Sprint-Week-Sustainability-March2023, 10Beta-Cluster-Infrastructure, 06DBA, 10MediaWiki-libs-Rdbms, 07Epic: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255#11435001 (10Reedy)
[22:49:00] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy3001.esams.wmnet
[22:50:05] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy1002.eqiad.wmnet
[22:50:24] <wikibugs>	 06SRE, 10SRE-Access-Requests: Add FIDO-backed SSH key for brennen - https://phabricator.wikimedia.org/T411730#11435007 (10brennen) 05Open→03Resolved a:03brennen Confirmed new key is working against production machines. Thanks!
[22:50:42] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy4001.ulsfo.wmnet
[22:51:02] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy2001.codfw.wmnet
[22:51:25] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy4001.ulsfo.wmnet
[22:51:33] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy3002.esams.wmnet
[22:51:39] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy4002.ulsfo.wmnet
[22:51:49] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy2002.codfw.wmnet
[22:52:35] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy5001.eqsin.wmnet
[22:52:55] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy3001.esams.wmnet
[22:54:54] <jinxer-wm>	 FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST secrets) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[22:55:12] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[22:55:23] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy4002.ulsfo.wmnet
[22:55:28] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy3002.esams.wmnet
[22:55:32] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy5002.eqsin.wmnet
[22:55:56] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy6001.drmrs.wmnet
[22:56:14] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy6002.drmrs.wmnet
[22:56:37] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy5001.eqsin.wmnet
[22:58:30] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy7001.magru.wmnet
[22:59:34] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy5002.eqsin.wmnet
[22:59:52] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy6001.drmrs.wmnet
[23:00:11] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy6002.drmrs.wmnet
[23:00:22] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.reboot-single for host tcp-proxy7002.magru.wmnet
[23:02:27] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy7001.magru.wmnet
[23:04:18] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy7002.magru.wmnet
[23:05:59] <wikibugs>	 06SRE, 06collaboration-services, 06Traffic, 06Release-Engineering-Team (Radar): Deploy a TCP proxy across all DCs - https://phabricator.wikimedia.org/T408532#11435041 (10Dzahn) downtimed, ran puppet, rebooted the 14 VMs and verified ferm service is running via cumin/cookbook. they are all on ferm now.
[23:07:38] <wikibugs>	 (03CR) 10Dzahn: "I have switched the tcp-proxy VMs to ferm now." [puppet] - 10https://gerrit.wikimedia.org/r/1215240 (owner: 10CDanis)
[23:10:28] <wikibugs>	 (03PS4) 10Dzahn: tcpproxy: add lvs::realserver:* to puppet role (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/1215240 (owner: 10CDanis)
[23:11:04] <wikibugs>	 (03CR) 10Dzahn: tcpproxy: add lvs::realserver:* to puppet role (WIP) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1215240 (owner: 10CDanis)
[23:12:47] <wikibugs>	 (03PS5) 10Dzahn: tcpproxy: add lvs::realserver:* to puppet role (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/1215240 (owner: 10CDanis)
[23:13:01] <wikibugs>	 (03CR) 10Dzahn: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1215240 (owner: 10CDanis)
[23:15:14] <wikibugs>	 (03PS3) 10Dzahn: miscweb: add wikipedia25.org to extra SANs [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215225 (https://phabricator.wikimedia.org/T408592)
[23:16:40] <tzatziki>	 !log removing 5 files for legal compliance
[23:16:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:16:45] <logmsgbot>	 !log ryankemper@cumin2002 END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
[23:17:56] <wikibugs>	 (03CR) 10Dzahn: "for context on the current status of that domain: https://phabricator.wikimedia.org/T408168" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1215225 (https://phabricator.wikimedia.org/T408592) (owner: 10Dzahn)
[23:19:20] <wikibugs>	 (03CR) 10Dzahn: [V:03+1] "compiler output shows noop on all the hosts in the commit message and on tcpproxy itself: https://puppet-compiler.wmflabs.org/output/12152" [puppet] - 10https://gerrit.wikimedia.org/r/1215240 (owner: 10CDanis)
[23:20:11] <jinxer-wm>	 FIRING: Temperature: Temp issue on wdqs1023:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=wdqs1023 - https://alerts.wikimedia.org/?q=alertname%3DTemperature
[23:23:07] <tzatziki>	 !log removing 3 files for legal compliance
[23:23:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:24:00] <wikibugs>	 06SRE, 06collaboration-services, 13Patch-For-Review: setup gerrit2003 with gerrit service (gerrit on bookworm) - https://phabricator.wikimedia.org/T372804#11435088 (10Dzahn) 05In progress→03Resolved I am happy to be convinced otherwise and if you want to reopen it that's not a big deal to me. But all...
[23:25:11] <jinxer-wm>	 RESOLVED: Temperature: Temp issue on wdqs1023:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=wdqs1023 - https://alerts.wikimedia.org/?q=alertname%3DTemperature
[23:27:01] <wikibugs>	 (03CR) 10Dzahn: [C:03+1] Remove unused SSH key for Zoe [puppet] - 10https://gerrit.wikimedia.org/r/1215273 (https://phabricator.wikimedia.org/T411506) (owner: 10Andrea Denisse)
[23:30:12] <jinxer-wm>	 FIRING: [4x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
[23:34:56] <tzatziki>	 !log removing 2 files for legal compliance
[23:34:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:36:57] <wikibugs>	 (03PS1) 10Dzahn: trafficserver: add a map for gerrit as a backend [puppet] - 10https://gerrit.wikimedia.org/r/1215317 (https://phabricator.wikimedia.org/T365259)
[23:38:07] <wikibugs>	 (03CR) 10Dzahn: "[cumin2002:~] $ host gerrit.discovery.wmnet" [puppet] - 10https://gerrit.wikimedia.org/r/1215317 (https://phabricator.wikimedia.org/T365259) (owner: 10Dzahn)
[23:44:02] <wikibugs>	 (03CR) 10CDanis: [C:03+1] "LGTM ship it ! puppet should configure the servers with extra loopback addresses with the v4 and v6 in each cluster" [puppet] - 10https://gerrit.wikimedia.org/r/1215240 (owner: 10CDanis)
[23:45:18] <wikibugs>	 (03PS6) 10Dzahn: tcpproxy: add lvs::realserver:* to puppet role [puppet] - 10https://gerrit.wikimedia.org/r/1215240 (owner: 10CDanis)
[23:47:05] <jinxer-wm>	 FIRING: KubernetesCalicoDown: ml-serve1013.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s-mlserve&var-instance=ml-serve1013.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[23:47:50] <tzatziki>	 !log removing 4 files for legal compliance
[23:47:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:48:18] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "thanks! doing" [puppet] - 10https://gerrit.wikimedia.org/r/1215240 (owner: 10CDanis)
[23:53:53] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "tested on tcp-proxy1001 first - I got 4 new interfaces: tunl0@NONE, ipip0@NONE, ip6tnl0@NONE and ipip60@NONE. And I got the gerrit-lb IP o" [puppet] - 10https://gerrit.wikimedia.org/r/1215240 (owner: 10CDanis)