[00:03:44] <jinxer-wm>	 FIRING: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[00:08:21] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1189377
[00:08:21] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1189377 (owner: TrainBranchBot)
[00:13:44] <jinxer-wm>	 RESOLVED: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[00:31:44] <jinxer-wm>	 FIRING: RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[00:32:40] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1189377 (owner: TrainBranchBot)
[00:36:44] <jinxer-wm>	 FIRING: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[00:54:50] <wikibugs>	 (CR) Btullis: [C:+1] opensearch-operator: fix pod security settings [deployment-charts] - https://gerrit.wikimedia.org/r/1189320 (https://phabricator.wikimedia.org/T362978) (owner: Bking)
[00:56:44] <jinxer-wm>	 RESOLVED: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[01:00:44] <logmsgbot>	 !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image
[01:12:30] <logmsgbot>	 !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 11m 46s)
[01:24:07] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release airflow-dev/file-export-test-instance on k8s-dse@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=airflow-dev - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[01:34:00] <jinxer-wm>	 FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-codfw:et-0/0/6 (Core: lsw1-f2-codfw:ethernet-1/55 {#130117100025}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[02:03:44] <jinxer-wm>	 FIRING: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[02:29:11] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:36:41] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:48:44] <jinxer-wm>	 RESOLVED: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[02:55:48] <wikibugs>	 (PS1) KartikMistry: Update Recommendation API to 2025-09-15-194552-production [deployment-charts] - https://gerrit.wikimedia.org/r/1189380 (https://phabricator.wikimedia.org/T404223)
[03:03:18] <wikibugs>	 (PS1) KartikMistry: Update cxserver to 2025-09-16-161231-production [deployment-charts] - https://gerrit.wikimedia.org/r/1189381 (https://phabricator.wikimedia.org/T394008)
[03:19:00] <jinxer-wm>	 FIRING: [2x] OsmSynchronisationLag: Maps - OSM synchronization lag - eqiad - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[03:41:21] <wikibugs>	 SRE: Update Wikitech "Search Console Data" doc to align with current ITS-first request process - https://phabricator.wikimedia.org/T404927#11192414 (nshahquinn-wmf) Open→Resolved a:nshahquinn-wmf I was actually coincidentally working on search console documentation, so I've gone ahead and mad...
[03:44:00] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[03:56:06] <jinxer-wm>	 FIRING: CoreRouterInterfaceDown: Core router interface down - cr1-esams:xe-0/0/8 () - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr1-esams:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[04:03:44] <jinxer-wm>	 FIRING: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[04:48:44] <jinxer-wm>	 RESOLVED: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqiad RIPE Atlas anchor: failures over threshold for measurement 96503802 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
[05:03:17] <jinxer-wm>	 FIRING: ProbeDown: Service wdqs2013:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2013:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[05:08:17] <jinxer-wm>	 RESOLVED: ProbeDown: Service wdqs2013:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2013:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[05:09:00] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:16:37] <icinga-wm>	 RECOVERY - mysqld processes on es2027 is OK: PROCS OK: 1 process with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[05:16:43] <icinga-wm>	 RECOVERY - MariaDB read only es3 on es2027 is OK: Version 10.11.13-MariaDB-log, Uptime 7s, read_only: True, event_scheduler: True, 4.15 QPS, connection latency: 0.028914s https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[05:19:30] <logmsgbot>	 !log fceratto@cumin1002 START - Cookbook sre.mysql.pool es2027 gradually with 4 steps - Pool es2027.codfw.wmnet in after cloning
[05:24:07] <jinxer-wm>	 FIRING: HelmReleaseBadStatus: Helm release airflow-dev/file-export-test-instance on k8s-dse@eqiad in state failed - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-dse&var-namespace=airflow-dev - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[05:32:49] <wikibugs>	 (PS4) Arnaudb: gerrit: toggle mod_qos log_only off [puppet] - https://gerrit.wikimedia.org/r/1189386 (https://phabricator.wikimedia.org/T402611)
[05:32:49] <wikibugs>	 (CR) Arnaudb: "I'll send a notice on IRC and slack before merging this" [puppet] - https://gerrit.wikimedia.org/r/1189386 (https://phabricator.wikimedia.org/T402611) (owner: Arnaudb)
[05:34:00] <jinxer-wm>	 RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[05:34:06] <jinxer-wm>	 FIRING: [2x] SwitchCoreInterfaceDown: Switch core interface down - ssw1-f1-codfw:et-0/0/6 (Core: lsw1-f2-codfw:ethernet-1/55 {#130117100025}) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Switch_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=ssw1-f1-codfw:9804 - https://alerts.wikimedia.org/?q=alertname%3DSwitchCoreInterfaceDown
[05:54:58] <kart_>	 Deploying cxserver..
[05:56:16] <wikibugs>	 (CR) KartikMistry: [C:+2] Update cxserver to 2025-09-16-161231-production [deployment-charts] - https://gerrit.wikimedia.org/r/1189381 (https://phabricator.wikimedia.org/T394008) (owner: KartikMistry)
[05:57:56] <wikibugs>	 (Merged) jenkins-bot: Update cxserver to 2025-09-16-161231-production [deployment-charts] - https://gerrit.wikimedia.org/r/1189381 (https://phabricator.wikimedia.org/T394008) (owner: KartikMistry)
[05:59:56] <logmsgbot>	 !log kartik@deploy1003 helmfile [staging] START helmfile.d/services/cxserver: apply
[06:00:05] <jouncebot>	 Deploy window MediaWiki infrastructure (UTC early) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250918T0600)
[06:00:05] <jouncebot>	 marostegui, Amir1, and federico3: Time to snap out of that daydream and deploy Primary database switchover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250918T0600).
[06:00:21] <logmsgbot>	 !log kartik@deploy1003 helmfile [staging] DONE helmfile.d/services/cxserver: apply
[06:04:36] <logmsgbot>	 !log kartik@deploy1003 helmfile [codfw] START helmfile.d/services/cxserver: apply
[06:05:10] <logmsgbot>	 !log kartik@deploy1003 helmfile [codfw] DONE helmfile.d/services/cxserver: apply
[06:05:20] <logmsgbot>	 !log fceratto@cumin1002 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2027 gradually with 4 steps - Pool es2027.codfw.wmnet in after cloning
[06:05:21] <logmsgbot>	 !log fceratto@cumin1002 END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2027.codfw.wmnet onto es2050.codfw.wmnet
[06:05:29] <logmsgbot>	 !log kartik@deploy1003 helmfile [eqiad] START helmfile.d/services/cxserver: apply
[06:06:02] <logmsgbot>	 !log kartik@deploy1003 helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
[06:08:23] <kart_>	 !log Updated cxserver to 2025-09-16-161231-production (T394008, T404567, T404298, T404181)
[06:08:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:32] <stashbot>	 T394008: CXServer doesn't support section suggestions for "be-tarask" language code - https://phabricator.wikimedia.org/T394008
[06:08:33] <stashbot>	 T404567: Post-creation work for tokwiki - https://phabricator.wikimedia.org/T404567
[06:08:35] <stashbot>	 T404298: Can't translate en:Tokyo in Gujarati - https://phabricator.wikimedia.org/T404298
[06:08:35] <stashbot>	 T404181: When templatedata is missing cxserver fails to extract template params from template source code - https://phabricator.wikimedia.org/T404181
[06:12:23] <wikibugs>	 (CR) Brouberol: opensearch-operator: fix pod security settings (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/1189320 (https://phabricator.wikimedia.org/T362978) (owner: Bking)
[06:29:11] <jinxer-wm>	 FIRING: SystemdUnitFailed: prometheus_amd_rocm_stats.service on ml-serve1013:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:34:37] <logmsgbot>	 !log jynus@cumin1003 dbctl commit (dc=all): 'Depool es2027 T404940', diff saved to https://phabricator.wikimedia.org/P83420 and previous config saved to /var/cache/conftool/dbconfig/20250918-063436-jynus.json
[06:34:42] <stashbot>	 T404940: es2027 database unhealthy - https://phabricator.wikimedia.org/T404940
[06:36:41] <jinxer-wm>	 FIRING: SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:38:16] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Apply installserver role to install1005 [puppet] - https://gerrit.wikimedia.org/r/1189169 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[06:38:54] <wikibugs>	 (CR) Majavah: [C:+2] backy2: Drop buster support [puppet] - https://gerrit.wikimedia.org/r/1188825 (owner: Majavah)
[06:45:40] <wikibugs>	 (CR) ScheduleDeploymentBot: "Scheduled for deployment in the [Thursday, September 18 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - https://gerrit.wikimedia.org/r/1182186 (https://phabricator.wikimedia.org/T401590) (owner: Ebernhardson)
[06:46:38] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 54829 bytes in 4.912 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:46:40] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 9235 bytes in 5.072 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[06:51:33] <wikibugs>	 (PS1) Muehlenhoff: homer: Update the DHCP server in eqiad [homer/public] - https://gerrit.wikimedia.org/r/1189389 (https://phabricator.wikimedia.org/T396487)
[06:56:22] <wikibugs>	 (PS1) Slyngshede: Release version 0.1.13 [software/bitu] - https://gerrit.wikimedia.org/r/1189390 (https://phabricator.wikimedia.org/T403691)
[07:00:05] <jouncebot>	 Amir1, Urbanecm, and awight: gettimeofday() says it's time for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20250918T0700)
[07:00:05] <jouncebot>	 dcausse: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:00:10] <dcausse>	 o/
[07:00:13] <dcausse>	 I can deploy
[07:04:21] <wikibugs>	 (CR) TrainBranchBot: [C:+2] "Approved by dcausse@deploy1003 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/1182186 (https://phabricator.wikimedia.org/T401590) (owner: Ebernhardson)
[07:05:10] <wikibugs>	 (Merged) jenkins-bot: cirrus: Reduce galleries weight in search on commons [mediawiki-config] - https://gerrit.wikimedia.org/r/1182186 (https://phabricator.wikimedia.org/T401590) (owner: Ebernhardson)
[07:06:00] <logmsgbot>	 !log dcausse@deploy1003 Started scap sync-world: Backport for [[gerrit:1182186|cirrus: Reduce galleries weight in search on commons (T401590)]]
[07:06:04] <wikibugs>	 (PS2) Majavah: P:toolforge::prometheus: Drop buster support [puppet] - https://gerrit.wikimedia.org/r/1188829
[07:06:05] <stashbot>	 T401590: Adjust CirrusSearchNamespaceWeights for Commons - https://phabricator.wikimedia.org/T401590
[07:06:44] <icinga-wm>	 PROBLEM - mailman list info on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:06:44] <icinga-wm>	 PROBLEM - mailman archives on lists1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:11:27] <wikibugs>	 (CR) Giuseppe Lavagetto: [C:+2] Add deprecations to varnish [puppet] - https://gerrit.wikimedia.org/r/1180712 (https://phabricator.wikimedia.org/T398161) (owner: Giuseppe Lavagetto)
[07:12:10] <logmsgbot>	 !log dcausse@deploy1003 dcausse, ebernhardson: Backport for [[gerrit:1182186|cirrus: Reduce galleries weight in search on commons (T401590)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
[07:12:15] <stashbot>	 T401590: Adjust CirrusSearchNamespaceWeights for Commons - https://phabricator.wikimedia.org/T401590
[07:16:34] <icinga-wm>	 RECOVERY - mailman archives on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 54827 bytes in 0.093 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:16:34] <icinga-wm>	 RECOVERY - mailman list info on lists1004 is OK: HTTP OK: HTTP/1.1 200 OK - 9234 bytes in 0.180 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:17:06] <logmsgbot>	 !log dcausse@deploy1003 dcausse, ebernhardson: Continuing with sync
[07:18:55] <wikibugs>	 (CR) Stevemunene: Add a dummy Ceph user keys for the cephcsi plugin to use (1 comment) [labs/private] - https://gerrit.wikimedia.org/r/1189133 (https://phabricator.wikimedia.org/T404576) (owner: Stevemunene)
[07:19:00] <jinxer-wm>	 FIRING: [2x] OsmSynchronisationLag: Maps - OSM synchronization lag - eqiad - https://wikitech.wikimedia.org/wiki/Maps/Runbook - https://grafana.wikimedia.org/d/000000305/maps-performances - https://alerts.wikimedia.org/?q=alertname%3DOsmSynchronisationLag
[07:20:51] <jinxer-wm>	 FIRING: [3x] CoreRouterInterfaceDown: Core router interface down - cr1-esams:xe-0/0/8 () - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:21:46] <wikibugs>	 (PS4) Stevemunene: dse-k8s:Enable CSI and the Ceph CSI plugin on dse-k8s-codfw [deployment-charts] - https://gerrit.wikimedia.org/r/1188754 (https://phabricator.wikimedia.org/T404576)
[07:22:21] <logmsgbot>	 !log dcausse@deploy1003 Finished scap sync-world: Backport for [[gerrit:1182186|cirrus: Reduce galleries weight in search on commons (T401590)]] (duration: 16m 20s)
[07:22:25] <stashbot>	 T401590: Adjust CirrusSearchNamespaceWeights for Commons - https://phabricator.wikimedia.org/T401590
[07:26:09] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [software/bitu] - https://gerrit.wikimedia.org/r/1189390 (https://phabricator.wikimedia.org/T403691) (owner: Slyngshede)
[07:27:37] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Update DHCP server in eqiad to install1005 [puppet] - https://gerrit.wikimedia.org/r/1189170 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[07:28:53] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] "\o/" [puppet] - https://gerrit.wikimedia.org/r/1188829 (owner: Majavah)
[07:29:00] <wikibugs>	 (CR) Majavah: [C:+2] P:toolforge::prometheus: Drop buster support [puppet] - https://gerrit.wikimedia.org/r/1188829 (owner: Majavah)
[07:32:15] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6980/co" [puppet] - https://gerrit.wikimedia.org/r/1188827 (owner: Majavah)
[07:32:45] <wikibugs>	 (CR) Majavah: [V:+1 C:+2] P:toolforge::checker: Remove absent checks [puppet] - https://gerrit.wikimedia.org/r/1188827 (owner: Majavah)
[07:34:33] <wikibugs>	 (CR) Majavah: [V:+2 C:+2] "ignoring typos false positive" [puppet] - https://gerrit.wikimedia.org/r/1188828 (owner: Majavah)
[07:36:12] <wikibugs>	 (PS1) Majavah: openstack: Drop obsolete linuxbridge config files [puppet] - https://gerrit.wikimedia.org/r/1189393
[07:36:12] <wikibugs>	 (PS1) Majavah: P:openstack: nova: Drop obsolete settings [puppet] - https://gerrit.wikimedia.org/r/1189394
[07:38:03] <wikibugs>	 (CR) Majavah: [V:+1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/6981/console" [puppet] - https://gerrit.wikimedia.org/r/1189394 (owner: Majavah)
[07:40:34] <wikibugs>	 (PS1) Majavah: O:aptly::server: Remove unused role [puppet] - https://gerrit.wikimedia.org/r/1189395 (https://phabricator.wikimedia.org/T399076)
[07:40:51] <jinxer-wm>	 FIRING: [3x] CoreRouterInterfaceDown: Core router interface down - cr1-esams:xe-0/0/8 () - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down  - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
[07:41:26] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:42:05] <wikibugs>	 (CR) Majavah: [C:+2] O:aptly::server: Remove unused role [puppet] - https://gerrit.wikimedia.org/r/1189395 (https://phabricator.wikimedia.org/T399076) (owner: Majavah)
[07:42:18] <icinga-wm>	 PROBLEM - Check unit status of httpbb_kubernetes_mw-parsoid_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-parsoid_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[07:44:00] <jinxer-wm>	 FIRING: CertAlmostExpired: Certificate for service data-gateway-staging:30443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#data-gateway-staging:30443 - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
[07:45:02] <wikibugs>	 (CR) Slyngshede: "Right now the nda group isn't sync'ed because it's not listed as one of the groups Netbox needs. We did talk about it at a previous Infras" [puppet] - https://gerrit.wikimedia.org/r/1189142 (https://phabricator.wikimedia.org/T404494) (owner: Slyngshede)
[07:46:41] <wikibugs>	 (CR) Slyngshede: [C:+2] Release version 0.1.13 [software/bitu] - https://gerrit.wikimedia.org/r/1189390 (https://phabricator.wikimedia.org/T403691) (owner: Slyngshede)
[07:48:36] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Point webproxy in eqiad to install1005 [dns] - https://gerrit.wikimedia.org/r/1189173 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[07:48:39] <wikibugs>	 (CR) Jelto: [C:+1] "lgtm but we should closely monitor metrics and user reports. I recall cloning repos over https opens several connections. So we should mak" [puppet] - https://gerrit.wikimedia.org/r/1189386 (https://phabricator.wikimedia.org/T402611) (owner: Arnaudb)
[07:48:50] <logmsgbot>	 !log jmm@dns1004 START - running authdns-update
[07:49:20] <wikibugs>	 (Merged) jenkins-bot: Release version 0.1.13 [software/bitu] - https://gerrit.wikimedia.org/r/1189390 (https://phabricator.wikimedia.org/T403691) (owner: Slyngshede)
[07:50:02] <logmsgbot>	 !log jmm@dns1004 END - running authdns-update
[07:50:41] <wikibugs>	 (CR) Stevemunene: dse-k8s:Enable CSI and the Ceph CSI plugin on dse-k8s-codfw (3 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/1188754 (https://phabricator.wikimedia.org/T404576) (owner: Stevemunene)
[07:55:53] <wikibugs>	 (CR) Arnaudb: [C:+2] "100%! I'll progressively rollout from spare to primary with puppet-agent disabled" [puppet] - https://gerrit.wikimedia.org/r/1189386 (https://phabricator.wikimedia.org/T402611) (owner: Arnaudb)
[07:57:07] <wikibugs>	 (CR) Muehlenhoff: [C:+2] Update the proxies used by cloudcumin to install1005 [puppet] - https://gerrit.wikimedia.org/r/1189171 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[08:08:37] <wikibugs>	 (CR) Elukey: [C:+1] homer: Update the DHCP server in eqiad [homer/public] - https://gerrit.wikimedia.org/r/1189389 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[08:09:21] <wikibugs>	 (CR) Muehlenhoff: [C:+2] homer: Update the DHCP server in eqiad [homer/public] - https://gerrit.wikimedia.org/r/1189389 (https://phabricator.wikimedia.org/T396487) (owner: Muehlenhoff)
[08:12:56] <wikibugs>	 (PS1) Slyngshede: IDM: Failover for 0.1.13 upgrade [dns] - https://gerrit.wikimedia.org/r/1189434
[08:19:25] <wikibugs>	 sre-alert-triage, Infrastructure-Foundations, netops: Alert in need of triage: SwitchCoreInterfaceDown (instance ssw1-f1-codfw:9804) - https://phabricator.wikimedia.org/T404946 (LSobanski) NEW
[08:21:22] <wikibugs>	 (CR) Novem Linguae: "As a recently added volunteer NDA, I find these IDP-protected tools a lot like a second Wikitech. There's great info in some of them, and " [puppet] - https://gerrit.wikimedia.org/r/1189142 (https://phabricator.wikimedia.org/T404494) (owner: Slyngshede)
[08:25:51] <wikibugs>	 (CR) Elukey: "I am super sorry to review this only now, thanks a lot for the patch :) I totally understand that this is a poc and it needs more refineme" [docker-images/docker-report] - https://gerrit.wikimedia.org/r/1173425 (https://phabricator.wikimedia.org/T397696) (owner: CDanis)
[08:31:00] <wikibugs>	 (PS1) Gmodena: admin: add sk-ssh-ed25519 key for gmodena [puppet] - https://gerrit.wikimedia.org/r/1189435
[08:32:18] <icinga-wm>	 RECOVERY - Check unit status of httpbb_kubernetes_mw-parsoid_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-parsoid_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[08:33:14] <wikibugs>	 (CR) Slyngshede: [C:+2] IDM: Failover for 0.1.13 upgrade [dns] - https://gerrit.wikimedia.org/r/1189434 (owner: Slyngshede)
[08:33:35] <logmsgbot>	 !log slyngshede@dns1004 START - running authdns-update
[08:34:53] <logmsgbot>	 !log slyngshede@dns1004 END - running authdns-update
[08:36:26] <jinxer-wm>	 FIRING: [2x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed