Wikimedia IRC logs browser - #wikimedia-operations

Displaying 1378 items:

2025-11-03 00:03:27 <jinxer-wm> FIRING: ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 00:08:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 00:15:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 00:38:09 <wikibugs> (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1200697
2025-11-03 00:38:09 <wikibugs> (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1200697 (owner: TrainBranchBot)
2025-11-03 00:54:15 <wikibugs> (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1200697 (owner: TrainBranchBot)
2025-11-03 00:57:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 01:00:41 <logmsgbot> !log mwpresync@deploy2002 Started scap build-images: Publishing wmf/next image
2025-11-03 01:07:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 01:08:03 <wikibugs> (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1200709
2025-11-03 01:08:03 <wikibugs> (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1200709 (owner: TrainBranchBot)
2025-11-03 01:09:01 <jinxer-wm> FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
2025-11-03 01:09:01 <jinxer-wm> FIRING: [21x] CertAlmostExpired: Certificate for service cr2-eqsin.wikimedia.org:32767 is about to expire - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
2025-11-03 01:15:45 <logmsgbot> !log mwpresync@deploy2002 Finished scap build-images: Publishing wmf/next image (duration: 15m 04s)
2025-11-03 01:30:00 <wikibugs> (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1200709 (owner: TrainBranchBot)
2025-11-03 01:33:52 <jinxer-wm> FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2025-11-03 01:38:52 <jinxer-wm> FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2025-11-03 01:47:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 01:47:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 02:37:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 02:42:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 02:47:22 <wikibugs> (PS1) Pppery: Remove extended autoconfirmed time for Tor on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1200743 (https://phabricator.wikimedia.org/T405080)
2025-11-03 02:47:27 <jinxer-wm> RESOLVED: ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 02:48:09 <wikibugs> (CR) CI reject: [V:-1] Remove extended autoconfirmed time for Tor on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1200743 (https://phabricator.wikimedia.org/T405080) (owner: Pppery)
2025-11-03 02:48:39 <wikibugs> (PS2) Pppery: Remove extended autoconfirmed time for Tor on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1200743 (https://phabricator.wikimedia.org/T409022)
2025-11-03 02:49:22 <wikibugs> (PS3) Pppery: Remove extended autoconfirmed time for Tor on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1200743 (https://phabricator.wikimedia.org/T409022)
2025-11-03 02:49:27 <wikibugs> (CR) CI reject: [V:-1] Remove extended autoconfirmed time for Tor on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/1200743 (https://phabricator.wikimedia.org/T409022) (owner: Pppery)
2025-11-03 02:50:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 02:55:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 02:55:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 03:01:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 03:08:56 <wikibugs> (PS1) MusikAnimal: AbstractRenderer: ensure OutputPage::setDisplayTitle() gets passed safe HTML [extensions/CommunityRequests] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1200746
2025-11-03 03:10:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 03:21:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 03:26:05 <wikibugs> (CR) TrainBranchBot: [C:+2] "Approved by musikanimal@deploy2002 using scap backport" [extensions/CommunityRequests] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1200746 (owner: MusikAnimal)
2025-11-03 03:27:04 <wikibugs> (Merged) jenkins-bot: AbstractRenderer: ensure OutputPage::setDisplayTitle() gets passed safe HTML [extensions/CommunityRequests] (wmf/1.45.0-wmf.25) - https://gerrit.wikimedia.org/r/1200746 (owner: MusikAnimal)
2025-11-03 03:27:40 <logmsgbot> !log musikanimal@deploy2002 Started scap sync-world: Backport for [[gerrit:1200746|AbstractRenderer: ensure OutputPage::setDisplayTitle() gets passed safe HTML]]
2025-11-03 03:34:01 <jinxer-wm> FIRING: JobUnavailable: Reduced availability for job cloud_dev_pdns_rec in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2025-11-03 03:52:31 <logmsgbot> !log musikanimal@deploy2002 musikanimal: Backport for [[gerrit:1200746|AbstractRenderer: ensure OutputPage::setDisplayTitle() gets passed safe HTML]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 03:53:27 <logmsgbot> !log musikanimal@deploy2002 musikanimal: Continuing with sync
2025-11-03 04:07:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 04:07:35 <logmsgbot> !log musikanimal@deploy2002 Finished scap sync-world: Backport for [[gerrit:1200746|AbstractRenderer: ensure OutputPage::setDisplayTitle() gets passed safe HTML]] (duration: 39m 55s)
2025-11-03 04:12:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 04:12:36 <wikibugs> SRE, Incident Tooling, Traffic: ncredir redirects for status.wiki* --> status.wikimedia.org - https://phabricator.wikimedia.org/T318804#11334175 (Pppery)
2025-11-03 04:21:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 04:23:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 04:26:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 04:28:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 04:34:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 04:35:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 04:49:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 04:55:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 04:59:56 <wikibugs> ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11334176 (Papaul)
2025-11-03 05:00:25 <icinga-wm> PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 05:01:21 <icinga-wm> RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30031 bytes in 6.268 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 05:04:17 <wikibugs> ops-ulsfo, SRE, DC-Ops, Infrastructure-Foundations, netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11334177 (Papaul) @cmooney i update all the IP's to match the other POP sites. I will be re-running the configuration and validation sometimes this week in m...
2025-11-03 05:06:25 <icinga-wm> PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 05:07:19 <icinga-wm> RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30037 bytes in 3.183 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 05:08:52 <jinxer-wm> FIRING: [3x] JobUnavailable: Reduced availability for job cloud_dev_pdns_rec in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2025-11-03 05:09:01 <jinxer-wm> FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
2025-11-03 05:09:01 <jinxer-wm> FIRING: [21x] CertAlmostExpired: Certificate for service cr2-eqsin.wikimedia.org:32767 is about to expire - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
2025-11-03 05:16:25 <icinga-wm> PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 05:19:21 <icinga-wm> RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30033 bytes in 4.976 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 05:25:27 <icinga-wm> PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 05:26:17 <icinga-wm> RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30030 bytes in 0.375 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 05:33:52 <jinxer-wm> FIRING: [3x] JobUnavailable: Reduced availability for job cloud_dev_pdns_rec in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2025-11-03 05:35:11 <wikibugs> (CR) Fabfur: [C:+1] P:cache::varnish::frontend: render known-client rate limit VCL (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1198182 (https://phabricator.wikimedia.org/T403220) (owner: Scott French)
2025-11-03 05:36:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 05:38:52 <jinxer-wm> FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2025-11-03 05:41:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 05:49:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 05:51:37 <wikibugs> SRE: FY 25/26 WE 5.4.3: CDN (text) filtering rationalization - https://phabricator.wikimedia.org/T398161#11334184 (Joe) Open→Resolved
2025-11-03 05:55:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 06:04:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 06:05:07 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 06:06:27 <icinga-wm> PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 06:07:19 <icinga-wm> RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30033 bytes in 3.404 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 06:10:27 <icinga-wm> PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 06:11:19 <icinga-wm> RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30033 bytes in 3.626 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 06:11:45 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
2025-11-03 06:14:27 <icinga-wm> PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 06:15:06 <wikibugs> SRE, Hiddenparma, Traffic: Collect known client fingerprints for common libraries - https://phabricator.wikimedia.org/T409024 (Joe) NEW
2025-11-03 06:15:21 <wikibugs> SRE, Hiddenparma, Traffic: Collect known client fingerprints for common libraries - https://phabricator.wikimedia.org/T409024#11334201 (Joe) p:Triage→Medium
2025-11-03 06:16:25 <icinga-wm> RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30033 bytes in 7.718 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 06:18:50 <wikibugs> (PS1) Marostegui: mariadb: Move db1231 to s7 [puppet] - https://gerrit.wikimedia.org/r/1200751 (https://phabricator.wikimedia.org/T408829)
2025-11-03 06:19:08 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db1231 T408829', diff saved to https://phabricator.wikimedia.org/P84568 and previous config saved to /var/cache/conftool/dbconfig/20251103-061906-marostegui.json
2025-11-03 06:19:14 <stashbot> T408829: Move one s6 eqiad host to s7 - https://phabricator.wikimedia.org/T408829
2025-11-03 06:20:05 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db[1174,1231].eqiad.wmnet with reason: Moving db1231 to s7
2025-11-03 06:20:53 <logmsgbot> !log marostegui@cumin1003 START - Cookbook sre.mysql.clone of db1174.eqiad.wmnet onto db1231.eqiad.wmnet
2025-11-03 06:20:57 <logmsgbot> !log marostegui@cumin1003 START - Cookbook sre.mysql.depool db1174 - Depool db1174.eqiad.wmnet to then clone it to db1231.eqiad.wmnet - marostegui@cumin1003
2025-11-03 06:21:20 <logmsgbot> !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1174 - Depool db1174.eqiad.wmnet to then clone it to db1231.eqiad.wmnet - marostegui@cumin1003
2025-11-03 06:21:30 <wikibugs> (CR) Marostegui: [C:+2] mariadb: Move db1231 to s7 [puppet] - https://gerrit.wikimedia.org/r/1200751 (https://phabricator.wikimedia.org/T408829) (owner: Marostegui)
2025-11-03 06:22:27 <icinga-wm> PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 06:23:19 <icinga-wm> RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30033 bytes in 3.270 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 06:25:37 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
2025-11-03 06:25:56 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
2025-11-03 06:26:04 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1156 (T407997)', diff saved to https://phabricator.wikimedia.org/P84570 and previous config saved to /var/cache/conftool/dbconfig/20251103-062603-marostegui.json
2025-11-03 06:26:06 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 06:27:19 <wikibugs> (PS1) Marostegui: db2174: Migration to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1200752 (https://phabricator.wikimedia.org/T407463)
2025-11-03 06:28:20 <wikibugs> (CR) Marostegui: [C:+2] db2174: Migration to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1200752 (https://phabricator.wikimedia.org/T407463) (owner: Marostegui)
2025-11-03 06:29:16 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2174.codfw.wmnet with reason: Maintenance
2025-11-03 06:29:20 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db2174 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84571 and previous config saved to /var/cache/conftool/dbconfig/20251103-062919-marostegui.json
2025-11-03 06:36:55 <wikibugs> (PS9) Fabfur: P:cache:haproxy: introduce ua classes [puppet] - https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060)
2025-11-03 06:37:42 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db2174 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84572 and previous config saved to /var/cache/conftool/dbconfig/20251103-063742-root.json
2025-11-03 06:37:56 <wikibugs> (CR) Fabfur: P:cache:haproxy: introduce ua classes (2 comments) [puppet] - https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: Fabfur)
2025-11-03 06:38:32 <marostegui> !log Drop afl_ip related triggers from s2 T408780
2025-11-03 06:38:34 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 06:38:34 <stashbot> T408780: Drop abuse_filter_log trigger for afl_ip column - https://phabricator.wikimedia.org/T408780
2025-11-03 06:38:39 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T407997)', diff saved to https://phabricator.wikimedia.org/P84573 and previous config saved to /var/cache/conftool/dbconfig/20251103-063838-marostegui.json
2025-11-03 06:38:41 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 06:41:51 <wikibugs> SRE, Hiddenparma, Traffic: Collect known client fingerprints for common libraries and browsers - https://phabricator.wikimedia.org/T409024#11334225 (Joe)
2025-11-03 06:52:48 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db2174 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84574 and previous config saved to /var/cache/conftool/dbconfig/20251103-065248-root.json
2025-11-03 06:53:47 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P84575 and previous config saved to /var/cache/conftool/dbconfig/20251103-065346-marostegui.json
2025-11-03 06:57:07 <wikibugs> (PS1) Marostegui: db1177: Migration to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1200753
2025-11-03 06:57:40 <wikibugs> (CR) Marostegui: [C:+2] db1177: Migration to MariaDB 10.11 [puppet] - https://gerrit.wikimedia.org/r/1200753 (owner: Marostegui)
2025-11-03 06:58:05 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1177.eqiad.wmnet with reason: Maintenance
2025-11-03 06:58:09 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depool db1177 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84576 and previous config saved to /var/cache/conftool/dbconfig/20251103-065808-marostegui.json
2025-11-03 07:06:13 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84577 and previous config saved to /var/cache/conftool/dbconfig/20251103-070612-root.json
2025-11-03 07:07:57 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db2174 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84578 and previous config saved to /var/cache/conftool/dbconfig/20251103-070753-root.json
2025-11-03 07:08:58 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P84579 and previous config saved to /var/cache/conftool/dbconfig/20251103-070853-marostegui.json
2025-11-03 07:15:04 <wikibugs> (PS2) Ryan Kemper: wdqs: detect blazegraph deadlock [alerts] - https://gerrit.wikimedia.org/r/1198161 (https://phabricator.wikimedia.org/T389859)
2025-11-03 07:16:23 <wikibugs> (PS1) Marostegui: installserver: Do not reimage es1054 [puppet] - https://gerrit.wikimedia.org/r/1200754
2025-11-03 07:18:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 07:18:40 <wikibugs> (CR) Marostegui: [C:+2] installserver: Do not reimage es1054 [puppet] - https://gerrit.wikimedia.org/r/1200754 (owner: Marostegui)
2025-11-03 07:21:19 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84580 and previous config saved to /var/cache/conftool/dbconfig/20251103-072118-root.json
2025-11-03 07:23:03 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db2174 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84581 and previous config saved to /var/cache/conftool/dbconfig/20251103-072303-root.json
2025-11-03 07:23:34 <wikibugs> ('PS1) ''Marostegui: instances.yaml: Remove es1034 from dbctl [puppet] - ''https://gerrit.wikimedia.org/r/1200755 (https://phabricator.wikimedia.org/T409025)'
2025-11-03 07:24:08 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1156 (T407997)', diff saved to https://phabricator.wikimedia.org/P84582 and previous config saved to /var/cache/conftool/dbconfig/20251103-072405-marostegui.json
2025-11-03 07:24:16 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 07:24:23 <wikibugs> ('CR) ''Marostegui: [C:''+2] instances.yaml: Remove es1034 from dbctl [puppet] - ''https://gerrit.wikimedia.org/r/1200755 (https://phabricator.wikimedia.org/T409025) (owner: ''Marostegui)'
2025-11-03 07:24:24 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
2025-11-03 07:24:33 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1162 (T407997)', diff saved to https://phabricator.wikimedia.org/P84583 and previous config saved to /var/cache/conftool/dbconfig/20251103-072431-marostegui.json
2025-11-03 07:25:28 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Remove es1034 from dbctl T409025', diff saved to https://phabricator.wikimedia.org/P84584 and previous config saved to /var/cache/conftool/dbconfig/20251103-072527-marostegui.json
2025-11-03 07:25:34 <stashbot> T409025: decommission es1034.eqiad.wmnet - https://phabricator.wikimedia.org/T409025
2025-11-03 07:26:49 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T407997)', diff saved to https://phabricator.wikimedia.org/P84585 and previous config saved to /var/cache/conftool/dbconfig/20251103-072647-marostegui.json
2025-11-03 07:27:32 <wikibugs> ('PS1) ''Marostegui: backup1013.cnf.erb: Replace es1034 with es1057 [puppet] - ''https://gerrit.wikimedia.org/r/1200756 (https://phabricator.wikimedia.org/T409025)'
2025-11-03 07:28:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 07:29:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 07:29:49 <wikibugs> ('CR) ''Marostegui: "Jaime, this is a NOOP so I am merging it without waiting for you. es1057 was cloned from es1034, but neither of them have the dump user. D" [puppet] - ''https://gerrit.wikimedia.org/r/1200756 (https://phabricator.wikimedia.org/T409025) (owner: ''Marostegui)'
2025-11-03 07:29:52 <wikibugs> ('CR) ''Marostegui: [C:''+2] backup1013.cnf.erb: Replace es1034 with es1057 [puppet] - ''https://gerrit.wikimedia.org/r/1200756 (https://phabricator.wikimedia.org/T409025) (owner: ''Marostegui)'
2025-11-03 07:35:23 <wikibugs> ('CR) ''Marostegui: [C:''+2] "Just checked, none of the RO (es1-es5) section have the dump user. If this is expected, then nothing else to be done here. If it is not, " [puppet] - ''https://gerrit.wikimedia.org/r/1200756 (https://phabricator.wikimedia.org/T409025) (owner: ''Marostegui)'
2025-11-03 07:36:26 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84586 and previous config saved to /var/cache/conftool/dbconfig/20251103-073624-root.json
2025-11-03 07:39:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 07:40:37 <wikibugs> ('PS1) ''Marostegui: es1034: Disable notifications [puppet] - ''https://gerrit.wikimedia.org/r/1200759 (https://phabricator.wikimedia.org/T409025)'
2025-11-03 07:41:56 <wikibugs> ('CR) ''Marostegui: [C:''+2] es1034: Disable notifications [puppet] - ''https://gerrit.wikimedia.org/r/1200759 (https://phabricator.wikimedia.org/T409025) (owner: ''Marostegui)'
2025-11-03 07:42:00 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P84587 and previous config saved to /var/cache/conftool/dbconfig/20251103-074156-marostegui.json
2025-11-03 07:51:32 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84588 and previous config saved to /var/cache/conftool/dbconfig/20251103-075130-root.json
2025-11-03 07:57:07 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P84589 and previous config saved to /var/cache/conftool/dbconfig/20251103-075706-marostegui.json
2025-11-03 07:57:42 <logmsgbot> marostegui@cumin1003 clone (PID 2864179) is awaiting input
2025-11-03 07:57:47 <wikibugs> 'ops-eqiad, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030 (''Marostegui) ''NEW'
2025-11-03 07:58:32 <wikibugs> 'ops-eqiad, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11334341 (''Marostegui) p:''Triage''Medium'
2025-11-03 08:00:00 <wikibugs> 'SRE, ''SRE-swift-storage, ''Infrastructure-Foundations: Key packages missing from trixie-wikimedia - https://phabricator.wikimedia.org/T407513#11334342 (''MoritzMuehlenhoff) >>! In T407513#11332007, @LSobanski wrote: > To avoid confusion I believe the above statement should say "now available" instead of "...'
2025-11-03 08:00:04 <jouncebot> Amir1, Urbanecm, and awight: #bothumor My software never has bugs. It just develops random features. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T0800).
2025-11-03 08:00:05 <jouncebot> Superpes: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2025-11-03 08:01:34 <Superpes> o/
2025-11-03 08:07:23 <Superpes> I'll probably reschedule the patch for the next window since, as every Monday, the window will be empty :P
2025-11-03 08:09:30 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] Re-enable monitoring for maps/bookworm [puppet] - ''https://gerrit.wikimedia.org/r/1200030 (https://phabricator.wikimedia.org/T381565) (owner: ''Muehlenhoff)'
2025-11-03 08:12:15 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1162 (T407997)', diff saved to https://phabricator.wikimedia.org/P84590 and previous config saved to /var/cache/conftool/dbconfig/20251103-081214-marostegui.json
2025-11-03 08:12:18 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 08:12:31 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
2025-11-03 08:12:39 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1182 (T407997)', diff saved to https://phabricator.wikimedia.org/P84591 and previous config saved to /var/cache/conftool/dbconfig/20251103-081238-marostegui.json
2025-11-03 08:20:42 <logmsgbot> !log marostegui@cumin1003 START - Cookbook sre.mysql.pool db1174 gradually with 4 steps - Pool db1174.eqiad.wmnet in after cloning
2025-11-03 08:22:51 <wikibugs> ('PS1) ''Marostegui: db1231: Enable notifications [puppet] - ''https://gerrit.wikimedia.org/r/1200872 (https://phabricator.wikimedia.org/T408829)'
2025-11-03 08:23:27 <icinga-wm> PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 08:24:09 <wikibugs> ('PS2) ''Marostegui: db1231: Enable notifications [puppet] - ''https://gerrit.wikimedia.org/r/1200872 (https://phabricator.wikimedia.org/T408829)'
2025-11-03 08:24:43 <icinga-wm> PROBLEM - Docker registry HTTPS interface on registry1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Docker
2025-11-03 08:24:58 <wikibugs> ('CR) ''Marostegui: [C:''+2] db1231: Enable notifications [puppet] - ''https://gerrit.wikimedia.org/r/1200872 (https://phabricator.wikimedia.org/T408829) (owner: ''Marostegui)'
2025-11-03 08:25:44 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1231 (re)pooling @ 1%: After moving it to s7', diff saved to https://phabricator.wikimedia.org/P84593 and previous config saved to /var/cache/conftool/dbconfig/20251103-082543-root.json
2025-11-03 08:25:50 <icinga-wm> PROBLEM - very high load average likely xfs on ms-be1074 is CRITICAL: CRITICAL - load average: 160.95, 108.51, 54.98 https://wikitech.wikimedia.org/wiki/Swift
2025-11-03 08:27:46 <icinga-wm> PROBLEM - very high load average likely xfs on ms-be1074 is CRITICAL: CRITICAL - load average: 142.36, 117.39, 64.13 https://wikitech.wikimedia.org/wiki/Swift
2025-11-03 08:28:20 <icinga-wm> RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 30032 bytes in 0.648 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
2025-11-03 08:29:10 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T407997)', diff saved to https://phabricator.wikimedia.org/P84594 and previous config saved to /var/cache/conftool/dbconfig/20251103-082909-marostegui.json
2025-11-03 08:29:15 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 08:32:46 <icinga-wm> RECOVERY - very high load average likely xfs on ms-be1074 is OK: OK - load average: 16.79, 68.62, 59.75 https://wikitech.wikimedia.org/wiki/Swift
2025-11-03 08:34:19 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] Add an alert for Ganeti CA expiry [alerts] - ''https://gerrit.wikimedia.org/r/1199809 (https://phabricator.wikimedia.org/T382902) (owner: ''Muehlenhoff)'
2025-11-03 08:40:55 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1231 (re)pooling @ 5%: After moving it to s7', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20251103-084049-root.json
2025-11-03 08:40:57 <godog> !log silence wikitech-static icinga alert for a couple of weeks - T409029
2025-11-03 08:41:06 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 08:41:12 <stashbot> T409029: Flapping wikitech-static icinga alert - https://phabricator.wikimedia.org/T409029
2025-11-03 08:44:18 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P84596 and previous config saved to /var/cache/conftool/dbconfig/20251103-084417-marostegui.json
2025-11-03 08:45:49 <icinga-wm> RECOVERY - Docker registry HTTPS interface on registry1004 is OK: HTTP OK: HTTP/1.1 200 OK - 3746 bytes in 0.203 second response time https://wikitech.wikimedia.org/wiki/Docker
2025-11-03 08:51:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 08:56:03 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1231 (re)pooling @ 10%: After moving it to s7', diff saved to https://phabricator.wikimedia.org/P84598 and previous config saved to /var/cache/conftool/dbconfig/20251103-085600-root.json
2025-11-03 08:56:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 08:56:33 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.dns.netbox
2025-11-03 08:59:28 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P84599 and previous config saved to /var/cache/conftool/dbconfig/20251103-085925-marostegui.json
2025-11-03 08:59:59 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix uncommitted changes for mwdebug2002 - elukey@cumin1003"
2025-11-03 09:00:03 <logmsgbot> !log elukey@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix uncommitted changes for mwdebug2002 - elukey@cumin1003"
2025-11-03 09:00:03 <logmsgbot> !log elukey@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2025-11-03 09:02:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 09:03:27 <jinxer-wm> FIRING: ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 09:06:09 <logmsgbot> !log marostegui@cumin1003 END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1174 gradually with 4 steps - Pool db1174.eqiad.wmnet in after cloning
2025-11-03 09:08:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 09:08:45 <logmsgbot> !log marostegui@cumin1003 START - Cookbook sre.mysql.pool db1174 gradually with 4 steps - Pool db1174.eqiad.wmnet in after cloning
2025-11-03 09:08:52 <logmsgbot> !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1174 gradually with 4 steps - Pool db1174.eqiad.wmnet in after cloning
2025-11-03 09:08:58 <logmsgbot> !log marostegui@cumin1003 END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1174.eqiad.wmnet onto db1231.eqiad.wmnet
2025-11-03 09:09:01 <jinxer-wm> FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
2025-11-03 09:09:01 <jinxer-wm> FIRING: [21x] CertAlmostExpired: Certificate for service cr2-eqsin.wikimedia.org:32767 is about to expire - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
2025-11-03 09:11:13 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1231 (re)pooling @ 15%: After moving it to s7', diff saved to https://phabricator.wikimedia.org/P84600 and previous config saved to /var/cache/conftool/dbconfig/20251103-091109-root.json
2025-11-03 09:11:46 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
2025-11-03 09:14:41 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1182 (T407997)', diff saved to https://phabricator.wikimedia.org/P84601 and previous config saved to /var/cache/conftool/dbconfig/20251103-091435-marostegui.json
2025-11-03 09:14:45 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
2025-11-03 09:14:55 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1188 (T407997)', diff saved to https://phabricator.wikimedia.org/P84602 and previous config saved to /var/cache/conftool/dbconfig/20251103-091452-marostegui.json
2025-11-03 09:14:55 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 09:15:23 <logmsgbot> !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie
2025-11-03 09:17:11 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T407997)', diff saved to https://phabricator.wikimedia.org/P84603 and previous config saved to /var/cache/conftool/dbconfig/20251103-091708-marostegui.json
2025-11-03 09:22:35 <wikibugs> 'ops-codfw, ''SRE, ''DC-Ops: Q4:rack/setup/install sretest2010 Config J 1P test host - https://phabricator.wikimedia.org/T394357#11334494 (''elukey) Really interesting, I retried today a reimage and got a "no media present" when trying to pxe/http boot. Then I checked the Boot order and the wrong UEFI netwo...'
2025-11-03 09:25:50 <wikibugs> ('PS1) ''Esanders: Freeze LiquidThreads on huwiki and svwikisource [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200876 (https://phabricator.wikimedia.org/T406026)'
2025-11-03 09:26:21 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1231 (re)pooling @ 25%: After moving it to s7', diff saved to https://phabricator.wikimedia.org/P84604 and previous config saved to /var/cache/conftool/dbconfig/20251103-092618-root.json
2025-11-03 09:29:14 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, November 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200876 (https://phabricator.wikimedia.org/T406026) (owner: ''Esanders)'
2025-11-03 09:29:22 <wikibugs> ('CR) ''Clément Goubert: [C:''+2] Route "/api/rest_v1/" requests with "?spec" query to the rest gateway [puppet] - ''https://gerrit.wikimedia.org/r/1199886 (https://phabricator.wikimedia.org/T397203) (owner: ''Aaron Schulz)'
2025-11-03 09:31:02 <wikibugs> 'ops-codfw, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdf) failed in thanos-be2008 - https://phabricator.wikimedia.org/T409036 (''MatthewVernon) ''NEW'
2025-11-03 09:31:34 <wikibugs> 'ops-codfw, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdf) failed in thanos-be2008 - https://phabricator.wikimedia.org/T409036#11334535 (''MatthewVernon) p:''Triage''High'
2025-11-03 09:32:19 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P84605 and previous config saved to /var/cache/conftool/dbconfig/20251103-093218-marostegui.json
2025-11-03 09:33:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 09:34:01 <jinxer-wm> FIRING: JobUnavailable: Reduced availability for job cloud_dev_pdns_rec in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2025-11-03 09:35:14 <Lucas_WMDE> is there any way to get information / output about an mwscript-k8s job after it’s been cleaned up? (context: https://phabricator.wikimedia.org/T398177#11334550)
2025-11-03 09:35:24 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
2025-11-03 09:35:26 <claime> Lucas_WMDE: logstash
2025-11-03 09:35:27 <Lucas_WMDE> like, maybe it gets cleaned up from k8s but is still in logstash or somewhere else?
2025-11-03 09:35:30 <Lucas_WMDE> ooh
2025-11-03 09:36:34 <Lucas_WMDE> nice, Kubernetes Events has something
2025-11-03 09:37:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 09:37:13 <claime> Lucas_WMDE: tell me if you need help, I still have about 1h free :)
2025-11-03 09:37:31 <Lucas_WMDE> claime: so far I have https://logstash.wikimedia.org/goto/d4e84efcce342199642dede2a735d8be and am trying to make sense of it ^^
2025-11-03 09:37:46 <logmsgbot> !log elukey@deploy2002 helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
2025-11-03 09:37:48 <Lucas_WMDE> which looks like it had died within half a day of me launching it
2025-11-03 09:37:57 <Lucas_WMDE> not sure if I can see the error reason anywhere
2025-11-03 09:38:11 <Lucas_WMDE> like, if it was another oom sigkill or something else
2025-11-03 09:38:52 <jinxer-wm> FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2025-11-03 09:38:56 <logmsgbot> !log elukey@deploy2002 helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
2025-11-03 09:39:32 <logmsgbot> !log elukey@deploy2002 helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
2025-11-03 09:39:46 <claime> Hmm.
2025-11-03 09:40:09 <logmsgbot> !log elukey@deploy2002 helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
2025-11-03 09:40:13 <Lucas_WMDE> oooh, https://logstash.wikimedia.org/goto/607cd49141903a654ac2a97f06710486 looks a lot better
2025-11-03 09:40:16 <logmsgbot> !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
2025-11-03 09:40:21 <Lucas_WMDE> (App Logs instead of Kubernetes Events)
2025-11-03 09:40:31 <Lucas_WMDE> that’s… the full output? :o
2025-11-03 09:40:34 <Lucas_WMDE> (until it died anyway)
2025-11-03 09:41:10 <claime> Lucas_WMDE: yeah
2025-11-03 09:41:19 <claime> full output, one line per message because it's stupid
2025-11-03 09:41:27 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1231 (re)pooling @ 30%: After moving it to s7', diff saved to https://phabricator.wikimedia.org/P84606 and previous config saved to /var/cache/conftool/dbconfig/20251103-094126-root.json
2025-11-03 09:42:16 <Lucas_WMDE> nice
2025-11-03 09:42:46 <Lucas_WMDE> and can I get the error / failure status somewhere? I assume it must have died for some reason that I can’t see yet
2025-11-03 09:43:36 <Lucas_WMDE> also, “Logs are retained in Logstash for a maximum of 90 days by default” (https://wikitech.wikimedia.org/wiki/Logstash) so I should pull the logs out of there later ^^
2025-11-03 09:43:38 <wikibugs> ('PS2) ''Clément Goubert: trafficserver: action api to rest-gateway group1 10% [puppet] - ''https://gerrit.wikimedia.org/r/1198932 (https://phabricator.wikimedia.org/T408223)'
2025-11-03 09:45:44 <claime> Lucas_WMDE: Hmm for the failure status I'm not sure, I'll take a look
2025-11-03 09:45:49 <Lucas_WMDE> ok, thanks!
2025-11-03 09:46:09 <Lucas_WMDE> then I’ll hold off on commenting on the task for a bit :)
2025-11-03 09:47:26 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P84607 and previous config saved to /var/cache/conftool/dbconfig/20251103-094726-marostegui.json
2025-11-03 09:48:06 <claime> Lucas_WMDE: I'm not finding it
2025-11-03 09:48:38 <wikibugs> ('PS1) ''Marostegui: db1178: Remove RBR [puppet] - ''https://gerrit.wikimedia.org/r/1200961'
2025-11-03 09:48:42 <Lucas_WMDE> hm, ok
2025-11-03 09:49:10 <Lucas_WMDE> then I guess I’ll just write that OOM feels like a possibility
2025-11-03 09:49:10 <wikibugs> ('CR) ''Marostegui: [C:''+2] db1178: Remove RBR [puppet] - ''https://gerrit.wikimedia.org/r/1200961 (owner: ''Marostegui)'
2025-11-03 09:49:17 <Lucas_WMDE> (since any PHP-level error should be visible in the logs)
2025-11-03 09:49:19 <claime> Lucas_WMDE: I'm checking grafana to see if I can confirm that
2025-11-03 09:49:44 <wikibugs> 'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdl) failed in ms-be1074 - https://phabricator.wikimedia.org/T409040 (''MatthewVernon) ''NEW'
2025-11-03 09:50:09 <moritzm> !log installing intel-microcode security updates
2025-11-03 09:50:10 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 09:50:39 <Lucas_WMDE> interesting idea https://grafana.wikimedia.org/goto/HZJHdSzDg?orgId=1
2025-11-03 09:50:44 <wikibugs> 'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdl) failed in ms-be1074 - https://phabricator.wikimedia.org/T409040#11334669 (''MatthewVernon) p:''Triage''High'
2025-11-03 09:51:21 <Lucas_WMDE> that doesn’t look super OOMy
2025-11-03 09:51:28 <Lucas_WMDE> (maybe you have a better grafana dashboard)
2025-11-03 09:51:30 <claime> Nope
2025-11-03 09:51:36 <claime> (to both)
2025-11-03 09:51:59 <Lucas_WMDE> I guess I could just try an enwiki dry run then, see if it crashes again
2025-11-03 09:52:08 <claime> Yeah that would be the way to go
2025-11-03 09:52:19 <Lucas_WMDE> alright, then I’ll comment on the task
2025-11-03 09:52:21 <Lucas_WMDE> thanks for your help! \o/
2025-11-03 09:52:24 <claime> I'll make a note somewhere to see if we can record failure states in logstash *somehow*
2025-11-03 09:54:05 <wikibugs> 'ops-codfw, ''SRE, ''DC-Ops: Q4:rack/setup/install sretest2010 Config J 1P test host - https://phabricator.wikimedia.org/T394357#11334715 (''elukey) In theory the HttpBootPolicy should hit the right HTTP boot after some tries without stopping at the first failure: ` ['(B199/D0/F0) UEFI HTTP IPv4 Intel(R) I...'
2025-11-03 09:56:10 <wikibugs> ('CR) ''Clément Goubert: [C:''+2] trafficserver: action api to rest-gateway group1 10% [puppet] - ''https://gerrit.wikimedia.org/r/1198932 (https://phabricator.wikimedia.org/T408223) (owner: ''Clément Goubert)'
2025-11-03 09:56:33 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1231 (re)pooling @ 50%: After moving it to s7', diff saved to https://phabricator.wikimedia.org/P84608 and previous config saved to /var/cache/conftool/dbconfig/20251103-095632-root.json
2025-11-03 09:58:24 <Lucas_WMDE> hm, if I narrow the date range, then https://grafana.wikimedia.org/goto/SYiEOSzDg?orgId=1 shows some suspicious spikes in the memory usage
2025-11-03 09:58:45 <Lucas_WMDE> it already came *very* close to the limit earlier (peaked at 1.13 out of 1.17 GiB limit)
2025-11-03 09:59:18 <claime> Lucas_WMDE: Hah, sampling :D
2025-11-03 09:59:22 <wikibugs> 'SRE, ''collaboration-services, ''Znuny: VRTS outbound emails not working - https://phabricator.wikimedia.org/T408967#11334728 (''LSobanski)'
2025-11-03 09:59:27 <Lucas_WMDE> un ts un ts un ts un ts
2025-11-03 09:59:31 <claime> but it's not high at the moment of the cut
2025-11-03 09:59:34 <Lucas_WMDE> yeah
2025-11-03 09:59:52 <Lucas_WMDE> and it doesn’t feel like it could’ve spiked past the limit before even a single sample was recorded
2025-11-03 09:59:53 <claime> Although it could have spiked hard and fast enough to get wrecked and the metrics not scraped
2025-11-03 09:59:58 <Lucas_WMDE> hah
2025-11-03 10:00:06 <claime> I think it's 1m interval for the scrape
2025-11-03 10:00:14 <Lucas_WMDE> hm
2025-11-03 10:00:42 <Lucas_WMDE> yeah ok the previous spike hit its plateau within just over a minute apparently
2025-11-03 10:01:01 <claime> Honestly I would try to repro
2025-11-03 10:01:07 <claime> It's probably the easiest
2025-11-03 10:01:39 <Lucas_WMDE> alright
2025-11-03 10:01:49 <Lucas_WMDE> but I’ll leave that to MatmaRex first, it’s his maintenance script ^^
2025-11-03 10:02:34 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1188 (T407997)', diff saved to https://phabricator.wikimedia.org/P84609 and previous config saved to /var/cache/conftool/dbconfig/20251103-100233-marostegui.json
2025-11-03 10:02:37 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 10:02:50 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
2025-11-03 10:02:57 <wikibugs> 'SRE, ''collaboration-services, ''Znuny: VRTS outbound emails not working - https://phabricator.wikimedia.org/T408967#11334741 (''Geagea) I've just received notification from October 29 (6 days).'
2025-11-03 10:02:58 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1197 (T407997)', diff saved to https://phabricator.wikimedia.org/P84610 and previous config saved to /var/cache/conftool/dbconfig/20251103-100257-marostegui.json
2025-11-03 10:03:32 <Lucas_WMDE> commented, feel free to unsubscribe again if you like ;)
2025-11-03 10:04:07 <wikibugs> ('PS1) ''David Caro: toolforge: add elasticsearch metrics gathering [puppet] - ''https://gerrit.wikimedia.org/r/1201011 (https://phabricator.wikimedia.org/T409047)'
2025-11-03 10:04:20 <wikibugs> 'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11334768 (''LSobanski) I just checked and the junk queue is close to 500k at this time.'
2025-11-03 10:04:58 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
2025-11-03 10:05:12 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197 (T407997)', diff saved to https://phabricator.wikimedia.org/P84611 and previous config saved to /var/cache/conftool/dbconfig/20251103-100511-marostegui.json
2025-11-03 10:07:33 <logmsgbot> !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie
2025-11-03 10:07:51 <wikibugs> 'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11334788 (''LSobanski) Here's the increase in disk space and inode usage since October 27th: {F69754786}'
2025-11-03 10:08:13 <wikibugs> ('CR) ''David Caro: [V:''+1] "Tested in tools, all endpoints scraping ok https://phabricator.wikimedia.org/T409047#11334799" [puppet] - ''https://gerrit.wikimedia.org/r/1201011 (https://phabricator.wikimedia.org/T409047) (owner: ''David Caro)'
2025-11-03 10:11:39 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1231 (re)pooling @ 60%: After moving it to s7', diff saved to https://phabricator.wikimedia.org/P84612 and previous config saved to /var/cache/conftool/dbconfig/20251103-101138-root.json
2025-11-03 10:16:08 <wikibugs> ('CR) ''Jcrespo: [C:''+1] backup1013.cnf.erb: Replace es1034 with es1057 [puppet] - ''https://gerrit.wikimedia.org/r/1200756 (https://phabricator.wikimedia.org/T409025) (owner: ''Marostegui)'
2025-11-03 10:17:57 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
2025-11-03 10:19:35 <wikibugs> ('PS1) ''Muehlenhoff: Limit microcode installation to Bullseye [puppet] - ''https://gerrit.wikimedia.org/r/1201014'
2025-11-03 10:20:13 <wikibugs> ('CR) ''Muehlenhoff: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1201014 (owner: ''Muehlenhoff)'
2025-11-03 10:20:20 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P84614 and previous config saved to /var/cache/conftool/dbconfig/20251103-102018-marostegui.json
2025-11-03 10:22:11 <logmsgbot> !log brouberol@cumin1003 START - Cookbook sre.hosts.reimage for host an-test-worker1001.eqiad.wmnet with OS bullseye
2025-11-03 10:24:44 <wikibugs> 'SRE, ''SRE-Unowned, ''Maps, ''Patch-For-Review: Move maps servers to Bookworm - https://phabricator.wikimedia.org/T381565#11334866 (''TheDJ) Not sure if this font issue T408884 is related, but it was reported around the switch to the new services, so might be worth double checking if the k8s images have...'
2025-11-03 10:25:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 10:26:39 <wikibugs> ('PS10) ''Fabfur: P:cache:haproxy: introduce ua classes [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060)'
2025-11-03 10:26:46 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1231 (re)pooling @ 75%: After moving it to s7', diff saved to https://phabricator.wikimedia.org/P84616 and previous config saved to /var/cache/conftool/dbconfig/20251103-102645-root.json
2025-11-03 10:27:01 <wikibugs> ('CR) ''Brouberol: [C:''+2] Enable normal caching for growthbook.wikimedia.org [puppet] - ''https://gerrit.wikimedia.org/r/1200289 (https://phabricator.wikimedia.org/T408903) (owner: ''Brouberol)'
2025-11-03 10:27:04 <wikibugs> ('CR) ''Brouberol: [C:''+2] Expose the growthbook service publicly [puppet] - ''https://gerrit.wikimedia.org/r/1200290 (https://phabricator.wikimedia.org/T408903) (owner: ''Brouberol)'
2025-11-03 10:27:24 <wikibugs> ('PS1) ''Marostegui: wmnet: Switch m3 to dbproxy1028 [dns] - ''https://gerrit.wikimedia.org/r/1201016 (https://phabricator.wikimedia.org/T408956)'
2025-11-03 10:29:30 <wikibugs> 'ops-codfw, ''SRE, ''DC-Ops: Q4:rack/setup/install sretest2010 Config J 1P test host - https://phabricator.wikimedia.org/T394357#11334890 (''elukey) I've set up the `UEFINetwork` list with `90:5A:08:9F:08:80` UEFI HTTP first, and it got reflected to `FixedBootOrder`. Ran a chassis reset, waited for the os t...'
2025-11-03 10:33:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 10:35:01 <wikibugs> ('PS11) ''Fabfur: P:cache:haproxy: introduce ua classes [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060)'
2025-11-03 10:35:31 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to and previous config saved to /var/cache/conftool/dbconfig/20251103-103527-marostegui.json
2025-11-03 10:38:49 <wikibugs> ('PS12) ''Fabfur: P:cache:haproxy: introduce ua classes [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060)'
2025-11-03 10:38:51 <logmsgbot> !log brouberol@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: host reimage
2025-11-03 10:39:29 <wikibugs> 'SRE, ''SRE-Unowned, ''Maps, ''Patch-For-Review: Move maps servers to Bookworm - https://phabricator.wikimedia.org/T381565#11334905 (''elukey) >>! In T381565#11334866, @TheDJ wrote: > Not sure if this font issue T408884 is related, but it was reported around the switch to the new services, so might be wo...'
2025-11-03 10:40:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 10:41:53 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'db1231 (re)pooling @ 100%: After moving it to s7', diff saved to https://phabricator.wikimedia.org/P84617 and previous config saved to /var/cache/conftool/dbconfig/20251103-104152-root.json
2025-11-03 10:43:40 <wikibugs> ('CR) ''Federico Ceratto: [C:''+2] wmnet: Switch m3 to dbproxy1028 [dns] - ''https://gerrit.wikimedia.org/r/1201016 (https://phabricator.wikimedia.org/T408956) (owner: ''Marostegui)'
2025-11-03 10:43:56 <wikibugs> ('CR) ''Federico Ceratto: [C:''+1] wmnet: Switch m3 to dbproxy1028 [dns] - ''https://gerrit.wikimedia.org/r/1201016 (https://phabricator.wikimedia.org/T408956) (owner: ''Marostegui)'
2025-11-03 10:44:00 <wikibugs> ('CR) ''Marostegui: [C:''+2] wmnet: Switch m3 to dbproxy1028 [dns] - ''https://gerrit.wikimedia.org/r/1201016 (https://phabricator.wikimedia.org/T408956) (owner: ''Marostegui)'
2025-11-03 10:44:06 <logmsgbot> !log marostegui@dns1006 START - running authdns-update
2025-11-03 10:44:07 <logmsgbot> !log brouberol@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: host reimage
2025-11-03 10:44:30 <marostegui> !log Switch m3 (phabricator) proxy to dbproxy1028 T408956
2025-11-03 10:44:37 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 10:44:41 <stashbot> T408956: Occasional database errors when using/browsing Phabricator - https://phabricator.wikimedia.org/T408956
2025-11-03 10:44:59 <logmsgbot> !log marostegui@dns1006 END - running authdns-update
2025-11-03 10:46:56 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
2025-11-03 10:47:07 <wikibugs> ('CR) ''Daniel Kinzler: api-gateway: Rest-gateway Read `user_class` and `user_id` from JWT (''6 comments) [deployment-charts] - ''https://gerrit.wikimedia.org/r/1192579 (https://phabricator.wikimedia.org/T405578) (owner: ''Pmiazga)'
2025-11-03 10:49:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 10:50:40 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1197 (T407997)', diff saved to https://phabricator.wikimedia.org/P84618 and previous config saved to /var/cache/conftool/dbconfig/20251103-105038-marostegui.json
2025-11-03 10:50:48 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 10:50:55 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
2025-11-03 10:52:44 <wikibugs> ('PS2) ''Muehlenhoff: Limit microcode installation to Bullseye [puppet] - ''https://gerrit.wikimedia.org/r/1201014'
2025-11-03 10:52:58 <logmsgbot> !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
2025-11-03 10:54:20 <wikibugs> 'SRE, ''AQS2.0, ''Cassandra, ''serviceops, ''Service-deployment-requests: AQS 2.0 differentially private pageviews deploy API - https://phabricator.wikimedia.org/T343855#11334980 (''Htriedman) I would love it to be but have no control over priorities here! What could I do to help move it forward?'
2025-11-03 10:57:15 <wikibugs> ('CR) ''Muehlenhoff: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1201014 (owner: ''Muehlenhoff)'
2025-11-03 10:59:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 11:00:05 <jouncebot> Deploy window MediaWiki infrastructure (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T1100)
2025-11-03 11:01:04 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance
2025-11-03 11:01:09 <wikibugs> 'SRE, ''Wikimedia-Mailing-lists: Reports of unsubscribe from wikitech-ambassadors failing to work - https://phabricator.wikimedia.org/T405153#11335012 (''Aklapper) ''Open''Stalled > Tried again earlier today, we'll see if I get the mailing list mail again next week. @Technical13: Is this still an issue?'
2025-11-03 11:01:12 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1229 (T407997)', diff saved to https://phabricator.wikimedia.org/P84619 and previous config saved to /var/cache/conftool/dbconfig/20251103-110111-marostegui.json
2025-11-03 11:01:16 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 11:03:27 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229 (T407997)', diff saved to https://phabricator.wikimedia.org/P84620 and previous config saved to /var/cache/conftool/dbconfig/20251103-110326-marostegui.json
2025-11-03 11:05:51 <wikibugs> ('CR) ''JMeybohm: [C:''+1] "Cool!" [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200356 (owner: ''Clément Goubert)'
2025-11-03 11:06:08 <wikibugs> 'SRE-swift-storage, ''Infrastructure-Foundations: UEFI installer not installing grub correctly (at least on systems where / is RAID) - https://phabricator.wikimedia.org/T404356#11335034 (''elukey) >>! In T404356#11331717, @elukey wrote: > There are still some provisioning issues for sretest2010 (see T394357)...'
2025-11-03 11:07:53 <wikibugs> 'SRE-swift-storage, ''Infrastructure-Foundations: UEFI installer not installing grub correctly (at least on systems where / is RAID) - https://phabricator.wikimedia.org/T404356#11335037 (''elukey) >>! In T404356#11335034, @elukey wrote: >>>! In T404356#11331717, @elukey wrote: >> There are still some provisio...'
2025-11-03 11:08:55 <wikibugs> ('CR) ''Muehlenhoff: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1201014 (owner: ''Muehlenhoff)'
2025-11-03 11:10:06 <logmsgbot> !log brouberol@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-worker1001.eqiad.wmnet with OS bullseye
2025-11-03 11:10:33 <jinxer-wm> FIRING: KubernetesAPILatency: High Kubernetes API latency (LIST certificaterequests) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-mlstaging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
2025-11-03 11:13:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 11:13:30 <wikibugs> ('PS1) ''Muehlenhoff: Remove code to install hp-health [puppet] - ''https://gerrit.wikimedia.org/r/1201030'
2025-11-03 11:14:55 <wikibugs> 'ops-codfw, ''SRE, ''DC-Ops: Alert for device ps1-b8-codfw.mgmt.codfw.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T408963#11335049 (''phaultfinder)'
2025-11-03 11:15:19 <wikibugs> ('CR) ''Jcrespo: [C:''+2] "Merge for easier migration to gitlab." [software/transferpy] - ''https://gerrit.wikimedia.org/r/972446 (owner: ''Jcrespo)'
2025-11-03 11:15:33 <jinxer-wm> RESOLVED: KubernetesAPILatency: High Kubernetes API latency (LIST certificaterequests) on k8s-mlstaging@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=codfw&var-cluster=k8s-mlstaging&var-latency_percentile=0.95&var-verb=LIST - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
2025-11-03 11:15:55 <wikibugs> ('CR) ''Jcrespo: [C:''+2] "https://gerrit.wikimedia.org/r/operations/software/transferpy" [software/transferpy] - ''https://gerrit.wikimedia.org/r/972471 (owner: ''Jcrespo)'
2025-11-03 11:16:14 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] Transferer: Add a few fixes after lintering to clean up the code [software/transferpy] - ''https://gerrit.wikimedia.org/r/972471 (owner: ''Jcrespo)'
2025-11-03 11:16:35 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] RemoteExecution: Restore RemoteExecution class back into transfer.py [software/transferpy] - ''https://gerrit.wikimedia.org/r/972475 (https://phabricator.wikimedia.org/T330882) (owner: ''Jcrespo)'
2025-11-03 11:16:53 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] "Merge for easier migration to gitlab" [software/transferpy] - ''https://gerrit.wikimedia.org/r/972475 (https://phabricator.wikimedia.org/T330882) (owner: ''Jcrespo)'
2025-11-03 11:17:45 <wikibugs> ('CR) ''Brouberol: [C:''+2] Create the growthbook.wikimedia.org subdomain [dns] - ''https://gerrit.wikimedia.org/r/1200317 (https://phabricator.wikimedia.org/T408903) (owner: ''Brouberol)'
2025-11-03 11:18:04 <logmsgbot> !log brouberol@dns1004 START - running authdns-update
2025-11-03 11:18:34 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P84621 and previous config saved to /var/cache/conftool/dbconfig/20251103-111834-marostegui.json
2025-11-03 11:18:43 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] "Merge for easier migration to gitlab (issues were fixed on a latter patch)" [software/transferpy] - ''https://gerrit.wikimedia.org/r/972729 (https://phabricator.wikimedia.org/T330882) (owner: ''Jcrespo)'
2025-11-03 11:18:59 <logmsgbot> !log brouberol@dns1004 END - running authdns-update
2025-11-03 11:19:01 <wikibugs> ('CR) ''Jcrespo: [C:''+2] RemoteExecution: Remove cumin logged errors from low level execution [software/transferpy] - ''https://gerrit.wikimedia.org/r/974159 (https://phabricator.wikimedia.org/T330882) (owner: ''Jcrespo)'
2025-11-03 11:19:05 <wikibugs> ('PS13) ''Fabfur: P:cache:haproxy: introduce ua classes [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060)'
2025-11-03 11:19:18 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] "Merge for easier migration to gitlab" [software/transferpy] - ''https://gerrit.wikimedia.org/r/974159 (https://phabricator.wikimedia.org/T330882) (owner: ''Jcrespo)'
2025-11-03 11:20:03 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] [WIP]Prepare for release [software/transferpy] - ''https://gerrit.wikimedia.org/r/974683 (owner: ''Jcrespo)'
2025-11-03 11:20:55 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] "Merge for easier migration to gitlab (this was fixed on a latter commit)" [software/transferpy] - ''https://gerrit.wikimedia.org/r/974986 (owner: ''Jcrespo)'
2025-11-03 11:21:48 <wikibugs> ('PS3) ''Jcrespo: Transferer: Update logic for is_empty_dir() to avoid future bugs [software/transferpy] - ''https://gerrit.wikimedia.org/r/974986'
2025-11-03 11:21:51 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] Transferer: Update logic for is_empty_dir() to avoid future bugs [software/transferpy] - ''https://gerrit.wikimedia.org/r/974986 (owner: ''Jcrespo)'
2025-11-03 11:22:04 <wikibugs> ('CR) ''Jcrespo: [C:''+2] [WIP] transferpy: Add support for nftables [software/transferpy] - ''https://gerrit.wikimedia.org/r/1197676 (https://phabricator.wikimedia.org/T393692) (owner: ''Jcrespo)'
2025-11-03 11:22:20 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] [WIP] transferpy: Add support for nftables [software/transferpy] - ''https://gerrit.wikimedia.org/r/1197676 (https://phabricator.wikimedia.org/T393692) (owner: ''Jcrespo)'
2025-11-03 11:22:36 <wikibugs> ('PS4) ''Jcrespo: [WIP] transferpy: Add support for nftables [software/transferpy] - ''https://gerrit.wikimedia.org/r/1197676 (https://phabricator.wikimedia.org/T393692)'
2025-11-03 11:22:38 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] [WIP] transferpy: Add support for nftables [software/transferpy] - ''https://gerrit.wikimedia.org/r/1197676 (https://phabricator.wikimedia.org/T393692) (owner: ''Jcrespo)'
2025-11-03 11:23:00 <wikibugs> ('CR) ''Jcrespo: [C:''+2] transferpy: Prepare for Release 1.2 [software/transferpy] - ''https://gerrit.wikimedia.org/r/1198104 (owner: ''Jcrespo)'
2025-11-03 11:23:04 <wikibugs> ('PS5) ''Jcrespo: transferpy: Prepare for Release 1.2 [software/transferpy] - ''https://gerrit.wikimedia.org/r/1198104'
2025-11-03 11:23:06 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] transferpy: Prepare for Release 1.2 [software/transferpy] - ''https://gerrit.wikimedia.org/r/1198104 (owner: ''Jcrespo)'
2025-11-03 11:23:13 <wikibugs> ('CR) ''Jcrespo: [C:''+2] transferpy: Type hints, reduced cyclomatic complexity and overal cleanup [software/transferpy] - ''https://gerrit.wikimedia.org/r/1198314 (https://phabricator.wikimedia.org/T393692) (owner: ''Jcrespo)'
2025-11-03 11:23:17 <wikibugs> ('PS2) ''Jcrespo: transferpy: Type hints, reduced cyclomatic complexity and overal cleanup [software/transferpy] - ''https://gerrit.wikimedia.org/r/1198314 (https://phabricator.wikimedia.org/T393692)'
2025-11-03 11:23:18 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] transferpy: Type hints, reduced cyclomatic complexity and overal cleanup [software/transferpy] - ''https://gerrit.wikimedia.org/r/1198314 (https://phabricator.wikimedia.org/T393692) (owner: ''Jcrespo)'
2025-11-03 11:24:05 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] "New command is here" [software/transferpy] - ''https://gerrit.wikimedia.org/r/1198501 (owner: ''Jcrespo)'
2025-11-03 11:24:11 <wikibugs> ('PS2) ''Jcrespo: transferpy: Fix the check for empty directories [software/transferpy] - ''https://gerrit.wikimedia.org/r/1198501'
2025-11-03 11:24:17 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] transferpy: Fix the check for empty directories [software/transferpy] - ''https://gerrit.wikimedia.org/r/1198501 (owner: ''Jcrespo)'
2025-11-03 11:24:28 <wikibugs> ('PS2) ''Jcrespo: transferpy: Force ipv4 usage for now, fix bug with found port [software/transferpy] - ''https://gerrit.wikimedia.org/r/1198521'
2025-11-03 11:24:49 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] transferpy: Force ipv4 usage for now, fix bug with found port [software/transferpy] - ''https://gerrit.wikimedia.org/r/1198521 (owner: ''Jcrespo)'
2025-11-03 11:25:01 <wikibugs> ('PS2) ''Jcrespo: Fix unit tests that had been broken (but only were detected on trixie) [software/transferpy] - ''https://gerrit.wikimedia.org/r/1200112'
2025-11-03 11:25:10 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] Fix unit tests that had been broken (but only were detected on trixie) [software/transferpy] - ''https://gerrit.wikimedia.org/r/1200112 (owner: ''Jcrespo)'
2025-11-03 11:25:34 <wikibugs> ('CR) ''Jcrespo: "And here is the second part of the fix" [software/transferpy] - ''https://gerrit.wikimedia.org/r/1200330 (https://phabricator.wikimedia.org/T393692) (owner: ''Jcrespo)'
2025-11-03 11:25:42 <wikibugs> ('CR) ''Jcrespo: [C:''+2] Transferer: Fix issue due to escaping where filenames with space failed [software/transferpy] - ''https://gerrit.wikimedia.org/r/1200330 (https://phabricator.wikimedia.org/T393692) (owner: ''Jcrespo)'
2025-11-03 11:25:47 <wikibugs> ('PS3) ''Jcrespo: Transferer: Fix issue due to escaping where filenames with space failed [software/transferpy] - ''https://gerrit.wikimedia.org/r/1200330 (https://phabricator.wikimedia.org/T393692)'
2025-11-03 11:25:51 <wikibugs> ('CR) ''Jcrespo: [V:''+2 C:''+2] Transferer: Fix issue due to escaping where filenames with space failed [software/transferpy] - ''https://gerrit.wikimedia.org/r/1200330 (https://phabricator.wikimedia.org/T393692) (owner: ''Jcrespo)'
2025-11-03 11:26:31 <wikibugs> ('Abandoned) ''Jcrespo: transferpy: Build for Bookworm [software/transferpy] - ''https://gerrit.wikimedia.org/r/1143539 (https://phabricator.wikimedia.org/T389380) (owner: ''Muehlenhoff)'
2025-11-03 11:27:03 <wikibugs> ('Abandoned) ''Jcrespo: transferpy: Add support for nftables [software/transferpy] - ''https://gerrit.wikimedia.org/r/1180570 (https://phabricator.wikimedia.org/T393692) (owner: ''Muehlenhoff)'
2025-11-03 11:27:10 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
2025-11-03 11:27:41 <wikibugs> ('Abandoned) ''Jcrespo: [POC4 WIP] transferpy: Multiprocess the transfers [software/transferpy] - ''https://gerrit.wikimedia.org/r/616282 (https://phabricator.wikimedia.org/T259327) (owner: ''Privacybatm)'
2025-11-03 11:28:11 <logmsgbot> !log elukey@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
2025-11-03 11:28:11 <logmsgbot> !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2010.codfw.wmnet with OS trixie
2025-11-03 11:28:27 <wikibugs> 'SRE-SLO, ''Experimentation Lab (Experiment Platform Sprint 14), ''OKR-Work: Create Pyrra SLOs for xLab - https://phabricator.wikimedia.org/T398869#11335120 (''elukey) Found a little odd spike today in Pyrra for `xlab-standalone-event-validation-success-rate-v1`: [[ https://thanos.wikimedia.org/graph?g0.exp...'
2025-11-03 11:28:38 <wikibugs> ('Abandoned) ''Jcrespo: Modify:: The parsing function in transfer.py [software/transferpy] - ''https://gerrit.wikimedia.org/r/674577 (https://phabricator.wikimedia.org/T268258) (owner: ''Palak199)'
2025-11-03 11:28:47 <wikibugs> ('Abandoned) ''Jcrespo: Fix:: InvalidQueryException handling [software/transferpy] - ''https://gerrit.wikimedia.org/r/674319 (https://phabricator.wikimedia.org/T268258) (owner: ''Palak199)'
2025-11-03 11:29:09 <wikibugs> ('Abandoned) ''Jcrespo: [POC5 WIP] transferpy: Multiprocess the transfers [software/transferpy] - ''https://gerrit.wikimedia.org/r/621898 (https://phabricator.wikimedia.org/T259327) (owner: ''Privacybatm)'
2025-11-03 11:29:13 <wikibugs> ('Abandoned) ''Jcrespo: [POC3 WIP] transferpy: Multiprocess the transfers [software/transferpy] - ''https://gerrit.wikimedia.org/r/615179 (https://phabricator.wikimedia.org/T259327) (owner: ''Privacybatm)'
2025-11-03 11:33:42 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P84622 and previous config saved to /var/cache/conftool/dbconfig/20251103-113341-marostegui.json
2025-11-03 11:33:52 <wikibugs> ('PS1) ''Jcrespo: [WIP]Prepare for release 2 [software/transferpy] - ''https://gerrit.wikimedia.org/r/1201036'
2025-11-03 11:34:31 <wikibugs> ('Abandoned) ''Jcrespo: [WIP]Prepare for release 2 [software/transferpy] - ''https://gerrit.wikimedia.org/r/1201036 (owner: ''Jcrespo)'
2025-11-03 11:35:07 <wikibugs> ('PS1) ''Federico Ceratto: Flip es1, es2, es3 masters [dns] - ''https://gerrit.wikimedia.org/r/1201037 (https://phabricator.wikimedia.org/T402859)'
2025-11-03 11:35:41 <wikibugs> ('CR) ''Elukey: [C:''+1] Remove code to install hp-health [puppet] - ''https://gerrit.wikimedia.org/r/1201030 (owner: ''Muehlenhoff)'
2025-11-03 11:36:26 <wikibugs> ('CR) ''Marostegui: [C:''+1] Flip es1, es2, es3 masters [dns] - ''https://gerrit.wikimedia.org/r/1201037 (https://phabricator.wikimedia.org/T402859) (owner: ''Federico Ceratto)'
2025-11-03 11:43:45 <jinxer-wm> FIRING: WidespreadPuppetFailure: Puppet has failed in ulsfo - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
2025-11-03 11:48:25 <icinga-wm> PROBLEM - Dell PowerEdge or Supermicro Broadcom RAID Controller on an-worker1199 is CRITICAL: communication: 0 OK : controller: 1 Needs Attention : physical_disk: 1 Failed : virtual_disk: 1 OfLn : bbu: 0 OK : enclosure: 0 OK : CLI Version = 007.1910.0000.0000 Oct 08, 2021 https://wikitech.wikimedia.org/wiki/PERCCli%23Monitoring
2025-11-03 11:48:27 <icinga-wm> ACKNOWLEDGEMENT - Dell PowerEdge or Supermicro Broadcom RAID Controller on an-worker1199 is CRITICAL: communication: 0 OK : controller: 1 Needs Attention : physical_disk: 1 Failed : virtual_disk: 1 OfLn : bbu: 0 OK : enclosure: 0 OK : CLI Version = 007.1910.0000.0000 Oct 08, 2021 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T409060 https://wikitech.wikimedia.org/wiki/PERCCli%23Monitoring
2025-11-03 11:48:34 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops: Degraded RAID on an-worker1199 - https://phabricator.wikimedia.org/T409060 (''ops-monitoring-bot) ''NEW'
2025-11-03 11:48:38 <wikibugs> ('PS14) ''Fabfur: P:cache:haproxy: introduce ua classes [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060)'
2025-11-03 11:48:50 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1229 (T407997)', diff saved to https://phabricator.wikimedia.org/P84623 and previous config saved to /var/cache/conftool/dbconfig/20251103-114849-marostegui.json
2025-11-03 11:48:54 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 11:49:06 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance
2025-11-03 11:49:16 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1233 (T407997)', diff saved to https://phabricator.wikimedia.org/P84624 and previous config saved to /var/cache/conftool/dbconfig/20251103-114913-marostegui.json
2025-11-03 11:51:57 <wikibugs> ('CR) ''Vgutierrez: P:cache:haproxy: introduce ua classes (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: ''Fabfur)'
2025-11-03 11:54:11 <wikibugs> 'ops-ulsfo, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, ''netops: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11335237 (''cmooney) Thanks @papaul. One to discuss with @ayounsi when he is back are the IPv6 gateway addresses on the vlans. ` on asw1-22 irb.411 public1-ul...'
2025-11-03 11:55:14 <wikibugs> ('PS15) ''Fabfur: P:cache:haproxy: introduce ua classes [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060)'
2025-11-03 11:58:01 <topranks> !log move analytics1-c-eqiad gateway IPs to new spine switch ports eqiad T405579
2025-11-03 11:58:08 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 11:58:11 <stashbot> T405579: Eqiad C/D refresh: move asw2-c-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T405579
2025-11-03 12:01:10 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1233 (T407997)', diff saved to https://phabricator.wikimedia.org/P84625 and previous config saved to /var/cache/conftool/dbconfig/20251103-120108-marostegui.json
2025-11-03 12:01:16 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 12:01:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 12:03:15 <jinxer-wm> FIRING: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid releases routed via main (k8s) 1.844s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
2025-11-03 12:08:15 <jinxer-wm> RESOLVED: MediaWikiLatencyExceeded: p75 latency high: codfw mw-parsoid releases routed via main (k8s) 2.5s - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=55&var-dc=codfw%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-parsoid&var-release=main - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded
2025-11-03 12:11:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 12:12:06 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to deployment for ItamarWMDE - https://phabricator.wikimedia.org/T408924#11335269 (''hnowlan) ''Open''In progress'
2025-11-03 12:12:44 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to deployment for ItamarWMDE - https://phabricator.wikimedia.org/T408924#11335273 (''hnowlan) Awaiting out of band verification of SSH key on Slack. Tagging @thcipriani as approver for `deployment` group.'
2025-11-03 12:16:18 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P84626 and previous config saved to /var/cache/conftool/dbconfig/20251103-121617-marostegui.json
2025-11-03 12:16:18 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Machine-Learning-Team: Promote dpogorzelski from ops-limited to ops - https://phabricator.wikimedia.org/T408702#11335277 (''hnowlan) ''Open→''Stalled Blocked on approval from @mark.'
2025-11-03 12:16:50 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to deployment for ItamarWMDE - https://phabricator.wikimedia.org/T408924#11335279 (''hnowlan)'
2025-11-03 12:16:52 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to deployment for ItamarWMDE - https://phabricator.wikimedia.org/T408924#11335280 (''hnowlan) Key verified out of band.'
2025-11-03 12:18:45 <jinxer-wm> RESOLVED: WidespreadPuppetFailure: Puppet has failed in ulsfo - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
2025-11-03 12:24:23 <icinga-wm> PROBLEM - VRRP status on cr1-eqiad is CRITICAL: VRRP CRITICAL - 2 misconfigured interfaces, 0 inconsistent interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23VRRP_status
2025-11-03 12:26:18 <logmsgbot> !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
2025-11-03 12:27:30 <topranks> !log adjust VRRP priority for analytics1-d-eqiad to make cr1-eqiad active gateway T405579
2025-11-03 12:27:32 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 12:27:33 <stashbot> T405579: Eqiad C/D refresh: move asw2-c-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T405579
2025-11-03 12:28:43 <jinxer-wm> FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
2025-11-03 12:29:03 <topranks> ^^ above VRRP alert is 100% due to my works, everything is fine I'll check what the stupid alert is expecting to see but traffic is unaffected
2025-11-03 12:31:25 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P84627 and previous config saved to /var/cache/conftool/dbconfig/20251103-123125-marostegui.json
2025-11-03 12:32:05 <logmsgbot> cmooney@cumin1003 netbox (PID 2985360) is awaiting input
2025-11-03 12:32:37 <topranks> ok yeah it expects the interface names on each router to be identical. that is normally the case in our infra and will be again when I'm done, will clear it shortly
2025-11-03 12:32:54 <wikibugs> ('PS1) ''Brouberol: growthbook: add the growthbook.wikimedia.org SAN to the certificate [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201046 (https://phabricator.wikimedia.org/T408903)'
2025-11-03 12:33:23 <logmsgbot> !log cmooney@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for analytics1-c-eqiad IPs cr1-eqiad - cmooney@cumin1003"
2025-11-03 12:33:27 <logmsgbot> !log cmooney@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for analytics1-c-eqiad IPs cr1-eqiad - cmooney@cumin1003"
2025-11-03 12:33:27 <logmsgbot> !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2025-11-03 12:34:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2007:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 12:35:02 <topranks> !log move analytics1-c-eqiad gateway IPs to new spine switch port cr2-eqiad T405579
2025-11-03 12:35:04 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 12:35:05 <stashbot> T405579: Eqiad C/D refresh: move asw2-c-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T405579
2025-11-03 12:38:23 <icinga-wm> RECOVERY - VRRP status on cr1-eqiad is OK: VRRP OK - 0 misconfigured interfaces, 0 inconsistent interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23VRRP_status
2025-11-03 12:39:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2007:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 12:43:43 <jinxer-wm> RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
2025-11-03 12:46:33 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1233 (T407997)', diff saved to https://phabricator.wikimedia.org/P84628 and previous config saved to /var/cache/conftool/dbconfig/20251103-124632-marostegui.json
2025-11-03 12:46:36 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 12:46:49 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
2025-11-03 12:54:22 <wikibugs> ('CR) ''Federico Ceratto: [C:''+2] Flip es1, es2, es3 masters [dns] - ''https://gerrit.wikimedia.org/r/1201037 (https://phabricator.wikimedia.org/T402859) (owner: ''Federico Ceratto)'
2025-11-03 12:54:58 <logmsgbot> !log fceratto@dns1004 START - running authdns-update
2025-11-03 12:55:55 <logmsgbot> !log fceratto@dns1004 END - running authdns-update
2025-11-03 12:56:36 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance
2025-11-03 12:56:43 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1254 (T407997)', diff saved to https://phabricator.wikimedia.org/P84629 and previous config saved to /var/cache/conftool/dbconfig/20251103-125643-marostegui.json
2025-11-03 12:56:46 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 12:59:09 <wikibugs> 'SRE-SLO, ''Experimentation Lab (Experiment Platform Sprint 14), ''OKR-Work: Create Pyrra SLOs for xLab - https://phabricator.wikimedia.org/T398869#11335391 (''tappof) It seems that some of the eventgate pods were restarted between 16:00 and 17:00 (Just a quick check by looking at the metrics — I didn’t dig...'
2025-11-03 13:00:12 <logmsgbot> !log fceratto@cumin1003 dbctl commit (dc=all): 'Update masters for T402859', diff saved to https://phabricator.wikimedia.org/P84630 and previous config saved to /var/cache/conftool/dbconfig/20251103-130011-fceratto.json
2025-11-03 13:00:14 <stashbot> T402859: Productionize es2049-es2057 - https://phabricator.wikimedia.org/T402859
2025-11-03 13:07:56 <wikibugs> ('PS1) ''A smart kitten: enwikibooks: Limit FlaggedRevs to the main, Cookbook & Wikijunior namespaces [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1201051 (https://phabricator.wikimedia.org/T408110)'
2025-11-03 13:08:13 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1254 (T407997)', diff saved to https://phabricator.wikimedia.org/P84631 and previous config saved to /var/cache/conftool/dbconfig/20251103-130812-marostegui.json
2025-11-03 13:08:17 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 13:09:01 <jinxer-wm> FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
2025-11-03 13:09:01 <jinxer-wm> FIRING: [21x] CertAlmostExpired: Certificate for service cr2-eqsin.wikimedia.org:32767 is about to expire - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
2025-11-03 13:12:02 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, November 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1201051 (https://phabricator.wikimedia.org/T408110) (owner: ''A smart kitten)'
2025-11-03 13:15:28 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops, ''Data-Platform-SRE (2025.10.17 - 2025.11.07), ''Essential-Work: Degraded RAID on an-presto1013 - https://phabricator.wikimedia.org/T408065#11335439 (''Jclark-ctr)'
2025-11-03 13:15:30 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops: Degraded RAID on an-presto1013 - https://phabricator.wikimedia.org/T408966#11335441 (''Jclark-ctr) →''Duplicate dup:''T408065'
2025-11-03 13:18:33 <wikibugs> ('CR) ''Brouberol: [C:''+2] airflow-platform-eng: allow task pods to egress to the urldownloader hosts [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200354 (https://phabricator.wikimedia.org/T408238) (owner: ''Brouberol)'
2025-11-03 13:18:36 <wikibugs> ('CR) ''Brouberol: [C:''+2] airflow-platform-eng: define a connection to the spur.us API [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200357 (https://phabricator.wikimedia.org/T408238) (owner: ''Brouberol)'
2025-11-03 13:19:44 <wikibugs> ('PS1) ''Cathal Mooney: Eqiad C/D migration: move analytics1-c-eqiad GW to CR et-1/0/5 [homer/public] - ''https://gerrit.wikimedia.org/r/1201056 (https://phabricator.wikimedia.org/T405579)'
2025-11-03 13:19:54 <wikibugs> 'ops-codfw, ''SRE, ''DC-Ops: Alert for device ps1-b8-codfw.mgmt.codfw.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T408963#11335452 (''phaultfinder)'
2025-11-03 13:20:30 <wikibugs> ('PS1) ''Muehlenhoff: ganeti-ca-exporter: Log the cluster name as part of the metric [puppet] - ''https://gerrit.wikimedia.org/r/1201057 (https://phabricator.wikimedia.org/T382902)'
2025-11-03 13:20:34 <wikibugs> ('Merged) ''jenkins-bot: airflow-platform-eng: allow task pods to egress to the urldownloader hosts [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200354 (https://phabricator.wikimedia.org/T408238) (owner: ''Brouberol)'
2025-11-03 13:20:45 <wikibugs> ('Merged) ''jenkins-bot: airflow-platform-eng: define a connection to the spur.us API [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200357 (https://phabricator.wikimedia.org/T408238) (owner: ''Brouberol)'
2025-11-03 13:21:10 <wikibugs> ('CR) ''CI reject: [V:''-1] ganeti-ca-exporter: Log the cluster name as part of the metric [puppet] - ''https://gerrit.wikimedia.org/r/1201057 (https://phabricator.wikimedia.org/T382902) (owner: ''Muehlenhoff)'
2025-11-03 13:21:33 <wikibugs> ('CR) ''Cathal Mooney: [C:''+2] Eqiad C/D migration: move analytics1-c-eqiad GW to CR et-1/0/5 [homer/public] - ''https://gerrit.wikimedia.org/r/1201056 (https://phabricator.wikimedia.org/T405579) (owner: ''Cathal Mooney)'
2025-11-03 13:22:52 <wikibugs> ('Merged) ''jenkins-bot: Eqiad C/D migration: move analytics1-c-eqiad GW to CR et-1/0/5 [homer/public] - ''https://gerrit.wikimedia.org/r/1201056 (https://phabricator.wikimedia.org/T405579) (owner: ''Cathal Mooney)'
2025-11-03 13:23:21 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P84632 and previous config saved to /var/cache/conftool/dbconfig/20251103-132320-marostegui.json
2025-11-03 13:24:33 <wikibugs> ('PS2) ''Muehlenhoff: ganeti-ca-exporter: Log the cluster name as part of the metric [puppet] - ''https://gerrit.wikimedia.org/r/1201057 (https://phabricator.wikimedia.org/T382902)'
2025-11-03 13:25:57 <wikibugs> ('PS2) ''A smart kitten: enwikibooks: Limit FlaggedRevs to the main, Cookbook & Wikijunior namespaces [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1201051 (https://phabricator.wikimedia.org/T408110)'
2025-11-03 13:27:21 <wikibugs> ('PS16) ''Fabfur: P:cache:haproxy: introduce ua classes [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060)'
2025-11-03 13:27:45 <wikibugs> ('CR) ''Muehlenhoff: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1201057 (https://phabricator.wikimedia.org/T382902) (owner: ''Muehlenhoff)'
2025-11-03 13:28:12 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''netops: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067 (''cmooney) ''NEW p:''Triage→''Medium'
2025-11-03 13:28:21 <wikibugs> ('CR) ''Fabfur: P:cache:haproxy: introduce ua classes (''3 comments) [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: ''Fabfur)'
2025-11-03 13:28:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 13:28:49 <wikibugs> ('CR) ''Fabfur: P:cache:haproxy: introduce ua classes (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: ''Fabfur)'
2025-11-03 13:33:43 <logmsgbot> !log fceratto@cumin1003 dbctl commit (dc=all): 'Update masters for T402859', diff saved to https://phabricator.wikimedia.org/P84633 and previous config saved to /var/cache/conftool/dbconfig/20251103-133342-fceratto.json
2025-11-03 13:33:49 <stashbot> T402859: Productionize es2049-es2057 - https://phabricator.wikimedia.org/T402859
2025-11-03 13:33:57 <wikibugs> ('CR) ''Fabfur: "check experimental" [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: ''Fabfur)'
2025-11-03 13:34:01 <jinxer-wm> FIRING: JobUnavailable: Reduced availability for job cloud_dev_pdns_rec in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2025-11-03 13:37:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 13:38:29 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P84634 and previous config saved to /var/cache/conftool/dbconfig/20251103-133828-marostegui.json
2025-11-03 13:39:01 <jinxer-wm> FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2025-11-03 13:41:10 <wikibugs> 'ops-codfw, ''SRE, ''DC-Ops, ''serviceops: hw troubleshooting: host unresponsive for wikikube-worker2203.codfw.wmnet - https://phabricator.wikimedia.org/T408004#11335531 (''Raine) @Jhancock.wm great, thanks for the update!'
2025-11-03 13:45:13 <logmsgbot> !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
2025-11-03 13:45:54 <logmsgbot> !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
2025-11-03 13:47:38 <wikibugs> ('CR) ''Kamila Součková: "LGTM except see inline" [puppet] - ''https://gerrit.wikimedia.org/r/1200116 (https://phabricator.wikimedia.org/T408749) (owner: ''Clément Goubert)'
2025-11-03 13:48:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 13:50:59 <wikibugs> ('PS1) ''Bartosz Dziewoński: recentchanges: Fix highlights where more than one action is defined [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201064 (https://phabricator.wikimedia.org/T409020)'
2025-11-03 13:51:54 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, November 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201064 (https://phabricator.wikimedia.org/T409020) (owner: ''Bartosz Dziewoński)'
2025-11-03 13:52:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 13:52:08 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''netops: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067#11335585 (''cmooney)'
2025-11-03 13:52:39 <wikibugs> ('PS1) ''Muehlenhoff: ganeti-ca: Adapt to change of logged clustername for the expity metric [alerts] - ''https://gerrit.wikimedia.org/r/1201066 (https://phabricator.wikimedia.org/T382902)'
2025-11-03 13:53:20 <wikibugs> ('CR) ''Gehel: [C:''+1] "LGTM" [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201046 (https://phabricator.wikimedia.org/T408903) (owner: ''Brouberol)'
2025-11-03 13:53:36 <logmsgbot> !log cmooney@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on asw2-d-eqiad,cr[1-2]-eqiad with reason: moving uplinks from CRs to Nokia Spines on asw2-d-eqiad
2025-11-03 13:53:37 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1254 (T407997)', diff saved to https://phabricator.wikimedia.org/P84635 and previous config saved to /var/cache/conftool/dbconfig/20251103-135336-marostegui.json
2025-11-03 13:53:41 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 13:53:53 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance
2025-11-03 13:53:58 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''netops: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067#11335590 (''ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=a04c020e-81be-4ee8-bf2f-5bcc8830a8da) set by cmooney@cumin1003 for 2:00:00...'
2025-11-03 13:54:03 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db1259 (T407997)', diff saved to https://phabricator.wikimedia.org/P84636 and previous config saved to /var/cache/conftool/dbconfig/20251103-135400-marostegui.json
2025-11-03 13:55:23 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.mysql.depool es2029 - Depool es2029 T408408
2025-11-03 13:55:29 <stashbot> T408408: decommission es2029 - https://phabricator.wikimedia.org/T408408
2025-11-03 13:55:42 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2029 - Depool es2029 T408408
2025-11-03 13:55:59 <wikibugs> ('CR) ''Brouberol: [C:''+2] growthbook: add the growthbook.wikimedia.org SAN to the certificate [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201046 (https://phabricator.wikimedia.org/T408903) (owner: ''Brouberol)'
2025-11-03 13:56:19 <topranks> !log shut down cr1-eqiad link to asw2-d-eqiad to migrate traffic via Nokia spines T409067
2025-11-03 13:56:24 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 13:56:26 <stashbot> T409067: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067
2025-11-03 13:57:06 <wikibugs> ('CR) ''Federico Ceratto: [C:''+2] instances.yaml: remove es2031 from dbctl [puppet] - ''https://gerrit.wikimedia.org/r/1199742 (https://phabricator.wikimedia.org/T408410) (owner: ''Federico Ceratto)'
2025-11-03 13:57:08 <wikibugs> ('CR) ''Federico Ceratto: [C:''+2] instances.yaml: remove es2030 from dbctl [puppet] - ''https://gerrit.wikimedia.org/r/1199741 (https://phabricator.wikimedia.org/T408409) (owner: ''Federico Ceratto)'
2025-11-03 13:59:52 <logmsgbot> !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
2025-11-03 14:00:05 <jouncebot> Lucas_WMDE, Urbanecm, and TheresNoTime: gettimeofday() says it's time for UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T1400)
2025-11-03 14:00:05 <jouncebot> cormacparle, MatmaRex, Superpes, and edsanders: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2025-11-03 14:00:07 <logmsgbot> !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
2025-11-03 14:00:07 <edsanders> o/
2025-11-03 14:00:15 <Lucas_WMDE> o/
2025-11-03 14:00:24 <MatmaRex> hey
2025-11-03 14:00:31 <wikibugs> ('PS8) ''Federico Ceratto: instances.yaml: remove es2029 from dbctl [puppet] - ''https://gerrit.wikimedia.org/r/1199740 (https://phabricator.wikimedia.org/T408408)'
2025-11-03 14:00:31 <wikibugs> ('PS8) ''Federico Ceratto: instances.yaml: remove es2030 from dbctl [puppet] - ''https://gerrit.wikimedia.org/r/1199741 (https://phabricator.wikimedia.org/T408409)'
2025-11-03 14:00:31 <wikibugs> ('PS8) ''Federico Ceratto: instances.yaml: remove es2031 from dbctl [puppet] - ''https://gerrit.wikimedia.org/r/1199742 (https://phabricator.wikimedia.org/T408410)'
2025-11-03 14:01:38 <MatmaRex> hi Lucas_WMDE, btw, i saw your replies on the CentralAuth maintenance script task. i was going to look at the logs from logstash, but i haven't found the time last week, sorry you were the first to discover the current failures :)
2025-11-03 14:01:47 <MatmaRex> hope you enjoyed your time off
2025-11-03 14:01:52 <Lucas_WMDE> it was nice, thanks :)
2025-11-03 14:02:26 <Lucas_WMDE> MatmaRex: is your config change related to the backports?
2025-11-03 14:02:30 <Lucas_WMDE> or can it be deployed separately?
2025-11-03 14:02:36 <MatmaRex> no, config is just cleanup (no-op)
2025-11-03 14:02:40 <Lucas_WMDE> ok
2025-11-03 14:02:58 <Lucas_WMDE> then I’d say let’s start with that config change + edsanders
2025-11-03 14:02:59 <MatmaRex> the two backports are independent as well, can go out separately or together
2025-11-03 14:03:04 <Lucas_WMDE> and then start the backport gate-and-submit
2025-11-03 14:03:12 <Lucas_WMDE> reviews the config changes
2025-11-03 14:03:52 <wikibugs> ('CR) ''Muehlenhoff: "Respective alert change at https://gerrit.wikimedia.org/r/c/operations/alerts/+/1201066" [puppet] - ''https://gerrit.wikimedia.org/r/1201057 (https://phabricator.wikimedia.org/T382902) (owner: ''Muehlenhoff)'
2025-11-03 14:04:18 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/941424 (https://phabricator.wikimedia.org/T183848) (owner: ''Func)'
2025-11-03 14:04:18 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200876 (https://phabricator.wikimedia.org/T406026) (owner: ''Esanders)'
2025-11-03 14:04:19 <cormacparle> o/
2025-11-03 14:04:30 <Lucas_WMDE> my spiderpig session survived the vacation btw ^^
2025-11-03 14:05:12 <wikibugs> ('Merged) ''jenkins-bot: Revert "Adding Movepage-summary to wgForceUIMsgAsContentMsg to allow" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/941424 (https://phabricator.wikimedia.org/T183848) (owner: ''Func)'
2025-11-03 14:05:18 <wikibugs> ('Merged) ''jenkins-bot: Freeze LiquidThreads on huwiki and svwikisource [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200876 (https://phabricator.wikimedia.org/T406026) (owner: ''Esanders)'
2025-11-03 14:05:43 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:941424|Revert "Adding Movepage-summary to wgForceUIMsgAsContentMsg to allow" (T183848)]], [[gerrit:1200876|Freeze LiquidThreads on huwiki and svwikisource (T406026 T406227)]]
2025-11-03 14:05:55 <wikibugs> ('PS2) ''Bartosz Dziewoński: upload: Remove stashed file in UploadFromStash when upload completed [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1200194 (https://phabricator.wikimedia.org/T408610)'
2025-11-03 14:06:00 <stashbot> T183848: MediaWiki:Movepage-summary is not forced to content language - https://phabricator.wikimedia.org/T183848
2025-11-03 14:06:01 <stashbot> T406026: Convert LQT pages on huwiki to Flow - https://phabricator.wikimedia.org/T406026
2025-11-03 14:06:02 <stashbot> T406227: Convert LQT pages on svwikisource to Flow - https://phabricator.wikimedia.org/T406227
2025-11-03 14:06:02 <wikibugs> ('PS2) ''Bartosz Dziewoński: recentchanges: Fix highlights where more than one action is defined [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201064 (https://phabricator.wikimedia.org/T409020)'
2025-11-03 14:06:02 <edsanders> ok
2025-11-03 14:06:15 <MatmaRex> (just rebasing to run tests for the cache)
2025-11-03 14:06:20 <Lucas_WMDE> ack
2025-11-03 14:06:38 <wikibugs> 'SRE, ''collaboration-services, ''Znuny: VRTS outbound emails not working - https://phabricator.wikimedia.org/T408967#11335666 (''Xaosflux)'
2025-11-03 14:06:57 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1259 (T407997)', diff saved to https://phabricator.wikimedia.org/P84638 and previous config saved to /var/cache/conftool/dbconfig/20251103-140653-marostegui.json
2025-11-03 14:06:57 <wikibugs> 'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11335667 (''Xaosflux)'
2025-11-03 14:07:00 <wikibugs> ('PS1) ''C. Scott Ananian: i18n: all behavior switches should start/end with __ (part 2) [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201069'
2025-11-03 14:07:03 <Lucas_WMDE> those backports look fine to me
2025-11-03 14:07:06 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 14:07:13 <edsanders> Lucas_WMDE: thanks
2025-11-03 14:07:17 <Lucas_WMDE> let’s +2 them, and then we’ll see if cormacparle’s config change happens first or not
2025-11-03 14:07:23 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, November 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201069 (owner: ''C. Scott Ananian)'
2025-11-03 14:07:30 <cormacparle> 👍
2025-11-03 14:07:36 <wikibugs> ('CR) ''Lucas Werkmeister (WMDE): [C:''+2] "starting gate-and-submit ahead of deployment" [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1200194 (https://phabricator.wikimedia.org/T408610) (owner: ''Bartosz Dziewoński)'
2025-11-03 14:07:38 <wikibugs> ('CR) ''Lucas Werkmeister (WMDE): [C:''+2] "starting gate-and-submit ahead of deployment" [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201064 (https://phabricator.wikimedia.org/T409020) (owner: ''Bartosz Dziewoński)'
2025-11-03 14:08:02 <cscott> sliding into the backport window if there's space?
2025-11-03 14:08:06 <Lucas_WMDE> (also still waiting for Superpes to show up ^^)
2025-11-03 14:08:19 <Lucas_WMDE> cscott: we’ll see, I guess :)
2025-11-03 14:08:21 <wikibugs> 'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11335670 (''Xaosflux)'
2025-11-03 14:09:01 <wikibugs> ('PS1) ''C. Scott Ananian: i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201070 (https://phabricator.wikimedia.org/T407289)'
2025-11-03 14:09:25 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, November 03 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploy" [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201070 (https://phabricator.wikimedia.org/T407289) (owner: ''C. Scott Ananian)'
2025-11-03 14:10:27 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, esanders, func: Backport for [[gerrit:941424|Revert "Adding Movepage-summary to wgForceUIMsgAsContentMsg to allow" (T183848)]], [[gerrit:1200876|Freeze LiquidThreads on huwiki and svwikisource (T406026 T406227)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 14:11:10 <Lucas_WMDE> MatmaRex, edsanders: please test!
2025-11-03 14:11:50 <Lucas_WMDE> has also just looked up what the logspam-watch circular glyphs mean, and now wonders if the glyphs at https://gerrit.wikimedia.org/g/operations/puppet/+/fd659bc4bb/modules/role/files/logging/logspam-watch.sh#177 have similar sizes for other people
2025-11-03 14:11:58 <Lucas_WMDE> (on my end the third one is way larger than the others)
2025-11-03 14:12:26 <MatmaRex> my config change looks good
2025-11-03 14:12:31 <Lucas_WMDE> ok
2025-11-03 14:12:39 <edsanders> Same here, looks good
2025-11-03 14:12:46 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, esanders, func: Continuing with sync
2025-11-03 14:12:47 <Lucas_WMDE> yay
2025-11-03 14:12:50 <Lucas_WMDE> FrozenThreads
2025-11-03 14:13:00 <MatmaRex> heh
2025-11-03 14:13:11 <wikibugs> ('CR) ''Federico Ceratto: [C:''+2] instances.yaml: remove es2029 from dbctl [puppet] - ''https://gerrit.wikimedia.org/r/1199740 (https://phabricator.wikimedia.org/T408408) (owner: ''Federico Ceratto)'
2025-11-03 14:13:19 <wikibugs> ('CR) ''Federico Ceratto: [C:''+2] instances.yaml: remove es2030 from dbctl [puppet] - ''https://gerrit.wikimedia.org/r/1199741 (https://phabricator.wikimedia.org/T408409) (owner: ''Federico Ceratto)'
2025-11-03 14:13:34 <wikibugs> ('CR) ''Federico Ceratto: [C:''+2] instances.yaml: remove es2031 from dbctl [puppet] - ''https://gerrit.wikimedia.org/r/1199742 (https://phabricator.wikimedia.org/T408410) (owner: ''Federico Ceratto)'
2025-11-03 14:14:25 <Lucas_WMDE> huh, the mwdebug logstash board only has 3 messages in the last 24 hours o_O
2025-11-03 14:14:48 <wikibugs> ('CR) ''A smart kitten: "(For the record, I un-scheduled this for now)" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1201051 (https://phabricator.wikimedia.org/T408110) (owner: ''A smart kitten)'
2025-11-03 14:15:04 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.mysql.depool es2030 - Depool es2030 T408409
2025-11-03 14:15:07 <stashbot> T408409: decommission es2030 - https://phabricator.wikimedia.org/T408409
2025-11-03 14:15:33 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2030 - Depool es2030 T408409
2025-11-03 14:16:00 <cormacparle> don't see my change yet ... should I expect to?
2025-11-03 14:16:04 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.mysql.depool es2030 - Depool es2030 T408409
2025-11-03 14:16:09 <Lucas_WMDE> nope, I haven’t deployed it yet
2025-11-03 14:16:11 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2030 - Depool es2030 T408409
2025-11-03 14:16:15 <cormacparle> kk
2025-11-03 14:16:18 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.mysql.depool es2031 - Depool es2031 T408410
2025-11-03 14:16:18 <Lucas_WMDE> it’s either up next or later
2025-11-03 14:16:20 <stashbot> T408410: decommission es2031 - https://phabricator.wikimedia.org/T408410
2025-11-03 14:16:28 <Lucas_WMDE> depending on whether the backports finish merging before the current deployment is done
2025-11-03 14:16:35 <cormacparle> cool
2025-11-03 14:16:47 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2031 - Depool es2031 T408410
2025-11-03 14:18:14 <wikibugs> 'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11335719 (''Xaosflux) Following up on progress at #wikimedia-sre ; expected resource is not yet on shift. Let's give them s...'
2025-11-03 14:18:36 <wikibugs> 'SRE-Access-Requests: Migrate ori to a FIDO-backed key - https://phabricator.wikimedia.org/T409075 (''ori) ''NEW'
2025-11-03 14:19:26 <MatmaRex> Lucas_WMDE: the shaded circle ◍ is larger than the others for me in some fonts, but the same size in others. e.g. on gitiles: https://phabricator.wikimedia.org/F69799528 in my editor: https://phabricator.wikimedia.org/F69799547
2025-11-03 14:19:29 <wikibugs> ('PS4) ''Ori: admin: add FIDO key for ori [puppet] - ''https://gerrit.wikimedia.org/r/1200217 (https://phabricator.wikimedia.org/T409075)'
2025-11-03 14:19:59 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:941424|Revert "Adding Movepage-summary to wgForceUIMsgAsContentMsg to allow" (T183848)]], [[gerrit:1200876|Freeze LiquidThreads on huwiki and svwikisource (T406026 T406227)]] (duration: 14m 16s)
2025-11-03 14:20:04 <stashbot> T183848: MediaWiki:Movepage-summary is not forced to content language - https://phabricator.wikimedia.org/T183848
2025-11-03 14:20:05 <stashbot> T406026: Convert LQT pages on huwiki to Flow - https://phabricator.wikimedia.org/T406026
2025-11-03 14:20:05 <stashbot> T406227: Convert LQT pages on svwikisource to Flow - https://phabricator.wikimedia.org/T406227
2025-11-03 14:20:10 <MatmaRex> (that's Roboto Mono and DejaVu Sans Mono, respectively)
2025-11-03 14:20:11 <wikibugs> 'SRE-SLO, ''Experimentation Lab (Experiment Platform Sprint 14), ''OKR-Work: Create Pyrra SLOs for xLab - https://phabricator.wikimedia.org/T398869#11335743 (''elukey) @tappof The main issue is that Pyrra/Sloth/etc.. IIUC assume counters, and without changing them dramatically we cannot do much. Sloth has t...'
2025-11-03 14:20:52 <wikibugs> ('CR) ''Elukey: [C:''+1] Limit microcode installation to Bullseye [puppet] - ''https://gerrit.wikimedia.org/r/1201014 (owner: ''Muehlenhoff)'
2025-11-03 14:20:54 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200105 (https://phabricator.wikimedia.org/T41510) (owner: ''Cparle)'
2025-11-03 14:21:02 <Lucas_WMDE> interesting, I think for me it’s an even bigger difference
2025-11-03 14:21:46 <Lucas_WMDE> https://phabricator.wikimedia.org/F69799528#12462
2025-11-03 14:22:00 <wikibugs> ('Merged) ''jenkins-bot: Enable pagination on Special:EditWatchlist everywhere [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200105 (https://phabricator.wikimedia.org/T41510) (owner: ''Cparle)'
2025-11-03 14:22:04 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P84641 and previous config saved to /var/cache/conftool/dbconfig/20251103-142204-marostegui.json
2025-11-03 14:22:18 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1200105|Enable pagination on Special:EditWatchlist everywhere (T41510)]]
2025-11-03 14:22:20 <stashbot> T41510: Opening Special:EditWatchlist with a large watchlist hits server timeout (Create watchlist pager) - https://phabricator.wikimedia.org/T41510
2025-11-03 14:22:31 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 04 UTC afternoon backport window](https://wikitech.wikimedia.org/wiki/Deployments#deplo" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199751 (https://phabricator.wikimedia.org/T400067) (owner: ''Abijeet Patro)'
2025-11-03 14:24:15 <wikibugs> ('CR) ''Ssingh: [C:''+1] dotls: enable nrpe2nodexp wrapper on check_dotls [puppet] - ''https://gerrit.wikimedia.org/r/1200088 (https://phabricator.wikimedia.org/T384425) (owner: ''Tiziano Fogli)'
2025-11-03 14:24:46 <MatmaRex> at https://en.wikipedia.org/wiki/Geometric_Shapes_(Unicode_block)#Block they also appear in different sizes (U+25Cx row, B D F columns)
2025-11-03 14:24:47 <wikibugs> ('Merged) ''jenkins-bot: upload: Remove stashed file in UploadFromStash when upload completed [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1200194 (https://phabricator.wikimedia.org/T408610) (owner: ''Bartosz Dziewoński)'
2025-11-03 14:24:48 <MatmaRex> they a
2025-11-03 14:24:52 <wikibugs> ('Merged) ''jenkins-bot: recentchanges: Fix highlights where more than one action is defined [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201064 (https://phabricator.wikimedia.org/T409020) (owner: ''Bartosz Dziewoński)'
2025-11-03 14:25:01 <MatmaRex> they are probably supposed to be the same size. could file a bug with the fonts or something ;)
2025-11-03 14:25:26 <Lucas_WMDE> huh, *those* are consistent for me OTOH
2025-11-03 14:26:16 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 cparle, lucaswerkmeister-wmde: Backport for [[gerrit:1200105|Enable pagination on Special:EditWatchlist everywhere (T41510)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 14:26:29 <Lucas_WMDE> cormacparle: please test!
2025-11-03 14:26:59 <wikibugs> ('PS1) ''Federico Ceratto: site.pp, es2029.yaml: Decommission es2029 [puppet] - ''https://gerrit.wikimedia.org/r/1201071 (https://phabricator.wikimedia.org/T408408)'
2025-11-03 14:27:01 <wikibugs> ('PS1) ''Federico Ceratto: site.pp, es2030.yaml: Decommission es2030 [puppet] - ''https://gerrit.wikimedia.org/r/1201072 (https://phabricator.wikimedia.org/T408409)'
2025-11-03 14:27:03 <wikibugs> ('PS1) ''Federico Ceratto: site.pp, es2031.yaml: Decommission es2031 [puppet] - ''https://gerrit.wikimedia.org/r/1201073 (https://phabricator.wikimedia.org/T408410)'
2025-11-03 14:27:16 <Lucas_WMDE> I get a paginated watchlist on wikidata, at least
2025-11-03 14:27:19 <cormacparle> Lucas_WMDE: on it
2025-11-03 14:27:23 <Lucas_WMDE> ack
2025-11-03 14:28:15 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.hosts.decommission for hosts es2029.codfw.wmnet
2025-11-03 14:28:58 <wikibugs> ('CR) ''MVernon: [C:''+2] Return ms-be10{89,90} to the rings [puppet] - ''https://gerrit.wikimedia.org/r/1200288 (https://phabricator.wikimedia.org/T400877) (owner: ''MVernon)'
2025-11-03 14:29:20 <wikibugs> ('PS1) ''Brouberol: trafficserver: rediredct growthbook-backend from public to private domains [puppet] - ''https://gerrit.wikimedia.org/r/1201074 (https://phabricator.wikimedia.org/T408903)'
2025-11-03 14:29:22 <wikibugs> ('PS1) ''Brouberol: Define the growthbook-backend domain [dns] - ''https://gerrit.wikimedia.org/r/1201075 (https://phabricator.wikimedia.org/T408903)'
2025-11-03 14:29:24 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
2025-11-03 14:29:37 <logmsgbot> !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
2025-11-03 14:29:52 <cormacparle> Lucas_WMDE: looks good to me
2025-11-03 14:29:57 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 cparle, lucaswerkmeister-wmde: Continuing with sync
2025-11-03 14:29:58 <Lucas_WMDE> \o/
2025-11-03 14:31:23 <logmsgbot> fceratto@cumin1003 decommission (PID 3107435) is awaiting input
2025-11-03 14:32:46 <wikibugs> ('CR) ''Ssingh: [C:''+1] dns: enable nrpe2nodexp wrapper on authdns_update_run check [puppet] - ''https://gerrit.wikimedia.org/r/1200359 (https://phabricator.wikimedia.org/T384425) (owner: ''Tiziano Fogli)'
2025-11-03 14:34:26 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1200105|Enable pagination on Special:EditWatchlist everywhere (T41510)]] (duration: 12m 08s)
2025-11-03 14:34:26 <wikibugs> ('CR) ''Ssingh: "This can go to the Traffic team IMO." [puppet] - ''https://gerrit.wikimedia.org/r/1200362 (https://phabricator.wikimedia.org/T407330) (owner: ''Tiziano Fogli)'
2025-11-03 14:34:29 <stashbot> T41510: Opening Special:EditWatchlist with a large watchlist hits server timeout (Create watchlist pager) - https://phabricator.wikimedia.org/T41510
2025-11-03 14:34:55 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1009.eqiad.wmnet with reason: schema change
2025-11-03 14:35:08 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1200194|upload: Remove stashed file in UploadFromStash when upload completed (T408610)]], [[gerrit:1201064|recentchanges: Fix highlights where more than one action is defined (T409020)]]
2025-11-03 14:35:13 <stashbot> T408610: [Regression] Clicking on images after upload leads to broken links - https://phabricator.wikimedia.org/T408610
2025-11-03 14:35:13 <wikibugs> ('PS1) ''Muehlenhoff: Add separate role for single-node staging DB [puppet] - ''https://gerrit.wikimedia.org/r/1201077 (https://phabricator.wikimedia.org/T381565)'
2025-11-03 14:35:13 <stashbot> T409020: ChangesListSpecialPage incorrect highlight for mw-changeslist-last - https://phabricator.wikimedia.org/T409020
2025-11-03 14:35:16 <Lucas_WMDE> Superpes: are you around for your config change?
2025-11-03 14:35:28 <Lucas_WMDE> otherwise cscott’s backports would be up next
2025-11-03 14:35:44 <Lucas_WMDE> cscott: should those be deployed separately or together?
2025-11-03 14:35:56 <cscott> they can be deployed together
2025-11-03 14:36:00 <Lucas_WMDE> ok
2025-11-03 14:36:04 <Lucas_WMDE> let’s start gate-and-submit then
2025-11-03 14:36:14 <cscott> they'll probably be slow to deploy because i18n
2025-11-03 14:36:22 <Lucas_WMDE> ah
2025-11-03 14:36:24 <wikibugs> ('CR) ''Muehlenhoff: [C:''+1] "Looks good!" [puppet] - ''https://gerrit.wikimedia.org/r/1200217 (https://phabricator.wikimedia.org/T409075) (owner: ''Ori)'
2025-11-03 14:36:26 <Lucas_WMDE> very possible, yes
2025-11-03 14:36:32 <Lucas_WMDE> let’s definitely do them together then
2025-11-03 14:36:37 <wikibugs> ('CR) ''Lucas Werkmeister (WMDE): [C:''+2] "starting gate-and-submit before deployment" [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201069 (owner: ''C. Scott Ananian)'
2025-11-03 14:36:41 <wikibugs> ('CR) ''Lucas Werkmeister (WMDE): [C:''+2] "starting gate-and-submit before deployment" [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201070 (https://phabricator.wikimedia.org/T407289) (owner: ''C. Scott Ananian)'
2025-11-03 14:36:50 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
2025-11-03 14:37:29 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: schema change
2025-11-03 14:39:01 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, matmarex: Backport for [[gerrit:1200194|upload: Remove stashed file in UploadFromStash when upload completed (T408610)]], [[gerrit:1201064|recentchanges: Fix highlights where more than one action is defined (T409020)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 14:39:11 <topranks> !log enable cr1-eqiad sub-interfaces for row D vlans T409067
2025-11-03 14:39:13 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 14:39:14 <stashbot> T409067: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067
2025-11-03 14:39:19 <Lucas_WMDE> MatmaRex: please test :)
2025-11-03 14:39:19 <MatmaRex> testing
2025-11-03 14:39:21 <Lucas_WMDE> ack
2025-11-03 14:39:26 <logmsgbot> !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie
2025-11-03 14:40:12 <wikibugs> ('PS2) ''Muehlenhoff: Add separate role for single-node staging DB [puppet] - ''https://gerrit.wikimedia.org/r/1201077 (https://phabricator.wikimedia.org/T381565)'
2025-11-03 14:41:27 <wikibugs> ('CR) ''Ori: [C:''+2] admin: add FIDO key for ori [puppet] - ''https://gerrit.wikimedia.org/r/1200217 (https://phabricator.wikimedia.org/T409075) (owner: ''Ori)'
2025-11-03 14:41:37 <wikibugs> 'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vrts, ''Znuny: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11335871 (''jhathaway) a:''jhathaway'
2025-11-03 14:41:43 <wikibugs> ('PS5) ''Ori: admin: add FIDO key for ori [puppet] - ''https://gerrit.wikimedia.org/r/1200217 (https://phabricator.wikimedia.org/T409075)'
2025-11-03 14:41:54 <wikibugs> 'SRE, ''collaboration-services, ''Znuny: VRTS outbound emails not working - https://phabricator.wikimedia.org/T408967#11335872 (''jhathaway) a:''jhathaway'
2025-11-03 14:42:05 <logmsgbot> !log fceratto@cumin1003 dbctl commit (dc=all): 'Cleanup T408408 T408409 T408410', diff saved to https://phabricator.wikimedia.org/P84642 and previous config saved to /var/cache/conftool/dbconfig/20251103-144204-fceratto.json
2025-11-03 14:42:11 <stashbot> T408408: decommission es2029 - https://phabricator.wikimedia.org/T408408
2025-11-03 14:42:11 <stashbot> T408409: decommission es2030 - https://phabricator.wikimedia.org/T408409
2025-11-03 14:42:12 <stashbot> T408410: decommission es2031 - https://phabricator.wikimedia.org/T408410
2025-11-03 14:42:16 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P84643 and previous config saved to /var/cache/conftool/dbconfig/20251103-144215-marostegui.json
2025-11-03 14:42:46 <wikibugs> ('CR) ''Ori: [C:''+2] admin: add FIDO key for ori [puppet] - ''https://gerrit.wikimedia.org/r/1200217 (https://phabricator.wikimedia.org/T409075) (owner: ''Ori)'
2025-11-03 14:42:54 <MatmaRex> Lucas_WMDE: all good
2025-11-03 14:42:58 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde, matmarex: Continuing with sync
2025-11-03 14:42:59 <Lucas_WMDE> ok!
2025-11-03 14:44:08 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.dns.netbox
2025-11-03 14:44:48 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''Mail, ''serviceops: Sendmail network error (deployment) - https://phabricator.wikimedia.org/T407723#11335888 (''jhathaway) p:''Triage''Medium a:''jhathaway'
2025-11-03 14:45:30 <Lucas_WMDE> jouncebot: next
2025-11-03 14:45:30 <jouncebot> In 0 hour(s) and 44 minute(s): xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T1530)
2025-11-03 14:45:42 <Lucas_WMDE> ok, so we have some time after the window if the next backports take longer due to i18n
2025-11-03 14:46:46 <wikibugs> ('PS2) ''Tchanders: Deploy temporary accounts to enwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200083 (https://phabricator.wikimedia.org/T340001) (owner: ''STran)'
2025-11-03 14:47:18 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1200194|upload: Remove stashed file in UploadFromStash when upload completed (T408610)]], [[gerrit:1201064|recentchanges: Fix highlights where more than one action is defined (T409020)]] (duration: 12m 10s)
2025-11-03 14:47:22 <stashbot> T408610: [Regression] Clicking on images after upload leads to broken links - https://phabricator.wikimedia.org/T408610
2025-11-03 14:47:22 <stashbot> T409020: ChangesListSpecialPage incorrect highlight for mw-changeslist-last - https://phabricator.wikimedia.org/T409020
2025-11-03 14:48:01 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201069 (owner: ''C. Scott Ananian)'
2025-11-03 14:48:02 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by lucaswerkmeister-wmde@deploy2002 using scap backport" [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201070 (https://phabricator.wikimedia.org/T407289) (owner: ''C. Scott Ananian)'
2025-11-03 14:48:24 <topranks> !log make cr1-eqiad VRRP primary for row D vlans T409067
2025-11-03 14:48:26 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 14:48:26 <stashbot> T409067: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067
2025-11-03 14:49:45 <logmsgbot> fceratto@cumin1003 decommission (PID 3107435) is awaiting input
2025-11-03 14:50:03 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
2025-11-03 14:50:19 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repool db1259', diff saved to https://phabricator.wikimedia.org/P84646 and previous config saved to /var/cache/conftool/dbconfig/20251103-145018-marostegui.json
2025-11-03 14:50:20 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
2025-11-03 14:51:35 <wikibugs> ('CR) ''Tchanders: [C:''+1] Deploy temporary accounts to enwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200083 (https://phabricator.wikimedia.org/T340001) (owner: ''STran)'
2025-11-03 14:52:19 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Tuesday, November 04 UTC morning backport window](https://wikitech.wikimedia.org/wiki/Deployments#deployc" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200083 (https://phabricator.wikimedia.org/T340001) (owner: ''STran)'
2025-11-03 14:52:52 <wikibugs> ('Merged) ''jenkins-bot: i18n: all behavior switches should start/end with __ (part 2) [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201069 (owner: ''C. Scott Ananian)'
2025-11-03 14:53:25 <logmsgbot> fceratto@cumin1003 decommission (PID 3107435) is awaiting input
2025-11-03 14:53:29 <wikibugs> ('Merged) ''jenkins-bot: i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep [core] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201070 (https://phabricator.wikimedia.org/T407289) (owner: ''C. Scott Ananian)'
2025-11-03 14:53:44 <wikibugs> ('PS1) ''Ori: admin: Remove old, non-FIDO key for ori [puppet] - ''https://gerrit.wikimedia.org/r/1201079 (https://phabricator.wikimedia.org/T409075)'
2025-11-03 14:53:50 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 Started scap sync-world: Backport for [[gerrit:1201069|i18n: all behavior switches should start/end with __ (part 2)]], [[gerrit:1201070|i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep (T407289)]]
2025-11-03 14:53:59 <stashbot> T407289: Parsoid doesn't handle Japanese behavior switches with U+FF3F (full width underscore) - https://phabricator.wikimedia.org/T407289
2025-11-03 14:55:26 <wikibugs> 'SRE-swift-storage, ''Infrastructure-Foundations: UEFI installer not installing grub correctly (at least on systems where / is RAID) - https://phabricator.wikimedia.org/T404356#11335950 (''MatthewVernon) >>! In T404356#11335034, @elukey wrote: > Tried to reimage again, there are some HTTP boot issues that we...'
2025-11-03 14:55:29 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
2025-11-03 14:55:58 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 cscott, lucaswerkmeister-wmde: Backport for [[gerrit:1201069|i18n: all behavior switches should start/end with __ (part 2)]], [[gerrit:1201070|i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep (T407289)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 14:56:20 <Lucas_WMDE> cscott: can you test the changes?
2025-11-03 14:56:29 <cscott> yup, will do!
2025-11-03 14:56:33 <wikibugs> ('PS3) ''Tchanders: Deploy temporary accounts to enwiki [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200083 (https://phabricator.wikimedia.org/T409079) (owner: ''STran)'
2025-11-03 14:56:42 <topranks> !log disable et-1/1/3 on cr2-eqiad connecting to asw2-d-eqiad T409067
2025-11-03 14:56:48 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 14:56:48 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
2025-11-03 14:56:48 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2025-11-03 14:56:52 <stashbot> T409067: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067
2025-11-03 14:56:52 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es2029.codfw.wmnet
2025-11-03 14:57:13 <logmsgbot> !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS bookworm
2025-11-03 14:57:20 <Lucas_WMDE> that was… surprisingly fast btw
2025-11-03 14:57:31 <Lucas_WMDE> Finished build-and-push-container-images (duration: 01m 08s)
2025-11-03 14:57:41 <wikibugs> ('PS3) ''Muehlenhoff: Add separate role for single-node staging DB [puppet] - ''https://gerrit.wikimedia.org/r/1201077 (https://phabricator.wikimedia.org/T381565)'
2025-11-03 14:57:54 <Lucas_WMDE> so I guess it didn’t need a rebuild of the l10n cache?
2025-11-03 14:58:25 <cscott> yeah it was an edit to Messages*.php so maybe that doesn't affect the l10n cache. It's a magic word, not a Message?
2025-11-03 14:58:38 <Lucas_WMDE> I don’t quite understand it
2025-11-03 14:58:40 <cscott> anyway, tested on etwiki and looks good. Clear to go ahead.
2025-11-03 14:58:43 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.hosts.decommission for hosts es2030.codfw.wmnet
2025-11-03 14:58:45 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 cscott, lucaswerkmeister-wmde: Continuing with sync
2025-11-03 14:58:51 <Lucas_WMDE> because if I change a magic word locally and don’t rebuild the l10n cache, I get an error about it
2025-11-03 14:59:02 <Lucas_WMDE> (usually because I add a wfLoadExtension())
2025-11-03 14:59:07 <Lucas_WMDE> but anyway 🤷
2025-11-03 14:59:50 <wikibugs> ('PS2) ''Clément Goubert: site.pp: Add new wikikube insetup hosts [puppet] - ''https://gerrit.wikimedia.org/r/1200116 (https://phabricator.wikimedia.org/T408749)'
2025-11-03 15:00:22 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
2025-11-03 15:00:24 <wikibugs> ('CR) ''Clément Goubert: site.pp: Add new wikikube insetup hosts (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1200116 (https://phabricator.wikimedia.org/T408749) (owner: ''Clément Goubert)'
2025-11-03 15:00:30 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2148 (T407997)', diff saved to https://phabricator.wikimedia.org/P84647 and previous config saved to /var/cache/conftool/dbconfig/20251103-150029-marostegui.json
2025-11-03 15:00:34 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 15:00:39 <wikibugs> ('CR) ''Ori: [C:''+2] admin: Remove old, non-FIDO key for ori [puppet] - ''https://gerrit.wikimedia.org/r/1201079 (https://phabricator.wikimedia.org/T409075) (owner: ''Ori)'
2025-11-03 15:01:39 <wikibugs> ('PS1) ''Brouberol: dse-k8s-eqiad: add the backend domain to the certificate SANs [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201080 (https://phabricator.wikimedia.org/T408903)'
2025-11-03 15:01:41 <wikibugs> ('PS1) ''Brouberol: growthbook: set the APP_ORIGIN and API_HOST env vars to the public domains [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201081 (https://phabricator.wikimedia.org/T408903)'
2025-11-03 15:02:32 <wikibugs> 'SRE-swift-storage, ''Infrastructure-Foundations: UEFI installer not installing grub correctly (at least on systems where / is RAID) - https://phabricator.wikimedia.org/T404356#11335987 (''MatthewVernon) @elukey while I'm at it, you also have a Dell Config-J system for testing (ms-be2078, T406964); are you fi...'
2025-11-03 15:02:34 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host ms-be1088.eqiad.wmnet with OS trixie
2025-11-03 15:03:36 <logmsgbot> !log lucaswerkmeister-wmde@deploy2002 Finished scap sync-world: Backport for [[gerrit:1201069|i18n: all behavior switches should start/end with __ (part 2)]], [[gerrit:1201070|i18n: Remove deprecated behavior switches without underscores in et/sh-latn/vep (T407289)]] (duration: 09m 45s)
2025-11-03 15:03:39 <stashbot> T407289: Parsoid doesn't handle Japanese behavior switches with U+FF3F (full width underscore) - https://phabricator.wikimedia.org/T407289
2025-11-03 15:03:47 <Lucas_WMDE> !log UTC afternoon backport+config window done
2025-11-03 15:03:48 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 15:04:06 <wikibugs> 'SRE-swift-storage, ''Infrastructure-Foundations: UEFI installer not installing grub correctly (at least on systems where / is RAID) - https://phabricator.wikimedia.org/T404356#11335995 (''elukey) >>! In T404356#11335987, @MatthewVernon wrote: > @elukey while I'm at it, you also have a Dell Config-J system fo...'
2025-11-03 15:05:05 <wikibugs> ('PS1) ''Brouberol: postgresql-growthbook: add additional PG parameters [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201082 (https://phabricator.wikimedia.org/T406578)'
2025-11-03 15:05:13 <topranks> !log enable link from asw2-d7-eqiad to ssw1-d8-eqiad T409067
2025-11-03 15:05:15 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 15:05:16 <stashbot> T409067: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067
2025-11-03 15:05:17 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.dns.netbox
2025-11-03 15:06:16 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Migrate ori to a FIDO-backed key - https://phabricator.wikimedia.org/T409075#11336001 (''ori) ''Open''Resolved a:''ori'
2025-11-03 15:08:18 <wikibugs> ('PS1) ''AOkoth: spamassassin: add multi.uribl.com to deny list [puppet] - ''https://gerrit.wikimedia.org/r/1201083 (https://phabricator.wikimedia.org/T408632)'
2025-11-03 15:08:52 <jinxer-wm> FIRING: [3x] JobUnavailable: Reduced availability for job cloud_dev_pdns_rec in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2025-11-03 15:11:01 <logmsgbot> fceratto@cumin1003 decommission (PID 3139971) is awaiting input
2025-11-03 15:11:16 <wikibugs> ('PS1) ''Brouberol: growthbook: enable email sending [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201084 (https://phabricator.wikimedia.org/T408904)'
2025-11-03 15:11:45 <cscott> Lucas_WMDE: Thanks!
2025-11-03 15:11:49 <Lucas_WMDE> np :)
2025-11-03 15:13:16 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2148 (T407997)', diff saved to https://phabricator.wikimedia.org/P84648 and previous config saved to /var/cache/conftool/dbconfig/20251103-151315-marostegui.json
2025-11-03 15:13:18 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 15:14:44 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2030.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
2025-11-03 15:15:11 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2030.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
2025-11-03 15:15:11 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2025-11-03 15:15:12 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es2030.codfw.wmnet
2025-11-03 15:16:11 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
2025-11-03 15:18:33 <wikibugs> ('CR) ''Xcollazo: dumps: Release the new MW Content File Export. Deprecate legacy XML dumps. (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199783 (https://phabricator.wikimedia.org/T401022) (owner: ''Xcollazo)'
2025-11-03 15:19:18 <wikibugs> ('PS1) ''Brouberol: growthbook: define public configuration for s3 file uploads [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201086 (https://phabricator.wikimedia.org/T408415)'
2025-11-03 15:19:20 <wikibugs> 'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdl) failed in ms-be1074 - https://phabricator.wikimedia.org/T409040#11336043 (''VRiley-WMF) a:''VRiley-WMF'
2025-11-03 15:19:52 <logmsgbot> !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
2025-11-03 15:21:38 <wikibugs> ('PS1) ''AOkoth: vrts: alert on vrts junk queue size [alerts] - ''https://gerrit.wikimedia.org/r/1201087 (https://phabricator.wikimedia.org/T408632)'
2025-11-03 15:21:53 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.hosts.decommission for hosts es2031.codfw.wmnet
2025-11-03 15:22:38 <wikibugs> ('CR) ''Kamila Součková: [C:''+1] site.pp: Add new wikikube insetup hosts [puppet] - ''https://gerrit.wikimedia.org/r/1200116 (https://phabricator.wikimedia.org/T408749) (owner: ''Clément Goubert)'
2025-11-03 15:25:44 <wikibugs> ('PS1) ''Scott French: Enroll 100% of client sessions in PHP 8.3 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200409 (https://phabricator.wikimedia.org/T405955)'
2025-11-03 15:25:46 <wikibugs> ('PS1) ''Scott French: mw-(api-ext|web): scale next releases to 30% of main [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200410 (https://phabricator.wikimedia.org/T405955)'
2025-11-03 15:25:47 <wikibugs> ('PS1) ''Scott French: mw-(api-int|jobrunner): serve 50% of traffic on PHP 8.3 [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200411 (https://phabricator.wikimedia.org/T405955)'
2025-11-03 15:26:36 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.dns.netbox
2025-11-03 15:28:23 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P84649 and previous config saved to /var/cache/conftool/dbconfig/20251103-152822-marostegui.json
2025-11-03 15:30:05 <jouncebot> Deploy window xLab Experiment Deployment Window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T1530)
2025-11-03 15:31:21 <logmsgbot> !log fceratto@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2031.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
2025-11-03 15:31:49 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2031.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
2025-11-03 15:31:49 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2025-11-03 15:31:50 <wikibugs> ('CR) ''Hnowlan: [C:''+1] mw-(api-ext|web): scale next releases to 30% of main [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200410 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 15:31:51 <logmsgbot> !log fceratto@cumin1003 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es2031.codfw.wmnet
2025-11-03 15:32:26 <wikibugs> ('CR) ''Hnowlan: [C:''+1] Enroll 100% of client sessions in PHP 8.3 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200409 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 15:32:50 <wikibugs> ('PS1) ''Scott French: haproxy: add known-client DSL fixture in tests [puppet] - ''https://gerrit.wikimedia.org/r/1200397 (https://phabricator.wikimedia.org/T403220)'
2025-11-03 15:32:52 <wikibugs> ('PS13) ''Scott French: hieradata: pilot use_etcd_known_client_ident on cp2041 [puppet] - ''https://gerrit.wikimedia.org/r/1196544 (https://phabricator.wikimedia.org/T403220)'
2025-11-03 15:33:52 <jinxer-wm> FIRING: [3x] JobUnavailable: Reduced availability for job cloud_dev_pdns_rec in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2025-11-03 15:36:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 15:37:17 <wikibugs> 'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdl) failed in ms-be1074 - https://phabricator.wikimedia.org/T409040#11336096 (''VRiley-WMF) Created a Dell ticket number for a replacement part.'
2025-11-03 15:37:43 <wikibugs> ('PS1) ''Jon Harald Søby: missing.php: Use Codex colors for dark mode [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1201085'
2025-11-03 15:37:43 <wikibugs> ('CR) ''Jon Harald Søby: "Might be too small a change for it to matter, but I wanted to give you a chance to yay or nay it, @krinkle@fastmail.com, since you touched" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1201085 (owner: ''Jon Harald Søby)'
2025-11-03 15:39:57 <wikibugs> ('PS1) ''MVernon: swift: remove 3 drained nodes for controller swap [puppet] - ''https://gerrit.wikimedia.org/r/1201089 (https://phabricator.wikimedia.org/T400876)'
2025-11-03 15:42:02 <wikibugs> ('PS8) ''Xcollazo: dumps: Release the new MW Content File Export. Deprecate legacy XML dumps. [puppet] - ''https://gerrit.wikimedia.org/r/1199783 (https://phabricator.wikimedia.org/T401022)'
2025-11-03 15:42:20 <wikibugs> 'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdl) failed in ms-be1074 - https://phabricator.wikimedia.org/T409040#11336129 (''MatthewVernon) Thanks! Do we have a suitable spare in stock still, in the mean time?'
2025-11-03 15:42:42 <wikibugs> ('CR) ''CI reject: [V:''-1] dumps: Release the new MW Content File Export. Deprecate legacy XML dumps. [puppet] - ''https://gerrit.wikimedia.org/r/1199783 (https://phabricator.wikimedia.org/T401022) (owner: ''Xcollazo)'
2025-11-03 15:43:31 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P84650 and previous config saved to /var/cache/conftool/dbconfig/20251103-154330-marostegui.json
2025-11-03 15:43:45 <wikibugs> ('PS9) ''Xcollazo: dumps: Release the new MW Content File Export. Deprecate legacy XML dumps. [puppet] - ''https://gerrit.wikimedia.org/r/1199783 (https://phabricator.wikimedia.org/T401022)'
2025-11-03 15:43:59 <wikibugs> ('CR) ''Jcrespo: [C:''+1] swift: remove 3 drained nodes for controller swap [puppet] - ''https://gerrit.wikimedia.org/r/1201089 (https://phabricator.wikimedia.org/T400876) (owner: ''MVernon)'
2025-11-03 15:44:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 15:45:56 <wikibugs> ('CR) ''MVernon: [C:''+2] swift: remove 3 drained nodes for controller swap [puppet] - ''https://gerrit.wikimedia.org/r/1201089 (https://phabricator.wikimedia.org/T400876) (owner: ''MVernon)'
2025-11-03 15:47:26 <jinxer-wm> FIRING: InboundMXQueueHigh: MX host mx-in2001:9154 has many queued messages: 1714 #page - https://wikitech.wikimedia.org/wiki/Postfix - https://grafana.wikimedia.org/d/h36Havfik/mail-postfix-servers - https://alerts.wikimedia.org/?q=alertname%3DInboundMXQueueHigh
2025-11-03 15:47:36 <volans> !incidents
2025-11-03 15:47:36 <sirenbot> 6926 (UNACKED) InboundMXQueueHigh sre (mx-in2001:9154 codfw)
2025-11-03 15:47:44 <volans> !ack 6926
2025-11-03 15:47:44 <sirenbot> 6926 (ACKED) InboundMXQueueHigh sre (mx-in2001:9154 codfw)
2025-11-03 15:47:56 <volans> jhathaway: any work in progress on the MXes?
2025-11-03 15:48:33 <wikibugs> ('CR) ''JMeybohm: [C:''+1] "LGTM" [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200411 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 15:48:55 <wikibugs> 'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdl) failed in ms-be1074 - https://phabricator.wikimedia.org/T409040#11336158 (''VRiley-WMF) Yes, I am about to swap the drive with one of our spares.'
2025-11-03 15:48:58 <jhathaway> volans: I can take a look, was in a meeting
2025-11-03 15:49:16 <wikibugs> 'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdl) failed in ms-be1074 - https://phabricator.wikimedia.org/T409040#11336159 (''MatthewVernon) Cool, thank you :)'
2025-11-03 15:49:17 <volans> I'm looking just wanted to exclude any current work in progress
2025-11-03 15:49:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 15:49:37 <jhathaway> volans, nothing in progress, yet
2025-11-03 15:50:20 <wikibugs> 'ops-codfw, ''SRE, ''SRE-swift-storage, ''DC-Ops, ''Patch-For-Review: Install new disk controllers to SM swift backends (codfw) - https://phabricator.wikimedia.org/T400876#11336161 (''MatthewVernon)'
2025-11-03 15:51:00 <logmsgbot> !log mvernon@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ms-be[2085-2087].codfw.wmnet with reason: awaiting controller swap
2025-11-03 15:51:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 15:51:32 <wikibugs> 'ops-codfw, ''SRE, ''SRE-swift-storage, ''DC-Ops, ''Patch-For-Review: Install new disk controllers to SM swift backends (codfw) - https://phabricator.wikimedia.org/T400876#11336170 (''ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=7624a3b2-2e40-48d3-b790-6a86e95d3ac6) set by mvernon@cu...'
2025-11-03 15:52:22 <wikibugs> 'ops-codfw, ''SRE, ''DC-Ops, ''Infrastructure-Foundations, and 2 others: Reimage failed after prompt...is prompt needed? - https://phabricator.wikimedia.org/T406656#11336179 (''LSobanski) p:''Triage''Low'
2025-11-03 15:53:04 <wikibugs> 'ops-codfw, ''SRE, ''SRE-swift-storage, ''DC-Ops, ''Patch-For-Review: Install new disk controllers to SM swift backends (codfw) - https://phabricator.wikimedia.org/T400876#11336181 (''MatthewVernon) Hi @Jhancock.wm ms-be208[5-7] are now ready for you to swap their controllers, please. I've downtimed the...'
2025-11-03 15:53:24 <icinga-wm> PROBLEM - VRRP status on cr1-eqiad is CRITICAL: VRRP CRITICAL - 6 misconfigured interfaces, 0 inconsistent interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23VRRP_status
2025-11-03 15:53:24 <logmsgbot> !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
2025-11-03 15:53:52 <jinxer-wm> FIRING: CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:ae4 (asw2-d-eqiad:ae2) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2025-11-03 15:54:30 <logmsgbot> !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS bullseye
2025-11-03 15:54:36 <wikibugs> 'SRE, ''SRE-swift-storage, ''Infrastructure-Foundations: Re-IP Swift hosts to per-rack subnets in codfw rows A-D - https://phabricator.wikimedia.org/T354872#11336191 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be2078.codfw.wmnet with OS bullseye'
2025-11-03 15:54:38 <wikibugs> 'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdl) failed in ms-be1074 - https://phabricator.wikimedia.org/T409040#11336192 (''VRiley-WMF) Drive has been replaced. Will keep the ticket open until the replacment comes in.'
2025-11-03 15:54:53 <logmsgbot> !log mvernon@cumin2002 START - Cookbook sre.hosts.move-vlan for host ms-be2078
2025-11-03 15:54:53 <logmsgbot> !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2078
2025-11-03 15:55:19 <logmsgbot> !log brouberol@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
2025-11-03 15:57:16 <logmsgbot> !log eevans@cumin1003 START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Apply JVM upgrade to 11.0.29 - eevans@cumin1003
2025-11-03 15:57:26 <jinxer-wm> RESOLVED: InboundMXQueueHigh: MX host mx-in2001:9154 has many queued messages: 1109 #page - https://wikitech.wikimedia.org/wiki/Postfix - https://grafana.wikimedia.org/d/h36Havfik/mail-postfix-servers - https://alerts.wikimedia.org/?q=alertname%3DInboundMXQueueHigh
2025-11-03 15:58:39 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2148 (T407997)', diff saved to https://phabricator.wikimedia.org/P84651 and previous config saved to /var/cache/conftool/dbconfig/20251103-155838-marostegui.json
2025-11-03 15:58:41 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 15:58:55 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
2025-11-03 15:59:03 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2175 (T407997)', diff saved to https://phabricator.wikimedia.org/P84652 and previous config saved to /var/cache/conftool/dbconfig/20251103-155902-marostegui.json
2025-11-03 16:04:00 <logmsgbot> !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be1088.eqiad.wmnet with OS trixie
2025-11-03 16:04:49 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host ms-be1088.eqiad.wmnet with OS trixie
2025-11-03 16:05:50 <wikibugs> ('PS1) ''Muehlenhoff: Record LDAP access for blake [puppet] - ''https://gerrit.wikimedia.org/r/1201094'
2025-11-03 16:06:12 <wikibugs> 'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdl) failed in ms-be1074 - https://phabricator.wikimedia.org/T409040#11336275 (''VRiley-WMF) Verified with @MatthewVernon, the replacment looks good.'
2025-11-03 16:07:04 <wikibugs> 'ops-codfw, ''SRE, ''DC-Ops: Alert for device ps1-b8-codfw.mgmt.codfw.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T408963#11336288 (''Jhancock.wm) ''Open''Resolved a:''Jhancock.wm balanced power'
2025-11-03 16:07:44 <wikibugs> ('CR) ''JHathaway: spamassassin: add multi.uribl.com to deny list (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1201083 (https://phabricator.wikimedia.org/T408632) (owner: ''AOkoth)'
2025-11-03 16:08:28 <wikibugs> ('CR) ''Muehlenhoff: [C:''+2] Record LDAP access for blake [puppet] - ''https://gerrit.wikimedia.org/r/1201094 (owner: ''Muehlenhoff)'
2025-11-03 16:11:18 <wikibugs> 'ops-codfw, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdf) failed in thanos-be2008 - https://phabricator.wikimedia.org/T409036#11336332 (''Jhancock.wm) @MatthewVernon I have an 8tb replacement drive, but the sata speed is only 6 Gbps instead of 12. will this work? if not i can get a replacement from De...'
2025-11-03 16:11:42 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2175 (T407997)', diff saved to https://phabricator.wikimedia.org/P84653 and previous config saved to /var/cache/conftool/dbconfig/20251103-161142-marostegui.json
2025-11-03 16:11:45 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 16:12:18 <wikibugs> 'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vrts, and 2 others: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11336340 (''jhathaway) @Krd I see the junk mail queue is now at 600k, how can I help clear it out, I saw some of the sch...'
2025-11-03 16:12:29 <logmsgbot> !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
2025-11-03 16:12:56 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336350 (''VRiley-WMF) a:''VRiley-WMF'
2025-11-03 16:14:37 <wikibugs> ('CR) ''AOkoth: spamassassin: add multi.uribl.com to deny list (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1201083 (https://phabricator.wikimedia.org/T408632) (owner: ''AOkoth)'
2025-11-03 16:14:41 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336389 (''VRiley-WMF) I am able to preform this. Just to verify, this can be done at anytime, correct? Also, do you happen to have a preference on which disk is pulled @Marostegui ?'
2025-11-03 16:15:14 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336402 (''Marostegui) You can do any disk any time, whatever works for you'
2025-11-03 16:16:00 <wikibugs> 'ops-codfw, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdf) failed in thanos-be2008 - https://phabricator.wikimedia.org/T409036#11336407 (''MatthewVernon) @Jhancock.wm that's a good question, to which I don't have a good answer :-/ I think my inclination would be to go for a like-for-like replacement (if...'
2025-11-03 16:16:55 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336408 (''VRiley-WMF) This has been done. Disk in slot 9 has been pulled.'
2025-11-03 16:18:02 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
2025-11-03 16:19:14 <wikibugs> ('Abandoned) ''Muehlenhoff: maps/bookworm: Re-enable monitoring [puppet] - ''https://gerrit.wikimedia.org/r/1185048 (https://phabricator.wikimedia.org/T381565) (owner: ''Muehlenhoff)'
2025-11-03 16:19:16 <logmsgbot> !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
2025-11-03 16:22:01 <topranks> !log enable row D vlan sub-interfaces on cr2-eqiad et-1/0/5 T409067
2025-11-03 16:22:03 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 16:22:04 <stashbot> T409067: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067
2025-11-03 16:22:59 <logmsgbot> !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
2025-11-03 16:24:26 <icinga-wm> RECOVERY - VRRP status on cr1-eqiad is OK: VRRP OK - 0 misconfigured interfaces, 0 inconsistent interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23VRRP_status
2025-11-03 16:26:45 <Reedy> jouncebot: nowandnext
2025-11-03 16:26:45 <jouncebot> No deployments scheduled for the next 0 hour(s) and 3 minute(s)
2025-11-03 16:26:45 <jouncebot> In 0 hour(s) and 3 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T1630)
2025-11-03 16:26:53 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P84655 and previous config saved to /var/cache/conftool/dbconfig/20251103-162649-marostegui.json
2025-11-03 16:26:54 <wikibugs> ('PS2) ''Reedy: CommonSettings: Remove some OATHAuth config overrides [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1198180 (https://phabricator.wikimedia.org/T404806)'
2025-11-03 16:26:59 <wikibugs> ('CR) ''Reedy: [C:''+2] CommonSettings: Remove some OATHAuth config overrides [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1198180 (https://phabricator.wikimedia.org/T404806) (owner: ''Reedy)'
2025-11-03 16:27:46 <topranks> !log make cr2-eqiad active for row D vlan sub-interfaces on et-1/0/5 T409067
2025-11-03 16:27:48 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 16:27:49 <stashbot> T409067: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067
2025-11-03 16:27:57 <wikibugs> ('Merged) ''jenkins-bot: CommonSettings: Remove some OATHAuth config overrides [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1198180 (https://phabricator.wikimedia.org/T404806) (owner: ''Reedy)'
2025-11-03 16:28:52 <jinxer-wm> RESOLVED: CoreRouterInterfaceDown: Core router interface down - cr2-eqiad:ae4 (asw2-d-eqiad:ae2) - https://wikitech.wikimedia.org/wiki/Network_monitoring#Router_interface_down - https://grafana.wikimedia.org/d/fb403d62-5f03-434a-9dff-bd02b9fff504/network-device-overview?var-instance=cr2-eqiad:9804 - https://alerts.wikimedia.org/?q=alertname%3DCoreRouterInterfaceDown
2025-11-03 16:29:08 <wikibugs> 'ops-codfw, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdf) failed in thanos-be2008 - https://phabricator.wikimedia.org/T409036#11336484 (''Jhancock.wm) process started with dell: SR218123931'
2025-11-03 16:30:05 <jouncebot> jan_drewniak: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Wikimedia Portals Update . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T1630).
2025-11-03 16:30:11 <wikibugs> ('PS4) ''Muehlenhoff: Add separate role for single-node staging DB [puppet] - ''https://gerrit.wikimedia.org/r/1201077 (https://phabricator.wikimedia.org/T381565)'
2025-11-03 16:30:23 <jan_drewniak> No portal deploy today
2025-11-03 16:30:56 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336489 (''Marostegui) Thanks, for now I see it on the host: ` [373161.755957] megaraid_sas 0000:af:00.0: scanning for scsi0... [373161.756083] megaraid_sas 0000:af:00.0: 2812 (815501780s/0x0001/CR...'
2025-11-03 16:32:15 <wikibugs> ('CR) ''AOkoth: spamassassin: add multi.uribl.com to deny list (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1201083 (https://phabricator.wikimedia.org/T408632) (owner: ''AOkoth)'
2025-11-03 16:32:31 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, November 03 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199880 (https://phabricator.wikimedia.org/T408765) (owner: ''Arlolra)'
2025-11-03 16:32:44 <logmsgbot> !log eevans@cumin1003 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Apply JVM upgrade to 11.0.29 - eevans@cumin1003
2025-11-03 16:34:08 <wikibugs> ('PS1) ''Cathal Mooney: eqiad row d: migrate CR gateway interfaces to port et-1/0/5 [homer/public] - ''https://gerrit.wikimedia.org/r/1201101 (https://phabricator.wikimedia.org/T409067)'
2025-11-03 16:34:15 <wikibugs> 'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdl) failed in ms-be1074 - https://phabricator.wikimedia.org/T409040#11336518 (''VRiley-WMF) For documenting purposes. Dell service request number is SR218119927 Inbound shipment is 1-253741250722'
2025-11-03 16:34:50 <wikibugs> 'SRE, ''collaboration-services, ''Znuny: VRTS outbound emails not working - https://phabricator.wikimedia.org/T408967#11336522 (''Xaosflux) Has the inability to send email out from VRT been confirmed to be related to the parent task, or is this a different problem?'
2025-11-03 16:36:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 16:36:34 <wikibugs> ('PS2) ''Cathal Mooney: eqiad row d: migrate CR gateway interfaces to port et-1/0/5 [homer/public] - ''https://gerrit.wikimedia.org/r/1201101 (https://phabricator.wikimedia.org/T409067)'
2025-11-03 16:36:45 <logmsgbot> !log reedy@deploy2002 Synchronized wmf-config/CommonSettings.php: T404806 (duration: 06m 27s)
2025-11-03 16:36:49 <stashbot> T404806: Remove $wgOATHAllowMultipleModules and $wgOATHAuthNewUI - https://phabricator.wikimedia.org/T404806
2025-11-03 16:36:54 <wikibugs> ('PS1) ''Ozge: feat: updates addalink docker image version [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201103'
2025-11-03 16:37:39 <wikibugs> 'SRE, ''collaboration-services, ''Znuny: VRTS outbound emails not working - https://phabricator.wikimedia.org/T408967#11336534 (''jhathaway) @Xaosflux I assume it is related, but I have not been able to confirm it yet.'
2025-11-03 16:38:45 <logmsgbot> !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2078.codfw.wmnet with OS bullseye
2025-11-03 16:38:52 <wikibugs> ('CR) ''Ozge: [C:''+2] feat: updates addalink docker image version [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201103 (owner: ''Ozge)'
2025-11-03 16:39:02 <wikibugs> 'SRE, ''SRE-swift-storage, ''Infrastructure-Foundations: Re-IP Swift hosts to per-rack subnets in codfw rows A-D - https://phabricator.wikimedia.org/T354872#11336541 (''ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-be2078.codfw.wmnet with OS bullseye compl...'
2025-11-03 16:39:57 <wikibugs> 'ops-eqiad, ''SRE, ''SRE-swift-storage, ''DC-Ops: Disk (sdl) failed in ms-be1074 - https://phabricator.wikimedia.org/T409040#11336547 (''VRiley-WMF) p:''High''Medium'
2025-11-03 16:40:47 <wikibugs> ('Merged) ''jenkins-bot: feat: updates addalink docker image version [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201103 (owner: ''Ozge)'
2025-11-03 16:41:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 16:42:03 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P84656 and previous config saved to /var/cache/conftool/dbconfig/20251103-164200-marostegui.json
2025-11-03 16:42:50 <logmsgbot> !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be1088.eqiad.wmnet with OS trixie
2025-11-03 16:43:23 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.reimage for host ms-be1088.eqiad.wmnet with OS trixie
2025-11-03 16:45:13 <wikibugs> ('PS1) ''Bking: WIP: opensearch-cluster: Add operator user [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201104 (https://phabricator.wikimedia.org/T408919)'
2025-11-03 16:45:43 <logmsgbot> !log cmooney@cumin1003 START - Cookbook sre.dns.netbox
2025-11-03 16:48:53 <logmsgbot> !log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
2025-11-03 16:49:43 <logmsgbot> !log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
2025-11-03 16:51:04 <logmsgbot> !log cmooney@cumin1003 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for CR interfaces eqiad row D vlans - cmooney@cumin1003"
2025-11-03 16:51:08 <logmsgbot> !log cmooney@cumin1003 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for CR interfaces eqiad row D vlans - cmooney@cumin1003"
2025-11-03 16:51:08 <logmsgbot> !log cmooney@cumin1003 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
2025-11-03 16:53:45 <wikibugs> 'SRE, ''Infrastructure-Foundations, ''netops, ''Patch-For-Review: Eqiad C/D refresh: move asw2-d-eqiad CR uplinks to Nokia switches - https://phabricator.wikimedia.org/T409067#11336622 (''cmooney) ''Open''Resolved Uplinks moved, the actual gateway move from CR to switches we will wait until Nokia...'
2025-11-03 16:54:18 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops: Degraded RAID on an-worker1199 - https://phabricator.wikimedia.org/T409060#11336627 (''VRiley-WMF) Created a Service Request ticket with Dell - SR218125316 Opened inbound ticket 1-253742292236'
2025-11-03 16:54:32 <wikibugs> 'ops-eqiad, ''SRE, ''DC-Ops: Degraded RAID on an-worker1199 - https://phabricator.wikimedia.org/T409060#11336630 (''VRiley-WMF) a:''VRiley-WMF'
2025-11-03 16:55:27 <wikibugs> ('CR) ''A smart kitten: [C:''-1] "I'm not sure that this can currently be deployed on its own; xref T408110#11336607 (tldr: I'm worried that it might result in banners [lik" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1201051 (https://phabricator.wikimedia.org/T408110) (owner: ''A smart kitten)'
2025-11-03 16:55:27 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Degraded RAID on es1033 - https://phabricator.wikimedia.org/T409089#11336632 (''VRiley-WMF) a:''VRiley-WMF'
2025-11-03 16:55:53 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336645 (''VRiley-WMF) It has opened the ticket T409089'
2025-11-03 16:56:44 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
2025-11-03 16:57:10 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2175 (T407997)', diff saved to https://phabricator.wikimedia.org/P84657 and previous config saved to /var/cache/conftool/dbconfig/20251103-165709-marostegui.json
2025-11-03 16:57:11 <wikibugs> 'SRE, ''collaboration-services, ''Znuny: VRTS outbound emails not working - https://phabricator.wikimedia.org/T408967#11336650 (''Geagea) In my opinion all the emails from VRT has a delay of six days. That means notifications and answers to customers. I'm all the time receiving six days old notification. Al...'
2025-11-03 16:57:13 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 16:57:26 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
2025-11-03 16:57:34 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2189 (T407997)', diff saved to https://phabricator.wikimedia.org/P84658 and previous config saved to /var/cache/conftool/dbconfig/20251103-165733-marostegui.json
2025-11-03 16:59:51 <wikibugs> 'SRE-swift-storage, ''Infrastructure-Foundations: UEFI installer not installing grub correctly (at least on systems where / is RAID) - https://phabricator.wikimedia.org/T404356#11336664 (''elukey) I tried to reimage ms-be1088 3 times and everything worked as expected without an issue. I had a chat with Matthe...'
2025-11-03 17:00:22 <wikibugs> ('CR) ''David Caro: [V:''+1 C:''+2] "This has been running for the whole day without issues, I'll merge" [puppet] - ''https://gerrit.wikimedia.org/r/1201011 (https://phabricator.wikimedia.org/T409047) (owner: ''David Caro)'
2025-11-03 17:00:27 <logmsgbot> !log elukey@cumin1003 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
2025-11-03 17:03:39 <wikibugs> ('PS1) ''DLynch: Edit check: allow MWVE_FORCE_EDIT_CHECK_ENABLED to override ecenable [extensions/VisualEditor] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201119 (https://phabricator.wikimedia.org/T408890)'
2025-11-03 17:04:44 <wikibugs> ('CR) ''Vgutierrez: P:cache:haproxy: introduce ua classes (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: ''Fabfur)'
2025-11-03 17:05:26 <wikibugs> ('CR) ''Cathal Mooney: [C:''+2] eqiad row d: migrate CR gateway interfaces to port et-1/0/5 [homer/public] - ''https://gerrit.wikimedia.org/r/1201101 (https://phabricator.wikimedia.org/T409067) (owner: ''Cathal Mooney)'
2025-11-03 17:06:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 17:06:47 <wikibugs> ('Merged) ''jenkins-bot: eqiad row d: migrate CR gateway interfaces to port et-1/0/5 [homer/public] - ''https://gerrit.wikimedia.org/r/1201101 (https://phabricator.wikimedia.org/T409067) (owner: ''Cathal Mooney)'
2025-11-03 17:07:14 <wikibugs> ('CR) ''A smart kitten: dumps: Release the new MW Content File Export. Deprecate legacy XML dumps. (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199783 (https://phabricator.wikimedia.org/T401022) (owner: ''Xcollazo)'
2025-11-03 17:07:53 <logmsgbot> !log jhancock@cumin1003 START - Cookbook sre.hosts.provision for host wikikube-worker2203.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
2025-11-03 17:08:10 <wikibugs> ('CR) ''Vgutierrez: P:cache::varnish::frontend: render known-client rate limit VCL (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1198182 (https://phabricator.wikimedia.org/T403220) (owner: ''Scott French)'
2025-11-03 17:08:18 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, November 03 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [extensions/VisualEditor] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201119 (https://phabricator.wikimedia.org/T408890) (owner: ''DLynch)'
2025-11-03 17:08:54 <wikibugs> 'ops-eqiad, ''SRE, ''collaboration-services, ''DC-Ops: eqiad row C/D Collaboration Services host migrations - https://phabricator.wikimedia.org/T405940#11336723 (''RobH) I just pinged Daniel in irc, I neglected to update this and the gcal, only updating the gsheet. We've run into some issues on the nokia...'
2025-11-03 17:09:01 <jinxer-wm> FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
2025-11-03 17:09:01 <jinxer-wm> FIRING: [21x] CertAlmostExpired: Certificate for service cr2-eqsin.wikimedia.org:32767 is about to expire - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
2025-11-03 17:09:26 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2189 (T407997)', diff saved to https://phabricator.wikimedia.org/P84659 and previous config saved to /var/cache/conftool/dbconfig/20251103-170924-marostegui.json
2025-11-03 17:09:33 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 17:09:49 <wikibugs> ('CR) ''Fabfur: P:cache:haproxy: introduce ua classes (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: ''Fabfur)'
2025-11-03 17:11:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 17:11:28 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Degraded RAID on es1033 - https://phabricator.wikimedia.org/T409089#11336739 (''Marostegui) Excellent! @VRiley-WMF if you want to insert the disk back in, that'd be great'
2025-11-03 17:11:58 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336740 (''Marostegui) This is great! Can you put it back? Thanks'
2025-11-03 17:12:05 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336742 (''Marostegui)'
2025-11-03 17:12:06 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Degraded RAID on es1033 - https://phabricator.wikimedia.org/T409089#11336743 (''Marostegui)'
2025-11-03 17:13:33 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336747 (''VRiley-WMF) Disk has been reinserted. Closing the other ticket.'
2025-11-03 17:14:18 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Degraded RAID on es1033 - https://phabricator.wikimedia.org/T409089#11336749 (''VRiley-WMF) ''Open''→Resolved This was a testing ticket. This drive has been reinserted.'
2025-11-03 17:15:31 <wikibugs> ('PS1) ''Daimona Eaytoy: Enable $wgCampaignEventsEnableContributionTracking in beta [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1201124 (https://phabricator.wikimedia.org/T408420)'
2025-11-03 17:16:28 <logmsgbot> jhancock@cumin1003 provision (PID 3273454) is awaiting input
2025-11-03 17:19:18 <Daimona> Hey folks, I have a config change that is beta-only: would someone be willing to merge it now? If not I'll schedule for the next regular window, but it feels kind of a waste, being beta-only.
2025-11-03 17:19:29 <Daimona> (It's the change linked right above)
2025-11-03 17:21:44 <bd808_> jouncebot: nowandnext
2025-11-03 17:21:44 <jouncebot> No deployments scheduled for the next 0 hour(s) and 38 minute(s)
2025-11-03 17:21:44 <jouncebot> In 0 hour(s) and 38 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T1800)
2025-11-03 17:21:44 <jouncebot> In 0 hour(s) and 38 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T1800)
2025-11-03 17:22:46 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by bd808@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1201124 (https://phabricator.wikimedia.org/T408420) (owner: ''Daimona Eaytoy)'
2025-11-03 17:22:56 <bd808_> Daimona: ^ running
2025-11-03 17:23:37 <wikibugs> ('Merged) ''jenkins-bot: Enable $wgCampaignEventsEnableContributionTracking in beta [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1201124 (https://phabricator.wikimedia.org/T408420) (owner: ''Daimona Eaytoy)'
2025-11-03 17:23:39 <logmsgbot> !log jhancock@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2203.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
2025-11-03 17:24:33 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P84660 and previous config saved to /var/cache/conftool/dbconfig/20251103-172433-marostegui.json
2025-11-03 17:24:52 <Daimona> Thank you <3
2025-11-03 17:26:00 <wikibugs> ('CR) ''JHathaway: [C:''+1] spamassassin: add multi.uribl.com to deny list [puppet] - ''https://gerrit.wikimedia.org/r/1201083 (https://phabricator.wikimedia.org/T408632) (owner: ''AOkoth)'
2025-11-03 17:26:03 <bd808_> <3 to dancy or whoever made scap know how to decide "Skipping sync since all commits were beta/labs-only changes. Operation completed."
2025-11-03 17:27:39 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336819 (''VRiley-WMF) @Marostegui is this ticket safe to close as well? Or should it still remain open for the time being?'
2025-11-03 17:29:43 <Daimona> Oh nice
2025-11-03 17:29:53 <_joe_> !log ran reprepro cleanvanished on apt-staging to try to clean hanging deb file
2025-11-03 17:29:54 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 17:30:51 <wikibugs> ('PS17) ''Fabfur: P:cache:haproxy: introduce ua classes [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060)'
2025-11-03 17:31:49 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336854 (''Marostegui) Let's give it a minute to wait for the rebuild to finish. Thanks!'
2025-11-03 17:32:16 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336857 (''VRiley-WMF) No problem, thank you!'
2025-11-03 17:32:54 <wikibugs> ('CR) ''Fabfur: P:cache:haproxy: introduce ua classes (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1199247 (https://phabricator.wikimedia.org/T408060) (owner: ''Fabfur)'
2025-11-03 17:36:11 <wikibugs> 'ops-eqiad, ''SRE, ''DBA, ''DC-Ops: Pull a disk out from es1033 - https://phabricator.wikimedia.org/T409030#11336899 (''Marostegui) For what it's worth ` root@es1033:~# megacli -PDRbld -ShowProg -PhysDrv [32:9] -aALL Rebuild Progress on Device at Enclosure 32, Slot 9 Completed 14% in 23 Minutes. Exit Code...'
2025-11-03 17:39:01 <jinxer-wm> FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2025-11-03 17:39:27 <logmsgbot> !log elukey@cumin1003 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be1088.eqiad.wmnet with OS trixie
2025-11-03 17:39:41 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P84661 and previous config saved to /var/cache/conftool/dbconfig/20251103-173940-marostegui.json
2025-11-03 17:40:35 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
2025-11-03 17:44:29 <wikibugs> 'SRE, ''collaboration-services, ''Infrastructure-Foundations, ''vrts, and 2 others: VRTS is spammed with bounce e-mails and is going to break - https://phabricator.wikimedia.org/T408632#11336911 (''Krd) I think the focus should be to determine if the queue size is the cause of the impact or not. I.e. if t...'
2025-11-03 17:47:34 <logmsgbot> !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
2025-11-03 17:47:47 <logmsgbot> !log elukey@cumin1003 START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
2025-11-03 17:48:07 <logmsgbot> !log elukey@cumin1003 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
2025-11-03 17:49:21 <icinga-wm> PROBLEM - Swift https backend on ms-fe2019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Swift
2025-11-03 17:50:11 <icinga-wm> RECOVERY - Swift https backend on ms-fe2019 is OK: HTTP OK: HTTP/1.1 200 OK - 506 bytes in 0.189 second response time https://wikitech.wikimedia.org/wiki/Swift
2025-11-03 17:52:21 <icinga-wm> PROBLEM - Thanos swift https on thanos-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Thanos
2025-11-03 17:54:11 <icinga-wm> RECOVERY - Thanos swift https on thanos-fe1004 is OK: HTTP OK: HTTP/1.1 200 OK - 279 bytes in 0.067 second response time https://wikitech.wikimedia.org/wiki/Thanos
2025-11-03 17:54:50 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2189 (T407997)', diff saved to https://phabricator.wikimedia.org/P84662 and previous config saved to /var/cache/conftool/dbconfig/20251103-175448-marostegui.json
2025-11-03 17:54:53 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 17:54:54 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
2025-11-03 18:00:04 <jouncebot> swfrench-wmf: MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T1800). Please do the needful.
2025-11-03 18:00:05 <jouncebot> ryankemper: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T1800).
2025-11-03 18:00:17 <swfrench-wmf> o/
2025-11-03 18:01:45 <wikibugs> ('CR) ''Scott French: [C:''+2] mw-(api-ext|web): scale next releases to 30% of main [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200410 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 18:03:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 18:03:51 <wikibugs> ('Merged) ''jenkins-bot: mw-(api-ext|web): scale next releases to 30% of main [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200410 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 18:04:53 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance
2025-11-03 18:05:01 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2207 (T407997)', diff saved to https://phabricator.wikimedia.org/P84663 and previous config saved to /var/cache/conftool/dbconfig/20251103-180500-marostegui.json
2025-11-03 18:05:04 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 18:05:39 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
2025-11-03 18:05:56 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
2025-11-03 18:06:02 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
2025-11-03 18:06:04 <wikibugs> ('PS1) ''Kosta Harlan: hCaptcha: use ve.newTarget hook to avoid globals [extensions/ConfirmEdit] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201167 (https://phabricator.wikimedia.org/T408670)'
2025-11-03 18:06:21 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
2025-11-03 18:06:33 <kostajh> jouncebot: nowandnext
2025-11-03 18:06:33 <jouncebot> For the next 0 hour(s) and 53 minute(s): MediaWiki infrastructure (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T1800)
2025-11-03 18:06:33 <jouncebot> For the next 0 hour(s) and 23 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T1800)
2025-11-03 18:06:33 <jouncebot> In 2 hour(s) and 53 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T2100)
2025-11-03 18:07:01 <kostajh> can I deploy a MediaWiki patch now?
2025-11-03 18:07:08 <swfrench-wmf> kostajh: I'll probably be done in ~ 30-40 minutes
2025-11-03 18:07:29 <kostajh> swfrench-wmf: sounds good
2025-11-03 18:07:34 <kostajh> please ping me when you're done, thanks
2025-11-03 18:07:42 <swfrench-wmf> ack, can do
2025-11-03 18:08:19 <wikibugs> ('CR) ''DLynch: [C:''+1] hCaptcha: use ve.newTarget hook to avoid globals [extensions/ConfirmEdit] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201167 (https://phabricator.wikimedia.org/T408670) (owner: ''Kosta Harlan)'
2025-11-03 18:08:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 18:08:37 <icinga-wm> PROBLEM - Check unit status of statograph_post on alert1002 is CRITICAL: CRITICAL: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
2025-11-03 18:08:45 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
2025-11-03 18:09:00 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
2025-11-03 18:09:05 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-web: apply
2025-11-03 18:09:23 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-web: apply
2025-11-03 18:13:09 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by swfrench@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200409 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 18:13:56 <wikibugs> 'ops-eqiad, ''SRE, ''collaboration-services, ''DC-Ops: eqiad row C/D Collaboration Services host migrations - https://phabricator.wikimedia.org/T405940#11337034 (''Dzahn) No worries at all. For unrelated reasons we also didn't have the time to do this today anyways. And let's err on the side of caution an...'
2025-11-03 18:14:21 <wikibugs> ('Merged) ''jenkins-bot: Enroll 100% of client sessions in PHP 8.3 [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200409 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 18:14:42 <logmsgbot> !log swfrench@deploy2002 Started scap sync-world: Backport for [[gerrit:1200409|Enroll 100% of client sessions in PHP 8.3 (T405955)]]
2025-11-03 18:14:49 <stashbot> T405955: MediaWiki on PHP 8.3 production workload migration - https://phabricator.wikimedia.org/T405955
2025-11-03 18:16:47 <logmsgbot> !log swfrench@deploy2002 swfrench: Backport for [[gerrit:1200409|Enroll 100% of client sessions in PHP 8.3 (T405955)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 18:16:51 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2207 (T407997)', diff saved to https://phabricator.wikimedia.org/P84664 and previous config saved to /var/cache/conftool/dbconfig/20251103-181650-marostegui.json
2025-11-03 18:17:02 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 18:17:50 <logmsgbot> !log swfrench@deploy2002 swfrench: Continuing with sync
2025-11-03 18:18:37 <icinga-wm> RECOVERY - Check unit status of statograph_post on alert1002 is OK: OK: Status of the systemd unit statograph_post https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
2025-11-03 18:20:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 18:21:48 <wikibugs> ('PS1) ''Dzahn: httpbb: adjust WDQS .git URL tests [puppet] - ''https://gerrit.wikimedia.org/r/1201176 (https://phabricator.wikimedia.org/T294917)'
2025-11-03 18:22:17 <logmsgbot> !log swfrench@deploy2002 Finished scap sync-world: Backport for [[gerrit:1200409|Enroll 100% of client sessions in PHP 8.3 (T405955)]] (duration: 07m 34s)
2025-11-03 18:22:26 <stashbot> T405955: MediaWiki on PHP 8.3 production workload migration - https://phabricator.wikimedia.org/T405955
2025-11-03 18:22:48 <wikibugs> ('CR) ''Scott French: "Thanks, Valentin!" [puppet] - ''https://gerrit.wikimedia.org/r/1198182 (https://phabricator.wikimedia.org/T403220) (owner: ''Scott French)'
2025-11-03 18:24:12 <wikibugs> ('CR) ''Scott French: [C:''+2] mw-(api-int|jobrunner): serve 50% of traffic on PHP 8.3 [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200411 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 18:25:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 18:26:13 <wikibugs> ('Merged) ''jenkins-bot: mw-(api-int|jobrunner): serve 50% of traffic on PHP 8.3 [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200411 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 18:28:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 18:28:25 <wikibugs> ('CR) ''AOkoth: [C:''+2] spamassassin: add multi.uribl.com to deny list [puppet] - ''https://gerrit.wikimedia.org/r/1201083 (https://phabricator.wikimedia.org/T408632) (owner: ''AOkoth)'
2025-11-03 18:29:24 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
2025-11-03 18:29:39 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
2025-11-03 18:30:22 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
2025-11-03 18:30:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 18:30:35 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
2025-11-03 18:30:47 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
2025-11-03 18:31:05 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
2025-11-03 18:31:59 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P84665 and previous config saved to /var/cache/conftool/dbconfig/20251103-183159-marostegui.json
2025-11-03 18:32:09 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
2025-11-03 18:32:21 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
2025-11-03 18:34:52 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
2025-11-03 18:35:03 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
2025-11-03 18:36:09 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
2025-11-03 18:36:21 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
2025-11-03 18:36:37 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
2025-11-03 18:36:52 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
2025-11-03 18:37:24 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
2025-11-03 18:37:34 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
2025-11-03 18:45:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 18:47:07 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P84666 and previous config saved to /var/cache/conftool/dbconfig/20251103-184706-marostegui.json
2025-11-03 18:49:40 <swfrench-wmf> kostajh: I think you're good to go with your patch. I'll continue to do some work in the background, but should not affect deployments.
2025-11-03 18:50:11 <kostajh> Thanks! I can’t start for another 15-20 minutes but I’ll write here when I do
2025-11-03 18:50:24 <swfrench-wmf> thumbs up
2025-11-03 18:52:08 <wikibugs> ('CR) ''Scott French: "Thanks for the review!" [puppet] - ''https://gerrit.wikimedia.org/r/1200142 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 18:52:16 <wikibugs> ('CR) ''Scott French: [C:''+2] deployment_server: default to PHP 8.3 in mwscript-k8s [puppet] - ''https://gerrit.wikimedia.org/r/1200142 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 18:53:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 19:02:14 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2207 (T407997)', diff saved to https://phabricator.wikimedia.org/P84667 and previous config saved to /var/cache/conftool/dbconfig/20251103-190214-marostegui.json
2025-11-03 19:02:18 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 19:02:30 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2225.codfw.wmnet with reason: Maintenance
2025-11-03 19:02:38 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2225 (T407997)', diff saved to https://phabricator.wikimedia.org/P84668 and previous config saved to /var/cache/conftool/dbconfig/20251103-190237-marostegui.json
2025-11-03 19:14:36 <wikibugs> ('PS1) ''BCornwall: ncredir: Update donate.wikipedia25.{org,com} redir [puppet] - ''https://gerrit.wikimedia.org/r/1201183 (https://phabricator.wikimedia.org/T408168)'
2025-11-03 19:14:43 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2225 (T407997)', diff saved to https://phabricator.wikimedia.org/P84669 and previous config saved to /var/cache/conftool/dbconfig/20251103-191442-marostegui.json
2025-11-03 19:14:46 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 19:16:13 <wikibugs> 'SRE, ''collaboration-services, ''Znuny: VRTS outbound emails not working - https://phabricator.wikimedia.org/T408967#11337299 (''Aafi) Confirmation receipts are also not received by customers, and several of our community members reported that they didn't receive any response emails from the wm-deoband que...'
2025-11-03 19:16:26 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, November 03 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1190742 (https://phabricator.wikimedia.org/T396805) (owner: ''Aaron Schulz)'
2025-11-03 19:16:40 <wikibugs> ('CR) ''ScheduleDeploymentBot: "Scheduled for deployment in the [Monday, November 03 UTC late backport window](https://wikitech.wikimedia.org/wiki/Deployments#deploycal-i" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1190743 (https://phabricator.wikimedia.org/T396805) (owner: ''Aaron Schulz)'
2025-11-03 19:17:11 <wikibugs> ('PS2) ''BCornwall: ncredir: Update donate.wikipedia25.{org,com} redir [puppet] - ''https://gerrit.wikimedia.org/r/1201183 (https://phabricator.wikimedia.org/T408168)'
2025-11-03 19:20:23 <wikibugs> ('CR) ''Ssingh: [C:''+1] ncredir: Update donate.wikipedia25.{org,com} redir [puppet] - ''https://gerrit.wikimedia.org/r/1201183 (https://phabricator.wikimedia.org/T408168) (owner: ''BCornwall)'
2025-11-03 19:21:18 <wikibugs> ('CR) ''Ssingh: [C:''+1] "Yeah PS2 is better indeed." [puppet] - ''https://gerrit.wikimedia.org/r/1201183 (https://phabricator.wikimedia.org/T408168) (owner: ''BCornwall)'
2025-11-03 19:22:49 <wikibugs> ('CR) ''Dzahn: [C:''+2] httpbb: adjust WDQS .git URL tests [puppet] - ''https://gerrit.wikimedia.org/r/1201176 (https://phabricator.wikimedia.org/T294917) (owner: ''Dzahn)'
2025-11-03 19:23:54 <wikibugs> ('CR) ''BCornwall: [C:''+2] ncredir: Update donate.wikipedia25.{org,com} redir [puppet] - ''https://gerrit.wikimedia.org/r/1201183 (https://phabricator.wikimedia.org/T408168) (owner: ''BCornwall)'
2025-11-03 19:23:55 <wikibugs> ('CR) ''Samuel (WMF): [C:''+1] hCaptcha: use ve.newTarget hook to avoid globals [extensions/ConfirmEdit] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201167 (https://phabricator.wikimedia.org/T408670) (owner: ''Kosta Harlan)'
2025-11-03 19:24:12 <wikibugs> ('CR) ''Andrew Bogott: [C:''+1] P:openstack::designate: Remove check_dns_query [puppet] - ''https://gerrit.wikimedia.org/r/1200306 (owner: ''Majavah)'
2025-11-03 19:27:05 <kostajh> swfrench-wmf: ok, I'll get started here in a minute
2025-11-03 19:27:13 <kostajh> jouncebot: nowandnext
2025-11-03 19:27:13 <jouncebot> No deployments scheduled for the next 1 hour(s) and 32 minute(s)
2025-11-03 19:27:13 <jouncebot> In 1 hour(s) and 32 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T2100)
2025-11-03 19:27:29 <wikibugs> ('CR) ''Andrew Bogott: [C:''+1] "no objection from me; I think the previous metric was just a proof of concept that was never used for much." [puppet] - ''https://gerrit.wikimedia.org/r/1199305 (https://phabricator.wikimedia.org/T408457) (owner: ''Majavah)'
2025-11-03 19:27:43 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/ConfirmEdit] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201167 (https://phabricator.wikimedia.org/T408670) (owner: ''Kosta Harlan)'
2025-11-03 19:29:19 <wikibugs> ('Merged) ''jenkins-bot: hCaptcha: use ve.newTarget hook to avoid globals [extensions/ConfirmEdit] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201167 (https://phabricator.wikimedia.org/T408670) (owner: ''Kosta Harlan)'
2025-11-03 19:29:22 <federico3> !oncall
2025-11-03 19:29:39 <logmsgbot> !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1201167|hCaptcha: use ve.newTarget hook to avoid globals (T408670)]]
2025-11-03 19:29:42 <stashbot> T408670: Uncaught TypeError: can't access property "surface", ve.init.target is null - https://phabricator.wikimedia.org/T408670
2025-11-03 19:29:47 <wikibugs> 'SRE, ''Hiddenparma, ''Traffic, ''Patch-For-Review: Integrate code from the private repository into the CDN - https://phabricator.wikimedia.org/T404826#11337351 (''Milimetric) > I don't think it should be discussed elsewhere because it's not really a valid concern here: > > * We only call the browser...'
2025-11-03 19:29:51 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P84670 and previous config saved to /var/cache/conftool/dbconfig/20251103-192950-marostegui.json
2025-11-03 19:29:58 <wikibugs> ('CR) ''Andrew Bogott: [C:''+1] "Glad to see someone is working on nagios deprecation! Will the contact_group still be honored by alert manager or do we need to test/refac" [puppet] - ''https://gerrit.wikimedia.org/r/1200016 (https://phabricator.wikimedia.org/T328502) (owner: ''Tiziano Fogli)'
2025-11-03 19:30:04 <wikibugs> ('CR) ''Andrew Bogott: [C:''+1] nova: enable nrpe2nodexp wrapper on check-flavor_aggregates [puppet] - ''https://gerrit.wikimedia.org/r/1200018 (https://phabricator.wikimedia.org/T328502) (owner: ''Tiziano Fogli)'
2025-11-03 19:31:42 <logmsgbot> !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1201167|hCaptcha: use ve.newTarget hook to avoid globals (T408670)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 19:32:59 <logmsgbot> !log kharlan@deploy2002 kharlan: Continuing with sync
2025-11-03 19:34:01 <jinxer-wm> FIRING: JobUnavailable: Reduced availability for job cloud_dev_pdns_rec in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2025-11-03 19:35:04 <wikibugs> ('CR) ''Dzahn: vrts: alert on vrts junk queue size (''1 comment) [alerts] - ''https://gerrit.wikimedia.org/r/1201087 (https://phabricator.wikimedia.org/T408632) (owner: ''AOkoth)'
2025-11-03 19:35:56 <wikibugs> ('CR) ''Dzahn: [C:''-1] "we decided to use HAproxy instead of envoy for this (so far). so -1 based on that." [puppet] - ''https://gerrit.wikimedia.org/r/1198281 (https://phabricator.wikimedia.org/T365259) (owner: ''Jelto)'
2025-11-03 19:37:08 <wikibugs> ('PS1) ''Kosta Harlan: SimpleCaptcha: Ensure correct instance is used on page creation [extensions/ConfirmEdit] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201185 (https://phabricator.wikimedia.org/T408975)'
2025-11-03 19:37:26 <logmsgbot> !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1201167|hCaptcha: use ve.newTarget hook to avoid globals (T408670)]] (duration: 07m 47s)
2025-11-03 19:37:29 <stashbot> T408670: Uncaught TypeError: can't access property "surface", ve.init.target is null - https://phabricator.wikimedia.org/T408670
2025-11-03 19:38:07 <wikibugs> ('PS3) ''Scott French: mw-(api-ext|web): right-size given current traffic allocation [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200412 (https://phabricator.wikimedia.org/T405955)'
2025-11-03 19:38:39 <kostajh> on to the next one
2025-11-03 19:38:47 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/ConfirmEdit] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201185 (https://phabricator.wikimedia.org/T408975) (owner: ''Kosta Harlan)'
2025-11-03 19:41:21 <wikibugs> ('PS1) ''Dzahn: admin: add dpogorzelski to ml-team-admins, analytics-privatedata-users [puppet] - ''https://gerrit.wikimedia.org/r/1201187 (https://phabricator.wikimedia.org/T408579)'
2025-11-03 19:43:50 <wikibugs> ('CR) ''Dzahn: "still needs approval from Calbon (but he already approved global root at https://phabricator.wikimedia.org/T408702)" [puppet] - ''https://gerrit.wikimedia.org/r/1201187 (https://phabricator.wikimedia.org/T408579) (owner: ''Dzahn)'
2025-11-03 19:44:59 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P84672 and previous config saved to /var/cache/conftool/dbconfig/20251103-194457-marostegui.json
2025-11-03 19:45:12 <logmsgbot> !log bking@deploy2002 helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
2025-11-03 19:45:15 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Engineering, ''Patch-For-Review: Add dpogorzelski to ML and Data Platform posix groups - https://phabricator.wikimedia.org/T408579#11337419 (''Dzahn) a:''calbon Hello @calbon can we have one more approval over here for the ml-team-admins and analytics-privatedata part?'
2025-11-03 19:45:39 <wikibugs> 'SRE, ''SRE-Access-Requests, ''Data-Engineering, ''Patch-For-Review: Add dpogorzelski to ML and Data Platform posix groups - https://phabricator.wikimedia.org/T408579#11337422 (''Dzahn) ''Open'' → In progress'
2025-11-03 19:45:47 <logmsgbot> !log bking@deploy2002 helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
2025-11-03 19:46:53 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to deployment for ItamarWMDE - https://phabricator.wikimedia.org/T408924#11337430 (''Dzahn) a:''thcipriani'
2025-11-03 19:47:48 <wikibugs> ('PS32) ''CDobbins: sre.loadbalancer: modify admin.py to accept 'reboot' action [cookbooks] - ''https://gerrit.wikimedia.org/r/1180137 (https://phabricator.wikimedia.org/T395240)'
2025-11-03 19:49:02 <wikibugs> 'SRE, ''SRE-Access-Requests: Requesting access to deployment for ItamarWMDE - https://phabricator.wikimedia.org/T408924#11337433 (''Dzahn) Looking at the reason for access line this seems like "restricted" might be enough? Because that is usually used for running maintenance scripts. But if the "dumps bash s...'
2025-11-03 19:50:23 <wikibugs> ('Merged) ''jenkins-bot: SimpleCaptcha: Ensure correct instance is used on page creation [extensions/ConfirmEdit] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201185 (https://phabricator.wikimedia.org/T408975) (owner: ''Kosta Harlan)'
2025-11-03 19:50:26 <wikibugs> ('PS33) ''CDobbins: sre.loadbalancer: modify admin.py to accept 'reboot' action [cookbooks] - ''https://gerrit.wikimedia.org/r/1180137 (https://phabricator.wikimedia.org/T395240)'
2025-11-03 19:50:41 <logmsgbot> !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1201185|SimpleCaptcha: Ensure correct instance is used on page creation (T408975)]]
2025-11-03 19:50:44 <stashbot> T408975: New editors are unable to create pages with external links in them - https://phabricator.wikimedia.org/T408975
2025-11-03 19:51:11 <icinga-wm> PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
2025-11-03 19:51:30 <wikibugs> ('PS1) ''Kosta Harlan: Hooks: Fetch correct SimpleCaptcha instance in onEditPage__attemptSave_after [extensions/WikiEditor] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201189 (https://phabricator.wikimedia.org/T408975)'
2025-11-03 19:52:42 <logmsgbot> !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1201185|SimpleCaptcha: Ensure correct instance is used on page creation (T408975)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 19:53:40 <logmsgbot> !log kharlan@deploy2002 kharlan: Continuing with sync
2025-11-03 19:54:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 19:54:36 <wikibugs> ('PS34) ''CDobbins: sre.loadbalancer: modify admin.py to accept 'reboot' action [cookbooks] - ''https://gerrit.wikimedia.org/r/1180137 (https://phabricator.wikimedia.org/T395240)'
2025-11-03 19:54:49 <mutante> ^ WDQS alerts: that already has 2 tickets with attention
2025-11-03 19:56:13 <icinga-wm> RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 239.56 ms
2025-11-03 19:58:03 <logmsgbot> !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1201185|SimpleCaptcha: Ensure correct instance is used on page creation (T408975)]] (duration: 07m 22s)
2025-11-03 19:58:05 <stashbot> T408975: New editors are unable to create pages with external links in them - https://phabricator.wikimedia.org/T408975
2025-11-03 19:58:21 <kostajh> last patch being synced now
2025-11-03 19:58:26 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by kharlan@deploy2002 using scap backport" [extensions/WikiEditor] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201189 (https://phabricator.wikimedia.org/T408975) (owner: ''Kosta Harlan)'
2025-11-03 19:59:23 <wikibugs> ('CR) ''RLazarus: [C:''+1] mw-(api-ext|web): right-size given current traffic allocation [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200412 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 20:00:06 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2225 (T407997)', diff saved to https://phabricator.wikimedia.org/P84673 and previous config saved to /var/cache/conftool/dbconfig/20251103-200006-marostegui.json
2025-11-03 20:00:10 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 20:00:23 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2226.codfw.wmnet with reason: Maintenance
2025-11-03 20:00:31 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2226 (T407997)', diff saved to https://phabricator.wikimedia.org/P84674 and previous config saved to /var/cache/conftool/dbconfig/20251103-200030-marostegui.json
2025-11-03 20:01:28 <wikibugs> ('CR) ''CI reject: [V:''-1] sre.loadbalancer: modify admin.py to accept 'reboot' action [cookbooks] - ''https://gerrit.wikimedia.org/r/1180137 (https://phabricator.wikimedia.org/T395240) (owner: ''CDobbins)'
2025-11-03 20:01:43 <wikibugs> ('PS35) ''CDobbins: sre.loadbalancer: modify admin.py to accept 'reboot' action [cookbooks] - ''https://gerrit.wikimedia.org/r/1180137 (https://phabricator.wikimedia.org/T395240)'
2025-11-03 20:02:37 <icinga-wm> PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
2025-11-03 20:02:56 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2226 (T407997)', diff saved to https://phabricator.wikimedia.org/P84675 and previous config saved to /var/cache/conftool/dbconfig/20251103-200255-marostegui.json
2025-11-03 20:03:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 20:03:42 <swfrench-wmf> kostajh: could you ping me when you're done with your series of backports? I'd like to make some quick capacity tweaks as a follow-up to the work that happened earlier.
2025-11-03 20:04:01 <kostajh> swfrench-wmf: yes, nearly done
2025-11-03 20:04:09 <swfrench-wmf> amazing, thanks
2025-11-03 20:04:49 <kostajh> swfrench-wmf: I'd guess probably 15-20 minutes, depending on CI and scap speed etc
2025-11-03 20:05:04 <swfrench-wmf> kostajh: sounds good, I'll be around :)
2025-11-03 20:06:44 <jinxer-wm> FIRING: [2x] RipeAtlasAnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold for measurement 95145506 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
2025-11-03 20:07:32 <wikibugs> ('CR) ''CI reject: [V:''-1] sre.loadbalancer: modify admin.py to accept 'reboot' action [cookbooks] - ''https://gerrit.wikimedia.org/r/1180137 (https://phabricator.wikimedia.org/T395240) (owner: ''CDobbins)'
2025-11-03 20:07:39 <icinga-wm> RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 234.38 ms
2025-11-03 20:10:08 <wikibugs> ('Merged) ''jenkins-bot: Hooks: Fetch correct SimpleCaptcha instance in onEditPage__attemptSave_after [extensions/WikiEditor] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201189 (https://phabricator.wikimedia.org/T408975) (owner: ''Kosta Harlan)'
2025-11-03 20:10:25 <logmsgbot> !log kharlan@deploy2002 Started scap sync-world: Backport for [[gerrit:1201189|Hooks: Fetch correct SimpleCaptcha instance in onEditPage__attemptSave_after (T408975)]]
2025-11-03 20:10:29 <stashbot> T408975: New editors are unable to create pages with external links in them - https://phabricator.wikimedia.org/T408975
2025-11-03 20:11:44 <jinxer-wm> FIRING: [4x] RipeAtlasAnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold for measurement 95145506 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
2025-11-03 20:12:29 <logmsgbot> !log kharlan@deploy2002 kharlan: Backport for [[gerrit:1201189|Hooks: Fetch correct SimpleCaptcha instance in onEditPage__attemptSave_after (T408975)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 20:13:27 <logmsgbot> !log kharlan@deploy2002 kharlan: Continuing with sync
2025-11-03 20:14:07 <wikibugs> 'SRE, ''LDAP-Access-Requests: Grant Access to Superset for vicaplet-wmde - https://phabricator.wikimedia.org/T408920#11337545 (''Dzahn) Hi @Virginie.caplet can you please send an email to @KFrancis [[ https://meta.wikimedia.org/wiki/User:KFrancis_(WMF) | Katie Francis ]] and say that you would like to start t...'
2025-11-03 20:16:44 <jinxer-wm> RESOLVED: [4x] RipeAtlasAnchorUnreachable: ipv6 ping to eqsin RIPE Atlas anchor: failures over threshold for measurement 95145506 - https://wikitech.wikimedia.org/wiki/Network_monitoring#Atlas_alerts - https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DRipeAtlasAnchorUnreachable
2025-11-03 20:17:39 <jinxer-wm> FIRING: [2x] TransitBGPDown: Transit BGP session down between cr3-eqsin and Hurricane Electric (2001:de8:4::6939:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
2025-11-03 20:17:47 <logmsgbot> !log kharlan@deploy2002 Finished scap sync-world: Backport for [[gerrit:1201189|Hooks: Fetch correct SimpleCaptcha instance in onEditPage__attemptSave_after (T408975)]] (duration: 07m 22s)
2025-11-03 20:17:51 <stashbot> T408975: New editors are unable to create pages with external links in them - https://phabricator.wikimedia.org/T408975
2025-11-03 20:18:05 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P84676 and previous config saved to /var/cache/conftool/dbconfig/20251103-201803-marostegui.json
2025-11-03 20:19:33 <kostajh> swfrench-wmf: all done
2025-11-03 20:19:35 <kostajh> thanks
2025-11-03 20:19:48 <swfrench-wmf> kostajh: great, thank you! I'll get started shortly, then.
2025-11-03 20:21:02 <wikibugs> ('CR) ''Scott French: "Thanks for the review!" [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200412 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 20:21:03 <wikibugs> ('CR) ''Scott French: [C:''+2] mw-(api-ext|web): right-size given current traffic allocation [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200412 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 20:23:06 <wikibugs> ('Merged) ''jenkins-bot: mw-(api-ext|web): right-size given current traffic allocation [deployment-charts] - ''https://gerrit.wikimedia.org/r/1200412 (https://phabricator.wikimedia.org/T405955) (owner: ''Scott French)'
2025-11-03 20:25:27 <jinxer-wm> RESOLVED: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 20:26:52 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
2025-11-03 20:27:08 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
2025-11-03 20:28:02 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 20:31:31 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
2025-11-03 20:31:45 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
2025-11-03 20:31:54 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
2025-11-03 20:32:09 <logmsgbot> !log swfrench@deploy2002 helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
2025-11-03 20:33:13 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P84677 and previous config saved to /var/cache/conftool/dbconfig/20251103-203312-marostegui.json
2025-11-03 20:38:07 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
2025-11-03 20:38:18 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
2025-11-03 20:38:28 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-web: apply
2025-11-03 20:38:45 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-web: apply
2025-11-03 20:38:55 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-web: apply
2025-11-03 20:39:04 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-web: apply
2025-11-03 20:39:14 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] START helmfile.d/services/mw-web: apply
2025-11-03 20:39:26 <logmsgbot> !log swfrench@deploy2002 helmfile [codfw] DONE helmfile.d/services/mw-web: apply
2025-11-03 20:48:20 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2226 (T407997)', diff saved to https://phabricator.wikimedia.org/P84678 and previous config saved to /var/cache/conftool/dbconfig/20251103-204820-marostegui.json
2025-11-03 20:48:23 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 20:48:37 <logmsgbot> !log marostegui@cumin1003 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2238.codfw.wmnet with reason: Maintenance
2025-11-03 20:48:44 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Depooling db2238 (T407997)', diff saved to https://phabricator.wikimedia.org/P84679 and previous config saved to /var/cache/conftool/dbconfig/20251103-204844-marostegui.json
2025-11-03 20:49:26 <swfrench-wmf> alright, the dust has settled after my capacity tweaks and I believe I'm done for now
2025-11-03 21:00:05 <jouncebot> RoanKattouw, Urbanecm, TheresNoTime, kindrobot, and cjming: gettimeofday() says it's time for UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T2100)
2025-11-03 21:00:05 <jouncebot> ZhaoFJx, arlolra, kemayo, Superpes, and AaronSchulz: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
2025-11-03 21:00:19 <Kemayo> o/
2025-11-03 21:00:45 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2238 (T407997)', diff saved to https://phabricator.wikimedia.org/P84680 and previous config saved to /var/cache/conftool/dbconfig/20251103-210044-marostegui.json
2025-11-03 21:00:50 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 21:00:53 <ZhaoFJx> o/
2025-11-03 21:00:59 <Superpes> \o
2025-11-03 21:01:03 <arlolra> hello
2025-11-03 21:03:39 <AaronSchulz> so, I'm going to do the sandbox change
2025-11-03 21:06:50 <wikibugs> ('PS7) ''Aaron Schulz: Set wgRestSandboxSpecs['wmf-restbase'] on testwiki to use the static specs [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1190742 (https://phabricator.wikimedia.org/T396805)'
2025-11-03 21:07:01 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by aaron@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1190742 (https://phabricator.wikimedia.org/T396805) (owner: ''Aaron Schulz)'
2025-11-03 21:07:48 <wikibugs> ('Merged) ''jenkins-bot: Set wgRestSandboxSpecs['wmf-restbase'] on testwiki to use the static specs [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1190742 (https://phabricator.wikimedia.org/T396805) (owner: ''Aaron Schulz)'
2025-11-03 21:08:09 <logmsgbot> !log aaron@deploy2002 Started scap sync-world: Backport for [[gerrit:1190742|Set wgRestSandboxSpecs['wmf-restbase'] on testwiki to use the static specs (T396805)]]
2025-11-03 21:08:12 <stashbot> T396805: Define static OpenAPI specs per API family for RESTbase endpoints - https://phabricator.wikimedia.org/T396805
2025-11-03 21:09:01 <jinxer-wm> FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate default-staging-certificate.wmnet is about to expire - https://wikitech.wikimedia.org/wiki/Puppet#Renew_agent_certificate - TODO - https://alerts.wikimedia.org/?q=alertname%3DPuppetCertificateAboutToExpire
2025-11-03 21:09:01 <jinxer-wm> FIRING: [21x] CertAlmostExpired: Certificate for service cr2-eqsin.wikimedia.org:32767 is about to expire - TODO - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired
2025-11-03 21:10:16 <logmsgbot> !log aaron@deploy2002 aaron: Backport for [[gerrit:1190742|Set wgRestSandboxSpecs['wmf-restbase'] on testwiki to use the static specs (T396805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 21:11:13 <logmsgbot> !log aaron@deploy2002 aaron: Continuing with sync
2025-11-03 21:13:18 <Kemayo> I can get my own one via spiderpig. Not certain if I need to wait for this sandbox change to go through first to be safe.
2025-11-03 21:13:41 <wikibugs> ('PS4) ''Aaron Schulz: Set wgRestSandboxSpecs['wmf-restbase'] to use the static specs everywhere [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1190743 (https://phabricator.wikimedia.org/T396805)'
2025-11-03 21:15:25 <logmsgbot> !log aaron@deploy2002 Finished scap sync-world: Backport for [[gerrit:1190742|Set wgRestSandboxSpecs['wmf-restbase'] on testwiki to use the static specs (T396805)]] (duration: 07m 16s)
2025-11-03 21:15:33 <stashbot> T396805: Define static OpenAPI specs per API family for RESTbase endpoints - https://phabricator.wikimedia.org/T396805
2025-11-03 21:15:55 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P84681 and previous config saved to /var/cache/conftool/dbconfig/20251103-211552-marostegui.json
2025-11-03 21:16:19 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by aaron@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1190743 (https://phabricator.wikimedia.org/T396805) (owner: ''Aaron Schulz)'
2025-11-03 21:17:45 <wikibugs> ('Merged) ''jenkins-bot: Set wgRestSandboxSpecs['wmf-restbase'] to use the static specs everywhere [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1190743 (https://phabricator.wikimedia.org/T396805) (owner: ''Aaron Schulz)'
2025-11-03 21:18:03 <logmsgbot> !log aaron@deploy2002 Started scap sync-world: Backport for [[gerrit:1190743|Set wgRestSandboxSpecs['wmf-restbase'] to use the static specs everywhere (T396805)]]
2025-11-03 21:20:16 <logmsgbot> !log aaron@deploy2002 aaron: Backport for [[gerrit:1190743|Set wgRestSandboxSpecs['wmf-restbase'] to use the static specs everywhere (T396805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 21:21:27 <logmsgbot> !log aaron@deploy2002 aaron: Continuing with sync
2025-11-03 21:25:35 <logmsgbot> !log aaron@deploy2002 Finished scap sync-world: Backport for [[gerrit:1190743|Set wgRestSandboxSpecs['wmf-restbase'] to use the static specs everywhere (T396805)]] (duration: 07m 31s)
2025-11-03 21:25:38 <stashbot> T396805: Define static OpenAPI specs per API family for RESTbase endpoints - https://phabricator.wikimedia.org/T396805
2025-11-03 21:25:51 <AaronSchulz> done
2025-11-03 21:25:59 <Kemayo> Great, I will get mine next.
2025-11-03 21:26:17 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by kemayo@deploy2002 using scap backport" [extensions/VisualEditor] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201119 (https://phabricator.wikimedia.org/T408890) (owner: ''DLynch)'
2025-11-03 21:28:27 <jinxer-wm> FIRING: [2x] ProbeDown: Service wdqs2009:443 has failed probes (http_wdqs_external_search_sparql_endpoint_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#wdqs2009:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
2025-11-03 21:31:04 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P84682 and previous config saved to /var/cache/conftool/dbconfig/20251103-213102-marostegui.json
2025-11-03 21:32:45 <jinxer-wm> FIRING: [2x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
2025-11-03 21:34:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 21:37:12 <wikibugs> ('Merged) ''jenkins-bot: Edit check: allow MWVE_FORCE_EDIT_CHECK_ENABLED to override ecenable [extensions/VisualEditor] (wmf/1.45.0-wmf.25) - ''https://gerrit.wikimedia.org/r/1201119 (https://phabricator.wikimedia.org/T408890) (owner: ''DLynch)'
2025-11-03 21:37:30 <logmsgbot> !log kemayo@deploy2002 Started scap sync-world: Backport for [[gerrit:1201119|Edit check: allow MWVE_FORCE_EDIT_CHECK_ENABLED to override ecenable (T408890)]]
2025-11-03 21:37:36 <stashbot> T408890: Write script that will cause Suggestion Mode to be enabled by default - https://phabricator.wikimedia.org/T408890
2025-11-03 21:37:45 <jinxer-wm> FIRING: [4x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
2025-11-03 21:39:01 <jinxer-wm> FIRING: [5x] SystemdUnitFailed: docker-reporter-kubernetes-dse_eqiad-images.service on build2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
2025-11-03 21:39:10 <wikibugs> ('CR) ''C. Scott Ananian: [C:''+1] Deploy Parsoid Read Views to 7 wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199880 (https://phabricator.wikimedia.org/T408765) (owner: ''Arlolra)'
2025-11-03 21:39:29 <logmsgbot> !log kemayo@deploy2002 kemayo: Backport for [[gerrit:1201119|Edit check: allow MWVE_FORCE_EDIT_CHECK_ENABLED to override ecenable (T408890)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 21:42:37 <logmsgbot> !log kemayo@deploy2002 kemayo: Continuing with sync
2025-11-03 21:46:01 <wikibugs> 'ops-codfw, ''SRE, ''SRE-swift-storage, ''Data-Persistence, ''DC-Ops: Q2:rack/setup/install ms-be209[0-4] - https://phabricator.wikimedia.org/T405958#11337889 (''Jhancock.wm)'
2025-11-03 21:46:15 <logmsgbot> !log marostegui@cumin1003 dbctl commit (dc=all): 'Repooling after maintenance db2238 (T407997)', diff saved to https://phabricator.wikimedia.org/P84683 and previous config saved to /var/cache/conftool/dbconfig/20251103-214610-marostegui.json
2025-11-03 21:46:22 <stashbot> T407997: Drop the afl_ip column and the afl_ip_timestamp index from the abuse_filter_log table - https://phabricator.wikimedia.org/T407997
2025-11-03 21:46:51 <logmsgbot> !log kemayo@deploy2002 Finished scap sync-world: Backport for [[gerrit:1201119|Edit check: allow MWVE_FORCE_EDIT_CHECK_ENABLED to override ecenable (T408890)]] (duration: 09m 21s)
2025-11-03 21:46:58 <stashbot> T408890: Write script that will cause Suggestion Mode to be enabled by default - https://phabricator.wikimedia.org/T408890
2025-11-03 21:47:04 <Kemayo> Okay, whoever's up next is free to go.
2025-11-03 21:47:44 <wikibugs> 'ops-codfw, ''SRE, ''SRE-swift-storage, ''Data-Persistence, ''DC-Ops: Q2:rack/setup/install ms-be209[0-4] - https://phabricator.wikimedia.org/T405958#11337893 (''Jhancock.wm) a:''Jhancock.wm'
2025-11-03 21:47:52 <arlolra> I can go
2025-11-03 21:47:59 <arlolra> unless someone else wants to
2025-11-03 21:48:29 <ZhaoFJx> can anyone deploy 1200400?
2025-11-03 21:48:41 <arlolra> I can do that for you
2025-11-03 21:48:42 <ZhaoFJx> really quick config change
2025-11-03 21:48:49 <ZhaoFJx> thanks arlolra !
2025-11-03 21:48:57 <arlolra> I'll do it now
2025-11-03 21:49:18 <Superpes> Just noticing that it could be merged together with my patch
2025-11-03 21:49:28 <arlolra> Ok, I can do both
2025-11-03 21:50:00 <Superpes> They are quite simple and similar so there shouldn't be any problems :)
2025-11-03 21:51:19 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by arlolra@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200475 (https://phabricator.wikimedia.org/T408885) (owner: ''Superpes15)'
2025-11-03 21:51:19 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by arlolra@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200400 (https://phabricator.wikimedia.org/T408902) (owner: ''ZhaoFJx)'
2025-11-03 21:52:45 <jinxer-wm> FIRING: [4x] Traffic bill over quota: Alert for device cr1-eqiad.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
2025-11-03 21:53:44 <wikibugs> ('Merged) ''jenkins-bot: [enwikivoyage] Enable block feature for AbuseFilter [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200475 (https://phabricator.wikimedia.org/T408885) (owner: ''Superpes15)'
2025-11-03 21:53:48 <wikibugs> ('Merged) ''jenkins-bot: zhwiki: Add SecurePoll Rights to CheckUser [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1200400 (https://phabricator.wikimedia.org/T408902) (owner: ''ZhaoFJx)'
2025-11-03 21:54:06 <logmsgbot> !log arlolra@deploy2002 Started scap sync-world: Backport for [[gerrit:1200475|[enwikivoyage] Enable block feature for AbuseFilter (T408885)]], [[gerrit:1200400|zhwiki: Add SecurePoll Rights to CheckUser (T408902)]]
2025-11-03 21:54:11 <stashbot> T408885: Enable block feature on the abuse filter on the English Wikivoyage - https://phabricator.wikimedia.org/T408885
2025-11-03 21:54:12 <stashbot> T408902: Grant securepoll-related permissions to checkuser on zhwiki - https://phabricator.wikimedia.org/T408902
2025-11-03 21:56:13 <logmsgbot> !log arlolra@deploy2002 superpes, zhaofjx, arlolra: Backport for [[gerrit:1200475|[enwikivoyage] Enable block feature for AbuseFilter (T408885)]], [[gerrit:1200400|zhwiki: Add SecurePoll Rights to CheckUser (T408902)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 21:56:38 <Superpes> I just tested both patches and they are fine :)
2025-11-03 21:56:45 <arlolra> Thank you
2025-11-03 21:56:56 <logmsgbot> !log arlolra@deploy2002 superpes, zhaofjx, arlolra: Continuing with sync
2025-11-03 21:56:56 <ZhaoFJx> tested and looks great
2025-11-03 21:56:59 <Superpes> Easy and quick :P
2025-11-03 21:57:45 <jinxer-wm> RESOLVED: [2x] Traffic bill over quota: Alert for device cr2-eqord.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
2025-11-03 22:00:04 <jouncebot> Reedy, sbassett, Maryum, and manfredi: #bothumor My software never has bugs. It just develops random features. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20251103T2200).
2025-11-03 22:01:11 <logmsgbot> !log arlolra@deploy2002 Finished scap sync-world: Backport for [[gerrit:1200475|[enwikivoyage] Enable block feature for AbuseFilter (T408885)]], [[gerrit:1200400|zhwiki: Add SecurePoll Rights to CheckUser (T408902)]] (duration: 07m 05s)
2025-11-03 22:01:15 <stashbot> T408885: Enable block feature on the abuse filter on the English Wikivoyage - https://phabricator.wikimedia.org/T408885
2025-11-03 22:01:15 <stashbot> T408902: Grant securepoll-related permissions to checkuser on zhwiki - https://phabricator.wikimedia.org/T408902
2025-11-03 22:01:25 <Superpes> Many thanks for your assistance arlolra :3
2025-11-03 22:01:32 <arlolra> No problem
2025-11-03 22:01:55 <wikibugs> ('CR) ''TrainBranchBot: [C:''+2] "Approved by arlolra@deploy2002 using scap backport" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199880 (https://phabricator.wikimedia.org/T408765) (owner: ''Arlolra)'
2025-11-03 22:02:54 <ZhaoFJx> checked again and all good
2025-11-03 22:03:06 <ZhaoFJx> thank you arlolra :D
2025-11-03 22:03:29 <arlolra> :)
2025-11-03 22:05:32 <jinxer-wm> RESOLVED: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2009:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 22:06:46 <logmsgbot> !log bking@cumin2002 DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 90 days, 0:00:00 on wdqs2009.codfw.wmnet with reason: no SLO for this endpoint
2025-11-03 22:07:24 <inflatador> !log bking@cumin2002 suppress wdqs2009 alerts for next 90 days T409117
2025-11-03 22:07:26 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 22:07:26 <stashbot> T409117: wdqs2009: Disable some alerts - https://phabricator.wikimedia.org/T409117
2025-11-03 22:07:40 <wikibugs> ('Merged) ''jenkins-bot: Deploy Parsoid Read Views to 7 wikis [mediawiki-config] - ''https://gerrit.wikimedia.org/r/1199880 (https://phabricator.wikimedia.org/T408765) (owner: ''Arlolra)'
2025-11-03 22:08:01 <logmsgbot> !log arlolra@deploy2002 Started scap sync-world: Backport for [[gerrit:1199880|Deploy Parsoid Read Views to 7 wikis (T408765)]]
2025-11-03 22:08:03 <stashbot> T408765: Parsoid Read Views to deploy ~2025-10-03 - https://phabricator.wikimedia.org/T408765
2025-11-03 22:10:11 <logmsgbot> !log arlolra@deploy2002 arlolra: Backport for [[gerrit:1199880|Deploy Parsoid Read Views to 7 wikis (T408765)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
2025-11-03 22:11:47 <logmsgbot> !log arlolra@deploy2002 arlolra: Continuing with sync
2025-11-03 22:16:02 <logmsgbot> !log arlolra@deploy2002 Finished scap sync-world: Backport for [[gerrit:1199880|Deploy Parsoid Read Views to 7 wikis (T408765)]] (duration: 08m 01s)
2025-11-03 22:16:05 <stashbot> T408765: Parsoid Read Views to deploy ~2025-10-03 - https://phabricator.wikimedia.org/T408765
2025-11-03 22:16:15 <wikibugs> ('PS1) ''Ryan Kemper: wdqs: allowlist new endpoints [puppet] - ''https://gerrit.wikimedia.org/r/1201295 (https://phabricator.wikimedia.org/T407406)'
2025-11-03 22:17:27 <wikibugs> ('PS1) ''Bking: wdqs: Add new endpoints to allowlist [puppet] - ''https://gerrit.wikimedia.org/r/1201296 (https://phabricator.wikimedia.org/T407407)'
2025-11-03 22:19:52 <wikibugs> ('PS2) ''Bking: wdqs: allowlist new endpoints [puppet] - ''https://gerrit.wikimedia.org/r/1201295 (https://phabricator.wikimedia.org/T407406) (owner: ''Ryan Kemper)'
2025-11-03 22:23:14 <wikibugs> ('CR) ''Bking: "Per @dcausse@wikimedia.org comment on https://gerrit.wikimedia.org/r/c/operations/alerts/+/1130730/3/team-search-platform/blazegraph.yaml " [alerts] - ''https://gerrit.wikimedia.org/r/1198161 (https://phabricator.wikimedia.org/T389859) (owner: ''Ryan Kemper)'
2025-11-03 22:25:44 <wikibugs> ('CR) ''Ryan Kemper: "Ah yes that makes sense. We'll have to figure out another approach then" [alerts] - ''https://gerrit.wikimedia.org/r/1198161 (https://phabricator.wikimedia.org/T389859) (owner: ''Ryan Kemper)'
2025-11-03 22:29:47 <wikibugs> ('CR) ''Bking: [C:''+2] blazegraph: add cluster sync check [alerts] - ''https://gerrit.wikimedia.org/r/1174723 (https://phabricator.wikimedia.org/T408026) (owner: ''Gmodena)'
2025-11-03 22:39:46 <wikibugs> ('PS1) ''Dzahn: tcpproxy: add firewall rule to allow gerrit ssh port [puppet] - ''https://gerrit.wikimedia.org/r/1201299 (https://phabricator.wikimedia.org/T408532)'
2025-11-03 22:40:28 <wikibugs> ('CR) ''CI reject: [V:''-1] tcpproxy: add firewall rule to allow gerrit ssh port [puppet] - ''https://gerrit.wikimedia.org/r/1201299 (https://phabricator.wikimedia.org/T408532) (owner: ''Dzahn)'
2025-11-03 22:41:00 <wikibugs> ('PS2) ''Dzahn: tcpproxy: add firewall rule to allow gerrit ssh port [puppet] - ''https://gerrit.wikimedia.org/r/1201299 (https://phabricator.wikimedia.org/T408532)'
2025-11-03 22:42:05 <wikibugs> ('PS3) ''Ryan Kemper: wdqs: allowlist new endpoints [puppet] - ''https://gerrit.wikimedia.org/r/1201295 (https://phabricator.wikimedia.org/T407406)'
2025-11-03 22:43:50 <wikibugs> ('CR) ''Bking: [C:''+2] wdqs: allowlist new endpoints [puppet] - ''https://gerrit.wikimedia.org/r/1201295 (https://phabricator.wikimedia.org/T407406) (owner: ''Ryan Kemper)'
2025-11-03 22:46:43 <jinxer-wm> FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
2025-11-03 22:47:37 <wikibugs> ('PS1) ''Ryan Kemper: wdqs: don't sleep so long for restarts [cookbooks] - ''https://gerrit.wikimedia.org/r/1201300'
2025-11-03 22:47:39 <jinxer-wm> RESOLVED: [2x] TransitBGPDown: Transit BGP session down between cr3-eqsin and Hurricane Electric (2001:de8:4::6939:1) - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DTransitBGPDown
2025-11-03 22:48:13 <wikibugs> ('CR) ''Dzahn: [V:''+1 C:''+2] "https://puppet-compiler.wmflabs.org/output/1201299/7529/tcp-proxy1001.eqiad.wmnet/index.html" [puppet] - ''https://gerrit.wikimedia.org/r/1201299 (https://phabricator.wikimedia.org/T408532) (owner: ''Dzahn)'
2025-11-03 22:48:20 <wikibugs> ('PS3) ''Dzahn: tcpproxy: add firewall rule to allow gerrit ssh port [puppet] - ''https://gerrit.wikimedia.org/r/1201299 (https://phabricator.wikimedia.org/T408532)'
2025-11-03 22:48:57 <wikibugs> ('CR) ''Bking: [C:''+2] wdqs: don't sleep so long for restarts [cookbooks] - ''https://gerrit.wikimedia.org/r/1201300 (owner: ''Ryan Kemper)'
2025-11-03 22:49:06 <wikibugs> ('PS1) ''Btullis: Add the python3-pymysql package to the analytics::refinery profile [puppet] - ''https://gerrit.wikimedia.org/r/1201301 (https://phabricator.wikimedia.org/T402943)'
2025-11-03 22:50:35 <wikibugs> ('CR) ''Dzahn: [C:''+2] tcpproxy: add firewall rule to allow gerrit ssh port [puppet] - ''https://gerrit.wikimedia.org/r/1201299 (https://phabricator.wikimedia.org/T408532) (owner: ''Dzahn)'
2025-11-03 22:51:02 <jinxer-wm> FIRING: RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2008:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 22:51:25 <ryankemper> !log [WDQS] Restarting all codfw wdqs-main hosts; we're getting slammed by increased triple count (same issue we've been seeing intermittently for a week or two)
2025-11-03 22:51:26 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 22:52:28 <wikibugs> ('CR) ''Btullis: [V:''+1] "PCC SUCCESS (CORE_DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7530/co" [puppet] - ''https://gerrit.wikimedia.org/r/1201301 (https://phabricator.wikimedia.org/T402943) (owner: ''Btullis)'
2025-11-03 22:54:20 <logmsgbot> !log ryankemper@cumin2002 START - Cookbook sre.wdqs.restart
2025-11-03 22:54:25 <logmsgbot> !log ryankemper@cumin2002 END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)
2025-11-03 22:54:35 <logmsgbot> !log ryankemper@cumin2002 START - Cookbook sre.wdqs.restart
2025-11-03 22:55:12 <wikibugs> ('PS2) ''Btullis: Add the python3-pymysql package to the analytics::refinery profile [puppet] - ''https://gerrit.wikimedia.org/r/1201301 (https://phabricator.wikimedia.org/T402943)'
2025-11-03 22:55:21 <wikibugs> ('Merged) ''jenkins-bot: wdqs: don't sleep so long for restarts [cookbooks] - ''https://gerrit.wikimedia.org/r/1201300 (owner: ''Ryan Kemper)'
2025-11-03 22:56:41 <inflatador> !log bking@cumin2002 depool wdqs2008 and 2012 so they can catch up on lag
2025-11-03 22:56:42 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 22:59:01 <wikibugs> ('CR) ''Btullis: [V:''+1] "PCC SUCCESS (CORE_DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7531/co" [puppet] - ''https://gerrit.wikimedia.org/r/1201301 (https://phabricator.wikimedia.org/T402943) (owner: ''Btullis)'
2025-11-03 23:01:37 <inflatador> !log bking@cumin2002 repool wdqs2008 and 2012
2025-11-03 23:01:38 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2025-11-03 23:13:13 <jinxer-wm> RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
2025-11-03 23:13:23 <icinga-wm> PROBLEM - PyBal IPVS diff check on lvs2014 is CRITICAL: (CRITICAL: Mismatch between IPVS and PyBal https://wikitech.wikimedia.org/wiki/PyBal
2025-11-03 23:18:21 <icinga-wm> RECOVERY - PyBal IPVS diff check on lvs2014 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
2025-11-03 23:18:52 <jinxer-wm> FIRING: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
2025-11-03 23:22:43 <jinxer-wm> FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
2025-11-03 23:23:52 <jinxer-wm> RESOLVED: SLOMetricAbsent: wdqs-main-update-lag codfw - https://slo.wikimedia.org/?search=wdqs-main-update-lag - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
2025-11-03 23:25:02 <wikibugs> ('PS1) ''BCornwall: Add profile::ncmonitor::markmonitor_api_key [labs/private] - ''https://gerrit.wikimedia.org/r/1201304 (https://phabricator.wikimedia.org/T408857)'
2025-11-03 23:25:34 <wikibugs> ('CR) ''BCornwall: [V:''+2 C:''+2] Add profile::ncmonitor::markmonitor_api_key [labs/private] - ''https://gerrit.wikimedia.org/r/1201304 (https://phabricator.wikimedia.org/T408857) (owner: ''BCornwall)'
2025-11-03 23:32:43 <wikibugs> ('CR) ''Btullis: trafficserver: redirect growthbook-backend from public to private domains (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/1201074 (https://phabricator.wikimedia.org/T408903) (owner: ''Brouberol)'
2025-11-03 23:33:30 <wikibugs> ('CR) ''Btullis: [C:''+1] growthbook: define public configuration for s3 file uploads [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201086 (https://phabricator.wikimedia.org/T408415) (owner: ''Brouberol)'
2025-11-03 23:34:01 <jinxer-wm> FIRING: JobUnavailable: Reduced availability for job cloud_dev_pdns_rec in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
2025-11-03 23:34:25 <wikibugs> ('CR) ''Btullis: [C:''+1] growthbook: define public configuration for s3 file uploads (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201086 (https://phabricator.wikimedia.org/T408415) (owner: ''Brouberol)'
2025-11-03 23:35:35 <wikibugs> ('CR) ''Btullis: Define the growthbook-backend domain (''1 comment) [dns] - ''https://gerrit.wikimedia.org/r/1201075 (https://phabricator.wikimedia.org/T408903) (owner: ''Brouberol)'
2025-11-03 23:36:12 <wikibugs> ('CR) ''Btullis: Define the growthbook-backend domain (''1 comment) [dns] - ''https://gerrit.wikimedia.org/r/1201075 (https://phabricator.wikimedia.org/T408903) (owner: ''Brouberol)'
2025-11-03 23:36:43 <wikibugs> ('PS1) ''BCornwall: ncmonitor: Add MarkMonitor API key [puppet] - ''https://gerrit.wikimedia.org/r/1201308 (https://phabricator.wikimedia.org/T408857)'
2025-11-03 23:36:54 <wikibugs> ('CR) ''Btullis: [C:''+1] postgresql-growthbook: add additional PG parameters [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201082 (https://phabricator.wikimedia.org/T406578) (owner: ''Brouberol)'
2025-11-03 23:37:56 <wikibugs> ('CR) ''BCornwall: [V:''+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7532/co" [puppet] - ''https://gerrit.wikimedia.org/r/1201308 (https://phabricator.wikimedia.org/T408857) (owner: ''BCornwall)'
2025-11-03 23:38:29 <wikibugs> ('CR) ''Btullis: [C:''+1] "I'm in favour of getting email working, but I'm not yet convinced that we want user self-registration by email. We can discuss that bit an" [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201084 (https://phabricator.wikimedia.org/T408904) (owner: ''Brouberol)'
2025-11-03 23:39:02 <wikibugs> ('PS2) ''BCornwall: ncmonitor: Add MarkMonitor API key [puppet] - ''https://gerrit.wikimedia.org/r/1201308 (https://phabricator.wikimedia.org/T408857)'
2025-11-03 23:39:07 <wikibugs> ('CR) ''Btullis: "As discussed on other patches." [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201080 (https://phabricator.wikimedia.org/T408903) (owner: ''Brouberol)'
2025-11-03 23:39:52 <wikibugs> ('CR) ''BCornwall: [V:''+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/7533/co" [puppet] - ''https://gerrit.wikimedia.org/r/1201308 (https://phabricator.wikimedia.org/T408857) (owner: ''BCornwall)'
2025-11-03 23:40:02 <wikibugs> ('CR) ''Btullis: growthbook: set the APP_ORIGIN and API_HOST env vars to the public domains (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/1201081 (https://phabricator.wikimedia.org/T408903) (owner: ''Brouberol)'
2025-11-03 23:40:36 <wikibugs> ('CR) ''BCornwall: ncmonitor: Add MarkMonitor API key [puppet] - ''https://gerrit.wikimedia.org/r/1201308 (https://phabricator.wikimedia.org/T408857) (owner: ''BCornwall)'
2025-11-03 23:42:43 <jinxer-wm> RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
2025-11-03 23:51:43 <jinxer-wm> FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
2025-11-03 23:53:04 <wikibugs> 'SRE, ''LDAP-Access-Requests: Grant Access to Superset for vicaplet-wmde - https://phabricator.wikimedia.org/T408920#11338206 (''KFrancis) Hi all, confirming I have an NDA on file for @virginie.caplet. Thanks!'
2025-11-03 23:58:03 <jinxer-wm> FIRING: [10x] RdfStreamingUpdaterHighConsumerUpdateLag: wdqs2007:9101 has fallen behind applying updates from the RDF Streaming Updater - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterHighConsumerUpdateLag
2025-11-03 23:58:11 <wikibugs> 'SRE, ''collaboration-services, ''Znuny: VRTS outbound emails not working - https://phabricator.wikimedia.org/T408967#11338227 (''Xaosflux) @jhathaway - how is the diagnosis going, the symptoms still persist.'
2025-11-03 23:59:27 <jinxer-wm> RESOLVED: [16x] ProbeDown: Service wdqs2007:443 has failed probes (http_wdqs_main_external_search_sparql_endpoint_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
