[00:42:05] <icinga-wm>	 RECOVERY - Check systemd state on logstash2026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:43:31] <icinga-wm>	 RECOVERY - Check systemd state on logstash1026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:40:30] <jinxer-wm>	 (JobUnavailable) firing: (3) Reduced availability for job etherpad in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[02:20:19] <icinga-wm>	 PROBLEM - SSH on kubernetes1004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:04:00] <wikibugs>	 SRE, SRE-Access-Requests, Data-Engineering: Give bmansurov access necessary to support Research Airflow jobs - https://phabricator.wikimedia.org/T301215 (bmansurov) Thanks!
[04:01:35] <icinga-wm>	 RECOVERY - Check systemd state on build2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:10:39] <icinga-wm>	 PROBLEM - Check systemd state on build2001 is CRITICAL: CRITICAL - degraded: The following units failed: debian-weekly-rebuild.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:20:17] <icinga-wm>	 PROBLEM - SSH on wtp1027.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:22:45] <icinga-wm>	 RECOVERY - SSH on kubernetes1004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:21:32] <icinga-wm>	 RECOVERY - SSH on wtp1027.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:41:05] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job etherpad in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[06:44:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org
[06:46:29] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on alert1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[06:49:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org
[06:51:11] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:34:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 5 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater  - https://alerts.wikimedia.org
[07:44:49] <jinxer-wm>	 (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 5 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater  - https://alerts.wikimedia.org
[08:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220213T0800)
[09:18:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300775)', diff saved to https://phabricator.wikimedia.org/P20619 and previous config saved to /var/cache/conftool/dbconfig/20220213-091826-marostegui.json
[09:18:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:18:33] <stashbot>	 T300775: Add tl_target_id column to templatelinks - https://phabricator.wikimedia.org/T300775
[09:33:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P20620 and previous config saved to /var/cache/conftool/dbconfig/20220213-093331-marostegui.json
[09:33:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:05] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job etherpad in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[09:48:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P20621 and previous config saved to /var/cache/conftool/dbconfig/20220213-094836-marostegui.json
[09:48:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300775)', diff saved to https://phabricator.wikimedia.org/P20622 and previous config saved to /var/cache/conftool/dbconfig/20220213-100340-marostegui.json
[10:03:42] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[10:03:44] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[10:03:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:48] <stashbot>	 T300775: Add tl_target_id column to templatelinks - https://phabricator.wikimedia.org/T300775
[10:03:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20623 and previous config saved to /var/cache/conftool/dbconfig/20220213-100348-marostegui.json
[10:03:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:50] <wikibugs>	 (PS1) Majavah: hieradata: add new cloudinfra-db hosts [puppet] - https://gerrit.wikimedia.org/r/762104
[11:17:03] <wikibugs>	 (PS1) Majavah: P:mariadb::cloudinfra: use Cinder volumes for storage [puppet] - https://gerrit.wikimedia.org/r/762105
[11:30:24] <wikibugs>	 (PS1) Majavah: P:mariadb::grants::cloudinfra: read grant hosts from hiera [puppet] - https://gerrit.wikimedia.org/r/762106
[11:31:39] <wikibugs>	 (PS2) Majavah: P:mariadb::grants::cloudinfra: read grant hosts from hiera [puppet] - https://gerrit.wikimedia.org/r/762106
[11:40:23] <wikibugs>	 (PS3) Majavah: P:mariadb::grants::cloudinfra: read grant hosts from hiera [puppet] - https://gerrit.wikimedia.org/r/762106
[13:41:05] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job etherpad in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[14:33:23] <icinga-wm>	 PROBLEM - SSH on wtp1027.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:47:43] <wikibugs>	 (PS1) Stang: Fix missing icons for apiportalwiki and wikimaniawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/762111 (https://phabricator.wikimedia.org/T301636)
[15:09:09] <icinga-wm>	 PROBLEM - Disk space on thanos-be2001 is CRITICAL: DISK CRITICAL - free space: / 2030 MB (3% inode=98%): /tmp 2030 MB (3% inode=98%): /var/tmp 2030 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=thanos-be2001&var-datasource=codfw+prometheus/ops
[15:38:46] <godog>	 unexpected ^ I've bandaided it until tomorrow
[15:39:22] <godog>	 !log shorten /var/log/swift/server.log.1 on thanos-be2001 to recover some space
[15:39:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:23] <icinga-wm>	 RECOVERY - Disk space on thanos-be2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=thanos-be2001&var-datasource=codfw+prometheus/ops
[16:13:28] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/page/mobile-html-offline-resources/{title} (Get offline resource links to accompany page content HTML for test page) is CRITICAL: Test Get offline resource links to accompany page content HTML for test page returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[16:15:32] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[16:56:38] <wikibugs>	 (PS2) Stang: Fix missing icons for apiportalwiki and wikimaniawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/762111 (https://phabricator.wikimedia.org/T301636)
[17:02:11] <wikibugs>	 (PS1) Stang: Upload logo for apiportalwiki in wmgCentralAuthLoginIcon [mediawiki-config] - https://gerrit.wikimedia.org/r/762119 (https://phabricator.wikimedia.org/T301636)
[17:04:07] <wikibugs>	 (CR) Andrew Bogott: [C: +2] hieradata: add new cloudinfra-db hosts [puppet] - https://gerrit.wikimedia.org/r/762104 (owner: Majavah)
[17:05:14] <wikibugs>	 (CR) Andrew Bogott: P:mariadb::cloudinfra: use Cinder volumes for storage (1 comment) [puppet] - https://gerrit.wikimedia.org/r/762105 (owner: Majavah)
[17:05:44] <wikibugs>	 (CR) Majavah: P:mariadb::cloudinfra: use Cinder volumes for storage (1 comment) [puppet] - https://gerrit.wikimedia.org/r/762105 (owner: Majavah)
[17:06:29] <wikibugs>	 (CR) Andrew Bogott: [C: +1] "lmk when you're ready to merge" [puppet] - https://gerrit.wikimedia.org/r/762106 (owner: Majavah)
[17:07:28] <wikibugs>	 (CR) Andrew Bogott: [C: +2] P:mariadb::cloudinfra: use Cinder volumes for storage [puppet] - https://gerrit.wikimedia.org/r/762105 (owner: Majavah)
[17:07:31] <wikibugs>	 (CR) Majavah: P:mariadb::grants::cloudinfra: read grant hosts from hiera (1 comment) [puppet] - https://gerrit.wikimedia.org/r/762106 (owner: Majavah)
[17:07:46] <wikibugs>	 (CR) Andrew Bogott: [C: +2] P:mariadb::grants::cloudinfra: read grant hosts from hiera [puppet] - https://gerrit.wikimedia.org/r/762106 (owner: Majavah)
[17:10:37] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 73 probes of 660 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:11:11] <wikibugs>	 (PS1) Majavah: hieradata: cloudinfra: db switchover to db-03 [puppet] - https://gerrit.wikimedia.org/r/762120
[17:11:13] <wikibugs>	 (PS1) Majavah: hieradata: remove old cloudinfra-dbs [puppet] - https://gerrit.wikimedia.org/r/762121
[17:17:00] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 60 probes of 660 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:27:37] <wikibugs>	 (CR) Andrew Bogott: [C: +2] hieradata: cloudinfra: db switchover to db-03 [puppet] - https://gerrit.wikimedia.org/r/762120 (owner: Majavah)
[17:36:28] <icinga-wm>	 RECOVERY - SSH on wtp1027.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:41:05] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job etherpad in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[17:51:37] <wikibugs>	 (PS1) Andrew Bogott: nfs-mounts.yaml: move cvn to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762122
[17:53:36] <wikibugs>	 (CR) Andrew Bogott: [C: +2] nfs-mounts.yaml: move cvn to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762122 (owner: Andrew Bogott)
[17:58:44] <wikibugs>	 (PS1) Andrew Bogott: nfs-mounts.yaml: fix a copy/paste error for cvn project [puppet] - https://gerrit.wikimedia.org/r/762123 (https://phabricator.wikimedia.org/T301280)
[17:59:24] <wikibugs>	 (CR) Andrew Bogott: [C: +2] nfs-mounts.yaml: fix a copy/paste error for cvn project [puppet] - https://gerrit.wikimedia.org/r/762123 (https://phabricator.wikimedia.org/T301280) (owner: Andrew Bogott)
[18:10:59] <wikibugs>	 (PS1) Majavah: P:wmcs::nfs::standalone: add a motd warning [puppet] - https://gerrit.wikimedia.org/r/762124
[18:14:20] <wikibugs>	 (CR) Andrew Bogott: [C: +2] P:wmcs::nfs::standalone: add a motd warning [puppet] - https://gerrit.wikimedia.org/r/762124 (owner: Majavah)
[18:14:53] <wikibugs>	 (PS2) Andrew Bogott: hieradata: remove old cloudinfra-dbs [puppet] - https://gerrit.wikimedia.org/r/762121 (owner: Majavah)
[18:16:12] <wikibugs>	 (CR) Andrew Bogott: [C: +2] hieradata: remove old cloudinfra-dbs [puppet] - https://gerrit.wikimedia.org/r/762121 (owner: Majavah)
[18:23:44] <wikibugs>	 (PS1) Andrew Bogott: wmcs-cinder-backup-manager: add two more nfs volumes [puppet] - https://gerrit.wikimedia.org/r/762125 (https://phabricator.wikimedia.org/T301280)
[18:23:46] <wikibugs>	 (PS1) Andrew Bogott: nfs-mounts.yaml: move twl to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762126 (https://phabricator.wikimedia.org/T301280)
[18:24:47] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmcs-cinder-backup-manager: add two more nfs volumes [puppet] - https://gerrit.wikimedia.org/r/762125 (https://phabricator.wikimedia.org/T301280) (owner: Andrew Bogott)
[18:44:18] <wikibugs>	 (PS1) ArielGlenn: do flow dumps in multiple pieces and concat them together [dumps] - https://gerrit.wikimedia.org/r/762127 (https://phabricator.wikimedia.org/T300760)
[18:46:18] <wikibugs>	 (CR) Andrew Bogott: [C: +2] nfs-mounts.yaml: move twl to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762126 (https://phabricator.wikimedia.org/T301280) (owner: Andrew Bogott)
[19:00:17] <wikibugs>	 (PS1) Ladsgroup: WikiPage: Cast the category values to string in updateCategoryCounts [core] (wmf/1.38.0-wmf.21) - https://gerrit.wikimedia.org/r/761755 (https://phabricator.wikimedia.org/T301433)
[19:02:38] <wikibugs>	 (CR) Ladsgroup: [C: +2] WikiPage: Cast the category values to string in updateCategoryCounts [core] (wmf/1.38.0-wmf.21) - https://gerrit.wikimedia.org/r/761755 (https://phabricator.wikimedia.org/T301433) (owner: Ladsgroup)
[19:16:58] <wikibugs>	 (Merged) jenkins-bot: WikiPage: Cast the category values to string in updateCategoryCounts [core] (wmf/1.38.0-wmf.21) - https://gerrit.wikimedia.org/r/761755 (https://phabricator.wikimedia.org/T301433) (owner: Ladsgroup)
[19:19:34] <wikibugs>	 (PS1) Andrew Bogott: nfs-mounts.yaml: move fastcci to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762130 (https://phabricator.wikimedia.org/T301280)
[19:20:30] <wikibugs>	 (CR) Andrew Bogott: [C: +2] nfs-mounts.yaml: move fastcci to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762130 (https://phabricator.wikimedia.org/T301280) (owner: Andrew Bogott)
[19:26:37] <logmsgbot>	 !log ladsgroup@deploy1002 Synchronized php-1.38.0-wmf.21/includes/page/WikiPage.php: Backport: [[gerrit:761755|WikiPage: Cast the category values to string in updateCategoryCounts (T301433)]] (duration: 00m 49s)
[19:26:39] <wikibugs>	 (PS1) Andrew Bogott: nfs-mounts.yaml: remove an unwanted . in the fastcci mount definition [puppet] - https://gerrit.wikimedia.org/r/762131 (https://phabricator.wikimedia.org/T301280)
[19:26:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:26:43] <stashbot>	 T301433: Wikimedia\Rdbms\DBReadOnlyError: Database is read-only: The database is read-only until replication lag decreases. - https://phabricator.wikimedia.org/T301433
[19:27:36] <wikibugs>	 (CR) Andrew Bogott: [C: +2] nfs-mounts.yaml: remove an unwanted . in the fastcci mount definition [puppet] - https://gerrit.wikimedia.org/r/762131 (https://phabricator.wikimedia.org/T301280) (owner: Andrew Bogott)
[19:31:49] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[19:31:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:53] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[19:35:54] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[19:35:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:39:56] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[19:39:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:34] <wikibugs>	 (PS1) Andrew Bogott: nfs-mounts.yaml: move wikidumpparse to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762133 (https://phabricator.wikimedia.org/T301280)
[20:53:55] <wikibugs>	 SRE, Phabricator, vm-requests: VM Request template (form 90) title doesn't make sense - https://phabricator.wikimedia.org/T301387 (Aklapper) Open→Resolved a:Aklapper Thanks! Fixed in https://phabricator.wikimedia.org/transactions/detail/PHID-XACT-FORM-uqca36wwsldwt5w/
[21:18:59] <wikibugs>	 SRE, Phabricator, vm-requests: VM Request template (form 90) title doesn't make sense - https://phabricator.wikimedia.org/T301387 (RhinosF1) No problem, thanks for the quick work.
[21:29:54] <icinga-wm>	 PROBLEM - SSH on kubernetes1004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:41:05] <jinxer-wm>	 (JobUnavailable) firing: (2) Reduced availability for job etherpad in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:31:18] <icinga-wm>	 RECOVERY - SSH on kubernetes1004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:32:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20624 and previous config saved to /var/cache/conftool/dbconfig/20220213-223228-marostegui.json
[22:32:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:32:36] <stashbot>	 T300775: Add tl_target_id column to templatelinks - https://phabricator.wikimedia.org/T300775
[22:43:30] <icinga-wm>	 PROBLEM - SSH on wtp1027.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:47:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P20625 and previous config saved to /var/cache/conftool/dbconfig/20220213-224733-marostegui.json
[22:47:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:02:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P20626 and previous config saved to /var/cache/conftool/dbconfig/20220213-230237-marostegui.json
[23:02:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:14:39] <wikibugs>	 SRE, MediaWiki-extensions-PropertySuggester, Service-deployment-requests: New Service Request SchemaTree - https://phabricator.wikimedia.org/T301471 (Aklapper)
[23:17:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20627 and previous config saved to /var/cache/conftool/dbconfig/20220213-231742-marostegui.json
[23:17:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:17:49] <stashbot>	 T300775: Add tl_target_id column to templatelinks - https://phabricator.wikimedia.org/T300775
[23:44:10] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 68 probes of 652 (alerts on 65) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:46:37] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 69 probes of 659 (alerts on 65) - https://atlas.ripe.net/measurements/32390541/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:47:44] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 72 probes of 660 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:49:06] <icinga-wm>	 PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 69 probes of 661 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:50:27] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 65 probes of 652 (alerts on 65) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:52:57] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 62 probes of 659 (alerts on 65) - https://atlas.ripe.net/measurements/32390541/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:55:24] <icinga-wm>	 RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 61 probes of 661 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas