[00:42:05] RECOVERY - Check systemd state on logstash2026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:43:31] RECOVERY - Check systemd state on logstash1026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:40:30] (JobUnavailable) firing: (3) Reduced availability for job etherpad in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[02:20:19] PROBLEM - SSH on kubernetes1004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:04:00] SRE, SRE-Access-Requests, Data-Engineering: Give bmansurov access necessary to support Research Airflow jobs - https://phabricator.wikimedia.org/T301215 (bmansurov) Thanks!
[04:01:35] RECOVERY - Check systemd state on build2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:10:39] PROBLEM - Check systemd state on build2001 is CRITICAL: CRITICAL - degraded: The following units failed: debian-weekly-rebuild.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:20:17] PROBLEM - SSH on wtp1027.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:22:45] RECOVERY - SSH on kubernetes1004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:21:32] RECOVERY - SSH on wtp1027.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:41:05] (JobUnavailable) firing: (2) Reduced availability for job etherpad in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[06:44:55] (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org
[06:46:29] PROBLEM - High average GET latency for mw requests on appserver in eqiad on alert1001 is CRITICAL: cluster=appserver code={200,204} handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[06:49:55] (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org
[06:51:11] RECOVERY - High average GET latency for mw requests on appserver in eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:34:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) firing: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 5 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://alerts.wikimedia.org
[07:44:49] (RdfStreamingUpdaterFlinkProcessingLatencyIsHigh) resolved: Processing latency of WDQS_Streaming_Updater in codfw (k8s) is above 5 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater - https://alerts.wikimedia.org
[08:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220213T0800)
[09:18:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300775)', diff saved to https://phabricator.wikimedia.org/P20619 and previous config saved to /var/cache/conftool/dbconfig/20220213-091826-marostegui.json
[09:18:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:18:33] T300775: Add tl_target_id column to templatelinks - https://phabricator.wikimedia.org/T300775
[09:33:32] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P20620 and previous config saved to /var/cache/conftool/dbconfig/20220213-093331-marostegui.json
[09:33:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:05] (JobUnavailable) firing: (2) Reduced availability for job etherpad in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[09:48:36] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P20621 and previous config saved to /var/cache/conftool/dbconfig/20220213-094836-marostegui.json
[09:48:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:41] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300775)', diff saved to https://phabricator.wikimedia.org/P20622 and previous config saved to /var/cache/conftool/dbconfig/20220213-100340-marostegui.json
[10:03:42] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[10:03:44] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
[10:03:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:48] T300775: Add tl_target_id column to templatelinks - https://phabricator.wikimedia.org/T300775
[10:03:48] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20623 and previous config saved to /var/cache/conftool/dbconfig/20220213-100348-marostegui.json
[10:03:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:50] (PS1) Majavah: hieradata: add new cloudinfra-db hosts [puppet] - https://gerrit.wikimedia.org/r/762104
[11:17:03] (PS1) Majavah: P:mariadb::cloudinfra: use Cinder volumes for storage [puppet] - https://gerrit.wikimedia.org/r/762105
[11:30:24] (PS1) Majavah: P:mariadb::grants::cloudinfra: read grant hosts from hiera [puppet] - https://gerrit.wikimedia.org/r/762106
[11:31:39] (PS2) Majavah: P:mariadb::grants::cloudinfra: read grant hosts from hiera [puppet] - https://gerrit.wikimedia.org/r/762106
[11:40:23] (PS3) Majavah: P:mariadb::grants::cloudinfra: read grant hosts from hiera [puppet] - https://gerrit.wikimedia.org/r/762106
[13:41:05] (JobUnavailable) firing: (2) Reduced availability for job etherpad in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[14:33:23] PROBLEM - SSH on wtp1027.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:47:43] (PS1) Stang: Fix missing icons for apiportalwiki and wikimaniawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/762111 (https://phabricator.wikimedia.org/T301636)
[15:09:09] PROBLEM - Disk space on thanos-be2001 is CRITICAL: DISK CRITICAL - free space: / 2030 MB (3% inode=98%): /tmp 2030 MB (3% inode=98%): /var/tmp 2030 MB (3% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=thanos-be2001&var-datasource=codfw+prometheus/ops
[15:38:46] unexpected ^ I've bandaided it until tomorrow
[15:39:22] !log shorten /var/log/swift/server.log.1 on thanos-be2001 to recover some space
[15:39:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:23] RECOVERY - Disk space on thanos-be2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=thanos-be2001&var-datasource=codfw+prometheus/ops
[16:13:28] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/page/mobile-html-offline-resources/{title} (Get offline resource links to accompany page content HTML for test page) is CRITICAL: Test Get offline resource links to accompany page content HTML for test page returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[16:15:32] RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[16:56:38] (PS2) Stang: Fix missing icons for apiportalwiki and wikimaniawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/762111 (https://phabricator.wikimedia.org/T301636)
[17:02:11] (PS1) Stang: Upload logo for apiportalwiki in wmgCentralAuthLoginIcon [mediawiki-config] - https://gerrit.wikimedia.org/r/762119 (https://phabricator.wikimedia.org/T301636)
[17:04:07] (CR) Andrew Bogott: [C: +2] hieradata: add new cloudinfra-db hosts [puppet] - https://gerrit.wikimedia.org/r/762104 (owner: Majavah)
[17:05:14] (CR) Andrew Bogott: P:mariadb::cloudinfra: use Cinder volumes for storage (1 comment) [puppet] - https://gerrit.wikimedia.org/r/762105 (owner: Majavah)
[17:05:44] (CR) Majavah: P:mariadb::cloudinfra: use Cinder volumes for storage (1 comment) [puppet] - https://gerrit.wikimedia.org/r/762105 (owner: Majavah)
[17:06:29] (CR) Andrew Bogott: [C: +1] "lmk when you're ready to merge" [puppet] - https://gerrit.wikimedia.org/r/762106 (owner: Majavah)
[17:07:28] (CR) Andrew Bogott: [C: +2] P:mariadb::cloudinfra: use Cinder volumes for storage [puppet] - https://gerrit.wikimedia.org/r/762105 (owner: Majavah)
[17:07:31] (CR) Majavah: P:mariadb::grants::cloudinfra: read grant hosts from hiera (1 comment) [puppet] - https://gerrit.wikimedia.org/r/762106 (owner: Majavah)
[17:07:46] (CR) Andrew Bogott: [C: +2] P:mariadb::grants::cloudinfra: read grant hosts from hiera [puppet] - https://gerrit.wikimedia.org/r/762106 (owner: Majavah)
[17:10:37] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 73 probes of 660 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:11:11] (PS1) Majavah: hieradata: cloudinfra: db switchover to db-03 [puppet] - https://gerrit.wikimedia.org/r/762120
[17:11:13] (PS1) Majavah: hieradata: remove old cloudinfra-dbs [puppet] - https://gerrit.wikimedia.org/r/762121
[17:17:00] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 60 probes of 660 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:27:37] (CR) Andrew Bogott: [C: +2] hieradata: cloudinfra: db switchover to db-03 [puppet] - https://gerrit.wikimedia.org/r/762120 (owner: Majavah)
[17:36:28] RECOVERY - SSH on wtp1027.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:41:05] (JobUnavailable) firing: (2) Reduced availability for job etherpad in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[17:51:37] (PS1) Andrew Bogott: nfs-mounts.yaml: move cvn to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762122
[17:53:36] (CR) Andrew Bogott: [C: +2] nfs-mounts.yaml: move cvn to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762122 (owner: Andrew Bogott)
[17:58:44] (PS1) Andrew Bogott: nfs-mounts.yaml: fix a copy/paste error for cvn project [puppet] - https://gerrit.wikimedia.org/r/762123 (https://phabricator.wikimedia.org/T301280)
[17:59:24] (CR) Andrew Bogott: [C: +2] nfs-mounts.yaml: fix a copy/paste error for cvn project [puppet] - https://gerrit.wikimedia.org/r/762123 (https://phabricator.wikimedia.org/T301280) (owner: Andrew Bogott)
[18:10:59] (PS1) Majavah: P:wmcs::nfs::standalone: add a motd warning [puppet] - https://gerrit.wikimedia.org/r/762124
[18:14:20] (CR) Andrew Bogott: [C: +2] P:wmcs::nfs::standalone: add a motd warning [puppet] - https://gerrit.wikimedia.org/r/762124 (owner: Majavah)
[18:14:53] (PS2) Andrew Bogott: hieradata: remove old cloudinfra-dbs [puppet] - https://gerrit.wikimedia.org/r/762121 (owner: Majavah)
[18:16:12] (CR) Andrew Bogott: [C: +2] hieradata: remove old cloudinfra-dbs [puppet] - https://gerrit.wikimedia.org/r/762121 (owner: Majavah)
[18:23:44] (PS1) Andrew Bogott: wmcs-cinder-backup-manager: add two more nfs volumes [puppet] - https://gerrit.wikimedia.org/r/762125 (https://phabricator.wikimedia.org/T301280)
[18:23:46] (PS1) Andrew Bogott: nfs-mounts.yaml: move twl to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762126 (https://phabricator.wikimedia.org/T301280)
[18:24:47] (CR) Andrew Bogott: [C: +2] wmcs-cinder-backup-manager: add two more nfs volumes [puppet] - https://gerrit.wikimedia.org/r/762125 (https://phabricator.wikimedia.org/T301280) (owner: Andrew Bogott)
[18:44:18] (PS1) ArielGlenn: do flow dumps in multiple pieces and concat them together [dumps] - https://gerrit.wikimedia.org/r/762127 (https://phabricator.wikimedia.org/T300760)
[18:46:18] (CR) Andrew Bogott: [C: +2] nfs-mounts.yaml: move twl to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762126 (https://phabricator.wikimedia.org/T301280) (owner: Andrew Bogott)
[19:00:17] (PS1) Ladsgroup: WikiPage: Cast the category values to string in updateCategoryCounts [core] (wmf/1.38.0-wmf.21) - https://gerrit.wikimedia.org/r/761755 (https://phabricator.wikimedia.org/T301433)
[19:02:38] (CR) Ladsgroup: [C: +2] WikiPage: Cast the category values to string in updateCategoryCounts [core] (wmf/1.38.0-wmf.21) - https://gerrit.wikimedia.org/r/761755 (https://phabricator.wikimedia.org/T301433) (owner: Ladsgroup)
[19:16:58] (Merged) jenkins-bot: WikiPage: Cast the category values to string in updateCategoryCounts [core] (wmf/1.38.0-wmf.21) - https://gerrit.wikimedia.org/r/761755 (https://phabricator.wikimedia.org/T301433) (owner: Ladsgroup)
[19:19:34] (PS1) Andrew Bogott: nfs-mounts.yaml: move fastcci to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762130 (https://phabricator.wikimedia.org/T301280)
[19:20:30] (CR) Andrew Bogott: [C: +2] nfs-mounts.yaml: move fastcci to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762130 (https://phabricator.wikimedia.org/T301280) (owner: Andrew Bogott)
[19:26:37] !log ladsgroup@deploy1002 Synchronized php-1.38.0-wmf.21/includes/page/WikiPage.php: Backport: [[gerrit:761755|WikiPage: Cast the category values to string in updateCategoryCounts (T301433)]] (duration: 00m 49s)
[19:26:39] (PS1) Andrew Bogott: nfs-mounts.yaml: remove an unwanted . in the fastcci mount definition [puppet] - https://gerrit.wikimedia.org/r/762131 (https://phabricator.wikimedia.org/T301280)
[19:26:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:26:43] T301433: Wikimedia\Rdbms\DBReadOnlyError: Database is read-only: The database is read-only until replication lag decreases. - https://phabricator.wikimedia.org/T301433
[19:27:36] (CR) Andrew Bogott: [C: +2] nfs-mounts.yaml: remove an unwanted . in the fastcci mount definition [puppet] - https://gerrit.wikimedia.org/r/762131 (https://phabricator.wikimedia.org/T301280) (owner: Andrew Bogott)
[19:31:49] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
[19:31:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:53] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[19:35:54] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
[19:35:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:39:56] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
[19:39:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:34] (PS1) Andrew Bogott: nfs-mounts.yaml: move wikidumpparse to a project-local nfs server [puppet] - https://gerrit.wikimedia.org/r/762133 (https://phabricator.wikimedia.org/T301280)
[20:53:55] SRE, Phabricator, vm-requests: VM Request template (form 90) title doesn't make sense - https://phabricator.wikimedia.org/T301387 (Aklapper) Open→Resolved a:Aklapper Thanks! Fixed in https://phabricator.wikimedia.org/transactions/detail/PHID-XACT-FORM-uqca36wwsldwt5w/
[21:18:59] SRE, Phabricator, vm-requests: VM Request template (form 90) title doesn't make sense - https://phabricator.wikimedia.org/T301387 (RhinosF1) No problem, thanks for the quick work.
[21:29:54] PROBLEM - SSH on kubernetes1004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:41:05] (JobUnavailable) firing: (2) Reduced availability for job etherpad in eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org
[22:31:18] RECOVERY - SSH on kubernetes1004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:32:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20624 and previous config saved to /var/cache/conftool/dbconfig/20220213-223228-marostegui.json
[22:32:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:32:36] T300775: Add tl_target_id column to templatelinks - https://phabricator.wikimedia.org/T300775
[22:43:30] PROBLEM - SSH on wtp1027.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:47:33] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P20625 and previous config saved to /var/cache/conftool/dbconfig/20220213-224733-marostegui.json
[22:47:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:02:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P20626 and previous config saved to /var/cache/conftool/dbconfig/20220213-230237-marostegui.json
[23:02:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:14:39] SRE, MediaWiki-extensions-PropertySuggester, Service-deployment-requests: New Service Request SchemaTree - https://phabricator.wikimedia.org/T301471 (Aklapper)
[23:17:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20627 and previous config saved to /var/cache/conftool/dbconfig/20220213-231742-marostegui.json
[23:17:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:17:49] T300775: Add tl_target_id column to templatelinks - https://phabricator.wikimedia.org/T300775
[23:44:10] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 68 probes of 652 (alerts on 65) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:46:37] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 69 probes of 659 (alerts on 65) - https://atlas.ripe.net/measurements/32390541/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:47:44] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 72 probes of 660 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:49:06] PROBLEM - IPv6 ping to esams on ripe-atlas-esams IPv6 is CRITICAL: CRITICAL - failed 69 probes of 661 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:50:27] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 65 probes of 652 (alerts on 65) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:52:57] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 62 probes of 659 (alerts on 65) - https://atlas.ripe.net/measurements/32390541/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:55:24] RECOVERY - IPv6 ping to esams on ripe-atlas-esams IPv6 is OK: OK - failed 61 probes of 661 (alerts on 65) - https://atlas.ripe.net/measurements/23449938/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas