[00:01:36] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance [00:01:39] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1101.eqiad.wmnet with reason: Maintenance [00:01:44] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31222 and previous config saved to /var/cache/conftool/dbconfig/20220717-000143-ladsgroup.json [00:01:49] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [00:17:36] * zabe is curious if SecurePoll works without problems for once this year [00:17:54] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31223 and previous config saved to /var/cache/conftool/dbconfig/20220717-001754-ladsgroup.json [00:17:58] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [00:32:59] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31224 and previous config saved to /var/cache/conftool/dbconfig/20220717-003259-ladsgroup.json [00:42:09] RECOVERY - Check systemd state on logstash1026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:44:15] RECOVERY - Check systemd state on logstash2026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:48:04] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P31225 and previous config saved to /var/cache/conftool/dbconfig/20220717-004804-ladsgroup.json [01:03:10] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312984)', diff saved to https://phabricator.wikimedia.org/P31226 and previous config saved to /var/cache/conftool/dbconfig/20220717-010309-ladsgroup.json [01:03:14] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [01:05:47] PROBLEM - SSH on wtp1038.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [01:09:36] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance [01:09:38] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance [01:37:45] (JobUnavailable) firing: Reduced availability for job workhorse in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [01:42:45] (JobUnavailable) firing: (4) Reduced availability for job nginx in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [01:47:45] (JobUnavailable) firing: (4) Reduced availability for job nginx in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [01:52:45] (JobUnavailable) resolved: (4) Reduced availability for job nginx in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:07:09] RECOVERY - SSH on wtp1038.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [02:38:21] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance [02:38:34] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1150.eqiad.wmnet with reason: Maintenance [04:07:58] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance [04:08:23] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2123.codfw.wmnet with reason: Maintenance [04:08:24] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on 8 hosts with reason: Maintenance [04:08:42] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 8 hosts with reason: Maintenance [04:47:43] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance [04:47:57] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1096.eqiad.wmnet with reason: Maintenance [04:48:02] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1096:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31227 and previous config saved to /var/cache/conftool/dbconfig/20220717-044802-ladsgroup.json [04:48:07] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [05:26:21] PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [06:26:14] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31228 and previous config saved to /var/cache/conftool/dbconfig/20220717-062614-ladsgroup.json [06:26:26] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [06:27:45] RECOVERY - SSH on wtp1044.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [06:41:19] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31229 and previous config saved to /var/cache/conftool/dbconfig/20220717-064119-ladsgroup.json [06:50:55] RECOVERY - HTTPS-wmfusercontent on phab.wmfusercontent.org is OK: SSL OK - Certificate *.wikipedia.org valid until 2022-10-08 06:21:52 +0000 (expires in 82 days) https://phabricator.wikimedia.org/tag/phabricator/ [06:56:25] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P31230 and previous config saved to /var/cache/conftool/dbconfig/20220717-065624-ladsgroup.json [07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220717T0700) [07:11:30] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31231 and previous config saved to /var/cache/conftool/dbconfig/20220717-071129-ladsgroup.json [07:11:31] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance [07:11:35] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [07:11:44] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1100.eqiad.wmnet with reason: Maintenance [07:11:50] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1100 (T312984)', diff saved to https://phabricator.wikimedia.org/P31232 and previous config saved to /var/cache/conftool/dbconfig/20220717-071149-ladsgroup.json [07:58:56] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312984)', diff saved to https://phabricator.wikimedia.org/P31233 and previous config saved to /var/cache/conftool/dbconfig/20220717-075856-ladsgroup.json [07:59:01] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [08:07:59] PROBLEM - Disk space on archiva1002 is CRITICAL: DISK CRITICAL - free space: / 1967 MB (2% inode=84%): /tmp 1967 MB (2% inode=84%): /var/tmp 1967 MB (2% inode=84%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=archiva1002&var-datasource=eqiad+prometheus/ops [08:14:01] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31234 and previous config saved to /var/cache/conftool/dbconfig/20220717-081401-ladsgroup.json [08:29:07] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P31235 and previous config saved to /var/cache/conftool/dbconfig/20220717-082906-ladsgroup.json [08:44:12] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1100 (T312984)', diff saved to https://phabricator.wikimedia.org/P31236 and previous config saved to /var/cache/conftool/dbconfig/20220717-084411-ladsgroup.json [08:44:13] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance [08:44:17] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [08:44:27] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1113.eqiad.wmnet with reason: Maintenance [08:44:32] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1113:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31237 and previous config saved to /var/cache/conftool/dbconfig/20220717-084432-ladsgroup.json [08:57:27] PROBLEM - Check systemd state on logstash1026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:19:48] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31238 and previous config saved to /var/cache/conftool/dbconfig/20220717-101948-ladsgroup.json [10:19:53] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [10:34:54] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31239 and previous config saved to /var/cache/conftool/dbconfig/20220717-103453-ladsgroup.json [10:44:53] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator [10:47:23] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 3 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator [10:49:59] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P31240 and previous config saved to /var/cache/conftool/dbconfig/20220717-104958-ladsgroup.json [11:05:04] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31241 and previous config saved to /var/cache/conftool/dbconfig/20220717-110503-ladsgroup.json [11:05:05] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance [11:05:08] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [11:05:19] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1110.eqiad.wmnet with reason: Maintenance [11:05:24] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1110 (T312984)', diff saved to https://phabricator.wikimedia.org/P31242 and previous config saved to /var/cache/conftool/dbconfig/20220717-110523-ladsgroup.json [11:42:01] PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [12:42:16] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312984)', diff saved to https://phabricator.wikimedia.org/P31243 and previous config saved to /var/cache/conftool/dbconfig/20220717-124215-ladsgroup.json [12:42:22] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [12:43:23] RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [12:57:21] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31244 and previous config saved to /var/cache/conftool/dbconfig/20220717-125720-ladsgroup.json [13:08:00] 10SRE, 10ConfirmEdit (CAPTCHA extension), 10MediaWiki-extensions-CentralAuth, 10Platform Engineering, and 7 others: Allow Stewards to enable 'emergency CAPTCHAs' for anonymous IP edits - https://phabricator.wikimedia.org/T303433 (10Zabe) >>! In T303433#7765136, @taavi wrote: > I'm not sure if I like a patt... [13:12:26] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P31245 and previous config saved to /var/cache/conftool/dbconfig/20220717-131226-ladsgroup.json [13:13:40] (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/814358 [13:14:29] PROBLEM - Check systemd state on logstash2026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:27:31] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312984)', diff saved to https://phabricator.wikimedia.org/P31246 and previous config saved to /var/cache/conftool/dbconfig/20220717-132731-ladsgroup.json [13:27:33] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance [13:27:35] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [13:27:46] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1144.eqiad.wmnet with reason: Maintenance [13:27:51] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1144:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31247 and previous config saved to /var/cache/conftool/dbconfig/20220717-132751-ladsgroup.json [14:28:37] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [14:28:59] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [14:34:16] ^^ lists.wikimedia.org is indeed inaccessible [14:50:57] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 48391 bytes in 0.507 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [14:51:19] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8572 bytes in 1.788 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [15:05:10] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31248 and previous config saved to /var/cache/conftool/dbconfig/20220717-150510-ladsgroup.json [15:05:16] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [15:15:36] (03CR) 10Ori: [C: 03+1] Actually run tests on type: php scaffold [deployment-charts] - 10https://gerrit.wikimedia.org/r/813843 (owner: 10JMeybohm) [15:20:15] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31249 and previous config saved to /var/cache/conftool/dbconfig/20220717-152015-ladsgroup.json [15:35:20] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P31250 and previous config saved to /var/cache/conftool/dbconfig/20220717-153520-ladsgroup.json [15:50:25] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312984)', diff saved to https://phabricator.wikimedia.org/P31251 and previous config saved to /var/cache/conftool/dbconfig/20220717-155025-ladsgroup.json [15:50:27] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance [15:50:31] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [15:50:41] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1161.eqiad.wmnet with reason: Maintenance [15:50:42] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance [15:50:58] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance [15:51:03] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1161 (T312984)', diff saved to https://phabricator.wikimedia.org/P31252 and previous config saved to /var/cache/conftool/dbconfig/20220717-155102-ladsgroup.json [16:03:48] (03PS4) 10Majavah: wmcs: vps: remove_instance: add support for puppet deactivation [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801784 [16:03:50] (03PS4) 10Majavah: wmcs: toolforge: add a cookbook to remove a grid node [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801785 (https://phabricator.wikimedia.org/T309525) [16:04:35] (03PS5) 10Majavah: wmcs: vps: remove_instance: add support for puppet deactivation [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801784 [16:04:37] (03PS5) 10Majavah: wmcs: toolforge: add a cookbook to remove a grid node [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801785 (https://phabricator.wikimedia.org/T309525) [16:04:56] (03CR) 10Majavah: wmcs: vps: remove_instance: add support for puppet deactivation (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801784 (owner: 10Majavah) [16:10:45] (03CR) 10CI reject: [V: 04-1] wmcs: toolforge: add a cookbook to remove a grid node [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801785 (https://phabricator.wikimedia.org/T309525) (owner: 10Majavah) [16:19:14] (03PS6) 10Majavah: wmcs: toolforge: add a cookbook to remove a grid node [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/801785 (https://phabricator.wikimedia.org/T309525) [16:49:19] PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [17:20:24] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312984)', diff saved to https://phabricator.wikimedia.org/P31253 and previous config saved to /var/cache/conftool/dbconfig/20220717-172023-ladsgroup.json [17:20:28] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [17:35:29] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31254 and previous config saved to /var/cache/conftool/dbconfig/20220717-173528-ladsgroup.json [17:50:34] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P31255 and previous config saved to /var/cache/conftool/dbconfig/20220717-175034-ladsgroup.json [18:05:40] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312984)', diff saved to https://phabricator.wikimedia.org/P31256 and previous config saved to /var/cache/conftool/dbconfig/20220717-180539-ladsgroup.json [18:05:44] T312984: Adjust the field type of flaggedpages.fp_pending_since to fixed binary on wmf wikis - https://phabricator.wikimedia.org/T312984 [19:24:35] PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [20:41:42] (03CR) 10Stang: "Hi Krinkle, I made some amendments on my pervious patch, would you like to have it (and also I228490c25430035e8a0934c1587445e2aae41366) re" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/799415 (https://phabricator.wikimedia.org/T305692) (owner: 10Stang) [21:27:27] RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [21:29:09] PROBLEM - Check systemd state on an-worker1127 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:36:37] RECOVERY - Check systemd state on an-worker1127 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:44:09] PROBLEM - Check systemd state on an-worker1127 is CRITICAL: CRITICAL - degraded: The following units failed: systemd-timedated.service,user-runtime-dir@116.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:56:47] RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [21:58:05] PROBLEM - SSH on mw1321.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [22:20:35] (03PS1) 10Tks4Fish: brwikimedia: Add logo and wordmark for vector-2022 and minerva [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814372 (https://phabricator.wikimedia.org/T313194) [22:33:08] (03PS1) 10Tks4Fish: brwikimedia: Use logo and wordmark in vector-2022 and minerva [mediawiki-config] - 10https://gerrit.wikimedia.org/r/814373 (https://phabricator.wikimedia.org/T313194)