[04:05:28] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T401693)
[04:05:36] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693
[04:06:33] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)
[04:10:06] PROBLEM - Host cloudcephosd1047 is DOWN: PING CRITICAL - Packet loss = 100%
[04:12:36] RECOVERY - Host cloudcephosd1047 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms
[04:13:10] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[04:23:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[04:29:46] FIRING: Primary cloud switch port utilisation over 80%: Alert for device cloudsw1-c8-eqiad.mgmt.eqiad.wmnet - Primary cloud switch port utilisation over 80% - https://alerts.wikimedia.org/?q=alertname%3DPrimary+cloud+switch+port+utilisation+over+80%25
[04:29:46] FIRING: Primary cloud switch inbound port utilisation over 80%: Alert for device cloudsw1-f4-eqiad.mgmt.eqiad.wmnet - Primary cloud switch inbound port utilisation over 80% - https://alerts.wikimedia.org/?q=alertname%3DPrimary+cloud+switch+inbound+port+utilisation+over+80%25
[04:29:51] cloud-services-team: Primary cloud switch port utilisation over 80% Alert for device cloudsw1-c8-eqiad.mgmt.eqiad.wmnet - Primary cloud switch port utilisation over 80% - https://phabricator.wikimedia.org/T402657#11113660 (phaultfinder)
[04:29:54] cloud-services-team: Primary cloud switch inbound port utilisation over 80% Alert for device cloudsw1-f4-eqiad.mgmt.eqiad.wmnet - Primary cloud switch inbound port utilisation over 80% - https://phabricator.wikimedia.org/T402758 (phaultfinder) NEW
[04:34:46] RESOLVED: Primary cloud switch port utilisation over 80%: Device cloudsw1-c8-eqiad.mgmt.eqiad.wmnet recovered from Primary cloud switch port utilisation over 80% - https://alerts.wikimedia.org/?q=alertname%3DPrimary+cloud+switch+port+utilisation+over+80%25
[04:34:46] RESOLVED: Primary cloud switch inbound port utilisation over 80%: Device cloudsw1-f4-eqiad.mgmt.eqiad.wmnet recovered from Primary cloud switch inbound port utilisation over 80% - https://alerts.wikimedia.org/?q=alertname%3DPrimary+cloud+switch+inbound+port+utilisation+over+80%25
[07:59:06] (PS1) Muehlenhoff: Add dummy keytabs for new install servers T396487 [labs/private] - https://gerrit.wikimedia.org/r/1181638
[08:09:02] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11113851 (dcaro) > Components support source_repo / source_path (maybe source_branch) in addition to source_url, which explicitly resolves the l...
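For the CephClusterInWarning window above (new cloudcephosd OSDs being bootstrapped while the cluster rebalances), the first look is usually the stock Ceph CLI on a mon host; a minimal sketch, assuming shell access to a cloudcephmon node (host choice and sudo policy are assumptions, the commands themselves are standard Ceph):

    # Cluster-wide status: health flag, recovery/backfill progress, degraded PG counts
    sudo ceph -s
    # Expand the WARN into its specific causes
    sudo ceph health detail
    # Confirm freshly added OSDs (e.g. on cloudcephosd1047) are up and in the CRUSH tree
    sudo ceph osd tree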
[08:23:38] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api] allow specifying `source_repo`+`ref` for the config - https://phabricator.wikimedia.org/T402764 (dcaro) NEW
[08:32:50] cloud-services-team, Cloud-VPS: Monitoring/metrics for trove instances - https://phabricator.wikimedia.org/T402738#11113905 (dcaro) This is related to {T354728}, and potentially we can add the trove metrics to the common CloudVPS project graphs for everyone to access (https://grafana-rw.wmcloud.org/dashb...
[08:33:13] cloud-services-team, Cloud-VPS: Monitoring/metrics for trove instances - https://phabricator.wikimedia.org/T402738#11113907 (dcaro) p:Triage→Medium
[08:33:26] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api] allow specifying `source_repo`+`ref` for the config - https://phabricator.wikimedia.org/T402764#11113908 (dcaro) p:Triage→High
[08:40:44] (CR) Muehlenhoff: [V:+2 C:+2] Add dummy keytabs for new install servers T396487 [labs/private] - https://gerrit.wikimedia.org/r/1181638 (owner: Muehlenhoff)
[08:53:38] cloud-services-team, Toolforge: [lima-kilo] Improve convergence - https://phabricator.wikimedia.org/T402672#11114052 (dcaro) Ansible has many shortcomings when trying to make it re-entrant, essentially you have to implement most if not all the logic yourself. We had some of that code in lima-kilo in the...
[08:56:34] PROBLEM - nova-compute proc minimum on cloudvirt1070 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[08:56:46] PROBLEM - nova-compute proc minimum on cloudvirt1071 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[08:56:54] PROBLEM - nova-compute proc minimum on cloudvirt1069 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[08:56:58] PROBLEM - nova-compute proc minimum on cloudvirt1040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[08:57:42] PROBLEM - nova-compute proc minimum on cloudvirt1059 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[08:57:58] PROBLEM - nova-compute proc minimum on cloudvirtlocal1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[08:58:58] RECOVERY - nova-compute proc minimum on cloudvirt1040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:00:18] PROBLEM - nova-compute proc minimum on cloudvirt1056 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:00:34] PROBLEM - nova-compute proc maximum on cloudvirt1070 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:00:42] PROBLEM - nova-compute proc maximum on cloudvirt1071 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:00:58] PROBLEM - nova-compute proc maximum on cloudvirt1069 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:01:22] PROBLEM - nova-compute proc minimum on cloudvirt1051 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:01:52] PROBLEM - nova-compute proc maximum on cloudvirtlocal1003 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:01:54] PROBLEM - nova-compute proc maximum on cloudvirt1059 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:01:56] FIRING: [11x] SystemdUnitDown: The service unit libvirtd-admin.socket is in failed status on host cloudvirt1070. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:03:52] PROBLEM - nova-compute proc maximum on cloudvirt1056 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:04:42] RECOVERY - nova-compute proc minimum on cloudvirt1059 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:04:54] RECOVERY - nova-compute proc maximum on cloudvirt1059 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:05:42] PROBLEM - nova-compute proc maximum on cloudvirt1051 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:08:54] PROBLEM - nova-compute proc minimum on cloudvirt1063 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:18] PROBLEM - nova-compute proc minimum on cloudvirt1046 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:18] RECOVERY - nova-compute proc minimum on cloudvirt1056 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:18] PROBLEM - nova-compute proc minimum on cloudvirt1062 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:22] PROBLEM - nova-compute proc minimum on cloudvirt1045 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:22] PROBLEM - nova-compute proc minimum on cloudvirt1050 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:23] RECOVERY - nova-compute proc minimum on cloudvirt1051 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:23] PROBLEM - nova-compute proc minimum on cloudvirt1058 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:24] PROBLEM - nova-compute proc minimum on cloudvirt1053 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:25] PROBLEM - nova-compute proc minimum on cloudvirt1076 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:26] PROBLEM - nova-compute proc minimum on cloudvirt1064 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:27] PROBLEM - nova-compute proc minimum on cloudvirt1054 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:28] PROBLEM - nova-compute proc minimum on cloudvirt1052 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:29] PROBLEM - nova-compute proc minimum on cloudvirt1057 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:30] PROBLEM - nova-compute proc minimum on cloudvirt1068 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:31] PROBLEM - nova-compute proc minimum on cloudvirt1047 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:32] PROBLEM - nova-compute proc minimum on cloudvirt1074 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:33] PROBLEM - nova-compute proc minimum on cloudvirt1065 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:34] PROBLEM - nova-compute proc minimum on cloudvirt1049 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:35] PROBLEM - nova-compute proc minimum on cloudvirt1043 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:36] RECOVERY - nova-compute proc maximum on cloudvirt1070 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:37] PROBLEM - nova-compute proc minimum on cloudvirt1075 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:38] PROBLEM - nova-compute proc minimum on cloudvirt1041 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:39] RECOVERY - nova-compute proc minimum on cloudvirt1070 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:42] PROBLEM - nova-compute proc minimum on cloudvirt1042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:42] PROBLEM - nova-compute proc minimum on cloudvirt1073 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:42] RECOVERY - nova-compute proc maximum on cloudvirt1071 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:43] RECOVERY - nova-compute proc maximum on cloudvirt1051 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:44] PROBLEM - nova-compute proc minimum on cloudvirt1067 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:46] RECOVERY - nova-compute proc minimum on cloudvirt1071 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:54] RECOVERY - nova-compute proc maximum on cloudvirt1056 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:54] RECOVERY - nova-compute proc minimum on cloudvirt1069 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:58] PROBLEM - nova-compute proc minimum on cloudvirt1061 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:58] RECOVERY - nova-compute proc maximum on cloudvirt1069 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:09:59] PROBLEM - nova-compute proc minimum on cloudvirt1072 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:10:02] PROBLEM - nova-compute proc minimum on cloudvirt1048 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:10:30] RECOVERY - nova-compute proc minimum on cloudvirt1065 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:11:22] RECOVERY - nova-compute proc minimum on cloudvirt1076 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:11:26] PROBLEM - nova-compute proc minimum on cloudvirtlocal1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:11:26] RECOVERY - nova-compute proc minimum on cloudvirt1054 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:11:28] RECOVERY - nova-compute proc minimum on cloudvirt1068 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:11:28] RECOVERY - nova-compute proc minimum on cloudvirt1074 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:11:34] RECOVERY - nova-compute proc minimum on cloudvirt1049 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:11:35] RECOVERY - nova-compute proc minimum on cloudvirt1043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:11:35] RECOVERY - nova-compute proc minimum on cloudvirt1075 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:11:42] RECOVERY - nova-compute proc minimum on cloudvirt1042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:11:54] RECOVERY - nova-compute proc minimum on cloudvirt1063 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:11:58] RECOVERY - nova-compute proc minimum on cloudvirt1061 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:11:58] RECOVERY - nova-compute proc minimum on cloudvirt1072 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:02] RECOVERY - nova-compute proc minimum on cloudvirt1048 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:18] RECOVERY - nova-compute proc minimum on cloudvirt1046 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:18] RECOVERY - nova-compute proc minimum on cloudvirt1062 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:22] RECOVERY - nova-compute proc minimum on cloudvirt1053 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:22] RECOVERY - nova-compute proc minimum on cloudvirt1050 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:23] RECOVERY - nova-compute proc minimum on cloudvirt1058 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:23] RECOVERY - nova-compute proc minimum on cloudvirt1045 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:24] RECOVERY - nova-compute proc minimum on cloudvirt1064 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:26] RECOVERY - nova-compute proc minimum on cloudvirt1052 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:28] RECOVERY - nova-compute proc minimum on cloudvirt1057 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:28] RECOVERY - nova-compute proc minimum on cloudvirt1047 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:34] RECOVERY - nova-compute proc minimum on cloudvirt1041 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:42] RECOVERY - nova-compute proc minimum on cloudvirt1073 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:12:42] RECOVERY - nova-compute proc minimum on cloudvirt1067 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:14:46] PROBLEM - nova-compute proc maximum on cloudvirtlocal1002 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:14:54] PROBLEM - nova-compute proc minimum on cloudvirtlocal1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:16:44] cloud-services-team, Cloud-VPS, Patch-For-Review: Use cloud-private network and cfssl certs for instance live migrations - https://phabricator.wikimedia.org/T355145#11114127 (fgiunchedi)
[09:18:00] RECOVERY - nova-compute proc minimum on cloudvirtlocal1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:18:26] RECOVERY - nova-compute proc minimum on cloudvirtlocal1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:18:46] RECOVERY - nova-compute proc maximum on cloudvirtlocal1002 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:18:54] RECOVERY - nova-compute proc maximum on cloudvirtlocal1003 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:18:55] RECOVERY - nova-compute proc minimum on cloudvirtlocal1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[09:21:51] cloud-services-team, Cloud-VPS, Patch-For-Review: Use cloud-private network and cfssl certs for instance live migrations - https://phabricator.wikimedia.org/T355145#11114134 (fgiunchedi) A little bumpy since `nova-compute` and `libvirtd` were down during the first puppet run, and `nova-compute` down...
[09:27:26] RESOLVED: [43x] SystemdUnitDown: The service unit libvirtd-admin.socket is in failed status on host cloudvirt1051. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:32:47] cloud-services-team, Cloud-VPS: Evaluate higher level signals for nova troubles rather than paging on nova-compute down - https://phabricator.wikimedia.org/T402778 (fgiunchedi) NEW
[09:33:43] (update) dcaro: openapi: add the internal server and some description [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/76 (https://phabricator.wikimedia.org/T402032)
[10:01:17] (update) dcaro: openapi: add the internal server and some description [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/76 (https://phabricator.wikimedia.org/T402032)
[11:25:40] (update) dcaro: openapi: add the internal server and some description [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/76 (https://phabricator.wikimedia.org/T402032)
[11:26:34] Toolforge (Toolforge iteration 23), Patch-For-Review: [components-api] store the config used for the deployment in the deployment themselves - https://phabricator.wikimedia.org/T400064#11114435 (dcaro) In progress→Resolved
[11:27:40] cloud-services-team, Toolforge (Toolforge iteration 23), Patch-For-Review: https://api.svc.toolforge.org endpoint given in OpenAPI spec returns 403 forbidden errors - https://phabricator.wikimedia.org/T402032#11114436 (dcaro) a:dcaro
[11:27:45] cloud-services-team, Toolforge (Toolforge iteration 23), Patch-For-Review: https://api.svc.toolforge.org endpoint given in OpenAPI spec returns 403 forbidden errors - https://phabricator.wikimedia.org/T402032#11114441 (dcaro) Open→In progress
[11:28:44] (update) damian: kubectl alias - use blockinfile [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/262
[11:29:03] (update) damian: install-binary-from-url - add checksums for dest [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/263
[11:29:15] (update) damian: harbor - only download and setup once [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/265
[11:29:21] (update) damian: harbor - move restart to handler [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/267
[11:29:50] (update) damian: docker - move restart to handler [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/264
[11:29:59] (update) damian: tool home dir - update permissions [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/268
[11:30:10] (update) damian: deploy components - don't report as changed [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/266
[11:31:40] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787 (fnegri) NEW
[11:32:25] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11114465 (Ladsgroup) We have a patch for it ready even!
[11:33:12] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11114467 (Ladsgroup) https://gerrit.wikimedia.org/r/c/operations/puppet/+/1178899 My plan was to merge this this we...
[11:33:19] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11114468 (fnegri) Nice, I missed that! :)
[11:34:19] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11114471 (fnegri) That should be fine, with the email to cloud-announce you already planned.
[11:35:19] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11114473 (fnegri)
[11:36:01] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11114475 (Ladsgroup) >>! In T402787#11114471, @fnegri wrote: > That should be fine, with the email to cloud-announce...
[11:38:31] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11114479 (Zabe) >>! In T402787#11114475, @Ladsgroup wrote: >>>! In T402787#11114471, @fnegri w...
[11:38:35] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11114480 (fnegri) > I've sent that already last week, didn't I? With corresponding tech news e...
[11:39:30] Toolforge (Toolforge iteration 23): [components-api] handle non-passed arguments and defaults consistently - https://phabricator.wikimedia.org/T402572#11114482 (dcaro)
[11:40:04] Toolforge (Toolforge iteration 23): [jobs-api] handle non-passed arguments and defaults consistently - https://phabricator.wikimedia.org/T402569#11114484 (dcaro)
[11:41:29] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11114486 (Ladsgroup) I'm actually waiting for this issue of tech news to reach people (later t...
[11:42:07] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Patch-For-Review: [wikireplicas] Remove rc_new from recentchanges view definitions - https://phabricator.wikimedia.org/T402787#11114487 (fnegri) Sounds good! Sorry for the noise, I saw that {T36320} was resolved, so I th...
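The 08:56-09:18 nova-compute PROBLEM/RECOVERY storm above is a plain process-count check: it scans the process table for the nova-compute command line, and the `pytho[n]` character class is the classic trick so that a `ps | grep` style check cannot match its own command line (pgrep excludes itself anyway; the pattern below is kept verbatim from the alert text). A minimal shell equivalent for spot-checking one cloudvirt by hand, assuming shell access to the host:

    # Count nova-compute processes the same way the check does
    pgrep -fc '^/usr/bin/pytho[n].* /usr/bin/nova-compute'
    # The systemd view of the same service
    systemctl status nova-compute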
[11:43:58] Toolforge (Toolforge iteration 23): [jobs-api] handle non-passed arguments and defaults consistently - https://phabricator.wikimedia.org/T402569#11114493 (dcaro)
[11:44:20] Toolforge (Toolforge iteration 23): [components-api] handle non-passed arguments and defaults consistently - https://phabricator.wikimedia.org/T402572#11114494 (dcaro)
[11:44:41] Toolforge (Toolforge iteration 23): [jobs-api] handle non-passed arguments and defaults consistently - https://phabricator.wikimedia.org/T402569#11114496 (dcaro) p:Triage→High a:dcaro
[11:45:03] Toolforge (Toolforge iteration 23): [components-api] handle non-passed arguments and defaults consistently - https://phabricator.wikimedia.org/T402572#11114498 (dcaro)
[11:45:13] Toolforge (Toolforge iteration 23): [components-api] handle non-passed arguments and defaults consistently - https://phabricator.wikimedia.org/T402572#11114501 (dcaro) p:Triage→Medium
[11:46:41] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api,beta] Config not updated from remote source - https://phabricator.wikimedia.org/T401868#11114502 (DamianZaremba) >>! In T401868#11113851, @dcaro wrote: >> Components support source_repo / source_path (maybe source_branch) in additio...
[11:51:34] (open) arthurtaylor: Cache job timing information per class rather than per job [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/11 (https://phabricator.wikimedia.org/T402504)
[11:55:26] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api] allow specifying `source_repo`+`ref` for the config - https://phabricator.wikimedia.org/T402764#11114538 (DamianZaremba) It would be a breaking change, but perhaps: ` source: url: ` ` source: repo_url: branch: main ` That...
[11:57:24] (update) arthurtaylor: Cache job timing information per class rather than per job [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/11 (https://phabricator.wikimedia.org/T402504)
[12:01:55] (update) arthurtaylor: Cache job timing information per class rather than per job [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/11 (https://phabricator.wikimedia.org/T402504)
[12:06:47] cloud-services-team, Toolforge: [components-api] split source from config - https://phabricator.wikimedia.org/T402790 (DamianZaremba) NEW
[12:07:51] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api] allow specifying `source_repo`+`ref` for the config - https://phabricator.wikimedia.org/T402764#11114583 (DamianZaremba) I made https://phabricator.wikimedia.org/T402790 as it's not directly related to this, but implementation of t...
[12:35:42] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/43
[12:35:43] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/12
[12:53:09] (open) vriaa: feat: Make editor responsive [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/21
[13:05:09] cloud-services-team, Cloud-VPS, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799 (Andrew) NEW
[13:05:30] cloud-services-team, Cloud-VPS, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11114773 (Andrew)
[13:05:36] cloud-services-team, Cloud-VPS, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11114774 (Andrew) p:Triage→Medium
[13:07:10] cloud-services-team, DC-Ops, ops-eqiad, SRE: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693#11114777 (Andrew)
[13:08:27] cloud-services-team, Cloud-VPS, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11114780 (TheDJ) @Chippyy can you check why warper data and your home directory have this much data stored ?
[13:09:53] cloud-services-team, Cloud-VPS, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11114784 (TheDJ) I also think we can delete the tiles directory, as we no longer run a tiles server as before in that group. Do you agree @dschwen ?
[13:16:53] Toolforge (Toolforge iteration 23): [jobs-api] handle non-passed arguments and defaults consistently - https://phabricator.wikimedia.org/T402569#11114812 (dcaro) Open→In progress
[13:19:00] cloud-services-team, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11114814 (taavi)
[13:19:36] (open) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[13:20:54] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api] allow specifying `source_repo`+`ref` for the config - https://phabricator.wikimedia.org/T402764#11114822 (dcaro) +1 for both, though being non-backwards compatible we will have to support both syntaxes for a while
[13:36:54] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=97) (T401693)
[13:37:02] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693
[13:40:09] cloud-services-team, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11114906 (Chippyy) >>! In T402799#11114779, @TheDJ wrote: > @Chippyy can you check why warper data and your home directory have this much data stored ? /home/warperdata is the main storage for the Wikimaps...
[13:48:41] cloud-services-team, Huggle: huggle-nfs volume filling up - https://phabricator.wikimedia.org/T402806 (Andrew) NEW
[13:50:40] cloud-services-team, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11114963 (dschwen) I don't see 1.4GB in my home dir when I log onto maps-wma2. The big chunk on nfs are map tiles for the WikiMiniAtlas
[13:51:41] cloud-services-team, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11114966 (dschwen) ` dschwen@maps-wma2:/mnt/nfs/secondary-maps/home/dschwen$ du -sch * 4.0K README.check_apaches 54M apache_heartbeat.log 4.0K apache_heartbeat.sh 13M bin 7.3M git 20K hosts 4.0K install.sh 4...
[13:54:13] cloud-services-team: wikidumpparse NFS volume filling up - https://phabricator.wikimedia.org/T402807 (Andrew) NEW
[13:59:27] cloud-services-team, DC-Ops, ops-eqiad, SRE: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693#11115023 (Andrew)
[14:14:41] (open) damian: Draft: Add validated type for git urls [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/121
[14:14:49] cloud-services-team, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11115071 (Andrew) ` root@maps-wma2:/home/dschwen# du -h -d1 . 8.0K ./.config 7.3M ./git 13M ./bin 8.0K ./.gnupg 57M ./.cache 772M ./.vscode-server 36K ./.ssh 223M ./.local 4.0K ./.nano 16K ./.myconfig 1.1G...
[14:16:23] cloud-services-team, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11115091 (TheDJ) > /home/warperdata is the main storage for the Wikimaps Warper application: https://warper.wmflabs.org/ FYI: I don't think we should have 1.8TB in a home directory... this is what we have...
[14:16:40] (update) damian: Draft: Add validated type for git urls [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/121
[14:18:00] (update) damian: Draft: Add validated type for git urls [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/121
[14:18:03] cloud-services-team, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11115098 (dschwen) D'Uh. Ok, I can delete vscode server, but it'll be redownloaded when I connect again.
[14:18:18] (update) damian: Draft: Add validated type for git urls [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/121
[14:21:15] (update) lucaswerkmeister-wmde: Cache job timing information per class rather than per job [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/11 (https://phabricator.wikimedia.org/T402504) (owner: arthurtaylor)
[14:21:25] (update) lucaswerkmeister-wmde: Cache job timing information per class rather than per job [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/11 (https://phabricator.wikimedia.org/T402504) (owner: arthurtaylor)
[14:22:29] (approved) lucaswerkmeister-wmde: Cache job timing information per class rather than per job [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/11 (https://phabricator.wikimedia.org/T402504) (owner: arthurtaylor)
[14:25:12] cloud-services-team, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11115134 (Andrew) side-note: this server is due for a rebuild on Trixie. If you wind up doing anything that requires scheduled downtime let me know and we can do the rebuild at the same time
[14:28:51] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[14:32:04] cloud-services-team, Toolforge (Toolforge iteration 23): [components-api] allow specifying `source_repo`+`ref` for the config - https://phabricator.wikimedia.org/T402764#11115161 (DamianZaremba) First stab at this: https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/121 (v...
[14:33:59] cloud-services-team, VPS-Projects: wikidumpparse NFS volume filling up - https://phabricator.wikimedia.org/T402807#11115178 (taavi)
[14:34:37] cloud-services-team, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11115183 (TheDJ) >>! In T402799#11115098, @dschwen wrote: > D'Uh. Ok, I can delete vscode server, but it'll be redownloaded when I connect again. Instead of worrying about 1.1GB, I suggest we delete: 4.1T...
[14:38:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:39:38] cloud-services-team, Maps: maps NFS volume filling up - https://phabricator.wikimedia.org/T402799#11115222 (taavi) >>! In T402799#11115098, @dschwen wrote: > D'Uh. Ok, I can delete vscode server, but it'll be redownloaded when I connect again. The [[ https://code.visualstudio.com/docs/remote/vscode-ser...
[14:39:56] cloud-services-team, DC-Ops, ops-eqiad, SRE: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693#11115223 (Andrew)
[14:54:58] (open) damian: README - drop --workers [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/122
[15:09:25] (approved) audreypenven: Cache job timing information per class rather than per job [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/11 (https://phabricator.wikimedia.org/T402504) (owner: arthurtaylor)
[15:10:36] (merge) audreypenven: Cache job timing information per class rather than per job [toolforge-repos/phpunit-results-cache] - https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/11 (https://phabricator.wikimedia.org/T402504) (owner: arthurtaylor)
[16:00:10] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:03:55] (open) fnegri: Setup pytest, add first test [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/4
[16:27:39] (update) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[16:27:57] (update) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[16:30:11] Cloud-VPS (Debian Bullseye Deprecation), The-Wikipedia-Library, Epic, Moderator-Tools-Team (Kanban): hashtags: Replace deprecated Bullseye VM in Cloud VPS - https://phabricator.wikimedia.org/T402056#11115842 (jsn.sherman) After significant cleanup, we won't need to request additional storage. I'...
[16:31:55] (update) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[16:37:51] (update) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[16:54:04] (update) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[17:03:53] (update) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/43 (owner: l10n-bot)
[17:05:17] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/43 (owner: l10n-bot)
[17:05:21] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/43 (owner: l10n-bot)
[17:08:46] (update) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/12 (owner: l10n-bot)
[17:10:13] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/12 (owner: l10n-bot)
[17:10:17] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/12 (owner: l10n-bot)
[17:11:35] (update) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[17:13:53] VPS-Projects, Content-Transform-Team (Work In Progress), Essential-Work: Request new VPS for Content Transform Team Visual Diff teating - https://phabricator.wikimedia.org/T402836 (cscott) NEW
[17:20:52] (update) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[17:30:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[17:30:38] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[17:31:07] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[17:33:59] (open) dcaro: dump: skip unset keys [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/124
[17:34:16] (update) dcaro: api: add `include_unset` parameter to get_job and get_jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/205 (https://phabricator.wikimedia.org/T402569)
[17:35:09] FIRING: CephSlowOps: Ceph cluster in eqiad has 670 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[17:35:22] cloud-services-team: CephSlowOps Ceph cluster in eqiad has 670 slow ops - https://phabricator.wikimedia.org/T402839 (phaultfinder) NEW
[17:35:22] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=97)
[17:36:04] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node
[17:37:28] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.drain_node (exit_code=97)
[17:38:10] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[17:38:24] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node
[17:38:28] FIRING: InstanceDown: Project tools instance tools-k8s-worker-nfs-67 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:39:17] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0)
[17:39:28] FIRING: InstanceDown: Project toolsbeta instance toolsbeta-puppetdb-03 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:40:09] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node
[17:40:57] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0)
[17:41:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:43:28] RESOLVED: [2x] InstanceDown: Project tools instance tools-k8s-worker-nfs-67 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:44:28] RESOLVED: InstanceDown: Project toolsbeta instance toolsbeta-puppetdb-03 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:46:56] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:48:46] FIRING: Primary cloud switch port utilisation over 80%: Alert for device cloudsw1-c8-eqiad.mgmt.eqiad.wmnet - Primary cloud switch port utilisation over 80% - https://alerts.wikimedia.org/?q=alertname%3DPrimary+cloud+switch+port+utilisation+over+80%25
[17:48:50] cloud-services-team: Primary cloud switch port utilisation over 80% Alert for device cloudsw1-c8-eqiad.mgmt.eqiad.wmnet - Primary cloud switch port utilisation over 80% - https://phabricator.wikimedia.org/T402657#11116181 (phaultfinder)
[17:50:00] FIRING: Primary cloud switch inbound port utilisation over 80%: Alert for device cloudsw1-e4-eqiad.mgmt.eqiad.wmnet - Primary cloud switch inbound port utilisation over 80% - https://alerts.wikimedia.org/?q=alertname%3DPrimary+cloud+switch+inbound+port+utilisation+over+80%25
[17:50:12] cloud-services-team: Primary cloud switch inbound port utilisation over 80% Alert for device cloudsw1-e4-eqiad.mgmt.eqiad.wmnet - Primary cloud switch inbound port utilisation over 80% - https://phabricator.wikimedia.org/T402658#11116200 (phaultfinder)
[17:51:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:57:31] FIRING: ToolsToolsDBWritableState: There should be exactly one writable MariaDB instance instead of -1 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState
[17:57:56] FIRING: SystemdUnitDown: The service unit disable-tool.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:02:56] FIRING: HarborComponentDown: A Harbor component is down. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown
[18:17:39] RESOLVED: CephSlowOps: Ceph cluster in eqiad has 51 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[18:18:55] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[18:19:26] RESOLVED: SystemdUnitDown: The service unit disable-tool.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:19:26] RESOLVED: HarborComponentDown: A Harbor component is down. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown
[18:30:01] RESOLVED: ToolsToolsDBWritableState: There should be exactly one writable MariaDB instance instead of 0 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState
[18:46:00] (update) vriaa: feat: Make editor responsive [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/21
[19:06:46] (update) vriaa: feat: Make editor responsive [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/21
[19:57:20] cloud-services-team, VPS-Projects: wikidumpparse NFS volume filling up - https://phabricator.wikimedia.org/T402807#11116695 (Peachey88)
[20:22:39] Cloud-VPS (Debian Bullseye Deprecation), The-Wikipedia-Library, Epic, Moderator-Tools-Team (Kanban): hashtags: Replace deprecated Bullseye VM in Cloud VPS - https://phabricator.wikimedia.org/T402056#11116779 (jsn.sherman) okay, new instance is up and running and the old instance is shut off; we'l...
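The SystemdUnitDown alerts in this stretch (libvirtd-admin.socket on the cloudvirts, disable-tool.service on cloudcontrol1007) are normally triaged with plain systemd tooling on the affected host; a minimal sketch, assuming shell access to e.g. cloudcontrol1007:

    # List everything systemd currently considers failed
    systemctl --failed
    # Why this unit failed, plus its recent log lines
    systemctl status disable-tool.service
    journalctl -u disable-tool.service -n 50 --no-pager
    # Once the underlying cause is fixed, clear the failed state so the alert can resolve
    sudo systemctl reset-failed disable-tool.service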
[21:07:14] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T401693)
[21:07:22] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693
[21:14:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[21:17:49] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=97) (T401693)
[21:17:57] T401693: Put cloudcephosd10[42-47] in service - https://phabricator.wikimedia.org/T401693
[21:24:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[21:38:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:43:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[22:18:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-81 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[22:22:01] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-81
[22:28:06] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-81
[22:33:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-81 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[23:12:49] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[23:22:48] FIRING: [2x] PuppetZeroResources: Puppet has failed generate resources on cloudweb1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[23:37:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudweb1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
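PuppetZeroResources fires when an agent run produces a catalog with (effectively) zero resources, which usually means the run itself is failing; the quickest check is an interactive agent run on the affected host. A minimal sketch, assuming shell access to one of the cloudweb hosts (the journal unit name is an assumption, since packaging differs):

    # Re-run the agent in the foreground and watch for catalog compilation errors
    sudo puppet agent --test
    # Recent agent logs; the unit may be named puppet or puppet-agent depending on packaging
    sudo journalctl -u puppet -n 50 --no-pager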