[01:41:58] (PurgedHighEventLag) firing: High event process lag with purged on cp5019:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5019 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[01:46:58] (PurgedHighEventLag) resolved: (32) High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[05:25:12] (LVSHighCPU) firing: The host lvs1020:9100 has at least its CPU 22 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1020 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[05:30:12] (LVSHighCPU) resolved: (2) The host lvs1020:9100 has at least its CPU 22 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1020 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[05:58:12] (LVSHighCPU) firing: The host lvs1018:9100 has at least its CPU 7 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1018 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[06:03:12] (LVSHighCPU) resolved: The host lvs1018:9100 has at least its CPU 7 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1018 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[06:08:12] (LVSHighCPU) firing: The host lvs1018:9100 has at least its CPU 29 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1018 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[06:13:12] (LVSHighCPU) resolved: (2) The host lvs1018:9100 has at least its CPU 29 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1018 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[09:09:10] hello! I have a service that I'd like to move to lvs_setup if suitable https://gerrit.wikimedia.org/r/c/operations/puppet/+/920664 Would it be alright to do that today?
[11:08:48] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[11:10:31] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[12:13:33] netops, Infrastructure-Foundations, SRE, cloud-services-team: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (cmooney) @papaul when you are back can you advise on the status of these? They all appear as connected on asw-b1-codfw...
[12:25:42] (SystemdUnitFailed) firing: (3) cadvisor.service Failed on cp2027:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:30:42] (SystemdUnitFailed) firing: (30) cadvisor.service Failed on cp1089:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:35:42] (SystemdUnitFailed) firing: (33) cadvisor.service Failed on cp1089:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:37:37] Traffic, Patch-For-Review: Let HAProxy handle port 80 - https://phabricator.wikimedia.org/T323557 (Fabfur) As the cookbook is ready for testing we can try to merge the hieradata and do the puppet disable on the codfw hosts.
[12:40:42] (SystemdUnitFailed) firing: (33) cadvisor.service Failed on cp1089:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:50:42] (SystemdUnitFailed) firing: (33) cadvisor.service Failed on cp1089:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:00:42] (SystemdUnitFailed) firing: (32) cadvisor.service Failed on cp1089:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:05:42] (SystemdUnitFailed) firing: (32) cadvisor.service Failed on cp1089:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:23:14] vgutierrez: o/ do we still have a testing cp node with varnishkafka by any chance?
[13:23:30] :?
[13:23:33] WDYM?
[13:23:39] all of them are running varnishkafka AFAIK
[13:23:46] I'd like to move all the vk instances to pki but testing it on a cache node like pink unicorn would be node
[13:23:49] *nice
[13:23:49] or are you talking about atskafka?
[13:23:55] no no
[13:24:16] I meant if there is a cache node with vk not serving real traffic
[13:24:17] oh... a pink unicorn node? nope
[13:24:26] let's depool one in ulsfo :)
[13:24:51] okok reviews are not ready, I wanted to pcc it first, will select a poor ulsfo node :)
[13:24:54] thanks!
[13:24:58] np
[13:35:24] godog: in relation to your cadvisor revert, is it safe right now to disable puppet in upload@codfw?
[13:36:25] vgutierrez: yes good to go!
[13:37:09] ack
[13:37:38] fabfur: ready to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/924444 :?
[13:39:23] ready
[13:39:39] vgutierrez: if you want you can test it before merging
[13:39:43] with the new script :)
[13:40:03] * fabfur sweat
[13:40:06] so you can try the -h/--help, a dry-run and spot any obvious mistakes
[13:40:10] yeah...........
[13:40:34] volans: are you talking about hieradata or the cookbook itself?
[13:40:41] the cookbook itself
[13:41:38] or it's already merged that one?
[13:41:48] the cookbook is merged but untested
[13:42:03] ahhh ok then no need for the test-cookbook script then
[13:42:23] but still you might want to test it in dry-run first
[13:42:52] volans: yeah :)
[13:50:42] (SystemdUnitFailed) firing: (20) cadvisor.service Failed on cp1089:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:50:45] volans: hmm dry-run should be listed as a valid option on the cookbook help?
[13:51:04] no, that's on cookbook -h
[13:51:11] general args, not cookbook-specific
[13:51:19] gotcha
[13:55:42] (SystemdUnitFailed) resolved: (20) cadvisor.service Failed on cp1089:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:21:27] volans: could you please check https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/924527 ?
[14:27:10] sure
[14:30:25] thanks for the comment
[14:38:24] volans: how the cookbooks get deployed? puppet?
[14:39:03] yes
[14:39:07] force a puppet run
[14:39:10] or wait :)
[14:39:27] https://wikitech.wikimedia.org/wiki/Spicerack/Cookbooks#Deployment
[15:08:32] volans: [cumin.transports.Command('run-puppet-agent --enable "vgutierrez - T323557 - vgutierrez@cumin1001"', timeout=300.0)] --> it looks like that spicerack automatically adds " - username@host" but disable-puppet won't do it
[15:08:33] T323557: Let HAProxy handle port 80 - https://phabricator.wikimedia.org/T323557
[15:10:11] vgutierrez: ah right you want the .reason of that
[15:10:17] self._puppet_reason = spicerack.admin_reason(reason=args.puppet_reason).reason
[15:10:28] https://doc.wikimedia.org/spicerack/master/api/spicerack.administrative.html#spicerack.administrative.Reason.reason
[15:10:47] my bad I missed that
[15:11:10] no problem
[15:20:53] Traffic, SRE, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm)
[15:21:11] Traffic, SRE, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm)
[22:48:35] Domains, Traffic, DNS, SRE: Update DNS records for mastodon.wikimedia.org - https://phabricator.wikimedia.org/T337586 (Dzahn) Is there a place where we can read about this project and the general plan around it?
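The 15:08 to 15:11 exchange comes down to computing the reason string once (the log's fix is `spicerack.admin_reason(reason=args.puppet_reason).reason`) and reusing it for both the puppet disable and the later enable, so the two messages match. The sketch below only illustrates that idea under stated assumptions: `AdminReason` and `puppet_commands` are hypothetical names, not Spicerack's API; the " - username@host" suffix format is taken from the command shown in the log; and the exact CLI of `disable-puppet` and `run-puppet-agent --enable` is assumed.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class AdminReason:
    """Hypothetical stand-in for a Spicerack-style admin reason (not the real class).

    Mimics what the log shows: the final reason is the free-text reason
    followed by " - username@host".
    """

    text: str
    username: str
    hostname: str

    @property
    def reason(self) -> str:
        # Append the owner identically for every caller, so disable and enable agree.
        return f"{self.text} - {self.username}@{self.hostname}"


def puppet_commands(reason: AdminReason) -> tuple[str, str]:
    """Build disable/enable command lines that share one reason string.

    Command names come from the log; their argument syntax is an assumption.
    """
    quoted = shlex.quote(reason.reason)  # quote once, reuse verbatim in both commands
    return f"disable-puppet {quoted}", f"run-puppet-agent --enable {quoted}"


if __name__ == "__main__":
    # Example values taken from the 15:08:32 message in the log.
    r = AdminReason("vgutierrez - T323557", "vgutierrez", "cumin1001")
    disable_cmd, enable_cmd = puppet_commands(r)
    print(disable_cmd)
    print(enable_cmd)
```

Building the string in one place avoids the mismatch vgutierrez hit, where one code path appended the owner and the other did not.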