[00:59:47] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp2041.codfw.wmnet with OS bullseye
[00:59:55] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp2041.codfw.wmnet with OS bullseye executed with errors: - cp2041 (**FAIL**) - Removed from Pu...
[01:01:23] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host cp2041.codfw.wmnet with OS bullseye
[01:11:38] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bullseye - https://phabricator.wikimedia.org/T321309 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host cp2041.codfw.wmnet with OS bullseye executed with errors: - cp2041 (**FAIL**) - Removed from Pu...
[01:19:16] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp2042:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[01:24:16] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp2042:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[09:45:48] godog: it looks like https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile?orgId=1 needs to be updated as well?
[09:46:12] is ok to edit the dashboard in grafana-rw or is it stored as code somewhere else?
[09:47:22] Traffic, Data Pipelines, Data-Engineering-Planning, Foundational Technology Requests, and 2 others: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (elukey) After a chat with Filippo, we agreed that the work on this task seems done. There m...
[09:48:02] godog: and it seems that drmrs is missing there :)
[09:49:05] * vgutierrez fixing the dashboard
[09:54:52] vgutierrez: thank you! yeah needs updating for sure
[09:58:48] godog: moved the dashboard to thanos, using one row and repeating vertically by $site
[09:59:15] and excluded trafficserver_config_.+
[09:59:25] <3 <3 vgutierrez
[09:59:34] that should do it
[10:00:09] yeah LGTM (i.e. no data atm)
[10:14:59] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero)
[11:01:38] Traffic, Data Pipelines, Data-Engineering-Planning, Foundational Technology Requests, and 2 others: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (Volans) I agree. The only thing maybe left is to check if the segment size is the correct o...
[11:01:55] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Make mw-web and mw-api-ext available behind LVS - https://phabricator.wikimedia.org/T323621 (Clement_Goubert) p:Triage→Medium
[11:07:06] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1001 for host cloudvirt1047.eqiad.wmnet with O...
[11:52:58] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1001 for host cloudvirt1047.eqiad.wmnet with OS bu...
[11:53:08] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1001 for host cloudvirt1047.eqiad.wmnet with OS bu...
[12:11:41] Hey y'all, we'd like to put some of the new mw-on-k8s services behind LVS https://gerrit.wikimedia.org/r/c/operations/puppet/+/859974
[12:14:08] Can you confirm primary/secondaries for low-traffic are eqiad lvs1019/lvs1020 and codfw lvs2009/lvs2010?
[12:14:54] that's right claime, lvs1020 and lvs2010 are the secondaries
[12:16:24] vgutierrez: do you have objections to me merging/pushing this change ?
[12:16:31] nope
[12:16:35] please go aheaed
[12:16:36] *ahead
[12:16:39] thanks :)
[12:18:26] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero)
[12:18:55] vgutierrez: The doc is not very precise on "Acknowledge upcoming PyBal IPVS diff check and PyBal connections to etcd icinga alerts regarding your change", am I supposed to preemptively silence them or just ack as they come?
[12:19:17] ack them
[12:19:23] ok
[12:19:25] thx
[12:19:34] if you silence them you could potentially hide issues in other services
[12:19:40] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1001 for host cloudvirt1046.eqiad.wmnet with O...
[12:19:49] ack
[12:37:07] Hmm I have an issue
[12:37:23] One of my service ips is not showing up in ipvsadm -L -n on lvs1020:
[12:37:33] cgoubert@lvs1020:~$ sudo ipvsadm -L -n | grep 10.2.2.76
[12:37:35] cgoubert@lvs1020:~$
[12:38:06] Bad patch
[12:38:12] Fixing asap
[12:40:10] https://gerrit.wikimedia.org/r/c/operations/puppet/+/860015
[12:44:19] +2'ing myself on this, I don't want it to linger
[12:59:49] Fixed, all good.
[13:02:37] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1001 for host cloudvirt1046.eqiad.wmnet with OS bu...
[13:53:55] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1001 for host cloudvirt1045.eqiad.wmnet with O...
[13:54:20] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero)
[14:36:38] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1001 for host cloudvirt1045.eqiad.wmnet with OS bu...
[15:01:31] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786 (Clement_Goubert)
[15:01:41] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Make mw-web and mw-api-ext available behind LVS - https://phabricator.wikimedia.org/T323621 (Clement_Goubert) In progress→Resolved
[15:17:52] Traffic, SRE, Patch-For-Review: Deprecate and disable port 80 for one-off sites under canonical domains - https://phabricator.wikimedia.org/T238720 (Vgutierrez)
[15:34:23] Traffic, Data Pipelines, Data-Engineering-Planning, Foundational Technology Requests, and 2 others: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (Milimetric) @Volans asked me, basically, how come `count(distinct ip)` gives slightly inacc...
[15:53:11] Traffic, Data Pipelines, Data-Engineering-Planning, Foundational Technology Requests, and 2 others: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (Volans) Thanks a lot for the deep dive and the explanation with examples @Milimetric, much...
[16:08:32] Traffic, Infrastructure-Foundations: Feature request: sre.hardware.upgrade-firmware should allow option to defer firmware installation to next reboot - https://phabricator.wikimedia.org/T323717 (ssingh)
[16:08:59] Traffic, Infrastructure-Foundations: Feature request: sre.hardware.upgrade-firmware should allow option to defer NIC firmware installation to next reboot - https://phabricator.wikimedia.org/T323717 (ssingh)
[16:09:21] Traffic, Infrastructure-Foundations: Feature request: sre.hardware.upgrade-firmware should allow option to defer NIC firmware installation to next reboot - https://phabricator.wikimedia.org/T323717 (ssingh) p:Triage→Medium
[16:53:05] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero)
[17:03:00] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (nskaggs) It's exciting to see so many successful transitions to single NIC here already! Great work! However, I also want to ask tha...
[17:47:12] Traffic: Alert on Varnish high thread count - https://phabricator.wikimedia.org/T323723 (BCornwall)
[17:47:57] Traffic: Alert on Varnish high thread count - https://phabricator.wikimedia.org/T323723 (BCornwall) p:Triage→Low
[18:16:37] Traffic, SRE, ops-eqiad: Host lvs1014.mgmt is down - https://phabricator.wikimedia.org/T322933 (Jclark-ctr) Open→Resolved Replaced cable Error has cleared
[18:45:33] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs4010.ulsfo.wmnet with OS buster
[19:26:34] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs4010.ulsfo.wmnet with OS buster completed: - lvs4010 (**...
[19:59:45] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by sukhe@cumin2002 for hosts: `lvs4007.ulsfo.wmnet` - lvs4007.ulsfo.wmnet (**WARN**) - D...
[20:06:21] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ssingh)