[06:34:28] netops, Infrastructure-Foundations, ops-codfw: Decommission asw-b1-codfw - https://phabricator.wikimedia.org/T342076 (ayounsi) p:Triage→Low
[06:35:03] netops, Infrastructure-Foundations, SRE, cloud-services-team: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (ayounsi)
[06:35:11] netops, Infrastructure-Foundations, ops-codfw: Decommission asw-b1-codfw - https://phabricator.wikimedia.org/T342076 (ayounsi)
[07:08:22] netops, Infrastructure-Foundations, SRE, ops-codfw: Decommission asw-b1-codfw - https://phabricator.wikimedia.org/T342076 (ayounsi)
[07:33:56] netops, Infrastructure-Foundations, SRE, ops-codfw: Decommission asw-b1-codfw - https://phabricator.wikimedia.org/T342076 (ayounsi) a:ayounsi
[07:42:58] netops, Infrastructure-Foundations, SRE: Packet Drops on Eqiad ASW -> CR uplinks - https://phabricator.wikimedia.org/T291627 (cmooney) Open→Resolved I’m going to close this task for now. The problem has been mitigated as best as possible with the current equipment we have. In time replacing...
[09:57:31] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (fgiunchedi)
[10:14:22] Traffic: Let HAProxy handle port 80 - https://phabricator.wikimedia.org/T323557 (Fabfur) The HAProxy configuration on all DCs has been updated to apply `silent-drop` to abusive clients hitting port 80, as has already been done for port 443. To check (eg. from cumin) if HAProxy is "silent-dropping" connections:...
[10:16:03] Traffic, SRE: port 80 paging on scheduled single host maintenance in text@esams - https://phabricator.wikimedia.org/T339898 (Fabfur)
[10:16:07] Traffic, SRE: provide haproxy silent-drop support for port 80 as well - https://phabricator.wikimedia.org/T340983 (Fabfur) Open→Resolved a:Fabfur The HAProxy configuration on all DCs has been updated to apply silent-drop to abusive clients hitting port 80, as has already been done for port 443....
[11:26:35] netops, Infrastructure-Foundations, SRE: TLS certificates for network devices - https://phabricator.wikimedia.org/T334594 (ayounsi) `name=SONiC refresh needed verbose ayounsi@cumin1001:~$ sudo cookbook -v sre.network.tls lsw1-e8-eqiad START - Cookbook sre.network.tls for network device lsw1-e8-eqiad...
[13:31:53] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (RobH) >>! In T341992#9019625, @Vgutierrez wrote: > @ayounsi @cmooney could you let DCops know which racks would be better for these boxes? Thanks! I am on-site this week in eqiad. Can I get...
[13:33:19] ^^ XioNoX
[13:51:46] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (cmooney) @RobH There is no real preference on my side. I would say pick one rack from E1/E2/E3/F1/F2/F3 and put the first 3 of them in that one, then place lvs1016 in a different rack from th...
[14:00:09] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (Vgutierrez) Thanks @cmooney, @Fabfur will take care of running the decom cookbook (thanks!)
[14:03:07] Traffic, SRE: increased 5xx rate for esams frontend traffic - https://phabricator.wikimedia.org/T342121 (Joe)
[14:05:55] sukhe: ^^ wanna reply on that one?
[14:09:25] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (RobH) p:Triage→Medium
[14:15:04] sure
[14:23:15] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by fabfur@cumin1001 for hosts: `lvs1013.eqiad.wmnet` - lvs1013.eqiad.wmnet (**WARN**) - Downtimed host on Icinga/Alertmanager...
[14:24:46] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (Fabfur) lvs1013.eqiad.wmnet has been decommissioned via cookbook @Tue 18 Jul 2023 02:24:10 PM UTC
[14:25:05] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (Fabfur)
[14:38:02] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by fabfur@cumin1001 for hosts: `lvs1014.eqiad.wmnet` - lvs1014.eqiad.wmnet (**WARN**) - Downtimed host on Icinga/Alertmanager...
[14:38:27] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Joe)
[14:39:11] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Joe)
[14:40:09] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (Fabfur)
[14:42:10] heads-up: Traffic is upgrading the internal recursors in core sites this week.
this is mostly a heads-up because of the EDNS client subnet issues we saw once after the upgrade
[14:42:14] those have been resolved but if you see something, please let us know, thanks
[14:42:34] Traffic, SRE: increased 5xx rate for esams frontend traffic - https://phabricator.wikimedia.org/T342121 (cmooney) @TheDJ thanks for reporting this, indeed it does not look right and was an oversight by myself after we re-pooled esams earlier today. We did some work earlier moving equipment in one of our...
[14:42:53] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/e8c3cdba56419c16d3bda8384eb7cd1c380ded9f%5E%21/#F0 for the last fix
[14:50:13] Traffic, SRE: increased 5xx rate for esams frontend traffic - https://phabricator.wikimedia.org/T342121 (TheDJ) Open→Resolved a:TheDJ Thank you, seems fixed now indeed.
[15:02:30] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by fabfur@cumin1001 for hosts: `lvs1015.eqiad.wmnet` - lvs1015.eqiad.wmnet (**WARN**) - Downtimed host on Icinga/Alertmanager...
[15:05:36] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (Fabfur)
[15:18:20] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin1001 for host lvs1013.eqiad.wmnet with OS bullseye
[15:20:51] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin1001 for host lvs1013.eqiad.wmnet with OS bullseye executed with errors: - lvs1013 (**FAIL**) - Removed from Pup...
[15:22:59] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin1001 for host lvs1013.eqiad.wmnet with OS bullseye
[15:31:36] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (Fabfur)
[15:32:02] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by fabfur@cumin1001 for hosts: `lvs1016.eqiad.wmnet` - lvs1016.eqiad.wmnet (**WARN**) - Downtimed host on Icinga/Alertmanager...
[16:01:34] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by robh@cumin1001 for host lvs1013.eqiad.wmnet with OS bullseye executed with errors: - lvs1013 (**FAIL**) - Removed from Pup...
[16:02:57] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (RobH)
[16:04:17] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (RobH)
[17:30:58] Traffic, SRE, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (RobH)
[18:28:29] Traffic, SRE: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ssingh)
[18:28:39] Traffic, SRE: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ssingh) p:Triage→Medium
[19:08:12] Traffic, DC-Ops, ops-eqiad: Q3:rack/setup/install cp1[098-113] - https://phabricator.wikimedia.org/T342159 (RobH)
[19:10:36] Traffic, DC-Ops, ops-eqiad: Q3:rack/setup/install cp1[098-113] - https://phabricator.wikimedia.org/T342159 (RobH) a:ssingh Please note parent task 341588 has the range of cp1[090-105] however, cp1090 is already live/in use. Additionally, we have 4 cp hosts from eqsin to use for CP in eqiad (so c...
[19:10:44] Traffic, DC-Ops, ops-eqiad: Q3:rack/setup/install cp1[098-113] - https://phabricator.wikimedia.org/T342159 (RobH)
[19:10:59] Traffic, DC-Ops, ops-eqiad: Q3:rack/setup/install cp1[098-113] - https://phabricator.wikimedia.org/T342159 (RobH)
[22:21:06] Traffic: Add a reboot action to the Wikimedia DNS restart cookbook - https://phabricator.wikimedia.org/T342182 (BCornwall)
[22:22:59] Traffic: Add a reboot action to the Wikimedia DNS restart cookbook - https://phabricator.wikimedia.org/T342182 (BCornwall) Open→In progress p:Triage→Medium
[22:32:28] Traffic, Patch-For-Review: Add a reboot action to the Wikimedia DNS restart cookbook - https://phabricator.wikimedia.org/T342182 (BCornwall) Since this is a task created after the commit/merge, here's an associated Gerrit link: https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/937173/
[22:50:37] netops, Infrastructure-Foundations, Observability-Alerting, SRE, and 2 others: Alertmanager rule for network interface errors? - https://phabricator.wikimedia.org/T335350 (lmata)
[22:50:52] netops, Infrastructure-Foundations, Observability-Metrics, SRE, observability: Investigate Junos Prometheus exporter - https://phabricator.wikimedia.org/T333210 (lmata)
[23:19:59] netops, Infrastructure-Foundations, Observability-Metrics, SRE, observability: Prometheus: ingest SONiC metrics - https://phabricator.wikimedia.org/T335027 (lmata)
[23:24:44] Traffic, SRE, Incident Tooling: ncredir redirects for status.wiki* --> status.wikimedia.org - https://phabricator.wikimedia.org/T318804 (lmata)