[00:03:54] RESOLVED: SystemdUnitFailed: haproxy_stek_job.service on cp6007:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:48:50] netops, Infrastructure-Foundations, SRE: Create Quality of Service design for WMF internal networks - https://phabricator.wikimedia.org/T316358#9948690 (cmooney) Open→Resolved Gonna close this one as the design is finalised, see detail on wikitech here: https://wikitech.wikimedia.org/wik...
[10:29:14] Traffic, SRE, SRE-swift-storage, Thumbor: Cache thumbs in our caching infrastructure (e.g. ATS) - https://phabricator.wikimedia.org/T345334#9948790 (Midleading) Due to T266155, I have to keep refreshing the category page, about 5~10 times, until all 200 thumbnails are generated. Therefore some "c...
[10:49:13] Traffic, Phabricator (Upstream), Release-Engineering-Team (Priority Backlog 📥), Upstream: Consider using preconnect for https://phab.wmfusercontent.org CDN - https://phabricator.wikimedia.org/T367290#9948879 (Aklapper) Thanks. Waiting on getting this merged into upstream first not to increase our...
[10:54:53] Traffic, Commons, MediaWiki-Uploading, SRE, and 2 others: 502 Server Hangup Error on esams for "Upload a new version of this file" on Special:Upload on Commons - https://phabricator.wikimedia.org/T247454#9948905 (Aklapper) Stalled→Invalid Unfortunately closing this Phabricator task as...
[12:34:24] Traffic: text/02-frontend-headers.vtc seems to be broken - https://phabricator.wikimedia.org/T369162 (Vgutierrez) NEW
[12:35:01] Traffic: text/02-frontend-headers.vtc seems to be broken - https://phabricator.wikimedia.org/T369162#9949232 (Vgutierrez) p:Triage→Medium
[13:42:43] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949632 (JMeybohm)
[13:53:17] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949642 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=c8dbb89d-640c-4078-bc10-bbbe9c30f3ef) se...
[13:56:12] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949650 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=753739a5-e1fb-44b6-9174-f7b3a8c4b73b) se...
[13:58:55] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949656 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=185956f6-b0e6-4a89-9e32-6a8223f5678e) se...
[14:00:06] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949655 (JMeybohm) !log jayme@cumin1002 conftool action : set/pooled=no; selector: name=(wikikube-worker1007.eqiad...
[14:01:25] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949662 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=11036a9f-0b48-4b07-9e63-571b4f67c201) se...
[14:22:09] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949750 (cmooney) Switch is back up, all looks good at first glance from the network side.
[14:25:11] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949772 (ABran-WMF) db hosts as well, repooling
[14:31:13] Traffic, Patch-For-Review: text/02-frontend-headers.vtc seems to be broken - https://phabricator.wikimedia.org/T369162#9949814 (Vgutierrez) Open→Resolved a:Vgutierrez
[14:33:14] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949834 (JMeybohm) >>! In T365994#9949655, @JMeybohm wrote: > !log jayme@cumin1002 conftool action : set/pooled=no...
[14:48:16] Traffic, MW-on-K8s, serviceops, SRE, and 2 others: Turn down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#9949876 (Clement_Goubert) Open→In progress p:Triage→Medium
[14:49:46] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9949915 (Eevans) >>! In T365994#9949750, @cmooney wrote: > Switch is back up, all looks good at first glance from...
[15:56:20] netops, Infrastructure-Foundations, SRE, Patch-For-Review: magru network setup - https://phabricator.wikimedia.org/T362421#9950212 (ayounsi) Open→Resolved All is done here.
[17:14:46] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e2-eqiad - https://phabricator.wikimedia.org/T365994#9950763 (cmooney) Open→Resolved
[18:11:40] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322#9951125 (cmooney) So one thing I noticed is that we are not getting the stats for LAG/ae interfaces with the current setup, nor routed...
[18:54:53] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Get test host connected to codfw row c/d lsw's - https://phabricator.wikimedia.org/T367512#9951390 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1002 for host sretest2002.codfw.wmnet with OS bo...
[19:19:09] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Get test host connected to codfw row c/d lsw's - https://phabricator.wikimedia.org/T367512#9951466 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1002 for host sretest2002.codfw.wmnet with OS bo...
[19:24:31] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Get test host connected to codfw row c/d lsw's - https://phabricator.wikimedia.org/T367512#9951473 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1002 for host sretest2002.codfw.wmnet with OS bookwo...
[19:25:26] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Get test host connected to codfw row c/d lsw's - https://phabricator.wikimedia.org/T367512#9951474 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1002 for host sretest2002.codfw.wmnet with OS bo...
[19:55:43] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Get test host connected to codfw row c/d lsw's - https://phabricator.wikimedia.org/T367512#9951547 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1002 for host sretest2002.codfw.wmnet with OS bo...
[22:37:41] netops, Infrastructure-Foundations, SRE: Should we add links between our spine switches aggregating each row of two? - https://phabricator.wikimedia.org/T369238 (cmooney) NEW p:Triage→Low