[00:20:07] Traffic, SRE, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (RLazarus) As in T300324#7752134, I've rolled out all the k8s services where Envoy version was the only diff. We're now up to 1.18 everywhere, except for k8s servi...
[00:20:17] Traffic, SRE, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (RLazarus)
[09:34:22] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp1077.eqiad.wmnet with OS buster
[10:41:27] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp1077.eqiad.wmnet with OS buster com...
[11:14:13] netops, Cloud-VPS, Infrastructure-Foundations: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (Majavah)
[11:14:44] netops, Cloud-VPS, Infrastructure-Foundations: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (valerio.bozzolan) It seems to me that the problematic gateway is maybe `ae2.cr2-esams.wikimedia.org` so maybe #ops-esams is interested.
[11:37:04] netops, Cloud-VPS, Infrastructure-Foundations: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (cmooney) Hi @valerio.bozzolan thank you for the report. For the affected users can you confirm the source IP they are coming from? I want t...
[11:49:37] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (cmooney) Also @valerio.bozzolan you should feel free to email the IPs to noc@wikimedia.org if you wish to avoid putting them here wh...
[11:51:10] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (valerio.bozzolan)
[11:51:24] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (valerio.bozzolan) I've added all the details in a nice private Paste visible to you (P22947) and added it in the Task description. T...
[11:54:34] Traffic, SRE: Remove image check on Varnish Dockerized Test Environment - https://phabricator.wikimedia.org/T303794 (MMandere) Open→Resolved
[11:56:02] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (valerio.bozzolan)
[12:20:16] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (cmooney) Thanks for the info @valerio.bozzolan It seems the return traffic to that address was routing out of our network to Telia...
[12:45:20] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (cmooney) Ok I've emailed Seabone/TI NOC now, hopefully they come back with something meaningful. There isn't a whole lot more we ca...
[13:31:58] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (cmooney) @valerio.bozzolan the affected users are direct Telecom Italia customers is that correct? It certainly wouldn't hurt if th...
[14:27:07] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (cmooney) Hmm ok. I can see in the traceroute it now makes it a few hops further: ` cmooney@re0.cr2-eqiad> traceroute wait 1 no-reso...
[14:29:34] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (cmooney) Hmm ok. I can see in the traceroute it now makes it a few hops further: ` cmooney@re0.cr2-eqiad> traceroute wait 1 no-reso...
[15:42:07] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (Jclark-ctr)
[15:47:50] Traffic, DC-Ops, SRE, ops-eqiad: cp1085 memory errors on DIMM A5 - https://phabricator.wikimedia.org/T303183 (Cmjohnson) @Vgutierrez the new DIMM is here, please let me know when I can make the swap
[15:48:45] ^^ bblack could you coordinate that DIMM change?
[15:49:08] vgutierrez: ack, yeah
[15:49:16] or mmandere, cause he is already reimaging some hosts
[15:49:21] thx :)
[15:51:43] that host got the reimage process interrupted due to HW issues, so it will need a reimage after the DIMM is replaced BTW
[15:52:55] got it
[16:02:18] Traffic, DC-Ops, SRE, ops-eqiad: cp1085 memory errors on DIMM A5 - https://phabricator.wikimedia.org/T303183 (Cmjohnson) Open→Resolved Received the DIMM and replaced it, resolving this task
[16:11:34] Traffic, DC-Ops, SRE, ops-eqiad: cp1085 memory errors on DIMM A5 - https://phabricator.wikimedia.org/T303183 (Cmjohnson)
[16:30:23] Traffic, netops, Infrastructure-Foundations, SRE, Patch-For-Review: drmrs: initial geodns configuration - https://phabricator.wikimedia.org/T304089 (BBlack) Arzhel and I discussed this a bit, and we're going add a few more countries manually for now before proceeding with the esams-resiliency...
[16:31:13] Traffic, netops, Infrastructure-Foundations, SRE, Patch-For-Review: drmrs: initial geodns configuration - https://phabricator.wikimedia.org/T304089 (BBlack)
[17:33:46] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (valerio.bozzolan) Maybe totally unrelated, but maybe yes: https://lists.wikimedia.org/hyperkitty/list/cloud@lists.wikimedia.org/thr...
[17:55:45] netops, Cloud-VPS, Infrastructure-Foundations, SRE: Route problems from some gateways of Italy to WMCloud and Toolforge - https://phabricator.wikimedia.org/T304416 (RhinosF1) That wasn't sent until way after your issues started nor were fixed.
[20:28:57] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (Jclark-ctr) cloudstore1010 B7 U41 port12 cableid #5014 cloudstore1011 C4 U1 port23. cableid #20220273
[20:29:19] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (Jclark-ctr)
[20:29:51] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (Jclark-ctr) a:Jclark-ctr→Cmjohnson
[20:41:14] Traffic, SRE, Wikipedia-iOS-App-Backlog, iOS-app-Bugs: Wikipedia iOS apps sending harmful bursts of traffic synchronized to the top of the hour, especially at 22:00 UTC - https://phabricator.wikimedia.org/T264881 (LGoto) Open→Resolved a:LGoto