[00:03:56] (EdgeTrafficDrop) resolved: 67% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org
[10:15:33] hey, just filed a task for a maps issue https://phabricator.wikimedia.org/T302862
[10:15:58] there's a weird behavior on Maps geoshape endpoint that is a possible ddos situation
[10:27:08] taavi vgutierrez could you help me to find the right people that can help me understand the issue properly?
[10:38:52] * vgutierrez looking
[10:55:51] Traffic, Beta-Cluster-Infrastructure, SRE, Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (JMeybohm) p:Triage→Medium deployment-mediawiki11 has been replaced by deployment-mediawiki12 (although th...
[11:06:26] Traffic, Beta-Cluster-Infrastructure, SRE, Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (Vgutierrez) hmm if that's the case horizon data for deployment-prep-cache needs to be updated as well cause right...
[11:21:46] Traffic, Beta-Cluster-Infrastructure, SRE, Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (JMeybohm) I was really just relaying from T300525 but it looks like something is off. deployment-mediawiki11 was...
[11:37:00] Traffic, Beta-Cluster-Infrastructure, SRE, Beta-Cluster-reproducible: Beta cluster down: Error: 502, Next Hop Connection Failed (Feb 2022) - https://phabricator.wikimedia.org/T302699 (Majavah) >>! In T302699#7746972, @JMeybohm wrote: > I was really just relaying from T300525 but it looks like som...
[12:09:21] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster
[12:15:56] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp4034:9331 is unreachable - https://alerts.wikimedia.org
[12:20:56] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp4034:9331 is unreachable - https://alerts.wikimedia.org
[12:43:34] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster e...
[13:50:47] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster
[14:24:36] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster e...
[14:26:21] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster
[14:35:03] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster e...
[14:35:26] cp4034 is definitely messing with me
[14:38:01] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster
[14:38:11] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster e...
[14:38:17] it looks like I lose console as soon as the OS tries to boot
[14:38:33] volans: are you aware of any issues with the latest HW version of the Dell servers?
[14:41:02] spicerack.dhcp.DHCPError: target file ttyS1-115200/cp4034.conf exists
[14:41:09] and spicerack isn't happy about retrying
[14:41:51] vgutierrez: new host or reimage to buster?
[14:41:58] is the firmware upgraded?
[14:41:59] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster
[14:42:02] attempting to reinstall
[14:42:06] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster e...
[14:42:10] it failed several times
[14:42:15] it was already running BTW
[14:42:29] same OS?
[14:42:31] yes
[14:42:37] which hostname? I can check some logs
[14:42:40] cp4034
[14:42:56] vgutierrez: is ulsfo ok network wise?
[14:43:02] AFAIK yes
[14:43:04] I thought we have issues
[14:43:07] ouch
[14:43:23] it's depooled
[14:43:44] that would explain it :/
[14:43:55] so how I fix the spicerack.dhcp.DHCPError: target file ttyS1-115200/cp4034.conf exists?
[14:44:08] and I'll wait till ulsfo comes back to life to continue of course :)
[14:44:47] see all the backlog in the private chan, cable cut AFAIK
[14:45:10] yep, I even saw the emails last night
[14:47:33] if you want I can look why it's failing
[14:47:41] but it seems reasonable to bet it might be related
[14:48:48] so as soon as the net issue is fixed the cookbook should be able to clean that DHCP file?
[14:51:36] let me check at which state it is
[14:52:23] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp5014.eqsin.wmnet with OS buster
[14:57:06] vgutierrez: some weird things from the logs, help me to make sense of them
[14:57:27] at 14:33:48 it got Host up (Debian installer), so it did boot in d-i
[14:57:55] at 14:34:54 Ctrl+c pressed
[14:58:53] and right before the ctrl+c was logged it had started to cleanup the DHCP config
[14:58:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp5014:9331 is unreachable - https://alerts.wikimedia.org
[14:58:57] but didn't make it
[14:59:07] was by any chance ctrl+c hit multiple times?
[15:02:37] hmmm yeah, that could be it
[15:02:41] layer 8 issue, sorry
[15:03:00] * volans removing the stale file, so you don't have issues later
[15:03:28] thx <3
[15:03:40] also from what I've seen in the logs the reimage might work, but given the network issues, if you don't want to check your luck or it's super urgent
[15:03:46] I'd wait for when it's fixed :)
[15:03:51] no prob :)
[15:03:55] hmm it didn't work on the first attempt
[15:03:59] so I'll wait
[15:04:35] hopefully cp5014 behaves as expected
[15:05:56] [installer currently running fdisk there]
[15:13:57] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp5014:9331 is unreachable - https://alerts.wikimedia.org
[15:49:40] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp5014.eqsin.wmnet with OS buster c...
[15:56:39] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp3061.esams.wmnet with OS buster
[16:03:56] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp3061:9331 is unreachable - https://alerts.wikimedia.org
[16:18:57] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp3061:9331 is unreachable - https://alerts.wikimedia.org
[16:50:47] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp3061.esams.wmnet with OS buster c...
[16:54:26] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (Vgutierrez)
[18:16:04] netops, Discovery, Infrastructure-Foundations, SRE: Speed up network connections for Elastic hosts - https://phabricator.wikimedia.org/T301577 (bking)