[08:54:27] Traffic: Upgrade haproxy to 2.8.13 on cp hosts - https://phabricator.wikimedia.org/T383111 (Vgutierrez) NEW
[08:54:53] Traffic: Upgrade haproxy to 2.8.13 on cp hosts - https://phabricator.wikimedia.org/T383111#10436517 (Vgutierrez) p:Triage→Medium
[09:33:33] Traffic, Phabricator, SRE: Phabricator should cache tasks for a few minutes for logged-out users - https://phabricator.wikimedia.org/T274228#10436598 (kostajh) Hi, wondering if there's interest to move this forward. @DLynch and I have a [[ https://github.com/kemayo/loosephabric | tool that integrates...
[10:21:45] netops, Infrastructure-Foundations, SRE: Routinator 0.14 causing tempfs file system to fill up - https://phabricator.wikimedia.org/T383116 (cmooney) NEW p:Triage→Medium
[10:22:29] netops, Infrastructure-Foundations, SRE: Routinator 0.14 causing tempfs file system to fill up - https://phabricator.wikimedia.org/T383116#10436696 (cmooney)
[10:40:22] netops, Infrastructure-Foundations, SRE: Routinator 0.14 causing tempfs file system to fill up - https://phabricator.wikimedia.org/T383116#10436738 (cmooney)
[13:53:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[14:03:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[14:34:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[14:39:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[15:58:49] netops, Infrastructure-Foundations, SRE: Routinator 0.14 causing tempfs file system to fill up - https://phabricator.wikimedia.org/T383116#10437563 (cmooney)
[16:04:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[16:04:30] hmmm
[16:14:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[16:21:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[16:21:11] sukhe: https://grafana.wikimedia.org/goto/ltrK1WvNg?orgId=1 could be related?
[16:21:44] I don't know what local_port_7231 means though
[16:22:04] port 7231 is restbase apparently
[16:22:13] vgutierrez: it means envoy terminated TLS for a service with bare http at that port
[16:23:19] vgutierrez: matches this peak but not the earlier ones, say at ~14:34
[16:24:07] https://grafana.wikimedia.org/goto/SAR1JZvNg?orgId=1 thanos somewhat does but then again, zooming out for a longer view period, doesn't seem to be a problem?
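The spike-hunting above is done by eyeballing Grafana panels against the alert's timestamps. The same comparison can be made numerically by querying the underlying node_exporter metric for the alerting NIC; this is only a sketch, and the Prometheus endpoint URL here is a placeholder assumption, not a host from the log (the instance/device labels are copied from the alert text):

```shell
# Pull the 5-minute RX byte rate for lvs2013's eno12399np0 NIC from
# Prometheus, to line its peaks up against per-service dashboards.
# NOTE: the Prometheus base URL below is hypothetical.
curl -sG 'https://prometheus.example.org/api/v1/query' \
  --data-urlencode \
  'query=rate(node_network_receive_bytes_total{instance="lvs2013:9100",device="eno12399np0"}[5m])'
```

`node_network_receive_bytes_total` is the standard node_exporter counter behind host network panels; port 9100 in the alert name is node_exporter's default scrape port.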
[16:24:34] btw I was staring at the "all clusters network traffic" dashboard
[16:25:57] sukhe: thanos would explain the huge request side I guess
[16:26:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[16:27:08] in a few hours, internal netflow will be available
[16:27:47] yeah :]
[16:28:49] seeing lvs2013 traffic it must be related to something deployed yesterday?
[16:30:13] yeah.. the peaks in lvs2013 match thanos-swift traffic
[16:39:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[16:39:58] ok. time to drop everything else and look at this because clearly it's not a blip.
[16:46:24] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[17:04:39] ottomata: You still up for the eventproxy vcl push?
[17:04:51] hello!
[17:05:09] sorry about that
[17:05:32] we've never met in person or video (i think?) so I spent 5 minutes trying to remember who bcornwall was on IRC :0
[17:05:35] but hello!
[17:05:37] yes let's do it!
[17:05:44] haha, yeah, no prob
[17:06:03] okay i'll merge the first one and apply on cp1100 and test there
[17:06:05] ya?
[17:06:11] Sounds good, yeah
[17:11:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[17:11:41] brett: looks good! https://phabricator.wikimedia.org/T353817#10437923
[17:12:10] nice, yeah!
[17:12:13] okay to proceed with fleet?
[17:12:18] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1105078
[17:12:57] We're currently having some lvs weirdness - sukhe, you want us to hold off?
[17:14:56] brett: I don't think it should affect it but yeah probably best to wait in a way.
[17:15:28] ok
[17:15:50] ottomata: I'll keep you updated
[17:16:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[17:16:32] ty
[17:19:03] brett, ottomata: we got a new alert triggered on cp1100: NRPE: Command 'check_varnishkafka-eventlogging' not defined
[17:19:24] ack
[17:19:38] that might be resolved by running puppet on the icinga host
[17:20:18] doing
[17:23:02] oh
[17:23:11] ty
[17:24:52] yep, all green now
[17:28:45] thx <3
[17:34:02] Traffic, Commons, SRE: Backend fetch failed - https://phabricator.wikimedia.org/T383013#10438014 (Dzahn)
[17:35:09] Traffic, SRE: "Backend fetch failed" on edit save - https://phabricator.wikimedia.org/T382790#10438017 (Dzahn)
[17:38:48] sukhe: Now that we've figured out the source of the LVS stuff, mind if we push this change fleet-wide? I don't see it having any effect on the LVS issue.
[17:39:08] brett: yeah go for it
[17:39:09] gl
[17:41:05] okay!
[17:41:28] Reviewing the patch
[17:41:34] ty
[17:46:09] +1
[17:50:05] okay! merging.
[17:51:26] brett I suppose we should just wait for puppet to run? or should I do a cumin run-puppet-agent thing?
[17:51:26] Running puppet on the alerting hosts
[17:51:40] brett: iirc, that won't do anything until after puppet runs on the cache hosts
[17:51:48] i think it uses um...virtual or exported resources or something
[17:51:52] Good point
[17:52:03] Since I'm logged in to cumin I'll go ahead and deploy it
[17:52:07] okay ty!
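The ordering worked out above (run puppet on the cache hosts before the alerting hosts, because the monitoring checks are exported resources defined there and only collected on the alert side) could be sketched roughly as below. The cumin host aliases and batch size are illustrative assumptions, not commands from the log:

```shell
# 1. Merge the reviewed puppet change (done on the puppetmaster).
sudo puppet-merge

# 2. Apply it on the cache hosts first: this is where the exported
#    NRPE/monitoring resources get (re)defined.
#    'A:cp' and '-b 15' are hypothetical alias/batch values.
sudo cumin -b 15 'A:cp' 'run-puppet-agent'

# 3. Only then run puppet on the alerting hosts, which collect the
#    exported resources and regenerate the Icinga/NRPE check definitions.
sudo cumin 'A:alert' 'run-puppet-agent'
```

Running the alert-host agent first (as nearly happened at 17:51) would have been a no-op, since the exported resources it collects would not yet exist in PuppetDB.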
[17:52:15] puppet-merge running
[17:53:51] puppet-merge done
[17:53:56] brett: go ahead and cuminize
[17:54:07] ack, running now
[18:01:42] done, now running on alert*
[18:02:41] okay!
[18:04:19] alerts are gone
[18:04:39] we're golden!
[18:06:22] yes!
[18:06:28] things looking good!
[18:06:28] https://phabricator.wikimedia.org/T353817#10438164
[18:08:08] thank you brett! pending any unforeseen snafus, we should be done!
[18:08:11] hoooraayyy!
[18:08:18] Yay. Thanks for doing this :)
[18:08:50] thank youu!
[18:11:36] Traffic, SRE, Wikidata, Wikidata Dev Team, Performance Issue: Frequent 500 Errors and Timeouts When Adding Statements to New Properties - https://phabricator.wikimedia.org/T374230#10438174 (Dzahn)
[18:56:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[19:00:35] denisse: happened again, updated task. timing matches
[19:00:37] ^
[19:01:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[19:20:11] sukhe: Thanks for updating the task!
[20:07:59] Domains, Traffic, SRE: Register wiki(m|p)edia.ro - https://phabricator.wikimedia.org/T222080#10438607 (CRoslof) It took quite a while to go through the formal processes (after attempting to simply acquire them directly), but the Foundation now has control of `wikipedia.ro` and `wikimedia.ro`. They ar...
[20:29:20] Domains, Traffic, SRE, Patch-For-Review: Register wiki(m|p)edia.ro - https://phabricator.wikimedia.org/T222080#10438677 (Dzahn) Stalled→Open
[21:51:32] Domains, Traffic, SRE: Register wiki(m|p)edia.ro - https://phabricator.wikimedia.org/T222080#10438925 (BCornwall) Thank you, @CRoslof!