[07:47:53] Traffic, SRE, Patch-For-Review, Performance-Team (Radar), User-ema: Package and deploy Varnish 6.0.9 - https://phabricator.wikimedia.org/T298758 (MMandere) Open→Resolved a:MMandere We now have varnish upgraded from `6.0.8` to `6.0.9` in all our cache instances (across all datacent...
[11:00:52] cdanis: increasing it to 20 secs shows some improvement already: https://grafana.wikimedia.org/d/SKf0AM17z/envoy-drill-down?orgId=1&refresh=30s&viewPanel=9&from=now-1h&to=now
[13:04:17] netops, Infrastructure-Foundations, Observability-Metrics, SRE, Patch-For-Review: add traceroute measurements to RIPE Atlas prometheus data - https://phabricator.wikimedia.org/T251156 (ayounsi) Open→Resolved a:CDanis→ayounsi This is done, opened T299640 for further improvements.
[15:38:13] hi everybody, created https://gerrit.wikimedia.org/r/c/operations/puppet/+/755715 to set the inference VIP to production stage :)
[15:38:18] lemme know if it is ok to proceed
[16:00:45] I just realized that the paging is set to false (as expected), going to merge it since it shouldn't cause troubles :)
[16:27:40] elukey: ack
[16:34:19] elukey: bblack: PROBLEM - LVS inference eqiad port 30443/tcp - Inference ML service IPv4 on inference.svc.eqiad.wmnet is CRITICAL: TCP CRITICAL - Invalid hostname
[16:34:47] typo in hostname / DNS maybe?
[16:35:18] gooood :(
[16:36:07] I was about to merge the discovery entry for the DNS repo
[16:36:35] ah, I feel you, I just went through that myself the first time ever a few days ago
[16:36:38] nod
[16:37:11] it is weird though, the svc endpoint works
[16:37:49] "invalid hostname" is the same as "can't find it at all" .. or ... "it's not in some list of valid host names"
[16:38:20] yes, I can resolve it in DNS
[16:38:24] I am in a meeting but will fix it in max 20 mins :(
[16:38:55] feels like it is missing in a list somewhere that makes it a valid name
[16:39:07] ACK, I'll make the alert "handled"
[16:39:20] thanks!
[16:41:34] AHA, so Icinga is saying "inference.discovery.wmnet is an invalid hostname" and that is correct, it does not exist yet, not merged yet
[16:42:03] it just happens to be that this service check is on the hosts "inference.svc.eqiad.wmnet" and the same for codfw
[16:42:49] so that makes more sense.. it is just saying "you haven't merged yet" basically, it's not claiming the svc records are not ok, just the discovery records. probably fine once you merged later
[16:49:47] ah good! So in theory https://gerrit.wikimedia.org/r/c/operations/dns/+/730541 should fix it
[16:50:24] yes yes now I see the error msg
[16:50:29] it was confusing at first, thanks mutante
[16:57:54] same here, had to see it on web instead of IRC and it became more clear
[16:57:57] np
[17:03:44] just merged and re-scheduled the checks, hopefully they will clear in a sec
[17:03:58] the new discovery endpoint is ready
[17:04:08] ;)
[18:55:57] Traffic, Foundational Technology Requests, SRE, Wikimedia Enterprise, Wikimedia Enterprise Discussion: Allow-Listing for Enterprise IPs - https://phabricator.wikimedia.org/T294798 (RBrounley_WMF) In progress→Resolved
[20:58:01] Traffic, SRE, Patch-For-Review, SRE Observability (FY2021/2022-Q3): Remove legacy ELK LVS entries - https://phabricator.wikimedia.org/T299700 (herron)
[21:00:41] Wikimedia-Apache-configuration, SRE, serviceops: Build a black-box httpd testing framework - https://phabricator.wikimedia.org/T236699 (RLazarus)
[23:05:03] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (Jclark-ctr)
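The "Invalid hostname" alert discussed in the 16:34 to 17:04 exchange turned out to be a name-resolution failure for inference.discovery.wmnet (whose DNS record was not merged yet), not a failed TCP connection to the inference.svc endpoints. The Python below is a minimal sketch of that distinction only. `check_tcp_sketch` is a hypothetical helper, not the real Icinga/Monitoring Plugins `check_tcp`, and its message strings merely imitate the alert text seen in the log.

```python
import socket


def check_tcp_sketch(host: str, port: int, timeout: float = 5.0) -> tuple[str, str]:
    """Hypothetical TCP check that separates DNS failures from connect failures.

    Illustration only: this is NOT the actual Icinga/Monitoring Plugins code.
    """
    try:
        # Step 1: resolve the name. A record that does not exist yet
        # (e.g. an unmerged discovery entry) fails here.
        addrinfo = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "CRITICAL", "TCP CRITICAL - Invalid hostname"

    family, socktype, proto, _canonname, sockaddr = addrinfo[0]

    # Step 2: the name resolved, so try to open a TCP connection.
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError as exc:  # refused, unreachable, timed out, ...
        return "CRITICAL", f"TCP CRITICAL - {exc}"

    return "OK", f"TCP OK - connected to {host}:{port}"
```

For example, `check_tcp_sketch("inference.svc.eqiad.wmnet", 30443)` would exercise both steps from inside the network, while a name with no DNS record stops at step 1. That matches the observation that the svc endpoint worked while the check still complained until the DNS change was merged.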