[00:06:16] (VarnishTrafficDrop) firing: Varnish traffic in esams has dropped 43.86614804838863% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[00:06:56] (HAProxyEdgeTrafficDrop) firing: (4) 16% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[00:11:16] (VarnishTrafficDrop) resolved: (8) Varnish traffic in codfw has dropped 22.67551285226461% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[00:11:56] (HAProxyEdgeTrafficDrop) resolved: (6) 23% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:00:31] not sure who in traffic would be best, but this is a simple one: https://gerrit.wikimedia.org/r/c/operations/puppet/+/852130
[09:55:11] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786 (Clement_Goubert)
[09:55:51] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Create mw-videoscaler helmfile deployment - https://phabricator.wikimedia.org/T321899 (Clement_Goubert) Open→Stalled Following https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/850095/comment/af29135f_66a53696/ We still hav...
[10:06:47] Traffic, SRE, Patch-For-Review: Upgrade HAProxy on cp nodes to 2.6.x LTS - https://phabricator.wikimedia.org/T321775 (Vgutierrez) Current config isn't valid for HAProxy 2.6.6: `vgutierrez@deployment-cache-text07:~$ sudo -i haproxy -f /etc/haproxy/haproxy.cfg -f /etc/haproxy/conf.d -c [NOTICE] (1505...
[11:07:58] hi -- I have two quick (unrelated) reviews for your eyes: https://gerrit.wikimedia.org/r/c/operations/dns/+/852132 https://gerrit.wikimedia.org/r/c/operations/puppet/+/852130
[11:27:39] godog: could you trigger a pcc run for 852130?
[11:28:43] totally, doing so now vgutierrez
[11:31:11] HTTPS, Traffic, SRE, serviceops, and 2 others: Get new edge & internal HTTPS certificates expanded to add wikifunctions.org and *.wikifunctions.org - https://phabricator.wikimedia.org/T313227 (Vgutierrez) DCs using the Let's Encrypt cert have the wikifunctions.org SNI available already: `vgutierr...
[11:31:43] {{done}}
[11:31:56] i.e. https://puppet-compiler.wmflabs.org/pcc-worker1002/37909/gerrit1001.wikimedia.org/index.html
[11:32:47] going to lunch, will take a look/merge later
[12:15:52] Domains, SRE: wikibase.org should redirect to wikiba.se - https://phabricator.wikimedia.org/T254957 (jbond) Open→Resolved a:jbond This task dosen't seem actionable and there have been no updates for some time, as such im going to close this but please feel free to update if there is any updat...
[12:57:03] netops, Cloud-VPS, Infrastructure-Foundations, Epic, cloud-services-team (Kanban): CloudVPS: network architecture - https://phabricator.wikimedia.org/T209460 (jbond)
[13:05:45] Traffic, Data Pipelines, Data-Engineering-Planning, Foundational Technology Requests, User-fgiunchedi: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (jbond)
[14:14:28] Traffic, SRE, Patch-For-Review: Clean up after ATS 9.x upgrade - https://phabricator.wikimedia.org/T321776 (Vgutierrez) Open→Resolved a:Vgutierrez ` vgutierrez@cumin1001:~$ sudo -i cumin 'A:cp' 'apt-cache policy trafficserver' 95 hosts will be targeted: cp[2027-2042].codfw.wmnet,cp[6001-6...
[14:14:36] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Package and deploy ATS 9.1.3 - https://phabricator.wikimedia.org/T309651 (Vgutierrez)
[14:22:44] Traffic, SRE, Patch-For-Review: Enterprise redirect for wikimediaenterprise.com to enterprise.wikimedia.com - https://phabricator.wikimedia.org/T321804 (Vgutierrez) Open→Stalled p:Triage→Medium new certs have been issued for ncredir to handle wikimediaenterprise.com traffic, those wil...
[14:44:01] Traffic, Observability-Alerting, Patch-For-Review: Drop the VarnishTrafficDrop and HAProxyEdgeTrafficDrop alerts - https://phabricator.wikimedia.org/T322220 (ssingh)
[15:01:48] Traffic, SRE, Patch-For-Review: Upgrade HAProxy on cp nodes to 2.6.x LTS - https://phabricator.wikimedia.org/T321775 (Vgutierrez)
[15:28:29] netops, Data-Services, Discovery-Search, Infrastructure-Foundations, and 4 others: Do not rate limit dumps from internal network - https://phabricator.wikimedia.org/T222349 (jbond)
[16:51:11] netops, Analytics-Radar, Ganeti, Infrastructure-Foundations: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (jbond)
[17:30:57] (HAProxyEdgeTrafficDrop) firing: 57% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[17:35:57] (HAProxyEdgeTrafficDrop) resolved: 60% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[18:32:27] netops, Analytics-Radar, Ganeti, Infrastructure-Foundations: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (MoritzMuehlenhoff) Removing the Ganeti tag, this is unrelated to Ganeti and only caused by ifupdown (and will eventually be solved by s...
[18:32:36] netops, Analytics-Radar, Infrastructure-Foundations: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (MoritzMuehlenhoff)
[18:57:57] hey traffic. I am stumbling across some lines in the DNS repo with comments like "CNAME here due to varnish puppetization woes", like they need to exist for technical reasons. I wonder if that is still true. Because there is a CNAME to a host I want to decom. https://gerrit.wikimedia.org/r/c/operations/dns/+/852272/1/templates/wmnet#208
[18:58:42] the "host name number on the left side" seems bad ;)
[18:59:06] since I grep for "phab2001" now and it's not just the target
[19:37:10] mutante: yeah I did some history digging on that
[19:37:31] I think we've moved on / refactored / whatever since back when we required that separate hostname to support phab's websockets
[19:37:47] I don't see any need for it now (for phab[12]001-aphlict), that I can find
[19:38:11] the main phabricator.wm.o hostname itself has websocket support, and I think before we had a need to differentiate it
[19:40:00] still double-checking a few things
[19:40:58] yeah the websockets part splits at the ats-be level currently, with:
[19:41:00] - type: map
[19:41:00] target: ws://phabricator.wikimedia.org
[19:41:00] replacement: wss://aphlict.discovery.wmnet
[19:41:41] templates/wmnet:aphlict 300 IN CNAME aphlict1001.eqiad.wmnet
[19:42:26] and aphlict1001 is a ganeti VM
[19:43:05] mutante: so yeah, delete it IMHO. There's no apparent ref to this internal hostname in puppet or elsewhere in DNS anyways.
[19:43:33] I assume the -vcs variants can go too, since I think you recently decommed the phab-git stuff?
[19:46:44] bblack: thank you very much for the research. yes, this is related to removing phab-git stuff. now it's about removing phab2001 the physical host
[19:47:18] as you know it's been removed from LVS and conftool-data but there is still the "vcs" IP on the interface there, on eth0. after I had removed the LVS IP from lo before
[19:47:43] and yes, aphlict is different but it only exists in eqiad so far
[19:47:50] I think I need to create aphlict2001
[19:48:25] I will amend to my patch to remove that instead of renaming it
[19:48:42] also about git-ssh, I still see https://config-master.wikimedia.org/pybal/codfw/git-ssh but I don't know if that ever goes away
[19:48:57] conftool already forgot about it..
[19:50:18] the websockets part makes sense.. it's one of the few things using wss://
[19:50:32] going for lunch, thanks
[19:52:11] mutante: yeah I suspect nothing automatically cleans up those "pybal" file outputs, but I'm not 100% sure
[19:53:37] bblack: some time I'll ask joe about that. here is the DNS change that would now remove phab2001-aphlict
[20:01:10] mutante: +1'd
[21:42:02] netops, Infrastructure-Foundations, SRE, ops-eqiad: Decommission eqiad cage WiFi - https://phabricator.wikimedia.org/T320962 (wiki_willy) a:Jclark-ctr