[00:59:42] Traffic, DNS, SRE: benefactors.wikimedia.org should point somewhere better then the wikimedia.org homepage - https://phabricator.wikimedia.org/T367012 (Pppery) NEW
[01:24:45] Wikimedia-Apache-configuration: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013 (Pppery) NEW
[01:25:02] Wikimedia-Apache-configuration: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9874024 (Pppery) p:Triage→Low
[01:31:20] Traffic, DNS, SRE: Cleanup unused DNS subdomains - https://phabricator.wikimedia.org/T367012#9874025 (Pppery)
[01:32:04] Traffic, DNS, SRE: Remove iegreview.wikimedia.org from DNS - https://phabricator.wikimedia.org/T367011#9874028 (Pppery) In for a penny, in for a pound - I tested every wikimedia.org subdomain and filed T367012 and T367013
[01:45:40] Wikimedia-Apache-configuration, Internet-Archive: Change redirect target of sep11.wikipedia.org - https://phabricator.wikimedia.org/T367014 (Pppery) NEW
[01:46:37] Wikimedia-Apache-configuration: Change redirect target of sep11.wikipedia.org - https://phabricator.wikimedia.org/T367014#9874046 (Pppery)
[03:32:37] Wikimedia-Apache-configuration, MediaWiki-Documentation, serviceops, Documentation, Patch-Needs-Improvement: Repair "svn.wikimedia.org/doc/" redirect for doc.wikimedia.org - https://phabricator.wikimedia.org/T109950#9874068 (Pppery)
[03:38:48] Wikimedia-Apache-configuration, TestMe: wikipedia.org violates RFC2616: it breaks connections while having 'Connection: keep-alive' set - https://phabricator.wikimedia.org/T85191#9874072 (Pppery)
[03:39:36] Wikimedia-Apache-configuration, MediaWiki-Documentation, serviceops, Documentation, Patch-Needs-Improvement: Repair "svn.wikimedia.org/doc/" redirect for doc.wikimedia.org - https://phabricator.wikimedia.org/T109950#9874070 (Pppery) I think more went inactive generally than gave up on this sp...
[03:40:33] Wikimedia-Apache-configuration, Documentation: https://www.wikimedia.org/api/ links don't work - https://phabricator.wikimedia.org/T203155#9874074 (Pppery) Open→Declined https://www.wikimedia.org/api/ is now a 404, which makes sense.
[07:10:15] Traffic, Content-Transform-Team, MW-Interfaces-Team, RESTBase Sunsetting: Remove long term caching and active purging for Parsoid endpoints in RESTBase - https://phabricator.wikimedia.org/T365630#9874286 (Joe) Relaying what I said in a meeting: I think given the caching numbers I think it makes s...
[08:49:19] Traffic, Content-Transform-Team, MW-Interfaces-Team, RESTBase Sunsetting: Remove long term caching and active purging for Parsoid endpoints in RESTBase - https://phabricator.wikimedia.org/T365630#9874480 (daniel) PR on Github: https://github.com/wikimedia/restbase/pull/1345
[09:59:12] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Netbox network report failing - timeout errors - https://phabricator.wikimedia.org/T321704#9874622 (Volans) I've took a look today and trying to manually run all the tests there isn't anyone that takes so long to trigger the 300s timeout,...
[11:29:43] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Netbox network report failing - timeout errors - https://phabricator.wikimedia.org/T321704#9874878 (cmooney) >>! In T321704#9874622, @Volans wrote: > I've took a look today and trying to manually run all the tests there isn't anyone that...
[11:50:01] FIRING: PurgedHighEventLag: High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5018 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[11:55:01] FIRING: [10x] PurgedHighEventLag: High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[12:00:01] FIRING: [14x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[12:05:01] FIRING: [16x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[12:33:29] netops, Infrastructure-Foundations, SRE: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977#9875143 (cmooney)
[12:44:57] Wikimedia-Apache-configuration: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9875168 (Nemoralis)
[12:50:01] FIRING: [18x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[12:55:01] FIRING: [16x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[12:58:15] hey, I'd like to deploy an LVS config change (https://gerrit.wikimedia.org/r/c/operations/puppet/+/941459), but it's been a while since I last did one. the workflow these days is to just merge the change, run puppet on the affected lvs hosts and then run the restart-pybal cookbook, right?
[13:04:47] Wikimedia-Apache-configuration, serviceops: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9875251 (Aklapper)
[13:04:47] Wikimedia-Apache-configuration, serviceops: Change redirect target of sep11.wikipedia.org - https://phabricator.wikimedia.org/T367014#9875252 (Aklapper)
[13:05:24] taavi: yes, that should be it. we are here if required. there are aliases you can use such as A:lvs-low-traffic-eqiad to get the low traffic hosts in eqiad and A:lvs-secondary-eqiad to get the backup in eqiad, etc
[13:05:46] Traffic, Patch-For-Review: Use IPIP encapsulation on lvs<-->text cluster - https://phabricator.wikimedia.org/T366466#9875257 (Vgutierrez)
[13:07:28] moritzm: re https://gerrit.wikimedia.org/r/c/operations/puppet/+/1035724, do you know if it is possible to set outerface to several ifaces in 1 rule or should I create 1 rule per outerface?
[13:08:27] moritzm: also.. if I provide a list of IPv4 endpoints using ferm's @ipfilter function to a ferm rule with domain set to ip6 is it gonna fail or it would be an effective NOOP?
[13:08:28] vgutierrez: the normal () syntax should work fine, `outerface (iface1 iface2 iface3)` etc
[13:09:09] taavi: oh.. I failed to see that on the manpage
[13:14:14] Wikimedia-Apache-configuration, serviceops: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9875279 (akosiaris) I am not sure what this task asks to be honest. Care to add a bit more information as to what the problem is?
[13:17:56] Wikimedia-Apache-configuration, serviceops: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9875292 (Aklapper) https://2030.wikimedia.org currently redirects to https://meta.wikimedia.org/wiki/Wikimedia_2030 but should redirect to https://meta.wikimedia.org/wiki/Moveme...
[13:18:07] so something like " outerface (ens13 lo) saddr @ipfilter((208.80.154.232 2620:0:861:ed1a::9)) proto tcp sport (443 80) tcp-flags (SYN) SYN TCPMSS set-mss 1440;" should work
[13:18:14] vgutierrez: for the second part, I'm pretty sure it would fail, but best to try on an sretest host for a more authoritative answer
[13:19:36] sukhe: the cookbook didn't seem to like me today: https://phabricator.wikimedia.org/P64533
[13:20:05] oh wow
[13:20:09] Traffic, SRE, SRE-swift-storage: Rise in ms-fe2* TCP retransmits since 11:40 UTC today - https://phabricator.wikimedia.org/T367056 (MatthewVernon) NEW
[13:20:53] taavi: are you running the cookbook from cumin1002?
[13:20:58] yes
[13:21:09] fetching the metrics directly works, so at least that's that
[13:21:31] taavi: what was the cookbook command?
[13:21:35] yeah.. 0.839s
[13:21:42] taavi@cumin1002 ~ $ sudo cookbook sre.loadbalancer.restart-pybal --query "A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad" --reason "renaming labweb conftool pools to cloudweb"
[13:22:21] hmmm
[13:22:23] Active: active (running) since Mon 2024-06-10 13:18:22 UTC; 3min 31s ago
[13:22:26] that is correct indeed
[13:22:41] pybal got restarted though
[13:22:43] ValueError: hosts already recorded successful: lvs1019.eqiad.wmnet
[13:22:47] i wonder if the pybal web server took a bit longer to come up than what the cookbook expected
[13:23:43] I wouldn't be surprised... 
pybal start up time on low-traffic LVS is quite high compared to high-traffic ones
[13:24:18] we can try increasing this https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/loadbalancer/restart-pybal.py#L62
[13:24:44] incrementing tries sounds good to me
[13:25:01] FIRING: [17x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:25:17] Wikimedia-Apache-configuration, serviceops: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9875314 (akosiaris) And I am still not sure. I assume that the double redirect is considered a problem? If yes, why? Alternatively, is there some intent to remove the redire...
[13:25:32] I will patch and increase timeout
[13:30:01] FIRING: [20x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:35:01] RESOLVED: [29x] PurgedHighEventLag: High event process lag with purged on cp5017:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:43:57] sukhe: I'll retry the cookbook now?
[13:44:08] taavi: yes, please try
[13:47:23] the cookbook passed this time
[13:47:27] thank you!
[13:47:53] nice, thanks!
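(Editor's note: the exchange above comes down to a readiness poll giving up before a slow-starting service was reachable, fixed by allowing more tries. Below is a minimal sketch of that pattern, not the code in restart-pybal.py. The function name `wait_until_ready`, its parameters and the default values are all hypothetical.)

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    tries: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll check() up to `tries` times, pausing `delay` seconds between attempts.

    Returns the 1-based number of the attempt that succeeded and raises
    TimeoutError if every attempt fails. All names here are illustrative.
    """
    for attempt in range(1, tries + 1):
        if check():
            return attempt
        # Do not sleep after the final failed attempt; fail right away instead.
        if attempt < tries:
            sleep(delay)
    raise TimeoutError(f"service not ready after {tries} tries")
```

(With this shape, a service that needs longer to come up is accommodated by raising `tries` or `delay` at the call site, which matches the "incrementing tries" suggestion in the log.)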
[14:25:37] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f7-eqiad - https://phabricator.wikimedia.org/T365984#9875546 (cmooney) p:Triage→Medium
[14:59:50] netops, Traffic, Infrastructure-Foundations, SRE, SRE-swift-storage: Rise in ms-fe2* TCP retransmits since 11:40 UTC today - https://phabricator.wikimedia.org/T367056#9875713 (MatthewVernon)
[15:13:46] netops, Infrastructure-Foundations, SRE: Juniper QFX5120 error logs on lsw1-e1 and lsw1-f1: Failed to get ifl for ifl index - https://phabricator.wikimedia.org/T325801#9875780 (cmooney) Open→Resolved We seem to have no such errors being logged any more, either from these switches or the d...
[15:34:06] Traffic, collaboration-services, Release-Engineering-Team, SRE, Patch-For-Review: Move GitLab behind the CDN - https://phabricator.wikimedia.org/T366882#9875862 (LSobanski) p:Triage→High
[16:01:33] moritzm: hmm sretest hosts don't have ferm installed at the moment, I guess I should puppetize that or just disable puppet and hack it?
[16:02:39] ah yes, they are on nftables actually
[16:02:44] if needed hack + reimage I think works for us
[16:02:49] maybe use pybal-test2003 instead?
[16:11:55] pybal hosts shouldn't have firewall at all :)
[16:12:49] I got some WCMS hosts with ferm in place
[16:12:56] *WMCS
[16:13:01] should that should work
[16:13:08] arg.. *so that should work
[16:18:20] vgutierrez: should we consider removing the extra links on the high-traffic LVS once the IPIP migration is done?
[16:20:28] not yet... 
I just noticed that high-traffic2 has some stuff in there at least in eqiad, some ldap and cloudelastic services :_)
[16:21:03] ok cool, yeah I guess no major pressure to do that, thought just hit me with your update :)
[16:21:19] happy to migrate cloudelastic to IPIP as a test case, it is not considered a production service
[16:27:21] moritzm: this works as expected, NOOP for IPv6 and expected rules for IPv4 https://www.irccloud.com/pastebin/GmBnR8kh/
[16:28:54] iptables && ip6tables -L FORWARD output https://www.irccloud.com/pastebin/jiNd8mRJ/
[16:28:57] nice
[16:34:37] inflatador: I'll ping you as soon as I battle test ferm based MSS clamping on ncredir
[17:34:43] vgutierrez ACK, thanks for the update
[18:01:11] Traffic, DC-Ops, ops-ulsfo, SRE: Q4: install PCIe NVMe SSDs into ulsfo text cp40(3[789]|4[01234] - https://phabricator.wikimedia.org/T364891#9876800 (BCornwall)
[18:29:07] Traffic, DC-Ops, ops-ulsfo, SRE: Q4: install PCIe NVMe SSDs into ulsfo text cp40(3[789]|4[01234] - https://phabricator.wikimedia.org/T364891#9876950 (BCornwall)
[18:29:56] Traffic, DC-Ops, ops-ulsfo, SRE: Q4: install PCIe NVMe SSDs into ulsfo text cp40(3[789]|4[01234] - https://phabricator.wikimedia.org/T364891#9876951 (BCornwall)
[18:30:30] Traffic, DC-Ops, ops-ulsfo, SRE: Q4: install PCIe NVMe SSDs into ulsfo text cp40(3[789]|4[01234] - https://phabricator.wikimedia.org/T364891#9876952 (BCornwall)
[21:13:02] Traffic, DNS, SRE, Patch-For-Review: Remove iegreview.wikimedia.org from DNS - https://phabricator.wikimedia.org/T367011#9877497 (Dzahn) Open→Resolved a:Dzahn thanks for reporting. removed. Host iegreview.wikimedia.org not found: 3(NXDOMAIN)
[21:17:01] Wikimedia-Apache-configuration, serviceops: Change redirect target of sep11.wikipedia.org - https://phabricator.wikimedia.org/T367014#9877520 (Dzahn) It would make sense to me to link to a specific version rather than a list of snapshots. 
But I disagree that it should have a relation to www.sep11memorie...
[21:26:25] someone asked what "https://cache.wikimedia.org" does since it appears to just link to the default domain page. I used git blame and went back like 10 changes until I reached an edit by "root" in 2012 that isn't in Gerrit :)
[21:27:23] so naturally I wonder.. will we break everything by deleting that?:)
[21:27:58] it's a CNAME for dyna.wikimedia.org, before it was for text-lb and so on
[21:32:05] don't we still have svn history somewhere?
[21:34:34] I assume it wouldn't break anything, but it would be cool just to know
[21:35:01] my guess would be, it was a test hostname from when someone was first implementing either varnish or squid as a caching revproxy in front of MediaWiki
[21:36:04] Traffic, DNS, SRE: Cleanup unused DNS subdomains - https://phabricator.wikimedia.org/T367012#9877639 (Dzahn) cache.wikimedia.org goes so far back in history that I reached 2012 when using git blame and the change before that was made by root and isn't in gerrit anymore. langcom.wikimedia.org - same...
[21:38:11] https://phabricator.wikimedia.org/diffusion/SVN/browse/trunk/
[21:39:30] Traffic, DNS, SRE: Cleanup unused DNS subdomains - https://phabricator.wikimedia.org/T367012#9877666 (taavi) >>! In T367012#9877639, @Dzahn wrote: > langcom.wikimedia.org - same. It was already there in an initial import in 2011. Apparently there once was a `langcomwiki` which was [[ https://gerrit....
[21:41:52] Traffic, DNS, SRE: Cleanup unused DNS subdomains - https://phabricator.wikimedia.org/T367012#9877668 (Dzahn) pk.wikimedia.org was added in 2013 in https://gerrit.wikimedia.org/r/c/operations/dns/+/86650 to add a redirect but in 2023 the redirect was removed in https://gerrit.wikimedia.org/r/c/operati...
[21:42:37] https://phabricator.wikimedia.org/diffusion/SVN/browse/trunk/debs/wikimedia-task-dns-auth/ is kinda fascinating. 
our dns zone update stuff from back in powerdns days
[21:42:41] but I didn't find actual zonefiles yet
[21:43:10] bblack: I got stuck at the (seemingly broken) redirect from svn.wikimedia.org into diffusion, but there it is :) yay
[21:44:50] bblack: I think we did DNS changes as root on zwinger.wikipedia.org :)
[21:44:53] https://wikitech.wikimedia.org/w/index.php?title=DNS&oldid=13066
[21:45:28] once you get to bomis.com you have found the bottom of the rabbit hole, lol
[21:46:06] https://wikitech.wikimedia.org/w/index.php?title=DNS&oldid=13071 :)
[21:47:03] "Edit one of these files on zwinger, not forgetting to increment the SOA, and then restart with /etc/init.d/named restart. "
[21:50:06] no SVN mentioned here: https://wikitech.wikimedia.org/w/index.php?title=DNS&oldid=13089#Changing_records_in_a_zonefile fun :)
[21:52:42] https://wikitech.wikimedia.org/w/index.php?title=DNS&diff=prev&oldid=13131
[21:52:56] ^ this seems to be the first history on that page where "svn" is mentioned at all
[21:53:05] so somewhere around there, it was finally in version control somewhere
[21:53:35] so, it wasn't very long from "zonefiles are in svn somewhere at all" to "migrate to git"
[21:56:06] I remember Ben Hartshorne.
[21:56:18] probably he was tasked with moving DNS templates to SVN
[21:56:38] https://wikitech.wikimedia.org/w/index.php?title=DNS&diff=prev&oldid=83991
[21:56:49] ^ somewhere around here, a few rob changes about svn->git
[21:57:38] and then Gerrit appeared in 2013:) yea
[22:04:45] https://rt.wikimedia.org/Ticket/Display.html?id=2689 - #2689: migrate pdns zone files from svn to git/gerrit
[22:05:02] I was hoping for one about moving it to svn :)
[22:06:50] "Last week, turning off the recursive DNS server on dobson, made the entire site crash."
[22:33:27] Wikimedia-Apache-configuration, serviceops: Change redirect target of sep11.wikipedia.org - https://phabricator.wikimedia.org/T367014#9877807 (Pppery) From ~2007 to August 2015 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/225043) sep11.wikipedia.org was a redirect to sep11memories.org. That's wh...
[22:58:08] Wikimedia-Apache-configuration, serviceops: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9877831 (Pppery) It's not a big problem, (hence why I triaged this as low priority) and there are no plans to do anything with the redirect. But it would still be nice to keep t...
[23:14:26] Wikimedia-Apache-configuration, serviceops: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9877857 (Dzahn) We have changed this a couple times now from 2017... T158981 to 2020... -> T202498 -> to 2030 ... T264797 the Wikimedia_2030 on meta was the last one requeste...