[06:45:55] netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 3 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (Joe) >>! In T300977#7899855, @jbond wrote: >>>! In T300977#7836272, @Volans wrote: >> If I may add my use case too, I w...
[07:25:47] new "carrier" object in PeeringDB - https://docs.peeringdb.com/howto/get-started-carrier/
[08:10:40] netops, Infrastructure-Foundations, SRE, User-jbond: Sporadic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (ayounsi)
[08:13:50] netops, Infrastructure-Foundations, SRE, User-jbond: Sporadic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (ayounsi) Thanks to o11y help, the dashboard is now much more usable. Most of the traffic dropped in iptables are RST packets, so it's now more than sporadic, see...
[08:20:18] netops, Infrastructure-Foundations, SRE, User-jbond: Sporadic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (ayounsi)
[08:36:35] netops, DBA, Data-Persistence, Infrastructure-Foundations, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (MoritzMuehlenhoff)
[08:38:49] netops, DBA, Data-Persistence, Infrastructure-Foundations, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (MoritzMuehlenhoff)
[09:04:02] netops, DBA, Data-Persistence, Infrastructure-Foundations, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui)
[09:04:42] netops, DBA, Data-Persistence, Infrastructure-Foundations, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui)
[09:52:25] netops, Infrastructure-Foundations, SRE, User-jbond: Sporadic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (fgiunchedi)
[09:53:38] XioNoX: That carrier thing is interesting, wonder will it take off
[09:54:05] Be cool if you could just look up a facility and find out all the carriers and services available there on peeringdb
[09:55:01] yeah
[09:55:24] it would not make the distinction between mrs1 and 2 though
[10:03:52] yeah.
[10:13:31] netops, Infrastructure-Foundations, SRE, User-jbond: Sporadic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (cmooney) One observation from the dashboard is that the RST's aren't very "sporadic" (as per title of this task). They seem fairly evenly distributed over time a...
[10:57:07] netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 3 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (jbond) @Joe thanks for the input >>! In T300977#8596499, @Joe wrote: > > This would break a lot of workflows, I t wo...
[11:00:28] CAS-SSO, Infrastructure-Foundations, SRE: Enable OIDC in CAS - https://phabricator.wikimedia.org/T311999 (jbond) sgmt just ping if/when you need more pointers
[11:14:23] netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 3 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (ayounsi) > I would maintain that it's more urgent to provide an artifact repository for having local npm/pypi/go packag...
[11:46:56] Puppet, Infrastructure-Foundations: systemd-timer puppet code triggers an execution when applying a schedule change - https://phabricator.wikimedia.org/T329158 (jcrespo)
[11:52:12] Puppet, Infrastructure-Foundations: systemd-timer puppet code triggers an execution when applying a schedule change - https://phabricator.wikimedia.org/T329158 (jcrespo)
[12:50:02] netops, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (taavi)
[12:50:10] netops, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (taavi) >>! In T133389#2230609, @BBlack wrote: > About constraints, rationales, and paths forward (some of thi...
[13:38:20] CAS-SSO, Data-Catalog, Data-Engineering, Infrastructure-Foundations, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 08): Switch DataHub authentication to OIDC - https://phabricator.wikimedia.org/T305874 (EChetty)
[13:39:10] CAS-SSO, Data-Catalog, Data-Engineering, Infrastructure-Foundations, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 08): Switch DataHub authentication to OIDC - https://phabricator.wikimedia.org/T305874 (EChetty) a:BTullis→Stevemunene
[13:44:39] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review, Technical-Debt: Convert all of our site.pp/roles to the role/profile paradigm - https://phabricator.wikimedia.org/T159412 (MoritzMuehlenhoff) >>! In T159412#8595266, @Dzahn wrote: > As far as I can tell nowadays there is no more node...
[13:58:26] SRE-tools, netops, Infrastructure-Foundations, SRE: Improve Homer output when Juniper device rejects config - https://phabricator.wikimedia.org/T328747 (Volans)
[14:30:38] netops, Infrastructure-Foundations, SRE, User-jbond: Sporadic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (jbond) >>! In T238823#8596949, @cmooney wrote: > > I've seen this half-duplex close quite often down through the years. Some firewalls do it when they see a FIN...
[14:36:03] topranks, XioNoX: we got a BGP RPKI alert email for a couple of prefixes... are you working on those? expected?
[14:38:34] lookin
[14:38:35] g
[14:40:17] might be a false positive
[14:40:22] https://rpki.cloudflare.com/?view=bgp&prefix=2620%3A%3A863%3A0%3A0%3A0%3A0%3A0%2F48 says expires in 2 years
[14:41:36] from PackerVis: "Possible TA malfunction: 100.00% of the ROAs disappeared from ARIN."
[14:41:40] the link in the email says 2023-02-10 09:00 UTC
[14:41:55] ah
[14:42:08] or a problem outside our control
[14:42:53] if you select "ARIN" in https://grafana-rw.wikimedia.org/d/UwUa77GZk/rpki?orgId=1&from=now-30m&to=now&viewPanel=29
[14:44:40] ~2k ROAs disappeared?
[15:04:52] it seems to fluctuate a bit, but that's bigger than normal
[15:05:01] there was an odd spike on Feb 1st and then a big drop too
[15:47:25] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (cmooney) p:Triage→Low
[15:47:58] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (cmooney)
[15:49:30] XioNoX, jbond: I noticed BGP alerter seems to be running out of memory every 10 mins
[15:49:33] https://phabricator.wikimedia.org/T329190
[15:50:36] seems like a simple case that the VM it's on needs to have more RAM
[15:51:56] indeed! good find
[15:54:37] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (cmooney)
[15:54:45] netops, Infrastructure-Foundations, SRE, User-jbond: Investigate the potential benefits of BGPalerter - https://phabricator.wikimedia.org/T230600 (cmooney)
[15:54:54] topranks: +1 sgtm
[16:05:51] Seems the best way to proceed is just to adjust the RAM allocation directly using ganeti cli?
[16:06:12] All hosts in that cluster have 30-50GB free RAM so I'd suggest we go to 4GB
[16:06:36] topranks: https://wikitech.wikimedia.org/wiki/Ganeti#Increase/Decrease_CPU/RAM
[16:06:59] you have to be on the master of the cluster
[16:07:02] volans: yep, thanks
[16:14:22] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=b43e2a20-f4d1-41c3-84c0-7923683997b4) set by cmooney@cumin1001 for 0:20:00 on 1 host(s) and their services with reas...
[16:38:36] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review, Technical-Debt: Convert all of our site.pp/roles to the role/profile paradigm - https://phabricator.wikimedia.org/T159412 (Dzahn) Ah yeah, I remember those now. Those that are including the "role::mediawiki::common" I gave up on that...
[16:47:22] CAS-SSO, Data-Catalog, Data-Engineering, Infrastructure-Foundations, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 08): Switch DataHub authentication to OIDC - https://phabricator.wikimedia.org/T305874 (BTullis)
[17:03:30] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (cmooney) I upgraded rpki1001 to 4GB RAM. Things looking stable now, service hasn't crashed. Used mem has settled down to about ~1.8GB. I'll take a look at rpki2002 shortly. {F36...
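Editor's sketch of the RAM change discussed above (not part of the log): a rough idea of what "adjust the RAM allocation directly using ganeti cli" could look like, assuming the standard `gnt-instance modify -B memory=...` backend parameter followed by a Ganeti-initiated reboot so the new size takes effect. The helper name and the instance FQDN below are hypothetical placeholders; the exact procedure used here is the one on the linked Wikitech page, which may differ. The helper only builds the command lines and does not run anything.

```python
import shlex


def ganeti_resize_commands(instance: str, memory: str) -> list[list[str]]:
    """Build (but do not run) the commands to change a Ganeti VM's RAM.

    Assumption: these are meant to be run on the cluster master, as noted
    in the chat. `instance` and `memory` are supplied by the caller.
    """
    return [
        # Set the instance's memory backend parameter, e.g. memory=4g.
        ["sudo", "gnt-instance", "modify", "-B", f"memory={memory}", instance],
        # Restart the instance through Ganeti so the new allocation applies.
        ["sudo", "gnt-instance", "reboot", instance],
    ]


if __name__ == "__main__":
    # "rpki1001.example.org" is a placeholder FQDN, not the real hostname.
    for argv in ganeti_resize_commands("rpki1001.example.org", "4g"):
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; on a real cluster the instance would typically be downtimed first, as the bot messages in the log show.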
[17:14:10] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=31176d14-7d44-4799-8369-4293e8a58f51) set by cmooney@cumin1001 for 0:15:00 on 1 host(s) and their services with reas...
[17:44:58] netops, Infrastructure-Foundations, SRE, User-jbond: Investigate the potential benefits of BGPalerter - https://phabricator.wikimedia.org/T230600 (cmooney)
[17:45:06] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (cmooney) Open→Resolved a:cmooney Change made on rpki2002 also and it seems happy. Closing task.
[17:45:23] SRE-tools, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: WMCS VIPs: Netbox netmask inconsistencies - https://phabricator.wikimedia.org/T295774 (aborrero) a:Volans hey @Volans I made the required changes on our side. Please verify if Netbox is happier now and close the ticket if so...
[17:45:40] SRE-tools, Infrastructure-Foundations, SRE: WMCS VIPs: Netbox netmask inconsistencies - https://phabricator.wikimedia.org/T295774 (aborrero)
[18:30:18] SRE-tools, Discovery-Search, Elasticsearch, Infrastructure-Foundations, Spicerack: elasticsearch spicerack module failes with most recent elastic-curator - https://phabricator.wikimedia.org/T328775 (bking) Thanks @jbond ! Looking at the Spicerack changelog, I also see that [[ https://gerrit....
[18:30:52] Puppet, Infrastructure-Foundations, SRE, Patch-For-Review, Technical-Debt: Convert all of our site.pp/roles to the role/profile paradigm - https://phabricator.wikimedia.org/T159412 (Dzahn) @Muehlenhoff Here was my attempt to fix the "mediawiki::common" ones: https://gerrit.wikimedia.org/r/c...