[06:45:57] netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 3 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (Joe) >>! In T300977#7899855, @jbond wrote: >>>! In T300977#7836272, @Volans wrote: >> If I may add my use case too, I w...
[08:10:42] netops, Infrastructure-Foundations, SRE, User-jbond: Sporadic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (ayounsi)
[08:13:52] netops, Infrastructure-Foundations, SRE, User-jbond: Sporadic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (ayounsi) Thanks to o11y help, the dashboard is now much more usable. Most of the traffic dropped in iptables are RST packets, so it's now more than sporadic, see...
[08:20:20] netops, Infrastructure-Foundations, SRE, User-jbond: Sporadic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (ayounsi)
[08:28:55] Traffic, DBA, Data-Engineering, Data-Persistence, and 9 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[08:29:11] Traffic, DBA, Data-Engineering, Data-Persistence, and 9 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui) We'll depool eqiad I would assume? cc @Joe @akosiaris We'd still need to switchover m1 master (we do have m1 databases but I guess we are not swit...
[08:36:37] Traffic, netops, DBA, Data-Persistence, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (MoritzMuehlenhoff)
[08:38:51] Traffic, netops, DBA, Data-Persistence, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (MoritzMuehlenhoff)
[08:41:33] Traffic, DBA, Data-Engineering, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[09:04:04] Traffic, netops, DBA, Data-Persistence, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui)
[09:04:44] Traffic, netops, DBA, Data-Persistence, and 9 others: codfw row B switches upgrade - https://phabricator.wikimedia.org/T327991 (Marostegui)
[09:32:33] Traffic, Data-Engineering-Planning, Observability-Alerting, SRE, Shared-Data-Infrastructure: Reduce/eliminate false positives for VarnishKafkaNoMessages alert - https://phabricator.wikimedia.org/T324522 (nfraison) a:BTullis→nfraison
[09:42:03] Traffic, DBA, Data-Engineering, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (akosiaris) eqiad will still be depooled for this one. The current timeline for repooling eqiad is on March 8th, 1 day after the proposed timeline on this task.
[09:45:46] Traffic, DBA, Data-Engineering, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[09:52:26] netops, Infrastructure-Foundations, SRE, User-jbond: Sporadic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (fgiunchedi)
[10:02:04] Traffic, DBA, Data-Engineering, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[10:13:33] netops, Infrastructure-Foundations, SRE, User-jbond: Sporadic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (cmooney) One observation from the dashboard is that the RST's aren't very "sporadic" (as per title of this task). They seem fairly evenly distributed over time a...
[10:21:41] Traffic, SRE: Upgrade HAProxy on cp nodes to 2.6.x LTS - https://phabricator.wikimedia.org/T321775 (Vgutierrez) a:Vgutierrez 2.6.6 has been running as expected since the experiment started, next week we plan to upgrade the whole CDN
[10:27:00] Traffic, DBA, Data-Engineering, Data-Persistence, and 10 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (Marostegui)
[10:57:07] netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 3 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (jbond) @Joe thanks for the input >>! In T300977#8596499, @Joe wrote: > > This would break a lot of workflows, I t wo...
[11:14:25] netops, Data-Engineering, Infrastructure-Foundations, Product-Analytics, and 3 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (ayounsi) > I would maintain that it's more urgent to provide an artifact repository for having local npm/pypi/go packag...
[12:50:04] netops, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (taavi)
[12:50:12] netops, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (taavi) >>! In T133389#2230609, @BBlack wrote: > About constraints, rationales, and paths forward (some of thi...
[13:43:28] Traffic, Data-Engineering-Planning, Observability-Alerting, SRE, and 2 others: Reduce/eliminate false positives for VarnishKafkaNoMessages alert - https://phabricator.wikimedia.org/T324522 (EChetty)
[13:58:28] netops, Infrastructure-Foundations, SRE, SRE-tools: Improve Homer output when Juniper device rejects config - https://phabricator.wikimedia.org/T328747 (Volans)
[14:30:40] netops, Infrastructure-Foundations, SRE, User-jbond: Sporadic RST drops in the ulogd logs - https://phabricator.wikimedia.org/T238823 (jbond) >>! In T238823#8596949, @cmooney wrote: > > I've seen this half-duplex close quite often down through the years. Some firewalls do it when they see a FIN...
[14:45:37] Traffic, SRE: Upgrade HAProxy on cp nodes to 2.6.x LTS - https://phabricator.wikimedia.org/T321775 (MoritzMuehlenhoff) >>! In T321775#8596987, @Vgutierrez wrote: > 2.6.6 has been running as expected since the experiment started, next week we plan to upgrade the whole CDN We should upgrade to 2.6.8, thou...
[14:49:08] Traffic, SRE: Upgrade HAProxy on cp nodes to 2.6.x LTS - https://phabricator.wikimedia.org/T321775 (Vgutierrez) That already happened along the 2.4.21 upgrade
[15:05:14] Traffic, SRE: Upgrade HAProxy on cp nodes to 2.6.x LTS - https://phabricator.wikimedia.org/T321775 (MoritzMuehlenhoff) >>! In T321775#8597953, @Vgutierrez wrote: > That already happened along the 2.4.21 upgrade Yes, that's my point. We fixed CVE-2023-0056 with the upgrade to 2.4.21, so moving to 2.6.6 w...
[15:25:25] Traffic, SRE: Upgrade HAProxy on cp nodes to 2.6.x LTS - https://phabricator.wikimedia.org/T321775 (Vgutierrez) yeah.. I meant that along upgrading the 2.4 hosts to 2.4.21 I also updated the 2.6 ones to 2.6.8 :)
[15:47:27] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (cmooney) p:Triage→Low
[15:48:00] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (cmooney)
[15:54:39] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (cmooney)
[15:54:47] netops, Infrastructure-Foundations, SRE, User-jbond: Investigate the potential benefits of BGPalerter - https://phabricator.wikimedia.org/T230600 (cmooney)
[16:14:24] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=b43e2a20-f4d1-41c3-84c0-7923683997b4) set by cmooney@cumin1001 for 0:20:00 on 1 host(s) and their services with reas...
[17:03:32] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (cmooney) I upgraded rpki1001 to 4GB RAM. Things looking stable now, service hasn't crashed. Used mem has settled down to about ~1.8GB. I'll take a look at rpki2002 shortly. {F36...
[17:14:11] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=31176d14-7d44-4799-8369-4293e8a58f51) set by cmooney@cumin1001 for 0:15:00 on 1 host(s) and their services with reas...
[17:45:00] netops, Infrastructure-Foundations, SRE, User-jbond: Investigate the potential benefits of BGPalerter - https://phabricator.wikimedia.org/T230600 (cmooney)
[17:45:08] netops, Infrastructure-Foundations, SRE: BGPalerter crashing every 10 mins - https://phabricator.wikimedia.org/T329190 (cmooney) Open→Resolved a:cmooney Change made on rpki2002 also and it seems happy. Closing task.
[17:47:47] Traffic, DBA, Data-Engineering, Data-Persistence, and 9 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (BTullis)
[18:00:30] Traffic, DBA, Data-Engineering, Data-Persistence, and 9 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (BTullis)
[18:06:01] HTTPS, SRE, Traffic-Icebox: Enable HSTS on store.wikimedia.org for HTTPS - https://phabricator.wikimedia.org/T128559 (SHust) @BCornwall would you mind emailing me with more information about what is needed here so I can better understand + contact the right Shopify representative? Thanks in advance!...
[18:15:02] HTTPS, SRE, Traffic-Icebox: Enable HSTS on store.wikimedia.org for HTTPS - https://phabricator.wikimedia.org/T128559 (Dzahn) See this previous comment from Brandon at T128559#3440144
[18:40:30] HTTPS, SRE, Traffic-Icebox: Enable HSTS on store.wikimedia.org for HTTPS - https://phabricator.wikimedia.org/T128559 (BCornwall) Stalled→In progress
[18:40:36] HTTPS, Traffic, SRE, Tracking-Neverending: HTTPS Plans (tracking / high-level info) - https://phabricator.wikimedia.org/T104681 (BCornwall)
[18:49:34] HTTPS, SRE, Traffic-Icebox: Enable HSTS on store.wikimedia.org for HTTPS - https://phabricator.wikimedia.org/T128559 (BCornwall) @SHust, as @Dzahn pointed out, it would be best if we keep this all in one place. We specifically need the `preload` and `includeSubDomains` attributes added to the `Stric...
[20:02:04] HTTPS, SRE, Traffic-Icebox: Enable HSTS on store.wikimedia.org for HTTPS - https://phabricator.wikimedia.org/T128559 (SHust) I'll do as suggested and keep all communications here, thanks @Dzahn. @BCornwall I emailed the Shopify rep that takes care of our account and will update this thread as soon a...
[20:17:14] HTTPS, SRE, Traffic-Icebox: Enable HSTS on store.wikimedia.org for HTTPS - https://phabricator.wikimedia.org/T128559 (Dzahn) >>! In T128559#3440144, @BBlack wrote: > It seems like Shopify has been making some improvements on this front since we last checked. > .. > The help page doesn't indicate whe...
[20:54:23] HTTPS, SRE, Traffic-Icebox: Enable HSTS on store.wikimedia.org for HTTPS - https://phabricator.wikimedia.org/T128559 (SHust) The additional information did help clear things up, thanks @Dzahn. I'm also glad to see that the test results have improved. I have added a substantial amount of information a...
[22:14:32] HTTPS, Diff-blog, SRE, Technical Blog, Traffic-Icebox: Send HSTS header on all Wordpress VIP-hosted domains - https://phabricator.wikimedia.org/T270034 (BCornwall)
[22:17:37] HTTPS, Diff-blog, SRE, Technical Blog: Send HSTS header on all Wordpress VIP-hosted domains - https://phabricator.wikimedia.org/T270034 (BCornwall)
[23:14:44] HTTPS, Diff-blog, SRE, Technical Blog: Send HSTS header on all Wordpress VIP-hosted domains - https://phabricator.wikimedia.org/T270034 (Dzahn) >>! In T270034#6701977, @RLazarus wrote: > Thanks @Varnent for offering to look at this Any plans to still do that?