[02:14:13] serviceops, Performance-Team, SRE, SRE-swift-storage, and 3 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling) > Observe cross-DC database connection rate, analyse sources It's not necessary to use tcpdump since we can just look at SSL connection counts...
[03:15:58] serviceops, Performance-Team, SRE, SRE-swift-storage, and 3 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling) I made this [[https://grafana-rw.wikimedia.org/d/6fLyZKG4k/all-clusters-utilization|all clusters utilization]] dashboard so that I could easily...
[03:17:49] serviceops, Performance-Team, SRE, SRE-swift-storage, and 3 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling)
[03:55:31] serviceops, Performance-Team, SRE, SRE-swift-storage, and 3 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling) MySQL cross-DC traffic is higher than expected, with 110 conns/s. Appserver CPU usage is fine. Mcrouter connection rates are fine.
[04:24:26] serviceops, Performance-Team, SRE, SRE-swift-storage, and 3 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling)
[04:29:47] serviceops, Performance-Team, SRE, SRE-swift-storage, and 3 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling) I captured cross-DC queries on the s3 master (db1157) using SHOW PROCESSLIST in a loop, once per second for 20 minutes. Out of 10 captured quer...
[04:32:26] serviceops, Performance-Team, SRE, SRE-swift-storage, and 3 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling) The serverIsReadOnly() cache key includes the DB hostname, so I should have done my calculation per section rather than globally.
[04:37:47] serviceops, Performance-Team, SRE, SRE-swift-storage, and 3 others: Progressive Multi-DC roll out - https://phabricator.wikimedia.org/T279664 (tstarling) | section | Cross-DC connection rate (req/s) | |--|--| | es4 | 0.00 | | es5 | 0.00 | | s1 | 9.21 | | s2 | 19.7 | | s3 | 53.2 | | s4 | 7.02 | |...
[06:13:49] serviceops, Infrastructure-Foundations, netbox: Netbox and Redis - https://phabricator.wikimedia.org/T311385 (ayounsi) The git tree is a bit confusing and needs cleanup, but that file in master seems to be on the old 2.10 version. You can see the 3.2.2 version there: https://gerrit.wikimedia.org/r/pl...
[08:14:10] serviceops, Observability-Alerting, Prod-Kubernetes, Kubernetes: Migrate kubernetes alerts away from icinga - https://phabricator.wikimedia.org/T311251 (JMeybohm) a:JMeybohm I will take a look at this now as we need to review/refactor the alerts as part of {T303184} anyways.
[11:26:26] serviceops: Put parse parse10[01-24] in production - https://phabricator.wikimedia.org/T307219 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=8227bbf8-d91f-4d60-821f-bb5f06579d65) set by cgoubert@cumin1001 for 7 days, 0:00:00 on 12 host(s) and their services with reason: Downtime pending i...
[11:40:17] serviceops, Infrastructure-Foundations, netbox: Netbox and Redis - https://phabricator.wikimedia.org/T311385 (akosiaris) >>! In T311385#8212732, @ayounsi wrote: > The git tree is a bit confusing and needs cleanup, but that file in master seems to be on the old 2.10 version. > You can see the 3.2.2 ve...
[12:03:24] serviceops, Parsoid, Patch-For-Review, Performance-Team (Radar): Parsoid migration to php 7.4 - https://phabricator.wikimedia.org/T312638 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=adaec891-8daa-470f-9d2f-6c2b62e7f043) set by cgoubert@cumin1001 for 7 days, 0:00:00 on 2 host...
[12:06:14] serviceops: Put parse parse10[01-24] in production - https://phabricator.wikimedia.org/T307219 (Clement_Goubert) Icinga downtime and Alertmanager silence (ID=adaec891-8daa-470f-9d2f-6c2b62e7f043) set by cgoubert@cumin1001 for 7 days, 0:00:00 on 2 host(s) and their services with reason: Downtiming replaced wt...
[12:50:23] hello folks, qq about rsyslog and kubernetes
[12:50:44] I see that there is an input file rule for /var/log/containers/* on kubernetes nodes
[12:51:18] so in theory I should expect those log entries to be in logstash right? Or is there another step to make them visible?
[12:57:09] self answered, yes :)
[13:56:25] Hi folks; I'm trying to reduce the backlog of untriaged tasks from the Clinic duty dashboard - would you care to pick a priority for T314789 please?
[14:05:01] serviceops, Data-Persistence (Consultation), MediaWiki-extensions-Phonos, SRE, Community-Tech (CommTech-Sprint-32): SRE/Data Persistence consultation — use of FSFileBackend for caching audio files - https://phabricator.wikimedia.org/T314789 (JMeybohm) p:Triage→Medium
[14:05:16] Emperor: done, thanks
[14:44:10] serviceops, Prod-Kubernetes, Kubernetes: High API server request latencies (LIST) - https://phabricator.wikimedia.org/T303184 (JMeybohm) So the culprit here is more the way we use/sum up the `apiserver_request_latencies`. The reason for the big spike in avg5m on specific api servers is that one of th...
[15:12:57] serviceops, Parsoid, Patch-For-Review, Performance-Team (Radar): Parsoid migration to php 7.4 - https://phabricator.wikimedia.org/T312638 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=bdf6f9e0-2c64-471e-8b85-05f874724182) set by cgoubert@cumin1001 for 7 days, 0:00:00 on 3 host...
[15:15:57] serviceops: Put parse parse10[01-24] in production - https://phabricator.wikimedia.org/T307219 (Clement_Goubert) Icinga downtime and Alertmanager silence (ID=adaec891-8daa-470f-9d2f-6c2b62e7f043) set by cgoubert@cumin1001 for 7 days, 0:00:00 on 2 host(s) and their services with reason: Downtiming replaced wt...
[16:42:57] serviceops, Patch-For-Review: Put parse parse10[01-24] in production - https://phabricator.wikimedia.org/T307219 (Clement_Goubert) `parse1008` replaced `wtp1041` `parse1009` replaced `wtp1042` `parse1010` replaced `wtp1043` `parse1011` replaced `wtp1044` `parse1012` replaced `wtp1045` 50% of parse traff...
[17:22:00] <_joe_> I am leaving puppet disabled on the registry hosts minus registry1003, I'll finish validation tomorrow
[17:35:10] ack, thanks, saw it
[17:49:44] serviceops, Observability-Alerting, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate kubernetes alerts away from icinga - https://phabricator.wikimedia.org/T311251 (JMeybohm)
[21:44:52] serviceops, Phabricator, serviceops-collab, Patch-For-Review, Release-Engineering-Team (Bonus Level 🕹️): move phabricator to new hardware generation - https://phabricator.wikimedia.org/T280597 (Dzahn)
[21:45:10] serviceops, Phabricator, serviceops-collab, Patch-For-Review, Release-Engineering-Team (Bonus Level 🕹️): sort out mysql privileges for phab1004/phab2002 - https://phabricator.wikimedia.org/T315713 (Dzahn) In progress→Resolved >>! In T315713#8190804, @Marostegui wrote: > Is there a way...
[23:08:08] serviceops, Phabricator, serviceops-collab, Patch-For-Review, Release-Engineering-Team (Bonus Level 🕹️): move phabricator to new hardware generation - https://phabricator.wikimedia.org/T280597 (Dzahn) @thcipriani @LSobanski status update. First the subtask about database privileges has been...