[08:55:32] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270#11547901 (ayounsi) Probably the same root cause as {T412143}. We might need to temporarily ignore those log messages until JTAC provides us with a permanent fix.
[10:04:35] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026.01.23 - 2026.02.13), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11548046 (Gehel)
[10:53:56] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270#11548385 (Marostegui) Would it be possible to truncate it for now?
[11:08:57] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270#11548408 (ayounsi) >>! In T415270#11548385, @Marostegui wrote: > Would it be possible to truncate it for now? Yep
[11:11:12] netops, DBA, Infrastructure-Foundations, observability: librenms.syslog table is 800GB - https://phabricator.wikimedia.org/T415270#11548413 (Marostegui) Open→Resolved a:Marostegui Thanks ` cumin2024@db1213.eqiad.wmnet[librenms]> truncate table syslog; Query OK, 0 rows affected (0....
[11:35:46] netops, Traffic, Infrastructure-Foundations: magru hosts (erroneously) reported down due to TTL exceeded - https://phabricator.wikimedia.org/T414473#11548459 (cmooney) p:Medium→Low Moving this to low priority as the issue appears to be resolved. However I'm keeping it open as I want to follo...
[12:30:10] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026.01.23 - 2026.02.13), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11548597 (cmooney) So looking at dse-k8s-worker1013 it has now been up for 1 day 18 hours, yet we st...
[13:08:26] Traffic, Incident Tooling: Proof of Concept: SquareOne CDN Dashboards - https://phabricator.wikimedia.org/T414665#11548717 (MLechvien-WMF) Removing serviceops tag until the parent story gets scoped and we decide who is collaborating on it (to be discussed over coming weeks)
[13:19:24] Traffic, Essential-Work, Patch-For-Review, Test Kitchen (Test Kitchen (Experiment Platform Sprint 18)): Test the impact of incremental increase in traffic for cache splitting experiments - https://phabricator.wikimedia.org/T407570#11548741 (Sfaci)
[13:20:28] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026.01.23 - 2026.02.13), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11548743 (JAllemandou) It seems that the `dse-k8s-worker1019` still has the problem: {F71597128}
[15:23:12] Traffic: Varnish doesn't return Retry-After for some policies - https://phabricator.wikimedia.org/T415375 (Fabfur) NEW
[15:28:58] Traffic, Data-Engineering, Infrastructure-Foundations: Export development_network_probe data to Puppet servers for CDN deployment - https://phabricator.wikimedia.org/T402512#11549215 (elukey) Replying to my own question - in `helmfile.d/dse-k8s-services/mediawiki-dumps-legacy/values-dumps.yaml` I see...
[15:55:43] FIRING: VarnishPrometheusExporterDown: Varnish Exporter on instance cp5022:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[15:55:45] FIRING: HaproxyKafkaExporterDown: HaproxyKafka on cp5022 is down - https://wikitech.wikimedia.org/wiki/HAProxyKafka#HaproxyKafkaExporterDown - https://grafana.wikimedia.org/d/d3e4e37c-c1d9-47af-9aad-a08dae2b3fd5/haproxykafka?orgId=1&var-site=eqsin&var-instance=cp5022 - https://alerts.wikimedia.org/?q=alertname%3DHaproxyKafkaExporterDown
[16:04:04] Traffic, DC-Ops, ops-eqsin, SRE: cp5022 is unreachable - https://phabricator.wikimedia.org/T414411#11549362 (Vgutierrez)
[16:05:36] Traffic, DC-Ops, ops-eqsin, SRE: cp5022 is unreachable - https://phabricator.wikimedia.org/T414411#11549368 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=85b11191-0733-4a6c-a314-a87c77eb102d) set by vgutierrez@cumin1003 for 10 days, 0:00:00 on 1 host(s) and their services with...
[16:28:13] Traffic, DC-Ops, ops-eqsin, SRE: cp5022 is unreachable - https://phabricator.wikimedia.org/T414411#11549433 (RobH) While I've contacted Jin to do this work (T415090) I'm hesitant to do so during the week of the SRE offsite. While I am attending remotely, the shift I'll have to make to attend in...
[17:40:47] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11549678 (Quiddity)
[18:43:00] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: FY 25/26 WE 5.4.10 Standard Thumbnail Sizes Only - https://phabricator.wikimedia.org/T414805#11549814 (Quiddity)
[21:28:38] Traffic, MediaWiki-Debug-Logger, SRE, MediaWiki-Platform-Team (Q3 Kanban Board): Pass through information about the client from the CDN to MediaWiki to Logstash - https://phabricator.wikimedia.org/T412396#11550209 (Tgr) Added to the [[https://logstash.wikimedia.org/app/dashboards#/view/3e1d0bd0-1...
[21:39:46] Traffic, MediaWiki-Debug-Logger, SRE, MediaWiki-Platform-Team (Q3 Kanban Board): Pass through information about the client from the CDN to MediaWiki to Logstash - https://phabricator.wikimedia.org/T412396#11550221 (Tgr) Open→Resolved
[21:57:31] Traffic, MediaWiki-Core-AuthManager, MediaWiki-Platform-Team, FY2025-26 KR 5.1: Decide how to expose session information outside of MediaWiki - https://phabricator.wikimedia.org/T394012#11550266 (Tgr)