[01:44:09] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Pppery)
[02:27:39] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (1234qwer1234qwer4)
[02:28:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[02:37:13] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 5.887% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[04:39:15] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) db1154 s4 broke, which I was sort of expecting.
I think at some point they will all break
[04:40:07] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[04:48:51] Data-Engineering, AQS2.0, API Platform (AQS 2.0 Roadmap), Epic, and 2 others: AQS 2.0: Device Analytics service - https://phabricator.wikimedia.org/T288298 (SGupta-WMF)
[04:53:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[05:11:39] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[05:14:43] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[05:20:08] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) >>! In T337446#8887466, @Marostegui wrote: > db1154 s4 broke, which I was sort of expecting. I think at some point they will all break I...
[05:24:52] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) >>! In T337446#8887064, @TheresNoTime wrote: > Apologies if I'm stating the obvious here, attempting to connect to `meta_p` is currently fa...
[05:38:03] Data-Engineering: Check home/HDFS leftovers of xihua - https://phabricator.wikimedia.org/T337711 (MoritzMuehlenhoff)
[06:33:40] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[06:33:50] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) s5 is fully recloned
[06:37:05] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[06:37:13] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 5.698% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[06:39:02] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[07:27:50] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[08:09:50] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 14 A), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (JMeybohm) >>! In T333464#8884003, @Ottomata wrote: > [[ https://grafana-rw.wikimedia.org/d/H-sRgqLVk/flink-kuberne...
[09:23:13] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (aborrero) I just did this: ` aborrero@cumin1001:~ $ sudo cumin "P{R:Profile::Mariadb::Section = 's7'} and P{P:wmcs::db::wikireplicas::mariadb_multiins...
[09:49:37] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (Bdijkstra) >>! In T169452#8886651, @rook wrote: > @IKhitron It would appear that hewiki is on s7 https://noc.wikimedia.org/db.php and that the replication lag on s7 is...
[09:56:12] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[09:58:33] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) clouddb1014:3312 is now catching up
[09:59:24] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Wiki-replicas: investigate why some maintenance operations can cause unwanted pybal impact - https://phabricator.wikimedia.org/T337721 (aborrero)
[10:00:29] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Ladsgroup) Thanks to Manuel, now access to meta_p and heartbeat_p should work and replag.toolforge.org is working as intended, let us know if anything els...
[10:01:59] Data-Engineering, DBA, Data-Services, cloud-services-team: Wiki-replicas: investigate why some maintenance operations can cause unwanted pybal impact - https://phabricator.wikimedia.org/T337721 (Ladsgroup)
[10:02:04] (PS1) Nmaphophe: mend [analytics/refinery] - https://gerrit.wikimedia.org/r/924129
[10:05:56] Data-Engineering, Data-Persistence, Data-Services, cloud-services-team: Wiki-replicas: investigate why some maintenance operations can cause unwanted pybal impact - https://phabricator.wikimedia.org/T337721 (Marostegui)
[10:06:14] (PS1) Nmaphophe: GDI Equity Landscape Tables [analytics/refinery] - https://gerrit.wikimedia.org/r/924130
[10:07:07] (Abandoned) Nmaphophe: GDI Equity Landscape Tables [analytics/refinery] - https://gerrit.wikimedia.org/r/924130 (owner: Nmaphophe)
[10:07:36] (Abandoned) Nmaphophe: mend [analytics/refinery] - https://gerrit.wikimedia.org/r/924129 (owner: Nmaphophe)
[10:17:53] Data-Engineering, Product-Analytics: Check home/HDFS leftovers of xihua - https://phabricator.wikimedia.org/T337711 (mpopov) Hua was contractor Data Visualization Specialist for Product Analytics. I think everything that should be saved (code) got uploaded to GitHub as part of T326280 so the home dirs wi...
[10:24:44] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Iniquity) There is a suspicion that this is related to this task: https://guc.toolforge.org/?by=date&user=178.66.150.82 {F37084024}
[10:29:16] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) >>! In T337446#8888007, @Iniquity wrote: > There is a suspicion that this is related to this task: > https://guc.toolforge.org/?by=date&user=1...
[10:30:11] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Iniquity) >>! In T337446#8888023, @Marostegui wrote: >>>! In T337446#8888007, @Iniquity wrote: >> There is a suspicion that this is related to this task:...
[10:32:25] Data-Engineering, Data-Persistence, Data-Services, cloud-services-team: Wiki-replicas: investigate why some maintenance operations can cause unwanted pybal impact - https://phabricator.wikimedia.org/T337721 (Ladsgroup) More context: https://gerrit.wikimedia.org/r/c/operations/puppet/+/655533 The...
[10:37:13] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 5.476% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[10:40:14] Data-Engineering, Data-Persistence, Data-Services, cloud-services-team: Wiki-replicas: investigate why some maintenance operations can cause unwanted pybal impact - https://phabricator.wikimedia.org/T337721 (aborrero) p:Triage→High
[10:43:55] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[10:44:26] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) clouddb1021:3311 (s1) is fully ready with grants, views etc. Once it has caught up I will clone the other two s1 hosts.
[10:46:16] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[10:54:25] Data-Engineering, Data-Persistence, Data-Services, cloud-services-team: Wiki-replicas: investigate why some maintenance operations can cause unwanted pybal impact - https://phabricator.wikimedia.org/T337721 (aborrero) `lang=irc 12:48 ok, s3 needs to be depooled entirely 12:48 (PS1) Nmaphophe: GDI Equity Landscape Tables [analytics/refinery] - https://gerrit.wikimedia.org/r/924131
[11:09:07] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (rook) >>! In T169452#8887883, @Bdijkstra wrote: > Currently I don't see any wikis on any db when I do `show tables`. A link to that NOC page would be useful on pages wh...
[12:16:05] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 14 A), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (Ottomata) > Apart from the managed flink clusters in staging-eqiad being empty I agree Ah, the value was 0 (?) so...
[12:19:19] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[12:26:39] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (1234qwer1234qwer4)
[12:26:43] (SystemdUnitFailed) firing: (2) cadvisor.service Failed on druid1007:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:28:02] PROBLEM - Check systemd state on druid1007 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:28:30] (CR) Ottomata: [C: +1] "Shall I merge?" [schemas/event/primary] - https://gerrit.wikimedia.org/r/924120 (https://phabricator.wikimedia.org/T337317) (owner: Jameel Kaisar)
[12:30:06] PROBLEM - Check systemd state on an-airflow1003 is CRITICAL: CRITICAL - degraded: The following units failed: cadvisor.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:10] (CR) Jameel Kaisar: Add metadata to network/probe schema (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/924120 (https://phabricator.wikimedia.org/T337317) (owner: Jameel Kaisar)
[12:31:43] (SystemdUnitFailed) firing: (4) cadvisor.service Failed on an-airflow1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:31:56] (CR) Ottomata: [C: +2] Add metadata to network/probe schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/924120 (https://phabricator.wikimedia.org/T337317) (owner: Jameel Kaisar)
[12:34:17] Data-Engineering, Data-Persistence, Data-Services, cloud-services-team: Investigate if maintain-replica-indexes is still needed - https://phabricator.wikimedia.org/T337734 (Marostegui)
[12:48:24] RECOVERY - Check systemd state on an-airflow1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:51:43] (SystemdUnitFailed) firing: (4) cadvisor.service Failed on an-airflow1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:55:37] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[12:56:43] (SystemdUnitFailed) firing: (4) cadvisor.service Failed on an-airflow1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:57:58] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[13:25:02] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (BBlack) Note: I restored+amended https://gerrit.wikimedia.org/r/c/operations/puppet/+/924342 and merged+deployed it on lvs1018+lvs1020. This seems to wor...
[13:28:05] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[13:52:22] RECOVERY - Check systemd state on druid1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:56:43] (SystemdUnitFailed) resolved: (2) cadvisor.service Failed on druid1007:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:06:54] Data-Engineering, Data-Persistence, Data-Services, cloud-services-team, Patch-For-Review: Wiki-replicas: investigate why some maintenance operations can cause unwanted pybal impact - https://phabricator.wikimedia.org/T337721 (aborrero) Note these 2 patches:
[14:07:15] Data-Engineering, Data-Persistence, Data-Services, cloud-services-team, Patch-For-Review: Wiki-replicas: investigate why some maintenance operations can cause unwanted pybal impact - https://phabricator.wikimedia.org/T337721 (aborrero) * https://gerrit.wikimedia.org/r/c/operations/puppet/+/92...
[14:37:13] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 5.254% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[15:04:24] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 14 A): Define Service Level Objective (SLO) for mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T333833 (gmodena) I marked the Google Doc as read-only and moved the draft to https://wikitech.wikimedia.org/wiki/Med...
[15:17:57] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[15:18:15] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) s2 is fully recloned
[15:42:24] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (Bdijkstra) >>! In T169452#8888144, @rook wrote: >>>! In T169452#8887883, @Bdijkstra wrote: >> Currently I don't see any wikis on any db when I do `show tables`. A link...
[15:52:15] !log created HDFS folder `/wmf/data/wmf_traffic` (T335305 and T337562)
[15:52:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:52:19] T337562: Split wmf database into functional areas - https://phabricator.wikimedia.org/T337562
[15:52:19] T335305: Migrate referrer_daily to Iceberg - https://phabricator.wikimedia.org/T335305
[15:53:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[15:54:40] Data-Engineering, Data-release, Privacy Engineering, Research-Backlog, Privacy: Apache Beam go prototype code for DP evaluation - https://phabricator.wikimedia.org/T280385 (leila)
[15:55:14] Data-Engineering-Radar, Data-release, Privacy Engineering, Research-Backlog, Privacy: ApacheBeam prototype for DP noise addition with pageview privacy units on top of Spark - https://phabricator.wikimedia.org/T282195 (leila)
[16:03:42] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 14 A), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (Ottomata)
[16:04:27] Data-Engineering: Increase webrequest_sampled_live Druid datasource's retention - https://phabricator.wikimedia.org/T337460 (JAllemandou) I compared the segments stored in Druid for the `webrequest_sampled_live` and `webrequest_sampled_128` datasources. There are 2 segments per hour for each datasource, the...
[16:04:36] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 14 A), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (Ottomata) Deployed in all wikikube clusters. We'll have to re-enable operator egress to Zookeeper when we figure...
[16:27:52] Data-Engineering-Radar, Data-release, Privacy Engineering, Research-Backlog, Privacy: ApacheBeam prototype for DP noise addition with pageview privacy units on top of Spark - https://phabricator.wikimedia.org/T282195 (Htriedman) Open→Resolved
[16:27:58] Data-Engineering, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (Htriedman)
[16:28:09] Data-Engineering, Data-release, Privacy Engineering, Research-Backlog, Privacy: Apache Beam go prototype code for DP evaluation - https://phabricator.wikimedia.org/T280385 (Htriedman) Open→Resolved
[16:28:13] Data-Engineering, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (Htriedman)
[17:02:33] Data-Engineering, Data-Catalog, Product-Analytics: Propagate field descriptions from event schemas to Hive event tables - https://phabricator.wikimedia.org/T307040 (Ottomata)
[17:03:04] Data-Engineering: Automating pulling schemas from eventschema to datahub - https://phabricator.wikimedia.org/T337321 (Htriedman) Open→Invalid See this task instead: https://phabricator.wikimedia.org/T318863
[17:03:38] Data-Engineering-Planning, Data-Catalog, Event-Platform Value Stream: Event Platform and DataHub Integration - https://phabricator.wikimedia.org/T318863 (Ottomata)
[17:34:21] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Liz) Looks like it's just s1.
[17:58:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[18:37:13] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 5.032% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[18:42:06] Quarry: Superset not exporting csv as utf-8 - https://phabricator.wikimedia.org/T337790 (rook)
[18:42:11] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[18:42:38] Quarry: Superset not exporting csv as utf-8 - https://phabricator.wikimedia.org/T337790 (rook)
[18:42:57] Data-Engineering-Planning, Event-Platform Value Stream, Machine-Learning-Team, Patch-For-Review: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (achou) I found two problems while testing the following Change-Prop staging config: ` outlink-top...
[18:44:35] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (TheresNoTime) Hi, me again :) As far as I can tell, `s7` should currently be working as expected? There's been a couple of reports of tools being unable t...
[18:44:43] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[18:45:24] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[18:46:00] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) s3 is fully recloned and it is now catching up (it is 8h behind)
[18:51:26] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Ladsgroup) >>! In T337446#8890101, @TheresNoTime wrote: > Hi, me again :) > As far as I can tell, `s7` should currently be working as expected? > There's...
[18:53:05] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) >>! In T337446#8890101, @TheresNoTime wrote: > Hi, me again :) > As far as I can tell, `s7` should currently be working as expected? > There's...
[18:54:41] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) >>! In T337446#8889698, @Liz wrote: > Looks like it's just s1 to be restored (or whatever the correct term is). Yes only one host for s1 (enw...
[18:59:21] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (TheresNoTime) >>! In T337446#8890117, @Marostegui wrote: > Thanks for the report. It was only on clouddb1021 but not on the others (as I did the transfer)...
[19:01:07] (CR) Milimetric: query finetuning (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/914799 (owner: Nick Ifeajika)
[19:12:06] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 14 A): Improve Event Platform and MediaWiki Event Enrichment wikitech documentation - https://phabricator.wikimedia.org/T329629 (Ottomata) Today I moved [[ https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Processing/Use_cases | Use...
[19:15:42] Data-Engineering-Planning, Event-Platform Value Stream, Machine-Learning-Team, Patch-For-Review: Add a new outlink topic stream for EventGate main - https://phabricator.wikimedia.org/T328899 (Ottomata) Ah, yes, you'll need to filter out canary events. We need better docs on this. I'm [[ https:/...
[19:53:15] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (SWinxy) Is there an estimate for when things'll be fully restored? Y'all are great.
[20:04:50] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) >>! In T337446#8890288, @SWinxy wrote: > Is there an estimate for when things'll be fully restored? Y'all are great. If nothing happens, tomo...
[20:06:48] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[20:07:42] Data-Engineering, DBA, Data-Services, cloud-services-team, and 2 others: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[20:19:09] Data-Engineering, Product-Analytics, Research: Investigate relation of UA deprecation to increase in automated traffic and reduction in unique devices - https://phabricator.wikimedia.org/T336715 (leila) moving to #research-backlog until more information becomes available.
[20:19:17] Data-Engineering, Product-Analytics: Investigate relation of UA deprecation to increase in automated traffic and reduction in unique devices - https://phabricator.wikimedia.org/T336715 (leila)
[20:19:26] Data-Engineering, Product-Analytics, Research-Backlog: Investigate relation of UA deprecation to increase in automated traffic and reduction in unique devices - https://phabricator.wikimedia.org/T336715 (leila)
[20:52:14] Data-Engineering, Product-Analytics, Research: Use Hive/Spark timestamps in Refined event data - https://phabricator.wikimedia.org/T278467 (leila) @Ottomata do you need something from Research for this task? (@fkaelin cc) I'm asking as we're reviewing tasks in our backlog for prioritization and I'm n...
[22:37:13] (DiskSpace) firing: Disk space an-test-worker1002:9100:/ 4.811% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=an-test-worker1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[22:44:15] (PS4) TChin: Remove is_registered field from user entity fragment [schemas/event/primary] - https://gerrit.wikimedia.org/r/923253 (https://phabricator.wikimedia.org/T337395)