[00:51:43] (SystemdUnitFailed) firing: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:19:58] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (Stuartyeates) Do either Quarry or Superset store their data in one of the accessible databases? [i.e. can we write superset queries to evaluate superset or quarry usage?]
[04:51:43] (SystemdUnitFailed) firing: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:08:12] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) I have had to kill them, it's been more than 24h waiting to stop. Going to downgrade + rebuild
[05:27:06] Data-Engineering, DBA, Data-Services, cloud-services-team, Patch-For-Review: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[05:27:30] Data-Engineering, DBA, Data-Services, cloud-services-team, Patch-For-Review: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[05:33:11] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) I am recloning: db1154:3311 db1154:3313 db1154:3315 db1155:3312 clouddb1021:3317
[06:13:42] Data-Engineering-Planning, Data Pipelines (Sprint 13): Add Python Linter Checks to CI - https://phabricator.wikimedia.org/T318346 (Antoine_Quhen) a:Antoine_Quhen
[06:37:28] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[06:45:45] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[07:59:06] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[07:59:46] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) clouddb1021 has been recloned, added the grants and the views. I am going to wait for it to finish catching up and reclone the other s7 replicas.
[08:08:15] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) >>! In T337446#8885987, @Marostegui wrote: > clouddb1021 (s7) has been recloned, added the grants and the views. I am going to wait for it to finish catchin...
[08:08:33] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[08:12:55] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[08:14:50] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[08:50:17] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[08:51:43] (SystemdUnitFailed) firing: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:29:51] Analytics-Radar, Data-Engineering-Icebox, Machine-Learning-Team, ORES: Backfill ORES Hadoop scores with historical data - https://phabricator.wikimedia.org/T209737 (elukey) Open→Declined We are moving to Lift Wing: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing I am closin...
[09:29:57] Analytics-Radar, Data-Engineering-Icebox, Dumps-Generation, Machine-Learning-Team, and 6 others: [Epic] Make ORES scores for wikidata available as a dump - https://phabricator.wikimedia.org/T209611 (elukey)
[09:49:25] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[10:10:50] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[10:51:16] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[11:04:11] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (rook) @Stuartyeates no, neither of these are accessible via either Quarry or Superset. In terms of Quarry there is some discussion here T151158 {F37082314} I've attac...
[11:09:18] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[11:19:55] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[11:25:47] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[11:26:27] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[11:28:47] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[11:55:02] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (IKhitron) Well, as a regular and very pleased user of Quarry, I tried Superset. Not helpful, it doesn't have most of databases. So, Quarry doesn't work any more, and I...
[12:26:47] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (Framawiki) >>! In T169452#8886544, @IKhitron wrote: > Well, as a regular and very pleased user of Quarry, I tried Superset. Not helpful, it doesn't have most of databas...
[12:30:54] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (IKhitron) Maybe, don't know. I can't run queries any more.
[12:33:38] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (rook) >>! In T169452#8886544, @IKhitron wrote: > Well, as a regular and very pleased user of Quarry, I tried Superset. Not helpful, it doesn't have most of databases....
[12:39:33] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (IKhitron) I tried to find hewiki_p. Didn't find, but while looking, I'm not even sure I met any wiki_p, besides enwiki and frwiki. Maybe I'm wrong. And maybe they...
[12:51:58] (SystemdUnitFailed) firing: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:04:07] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (rook) @IKhitron It would appear that hewiki is on s7 https://noc.wikimedia.org/db.php and that the replication lag on s7 is particularly bad https://replag.toolforge.org/
[13:19:11] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[13:21:46] (CR) Nmaphophe: GDI Equity Landscape Tables (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/895737 (owner: Nmaphophe)
[13:25:24] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) Grants, views, roles etc added to clouddb1021:3315, I will reclone the other two s5 broken ones from this once it's caught up with its master
[13:46:42] Data-Engineering, DBA, Data-Services, cloud-services-team: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (MusikAnimal)
[13:50:07] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (MusikAnimal) Something should be Tech News, as we're not just seeing replag but actual downtime (T337682). Also suggest emailing cloud-announce and wik...
[14:00:06] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) >>! In T337446#8886722, @MusikAnimal wrote: > Something should be in Tech News (even if it's after the maintenance is done), as we're not j...
[14:18:38] Data-Engineering, Data-Platform-SRE, Shared-Data-Infrastructure (Q4 Wrap up): Bring stat1009 into service - https://phabricator.wikimedia.org/T336036 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=e14a6c2c-888d-45b4-94a2-edc04252cc36) set by stevemunene@cumin1001 for 7 days, 0:00:0...
[14:21:30] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[14:21:41] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) s7 is fully recloned
[14:24:13] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (1234qwer1234qwer4)
[14:35:56] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[14:36:55] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[14:37:13] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) clouddb1021:s2 is fully ready with views, grants, users etc
[14:53:03] Data-Engineering, Data Pipelines (Sprint 13): Druid Webrequest sampled 128 has missing data data for 1 hour - https://phabricator.wikimedia.org/T337088 (Volans) Open→Resolved @JAllemandou thanks a lot for fixing it, I'm resolving this as you have the other one open for potential follow ups on the...
[15:02:46] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[15:39:25] Data-Engineering, Structured-Data-Backlog: Instrument {{Delete ...} template adding/removing on Commons and create a historical dataset - https://phabricator.wikimedia.org/T336955 (Cparle) > [2] https://commons.wikimedia.org/wiki/Template:Delete (is this it? It looks like it was nominated for deletion, l...
[15:47:00] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[15:47:10] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) clouddb1021:s3 is fully ready with views, grants, users etc
[15:51:45] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (MusikAnimal)
[16:21:28] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui)
[17:06:10] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (IKhitron) I see. Well, I succeeded to open s7 now and run a query. Please don't turn Quarry off until at least there will be a way to download the results in all the fo...
[17:22:12] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (TheresNoTime) Apologies if I'm stating the obvious here, attempting to connect to `meta_p` is currently failing with an access denied — is a grant miss...
[17:25:18] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Ladsgroup) Hi, meta_p or meta are wrong. The correct db name is metawiki or metawiki_p depending on the use case (likewise: mediawikiwiki, wikidatawiki,...
[17:29:44] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (TheresNoTime) Going by https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Metadata_database; > There is a table with automatically maintained...
[18:35:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[19:10:28] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) >>! In T337446#8887070, @TheresNoTime wrote: > Going by https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Metadata_database; > >...
[19:52:11] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (TheresNoTime) >>! In T337446#8887132, @Marostegui wrote: > I will need someone from #cloud-services-team to look into that. I only know about the scrip...
[19:54:52] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) Yeah the grants are probably there (as I copied them over) but the database isn't as I didn't run any script related to the meta database....
[19:58:02] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (RhinosF1) https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/profile/files/wmcs/db/wikireplicas/maintain-meta_p.py,...
[20:02:22] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (Marostegui) >>! In T337446#8887153, @RhinosF1 wrote: > https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/profile/fil...
[20:04:19] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (RhinosF1) Thanks for the quick response and for all the hard work. Please enjoy some of your Memorial Day weekend. :)
[20:24:24] (PS1) Jameel Kaisar: Add metadata to network/probe schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/924120 (https://phabricator.wikimedia.org/T337317)
[21:44:27] Data-Engineering, DBA, Data-Services, cloud-services-team, User-notice: Rebuild sanitarium hosts - https://phabricator.wikimedia.org/T337446 (1234qwer1234qwer4)
[22:25:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[22:38:17] Data-Engineering, CheckUser, MW-1.38-notes (1.38.0-wmf.26; 2022-03-14), MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), and 3 others: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004 (Zabe)