[00:22:30] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: refinery-drop-eventlogging-legacy-raw-partitions.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:24:56] PROBLEM - Check unit status of refinery-drop-eventlogging-legacy-raw-partitions on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit refinery-drop-eventlogging-legacy-raw-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:48:50] PROBLEM - Check unit status of refinery-drop-raw-netflow-event on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit refinery-drop-raw-netflow-event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:50:12] PROBLEM - Check unit status of drop-features-actor-rollup-hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit drop-features-actor-rollup-hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:53:06] PROBLEM - Check unit status of refinery-drop-webrequest-refined-partitions on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit refinery-drop-webrequest-refined-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:54:10] PROBLEM - Check unit status of drop-features-actor-hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit drop-features-actor-hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:56:38] PROBLEM - Check unit status of drop-predictions-actor_label-hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit drop-predictions-actor_label-hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:01:56] PROBLEM - Check unit status of refinery-drop-pageview-actor-hourly-partitions on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit refinery-drop-pageview-actor-hourly-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:26:54] PROBLEM - Check unit status of drop_event on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit drop_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:03:24] PROBLEM - Check unit status of drop-anomaly-detection on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit drop-anomaly-detection https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:05:22] PROBLEM - Check unit status of refinery-drop-banner-activity on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit refinery-drop-banner-activity https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:50:20] RECOVERY - Check unit status of refinery-drop-pageview-actor-hourly-partitions on an-launcher1002 is OK: OK: Status of the systemd unit refinery-drop-pageview-actor-hourly-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:04:10] PROBLEM - Check unit status of refinery-drop-pageview-actor-hourly-partitions on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit refinery-drop-pageview-actor-hourly-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:21:12] well, good morning an-launcher1002
[07:47:44] (CR) Joal: [V: +2] Adapt sqoop to templatelinks schema changes [analytics/refinery] - https://gerrit.wikimedia.org/r/821312 (https://phabricator.wikimedia.org/T314666) (owner: Milimetric)
[07:56:16] (PS14) Joal: Add cassandra loading airflow queries in hql folder [analytics/refinery] - https://gerrit.wikimedia.org/r/812095 (https://phabricator.wikimedia.org/T311507) (owner: NOkafor)
[07:56:27] (CR) Joal: Add cassandra loading airflow queries in hql folder (4 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/812095 (https://phabricator.wikimedia.org/T311507) (owner: NOkafor)
[07:57:34] (CR) Joal: [V: +2 C: +2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/812095 (https://phabricator.wikimedia.org/T311507) (owner: NOkafor)
[08:04:04] Quarry, VPS-Projects, Wikidata, Wikidata-Query-Service, and 2 others: Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry but for SPARQL) - https://phabricator.wikimedia.org/T104762 (valerio.bozzolan)
[08:11:34] Quarry, VPS-Projects, Wikidata, Wikidata-Query-Service, and 2 others: Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry but for SPARQL) - https://phabricator.wikimedia.org/T104762 (valerio.bozzolan) The killer feature of this tool would be: less timeouts. Lot of users have very...
[08:26:05] Data-Engineering, Event-Platform Value Stream (Sprint 00), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (JAllemandou) Adding idea discussed with @Ottomata earlier on. It's probably interesting to separate str...
[08:30:36] Data-Engineering, Event-Platform Value Stream (Sprint 00): Use RowTypeInfo to ensure better validation of the event data within the Mediawiki Stream Enrichment pipeline - https://phabricator.wikimedia.org/T316555 (gmodena)
[09:00:36] Data-Engineering, Event-Platform Value Stream (Sprint 00), Spike: [SPIKE] Decide on technical solution for page state stream backfill process - https://phabricator.wikimedia.org/T314389 (gmodena) > For now, I'll start experimenting with Spark + Iceberg unless there's some major objection. I also read...
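The flood of unit alerts above all points at the same runbook page (Managing_systemd_timers). As a hedged sketch of the triage loop, not taken from the runbook itself: the unit names can be extracted from the alert text, then inspected with standard systemd tooling on the affected host. The alert lines below are copied from this log; the `systemctl`/`journalctl` invocations are plain systemd, shown as comments since they only make sense on an-launcher1002.

```shell
# Pull the failing unit names out of Icinga alert lines like the ones above.
alerts='PROBLEM - Check unit status of drop_event on an-launcher1002 is CRITICAL
PROBLEM - Check unit status of drop-anomaly-detection on an-launcher1002 is CRITICAL'
printf '%s\n' "$alerts" | sed -n 's/.*Check unit status of \([^ ]*\) on .*/\1/p'
# Then, per unit, on the affected host:
#   systemctl status drop_event.service      # result of the last timer run
#   journalctl -u drop_event.service -n 50   # recent logs for the unit
#   sudo systemctl reset-failed drop_event   # clear "degraded" once fixed
```

The `reset-failed` step is what makes the host-level "Check systemd state … degraded" alert recover, once the underlying job failure has actually been addressed.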
[09:07:08] Data-Engineering, Event-Platform Value Stream (Sprint 00), Spike: [SPIKE] Assess what is required for the enrichment pipeline to run on k8 - https://phabricator.wikimedia.org/T315428 (gmodena) For reference, some resources on how Google and Spotify are operating Flink on k8: - https://github.com/Goog...
[09:09:06] Data-Engineering, Event-Platform Value Stream (Sprint 00), Spike: [SPIKE] Assess what is required for the enrichment pipeline to run on k8 - https://phabricator.wikimedia.org/T315428 (gmodena)
[09:13:12] Data-Engineering, Event-Platform Value Stream (Sprint 00), Spike: [SPIKE] Assess what is required for the enrichment pipeline to run on k8 - https://phabricator.wikimedia.org/T315428 (gmodena)
[09:24:48] Data-Engineering, API Platform, Platform Engineering Roadmap, User-Eevans: Pageviews integration testing - https://phabricator.wikimedia.org/T299735 (codebug)
[09:25:21] Data-Engineering, API Platform, Platform Engineering Roadmap, User-Eevans: Pageviews integration testing - https://phabricator.wikimedia.org/T299735 (codebug)
[10:03:24] Data-Engineering, Event-Platform Value Stream (Sprint 00): [Shared Event Platform] Mediawiki Stream Enrichment should consume the consolidated page-change stream. - https://phabricator.wikimedia.org/T311084 (gmodena)
[10:04:09] Data-Engineering, Event-Platform Value Stream (Sprint 00): [Shared Event Platform] Mediawiki Stream Enrichment should consume the consolidated page-change stream. - https://phabricator.wikimedia.org/T311084 (gmodena)
[10:25:41] (CR) Vivian Rook: [C: +2] strip invalid utf-8 chars for xlsx [analytics/quarry/web] - https://gerrit.wikimedia.org/r/827538 (https://phabricator.wikimedia.org/T314706) (owner: Vivian Rook)
[10:30:09] (Merged) jenkins-bot: strip invalid utf-8 chars for xlsx [analytics/quarry/web] - https://gerrit.wikimedia.org/r/827538 (https://phabricator.wikimedia.org/T314706) (owner: Vivian Rook)
[10:30:13] Quarry, Patch-For-Review: "Download data -> Excel XLSX" corrupted: Cut off after invalid character - https://phabricator.wikimedia.org/T314706 (rook) I'm not sure. It may be less that the characters are invalid, and more that they are invalid for xlsxwriter/xlsx https://github.com/jmcnamara/XlsxWriter/i...
[10:56:07] Quarry, Patch-For-Review: "Download data -> Excel XLSX" corrupted: Cut off after invalid character - https://phabricator.wikimedia.org/T314706 (dcaro) The [[ https://en.wikipedia.org/wiki/File:ArthurCovey.jpg#metadata | image on wiki ]] shows that those fields with the bad bits (ExifVersion and FlashPixV...
[11:08:51] Data-Engineering, Data-Engineering-Operations, SRE, SRE-Access-Requests, Patch-For-Review: Access request to analytics system(s) for TThoabala - https://phabricator.wikimedia.org/T315409 (Jelto)
[11:30:33] Data-Engineering, Data-Engineering-Operations, SRE, SRE-Access-Requests, Patch-For-Review: Access request to analytics system(s) for TThoabala - https://phabricator.wikimedia.org/T315409 (Jelto) Thanks for the feedback! It seems that `analytics-privatedata-users` is the right group. >>! In T...
[11:41:44] Data-Engineering, Data Pipelines: Use airflow to load cassandra - https://phabricator.wikimedia.org/T306962 (MoritzMuehlenhoff) Checking in on this, given that there's one month left until Stretch servers need to be gone, what's the status here?
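On the XLSX corruption thread above: the merged Quarry change strips characters before writing, and the task follow-up notes the culprit may be characters that are invalid for xlsxwriter/xlsx rather than invalid UTF-8 as such. Purely as a generic illustration of the first interpretation (this is not the actual patch, which lives in the linked Gerrit change): `iconv -c` drops byte sequences that are not valid UTF-8.

```shell
# \377 (0xFF) can never appear in valid UTF-8; iconv -c silently omits
# characters it cannot convert, leaving only the well-formed text.
printf 'ok\377end\n' | iconv -f UTF-8 -t UTF-8 -c
```

For the xlsxwriter interpretation, filtering would instead need to target code points that XML 1.0 (and hence xlsx) forbids, e.g. most C0 control characters.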
[11:51:33] (PS1) Btullis: Update the packaged environment for datahub clients [analytics/refinery] - https://gerrit.wikimedia.org/r/827987 (https://phabricator.wikimedia.org/T316336)
[11:58:25] Data-Engineering, Data Pipelines: Use airflow to load cassandra - https://phabricator.wikimedia.org/T306962 (JAllemandou) >>! In T306962#8198015, @MoritzMuehlenhoff wrote: > Checking in on this, given that there's one month left until Stretch servers need to be gone, what's the status here? Thanks for c...
[12:04:01] Hi team - I'm planning on deploying now, as my evening will be busy - If you have stuff you'd like me to deploy, now is the time (I'll start in 10 minutes)
[12:09:19] joal: Thanks. I'm currently attempting an upgrade of Datahub to version 0.8.43, which if successful will require: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/129 and https://gerrit.wikimedia.org/r/c/analytics/refinery/+/827987
[12:09:50] However, I'm not there yet and I don't know if it will happen today, so there's currently nothing from me.
[12:09:58] ack btullis
[12:43:41] (PS1) Btullis: Add an empty directory for GMS plugin auth resources [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/827999 (https://phabricator.wikimedia.org/T316336)
[12:45:42] RECOVERY - Check unit status of drop-features-actor-rollup-hourly on an-launcher1002 is OK: OK: Status of the systemd unit drop-features-actor-rollup-hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:59:30] PROBLEM - Check unit status of drop-features-actor-rollup-hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit drop-features-actor-rollup-hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:13:42] Data-Engineering, Data Pipelines: Use airflow to load cassandra - https://phabricator.wikimedia.org/T306962 (MoritzMuehlenhoff) >>! In T306962#8198084, @JAllemandou wrote: >>>! In T306962#8198015, @MoritzMuehlenhoff wrote: >> Checking in on this, given that there's one month left until Stretch servers ne...
[13:17:11] Data-Engineering, API Platform: Establish testing procedure for Druid-based endpoints - https://phabricator.wikimedia.org/T311190 (BPirkle)
[13:17:14] Analytics, API Platform (Product Roadmap), Code-Health-Objective, Epic, and 3 others: AQS 2.0 - https://phabricator.wikimedia.org/T263489 (BPirkle)
[13:18:51] Analytics, API Platform (Product Roadmap), Code-Health-Objective, Epic, and 3 others: AQS 2.0 - https://phabricator.wikimedia.org/T263489 (DAbad) **August 30, 2022** - Not completely done with Cassandra-based endpoints - Druid endpoints still need to be done - Tracking doc: https://docs.google.co...
[13:22:53] (CR) Btullis: [C: +2] Add an empty directory for GMS plugin auth resources [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/827999 (https://phabricator.wikimedia.org/T316336) (owner: Btullis)
[13:42:15] (Merged) jenkins-bot: Add an empty directory for GMS plugin auth resources [analytics/datahub] (wmf) - https://gerrit.wikimedia.org/r/827999 (https://phabricator.wikimedia.org/T316336) (owner: Btullis)
[13:47:02] (PS1) Joal: Bump eventutilities to 1.2.0 and remove duplicate dependency [analytics/refinery/source] - https://gerrit.wikimedia.org/r/828014
[13:49:37] (CR) Gehel: [C: +1] "LGTM" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/828014 (owner: Joal)
[13:57:46] Data-Engineering, CheckUser, MW-1.38-notes (1.38.0-wmf.26; 2022-03-14), MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), and 3 others: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004 (Zabe)
[14:00:27] (CR) Joal: [C: +2] "Merging for next deploy" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/828014 (owner: Joal)
[14:08:51] (Merged) jenkins-bot: Bump eventutilities to 1.2.0 and remove duplicate dependency [analytics/refinery/source] - https://gerrit.wikimedia.org/r/828014 (owner: Joal)
[14:16:20] (PS1) Joal: Bump changelog.md to v0.2.5 before release [analytics/refinery/source] - https://gerrit.wikimedia.org/r/828023
[14:21:47] (CR) Joal: [V: +2 C: +2] "Merging for deploy" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/828023 (owner: Joal)
[14:22:14] Starting build #110 for job analytics-refinery-maven-release-docker
[14:35:01] Project analytics-refinery-maven-release-docker build #110: SUCCESS in 12 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/110/
[15:23:34] Starting build #69 for job analytics-refinery-update-jars-docker
[15:24:08] joal: As previously mentioned, I've now finished deploying version 0.8.43 of DataHub. Now working on trying to update the packaged environment for refinery to use: https://github.com/wikimedia/analytics-refinery/tree/master/packaged-environments/datahub-cli
[15:24:10] (PS1) Maven-release-user: Add refinery-source jars for v0.2.5 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/828038
[15:24:11] Project analytics-refinery-update-jars-docker build #69: SUCCESS in 36 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/69/
[15:24:39] ack btullis - I've so far deployed refinery-source, I can add your stuff to refinery if needed
[15:24:54] btullis: I however don't know how to build/add the artifact
[15:25:58] No, I'm trying to build the artifact from my branch now. I think that the refinery CR is probably not critical: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/827987
[15:26:52] ...but the airflow-dags one (https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/129) probably is critical for a successful daily run and it refers to a previously built artifact, which is the bit I'm working on now.
[15:29:07] milimetric: Are you around at the moment, by any chance?
[15:41:31] (CR) Joal: [V: +2 C: +2] "Merging for refinery deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/828038 (owner: Maven-release-user)
[15:44:15] btullis: I'm about to deploy refinery - I think I can manage uploading the python env for datahub - there is doc in the readme of the packaged-environments/datahub-cli folder in refinery
[15:44:22] btullis: do you wish me to try it?
[15:45:50] joal: If you could please, that would be great. I'm also trying to follow the README, but finding it a bit tricky to follow.
[15:45:59] ack - will try btullis
[15:57:50] hm - on my install I can't run `conda dist` - the command is not recognized by conda :(
[15:58:11] That's exactly what I saw as well.
[15:58:15] not cool
[16:02:34] I'm in a meeting now, so postponing the refinery deploy - milimetric if you get nearby it'll be helpful :)
[16:02:45] ping aqu - meeting :)
[16:47:49] ok - deploy will continue shortly :) Thanks a lot for unlocking the situation milimetric - I'll make sure to make the conda-dist thing work later this week - that's important
[16:48:27] yeah, on my environment, I can just do "conda dist". In the readme, I put two things that I needed for that to work:
[16:48:41] 1. clone workflow utils and `pip install -e .`
[16:48:44] 2. install miniconda
[16:48:59] maybe you have some other conda distribution?
[16:49:25] milimetric: I already had a conda dist
[16:49:29] and it's miniconda
[16:49:45] hm, yeah, maybe it's some kind of versioning problem, lemme get you what I have
[16:50:01] https://www.irccloud.com/pastebin/NuMeokh2/
[16:50:17] RECOVERY - Check unit status of refinery-drop-pageview-actor-hourly-partitions on an-launcher1002 is OK: OK: Status of the systemd unit refinery-drop-pageview-actor-hourly-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:50:32] milimetric: I have 4.12 - not very different
[16:50:59] heh, yeah, hardly seems like a breaking release... I'm not sure then... I don't think I did anything else, what error do you get?
[16:51:51] I have 4.14. Again, it's the initial `pip install -e .` that I think failed with a sasl error, before I installed miniconda and the sasl libraries. Perhaps it needs to run twice or something?
[16:51:52] no error really: `bash: conda-dist: command not found`
[16:52:25] I reran the `pip install -e .` in workflow_utils - Successfully installed workflow-utils
[16:52:58] k, released: https://archiva.wikimedia.org/#artifact~python/datahub/cli/0.8.43
[16:52:58] Oh: *in workflow_utils*. That I didn't do.
[16:53:13] ah, we should clarify in the readme, that is confusing ben
[16:53:28] jo: wait, it's `conda dist` with a space not `conda-dist`, right?
[16:53:48] same I think milimetric
[16:54:13] joal: which conda-dist: /home/milimetric/.local/bin/conda-dist
[16:54:22] milimetric: see Successfully installed workflow-utils
[16:54:43] yup, no conda-dist for me
[16:54:58] mwarf - sorry wrong paste
[16:55:05] https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils/-/blob/main/setup.cfg#L32
[16:55:06] is it in /home/joal/.local/bin ?
[16:55:14] no conda-dist for me
[16:55:44] pip install -e has not installed it I think
[16:55:55] that seems to be the case... workflow utils... hmmm
[17:01:37] haha milimetric! I got it: WARNING: The scripts artifact-cache, conda-dist and package-version are installed in '/home/jo/.local/bin' which is not on PATH.
[17:01:45] PROBLEM - Check unit status of refinery-drop-pageview-actor-hourly-partitions on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit refinery-drop-pageview-actor-hourly-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[17:02:05] aha joal, that's what I was trying to say above, I didn't know your username was jo :)
[17:03:08] milimetric: I didn't get that, and now re-reading it I understand what you meant :)
[17:03:15] milimetric: trying to build the dist now
[17:03:34] I should've been clearer. In any case, I built and uploaded 0.8.43: https://archiva.wikimedia.org/#artifact~python/datahub/cli/0.8.43
[17:03:37] milimetric: yup, success
[17:03:50] awesome thank you for that
[17:04:02] Next step is then merge/deploy of airflow, but that'll be tomorrow
[17:04:07] I'll do refine this evening
[17:11:46] !log release refinery-source v0.2.5 to archiva
[17:11:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:11:55] !log deploy refinery using scap
[17:11:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:11:58] Data-Engineering, Data Pipelines: Broken DAG Error when trying to import Gitlab .tgz file into airflow - https://phabricator.wikimedia.org/T316600 (JArguello-WMF)
[17:12:01] Thanks again both.
[17:19:40] I'll push another change to airflow, got another db to ingest
[17:21:53] ack milimetric - I'll sync up with you tomorrow on this
[17:22:26] https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/130
[17:22:49] (fab, that's the MR for the ingestion, I made you the reviewer ^)
[17:22:49] mforns: I'll also ask you for help as the errors we're seeing about data-deletion are related to the new limitations mechanism
[17:45:27] btullis: Man, I almost forgot!!! Maybe you're not yet gone?
[17:49:09] !log Deploying refinery onto HDFS
[17:49:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:58:55] joal: o/ I'm still sort of around. How can I help?
[17:59:02] \o/
[17:59:26] Would you please merge this patch btullis - https://gerrit.wikimedia.org/r/c/operations/puppet/+/826564
[17:59:43] this is a very simple change, that will be useful for sqoop at the beginning of the month
[17:59:58] btullis: low probability of failure (from this patch at least :)
[18:01:10] joal: Merged and deployed. 👍 👍
[18:01:30] So many thanks btullis :) Enjoy your time off
[18:02:02] Thank you. I'm upgrading my server at home to bullseye :-)
[18:02:12] Good luck with that :)
[18:02:51] That's not why I'm taking three days' leave though. :-)
[18:28:32] Data-Engineering, Event-Platform Value Stream, Wikidata, Wikidata-Query-Service: Upgrade the WDQS streaming updater to latest flink (1.15) - https://phabricator.wikimedia.org/T289836 (Aklapper)
[18:41:37] hey joal looking at the deletion errors
[19:07:14] Thanks a lot mforns - let's talk tomorrow about those - I'm done for today - ok?
[19:11:33] ocf joal !
[19:11:35] ofc!
[20:19:45] Data-Engineering, Event-Platform Value Stream (Sprint 00): Use RowTypeInfo to ensure better validation of the event data within the Mediawiki Stream Enrichment pipeline - https://phabricator.wikimedia.org/T316555 (gmodena) These changes have been merged into https://gitlab.wikimedia.org/repos/data-engin...
[20:21:24] Data-Engineering, Event-Platform Value Stream (Sprint 00): [Shared Event Platform] Mediawiki Stream Enrichment should consume the consolidated page-change stream. - https://phabricator.wikimedia.org/T311084 (gmodena)
[20:40:08] RECOVERY - Check unit status of drop-features-actor-hourly on an-launcher1002 is OK: OK: Status of the systemd unit drop-features-actor-hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:53:56] PROBLEM - Check unit status of drop-features-actor-hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit drop-features-actor-hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
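The `conda-dist: command not found` mystery in the log above came down to pip's user-install location missing from PATH: `pip install -e .` in a clone of workflow_utils does install the console scripts, but into `~/.local/bin`. A sketch of the fix, assuming a default user-level pip install (the script names are taken from the WARNING joal pasted):

```shell
# After `pip install -e .` in workflow_utils, pip may warn:
#   WARNING: The scripts artifact-cache, conda-dist and package-version are
#   installed in '/home/<user>/.local/bin' which is not on PATH.
# Prepending that directory makes the entry points resolvable:
export PATH="$HOME/.local/bin:$PATH"
# Verify the shell can now find it (prints a path if installed):
command -v conda-dist || echo "conda-dist still not on PATH"
```

Adding the `export PATH=...` line to `~/.profile` or `~/.bashrc` would make the fix persistent across sessions.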