[00:31:36] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:36:26] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_eventlogging_legacy.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:35:28] 10Data-Engineering, 10SRE, 10SRE-Access-Requests: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (10BTullis) [09:37:25] (03PS4) 10Awight: New event schema for Kartographer ExternalData fetch performance [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/880479 (https://phabricator.wikimedia.org/T326637) [09:38:36] (03CR) 10Awight: New event schema for Kartographer ExternalData fetch performance (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/880479 (https://phabricator.wikimedia.org/T326637) (owner: 10Awight) [09:46:33] 10Data-Engineering, 10SRE, 10SRE-Access-Requests: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (10BTullis) [10:03:09] 10Data-Engineering-Planning, 10Patch-For-Review, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): NEW FEATURE REQUEST: Upgrade superset to 1.5.3 - https://phabricator.wikimedia.org/T323458 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host an-tool1... 
[10:04:39] !log proceeding to upgrade an-tool1010 to bullseye for superset 1.5.3 upgrade T323458 [10:04:40] 10Data-Engineering, 10Equity-Landscape: Wiki DB Map - https://phabricator.wikimedia.org/T309283 (10ntsako) 05In progress→03Resolved [10:04:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:04:42] T323458: NEW FEATURE REQUEST: Upgrade superset to 1.5.3 - https://phabricator.wikimedia.org/T323458 [10:04:42] 10Data-Engineering, 10Equity-Landscape: Extract + Transformation Raw Data into Input Metrics - https://phabricator.wikimedia.org/T306625 (10ntsako) [10:04:48] 10Data-Engineering, 10Equity-Landscape: Editorship Output Rank Metrics - https://phabricator.wikimedia.org/T306618 (10ntsako) 05In progress→03Resolved [10:04:51] 10Data-Engineering, 10Equity-Landscape: Milestone: Ingest and Transform Input Data - https://phabricator.wikimedia.org/T305475 (10ntsako) [10:04:57] 10Data-Engineering, 10Equity-Landscape: Readership Output Rank Metrics - https://phabricator.wikimedia.org/T306617 (10ntsako) 05In progress→03Resolved [10:04:59] 10Data-Engineering, 10Equity-Landscape: Milestone: Ingest and Transform Input Data - https://phabricator.wikimedia.org/T305475 (10ntsako) [10:05:06] 10Data-Engineering, 10Equity-Landscape: Extract + Transformation Raw Data into Input Metrics - https://phabricator.wikimedia.org/T306625 (10ntsako) [10:05:08] 10Data-Engineering, 10Equity-Landscape: Load country data - https://phabricator.wikimedia.org/T310712 (10ntsako) 05In progress→03Resolved [10:06:57] 10Data-Engineering, 10Equity-Landscape: Extract + Transformation Raw Data into Input Metrics - https://phabricator.wikimedia.org/T306625 (10ntsako) [10:06:59] 10Data-Engineering, 10Equity-Landscape: Readership input metrics - https://phabricator.wikimedia.org/T309273 (10ntsako) 05In progress→03Resolved [10:07:06] 10Data-Engineering, 10Equity-Landscape: Extract + Transformation Raw Data into Input Metrics - https://phabricator.wikimedia.org/T306625 
(10ntsako) [10:07:08] 10Data-Engineering, 10Equity-Landscape: Editorship Input Metrics - https://phabricator.wikimedia.org/T309274 (10ntsako) 05In progress→03Resolved [10:17:21] 10Data-Engineering, 10SRE, 10SRE-Access-Requests: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (10JEbe-WMF) [10:23:14] 10Data-Engineering, 10SRE, 10SRE-Access-Requests: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (10JEbe-WMF) [10:37:16] 10Data-Engineering-Planning, 10Patch-For-Review, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): NEW FEATURE REQUEST: Upgrade superset to 1.5.3 - https://phabricator.wikimedia.org/T323458 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-tool1010.... [10:38:23] (03CR) 10Btullis: [V: 03+2 C: 03+2] Upgrade superset to verstion 1.5.3 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/865609 (https://phabricator.wikimedia.org/T323458) (owner: 10Btullis) [11:10:02] 10Data-Engineering-Planning, 10Patch-For-Review, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): NEW FEATURE REQUEST: Upgrade superset to 1.5.3 - https://phabricator.wikimedia.org/T323458 (10BTullis) I carried out the following tasks: * reimaged an-tool1010 to bring it up to bullseye * merged htt... 
[11:14:41] (VarnishkafkaNoMessages) firing: varnishkafka on cp4042 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4042%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:14:42] (VarnishkafkaNoMessages) firing: varnishkafka on cp4042 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp4042%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:15:34] (03PS1) 10Btullis: Remove a reference to a scap target that no longer exists [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/882603 (https://phabricator.wikimedia.org/T323458) [11:19:41] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp4042 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:19:42] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp4042 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:24:41] (VarnishkafkaNoMessages) firing: varnishkafka on cp4046 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4046%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:24:41] (VarnishkafkaNoMessages) 
resolved: (2) varnishkafka on cp4042 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:24:42] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp4042 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:27:41] (VarnishkafkaNoMessages) firing: varnishkafka on cp4047 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4047%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:28:13] 10Data-Engineering, 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (10BTullis) I have merged the changes to `data.yaml` so Jennifer should now have production shell access and access to the... [11:29:41] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp4046 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:30:09] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 07): Tests for mediawiki-stream-enrichment-python flink job via eventutilities-python - https://phabricator.wikimedia.org/T326565 (10gmodena) Patch for the test sink collection failure at: https://gitlab.wikimedia.org/repos/data-engineering/eventutiliti... 
[11:32:41] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp4047 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:33:59] 10Data-Engineering, 10SRE, 10SRE-Access-Requests: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (10BTullis) The work is completed. I'll work with @JEbe-WMF to verify access. [11:34:21] 10Data-Engineering, 10SRE, 10SRE-Access-Requests: Requesting access to Data Engineering team resources for Jennifer Ebe - https://phabricator.wikimedia.org/T327406 (10BTullis) 05Open→03Resolved p:05Triage→03Medium [11:34:58] (03CR) 10Btullis: [V: 03+2 C: 03+2] Remove a reference to a scap target that no longer exists [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/882603 (https://phabricator.wikimedia.org/T323458) (owner: 10Btullis) [11:37:42] (VarnishkafkaNoMessages) firing: varnishkafka on cp4051 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4051%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:39:41] (VarnishkafkaNoMessages) firing: (6) varnishkafka on cp4046 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:40:57] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+2] New event schema for Kartographer ExternalData fetch performance (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/880479 (https://phabricator.wikimedia.org/T326637) (owner: 10Awight) [11:41:30] (03Merged) 10jenkins-bot: New event schema for Kartographer ExternalData fetch 
performance [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/880479 (https://phabricator.wikimedia.org/T326637) (owner: 10Awight) [11:42:33] 10Data-Engineering-Planning, 10Patch-For-Review, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): NEW FEATURE REQUEST: Upgrade superset to 1.5.3 - https://phabricator.wikimedia.org/T323458 (10BTullis) 05Open→03Resolved [11:42:41] (VarnishkafkaNoMessages) resolved: (4) varnishkafka on cp4047 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:44:41] (VarnishkafkaNoMessages) resolved: (5) varnishkafka on cp4047 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages [11:54:30] 10Data-Engineering-Planning, 10Shared-Data-Infrastructure: Upgrade db1108 to Bullseye - https://phabricator.wikimedia.org/T304492 (10Marostegui) @BTullis any ETA? [11:55:58] (03CR) 10Awight: New event schema for Kartographer ExternalData fetch performance (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/880479 (https://phabricator.wikimedia.org/T326637) (owner: 10Awight) [11:57:17] 10Data-Engineering, 10Equity-Landscape: Population input metrics - https://phabricator.wikimedia.org/T309279 (10ntsako) a:05JAnstee_WMF→03ntsako [12:28:05] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+2] New event schema for Kartographer ExternalData fetch performance (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/880479 (https://phabricator.wikimedia.org/T326637) (owner: 10Awight) [13:00:02] joal: o/ Are you around, by any chance? 
[13:00:29] 10Data-Engineering: Requesting Kerberos identity for Hxi-ctr - https://phabricator.wikimedia.org/T325857 (10BTullis) I've deleted this principal with the following command: ` btullis@krb1001:~$ sudo manage_principals.py delete hxi-ctr Principal successfully deleted. Since the principal seems to be related to a u... [13:06:13] !log restarted webrequest_sampled_supervisor realtime druid indexation job [13:06:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:28:29] Hi b [13:28:36] Hi btullis - I am now! [13:28:40] sorry for the delay [13:30:34] Cool, I've had a query from v.olans about this patch: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/861365 [13:31:22] Essentially, even though it was deployed in November, the results aren't apparent on https://superset.wikimedia.org/superset/explore/p/6N9kQ7abEZD/ [13:33:28] Whereas the change was reflected correctly on the batch job: https://superset.wikimedia.org/superset/explore/p/6wgYQ0rb9j2/ [13:34:06] I thought that it might have been necessary to restart the realtime job, so I did that with the following: [13:34:11] https://www.irccloud.com/pastebin/KaKRUwnG/ [13:34:45] ...but we can't see any difference as a result. Do you have any insights that might help? [13:54:35] 10Data-Engineering, 10serviceops, 10Discovery-Search (Current work), 10Event-Platform Value Stream (Sprint 07), 10Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (10JMeybohm) >>! In T324576#8544808, @Ottomata wrote: > What am I doing wrong? You're just runn... 
[14:00:57] volans: Hi - btullis asked me to check the time-to-first-byte metric type in druid/superset - let me know if/when you want to talk about that [14:01:22] joal: hey, thanks a lot, I'm here for the next hour (then a meeting) [14:02:29] whenever suits you [14:04:48] volans: I think the data was correctly updated in Druid, but the schema was not updated on Superset [14:05:02] I did that and now I think you have float data as expected [14:05:05] volans: --^ [14:05:24] ohhhh nice! [14:05:40] volans: would you mind checking? [14:05:46] so even though the field had the same name [14:05:56] it did require a metadata refresh on the superset side too [14:06:07] yes, now my graph shows the same values for both metrics [14:06:15] as expected [14:07:27] 10Data-Engineering, 10Product-Analytics (Kanban), 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Superset Date Filter fix needed - https://phabricator.wikimedia.org/T318299 (10BTullis) Adding this update retrospectively. it was first mentioned in the #product-analytics channel on [[https://wikime... [14:07:42] volans: I assume it's a fixed problem then :) [14:08:05] I think so, let me modify the metric and check the results, I'll confirm in 2 minutes [14:08:16] sure volans [14:08:45] volans: I told you that joal would fix it just like that :-) [14:09:01] eheeh, stupid me for not thinking to just try that, it actually makes sense [14:10:41] Was it this? [14:10:43] https://usercontent.irccloud-cdn.com/file/c4TwIVAK/image.png [14:10:50] yes [14:11:10] that syncs the metadata from druid to superset [14:11:16] that's not what I used, I manually updated the field, but possibly it does the same [14:11:24] yes I did just run it now [14:12:02] I guess superset was treating aggregated_time_firstbyte as integer and so ignoring all the 0.x values, hence the big difference in the graph [14:12:16] that's great, thanks a lot! 
[14:12:17] all fixed [14:12:32] next time I'll try a sync datasource first, sorry for the trouble [14:12:42] np volans - happy to help :) [14:13:36] joal: 'Manually updated the field' I don't see how to do that. Could you explain please? [14:15:21] I used the "legacy datasource editor" btullis [14:15:49] Ah, thanks. Of course. [14:15:50] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 07), 10Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (10Ottomata) Sigh, I'm also reconsidering my [[ https://phabricator.wikimedia.org/T308017#8402493 | desire... [14:33:21] (03CR) 10Awight: "This change is ready for review." [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/882631 (https://phabricator.wikimedia.org/T326637) (owner: 10Awight) [14:54:01] 10Quarry: restart quarry processes on schedule - https://phabricator.wikimedia.org/T327522 (10rook) root crontab set on web: ` 0 14 * * 2 /usr/bin/systemctl restart uwsgi-quarry-web.service ` Workers set to: ` 10 14 * * 2 /usr/bin/systemctl restart celery-quarry-worker.service ` and ` 20 14 * * 2 /usr/bin/system... [14:54:14] 10Quarry: restart quarry processes on schedule - https://phabricator.wikimedia.org/T327522 (10rook) 05Open→03Resolved a:03rook [15:37:25] 10Data-Engineering-Planning, 10Patch-For-Review, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): NEW FEATURE REQUEST: Upgrade superset to 1.5.3 - https://phabricator.wikimedia.org/T323458 (10Manuel) Hi everyone, thank you for the update! :) Unfortunately, I now have the same issues: >>! In T323... [15:44:58] o/ we've got a question here (related to pageviews) https://www.wikidata.org/wiki/Wikidata_talk:Report_a_technical_problem/WDQS_and_Search what would be the best place to ask such questions? [15:50:54] dcausse: either here, or an email to the analytics mailing list :) [15:51:20] joal: thanks! 
ok will point them to the ML [16:29:45] 10Data-Engineering-Planning, 10Epic, 10Patch-For-Review, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (10BTullis) Having carried out a review of the [[https://opendev.org/openstack/puppet-ceph|pupp... [16:30:24] 10Data-Engineering-Planning, 10Epic, 10Patch-For-Review, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Decide on installation details for new ceph cluster - https://phabricator.wikimedia.org/T326945 (10BTullis) [16:36:00] 10Data-Engineering-Planning, 10Patch-For-Review, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): NEW FEATURE REQUEST: Upgrade superset to 1.5.3 - https://phabricator.wikimedia.org/T323458 (10BTullis) Hi @Manuel - I have added the `sql_lab` role to your account in Superset, so you should now be al... [16:41:15] 10Data-Engineering-Planning, 10Data Pipelines, 10Discovery-Search (Current work): Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (10Gehel) [16:53:17] 10Data-Engineering: Requesting Kerberos identity for Hxi-ctr - https://phabricator.wikimedia.org/T325857 (10jbond) [18:18:34] 10Data-Engineering-Planning, 10Event-Platform Value Stream, 10Epic: [Event Platform] Design and Implement realtime enrichment pipeline for MW page change with content - https://phabricator.wikimedia.org/T307959 (10Ottomata) [18:18:57] 10Data-Engineering-Planning, 10Event-Platform Value Stream, 10Epic: [Event Platform] Design and Implement realtime enrichment pipeline for MW page change with content - https://phabricator.wikimedia.org/T307959 (10Ottomata) [18:51:46] 10Quarry: [feedback] - https://phabricator.wikimedia.org/T327682 (10Dusan_Krehel) [18:59:17] 10Quarry: [feedback] The auto end line in the export - https://phabricator.wikimedia.org/T327682 (10Dusan_Krehel) [18:59:51] 10Quarry: 
Check browser user agent and provide line endings \n for Linux based browsers - https://phabricator.wikimedia.org/T327682 (10Aklapper) [19:02:28] 10Quarry: Check browser user agent and provide line endings \n for Linux based browsers - https://phabricator.wikimedia.org/T327682 (10Aklapper) Hi, how does this create an actual problem? Could you please elaborate on your workflow and which tools have problems to handle the current output? [19:20:17] 10Quarry: Check browser user agent and provide line endings \n for Linux based browsers - https://phabricator.wikimedia.org/T327682 (10Dusan_Krehel) If I click on "Download data" on https://quarry.wmcloud.org/query/70729, so the TSV export is with CRLF line format. If a person processes the entire export: file_... [19:30:42] 10Analytics: Fix broken image on front page of analytics.wikimedia.org - https://phabricator.wikimedia.org/T327687 (10Aklapper) [19:32:48] (03PS1) 10Aklapper: Correct invalid URL of image on frontpage of analytics.wikimedia.org [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/882714 (https://phabricator.wikimedia.org/T327687) [19:48:36] (03CR) 10Mforns: "I haven't reviewed to the detail, but code looks good to me overall!" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769) (owner: 10Snwachukwu) [19:56:54] 10Data-Engineering, 10Event-Platform Value Stream: [NEEDS GROOMING] Set PYTHONPATH and FLINK_CLASSPATH in Flink docker images. - https://phabricator.wikimedia.org/T327494 (10gmodena) Summary of chat with @Ottomata. There's two aspects to address: 1. Missing python deps (e.g. `protobuf`) in our docker images.... 
[20:07:45] 10Data-Engineering, 10serviceops, 10Discovery-Search (Current work), 10Event-Platform Value Stream (Sprint 07), 10Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (10bking) [20:19:35] 10Data-Engineering, 10API Platform (Sprint 03), 10AQS2.0, 10Platform Engineering Roadmap, 10User-Eevans: AQS 2.0: Pageviews: Implement Unit Tests - https://phabricator.wikimedia.org/T299735 (10BPirkle) Smoke test QA should be fine for this one. [20:25:44] hey y'all, happy new year! [20:26:07] quick one: is there a way to access a year-old pagelinks table in hive/spark on the analytics cluster? [20:27:01] for instance, using snapshot as 2021-11 or 2022-01. I checked and wmf_raw only starts returning something '2022-07' [20:27:24] *since '2022-07', older months return an empty response [20:57:54] 10Data-Engineering, 10serviceops, 10Discovery-Search (Current work), 10Event-Platform Value Stream (Sprint 07), 10Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (10Ottomata) FINALLY GOT flink-example-app running. YESSSS! [21:15:34] 10Data-Engineering-Radar, 10Gerrit-Privilege-Requests, 10Release-Engineering-Team (Blocking 🧱): Requesting membership of the analytics group in gerrit for 'snwachukwu', 'nokafor', and 'xcollazo' - https://phabricator.wikimedia.org/T314592 (10thcipriani) >>! In T314592#8266214, @xcollazo wrote: >>This should... [22:08:15] milimetric: joal mforns ^^^ aarora PMed me and I was starting to advise, i think the most direct way to get what they want might be to sqoop up a mysql dump from https://dumps.wikimedia.org/backup-index.html ? [22:08:58] aarora: the tool we use for this is called sqoop [22:09:06] it is installed on the stat boxes. 
[22:09:19] i don't have good advice on how to use it, but check out its help messages and google around [22:09:31] i'm sure there is some info on how to use sqoop to import a mysqldump [22:09:35] into a hive table [22:10:36] okay, thanks Andrew. I will have a look. If someone else has any advice or ready documentation on importing historical sql dumps (obtained via internet archive) into hive, that would be very helpful. Thanks! [22:10:52] aarora: are they .sql or .csv format? [22:12:07] I was talking about: e.g. enwiki-20220101-pagelinks.sql.gz or enwiki-20230101-page.sql.gz [22:12:20] these are available on Internet archive [22:15:10] k ya .sql format [22:17:27] aarora: all of the sqoop examples I see import directly from a running mysql server... [22:17:28] hm. [22:18:07] but hm, i wonder... [22:18:07] i doubt the create statements in the .sql files will work in hive QL, but [22:18:20] if you manually created the hive tables with the same schema as the mysql ones [22:18:30] then filtered the .sql files for just the insert statements [22:18:32] it might work. [22:20:31] okay, I will try to have a look. It seems a bit more involved. Perhaps querying these dumps natively seems simpler. Anyway, thanks a lot. I will try based on the suggested solutions. [22:49:29] 10Data-Engineering-Radar, 10MediaWiki-General: Update pingback MediaWiki versions to include new values - https://phabricator.wikimedia.org/T326825 (10CCicalese_WMF) 05Open→03Resolved a:03CCicalese_WMF Thank you @mforns! The queries since last March have been re-run, and the graph is now updated. [23:00:32] 10Data-Engineering-Planning, 10Product-Analytics, 10Data Pipelines (Sprint 07): Include EU Registered Country in the canonical country database - https://phabricator.wikimedia.org/T324995 (10odimitrijevic) Pinging Product Analytics for review.