[00:21:39] (CR) Eevans: [C: -1] image-suggestions-feedback: Bump to version 2.0.0 (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925) (owner: Kosta Harlan)
[08:50:43] (CR) Kosta Harlan: image-suggestions-feedback: Bump to version 2.0.0 (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925) (owner: Kosta Harlan)
[08:54:20] (CR) Kosta Harlan: image-suggestions-feedback: Bump to version 2.0.0 (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925) (owner: Kosta Harlan)
[09:06:37] Good morning btullis - Something weird happened on clouddumps1002 - It seems some file got copied with wrong user and group, leading to an error in hdfs-rsync job (see alert email) - Could you drop that faulty file please? /srv/dumps/xmldatadumps/public/other/pageview_complete/2022/2022-11/pageviews-20221102-automated.bz2 (or change its ownership, as you wish)
[09:09:40] (PS6) Kosta Harlan: image-suggestions-feedback: Bump to version 1.0.1 [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925)
[10:14:02] !log btullis@clouddumps1002:/srv/dumps/xmldatadumps/public/other/pageview_complete/2022/2022-11$ sudo chown dumpsgen:dumpsgen pageviews-20221102-automated.bz2
[10:14:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:14:14] !log btullis@clouddumps1002:/srv/dumps/xmldatadumps/public/other/pageview_complete/2022/2022-11$ sudo systemctl restart analytics-dumps-fetch-pageview_complete_dumps.service
[10:14:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:15:06] PROBLEM - SSH on an-coord1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:19:52] Analytics, Analytics-Wikistats, Data-Engineering: Monthly pageview stats for October 2022 missing - https://phabricator.wikimedia.org/T322239 (Radim.kubacki) Open→Resolved a:Radim.kubacki It is already there. We can close this.
[10:23:14] Data-Engineering-Planning, Data-Catalog, Patch-For-Review: Upgrade DataHub to v0.8.43 - https://phabricator.wikimedia.org/T316336 (BTullis) Open→Resolved This is now resolved. We have upgraded to version 0.9.0 in {T321907}
[10:23:16] Data-Engineering-Planning, Data-Catalog, Patch-For-Review: Create Airflow Pipeline for Ingesting/Updating Superset Data - https://phabricator.wikimedia.org/T309622 (BTullis)
[11:12:18] Data-Engineering, Cloud-Services, serviceops-collab, Patch-For-Review: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937 (jbond)
[11:56:38] Data-Engineering-Planning, Data Pipelines (Sprint 03): [airflow] Normalize the use of timeouts in Airflow DAGs - https://phabricator.wikimedia.org/T317549 (EChetty) Open→Resolved
[11:56:43] Data-Engineering-Planning, Wikidata, Wikidata Analytics, Data Pipelines (Sprint 03): Some reliability metrics missing since June 20th '22 - https://phabricator.wikimedia.org/T314131 (EChetty) Open→Resolved
[11:56:51] Data-Engineering-Planning, Data Pipelines (Sprint 03): Fix `refinery-drop-older-than` script for end-of-month/end-of-year - https://phabricator.wikimedia.org/T316746 (EChetty) Open→Resolved
[13:17:58] RECOVERY - SSH on an-coord1002.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:43:46] Data-Engineering-Planning, Machine-Learning-Team, Research, Shared-Data-Infrastructure: Proposal: deprecate the mediawiki.revision-score stream in favour of more streams like mediawiki-revision-score- - https://phabricator.wikimedia.org/T317768 (Ottomata) FYI, we have deployed a `rc0.media...
[13:55:15] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04), Spike: Easy Flink Python UDF + SQL enrichment - https://phabricator.wikimedia.org/T320968 (tchin) Huh I guess I overlooked this since everything we've been doing has been on Yarn, but if I wanted to produce to Kafka or Hadoop using t...
[14:29:27] Analytics-Jupyter, Data-Engineering, Product-Analytics: Replace anaconda-wmf with smaller, non-stacked Conda environments - https://phabricator.wikimedia.org/T302819 (xcollazo)
[14:29:29] Data-Engineering, Epic: Upgrade analytics-hadoop to Spark 3 + scala 2.12 - https://phabricator.wikimedia.org/T291464 (xcollazo)
[14:43:29] (PS1) Aqu: Declare HDFS fsimage dataset in hive metastore [analytics/refinery] - https://gerrit.wikimedia.org/r/853303 (https://phabricator.wikimedia.org/T321169)
[14:44:55] (CR) Eevans: image-suggestions-feedback: Bump to version 1.0.1 (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/809150 (https://phabricator.wikimedia.org/T302925) (owner: Kosta Harlan)
[15:37:51] ottomata: o/
[15:38:36] We have a new team mate (Ilias), I am working on getting them access etc. Would you mind reviewing https://phabricator.wikimedia.org/T322350 for analytics-privatedata when you have a moment?
[15:39:07] Data-Engineering, Cloud-Services, serviceops-collab, Patch-For-Review: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937 (LSobanski) p:Medium→Triage
[15:39:26] (CR) Joal: "A big bunch of comments" [analytics/refinery] - https://gerrit.wikimedia.org/r/853303 (https://phabricator.wikimedia.org/T321169) (owner: Aqu)
[15:54:41] Data-Engineering, GitLab, Release-Engineering-Team, serviceops-collab: Experiencing pipeline failure due to disk-space issues - https://phabricator.wikimedia.org/T310593 (Jelto) Resolved→Open CI builds fail again with `No space left on device`. See: https://gitlab.wikimedia.org/repos/rele...
[16:07:33] Data-Engineering, Data-Engineering-Kanban, Shared-Data-Infrastructure, Patch-For-Review: Fix turnilo after upgrade - https://phabricator.wikimedia.org/T308778 (BTullis) Adding to value stream for prioritization.
[16:08:12] Data-Engineering: NEW FEATURE REQUEST: - https://phabricator.wikimedia.org/T322423 (EChetty)
[16:54:44] (PS1) Jenniferwang: Add mediawiki_ipinfo_interaction schema fields to EventLogging allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/853376 (https://phabricator.wikimedia.org/T322379)
[19:38:43] Data-Engineering, GitLab, Release-Engineering-Team, serviceops-collab: Experiencing pipeline failure due to disk-space issues - https://phabricator.wikimedia.org/T310593 (Dzahn) on runner-1021: ` dzahn@runner-1021:~$ systemctl status clear-docker-cache.timer ● clear-docker-cache.timer - Periodi...
[21:00:50] (PS2) Aqu: Create dataset from HDFS fsimage.xml [analytics/refinery/source] - https://gerrit.wikimedia.org/r/852315 (https://phabricator.wikimedia.org/T321168)
[21:23:45] PROBLEM - SSH on an-coord1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:50:33] Data-Engineering-Planning, Product-Analytics (Kanban): Superset Date Filter fix needed - https://phabricator.wikimedia.org/T318299 (mpopov) So, when SQL templating was enabled I played around with it to try it out. I do not recommend it as a workaround – it will be difficult to implement.
[23:33:48] Analytics, API Platform (Sprint 00), Platform Engineering Roadmap, User-Eevans: AQS 2.0 documentation - https://phabricator.wikimedia.org/T288664 (apaskulin)
[23:45:25] Analytics, API Platform (Sprint 00), Platform Engineering Roadmap, User-Eevans: AQS 2.0 documentation - https://phabricator.wikimedia.org/T288664 (apaskulin)