[00:00:59] (03CR) 10Nray: [C: 03+1] Add new fragment for editattemptstep [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/919157 (https://phabricator.wikimedia.org/T335309) (owner: 10Kimberly Sarabia) [00:09:52] (03CR) 10Kimberly Sarabia: [C: 03+2] Add new fragment for editattemptstep (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/919157 (https://phabricator.wikimedia.org/T335309) (owner: 10Kimberly Sarabia) [00:10:26] (03Merged) 10jenkins-bot: Add new fragment for editattemptstep [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/919157 (https://phabricator.wikimedia.org/T335309) (owner: 10Kimberly Sarabia) [00:10:29] (03Merged) 10jenkins-bot: Modifies AB Test Enrollment schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/919879 (https://phabricator.wikimedia.org/T335309) (owner: 10Kimberly Sarabia) [00:18:36] (03CR) 10Kimberly Sarabia: [C: 03+2] Web UI Scroll: Use latest web fragment [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/919928 (https://phabricator.wikimedia.org/T335309) (owner: 10Jdlrobson) [00:19:06] (03Merged) 10jenkins-bot: Web UI Scroll: Use latest web fragment [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/919928 (https://phabricator.wikimedia.org/T335309) (owner: 10Jdlrobson) [00:31:04] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:33:29] (SystemdUnitFailed) firing: (21) monitor_refine_eventlogging_legacy.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:10:08] RECOVERY - Check systemd state on an-airflow1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:12:48] 
10Data-Engineering, 10Event-Platform Value Stream (Sprint 08), 10MW-1.40-notes (1.40.0-wmf.23; 2023-02-13), 10Patch-For-Review: mediawiki/page/change event schema - Use single array field for user attributes instead of boolean fields - https://phabricator.wikimedia.org/T336506 (10Ottomata) Had a little bra... [04:33:35] (SystemdUnitFailed) firing: (20) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:37:34] (03PS3) 10Mforns: Migrate banner activity druid loading to Spark3 and Airflow [analytics/refinery] - 10https://gerrit.wikimedia.org/r/919883 (https://phabricator.wikimedia.org/T336184) [05:37:59] (03CR) 10Mforns: [V: 03+2] Migrate banner activity druid loading to Spark3 and Airflow [analytics/refinery] - 10https://gerrit.wikimedia.org/r/919883 (https://phabricator.wikimedia.org/T336184) (owner: 10Mforns) [08:07:18] joal: hello! I finished the banner activity DAG, the final monthly DAG test is still running, but I tested the query separately and it worked fine, plus the daily version worked well and the data in turnilo looks great. I have to leave now, but here are the 2 code reviews (queries and DAG): https://gerrit.wikimedia.org/r/c/analytics/refinery/+/919883 [08:07:18] https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/387 I'll ping you when I'm back! Cheers :-) [08:33:25] 10Data-Engineering, 10Data-Platform-SRE, 10LDAP-Access-Requests, 10SRE, and 3 others: Grant temporary access to web based Data Engineering tools to Bishop Fox - https://phabricator.wikimedia.org/T336357 (10BTullis) Thanks @Dzahn - That's a useful reference. I've created two user accounts in Matomo for `twi... 
[08:33:35] (SystemdUnitFailed) firing: (20) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:54:39] 10Data-Engineering, 10Advanced-Search, 10All-and-every-Wikisource, 10ArticlePlaceholder, and 65 others: Remove unnecessary targets definitions - https://phabricator.wikimedia.org/T328497 (10TheresNoTime) [09:08:29] (SystemdUnitFailed) firing: (20) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:10:54] 10Data-Engineering, 10DBA: dbstore1003 filling up - https://phabricator.wikimedia.org/T336733 (10Marostegui) [09:11:05] 10Data-Engineering, 10DBA: dbstore1003 filling up - https://phabricator.wikimedia.org/T336733 (10Marostegui) p:05Triage→03High [09:15:04] 10Data-Engineering, 10DBA: dbstore1003 filling up - https://phabricator.wikimedia.org/T336733 (10BTullis) Thanks @Marostegui - Let me know if there's anything I can do to help. 
[09:21:24] 10Data-Engineering, 10DBA: dbstore1003 filling up - https://phabricator.wikimedia.org/T336733 (10Marostegui) I have started with s5 [10:06:09] 10Data-Engineering, 10Data-Platform-SRE, 10LDAP-Access-Requests, 10SRE, and 3 others: Grant temporary access to web based Data Engineering tools to Bishop Fox - https://phabricator.wikimedia.org/T336357 (10BTullis) [10:44:31] 10Data-Engineering, 10Data Pipelines (Sprint 13): Update Sqoop for externallinks table changes - https://phabricator.wikimedia.org/T335917 (10Antoine_Quhen) a:03Antoine_Quhen [10:54:09] 10Data-Engineering-Planning, 10Data-Platform-SRE, 10Shared-Data-Infrastructure (Q4 Wrap up): Upgrade the spark YARN shuffler service on Hadoop workers from version 2 to 3 - https://phabricator.wikimedia.org/T332765 (10BTullis) I've now prepared an update for conda-analytics that contains the spark3-yarn-shuf... [11:04:53] !log depooled schema2004 for T335042 [11:04:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [11:04:59] T335042: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 [11:25:21] 10Data-Engineering, 10Data-Engineering-Wikistats, 10translatewiki.net, 10I18n: Automate adding languages to wikistats - https://phabricator.wikimedia.org/T336752 (10Amire80) [11:31:54] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/919883 (https://phabricator.wikimedia.org/T336184) (owner: 10Mforns) [11:32:44] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/901545 (owner: 10Joal) [11:40:57] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/915786 (https://phabricator.wikimedia.org/T335987) (owner: 10Gerrit maintenance bot) [11:41:58] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/916861 
(https://phabricator.wikimedia.org/T336115) (owner: 10Gerrit maintenance bot) [11:45:42] !log Deploy refinery using scap [11:45:43] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:02:51] 10Data-Engineering, 10Event-Platform Value Stream, 10SRE-Access-Requests: Allow gmodena and tchin to merge changes to operation/deployment-charts repo - https://phabricator.wikimedia.org/T336755 (10Ottomata) [12:03:45] 10Data-Engineering, 10Event-Platform Value Stream, 10SRE-Access-Requests: Allow gmodena and tchin to merge changes to operation/deployment-charts repo - https://phabricator.wikimedia.org/T336755 (10Ottomata) [12:06:08] 10Data-Engineering, 10Event-Platform Value Stream, 10Gerrit-Privilege-Requests: Allow gmodena and tchin to merge changes to operation/deployment-charts repo - https://phabricator.wikimedia.org/T336755 (10taavi) Both are members of the `deployment` group (via `platform-engineering` membership), so I've added... [12:06:44] 10Data-Engineering, 10Event-Platform Value Stream, 10Gerrit-Privilege-Requests: Allow gmodena and tchin to merge changes to operation/deployment-charts repo - https://phabricator.wikimedia.org/T336755 (10taavi) 05Open→03Resolved a:03taavi [12:15:44] Hi btullis - I'll need a hand on deployment today [12:18:25] btullis: deployment of the thin environment for refinery failed due to an-airflow1001 [12:18:46] I guess I should have merged/deployed https://gerrit.wikimedia.org/r/c/analytics/refinery/scap/+/919036 before attempting my deploy, right? 
[12:19:01] 10Data-Engineering, 10Event-Platform Value Stream, 10Machine-Learning-Team: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page - https://phabricator.wikimedia.org/T331399 (10Ottomata) [12:19:07] 10Data-Engineering, 10Event-Platform Value Stream, 10Discovery-Search (Current work), 10Patch-For-Review: Add support for redirects in CirrusSearch - https://phabricator.wikimedia.org/T325315 (10Ottomata) [12:23:22] joal: happy to help. Yes, that patch should be merged, but it was meant to have an-airflow1005 in it instead of an-airflow1001. [12:23:22] It looked like I updated the commit message but forgot the change to targets. [12:24:01] ack btullis - it seems the latest version is ok, right? [12:26:02] Oh yes, it is correct. I had the wrong diff view and couldn't see it. Feel free to merge. [12:26:16] thank you btullis - merging and deploying :) [12:26:58] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery/scap] - 10https://gerrit.wikimedia.org/r/919036 (https://phabricator.wikimedia.org/T333697) (owner: 10Btullis) [12:27:48] problem solved btullis - thanks a lot, deployment train continues! [12:37:18] btullis: I need you again :S We have not documented the solution to overcome the git issue we're having when deploying onto HDFS - can you tell me the trick again (I forgot :S) [13:05:04] 10Data-Engineering, 10serviceops-radar, 10Event-Platform Value Stream (Sprint 14 A): Store Flink HA metadata in Zookeeper - https://phabricator.wikimedia.org/T331283 (10gmodena) @elukey @JMeybohm since it seems we reached consensus, I'd like to enable Zookeeper HA in our app. Who's currently responsible for... 
[13:08:29] (SystemdUnitFailed) firing: (20) refine_event_sanitized_analytics_test_immediate.service Failed on an-test-coord1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:09:53] 10Data-Engineering, 10serviceops, 10Event-Platform Value Stream (Sprint 14 A), 10Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (10JArguello-WMF) [13:10:17] 10Data-Engineering, 10serviceops, 10Event-Platform Value Stream (Sprint 14 A), 10Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (10JArguello-WMF) [13:11:13] (DiskSpace) firing: Disk space dbstore1003:9100:/srv 5.889% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=dbstore1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [13:12:33] 10Data-Engineering-Planning, 10Event-Platform Value Stream (Sprint 14 A), 10Patch-For-Review: Improve mediawiki-event-enrichment test suite - https://phabricator.wikimedia.org/T328013 (10JArguello-WMF) [13:16:22] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 14 A), 10Patch-For-Review: mediawiki/page/change event schema - Use single array field for user attributes instead of boolean fields - https://phabricator.wikimedia.org/T336506 (10JArguello-WMF) [13:26:13] (DiskSpace) resolved: Disk space dbstore1003:9100:/srv 5.615% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=dbstore1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [13:27:21] 10Data-Engineering, 10Data-Engineering-Wikistats: Change GitHub link to Gerrit in the Wikistats frontend - https://phabricator.wikimedia.org/T336765 
(10Amire80) [13:30:42] (03PS1) 10Amire80: Make consistent identation in languages.json [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/920275 [13:31:15] joal: sorry for the delay, I thought we had addressed the git issue fully. What are you seeing? [13:38:50] btullis: It was the exact same error as last time, which should be fixed for now to allow today's deploy. I am updating the ticket with the details [13:40:18] !log pooled schema2004 for T335042 [13:40:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:40:20] T335042: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 [13:42:02] stevemunene: Do you mean `OSError: [Errno 13] Permission denied: '.git/fat/objects/tmpxUXHUu'` or `detected dubious ownership in repository`? This ticket? T334493 [13:42:03] T334493: anlytics/refinery deployment broken at refinery-deploy-to-hdfs - https://phabricator.wikimedia.org/T334493 [13:42:18] 10Data-Engineering-Planning, 10Data-Platform-SRE, 10Shared-Data-Infrastructure (Q4 Wrap up): Upgrade the spark YARN shuffler service on Hadoop workers from version 2 to 3 - https://phabricator.wikimedia.org/T332765 (10xcollazo) >>! In T332765#8854401, @BTullis wrote: > There's an [[https://gitlab.wikimedia.o... [13:43:29] (SystemdUnitFailed) firing: (21) refine_event_sanitized_analytics_test_immediate.service Failed on an-test-coord1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:45:26] 10Data-Engineering-Planning, 10Data-Platform-SRE, 10Shared-Data-Infrastructure (Q4 Wrap up): Upgrade the spark YARN shuffler service on Hadoop workers from version 2 to 3 - https://phabricator.wikimedia.org/T332765 (10BTullis) > I see that you have already cut the 0.0.14 release. Oh yes. I could rerun the p... 
[13:45:35] btullis: this was more on `fatal: detected dubious ownership in repository` details discussed on the ticket. [13:47:28] stevemunene: joal: That was definitely supposed to have been fixed permanently by this CR: [13:47:59] 10Data-Engineering-Planning, 10Data-Platform-SRE, 10Shared-Data-Infrastructure (Q4 Wrap up): Upgrade the spark YARN shuffler service on Hadoop workers from version 2 to 3 - https://phabricator.wikimedia.org/T332765 (10xcollazo) >>! In T332765#8855428, @xcollazo wrote: >>>! In T332765#8854401, @BTullis wrote:... [13:48:35] Forgot the link: https://gerrit.wikimedia.org/r/c/operations/puppet/+/912301 [13:53:08] stevemunene: Adding a patch now [13:58:40] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 08), 10MW-1.40-notes (1.40.0-wmf.23; 2023-02-13), 10Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (10Ottomata) [13:59:17] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 14 A), 10Patch-For-Review: mediawiki/page/change event schema - Use single array field for user attributes instead of boolean fields - https://phabricator.wikimedia.org/T336506 (10Ottomata) 05Open→03Declined Discussed this with the Event Platform... [14:00:06] 10Data-Engineering, 10Data-Persistence, 10Event-Platform Value Stream, 10IP Masking, 10Platform Engineering: MediaWiki user types - https://phabricator.wikimedia.org/T336176 (10Ottomata) > MediaWiki were to model user types in a more flexible and comprehensive way, we'd want the event data model to match... 
[14:00:38] !log Deploy refinery onto HDFS [14:00:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:03:29] (SystemdUnitFailed) firing: (21) refine_event_sanitized_analytics_test_immediate.service Failed on an-test-coord1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:10:13] 10Data-Engineering, 10serviceops-radar, 10Event-Platform Value Stream (Sprint 14 A): Store Flink HA metadata in Zookeeper - https://phabricator.wikimedia.org/T331283 (10gmodena) a:03gmodena [14:13:29] (SystemdUnitFailed) firing: (21) refine_event_sanitized_analytics_test_immediate.service Failed on an-test-coord1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:19:46] 10Data-Engineering, 10Event-Platform Value Stream, 10Gerrit-Privilege-Requests: Allow gmodena and tchin to merge changes to operation/deployment-charts repo - https://phabricator.wikimedia.org/T336755 (10Ottomata) Thank you! [14:21:52] joal: stevemunene: I think you've deployed with the workaround for now, but just to let you know this SSH safe directory issue should be fixed for next time. [14:27:43] 10Data-Engineering: Codex, Graph, and Wikistats walk into a bar graph - https://phabricator.wikimedia.org/T336544 (10sbassett) Hey @Milimetric - Should we consider some of the ideas being proposed in @tgr's task (T336595)? Or maybe merge the two? Or were you thinking about going in a different direction with t... 
[14:28:06] thanks btullis [14:32:13] (DiskSpace) firing: Disk space dbstore1003:9100:/srv 5.915% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=dbstore1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [14:52:13] (DiskSpace) resolved: Disk space dbstore1003:9100:/srv 5.468% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=dbstore1003 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [15:00:09] 10Data-Engineering, 10serviceops-radar, 10Event-Platform Value Stream (Sprint 14 A): Store Flink HA metadata in Zookeeper - https://phabricator.wikimedia.org/T331283 (10gmodena) Flink docs recommend setting zookeeper https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#high-availabili... [15:41:06] !log Deploying analytics airflow dags [15:41:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:45:35] !log Clear failed wikidata_item_page_link sensor task after deploy - due to datacenter switcover [15:45:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:50:48] !log Start airflow mediawiki_history_reduced job with start-date to 2023-05-01 [15:50:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:51:06] !log Kill oozie mediawiki_history_reduced job [15:51:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:55:06] !log Start airflow duid_load_banner_activity_minutely [15:55:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:55:47] !log Kill oozie banner_activity_daily job [15:55:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:57:32] !log Start airflow druid_load_banner_activity_minutely_aggregated_monthly 
[15:57:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:58:01] !log Kill oozie banner_activity-druid-monthly-coord job [15:58:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:35:06] (03PS2) 10Nick Ifeajika: query finetuning [analytics/refinery] - 10https://gerrit.wikimedia.org/r/914799 [16:37:03] (03CR) 10Nick Ifeajika: query finetuning (036 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/914799 (owner: 10Nick Ifeajika) [16:41:07] 10Data-Engineering-Planning, 10Shared-Data-Infrastructure: Upgrade Stats clients to bullseye - https://phabricator.wikimedia.org/T329360 (10BTullis) Hi @fkaelin - I believe that we will be tackling {T336040} within the next couple of weeks, which should at least get you ROCm 5.4 on bullseye. That's as soon as... [16:43:57] 10Data-Engineering-Planning, 10SRE-swift-storage, 10Event-Platform Value Stream (Sprint 14 A): Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing - https://phabricator.wikimedia.org/T330693 (10Ottomata) [16:44:01] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 14 A): mediawiki-page-content-change-enrichment checkpoints should be stored in Swift - https://phabricator.wikimedia.org/T336656 (10Ottomata) [16:44:07] 10Data-Engineering-Planning, 10SRE-swift-storage, 10Event-Platform Value Stream (Sprint 14 A): Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing - https://phabricator.wikimedia.org/T330693 (10Ottomata) Thanks @hnowlan took me a bit to find this, but I did and we adde... 
[16:53:09] (03Abandoned) 10Nick Ifeajika: query finetuning [analytics/refinery] - 10https://gerrit.wikimedia.org/r/919898 (owner: 10Nick Ifeajika) [16:56:02] (03PS3) 10Nick Ifeajika: Add test for knowledge gap totals endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/915678 [16:57:26] (03CR) 10Nick Ifeajika: Add test for knowledge gap totals endpoint (036 comments) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/915678 (owner: 10Nick Ifeajika) [16:59:19] (03CR) 10CI reject: [V: 04-1] Add test for knowledge gap totals endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/915678 (owner: 10Nick Ifeajika) [17:09:19] 10Data-Engineering, 10Product-Analytics, 10Research: Investigate relation of UA deprecation to increase in automated traffic and reduction in unique devices - https://phabricator.wikimedia.org/T336715 (10mpopov) @kzimmerman to set priority after checking in with @leila & @Miriam [17:34:20] !log deploy fix for airflow druid_load_banner_activity jobs [17:34:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:34:44] !log Stop, delete then restart airflow druid_load_banner_activity jobs [17:34:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:40:15] joal: you looking at that failed monthly pageviews druid load job? 
[17:40:35] milimetric: not yet, currently finalizing my deploy - will do soon [17:40:36] I tried to find something quickly to help but there was nothing obvious - the payload seems to have everything spelled correctly and all that [17:40:45] k, lemme know if you want a rubber duck [17:49:13] milimetric: the payload of the druid indexation task is wrong [17:52:57] actually I spoke too fast milimetric - my bad [17:59:43] !log rerun druid_load_pageviews_daily_aggregated_monthly [17:59:43] Schedule: @monthly info Next Run: 2023-05-01, 00:00:00 [17:59:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:00:05] I'll be monitoring the job [18:07:00] I couldn't see the full error message in Druid console, is there another way? [18:07:36] I now have an error in the druid console for the new task [18:07:40] milimetric: --^ [18:07:47] Issue with parsing timestamp :( [18:08:37] I think I know why :) [18:08:50] bummer and yay :) [18:11:58] (03PS1) 10Joal: Fix pageview_monthly HQL for druid loading [analytics/refinery] - 10https://gerrit.wikimedia.org/r/920361 [18:12:03] milimetric: --^ [18:13:35] (SystemdUnitFailed) firing: (19) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:14:02] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Fix pageview_monthly HQL for druid loading [analytics/refinery] - 10https://gerrit.wikimedia.org/r/920361 (owner: 10Joal) [18:14:26] Arg, it looked weird to my brain but I blocked it out! Nice find [18:15:06] milimetric: is it ok to wait for next deploy before restarting, or are we waiting for real data? 
[18:15:37] I think it's fine, this is just a recompaction anyway [18:15:57] Ok, I'm adding this to the deploy for next week [18:15:58] just don't know how we'd remember to restart [18:16:04] Oh, that :) [18:19:35] milimetric: I created that ticket - https://phabricator.wikimedia.org/T336798 [18:19:43] and put it in ready to deploy, so that we don't forget [18:46:44] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 14 A): mediawiki-page-content-change-enrichment checkpoints should be stored in Swift - https://phabricator.wikimedia.org/T336656 (10gmodena) > Checkpointing to Swift (S3 protocol) has been enabled. Here's a summary of a [[ https://gerrit.wikimedia.or... [20:18:13] (DiskSpace) firing: Disk space stat1008:9100:/srv 5.992% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1008 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [20:45:27] ACKNOWLEDGEMENT - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T336814 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [20:53:11] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 14 A): mediawiki-page-content-change-enrichment checkpoints should be stored in Swift - https://phabricator.wikimedia.org/T336656 (10Ottomata) +1! And yes stick with the same name in DSE, even though the namespace is stream-enrichment-poc there. We'll... [20:56:34] 10Data-Engineering, 10Data-Engineering-Wikistats: Wikistats 2 should translate month names and abbreviations - https://phabricator.wikimedia.org/T336815 (10Milimetric) [20:56:54] 10Data-Engineering-Planning, 10Data-Engineering-Wikistats, 10Data Pipelines: Wikistats in Uzbek - https://phabricator.wikimedia.org/T314477 (10Milimetric) @Nataev: hm, that's not good. 
The date translations come from a different library that's not using translatewiki for translations. But it has some basic... [21:19:14] 10Data-Engineering-Planning, 10Epic, 10Event-Platform Value Stream (Sprint 14 A): Release mediawiki.page_change.v1 stream - https://phabricator.wikimedia.org/T336817 (10Ottomata) [21:19:27] 10Data-Engineering-Planning, 10Event-Platform Value Stream: Release mediawiki.page_change.v1 stream - https://phabricator.wikimedia.org/T336817 (10Ottomata) [21:22:03] 10Data-Engineering-Planning, 10Event-Platform Value Stream, 10Patch-For-Review: Release mediawiki.page_change.v1 stream - https://phabricator.wikimedia.org/T336817 (10Ottomata) [21:26:47] 10Data-Engineering-Planning, 10Event-Platform Value Stream, 10Patch-For-Review: Release mediawiki.page_change.v1 stream - https://phabricator.wikimedia.org/T336817 (10Ottomata) [21:27:22] 10Data-Engineering-Planning, 10Event-Platform Value Stream, 10Patch-For-Review: Release mediawiki.page_change.v1 stream - https://phabricator.wikimedia.org/T336817 (10Ottomata) p:05Triage→03High [21:30:29] 10Data-Engineering-Planning, 10Event-Platform Value Stream (Sprint 14 A), 10Patch-For-Review: Release mediawiki.page_change.v1 stream - https://phabricator.wikimedia.org/T336817 (10Ottomata) [21:33:35] 10Data-Engineering-Planning, 10Epic, 10Event-Platform Value Stream (Sprint 14 A), 10Patch-For-Review: Deploy mediawiki-page-content-change-enrichment to wikikube k8s - https://phabricator.wikimedia.org/T325303 (10Ottomata) [21:35:41] 10Data-Engineering-Planning, 10Event-Platform Value Stream: Consider supporting incompatible schema changes in Event Platform streams - https://phabricator.wikimedia.org/T316288 (10Ottomata) [21:35:46] 10Data-Engineering, 10Metrics-Platform-Planning, 10Product-Analytics, 10WMF-Architecture-Team, and 2 others: Major (API) versioning of Event Platform streams - https://phabricator.wikimedia.org/T332212 (10Ottomata) [21:37:36] 10Data-Engineering-Planning, 10Event-Platform Value 
Stream: [Shared Event Platform] Implement error handling and retry logic when fetching data from the MW api - https://phabricator.wikimedia.org/T309699 (10Ottomata) @gmodena can we close this task? [21:37:53] 10Data-Engineering-Planning, 10Event-Platform Value Stream: [NEEDS GROOMING] Improve reliability of simple stateless services - https://phabricator.wikimedia.org/T322125 (10Ottomata) @gmodena can we close this task? [21:38:43] 10Data-Engineering-Planning, 10Event-Platform Value Stream, 10Epic: [Event Platform] Design and Implement realtime enrichment pipeline for MW page change with content - https://phabricator.wikimedia.org/T307959 (10Ottomata) [21:38:45] 10Data-Engineering-Planning, 10Epic, 10Event-Platform Value Stream (Sprint 14 A), 10Patch-For-Review: Deploy mediawiki-page-content-change-enrichment to wikikube k8s - https://phabricator.wikimedia.org/T325303 (10Ottomata) [21:58:13] (DiskSpace) resolved: Disk space stat1008:9100:/srv 5.745% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1008 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [22:13:35] (SystemdUnitFailed) firing: (19) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:23:52] 10Data-Engineering, 10Product-Analytics (Kanban): Model impact of User-Agent deprecation on top line metrics - https://phabricator.wikimedia.org/T336084 (10Mayakp.wiki) ah yes! we dont have data beyond 90 days and so I re-ran your analysis for hour on a day in Feb 13, 2023 and didnt find any differences from w... 
[23:40:32] 10Analytics, 10API Platform (AQS 2.0 Roadmap), 10Documentation, 10Epic, and 2 others: AQS 2.0 documentation - https://phabricator.wikimedia.org/T288664 (10apaskulin) [23:50:15] 10Analytics, 10API Platform (AQS 2.0 Roadmap), 10Documentation, 10Epic, and 2 others: AQS 2.0 documentation - https://phabricator.wikimedia.org/T288664 (10apaskulin)