[07:49:35] btullis: o/
[07:50:01] I saw what you wrote about log4j errors, and I noticed this in the changelog for 2.10.2 - https://issues.apache.org/jira/browse/HADOOP-18088
[07:50:42] so I am wondering if the log4j dependency was implicitly brought by hadoop packages
[07:51:32] mmm even if the errors are related to a refinery dep, org/wikimedia/analytics/refinery/core/LogHelper
[07:51:38] very weird
[09:10:11] elukey: Thanks for looking into it. I'm still a bit baffled, I'm sorry to say. I should have checked that all of those refine jobs were green on icinga before pushing out the update on the test cluster.
[09:12:39] I wonder if I should roll back to 2.10.1 on the test cluster, to see if I can rule out the hadoop upgrade as the cause.
[09:35:17] !log restarting hive-server2 and hive-metastore on an-test-coord1001
[09:35:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:39:23] Aha! Restarting the hive services on an-test-coord1001 has helped with at least one of the failed jobs on the test cluster. I was testing with `refinery-drop-older-than` and the error I was getting was:
[09:40:37] `SHOW TABLES failed with error code: 64` - After restarting Hive it proceeded with a clean, dry-run.
[09:42:37] !log restarted drop_event.service on an-test-coord1001
[09:42:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:44:53] !log restarting refinery-drop-webrequest-refined-partitions.service on an-test-coord1001
[09:44:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:46:14] !log restarting refinery-drop-webrequest-raw-partitions.service on an-test-coord1001
[09:46:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:50:38] !log roll-restarting hadoop workers on the test cluster.
[09:50:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:11:17] Some of the refine jobs on the test cluster appear now to be failing with: `java.lang.NoClassDefFoundError: io/circe/Decoder`
[10:11:29] Does this ring any bells with anyone?
[10:13:35] so the LogHelper one is gone?
[10:13:45] or it is another issue?
[10:16:15] No, it's another one. The LogHelper is coming from the `monitor_refine_*` jobs I think. The circe/Decoder is coming from the refine jobs.
[10:19:40] Do we need a new refinery deploy on the test cluster, I wonder?
[10:23:45] ah lovely
[10:24:05] in theory no, the last one should be sufficient
[10:24:33] mmm but I am wondering if we reference specific hadoop jars in refinery
[10:26:02] btullis: in refinery source we have hadoop 2.10.1 in the pom.xml
[10:26:31] Aha! Thanks.
[10:27:09] and it is used in various places, so mayyybeee there is something going on with deps
[10:28:20] (have to go now but I'll check later :)
[10:28:34] elukey: Thanks so much <3
[10:28:40] <3
[10:36:33] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:41:05] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:46:35] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:52:35] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[11:46:29] just passing by: in refinery those hadoop jars should be set with scope 'provided', meaning that the jars are not actually bundled within our jars
[13:00:27] (PS1) Gerrit maintenance bot: Add bjn.wiktionary to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/811709 (https://phabricator.wikimedia.org/T312215)
[14:07:57] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:09:38] (PS1) Btullis: Bump hadoop minor version for CVE-2021-33036 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/811727 (https://phabricator.wikimedia.org/T311807)
[14:10:13] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:10:33] I have added the following CR for refinery-source: https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/811727
[14:15:28] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:21:45] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:31:40] Analytics-Wikistats, Data-Engineering: Wikistats New Feature - bot edits / new articles - https://phabricator.wikimedia.org/T241922 (JArguello-WMF)
[14:31:57] Analytics-Wikistats, Data-Engineering: [Wikistats2] Normalize pageviews per country by population - https://phabricator.wikimedia.org/T242621 (JArguello-WMF)
[14:32:01] Analytics-Wikistats, Data-Engineering: Incorporate Erik's great work on WiViVi into Wikistats 2 - https://phabricator.wikimedia.org/T229665 (JArguello-WMF)
[14:32:42] Analytics-Wikistats, Data-Engineering: Make MapChart
load async instead of parts of MapChart - https://phabricator.wikimedia.org/T229382 (JArguello-WMF)
[14:32:58] Analytics-Wikistats, Data-Engineering: Add an option to export the current graph into image file - https://phabricator.wikimedia.org/T219969 (JArguello-WMF)
[14:33:15] Analytics-Wikistats, Data-Engineering: Serve legacy code only to legacy browsers - https://phabricator.wikimedia.org/T207311 (JArguello-WMF)
[14:34:26] Analytics-Wikistats, Data-Engineering: Long annotations text being clipped - https://phabricator.wikimedia.org/T218846 (JArguello-WMF)
[14:34:42] Analytics-Wikistats, Data-Engineering: Under construction page in wikistats to take site down - https://phabricator.wikimedia.org/T192847 (JArguello-WMF)
[14:35:07] Analytics-Wikistats, Data-Engineering: Render dashboard graphs with Canvas instead of SVG - https://phabricator.wikimedia.org/T224871 (JArguello-WMF)
[14:35:36] Analytics-Wikistats, Data-Engineering: Wikistats2 metric: top article creators - https://phabricator.wikimedia.org/T210423 (JArguello-WMF)
[14:36:53] Analytics-Wikistats, Data-Engineering, Platform Engineering (Icebox): Annotations in wikistats that are only visible on "all" time range get bundled up (probably an issue we cannot resolve until we have a more granular time range) - https://phabricator.wikimedia.org/T200020 (JArguello-WMF)
[14:40:01] Analytics-Wikistats, Data-Engineering: Bug when toggling Chrome mobile view - https://phabricator.wikimedia.org/T217559 (JArguello-WMF)
[14:50:31] Analytics-Wikistats, Data-Engineering: [Wikistats 2] Provide metric area headings in the 'Explore Topics' dropdown - https://phabricator.wikimedia.org/T200498 (JArguello-WMF)
[14:50:53] Analytics, Data-Engineering-Icebox: WikiStats should recognize global bots - https://phabricator.wikimedia.org/T37196 (JArguello-WMF)
[14:51:01] Data-Engineering-Icebox: WikiStats should recognize global bots - https://phabricator.wikimedia.org/T37196 (JArguello-WMF)
[14:51:33] Analytics-Wikistats, Data-Engineering: Improve Annotations on Wikistats - https://phabricator.wikimedia.org/T207057 (JArguello-WMF)
[14:51:38] Analytics-Wikistats, Data-Engineering: Basic Wiki Numbers - https://phabricator.wikimedia.org/T224722 (JArguello-WMF)
[14:53:05] Data-Engineering, Event-Platform, Platform Engineering, tech-decision-forum: MediaWiki Event Carried State Transfer - Problem Statement - https://phabricator.wikimedia.org/T291120 (JArguello-WMF)
[14:53:19] Data-Engineering-Kanban, Event-Platform, Wikidata, Wikidata-Campsite, and 3 others: Migrate WikibaseTermboxInteraction EventLogging Schema to new EventPlatform thingy - https://phabricator.wikimedia.org/T290303 (JArguello-WMF)
[14:54:53] Data-Engineering, Event-Platform, Platform Team Initiatives (Modern Event Platform (TEC2)): Allow disabling/enabling configured streams via wgEventStreams config - https://phabricator.wikimedia.org/T259712 (JArguello-WMF)
[14:55:06] Data-Engineering, Event-Platform, Product-Analytics: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (JArguello-WMF)
[14:55:14] Data-Engineering, Event-Platform, Patch-For-Review: Refine drops $schema field values - https://phabricator.wikimedia.org/T255818 (JArguello-WMF)
[14:56:31] Data-Engineering-Icebox, User-Elukey: Deprecation (if possible) of the #central channel on irc.wikimedia.org - https://phabricator.wikimedia.org/T242712 (JArguello-WMF)
[15:03:52] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:04:28] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[15:14:20] Data-Engineering-Radar, MediaWiki-extensions-EventLogging, QuickSurveys, Readers-Web-Backlog, WMDE-TechWish: QuickSurveys should show an error when response is blocked - https://phabricator.wikimedia.org/T256463 (JArguello-WMF)
[15:15:12] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[15:15:28] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:16:13] Data-Engineering, Event-Platform, Metrics-Platform, Goal: BUOD-KR1-Q3: Require that all new schema/instruments are created with the MEP system - https://phabricator.wikimedia.org/T259157 (JArguello-WMF)
[15:17:40] (CR) Joal: [C: +1] "I'm not sure I understand how this was breaking, (I have an assumption but I'd love an explanation :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/811325 (https://phabricator.wikimedia.org/T310542) (owner: Aqu)
[15:17:50] Analytics-Wikistats, Data-Engineering, Internet-Archive: Allow WikiStats2 to be archived by the wayback machine - https://phabricator.wikimedia.org/T206836 (JArguello-WMF)
[15:18:02] Analytics-Wikistats, Data-Engineering: Gather all constants related to mobile/responsiveness in config - https://phabricator.wikimedia.org/T190339 (JArguello-WMF)
[15:18:16] Analytics-Wikistats, Data-Engineering: We need better UI addressing when are metrics publicly available - https://phabricator.wikimedia.org/T226403 (JArguello-WMF)
[15:18:18] Analytics-Wikistats, Data-Engineering: [Wikistats] The permanent link is broken - https://phabricator.wikimedia.org/T245445 (JArguello-WMF)
[15:18:42] Analytics-Wikistats, Data-Engineering: Add flagged revision status statistics to Wikistats 2.0 - https://phabricator.wikimedia.org/T177951 (JArguello-WMF)
[15:19:16] (CR) Joal: "I'm interested to see if this unbreaks our jobs - All our hadoop deps in refinery are in 'provided' scope, so this shouldn't change anythi" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/811727 (https://phabricator.wikimedia.org/T311807) (owner: Btullis)
[15:19:54] (CR) Joal: [C: +1] "In any case we should merge this, as our version will bump :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/811727 (https://phabricator.wikimedia.org/T311807) (owner: Btullis)
[15:20:05] Analytics-Wikistats, Data-Engineering: Allow namespace selection on Top Viewed Articles - https://phabricator.wikimedia.org/T182964 (JArguello-WMF)
[15:20:35] Analytics-Wikistats, Data-Engineering: Wikistats2 and SEO - https://phabricator.wikimedia.org/T192172 (JArguello-WMF)
[15:20:53] Analytics-Wikistats, Data-Engineering, Product-Analytics: Contribution inequality graphs for Wikistats - https://phabricator.wikimedia.org/T195033 (JArguello-WMF)
[15:21:08] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:21:09] Analytics-Wikistats, Data-Engineering: [Wikistats Design] integrate browser dashboard data into wikistats - https://phabricator.wikimedia.org/T198333 (JArguello-WMF)
[15:21:26] Analytics-Wikistats, Data-Engineering: Easter Egg: wikistats classic style on wikistats 2.0 - https://phabricator.wikimedia.org/T177408 (JArguello-WMF)
[15:21:38] Analytics-Wikistats, Data-Engineering: Wikistats Bug - View numbers contradictory - https://phabricator.wikimedia.org/T261565 (JArguello-WMF)
[15:21:57] Analytics-Wikistats, Data-Engineering: Increase topojson resolution: Singapore does not appear on wikistats map - https://phabricator.wikimedia.org/T199571 (JArguello-WMF)
[15:22:04] Analytics-Wikistats, Data-Engineering, Patch-For-Review: Improve scoping of CSS - https://phabricator.wikimedia.org/T190915 (JArguello-WMF)
[15:22:52] Analytics-Wikistats, Data-Engineering: Split wikistats metrics out by namespace - https://phabricator.wikimedia.org/T275466 (JArguello-WMF)
[15:23:21] Analytics-Wikistats, Data-Engineering: Wikimedia Statistics - Horizontal (time) axis wrongly formatted when the option "Monthly" is choosen - https://phabricator.wikimedia.org/T290551 (JArguello-WMF)
[15:23:26] Analytics-Wikistats, Data-Engineering: "Page views by edition of Wikipedia" for each country - https://phabricator.wikimedia.org/T257071 (JArguello-WMF)
[15:23:28] Analytics-Wikistats, Data-Engineering: Add "Top used photos" metric - https://phabricator.wikimedia.org/T220485 (JArguello-WMF)
[15:23:41] Analytics-Wikistats, Data-Engineering: wikistats , move to webpack 5 - https://phabricator.wikimedia.org/T188759 (JArguello-WMF)
[15:24:07] Analytics-Wikistats, Data-Engineering: Siteviews of all Wikipedias per month - https://phabricator.wikimedia.org/T224963 (JArguello-WMF)
[15:24:22] Analytics-Wikistats, Data-Engineering: Annotations in wikistats2 can't be split on project and language - https://phabricator.wikimedia.org/T208665 (JArguello-WMF)
[15:27:04] Analytics, Data-Engineering-Icebox: Release a public dataset about percentage of referrers in wikipedia traffic - https://phabricator.wikimedia.org/T250840 (JArguello-WMF)
[15:27:32] Analytics, Patch-For-Review: Presto should warn or prevent users from querying without Hive partition predicates - https://phabricator.wikimedia.org/T273004 (Ottomata) FYI! Just noticed that we closed this task without really undoing. This setting was applied in the test cluster, and I was about to appl...
[15:28:38] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[15:30:04] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:30:32] Analytics, Data-Engineering-Icebox: Release a public dataset about percentage of referrers in wikipedia traffic - https://phabricator.wikimedia.org/T250840 (JArguello-WMF) This was initially created as a task for mentoring. Data Engineering is still working to define a more formal mentoring workflow that...
[15:30:44] Data-Engineering-Icebox: Release a public dataset about percentage of referrers in wikipedia traffic - https://phabricator.wikimedia.org/T250840 (JArguello-WMF)
[15:31:46] Analytics-Wikistats, Data-Engineering: Implement inequality metrics for WikiStats - https://phabricator.wikimedia.org/T248964 (JArguello-WMF)
[15:33:00] (CR) Elukey: "The change looks good to me! I have some comments/doubts to raise (just for awareness):" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/811727 (https://phabricator.wikimedia.org/T311807) (owner: Btullis)
[15:34:48] Analytics-Wikistats, Data-Engineering, I18n: Fixed time range names are forced to capitalized regardless of locale in sidebar - https://phabricator.wikimedia.org/T287910 (JArguello-WMF)
[15:35:02] Analytics-Wikistats, Data-Engineering, I18n: Fixed time range names are forced to capitalized regardless of locale in sidebar - https://phabricator.wikimedia.org/T287910 (JArguello-WMF)
[15:36:43] (CR) Joal: [V: +2 C: +2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/811709 (https://phabricator.wikimedia.org/T312215) (owner: Gerrit maintenance bot)
[15:37:36] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:45:06] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:49:11] Analytics-Wikistats, Data-Engineering, I18n: Fixed time range names are forced to capitalized regardless of locale in sidebar - https://phabricator.wikimedia.org/T287910 (JArguello-WMF)
[15:49:23] Analytics-Wikistats, Data-Engineering, I18n: Fixed time range names are forced to capitalized regardless of locale in sidebar - https://phabricator.wikimedia.org/T287910 (JArguello-WMF)
[15:49:52] Analytics-Wikistats, Data-Engineering, I18n: Fixed time range names are forced to capitalized regardless of locale in sidebar - https://phabricator.wikimedia.org/T287910 (JArguello-WMF)
[15:49:59] Analytics-Wikistats, Data-Engineering: Metrics tooltip in detail page is not localized - https://phabricator.wikimedia.org/T287908 (JArguello-WMF)
[15:50:04] Analytics-Wikistats, Data-Engineering, I18n: Fixed time range names are forced to capitalized regardless of locale in sidebar - https://phabricator.wikimedia.org/T287910 (JArguello-WMF)
[15:52:42] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:57:54] Analytics, Analytics-Wikistats, Data-Engineering: Wikimedia Statistics - Horizontal (time) axis wrongly formatted when the option "Monthly" is choosen - https://phabricator.wikimedia.org/T290551 (JArguello-WMF)
[15:58:39] Analytics, Analytics-Wikistats, Data-Engineering, I18n: Fixed time range names are forced to capitalized regardless of locale in sidebar - https://phabricator.wikimedia.org/T287910 (JArguello-WMF)
[16:00:14] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:07:44] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:15:14] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:22:48] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:30:22] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:37:52] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:40:22] (CR) Michael Große: "I _think_ this should mostly work now 🤞" [analytics/refinery] - https://gerrit.wikimedia.org/r/811305 (https://phabricator.wikimedia.org/T304793) (owner: Michael Große)
[16:45:28] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:48:23] (CR) Aqu: Fix done file path in HDFSArchiver (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/811325 (https://phabricator.wikimedia.org/T310542) (owner: Aqu)
[16:49:59] (CR) Joal: [C: +1] Fix done file path in HDFSArchiver (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/811325 (https://phabricator.wikimedia.org/T310542) (owner: Aqu)
[16:53:04] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:03:34] Data-Engineering, MediaViewer, MediaWiki-extensions-EventLogging, MW-1.39-notes (1.39.0-wmf.18; 2022-06-27): Decommission the MediaViewer and MultimediaViewer* instruments - https://phabricator.wikimedia.org/T310890 (phuedx)
[17:04:32] Data-Engineering: Drop MediaViewer and MultimediaViewer* tables - https://phabricator.wikimedia.org/T311229 (phuedx)
[17:04:36] Analytics-Kanban, Data-Engineering, Event-Platform, Fundraising-Backlog, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (phuedx)
[17:04:42] Data-Engineering, MediaViewer, MediaWiki-extensions-EventLogging, MW-1.39-notes (1.39.0-wmf.18; 2022-06-27): Decommission the MediaViewer and MultimediaViewer* instruments - https://phabricator.wikimedia.org/T310890 (phuedx) Open→Resolved a:phuedx Many thanks to @Krinkle for the revie...
[17:13:22] (CR) Milimetric: [V: +2 C: +2] "This is good, I'll merge (we don't have auto-verification on this project so we +2 verify manually)" [analytics/refinery] - https://gerrit.wikimedia.org/r/811305 (https://phabricator.wikimedia.org/T304793) (owner: Michael Große)
[17:16:49] I've done a bit more work and paired with ottomata on the test-cluster job failures. It's currently down to this:
[17:16:53] https://usercontent.irccloud-cdn.com/file/a0pqOp5F/image.png
[17:17:20] The refine jobs are all failing on the `java.lang.NoClassDefFoundError: io/circe/Decoder` error.
[17:19:57] Thanks for the heads up btullis - Any news on oozie?
[17:32:13] Analytics, Beta-Cluster-Infrastructure, Beta-Cluster-reproducible: 502, connect failed for intake-analytics.wikimedia.beta.wmflabs.org (Mar 2022) - https://phabricator.wikimedia.org/T303160 (AlexisJazz)
[17:35:45] Analytics, Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, User-Urbanecm: 502, connect failed for intake-analytics.wikimedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T289029 (AlexisJazz)
[17:42:08] Analytics-Wikistats, Data-Engineering: Annotations in wikistats that are only visible on "all" time range get bundled up (probably an issue we cannot resolve until we have a more granular time range) - https://phabricator.wikimedia.org/T200020 (JArguello-WMF)
[17:49:45] (PS1) Ottomata: WIP - include missing dependencies [analytics/refinery/source] - https://gerrit.wikimedia.org/r/811758 (https://phabricator.wikimedia.org/T311807)
[17:50:52] (CR) CI reject: [V: -1] WIP - include missing dependencies [analytics/refinery/source] - https://gerrit.wikimedia.org/r/811758 (https://phabricator.wikimedia.org/T311807) (owner: Ottomata)
[17:57:34] !log upgrading presto to 0.273.3 in analytics cluster - T311525
[17:57:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:57:37] T311525: Upgrade to latest PrestoDB and enable iceberg support - https://phabricator.wikimedia.org/T311525
[18:09:48] !log enabling iceberg hive catalog connector on analytics_cluster presto
[18:09:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:17:06] Data-Engineering-Kanban, Data Engineering Planning (Sprint 01), Patch-For-Review: Upgrade to latest PrestoDB and enable iceberg support - https://phabricator.wikimedia.org/T311525 (Ottomata) Woo hoo! ` presto> show catalogs; Catalog ------------------- analytics_hive analytics_iceberg syste...
[18:30:13] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:33:15] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:00:07] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:07:17] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:14:37] Data-Engineering, Product-Analytics, Patch-For-Review: Request for SQL Templating to be enabled in Superset - https://phabricator.wikimedia.org/T312134 (EBernhardson) I would also love to see this come back, I have old dashboards that no longer work because templating was turned off. Looking things o...
[19:15:29] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[19:29:21] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:00:19] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:07:35] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:15:03] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:20:56] (CR) Kosta Harlan: [C: +1] Add analytics/mediawiki/editgrowthconfig [schemas/event/secondary] - https://gerrit.wikimedia.org/r/811357 (https://phabricator.wikimedia.org/T312148) (owner: Urbanecm)
[20:28:53] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:43:51] (CR) Kosta Harlan: [C: +2] Add analytics/mediawiki/editgrowthconfig [schemas/event/secondary] - https://gerrit.wikimedia.org/r/811357 (https://phabricator.wikimedia.org/T312148) (owner: Urbanecm)
[20:44:26] (Merged) jenkins-bot: Add analytics/mediawiki/editgrowthconfig [schemas/event/secondary] - https://gerrit.wikimedia.org/r/811357 (https://phabricator.wikimedia.org/T312148) (owner: Urbanecm)
[20:58:17] Analytics-Wikistats, Data-Engineering: Siteviews of all Wikipedias per month - https://phabricator.wikimedia.org/T224963 (Perohanych) a:Perohanych I was promised that the monthly list of wikipedias, arranged by number of monthly views, would be ready in 2020. I have arranged the list manually, but...
[20:59:17] Analytics-Wikistats, Data-Engineering: Siteviews of all Wikipedias per month - https://phabricator.wikimedia.org/T224963 (Perohanych) p:Medium→High
[21:05:23] Data-Engineering, Product-Analytics, Patch-For-Review: Request for SQL Templating to be enabled in Superset - https://phabricator.wikimedia.org/T312134 (mpopov) Thank you @EBernhardson!! And good find! ----- Also, https://apache.github.io/superset/sqllab.html has this as an example: ` SELECT * FRO...
[21:15:31] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:22:49] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:28:54] Analytics, ContentTranslation, Language-Team (Language-2020-Focus-Sprint): Test Performance of Marian NMT translation in stat cluster - https://phabricator.wikimedia.org/T247245 (JArguello-WMF) Open→Resolved
[21:30:05] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:35:50] Data-Engineering, Event-Platform, Platform Engineering Roadmap Decision Making, Platform Team Workboards (S&F Workboard): Need for new event-type - `user_create` and `user_rename` - https://phabricator.wikimedia.org/T262205 (JArguello-WMF)
[21:37:17] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:40:05] Data-Engineering: Investigate showing realtime the eventlogging banner stream (currently sampled at 1%) - https://phabricator.wikimedia.org/T255446 (JArguello-WMF)
[21:41:26] Data-Engineering, Product-Analytics, Patch-For-Review: Request for SQL Templating to be enabled in Superset - https://phabricator.wikimedia.org/T312134 (EBernhardson) I suspect the `latest_partition` function will come for free, here is a query that i'm hoping will start working again with templating...
[21:43:40] Data-Engineering: Generate pagecounts-ez data back to 2008 - https://phabricator.wikimedia.org/T188041 (JArguello-WMF)
[21:44:45] Data-Engineering, Patch-For-Review: Use types in Analytics Puppet classes/profiles/etc.. - https://phabricator.wikimedia.org/T252617 (JArguello-WMF)
[21:46:40] Analytics, Data-Engineering, Event-Platform, serviceops: eventgate helm chart should use common_templates _tls_helpers.tpl instead of its own custom copy - https://phabricator.wikimedia.org/T291504 (JArguello-WMF) @Ottomata Should we remove this task from Analytics to Data Engineering?
[21:49:21] Data-Engineering, Event-Platform, SRE, serviceops, Patch-For-Review: DRY kafka broker declaration in helmfiles - https://phabricator.wikimedia.org/T253058 (JArguello-WMF)
[21:49:36] Data-Engineering, SRE, Traffic-Icebox: varnishkafka / ATSkafka should support setting the kafka message timestamp - https://phabricator.wikimedia.org/T277553 (JArguello-WMF)
[21:49:46] Data-Engineering, SRE: Downloading from Archiva.wikimedia.org seems slower than Maven Central - https://phabricator.wikimedia.org/T273086 (JArguello-WMF)
[22:00:15] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[22:14:11] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[23:00:29] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:07:51] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:15:11] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:22:31] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:45:35] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[23:59:27] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers