[00:00:03] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:01:25] (CR) Neil P. Quinn-WMF: [C: +2] "Self-merging, since this only changes the documentation in description fields." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/851735 (https://phabricator.wikimedia.org/T312262) (owner: Neil P. Quinn-WMF)
[00:02:26] (CR) Neil P. Quinn-WMF: [C: +2] Revise Wikistories schema documentation (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/851735 (https://phabricator.wikimedia.org/T312262) (owner: Neil P. Quinn-WMF)
[00:02:50] (Merged) jenkins-bot: Revise Wikistories schema documentation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/851735 (https://phabricator.wikimedia.org/T312262) (owner: Neil P. Quinn-WMF)
[00:05:55] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:09:51] (CR) Neil P. Quinn-WMF: [C: +2] "Oh, wait, I guess I do have permissions 😂" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/851735 (https://phabricator.wikimedia.org/T312262) (owner: Neil P.
Quinn-WMF)
[00:45:33] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:51:33] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:15:08] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:21:05] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:00:45] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:06:43] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:30:37] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:36:37] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:00:23] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:06:19] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:45:31] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:51:21] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:15:11] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:21:05] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:00:39] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:06:39] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:30:05] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:33:07] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:59:13] Data-Engineering: Check home/HDFS leftovers of ejoseph - https://phabricator.wikimedia.org/T322182 (MoritzMuehlenhoff)
[08:06:18] (DruidSegmentsUnavailable) firing: More than 10 segments have been unavailable for mediawiki_history_reduced_2022_10 on the druid_public Druid cluster.
- https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid/Alerts#Druid_Segments_Unavailable - https://grafana.wikimedia.org/d/000000538/druid?refresh=1m&var-cluster=druid_public&panelId=49&fullscreen&orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DDruidSegmentsUnavailable
[08:26:18] (DruidSegmentsUnavailable) resolved: More than 10 segments have been unavailable for mediawiki_history_reduced_2022_10 on the druid_public Druid cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid/Alerts#Druid_Segments_Unavailable - https://grafana.wikimedia.org/d/000000538/druid?refresh=1m&var-cluster=druid_public&panelId=49&fullscreen&orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DDruidSegmentsUnavailable
[10:15:10] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:19:40] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:45:22] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:49:56] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:59:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp3050 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=esams%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp3050%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[11:00:28] RECOVERY - Check systemd state on
an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:04:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp3050 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=esams%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp3050%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[11:06:23] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:15:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2027%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[11:20:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2027%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[11:25:57] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp1075 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[11:30:42] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp1075 is not sending enough cache_text requests -
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[11:45:36] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:51:28] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:30:05] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:36:06] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:56:51] Data-Engineering-Planning, Data Pipelines, Infrastructure-Foundations, Shared-Data-Infrastructure: Also intake Network Error Logging events into the Analytics Data Lake - https://phabricator.wikimedia.org/T304373 (jbond)
[13:00:13] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:05:45] Data-Engineering-Planning, Data Pipelines, Foundational Technology Requests, Traffic, User-fgiunchedi: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (jbond)
[13:06:15] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:30:03] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:36:03] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:37:23] Data-Engineering, Equity-Landscape: Grants input metric - https://phabricator.wikimedia.org/T309276 (KCVelaga_WMF) @ntsako After reviewing the input metrics, the comparisons are here: https://docs.google.com/spreadsheets/d/1smlxmLZN3igND0vW1Zhsr5BRnXgWxx_zbrd5rxMhkqc/edit#gid=75628010&range=A1:AC69, wh...
[13:56:56] ottomata: mind deploying this aqs snapshot? We're fresh out of SREs at the moment https://gerrit.wikimedia.org/r/c/operations/puppet/+/852197
[13:57:15] https://wikitech.wikimedia.org/wiki/Analytics/Systems/AQS#Deploy_new_History_snapshot_for_Wikistats_Backend
[13:57:21] Data-Engineering-Planning, API Platform: Establish testing procedure for Druid-based endpoints - https://phabricator.wikimedia.org/T311190 (VirginiaPoundstone)
[13:57:23] Data-Engineering-Planning, API Platform (Sprint 00): Review testing procedure for Druid-based endpoints - https://phabricator.wikimedia.org/T321727 (VirginiaPoundstone) Resolved→Invalid
[14:02:14] joal: hive job consult if you have a minute?
[14:03:07] pageview monthly dump failed this month, and again when I reran it (it lost the logs so I had to rerun). It was running out of memory so I tried it with:
[14:03:10] https://www.irccloud.com/pastebin/Ieomtsra/
[14:03:22] and it worked. So two questions
[14:04:13] 1. should we update the oozie job to increase the memory to that or try something lower (job takes 6 hours)
[14:04:13] 2. it seemed to fail way faster than 6 hours, which makes me feel like how did it ever work with the default memory...
[14:12:05] hi milimetric - sorry I was in meeting, I'm available now
[14:12:43] Starting build #15 for job wikimedia-event-utilities-maven-release-docker
[14:12:47] hm
[14:14:00] I had forgotten about that job
[14:15:43] Project wikimedia-event-utilities-maven-release-docker build #15: SUCCESS in 3 min 0 sec: https://integration.wikimedia.org/ci/job/wikimedia-event-utilities-maven-release-docker/15/
[14:39:49] joal: pageview_actor failed too, also nonsense errors: https://hue.wikimedia.org/hue/jobbrowser#!id=job_1663082229270_289975
[14:40:04] (failed on rerun too)
[14:40:35] I'm trying it manually to see more details
[14:41:17] meh
[14:43:35] milimetric: do you wish to batcave?
[14:45:26] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:46:45] oh, Exception in thread "main" java.lang.NoClassDefFoundError: org/wikimedia/analytics/refinery/core/ActorSignatureGenerator
[14:46:45] at org.wikimedia.analytics.refinery.hive.GetActorSignatureUDF.(GetActorSignatureUDF.java:50)
[14:47:20] WUT?
[14:47:35] wow that's unexpected :(
[14:48:12] joal: to the batcave! :)
[14:48:24] OMW!
[14:51:26] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:54:40] milimetric: ok, should be on canary, can you verify?
[14:55:39] Data-Engineering, Equity-Landscape: Population input metrics - https://phabricator.wikimedia.org/T309279 (JAnstee_WMF) p:Triage→Medium
[14:56:29] Data-Engineering, Equity-Landscape: Population input metrics - https://phabricator.wikimedia.org/T309279 (JAnstee_WMF) a:ntsako→JAnstee_WMF
[14:58:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp3051 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=esams%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp3051%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[15:03:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp3051 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=esams%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp3051%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[15:03:18] looking good on aqs1010 ottomata, thank you!
[15:15:23] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:21:16] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:33:21] Data-Engineering-Planning, Shared-Data-Infrastructure, Data Pipelines (Sprint 03): Create Plan for Spark 2 Deprecation - https://phabricator.wikimedia.org/T318367 (mforns) Thank you @mpopov!!
[15:43:44] Data-Engineering-Planning, Shared-Data-Infrastructure, Data Pipelines (Sprint 03): Create Plan for Spark 2 Deprecation - https://phabricator.wikimedia.org/T318367 (mforns) And thank you @Miriam as well! I saw your team added a couple jobs. Would that be all? If so I will close the list :-)
[15:45:05] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:46:31] Data-Engineering, Product-Analytics: Presto returns incorrect data for an added field - https://phabricator.wikimedia.org/T321960 (mforns) Hm, this seems something we should prioritize... Let's bring this out next Monday for our sprint planning!
[15:50:59] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:00:05] milimetric: all done!
[16:15:01] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:21:01] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:29:34] Data-Engineering-Planning, Shared-Data-Infrastructure, Data Pipelines (Sprint 03): Create Plan for Spark 2 Deprecation - https://phabricator.wikimedia.org/T318367 (Miriam) Thank you @mforns for the ping! We are almost there, could you give us another 24 hours?
[16:31:01] Data-Engineering-Planning, Shared-Data-Infrastructure, Data Pipelines (Sprint 03): Create Plan for Spark 2 Deprecation - https://phabricator.wikimedia.org/T318367 (mforns) Of course @Miriam! just wanted to know whether those were all. Let me know if I can help!
Cheers
[16:51:09] Analytics-Radar, Ganeti, Infrastructure-Foundations, netops: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (jbond)
[16:55:01] Analytics-Jupyter, Data-Engineering, Product-Analytics, Data Pipelines (Sprint 03), Patch-For-Review: Add support for jupyterhub on conda-analytics - https://phabricator.wikimedia.org/T321088 (EChetty)
[16:55:45] Analytics-Radar, Machine-Learning-Team, serviceops: Using docker in WMF production network outside of kubernetes - https://phabricator.wikimedia.org/T275551 (jbond)
[16:56:46] Data-Engineering-Planning, Shared-Data-Infrastructure, Data Pipelines (Sprint 03): Create Plan for Spark 2 Deprecation - https://phabricator.wikimedia.org/T318367 (EChetty)
[16:56:51] (CR) Snwachukwu: [WIP] Add Custom Authentication Configuration Class for Cassandra. (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/851077 (https://phabricator.wikimedia.org/T306895) (owner: Snwachukwu)
[16:59:23] Data-Engineering-Planning, Shared-Data-Infrastructure, Data Pipelines (Sprint 03): Create Plan for Spark 2 Deprecation - https://phabricator.wikimedia.org/T318367 (mforns)
[16:59:33] Analytics, Analytics-Wikistats, Data-Engineering: Monthly pageview stats for October 2022 missing - https://phabricator.wikimedia.org/T322239 (Radim.kubacki)
[17:01:25] Analytics-Jupyter, Data-Engineering, Product-Analytics, Data Pipelines (Sprint 03), Patch-For-Review: Add support for jupyterhub on conda-analytics - https://phabricator.wikimedia.org/T321088 (xcollazo) Release of `conda-analytics` done via T321736. For testing on the [[ https://wikitech.wik...
[17:07:12] Data-Engineering, MediaWiki-Core-Hooks, Event-Platform Value Stream (Sprint 07): Add $comment and $performer to ArticleRevisionVisibilitySet params - https://phabricator.wikimedia.org/T321411 (lbowmaker)
[17:13:09] Data-Engineering-Planning, Data Pipelines (Sprint 03), Technical-Debt: Create a dashboard from the Dataset extracted from the HDFS FsImage dataset - https://phabricator.wikimedia.org/T321169 (Antoine_Quhen)
[17:13:46] Data-Engineering-Planning, Data Pipelines (Sprint 03), Technical-Debt: Create a dashboard from the Dataset extracted from the HDFS FsImage - https://phabricator.wikimedia.org/T321169 (Antoine_Quhen)
[17:23:37] Data-Engineering-Planning, Data Pipelines (Sprint 03), Technical-Debt: Create a dashboard from the fsImage Dataset extracted from the HDFS FsImage - https://phabricator.wikimedia.org/T321169 (EChetty)
[17:32:57] mforns: heya - would you have a minute for me?
[17:33:13] joal: yes ofc! batcave?
[17:33:19] !
[17:33:22] OMW!
[17:34:09] joal: argh! I lost my batcave link...
[17:34:20] huhuhu - https://meet.google.com/rxb-bjxn-nip
[17:34:51] mforns: --^
[17:34:58] thanks!
[18:15:18] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:19:43] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:27:03] Analytics-Jupyter, Data-Engineering, Product-Analytics, Data Pipelines (Sprint 03), Patch-For-Review: Add support for jupyterhub on conda-analytics - https://phabricator.wikimedia.org/T321088 (xcollazo)
[18:30:48] (CR) Joal: "Mostly comments on comments" [analytics/refinery] - https://gerrit.wikimedia.org/r/850169 (https://phabricator.wikimedia.org/T321167) (owner: Aqu)
[18:32:25] Analytics-Radar, Ganeti, Infrastructure-Foundations, netops: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (MoritzMuehlenhoff) Removing the Ganeti tag, this is unrelated to Ganeti and only caused by ifupdown (and will eventually be solved by s...
[18:32:35] Analytics-Radar, Infrastructure-Foundations, netops: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (MoritzMuehlenhoff)
[18:33:11] milimetric: I think we've forgotten to add the _SUCCESS file to the pageview-actor table, and that leads to follow jobs being stuck
[18:45:14] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:51:08] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:00:17] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:03:23] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:11:57] Data-Engineering, Equity-Landscape: Grants input metric - https://phabricator.wikimedia.org/T309276 (JAnstee_WMF) @ntsako and @KCVelaga_WMF reviewed each of these points: > # `total_annual_grants_presence_weighted` and `total_historical_grants_presence_weighted` both seem to be weighted with the same va...
[19:15:01] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:19:31] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:21:33] (CR) Joal: "A bunch of comments - the first draft looks good already :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/851077 (https://phabricator.wikimedia.org/T306895) (owner: Snwachukwu)
[19:52:36] (PS1) Neil P. Quinn-WMF: Reformat descriptions to avoid weird wrapping [schemas/event/secondary] - https://gerrit.wikimedia.org/r/852279 (https://phabricator.wikimedia.org/T312262)
[20:04:01] (PS2) Neil P. Quinn-WMF: Reformat descriptions to avoid weird wrapping [schemas/event/secondary] - https://gerrit.wikimedia.org/r/852279 (https://phabricator.wikimedia.org/T312262)
[20:05:48] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 04), Patch-For-Review: [Shared Event Platform] Produce new mediawiki.page-change stream from MediaWiki EventBus - https://phabricator.wikimedia.org/T311129 (Mayakp.wiki) >>! In T311129#8360136, @Ottomata wrote: > We are live in testwiki!...
[20:17:15] Data-Engineering-Planning, Data Pipelines: Convert to pure Docker the gitlab CI pipeline to build debianized conda - https://phabricator.wikimedia.org/T315475 (xcollazo)
[20:19:52] Data-Engineering-Planning, Data Pipelines: Convert to pure Docker the gitlab CI pipeline to build debianized conda - https://phabricator.wikimedia.org/T315475 (xcollazo) duplicate→Open Whoops! I thought this task referred to `conda-analytics` but in fact is for `airflow-dags`. Sorry, reopening...
[20:24:56] Data-Engineering-Planning, Data Pipelines: Optimize spark3 conda deb generation - https://phabricator.wikimedia.org/T315478 (xcollazo) Open→Resolved a:xcollazo In T321736, we moved to a pure `Dockerfile` approach, and we now do this optimization here: https://gitlab.wikimedia.org/repos/data-e...
[20:30:20] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:33:51] (CR) Neil P. Quinn-WMF: [C: +2] "Self-merging, since this only affects documentation in the description fields." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/852279 (https://phabricator.wikimedia.org/T312262) (owner: Neil P. Quinn-WMF)
[20:34:34] (Merged) jenkins-bot: Reformat descriptions to avoid weird wrapping [schemas/event/secondary] - https://gerrit.wikimedia.org/r/852279 (https://phabricator.wikimedia.org/T312262) (owner: Neil P. Quinn-WMF)
[20:36:08] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:47:02] hiii all.... hey quick question... is there a way for me to delete data from a table under my own schema in Hive?
[20:47:13] hive (default)> delete from andyrussg.prophet_trends_pageviews_20221101 where segment_id = 'R:Middle East & North Africa';
[20:47:16] FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
[20:47:49] (just need to delete that bit since it failed partway through processing, to re-run)
[20:48:50] the bit I want to delete can be selected via a column that the table is partitioned by, so maybe I should just delete the corresponding directory via the underlying FS?
[21:27:46] (also asked the above question on Slack, apologies for cross-posting!)
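Editor's sketch for the Hive question above. The failed DELETE is rejected because, as the error message says, the session's transaction manager does not support update or delete. The answer given on Slack is not recorded in this log. One common approach, assuming `segment_id` really is a partition column of the table (the asker only says the filter column is one the table is partitioned by), is to drop the whole partition with `ALTER TABLE ... DROP PARTITION` instead of removing the directory by hand. The helper name `drop_partition_statement` is hypothetical and only builds the statement text; it does not talk to Hive.

```python
from __future__ import annotations


def _quote(value: str) -> str:
    """Escape backslashes and single quotes for a HiveQL string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def drop_partition_statement(table: str, partition: dict[str, str]) -> str:
    """Build an ALTER TABLE ... DROP PARTITION statement for one partition.

    `partition` maps partition column names to string values. Treating
    segment_id as a partition column is an assumption taken from the chat,
    not something this log confirms.
    """
    if not partition:
        raise ValueError("at least one partition column is required")
    spec = ", ".join(
        f"{column} = '{_quote(value)}'" for column, value in partition.items()
    )
    return f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({spec});"


if __name__ == "__main__":
    # Table and value copied from the chat messages above.
    print(
        drop_partition_statement(
            "andyrussg.prophet_trends_pageviews_20221101",
            {"segment_id": "R:Middle East & North Africa"},
        )
    )
```

On a managed table, dropping the partition removes its data along with the metastore entry; on an external table the files typically stay in place and would still need to be removed separately before the re-run.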
[22:30:54] (PS1) Neil P. Quinn-WMF: Retain hashed Wikistories contribution_attempt_id [analytics/refinery] - https://gerrit.wikimedia.org/r/852308 (https://phabricator.wikimedia.org/T317934)
[22:39:56] (CR) Neil P. Quinn-WMF: "contribution_attempt_id will be a very short-lived identifier (created when a user opens the story builder and discarded when they leave i" [analytics/refinery] - https://gerrit.wikimedia.org/r/852308 (https://phabricator.wikimedia.org/T317934) (owner: Neil P. Quinn-WMF)
[22:43:36] ^ above question was answered on Slack, #data-engineering... thx!!!
[23:00:36] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:06:26] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:09:52] (PS1) Aqu: Create dataset from HDFS fsimage.xml [analytics/refinery/source] - https://gerrit.wikimedia.org/r/852315 (https://phabricator.wikimedia.org/T321168)
[23:45:34] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:51:24] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state