[00:01:20] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:05:54] RECOVERY - Check unit status of drop_event on an-launcher1002 is OK: OK: Status of the systemd unit drop_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:16:19] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: drop_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:19:08] PROBLEM - Check unit status of drop_event on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit drop_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:34:09] !log kill leftover process of bmansurov on an-airflow1002 to allow user cleanup via puppet
[06:34:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:37:39] !log kill leftover process of bmansurov on stat1007 to allow user cleanup via puppet
[06:37:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:43:11] !log kill leftover process of nokafor on stat1004 to allow user cleanup via puppet
[06:43:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:44:54] !log kill leftover process of jmads on stat1005 to allow user cleanup via puppet
[06:44:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:54:40] Data-Engineering, ops-eqiad: Check analytics1086's mgmt's cable - https://phabricator.wikimedia.org/T320458 (elukey)
[06:54:53] Data-Engineering, ops-eqiad: Check analytics1086 mgmt's cable - https://phabricator.wikimedia.org/T320458 (elukey)
[07:56:05] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:07:17] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:17:18] Data-Engineering, Event-Platform Value Stream (Sprint 02), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (dcausse) > @dcausse how would we like to model this for other kinds of enriched streams, e.g. wikidata...
[08:32:31] elukey: Thanks for all the tidy-ups. --^
[10:44:44] (CR) Michael Große: [C: +1] "Makes sense to me, and I can confirm that this metric is expected to be in milliseconds" [analytics/refinery] - https://gerrit.wikimedia.org/r/841188 (https://phabricator.wikimedia.org/T314131) (owner: Mforns)
[12:58:01] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:02:04] Data-Engineering, Machine-Learning-Team, observability: Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (elukey) >>! In T319214#8305260, @Ottomata wrote: >> >> Do you have any specific requirements in mind? If so I can try to test them :) > Our main use cases is makin...
[13:05:49] RECOVERY - Check unit status of drop_event on an-launcher1002 is OK: OK: Status of the systemd unit drop_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:20:05] Data-Engineering, Machine-Learning-Team, observability: Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (Ottomata) The tricky thing about async calls in streams, is that the ordering of the events might get all messed up, as the calls will evaluate in an undetermined...
[13:28:48] (CR) Mforns: Fix end-of-month/year allowed_interval issue (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/836295 (https://phabricator.wikimedia.org/T316746) (owner: Mforns)
[13:29:51] heya ottomata :] joal and I wanted to have your opinion on https://gerrit.wikimedia.org/r/c/analytics/refinery/+/836295 since it's a tricky change, maybe we can meet shortly and discuss the particular issue?
[13:30:02] whenever!
[13:30:40] mforns: is after standup an option for you?
[13:30:44] it'd be better for me :)
[13:30:50] please :)
[13:30:52] joal: sure!
[13:31:16] mforns: actually, after scrum retro :)
[13:32:17] yes, makes sense!
[13:36:12] joal: btw, can you have a quick look at https://gerrit.wikimedia.org/r/c/analytics/refinery/+/841188 it's a one-liner, I already have tested it successfully.
[13:36:38] Data-Engineering, Event-Platform Value Stream (Sprint 02), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Ottomata) Okay, then given that and also the discussion about `rendered_content_slots` above, let's avo...
[13:37:07] (CR) Joal: [C: +1] "LGTM if tested :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/841188 (https://phabricator.wikimedia.org/T314131) (owner: Mforns)
[13:37:10] done mforns :)
[13:37:16] thank you joal!! :]
[13:38:01] (CR) Mforns: [V: +2 C: +2] Make wikidata reliability metrics longs so they fit in graphite [analytics/refinery] - https://gerrit.wikimedia.org/r/841188 (https://phabricator.wikimedia.org/T314131) (owner: Mforns)
[14:35:14] Data-Engineering, Product-Analytics, wmfdata-python, Data Pipelines (Sprint 02): Upgrade WMFData Python Package to use Spark3 - https://phabricator.wikimedia.org/T318587 (Ottomata) > Fetch pkgs from the pkgs/ dir https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html...
[14:59:16] Data-Engineering, Product-Analytics, wmfdata-python, Data Pipelines (Sprint 02): Upgrade WMFData Python Package to use Spark3 - https://phabricator.wikimedia.org/T318587 (xcollazo)
[14:59:18] Analytics-Jupyter, Data-Engineering, Product-Analytics: Replace anaconda-wmf with smaller, non-stacked Conda environments - https://phabricator.wikimedia.org/T302819 (xcollazo)
[15:04:12] !log reset the BMC on an-worker1086 with `sudo bmc-device --cold-reset`
[15:04:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:04:38] mforns: joal is the stuff on the train already deployed and just not moved down from last week?
[15:04:39] https://etherpad.wikimedia.org/p/analytics-weekly-train
[15:04:50] mforns: looking!
[15:07:07] (CR) Ottomata: "I think I understand, and +1!" [analytics/refinery] - https://gerrit.wikimedia.org/r/836295 (https://phabricator.wikimedia.org/T316746) (owner: Mforns)
[15:07:09] (CR) Ottomata: [C: +1] Fix end-of-month/year allowed_interval issue [analytics/refinery] - https://gerrit.wikimedia.org/r/836295 (https://phabricator.wikimedia.org/T316746) (owner: Mforns)
[15:07:41] Data-Engineering, SRE, ops-eqiad: Check analytics1086 mgmt's cable - https://phabricator.wikimedia.org/T320458 (BTullis) Just for good measure, I have carried out a cold reset of the IPMI controller with: ` btullis@an-worker1086:~$ sudo bmc-device --cold-reset; echo $? 0 ` I'll check again to see whe...
[15:42:24] ottomata: I think it was deployed indeed. Did you move it under 2022-10-04?
[15:43:56] mforns: no wasn't me!
[15:44:07] !log remove materialized .json files from schemas/event/secondary - this should be a no-op as no clients should actually be using the json files. - T315674
[15:44:08] ok
[15:44:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:44:10] T315674: Remove materialized .json files from event schema repositories - https://phabricator.wikimedia.org/T315674
[15:44:14] (CR) Ottomata: [C: +2] Remove materialized json files and disable materializing them [schemas/event/secondary] - https://gerrit.wikimedia.org/r/839678 (https://phabricator.wikimedia.org/T315674) (owner: Ottomata)
[15:45:00] (Merged) jenkins-bot: Remove materialized json files and disable materializing them [schemas/event/secondary] - https://gerrit.wikimedia.org/r/839678 (https://phabricator.wikimedia.org/T315674) (owner: Ottomata)
[15:50:32] mforns: the refinery drop changes make sense to me
[15:50:36] do you need more input from me?
[15:50:47] yes, can we meet after our meetings?
[15:51:09] ottomata: in 1 hour and 30 mins?
[15:51:41] sure
[15:51:49] I think it won't take us a lot
[15:51:52] k
[15:51:55] ok! thanks :]
[18:55:11] Analytics, EventStreams: Old events in the stream - https://phabricator.wikimedia.org/T320558 (Iluvatar)
[18:58:11] Data-Engineering, Event-Platform Value Stream (Sprint 02), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (JAllemandou) One note about slot information: if the page-event we're talking about here is envisioned...
[19:02:01] Analytics, EventStreams: Old events in the stream - https://phabricator.wikimedia.org/T320558 (Iluvatar)
[19:21:43] Data-Engineering, Event-Platform Value Stream (Sprint 02), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Ottomata) @tstarling, @daniel advised that I should include more than just `is_bot` in our modeling of...
[23:16:26] Data-Engineering, Product-Analytics, wmfdata-python: Release Wmfdata-Python 2.0 - https://phabricator.wikimedia.org/T300442 (nshahquinn-wmf)
[23:55:29] Analytics-Jupyter, Data-Engineering, Product-Analytics: Replace anaconda-wmf with smaller, non-stacked Conda environments - https://phabricator.wikimedia.org/T302819 (nshahquinn-wmf) @Ottomata offhand suggestion: what if the new environments are configured to default to installing Conda packages from...