[02:35:00] (EventgateLoggingExternalLatency) firing: (2) Elevated latency for GET events on eventgate-logging-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org
[06:35:00] (EventgateLoggingExternalLatency) firing: (2) Elevated latency for GET events on eventgate-logging-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org
[10:35:00] (EventgateLoggingExternalLatency) firing: (2) Elevated latency for GET events on eventgate-logging-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org
[10:51:59] (PS2) Kosta Harlan: Add WelcomeSurvey Interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/744003 (https://phabricator.wikimedia.org/T267273)
[10:52:35] (CR) jerkins-bot: [V: -1] Add WelcomeSurvey Interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/744003 (https://phabricator.wikimedia.org/T267273) (owner: Kosta Harlan)
[11:06:53] Data-Engineering, Data-Engineering-Kanban: Investigate Superset Druid Timeouts - https://phabricator.wikimedia.org/T297148 (BTullis) a:BTullis I'm looking into this issue now, to see if there is anything I can do to help. It seems that the value for the Druid timeout on both the 'analytics' and 'pub...
[11:07:24] Data-Engineering, Data-Engineering-Kanban: Investigate Superset Druid Timeouts - https://phabricator.wikimedia.org/T297148 (BTullis) p:Triage→Medium
[11:45:17] (PS3) Kosta Harlan: Add WelcomeSurvey Interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/744003 (https://phabricator.wikimedia.org/T267273)
[11:46:27] (CR) jerkins-bot: [V: -1] Add WelcomeSurvey Interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/744003 (https://phabricator.wikimedia.org/T267273) (owner: Kosta Harlan)
[11:55:23] (CR) Kosta Harlan: "Do you see what I'm doing wrong? I just see "examples must validate against schema", but I'm not sure what is off." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/744003 (https://phabricator.wikimedia.org/T267273) (owner: Kosta Harlan)
[11:56:24] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Re-enable Superset **metadata** caching - https://phabricator.wikimedia.org/T295295 (BTullis) This is now merged and deployed. I also restarted memcached to ensure that the cache was fresh. It feels more responsive to me, but it's not easy...
[11:56:36] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Re-enable Superset metadata caching - https://phabricator.wikimedia.org/T295295 (BTullis)
[12:11:26] (PS4) Kosta Harlan: Add mediawiki.welcomesurvey.interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/744003 (https://phabricator.wikimedia.org/T267273)
[12:12:09] (CR) jerkins-bot: [V: -1] Add mediawiki.welcomesurvey.interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/744003 (https://phabricator.wikimedia.org/T267273) (owner: Kosta Harlan)
[12:26:12] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:37:18] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:41:06] Data-Engineering, Data-Engineering-Kanban: Investigate Superset Druid Timeouts - https://phabricator.wikimedia.org/T297148 (JAllemandou) thanks a lot @BTullis for the thorough report! I support moving query timeout for Druid internal to 2'55' (so that the query fails before the Superset timeout :) If thi...
[13:17:30] Data-Engineering, Infrastructure-Foundations, SRE, Traffic-Icebox, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (JAllemandou) > What would be the next steps? Here is a proposal: # [DE, SRE]Agree on the name of the flow :) Will it be `sflow`...
[14:02:55] Data-Engineering, Infrastructure-Foundations, SRE, Traffic-Icebox, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (ayounsi) Sounds good! 1. we can use "internal_flows" (not _netflow as netflow is a protocol). 2. can I start this anytime, or we...
[14:14:16] hey team!
[14:14:54] Hi mforns. How's it going?
[14:15:26] hi btullis, good! you?
[14:15:39] Smashing, thanks. :-)
[14:21:51] :]
[14:29:08] I think I *might* have sorted out my screensharing properly this time.
[14:29:23] Data-Engineering, Infrastructure-Foundations, SRE, Traffic-Icebox, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (Ottomata) > Agree on the name of the flow : Some guidelines: https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelin...
[14:29:32] wow - I'm not even sure what that represents btullis :)
[14:29:38] in terms of work I mean
[14:29:51] Xscreen related stuff?
[14:31:49] Data-Engineering, Infrastructure-Foundations, SRE, Traffic-Icebox, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (JAllemandou) >>! In T263277#7552972, @ayounsi wrote: > Sounds good! > 1. we can use "internal_flows" (not _netflow as netflow is...
[14:33:32] Yeah, lots of boring driver related stuff. Docking station that uses a displaylink driver. Laptop that has two GPUs. Xorg or Wayland windowing system. I just couldn't share in Google Meet, or sometimes it would let me share, but not a terminal, or not the whole screen, or only in Chrome. Tedious. :-)
[14:34:44] (CR) Ottomata: Add mediawiki.welcomesurvey.interaction schema (2 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/744003 (https://phabricator.wikimedia.org/T267273) (owner: Kosta Harlan)
[14:35:00] (EventgateLoggingExternalLatency) firing: (2) Elevated latency for GET events on eventgate-logging-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org
[14:40:23] Data-Engineering, Infrastructure-Foundations, SRE, Traffic-Icebox, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (Ottomata) > can I start this anytime, or we need to create the kafka topic somewhere? Not really needed, unless you need to set s...
[14:43:11] I'd like to spend a bit of time looking at this eventgate alert --^^ given that I sort of caused it during the Alertmanager work. I created the ticket (https://phabricator.wikimedia.org/T294911) but I'm not sure if I should be spending time on it, or if so, where I should look.
[14:50:50] Analytics, Data-Engineering, Data-Engineering-Kanban, Event-Platform, Observability-Alerting: Apparent latency warning in 90th centile of eventgate-logging-external - https://phabricator.wikimedia.org/T294911 (BTullis) a:BTullis I'm going to spend some time investigating this warning, bec...
[15:14:19] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, and 3 others: Migrate analytics cluster alerts from Icinga to AlertManager - https://phabricator.wikimedia.org/T293399 (BTullis)
[15:35:49] Data-Engineering, Infrastructure-Foundations, SRE, Traffic-Icebox, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (ayounsi) `internal_network_flows` works, `network.flows.internal` too. @Ottomata indeed we do have restriction on the producer s...
[15:41:39] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/737471 (https://phabricator.wikimedia.org/T287255) (owner: Jenniferwang)
[15:55:16] Data-Engineering, Infrastructure-Foundations, SRE, Traffic-Icebox, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (BTullis) In case it helps, I came across this abandoned change from 2020: https://gerrit.wikimedia.org/r/c/schemas/event/secondar...
[16:59:34] Analytics, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review, User-razzi: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (BTullis) a:razzi→BTullis I'm going to see if I can do a little work on this if that's OK with you @razzi. I will almost certainly...
[17:01:54] Analytics, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review, User-razzi: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (razzi) Sounds good @BTullis; I'm available to help as well. Repository is at https://gitlab.wikimedia.org/razzi/presto-query-logger, lmk i...
[17:08:57] Analytics, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review, User-razzi: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (JAllemandou) >>! In T269832#7553437, @BTullis wrote: > I'm going to see if I can do a little work on this if that's OK with you @razzi. I...
[17:10:53] Data-Engineering, Data-Engineering-Kanban: Move spark.local.dir to /srv on stat100x - https://phabricator.wikimedia.org/T295346 (BTullis) Open→Resolved
[17:19:30] Data-Engineering, Data-Engineering-Kanban, User-razzi: Superset SQL Lab fails to stop query - https://phabricator.wikimedia.org/T293083 (razzi) I'm not sure exactly what's going on; I tried logging in as cmacholan on superset staging and was able to stop a query to mysql (I ran `select sleep(100)` on...
[17:21:24] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Increase Superset Timeout - https://phabricator.wikimedia.org/T294771 (razzi) Open→Resolved Go team! This ticket wins the award for "most hours spent per line of code changed". Thanks @BTullis @elukey as always
[17:21:26] Data-Engineering, Data-Engineering-Kanban, Epic: Presto/Superset User Experience Improvement - https://phabricator.wikimedia.org/T294259 (razzi)
[17:21:39] Analytics, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (razzi)
[18:35:00] (EventgateLoggingExternalLatency) firing: (2) Elevated latency for GET events on eventgate-logging-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org
[19:15:02] razzi: lemme know when you'd like to deploy wikistats, or if I should just do it the old fashioned way
[19:20:05] Ah yeah I’m eating lunch right now, didn’t get a fully finished docker pipeline yet, and the old fashioned way will probably work; give it a go old fashioned and if you get an error upon npm run build let me know!
[19:25:24] Analytics, Analytics-Kanban, Data-Engineering-Kanban, wmfdata-python, Product-Analytics (Kanban): wmfdata-python's Hive query output includes logspam - https://phabricator.wikimedia.org/T275233 (Milimetric) Sorry, I accidentally moved the PR to https://github.com/wikimedia/wmfdata-python/pull/24
[19:42:02] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[19:42:49] Data-Engineering, Infrastructure-Foundations, SRE, Traffic-Icebox, and 2 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (Ottomata) Ah, right! https://phabricator.wikimedia.org/T248865#6289287 So yeah, unless we can at least control the event format...
[19:46:38] Analytics-Radar, SRE, Traffic-Icebox: Mobile redirects drop provenance parameters - https://phabricator.wikimedia.org/T252227 (Jdlrobson)
[19:52:13] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:08:54] (PS1) Ottomata: Release 2.9.2 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/744864
[20:09:21] (CR) Ottomata: [V: +2 C: +2] Release 2.9.2 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/744864 (owner: Ottomata)
[20:09:43] !log deploy wikistats2 with doc updates
[20:09:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:10:23] milimetric: i just merged a wikistats 2 build commit, docs say it should be updated on site within 30 mins, could you verify your change is there later today or tomorrow?
[20:11:06] ottomata: oh! I was in the middle of building it and got distracted, didn't know you were doing it. Thx! I'll check in a bit
[20:11:32] oh its in the train
[20:11:37] no?
[20:12:45] yeah, no it's great, we just missed the deploy last week and it got confusing.
[20:42:10] (PS5) Kosta Harlan: Add mediawiki.welcomesurvey.interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/744003 (https://phabricator.wikimedia.org/T267273)
[20:43:03] (PS6) Kosta Harlan: Add mediawiki.welcomesurvey.interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/744003 (https://phabricator.wikimedia.org/T267273)
[20:55:39] (CR) Kosta Harlan: Add mediawiki.welcomesurvey.interaction schema (2 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/744003 (https://phabricator.wikimedia.org/T267273) (owner: Kosta Harlan)
[22:03:21] (PS1) Razzi: Add Dockerfile to build production dist folder [analytics/wikistats2] - https://gerrit.wikimedia.org/r/744885
[22:16:51] (CR) Razzi: "Here's my dockerfile for wikistats; took me a while but should save some time for future deploys." [analytics/wikistats2] - https://gerrit.wikimedia.org/r/744885 (owner: Razzi)
[22:35:00] (EventgateLoggingExternalLatency) firing: (2) Elevated latency for GET events on eventgate-logging-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-logging-external - https://alerts.wikimedia.org
[22:37:11] Analytics-Radar, Data-Engineering, wmfdata-python, Product-Analytics (Kanban): Create a wmfdata-python test script - https://phabricator.wikimedia.org/T247261 (nshahquinn-wmf) @Milimetric in case you missed it, I made some changes to the PR in response to your comment. Let me know what you think!
[23:24:52] Analytics, Data-Engineering, Data-Engineering-Kanban, Desktop Improvements, and 4 others: Sticky header: Add agent_type and access_method to sticky header instrumentation - https://phabricator.wikimedia.org/T294246 (Edtadros) === Test Result - Beta **Status:** ✅ PASS **Environment:** beta **OS:*...
[23:26:22] Analytics, Data-Engineering, Data-Engineering-Kanban, Desktop Improvements, and 4 others: Sticky header: Add agent_type and access_method to sticky header instrumentation - https://phabricator.wikimedia.org/T294246 (Edtadros) === Test Result - Prod **Status:** ✅ PASS **Environment:** enwiki **OS...
[23:28:20] Analytics, Data-Engineering, Data-Engineering-Kanban, Desktop Improvements, and 4 others: Sticky header: Add agent_type and access_method to sticky header instrumentation - https://phabricator.wikimedia.org/T294246 (Edtadros)
[23:28:53] Analytics, Data-Engineering, Data-Engineering-Kanban, Desktop Improvements, and 4 others: Sticky header: Add agent_type and access_method to sticky header instrumentation - https://phabricator.wikimedia.org/T294246 (Edtadros) a:Edtadros→ovasileva