[00:05:01] (PS7) Neil P. Quinn-WMF: Begin sanitizing Wikistories streams [analytics/refinery] - https://gerrit.wikimedia.org/r/832383 (https://phabricator.wikimedia.org/T312262)
[00:09:49] (CR) Neil P. Quinn-WMF: Begin sanitizing Wikistories streams (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/832383 (https://phabricator.wikimedia.org/T312262) (owner: Neil P. Quinn-WMF)
[01:00:40] Data-Engineering, API Platform, Platform Engineering Roadmap, User-Eevans: AQS 2.0: Pageviews: Implement Unit Tests - https://phabricator.wikimedia.org/T299735 (BPirkle) This is blocked by {T318765}
[01:03:01] Data-Engineering, API Platform (Product Roadmap), Platform Engineering Roadmap, User-Eevans: AQS 2.0: Pageviews Service - https://phabricator.wikimedia.org/T288296 (BPirkle)
[01:03:20] Data-Engineering, API Platform (Product Roadmap), Platform Engineering Roadmap, User-Eevans: AQS 2.0: Unique Devices service - https://phabricator.wikimedia.org/T288298 (BPirkle)
[01:03:31] Data-Engineering, API Platform (Product Roadmap), Platform Engineering Roadmap, User-Eevans: AQS 2.0: Mediarequests Service - https://phabricator.wikimedia.org/T288303 (BPirkle)
[01:04:36] Data-Engineering, API Platform (Product Roadmap), Platform Engineering Roadmap, User-Eevans: AQS 2.0: Pageviews Service - https://phabricator.wikimedia.org/T288296 (BPirkle)
[01:04:38] Data-Engineering, API Platform, Code-Health-Objective, Epic, and 3 others: Implement aggregate endpoint of the pageviews API - https://phabricator.wikimedia.org/T299731 (BPirkle) In progress→Resolved
[01:04:56] Data-Engineering, API Platform (Product Roadmap), Platform Engineering Roadmap, User-Eevans: AQS 2.0: Editors service - https://phabricator.wikimedia.org/T288305 (BPirkle)
[01:05:24] Data-Engineering, API Platform (Product Roadmap), Platform Engineering Roadmap, User-Eevans: AQS 2.0: Pageviews Service - https://phabricator.wikimedia.org/T288296 (BPirkle)
[01:05:26] Data-Engineering, API Platform, Platform Engineering Roadmap, User-Eevans: Implement top-by-country endpoint of the pageviews API - https://phabricator.wikimedia.org/T299733 (BPirkle) In progress→Resolved
[01:06:04] Data-Engineering, API Platform, Platform Engineering Roadmap, User-Eevans: Implement top-per-country endpoint of the pageviews API - https://phabricator.wikimedia.org/T299734 (BPirkle) In progress→Resolved
[01:06:06] Data-Engineering, API Platform (Product Roadmap), Platform Engineering Roadmap, User-Eevans: AQS 2.0: Pageviews Service - https://phabricator.wikimedia.org/T288296 (BPirkle)
[01:06:40] Analytics, API Platform, Platform Engineering Roadmap, User-Eevans: Implement per-article endpoint of the pageviews API - https://phabricator.wikimedia.org/T289265 (BPirkle) In progress→Resolved
[01:06:42] Data-Engineering, API Platform (Product Roadmap), Platform Engineering Roadmap, User-Eevans: AQS 2.0: Pageviews Service - https://phabricator.wikimedia.org/T288296 (BPirkle)
[01:07:09] Data-Engineering, API Platform, Platform Engineering Roadmap, User-Eevans: Implement top endpoint of the pageviews API - https://phabricator.wikimedia.org/T299732 (BPirkle) In progress→Resolved
[01:07:11] Data-Engineering, API Platform (Product Roadmap), Platform Engineering Roadmap, User-Eevans: AQS 2.0: Pageviews Service - https://phabricator.wikimedia.org/T288296 (BPirkle)
[01:41:15] Analytics, API Platform, Code-Health-Objective: Synchronize .gitignore files - https://phabricator.wikimedia.org/T315113 (BPirkle)
[07:21:45] Quarry, PAWS: Github action for phabricator notifications - https://phabricator.wikimedia.org/T318774 (rook)
[09:15:11] I plan to start a rolling restart of kafka-jumbo shortly, to pick up a new JVM. Let me know if you'd like me to defer the restart for any reason. Thanks.
[09:17:47] Quarry: Wrong status of queries in Recent Queries list - https://phabricator.wikimedia.org/T137517 (rook) I believe this was an artifact of queries getting stuck in a "running" state. That has largely been fixed with an updated stop button. As it stands looking through the quarry listing today, it would app...
[09:17:50] Quarry: Wrong status of queries in Recent Queries list - https://phabricator.wikimedia.org/T137517 (rook) Open→Resolved
[09:22:13] !log started cookbook sre.kafka.roll-restart-brokers jumbo-eqiad
[09:22:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:02:44] (CR) Awight: "This change is ready for review." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/835621 (https://phabricator.wikimedia.org/T315972) (owner: Awight)
[13:27:08] (PS4) Awight: Maps interaction event schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/835621 (https://phabricator.wikimedia.org/T315972)
[14:02:00] !log roll-restarting druid-public
[14:02:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:15:25] (CR) Thiemo Kreuz (WMDE): [C: +1] "Looks good. Some suggestions inside." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/835621 (https://phabricator.wikimedia.org/T315972) (owner: Awight)
[14:22:50] (PS1) Xcollazo: Workaround Spark3 bug affecting MediaWikiEvent. Fix empty path bug. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/836222 (https://phabricator.wikimedia.org/T316371)
[15:00:41] !log deploying Airflow for hdfsarchiver operator fix
[15:00:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:01:40] !log roll-restarting druid-analytics
[15:01:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:10:54] PROBLEM - Check unit status of eventlogging_to_druid_netflow_hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_netflow_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[15:12:10] mforns: This --^ is probably down to my roll-restarting the druid-analytics cluster.
[15:12:43] Oh! OK, thanks btullis !
[15:12:55] btullis: should I restart the timer?
[15:13:33] mforns: Yes please, if you don't mind. I was just checking the email about it, but I think you're right that restarting the timer should be enough.
[15:14:04] ok!
[15:23:08] btullis: done :]
[15:23:40] Great! Many thanks and apologies for the inconvenience.
[15:29:08] no problemo at allll
[15:29:49] !log started airflow projectview_geo job
[15:29:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:31:21] RECOVERY - Check unit status of eventlogging_to_druid_netflow_hourly on an-launcher1002 is OK: OK: Status of the systemd unit eventlogging_to_druid_netflow_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[15:52:24] hm joal, aqu, what was the issue that made Airflow write files as analytics:hdfs user instead of analytics:analytics-privatedata-users??
[15:52:36] It's happening again for unique_devices...
[16:00:30] Hey mforns - IIRC it's the parent folder group - but I may completely be mistaken
[16:09:13] Data-Engineering-Kanban, Data Pipelines (Sprint 02): Projectviews by country Airflow job - https://phabricator.wikimedia.org/T303193 (JArguello-WMF) Open→In progress
[16:10:29] Data-Engineering-Kanban, Data Pipelines (Sprint 02): Projectviews by country Airflow job - https://phabricator.wikimedia.org/T303193 (JArguello-WMF) a:Snwachukwu
[16:34:41] PROBLEM - Check unit status of analytics-dumps-fetch-unique_devices on clouddumps1001 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-unique_devices https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:37:53] PROBLEM - Check unit status of analytics-dumps-fetch-unique_devices on labstore1007 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-unique_devices https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:39:53] PROBLEM - Check unit status of analytics-dumps-fetch-unique_devices on clouddumps1002 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-unique_devices https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:44:09] PROBLEM - Check unit status of analytics-dumps-fetch-unique_devices on labstore1006 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-unique_devices https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:58:48] (PS2) Joal: Fix mediawiki-history-denormalize for spark 3 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/835618 (https://phabricator.wikimedia.org/T318589)
[17:04:16] (CR) Joal: "Thank you for the review Xabriel :) I implemented your suggestion, it's way nicer." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/835618 (https://phabricator.wikimedia.org/T318589) (owner: Joal)
[17:04:28] mforns: I'm availa
[17:04:31] +ble
[17:04:35] batcave?
[17:04:40] ok! joal batcave!
[17:43:56] (CR) Xcollazo: [C: +2] "LGTM!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/835618 (https://phabricator.wikimedia.org/T318589) (owner: Joal)
[17:51:34] (Merged) jenkins-bot: Fix mediawiki-history-denormalize for spark 3 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/835618 (https://phabricator.wikimedia.org/T318589) (owner: Joal)
[17:56:55] Data-Engineering, Event-Platform Value Stream: EventGate should support producing keyed messages for Kafka partitioning - https://phabricator.wikimedia.org/T318846 (Ottomata)
[18:03:23] Analytics, Dumps-Generation, cloud-services-team (Kanban): analytics-dumps-fetch-unique_devices.service failing on dumps servers - https://phabricator.wikimedia.org/T318849 (Andrew)
[18:04:34] Analytics, Dumps-Generation, cloud-services-team (Kanban): analytics-dumps-fetch-unique_devices.service failing on dumps servers - https://phabricator.wikimedia.org/T318849 (Andrew) {F35538319}
[18:05:12] Analytics, Dumps-Generation, cloud-services-team (Kanban): analytics-dumps-fetch-unique_devices.service failing on dumps servers - https://phabricator.wikimedia.org/T318849 (JAllemandou) Ping @mforns on this one as it is probably related to the move to airflow
[18:05:46] Hmmm...
[18:05:52] working on it!
[18:05:57] thanks mforns <3
[18:06:08] joal: are you avail for a CR?
[18:06:17] sure mforns - tell me
[18:08:58] joal: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/163
[18:09:04] Data-Engineering, SRE, serviceops, Event-Platform Value Stream (Sprint 02), Patch-For-Review: eventstreams chart should use latest common_templates - https://phabricator.wikimedia.org/T310721 (lbowmaker)
[18:09:06] found it mforns - reading
[18:09:15] Data-Engineering-Kanban, Data Engineering Planning, SRE, serviceops, and 2 others: eventgate chart should use common_templates - https://phabricator.wikimedia.org/T303543 (lbowmaker)
[18:09:15] joal: I'm still testing it, but if you found something weird...
[18:10:25] Data-Engineering-Kanban, Event-Platform Value Stream (Sprint 02), Patch-For-Review: [BUG] jsonschema-tools materializes fields in yaml in a different order than in json files - https://phabricator.wikimedia.org/T308450 (lbowmaker)
[18:10:44] mforns: open dumb question: {{execution_date.month}} is padded or un-padded?
[18:10:50] We'd wish it to be un-padded
[18:10:53] Data-Engineering, Event-Platform Value Stream (Sprint 02), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (lbowmaker)
[18:13:43] joal: it's unpadded, it's an int
[18:13:52] thanks mforns :)
[18:14:16] 👍
[18:14:37] Approved mforns - looks good!
[18:14:40] thanks for that
[18:14:47] do you wish me to merge?
[18:14:55] joal: thanks! ok, the test just finished successfully
[18:15:07] Merging!
[18:15:11] joal: can you still pair on deleting the bad-permits data?
[18:15:19] sure mforns
[18:15:23] batcave!
[18:15:26] ok!
[18:18:50] ok gone for now :)
[18:22:11] byeeeeeeee :]
[18:22:25] !log deployed airflow to fix unique_devices jobs
[18:22:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:45:17] (PS1) Neil P. Quinn-WMF: Add Wikistories contribution_attempt_id [schemas/event/secondary] - https://gerrit.wikimedia.org/r/836266 (https://phabricator.wikimedia.org/T317934)
[18:45:48] (CR) CI reject: [V: -1] Add Wikistories contribution_attempt_id [schemas/event/secondary] - https://gerrit.wikimedia.org/r/836266 (https://phabricator.wikimedia.org/T317934) (owner: Neil P. Quinn-WMF)
[18:46:56] (CR) Neil P. Quinn-WMF: "The key file to check is current.yaml; the other files are generated from it." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/836266 (https://phabricator.wikimedia.org/T317934) (owner: Neil P. Quinn-WMF)
[18:50:28] (PS2) Neil P. Quinn-WMF: Add Wikistories contribution_attempt_id [schemas/event/secondary] - https://gerrit.wikimedia.org/r/836266 (https://phabricator.wikimedia.org/T317934)
[19:00:24] Data-Engineering, Event-Platform Value Stream: [SPIKE] Build simple stateless service using Flink SQL - https://phabricator.wikimedia.org/T318856 (lbowmaker)
[19:00:34] Data-Engineering, Event-Platform Value Stream (Sprint 02): [SPIKE] Build simple stateless service using Flink SQL - https://phabricator.wikimedia.org/T318856 (lbowmaker)
[19:04:11] mforns: this you? https://phabricator.wikimedia.org/T318849
[19:04:30] hey andrewbogott yes, I think so
[19:04:37] it should be fixed now!
[19:04:49] oh great... I'll just wait for those jobs to re-run then :)
[19:04:54] thx
[19:05:06] I wonder if I need to restart the service or it will pick up the corrected datasets by itself?
[19:05:55] Analytics, Dumps-Generation, cloud-services-team (Kanban): analytics-dumps-fetch-unique_devices.service failing on dumps servers - https://phabricator.wikimedia.org/T318849 (mforns) The permissions of the unique devices dumps have been restored.
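Note on the 18:10-18:13 exchange about {{execution_date.month}}: the answer given there ("it's unpadded, it's an int") can be illustrated with a plain datetime, since the month attribute of a datetime-like object is an integer and renders without a leading zero. This is only a minimal sketch; the partition_path layout and the function names below are hypothetical and are not taken from the merge request discussed in the log.

```python
from datetime import datetime


def unpadded_month(d: datetime) -> str:
    # d.month is an int, so converting it to a string gives no leading zero.
    return str(d.month)


def padded_month(d: datetime) -> str:
    # Explicit zero-padding, for comparison with the unpadded form.
    return f"{d.month:02d}"


def partition_path(d: datetime) -> str:
    # Hypothetical partition layout, used here only to show the unpadded value in context.
    return f"year={d.year}/month={d.month}"


if __name__ == "__main__":
    # Stand-in for the execution date of a September 2022 run (assumed value).
    execution_date = datetime(2022, 9, 1)
    print(unpadded_month(execution_date))
    print(padded_month(execution_date))
    print(partition_path(execution_date))
```

If a padded month were ever needed instead, the explicit format shown in padded_month would be one way to get it.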
[19:06:25] ok, andrewbogott, thanks for the ping
[19:11:05] Data-Engineering, Event-Platform Value Stream (Sprint 02), Spike: [SPIKE] Build simple stateless service using Flink SQL - https://phabricator.wikimedia.org/T318856 (lbowmaker)
[19:33:39] RECOVERY - Check unit status of analytics-dumps-fetch-unique_devices on labstore1006 is OK: OK: Status of the systemd unit analytics-dumps-fetch-unique_devices https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[19:35:23] RECOVERY - Check unit status of analytics-dumps-fetch-unique_devices on clouddumps1001 is OK: OK: Status of the systemd unit analytics-dumps-fetch-unique_devices https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[19:38:23] RECOVERY - Check unit status of analytics-dumps-fetch-unique_devices on labstore1007 is OK: OK: Status of the systemd unit analytics-dumps-fetch-unique_devices https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[19:40:43] RECOVERY - Check unit status of analytics-dumps-fetch-unique_devices on clouddumps1002 is OK: OK: Status of the systemd unit analytics-dumps-fetch-unique_devices https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[19:48:01] !log killed oozie's unique_devices-per_domain-monthly-coord because we migrated it to airflow
[19:48:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:48:42] !log killed oozie's unique_devices-per_project_family-monthly-coord because we migrated it to airflow
[19:48:43] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:49:28] !log killed oozie's unique_devices-per_project_family-daily-coord because we migrated it to airflow
[19:49:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:50:06] !log killed oozie's unique_devices-per_domain-daily-coord because we migrated it to airflow
[19:50:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:57:21] Data-Engineering, Event-Platform Value Stream: [SPIKE] Investigate using Flink Stateful Functions - https://phabricator.wikimedia.org/T318861 (lbowmaker)
[20:02:36] Data-Engineering, Event-Platform Value Stream, Spike: [SPIKE] Investigate using Knative Eventing - https://phabricator.wikimedia.org/T318862 (lbowmaker)
[20:05:43] Data-Engineering, Event-Platform Value Stream: Event Platform and DataHub Integration - https://phabricator.wikimedia.org/T318863 (lbowmaker)
[20:05:59] Data-Engineering, Event-Platform Value Stream, Spike: [SPIKE] Investigate using Flink Stateful Functions - https://phabricator.wikimedia.org/T318861 (lbowmaker)
[20:06:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp5016 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp5016%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:11:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp5016 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp5016%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[21:21:02] (PS1) Mforns: Fix end-of-month/year allowed_interval issue [analytics/refinery] - https://gerrit.wikimedia.org/r/836295 (https://phabricator.wikimedia.org/T316746)
[21:24:37] (PS2) Mforns: Fix end-of-month/year allowed_interval issue [analytics/refinery] - https://gerrit.wikimedia.org/r/836295 (https://phabricator.wikimedia.org/T316746)
[22:39:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[22:41:49] (PS3) Neil P. Quinn-WMF: Add Wikistories contribution_attempt_id [schemas/event/secondary] - https://gerrit.wikimedia.org/r/836266 (https://phabricator.wikimedia.org/T317934)
[22:42:26] (CR) CI reject: [V: -1] Add Wikistories contribution_attempt_id [schemas/event/secondary] - https://gerrit.wikimedia.org/r/836266 (https://phabricator.wikimedia.org/T317934) (owner: Neil P. Quinn-WMF)
[22:44:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp2037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[22:48:01] (PS4) Neil P. Quinn-WMF: Add Wikistories contribution_attempt_id [schemas/event/secondary] - https://gerrit.wikimedia.org/r/836266 (https://phabricator.wikimedia.org/T317934)
[22:48:31] (CR) CI reject: [V: -1] Add Wikistories contribution_attempt_id [schemas/event/secondary] - https://gerrit.wikimedia.org/r/836266 (https://phabricator.wikimedia.org/T317934) (owner: Neil P. Quinn-WMF)
[22:48:50] (PS5) Neil P. Quinn-WMF: Add Wikistories contribution_attempt_id [schemas/event/secondary] - https://gerrit.wikimedia.org/r/836266 (https://phabricator.wikimedia.org/T317934)