[00:04:45] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:06:45] (SystemdUnitFailed) firing: (2) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:15:25] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:21:45] (SystemdUnitFailed) firing: (2) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:42:40] Data-Engineering, Product-Analytics, Wmfdata-Python: Remove Matplotlib as a Wmfdata-Python dependency - https://phabricator.wikimedia.org/T324053 (nshahquinn-wmf) Open→Resolved a:nshahquinn-wmf @xcollazo approved the PR, and I've merged it.
[04:21:45] (SystemdUnitFailed) firing: hadoop-hdfs-journalnode.service Failed on analytics1069:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:29:06] Data-Engineering, Anti-Harassment, SRE, Traffic, and 2 others: Include User-Agent Client Hints in WebRequest logs - https://phabricator.wikimedia.org/T337947 (kostajh)
[08:21:45] (SystemdUnitFailed) firing: hadoop-hdfs-journalnode.service Failed on analytics1069:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:55:33] Hi aqu, are you available so that we can merge the fix for the airflow alerts caused by emitting datahub lineage from airflow? https://gerrit.wikimedia.org/r/c/operations/puppet/+/931690
[09:05:13] Quarry, superset.wmcloud.org, cloud-services-team (FY2022/2023-Q4): Replace Quarry with an installation of Superset - https://phabricator.wikimedia.org/T169452 (Stuartyeates) I've opened a superset discussion at https://github.com/apache/superset/discussions/24455 related to getting superset to expor...
[09:26:42] stevemunene sure
[12:21:45] (SystemdUnitFailed) firing: hadoop-hdfs-journalnode.service Failed on analytics1069:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:50:29] hello folks
[12:50:37] I am going to migrate vk instances in codfw to PKI
[12:50:43] as always nothing should explode
[12:51:39] !log move varnishkafka instances in codfw to PKI - T337825
[12:51:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:51:41] T337825: Move varnishkafka to PKI - https://phabricator.wikimedia.org/T337825
[12:52:27] next in line eqiad and then esams
[12:52:30] and then we are done
[13:00:11] Data-Platform-SRE, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Configure new WDQS servers in codfw (wdqs20[13-22]) - https://phabricator.wikimedia.org/T332314 (bking) a:bking
[13:10:06] Data-Engineering-Planning, Data Pipelines (Sprint 14), Event-Platform Value Stream (Sprint 14 B): Event partitions missing since 2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (JArguello-WMF)
[13:10:36] Data-Engineering-Planning, Data Pipelines (Sprint 14), Event-Platform Value Stream (Sprint 14 B): Event partitions missing since 2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (JArguello-WMF) a:Ottomata
[13:11:13] Data-Engineering, Data Pipelines, Event-Platform Value Stream (Sprint 14 B): Fix wikimedia-event-utilities Guava dependencies issues - https://phabricator.wikimedia.org/T337421 (Ottomata) Open→Resolved a:Ottomata
[13:37:21] (PS3) Aqu: Use canonical_data countries maintained by analytics-product [analytics/refinery] - https://gerrit.wikimedia.org/r/929723 (https://phabricator.wikimedia.org/T338033)
[13:44:09] (CR) Aqu: "Thanks for the review and your sharp eye Dan." [analytics/refinery] - https://gerrit.wikimedia.org/r/929723 (https://phabricator.wikimedia.org/T338033) (owner: Aqu)
[13:46:10] Data-Engineering, Data-Platform-SRE: Superset permissions for nshahquinn-wmf - https://phabricator.wikimedia.org/T339385 (Stevemunene) a:Stevemunene
[13:49:50] Data-Engineering, Data-Platform-SRE: Superset permissions for nshahquinn-wmf - https://phabricator.wikimedia.org/T339385 (Stevemunene)
[14:15:56] Data-Engineering, Event-Platform Value Stream: Fix eventutillites_python stream_manager error_sink configuration - https://phabricator.wikimedia.org/T335591 (Ottomata)
[14:16:27] Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Fix eventutillites_python stream_manager error_sink configuration - https://phabricator.wikimedia.org/T335591 (CodeReviewBot) otto opened https://gitlab.wikimedia.org/repos/data-engineering/eventutilities-python/-/merge_requests/72 Rem...
[14:16:43] Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Fix eventutillites_python stream_manager error_sink configuration - https://phabricator.wikimedia.org/T335591 (CodeReviewBot)
[14:16:57] Data-Engineering, Data Pipelines: Update API with May Net New Content Data - https://phabricator.wikimedia.org/T339159 (Aklapper) (@Iflorez: Please add project tags to a task, so other people can find a task when searching via projects or when looking at workboards. Thanks!)
[14:58:05] Data-Engineering-Planning, Data Pipelines (Sprint 14), Patch-For-Review: Deprecate old mobile datasets - https://phabricator.wikimedia.org/T329310 (mforns)
[15:01:54] (PS2) Mforns: Remove deprecated code for AppSessionMetrics [analytics/refinery/source] - https://gerrit.wikimedia.org/r/919357 (https://phabricator.wikimedia.org/T329310)
[15:17:28] (CR) Mforns: [C: +2] "Self-Merging this, since it's just deletion of deprecated code. See task!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/919357 (https://phabricator.wikimedia.org/T329310) (owner: Mforns)
[15:20:24] Data-Engineering, Event-Platform Value Stream: Make meta.dt required on all schemas that declare it - https://phabricator.wikimedia.org/T340044 (xcollazo)
[15:23:18] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (Ottomata) @dcausse The...
[15:27:34] (Merged) jenkins-bot: Remove deprecated code for AppSessionMetrics [analytics/refinery/source] - https://gerrit.wikimedia.org/r/919357 (https://phabricator.wikimedia.org/T329310) (owner: Mforns)
[15:31:31] Data-Engineering-Planning, Data Pipelines (Sprint 14): Deprecate old mobile datasets - https://phabricator.wikimedia.org/T329310 (mforns)
[15:32:53] (PS1) Gmodena: error: add error_type field [schemas/event/primary] - https://gerrit.wikimedia.org/r/931957 (https://phabricator.wikimedia.org/T309699)
[15:35:52] (CR) Gmodena: error: add error_type field (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/931957 (https://phabricator.wikimedia.org/T309699) (owner: Gmodena)
[15:36:10] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (Ottomata) Meh, never m...
[15:36:28] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (Ottomata)
[15:36:56] (CR) Ottomata: [C: +1] "LGTM, let's merge whenever!" [schemas/event/primary] - https://gerrit.wikimedia.org/r/931957 (https://phabricator.wikimedia.org/T309699) (owner: Gmodena)
[15:37:32] (PS1) Mforns: Remove queries for deprecated mobile_apps jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/931959 (https://phabricator.wikimedia.org/T329310)
[15:37:50] (CR) Ottomata: [C: +1] error: add error_type field (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/931957 (https://phabricator.wikimedia.org/T309699) (owner: Gmodena)
[15:39:01] (CR) Milimetric: [C: +1] error: add error_type field (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/931957 (https://phabricator.wikimedia.org/T309699) (owner: Gmodena)
[15:42:41] Data-Engineering-Planning, Data Pipelines (Sprint 14), Patch-For-Review: Deprecate old mobile datasets - https://phabricator.wikimedia.org/T329310 (mforns)
[15:43:25] (CR) Ottomata: [C: +2] error: add error_type field [schemas/event/primary] - https://gerrit.wikimedia.org/r/931957 (https://phabricator.wikimedia.org/T309699) (owner: Gmodena)
[15:43:54] (Merged) jenkins-bot: error: add error_type field [schemas/event/primary] - https://gerrit.wikimedia.org/r/931957 (https://phabricator.wikimedia.org/T309699) (owner: Gmodena)
[15:44:48] !log deployed airflow analytics to remove deprecated dag for mobile_apps
[15:44:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:47:29] Data-Engineering-Planning, Data Pipelines (Sprint 14), Patch-For-Review: Deprecate old mobile datasets - https://phabricator.wikimedia.org/T329310 (mforns)
[15:49:24] Data-Engineering-Planning, Data Pipelines (Sprint 14), Patch-For-Review: Deprecate old mobile datasets - https://phabricator.wikimedia.org/T329310 (mforns)
[15:49:54] Data-Engineering-Planning, Data Pipelines (Sprint 14), Patch-For-Review: Deprecate old mobile datasets - https://phabricator.wikimedia.org/T329310 (mforns) Since after 30 days nobody has complained about the data missing: Removed the refinery-source code and the airflow-dags code. Also deleted the da...
[15:50:05] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 14 B), Patch-For-Review: [Event Platform] Understand, document, and implement error handling and retry logic when fetching data from the MW api - https://phabricator.wikimedia.org/T309699 (CodeReviewBot) gmodena updated https://gitlab.wik...
[16:00:38] Data-Engineering: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453 (Ottomata)
[16:00:40] Data-Engineering, Data-Catalog, Product-Analytics: Propagate field descriptions from event schemas to Hive event tables - https://phabricator.wikimedia.org/T307040 (Ottomata)
[16:21:45] (SystemdUnitFailed) firing: hadoop-hdfs-journalnode.service Failed on analytics1069:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:26:03] (CR) Joal: [C: -1] "I think this will not work as is" [analytics/refinery] - https://gerrit.wikimedia.org/r/929723 (https://phabricator.wikimedia.org/T338033) (owner: Aqu)
[16:26:35] aqu: Did a quick review, I think there is a need for changes please :)
[16:29:24] mforns, milimetric: if you have a minute, would you take a look at https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/439 please?
[16:40:33] !log Rerun projectview-hourly DAG for hour: 2023-06-20T04:00
[16:40:34] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:40:42] Very weird stuff!
[16:44:06] I'll rerun jobs depending on projectview_hourly for that hour
[16:46:00] !log rerun browser_general_daily for 2023-06-20
[16:46:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:46:58] !log Rerun cassandra-load tasks for pageview-per-project daily and hourly for 2023-06-20 hour 4
[16:46:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:58:10] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (xcollazo) > eventgate...
[17:14:53] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (Ottomata) I think you...
[17:31:47] hola hola milimetric mforns
[17:31:53] joal
[17:31:54] https://diff.wikimedia.org/2023/06/21/new-dataset-uncovers-wikipedia-browsing-habits-while-protecting-users/
[17:31:58] ta-tachannnn
[17:32:17] Yes nuria! it finally happened :)
[17:32:50] hey nuria!! :-)
[17:42:18] Data-Engineering, Event-Platform Value Stream (Sprint 14 B): Update eventgate and eventstreams helm chart to use automatic kafka egress networkpolicies and envoy service mesh - https://phabricator.wikimedia.org/T335024 (Ottomata)
[17:44:00] Data-Engineering, Event-Platform Value Stream (Sprint 14 B): Update eventgate and eventstreams helm chart to use automatic kafka egress networkpolicies and envoy service mesh - https://phabricator.wikimedia.org/T335024 (Ottomata) @tchin, I've done the eventgate ones! Want to do eventstreams ones? Patch...
[17:45:30] Data-Engineering, Event-Platform Value Stream, SRE, serviceops, Patch-For-Review: DRY kafka broker declaration in helmfiles - https://phabricator.wikimedia.org/T253058 (Ottomata) Status update: networkpolicy for Kafka brokers has been DRY, but referencing the hostnames for Kafka brokers for...
[17:46:34] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 14 B): Improve Event Platform and MediaWiki Event Enrichment wikitech documentation - https://phabricator.wikimedia.org/T329629 (Ottomata)
[18:01:16] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 14 B): Improve Event Platform and MediaWiki Event Enrichment wikitech documentation - https://phabricator.wikimedia.org/T329629 (Ottomata)
[18:03:01] Data-Engineering, Event-Platform Value Stream (Sprint 14 B): Update eventgate and eventstreams helm chart to use automatic kafka egress networkpolicies and envoy service mesh - https://phabricator.wikimedia.org/T335024 (tchin) I could try taking a crack at it
[18:22:12] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 14 B): Improve Event Platform and MediaWiki Event Enrichment wikitech documentation - https://phabricator.wikimedia.org/T329629 (Ottomata) Some https://wikitech.wikimedia.org/wiki/Template:Navigation_Event_Platform updates today. Also added...
[18:22:24] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 14 B): Improve Event Platform and MediaWiki Event Enrichment wikitech documentation - https://phabricator.wikimedia.org/T329629 (Ottomata)
[18:26:49] Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Fix eventutillites_python stream_manager error_sink configuration - https://phabricator.wikimedia.org/T335591 (CodeReviewBot) otto merged https://gitlab.wikimedia.org/repos/data-engineering/eventutilities-python/-/merge_requests/72 Rem...
[18:27:31] Data-Engineering, Event-Platform Value Stream (Sprint 14 B), Patch-For-Review: Fix eventutillites_python stream_manager error_sink configuration - https://phabricator.wikimedia.org/T335591 (Ottomata)
[18:27:38] Data-Engineering, Event-Platform Value Stream (Sprint 14 B), Patch-For-Review: Fix eventutillites_python stream_manager error_sink configuration - https://phabricator.wikimedia.org/T335591 (Ottomata) a:Ottomata
[18:33:10] (CR) Ottomata: [C: +2] mediawiki/revision/score: add the dt field [schemas/event/primary] - https://gerrit.wikimedia.org/r/929733 (https://phabricator.wikimedia.org/T267648) (owner: DCausse)
[18:34:08] (Merged) jenkins-bot: mediawiki/revision/score: add the dt field [schemas/event/primary] - https://gerrit.wikimedia.org/r/929733 (https://phabricator.wikimedia.org/T267648) (owner: DCausse)
[18:35:00] (CR) Ottomata: [C: +2] mediawiki/revision/create: add mandatory dt field [schemas/event/primary] - https://gerrit.wikimedia.org/r/930666 (https://phabricator.wikimedia.org/T267648) (owner: DCausse)
[18:35:36] (Merged) jenkins-bot: mediawiki/revision/create: add mandatory dt field [schemas/event/primary] - https://gerrit.wikimedia.org/r/930666 (https://phabricator.wikimedia.org/T267648) (owner: DCausse)
[18:53:52] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (JAllemandou) Heya - so...
[18:57:09] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 14 B), Patch-For-Review: [Event Platform] Understand, document, and implement error handling and retry logic when fetching data from the MW api - https://phabricator.wikimedia.org/T309699 (CodeReviewBot) gmodena merged https://gitlab.wiki...
[19:06:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:06:51] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (mpopov) > If someone's...
[19:07:29] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (mpopov) Oh sorry, this...
[19:08:10] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:15:56] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:16:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:25:12] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:26:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:29:41] Data-Engineering, Movement-Insights, Product-Analytics, Research-Backlog: Investigate relation of UA deprecation to increase in automated traffic and reduction in unique devices - https://phabricator.wikimedia.org/T336715 (kzimmerman) a:Mayakp.wiki @MGerlach will consult on this. Assigning th...
[19:29:50] Data-Engineering, Movement-Insights, Product-Analytics, Research-Backlog: Investigate relation of UA deprecation to increase in automated traffic and reduction in unique devices - https://phabricator.wikimedia.org/T336715 (kzimmerman) p:Triage→Medium
[19:31:28] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:31:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:36:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:37:42] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:43:15] Data-Engineering, Event-Platform Value Stream, serviceops: Flink k8s operator in staging sometimes will not sync changes to FlinkDeployments - https://phabricator.wikimedia.org/T340059 (Ottomata)
[19:43:30] Data-Engineering, Event-Platform Value Stream, serviceops: Flink k8s operator in staging sometimes will not sync changes to FlinkDeployments - https://phabricator.wikimedia.org/T340059 (Ottomata)
[19:45:24] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:46:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:50:06] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (JAllemandou) >>! In T2...
[19:51:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:53:08] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:56:59] Thanks for the details on the email alerts, mforns and joal, much appreciated :)
[19:59:18] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (Ottomata) > For schema...
[20:00:58] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:01:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:11:40] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:11:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:16:14] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:16:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:21:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:22:20] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:23:15] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (mpopov) > Which field...
[20:30:42] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:31:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:38:08] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:41:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:46:16] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:46:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:54:04] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:56:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:01:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:06:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:16:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:20:12] Data-Engineering, Event-Platform Value Stream: EventBus should set dt fields with greater precision than second - https://phabricator.wikimedia.org/T340067 (Ottomata)
[21:21:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:26:32] Data-Engineering, Event-Platform Value Stream: EventBus should set dt fields with greater precision than second - https://phabricator.wikimedia.org/T340067 (Ottomata) @xcollazo this task should be pretty easy to do if you want to try your hand at some PHP! Relevant code for page change streams is: http...
[21:30:23] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:31:41] Analytics, Data-Engineering-Planning, Event-Platform Value Stream, Patch-For-Review, Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (Ottomata) @mpopov for...
[21:36:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:39:31] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:41:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:45:37] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:51:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:54:45] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:56:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:00:53] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:07:01] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:16:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:21:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:31:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:36:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:45:03] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:46:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:51:31] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:51:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:00:11] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:01:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:04:49] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:06:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:15:45] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:16:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:21:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:21:55] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:31:11] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:31:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:38:59] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:41:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:45:15] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [23:46:45] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:49:59] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [23:51:45] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed