[04:35:12] <jinxer-wm>	 (VarnishkafkaNoMessages) firing: varnishkafka on cp5022 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp5022%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[04:40:12] <jinxer-wm>	 (VarnishkafkaNoMessages) resolved: varnishkafka on cp5022 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp5022%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[08:56:10] <wikibugs>	 (03CR) 10Gergő Tisza: [C: 03+2] homepagemodule: Add support for newimpact drawer/tour events [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/863385 (https://phabricator.wikimedia.org/T323619) (owner: 10Kosta Harlan)
[08:57:04] <wikibugs>	 (03Merged) 10jenkins-bot: homepagemodule: Add support for newimpact drawer/tour events [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/863385 (https://phabricator.wikimedia.org/T323619) (owner: 10Kosta Harlan)
[09:05:49] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:06:12] <wikibugs>	 10Quarry, 10Cloud-Services-Origin-Alert, 10Cloud-Services-Worktype-Unplanned, 10User-dcaro, 10cloud-services-team (Kanban): [quarry] quarry-web-02 out of memory - https://phabricator.wikimedia.org/T324438 (10dcaro)
[09:06:23] <wikibugs>	 10Quarry, 10Cloud-Services-Origin-Alert, 10Cloud-Services-Worktype-Unplanned, 10User-dcaro, 10cloud-services-team (Kanban): [quarry] quarry-web-02 out of memory - https://phabricator.wikimedia.org/T324438 (10dcaro) 05Open→03Resolved
[09:31:25] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:33:19] <wikibugs>	 10Data-Engineering-Planning, 10Event-Platform Value Stream: Flink Tables should have a default ROWTIME column. - https://phabricator.wikimedia.org/T324144 (10gmodena) a:03gmodena
[10:16:59] <wikibugs>	 10Quarry, 10Cloud-Services-Origin-Alert, 10Cloud-Services-Worktype-Unplanned, 10User-dcaro, 10cloud-services-team (Kanban): [quarry] worker-04 down - https://phabricator.wikimedia.org/T324402 (10dcaro) Might be related to T324438
[11:26:39] <wikibugs>	 10Data-Engineering-Planning, 10Event-Platform Value Stream (Sprint 05): [SPIKE] Evaluate a pyflink version of Mediawiki Stream Enrichment - https://phabricator.wikimedia.org/T323217 (10gmodena)
[11:43:40] <wikibugs>	 10Data-Engineering-Planning, 10Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Add an-presto10[06-15] to the presto cluster - https://phabricator.wikimedia.org/T323783 (10Stevemunene) @Ottomata This might affect the rare packages using python2 or the deployments that had already set up symlinks to pyt...
[11:45:43] <steve_munene>	 !log restarting presto-server.service on an-presto1007 T323783
[11:45:46] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:45:46] <stashbot>	 T323783: Add an-presto10[06-15] to the presto cluster - https://phabricator.wikimedia.org/T323783
[11:48:13] <jinxer-wm>	 (VarnishkafkaNoMessages) firing: varnishkafka on cp5026 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5026%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[11:53:13] <jinxer-wm>	 (VarnishkafkaNoMessages) resolved: varnishkafka on cp5026 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5026%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[12:06:57] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): Add an-presto10[06-15] to the presto cluster - https://phabricator.wikimedia.org/T323783 (Stevemunene) an-presto1007 is now part of the cluster. the delay in joining the cluster was caused by the timing between the puppet...
[13:49:37] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:00:02] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:07:17] <wikibugs>	 Data-Engineering-Planning, Event-Platform Value Stream (Sprint 05): Flink + Event Platform integration for writing into streams via Table API - https://phabricator.wikimedia.org/T324114 (JArguello-WMF)
[14:08:59] <wikibugs>	 Data-Engineering-Planning, Event-Platform Value Stream (Sprint 05): Flink Tables should have a default ROWTIME column. - https://phabricator.wikimedia.org/T324144 (JArguello-WMF)
[14:17:12] <jinxer-wm>	 (VarnishkafkaNoMessages) firing: varnishkafka on cp5025 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5025%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[14:22:12] <jinxer-wm>	 (VarnishkafkaNoMessages) resolved: varnishkafka on cp5025 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5025%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[14:35:13] <jinxer-wm>	 (VarnishkafkaNoMessages) firing: varnishkafka on cp5021 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp5021%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[14:40:12] <jinxer-wm>	 (VarnishkafkaNoMessages) resolved: varnishkafka on cp5021 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp5021%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[14:43:12] <wikibugs>	 (03PS1) 10Snwachukwu: Refactor and Expand External referer classification [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769)
[14:46:19] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Refactor and Expand External referer classification [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769) (owner: 10Snwachukwu)
[14:46:26] <wikibugs>	 10Quarry, 10Cloud-Services-Origin-Alert, 10Cloud-Services-Worktype-Unplanned, 10User-dcaro, 10cloud-services-team (Kanban): [quarry] worker-04 down - https://phabricator.wikimedia.org/T324402 (10rook) Thanks for dealing with that. The workers have a memory leak and eventually run out. Originally the idea...
[14:54:22] <wikibugs>	 (03PS12) 10Mazevedo: Add ios talk page interaction schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841)
[14:55:01] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add ios talk page interaction schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841) (owner: 10Mazevedo)
[14:56:13] <wikibugs>	 (03PS13) 10Mazevedo: Add ios talk page interaction schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841)
[15:39:44] <wikibugs>	 10Analytics-Clusters, 10Analytics-Kanban: Add automata value in agent_type field of the refined table {hawk} - https://phabricator.wikimedia.org/T95693 (10Aklapper)
[15:50:14] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:32:41] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:40:13] <jinxer-wm>	 (VarnishkafkaNoMessages) firing: varnishkafka on cp5027 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5027%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[16:45:13] <jinxer-wm>	 (VarnishkafkaNoMessages) resolved: varnishkafka on cp5027 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5027%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[16:45:34] <wikibugs>	 10Analytics-Clusters, 10Analytics-Radar, 10Data-Engineering-Planning, 10Event-Platform Value Stream, and 2 others: Consider Julie for managing Kafka settings, perhaps even integrating with Event Stream Config - https://phabricator.wikimedia.org/T276088 (10akosiaris)
[16:55:55] <wikibugs>	 10Analytics-Clusters, 10Analytics-Radar, 10Data-Engineering-Planning, 10Event-Platform Value Stream, and 2 others: Consider Julie for managing Kafka settings, perhaps even integrating with Event Stream Config - https://phabricator.wikimedia.org/T276088 (10akosiaris) @Ottomata, @elukey  any updates on this?...
[17:02:34] <wikibugs>	 10Analytics-Clusters, 10Analytics-Radar, 10Data-Engineering-Planning, 10Event-Platform Value Stream, and 2 others: Consider Julie for managing Kafka settings, perhaps even integrating with Event Stream Config - https://phabricator.wikimedia.org/T276088 (10Ottomata) I would like to see config management for...
[17:07:12] <wikibugs>	 (03PS2) 10Snwachukwu: Refactor and Expand External referer classification [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769)
[17:09:12] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Refactor and Expand External referer classification [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/864772 (https://phabricator.wikimedia.org/T309769) (owner: 10Snwachukwu)
[17:19:52] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:28:43] <wikibugs>	 Analytics-Clusters, Analytics-Radar, Data-Engineering-Planning, Event-Platform Value Stream, and 2 others: Consider Julie for managing Kafka settings, perhaps even integrating with Event Stream Config - https://phabricator.wikimedia.org/T276088 (akosiaris) Open→Stalled Cool, thanks for th...
[17:30:13] <jinxer-wm>	 (VarnishkafkaNoMessages) firing: varnishkafka on cp5024 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp5024%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[17:30:20] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:35:13] <jinxer-wm>	 (VarnishkafkaNoMessages) resolved: varnishkafka on cp5024 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp5024%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:23:21] <wikibugs>	 10Analytics-Radar, 10Privacy Engineering, 10Product-Analytics: Clarify the data retention extension process - https://phabricator.wikimedia.org/T256776 (10kzimmerman) 05Open→03Declined Both pages have been updated since this task was created, and this has not been a pressing issue for our team.
[19:31:37] <wikibugs>	 10Data-Engineering-Planning, 10Event-Platform Value Stream (Sprint 05): Flink Tables should have a default ROWTIME column. - https://phabricator.wikimedia.org/T324144 (10gmodena) According to [the doc]( https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/concepts/time_attributes/) we can...
[19:35:21] <wikibugs>	 10Data-Engineering-Planning, 10Event-Platform Value Stream (Sprint 05): Flink Tables should have a default ROWTIME column. - https://phabricator.wikimedia.org/T324144 (10Ottomata) > For this reason I went with option 1 and propose a change to the Catalog's getTable() method to add watermark metadata at run tim...
[19:40:36] <wikibugs>	 10Data-Engineering-Kanban, 10Product-Analytics, 10Wmfdata-Python, 10GitLab (Project Migration): Move Wmfdata-Python from Github to Gitlab - https://phabricator.wikimedia.org/T304544 (10nshahquinn-wmf)
[19:48:46] <wikibugs>	 10Analytics-Radar, 10Product-Analytics: Investigate running Stan models on GPU - https://phabricator.wikimedia.org/T286493 (10mpopov) 05Open→03Declined No real need for this or bandwidth in the foreseeable future to make progress on this.
[20:19:59] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:25:39] <wikibugs>	 Data-Engineering-Planning, Event-Platform Value Stream (Sprint 05), Patch-For-Review: Create a shared flink docker image - https://phabricator.wikimedia.org/T316519 (Ottomata) Status update!  [[ https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/858356 | flink and flink-kub...
[20:30:16] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:38:25] <wikibugs>	 Data-Engineering-Planning, Event-Platform Value Stream, Shared-Data-Infrastructure: [SPIKE] Deploy event driven stateless Flink service to DSE cluster - https://phabricator.wikimedia.org/T320812 (Ottomata)
[21:38:29] <wikibugs>	 Data-Engineering-Planning, Event-Platform Value Stream (Sprint 05), Patch-For-Review: Create a shared flink docker image - https://phabricator.wikimedia.org/T316519 (Ottomata)
[22:46:23] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): NEW FEATURE REQUEST: Upgrade superset to 1.5.2 - https://phabricator.wikimedia.org/T323458 (BTullis) a:BTullis
[23:08:14] <wikibugs>	 Data-Engineering-Planning, Observability-Alerting, Shared-Data-Infrastructure, Traffic: Reduce/eliminate false positives for VarnishKafkaNoMessages alert - https://phabricator.wikimedia.org/T324522 (BTullis)