[00:09:03] (PS3) Kimberly Sarabia: Adds skin field in mobilewebuiactions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/968674 (https://phabricator.wikimedia.org/T350205)
[00:11:23] (CR) Kimberly Sarabia: "See https://phabricator.wikimedia.org/T350205 Thanks!" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/968674 (https://phabricator.wikimedia.org/T350205) (owner: Kimberly Sarabia)
[01:36:20] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:37:50] (SystemdUnitFailed) firing: monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:02:14] (PuppetDisabled) firing: Puppet disabled on dbstore1007:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=misc&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[05:21:59] (PuppetDisabled) resolved: Puppet disabled on dbstore1007:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=misc&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[05:38:05] (SystemdUnitFailed) firing: monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:48:32] Data-Engineering, serviceops, Event-Platform: Upgrade change propagation to nodejs18 - https://phabricator.wikimedia.org/T348950 (elukey)
[08:21:15] (EventgateValidationErrors) firing: ...
[08:21:16] eventgate-analytics-external stream eventlogging_WMDEBannerSizeIssue validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationErrors
[09:02:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:07:50] (CR) Michael Große: "This change is ready for review." [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970723 (https://phabricator.wikimedia.org/T348644) (owner: Michael Große)
[09:17:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:27:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:32:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:01:50] (CR) Cyndywikime: "Done" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273) (owner: Cyndywikime)
[10:02:16] (PS13) Cyndywikime: Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273)
[10:02:45] (CR) CI reject: [V: -1] Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273) (owner: Cyndywikime)
[10:11:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[10:27:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:27:53] (PS14) Cyndywikime: Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273)
[10:28:20] (CR) CI reject: [V: -1] Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273) (owner: Cyndywikime)
[10:32:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:37:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:40:26] (CR) Lucas Werkmeister (WMDE): [C: +2] Fix Grafana dashboard links to new format (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970415 (https://phabricator.wikimedia.org/T348644) (owner: Michael Große)
[10:40:59] (Merged) jenkins-bot: Fix Grafana dashboard links to new format [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970415 (https://phabricator.wikimedia.org/T348644) (owner: Michael Große)
[10:42:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:43:29] (PS1) Lucas Werkmeister (WMDE): Fix Grafana dashboard links to new format [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970365 (https://phabricator.wikimedia.org/T348644)
[10:44:14] (CR) Lucas Werkmeister (WMDE): [C: +2] Fix Grafana dashboard links to new format [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970365 (https://phabricator.wikimedia.org/T348644) (owner: Lucas Werkmeister (WMDE))
[10:44:51] (Merged) jenkins-bot: Fix Grafana dashboard links to new format [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970365 (https://phabricator.wikimedia.org/T348644) (owner: Lucas Werkmeister (WMDE))
[10:48:31] (PS15) Cyndywikime: Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273)
[10:49:05] (CR) CI reject: [V: -1] Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273) (owner: Cyndywikime)
[10:49:20] (CR) Lucas Werkmeister (WMDE): "Thanks a lot for tracking these down!" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970416 (https://phabricator.wikimedia.org/T348644) (owner: Michael Große)
[10:49:23] (CR) Lucas Werkmeister (WMDE): [C: +2] Fix Grafana links to a different dashboard [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970416 (https://phabricator.wikimedia.org/T348644) (owner: Michael Große)
[10:49:54] (Merged) jenkins-bot: Fix Grafana links to a different dashboard [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970416 (https://phabricator.wikimedia.org/T348644) (owner: Michael Große)
[10:52:02] (PS1) Lucas Werkmeister (WMDE): Fix Grafana links to a different dashboard [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970746 (https://phabricator.wikimedia.org/T348644)
[10:52:35] (CR) Lucas Werkmeister (WMDE): [C: +2] Fix Grafana links to a different dashboard [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970746 (https://phabricator.wikimedia.org/T348644) (owner: Lucas Werkmeister (WMDE))
[10:53:08] (CR) WMDE-Fisch: [C: +2] "Thanks!" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970723 (https://phabricator.wikimedia.org/T348644) (owner: Michael Große)
[10:53:26] (Merged) jenkins-bot: Fix Grafana links to a different dashboard [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970746 (https://phabricator.wikimedia.org/T348644) (owner: Lucas Werkmeister (WMDE))
[10:57:28] (PS16) Cyndywikime: Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273)
[10:57:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:58:00] (CR) CI reject: [V: -1] Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273) (owner: Cyndywikime)
[11:02:50] (SystemdUnitFailed) firing: (4) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:04:05] (PS1) Gerrit maintenance bot: Add dga.wikipedia to pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/969998 (https://phabricator.wikimedia.org/T350229)
[11:05:12] (PS1) Gerrit maintenance bot: Add bjn.wikiquote to pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/970000 (https://phabricator.wikimedia.org/T350235)
[11:06:36] (PS1) Gerrit maintenance bot: Add zgh.wikipedia to pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/970004 (https://phabricator.wikimedia.org/T350241)
[11:17:50] (SystemdUnitFailed) firing: (4) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:20:23] (PS17) Cyndywikime: Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273)
[11:20:56] (CR) CI reject: [V: -1] Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273) (owner: Cyndywikime)
[11:22:17] Data-Platform-SRE, Patch-For-Review: [Airflow] Setup Airflow instance for WMDE - https://phabricator.wikimedia.org/T340648 (Stevemunene) While setting up the wmde instance, we noticed an error on the deployment server in the initial setup of the scap repo. As per the instance creation instructions we use...
[11:25:05] (CR) Lucas Werkmeister (WMDE): [C: +2] "I guess this implies that the preferred workflow for a new dashboard / file is" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970417 (https://phabricator.wikimedia.org/T348644) (owner: Michael Große)
[11:25:55] (Merged) jenkins-bot: Add missing links to Grafana dashboards using the data [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970417 (https://phabricator.wikimedia.org/T348644) (owner: Michael Große)
[11:26:01] (Merged) jenkins-bot: Fix/Add Grafana links for technical wishes scripts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970723 (https://phabricator.wikimedia.org/T348644) (owner: Michael Große)
[11:26:28] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[11:26:46] (PS1) Lucas Werkmeister (WMDE): Add missing links to Grafana dashboards using the data [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970747 (https://phabricator.wikimedia.org/T348644)
[11:27:58] (CR) Lucas Werkmeister (WMDE): [C: +2] Add missing links to Grafana dashboards using the data [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970747 (https://phabricator.wikimedia.org/T348644) (owner: Lucas Werkmeister (WMDE))
[11:28:29] (Merged) jenkins-bot: Add missing links to Grafana dashboards using the data [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970747 (https://phabricator.wikimedia.org/T348644) (owner: Lucas Werkmeister (WMDE))
[11:29:13] (PS1) Lucas Werkmeister (WMDE): Fix/Add Grafana links for technical wishes scripts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970748 (https://phabricator.wikimedia.org/T348644)
[11:29:46] (CR) Lucas Werkmeister (WMDE): [C: +2] Fix/Add Grafana links for technical wishes scripts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970748 (https://phabricator.wikimedia.org/T348644) (owner: Lucas Werkmeister (WMDE))
[11:30:18] (Merged) jenkins-bot: Fix/Add Grafana links for technical wishes scripts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970748 (https://phabricator.wikimedia.org/T348644) (owner: Lucas Werkmeister (WMDE))
[11:37:10] (PS18) Cyndywikime: Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273)
[11:37:43] (CR) CI reject: [V: -1] Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273) (owner: Cyndywikime)
[11:40:46] (PS19) Cyndywikime: Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273)
[11:41:17] (CR) CI reject: [V: -1] Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273) (owner: Cyndywikime)
[11:46:21] (PS20) Cyndywikime: Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T300273)
[11:47:50] (SystemdUnitFailed) firing: (4) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:02:50] (SystemdUnitFailed) firing: (4) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:21:31] (EventgateValidationErrors) firing: ...
[12:21:31] eventgate-analytics-external stream eventlogging_WMDEBannerSizeIssue validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationErrors
[12:54:12] (CR) WMDE-Fisch: "This change is ready for review." [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970769 (https://phabricator.wikimedia.org/T348644) (owner: WMDE-Fisch)
[12:57:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:02:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:04:47] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/31
[13:07:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:12:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:27:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:32:50] (SystemdUnitFailed) firing: (3) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:36:03] Data-Engineering, Data Engineering and Event Platform Team (Sprint 4), Event-Platform: [Event Platform] mw-page-content-change-enrich should (re)produce kafka keys - https://phabricator.wikimedia.org/T338231 (gmodena) > This is a fine starting point, but I don't think we should ultimately limit ourse...
[13:52:14] Data-Platform-SRE, Cloud-VPS, SRE, cloud-services-team, ops-eqiad: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
[13:52:30] PROBLEM - Webrequests Varnishkafka log producer on cp1113 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[13:52:34] PROBLEM - Webrequests Varnishkafka log producer on cp1114 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[14:00:56] (CR) Lucas Werkmeister (WMDE): [C: +1] "The change on its own looks okay to me, but it looks like the script hasn’t collected any data since almost four years ago… https://grafan" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970769 (https://phabricator.wikimedia.org/T348644) (owner: WMDE-Fisch)
[14:03:53] Data-Platform-SRE, Cloud-VPS, SRE, cloud-services-team, ops-eqiad: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948 (cmooney) @VRiley-WMF cloudvirt-wdqs1002 is showing a media/cable failure when it tries to boot over network: {F41426317,width=600} That could be that the...
[14:04:47] Data-Engineering, Data Engineering and Event Platform Team (Sprint 4), Event-Platform: [Event Platform] mw-page-content-change-enrich should (re)produce kafka keys - https://phabricator.wikimedia.org/T338231 (Ottomata) > Do you maybe have any use case in mind No, but I think the Flink library code sh...
[14:21:50] Data-Platform-SRE, Cloud-VPS, SRE, cloud-services-team, ops-eqiad: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948 (cmooney) >>! In T346948#9291643, @VRiley-WMF wrote: > cloudvirt-wdqs1003 has been relocated > > cloudvirt-wdqs1003 - C 8. U 21. port 18. CableID 4015 >...
[14:34:26] Data-Platform-SRE, Cloud-VPS, SRE, cloud-services-team, ops-eqiad: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948 (VRiley-WMF) @cmooney I have replaced the DAC cable and updated Netbox with the CableID; also I reseated the NIC for good measure. It is plugged into the sa...
[14:40:15] Data-Platform-SRE, serviceops-radar, Discovery-Search (Current work), Epic: Estimate cirrus streaming updater's usage of MWAPI - https://phabricator.wikimedia.org/T350185 (bking) Met with @pfischer today about this topic. He pointed out that we're doing a phased rollout (gradually increasing the...
[14:42:50] (SystemdUnitFailed) firing: (4) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:43:57] Data-Platform-SRE, Discovery-Search, serviceops-radar, Epic: Estimate cirrus streaming updater's usage of MWAPI - https://phabricator.wikimedia.org/T350185 (bking)
[14:44:12] Data-Platform-SRE, Discovery-Search, serviceops-radar, Epic: Estimate cirrus streaming updater's usage of MWAPI - https://phabricator.wikimedia.org/T350185 (bking)
[14:44:18] Data-Engineering, EventStreams, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, Patch-For-Review: eventgate: eventstreams: update nodejs and OS - https://phabricator.wikimedia.org/T347477 (Ottomata) I added `--trace-gc` to nodejs CLI for eventgate-analytics canary (node18...
[14:53:02] Data-Engineering, EventStreams, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, Patch-For-Review: eventgate: eventstreams: update nodejs and OS - https://phabricator.wikimedia.org/T347477 (Ottomata) Hm, what happened to eventgate cpu and mem resource requests and limits i...
[14:57:50] (SystemdUnitFailed) firing: (4) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:02:50] (SystemdUnitFailed) firing: (4) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:07:50] (SystemdUnitFailed) firing: (4) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:12:27] Data-Platform-SRE, Cloud-VPS, SRE, cloud-services-team, ops-eqiad: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm executed wit...
[15:12:50] (SystemdUnitFailed) firing: (5) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:17:50] (SystemdUnitFailed) firing: (6) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:23:23] Data-Platform-SRE, Cloud-VPS, SRE, cloud-services-team, ops-eqiad: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
[15:27:50] (SystemdUnitFailed) firing: (6) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:28:15] Data-Engineering, Data Engineering and Event Platform Team, serviceops, Event-Platform: [Event Platform] Gracefully handle pod termination in eventgate Helm chart - https://phabricator.wikimedia.org/T349823 (JMeybohm) >>! In T349823#9288286, @Ottomata wrote: >> Although this will still not make e...
[15:35:51] Data-Engineering, EventStreams, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, Patch-For-Review: eventgate: eventstreams: update nodejs and OS - https://phabricator.wikimedia.org/T347477 (Ottomata) > Hm, what happened to eventgate cpu and mem resource requests and limits...
[15:35:59] I shall be rebooting stat1008 in 5; the host seems stuck and slow, with super high CPU load from around 07:55:00 UTC
[15:47:50] (SystemdUnitFailed) firing: (6) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:49:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[15:56:54] Data-Platform-SRE, Cloud-VPS, SRE, cloud-services-team, ops-eqiad: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948 (cmooney) @Jclark-ctr had a look at the NIC riser card wasn't properly seated. After re-seating the card the server connection seems to be working, current...
[15:58:55] !log powercycle stat1008, host is frozen/stuck in an unresponsive state
[15:58:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:02:50] (SystemdUnitFailed) firing: (6) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:03:35] ryankemper, stevemunene, brouberol: we're in https://meet.google.com/ztw-tpvg-nyq for our learning circle
[16:05:08] Data-Platform-SRE, Cloud-VPS, SRE, cloud-services-team, ops-eqiad: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=b9bd4e38-25ed-4ed0-bdf7-47bd52027bdc) set by cmooney@cumin1001 for 1:00:00 on 1 host(s) an...
[16:09:28] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[16:17:50] (SystemdUnitFailed) firing: (6) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:21:31] (EventgateValidationErrors) firing: ...
[16:21:31] eventgate-analytics-external stream eventlogging_WMDEBannerSizeIssue validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationErrors
[16:31:13] Data-Engineering, EventStreams, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, Patch-For-Review: eventgate: eventstreams: update nodejs and OS - https://phabricator.wikimedia.org/T347477 (Ottomata) I deployed eventgate-analytics in eqiad with `--max_semi_space_size=32`....
[16:42:48] Data-Engineering, EventStreams, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, Patch-For-Review: eventgate: eventstreams: update nodejs and OS - https://phabricator.wikimedia.org/T347477 (Ottomata) Looks [[ https://grafana.wikimedia.org/goto/wak5BN4Sz?orgId=1 | CPU has g...
[16:49:16] (CR) WMDE-Fisch: Fix list of current beta features (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970769 (https://phabricator.wikimedia.org/T348644) (owner: WMDE-Fisch)
[17:08:01] (CR) Lucas Werkmeister (WMDE): [C: +2] Fix list of current beta features (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970769 (https://phabricator.wikimedia.org/T348644) (owner: WMDE-Fisch)
[17:08:35] (Merged) jenkins-bot: Fix list of current beta features [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970769 (https://phabricator.wikimedia.org/T348644) (owner: WMDE-Fisch)
[17:09:02] (PS1) Lucas Werkmeister (WMDE): Fix list of current beta features [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970753 (https://phabricator.wikimedia.org/T348644)
[17:09:34] (CR) Lucas Werkmeister (WMDE): [C: +2] Fix list of current beta features [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970753 (https://phabricator.wikimedia.org/T348644) (owner: Lucas Werkmeister (WMDE))
[17:10:11] (Merged) jenkins-bot: Fix list of current beta features [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970753 (https://phabricator.wikimedia.org/T348644) (owner: Lucas Werkmeister (WMDE))
[17:16:20] (PS1) Lucas Werkmeister (WMDE): Add or update some more Grafana links [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/970816 (https://phabricator.wikimedia.org/T348644)
[17:20:42] Data-Engineering, EventStreams, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, Patch-For-Review: eventgate: eventstreams: update nodejs and OS - https://phabricator.wikimedia.org/T347477 (Ottomata) Same CPU and latency and memory pattern after deploying to eventgate-anal...
[17:20:55] Data-Platform-SRE, Cloud-VPS, SRE, cloud-services-team, ops-eqiad: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm completed: -...
[17:37:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:39:20] Data-Platform-SRE, Cloud-VPS, SRE, cloud-services-team, ops-eqiad: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948 (taavi) I believe this is all done. Thank you everyone!
[17:40:48] Data-Platform-SRE, Cloud-VPS, SRE, cloud-services-team, ops-eqiad: Move cloudvirt-wdqs hosts - https://phabricator.wikimedia.org/T346948 (taavi) Open→Resolved
[17:47:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:52:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:02:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:07:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:17:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:22:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:32:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:37:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:44:59] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on an-test-client1002:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[18:47:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:55:54] I'm running another greedy process on stat1009, it could take a few hours: beam.smp(1540133). Please feel free to kill if needed, it's resumable.
[19:07:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:17:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:18:36] Data-Platform-SRE, Discovery-Search: Track and clean up object storage used by rdf-streaming-updater - https://phabricator.wikimedia.org/T348685 (bking)
[19:22:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:22:53] Data-Platform-SRE, Discovery-Search: Track and clean up object storage used by rdf-streaming-updater - https://phabricator.wikimedia.org/T348685 (bking)
[19:28:18] Data-Engineering, Data Pipelines, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, and 2 others: [Event Platform] eventgate-wikimedia occasionally fails to produce events due to stream config errors - https://phabricator.wikimedia.org/T326002 (cjming) just fyi there have been...
[19:32:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:41:06] Data-Engineering, Data Pipelines, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, and 2 others: [Event Platform] eventgate-wikimedia occasionally fails to produce events due to stream config errors - https://phabricator.wikimedia.org/T326002 (Ottomata) Thank you, looking. Po...
[19:51:14] Data-Engineering, Data Pipelines, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, and 2 others: [Event Platform] eventgate-wikimedia occasionally fails to produce events due to stream config errors - https://phabricator.wikimedia.org/T326002 (Ottomata) I just ran a manual pro...
[20:00:14] Data-Platform-SRE, Discovery-Search: Track and clean up object storage used by rdf-streaming-updater - https://phabricator.wikimedia.org/T348685 (bking) We got another alert for Swift disk usage today. Rather than have an automated cleanup process, I wonder if it would be useful to set a TTL on our objec...
[20:17:02] Analytics, Data-Engineering, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, Patch-For-Review: [Event Platform] Enable canary events for all MediaWiki streams - https://phabricator.wikimedia.org/T266798 (Ottomata) Canary events [[ https://wikitech.wikimedia.org/w/index.ph...
[20:21:20] Data-Engineering: ProduceCanaryEvents job should be scheduled by Airflow - https://phabricator.wikimedia.org/T341229 (Ottomata) Another idea: instead of migrating to airflow, make this a dedicated long lived service and run in k8s. We could add metrics about produced events.
[20:21:31] (EventgateValidationErrors) firing: ...
[20:21:31] eventgate-analytics-external stream eventlogging_WMDEBannerSizeIssue validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationErrors
[20:27:32] Data-Engineering, Data Engineering and Event Platform Team: ProduceCanaryEvents job should be scheduled by Airflow - https://phabricator.wikimedia.org/T341229 (Ottomata)
[20:41:22] Data-Engineering, EventStreams, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, Patch-For-Review: eventgate: eventstreams: update nodejs and OS - https://phabricator.wikimedia.org/T347477 (Ottomata) Hm, there is [[ URL | CPU throttling on eventgate-analytics-external ]] (...
[20:51:24] Data-Engineering, EventStreams, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, Patch-For-Review: eventgate: eventstreams: update nodejs and OS - https://phabricator.wikimedia.org/T347477 (Ottomata) Hm, is the [[ https://grafana.wikimedia.org/goto/DGSPmD4Sz?orgId=1 | TLS...
[21:22:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:31:16] (EventgateValidationErrors) resolved: ...
[21:31:16] eventgate-analytics-external stream eventlogging_WMDEBannerSizeIssue validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationErrors
[21:32:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:47:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:58:23] Data-Platform-SRE, Patch-For-Review: Improve data-reload cookbook based on graph split needs - https://phabricator.wikimedia.org/T349011 (RKemper)
[22:02:50] (SystemdUnitFailed) firing: (2) monitor_refine_event.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:03:23] (PS21) Urbanecm: Add analytics for Impressions, Success and Abandonment rate for temporary Users [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T346327) (owner: Cyndywikime)
[22:26:54] (CR) Urbanecm: [C: -1] Add analytics for Impressions, Success and Abandonment rate for temporary Users (7 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/962569 (https://phabricator.wikimedia.org/T346327) (owner: Cyndywikime)
[22:45:14] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on an-test-client1002:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange