[00:30:20] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:31:44] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event_sanitized_analytics_immediate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:06:45] <wikibugs>	 Data-Engineering, Pageviews-API, Data Pipelines (Sprint 09): Missing Pageviews Data (projectviews-20230220-230000) - https://phabricator.wikimedia.org/T330184 (Predata-Datasci) Data now available at both links.
[05:06:56] <wikibugs>	 Data-Engineering, Pageviews-API, Data Pipelines (Sprint 09): Missing Pageviews Data (projectviews-20230220-230000) - https://phabricator.wikimedia.org/T330184 (Predata-Datasci) Open→Resolved
[06:09:27] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[06:24:27] <jinxer-wm>	 (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[06:30:27] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[06:40:27] <jinxer-wm>	 (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[08:14:28] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 09): Upgrade Presto servers to Bullseye - https://phabricator.wikimedia.org/T329361 (nfraison)
[08:14:40] <nfraison>	     !log Reimage an-presto1004 to upgrade to bullseye T329361
[08:14:41] <stashbot>	 T329361: Upgrade Presto servers to Bullseye - https://phabricator.wikimedia.org/T329361
[08:18:02] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 09): Upgrade Presto servers to Bullseye - https://phabricator.wikimedia.org/T329361 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by nfraison@cumin1001 for host an-presto1004.eqiad.wmnet with OS bullseye
[09:03:05] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 09): Upgrade Presto servers to Bullseye - https://phabricator.wikimedia.org/T329361 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by nfraison@cumin1001 for host an-presto1004.eqiad.wmnet with OS bullseye e...
[09:08:39] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 09): Upgrade Presto servers to Bullseye - https://phabricator.wikimedia.org/T329361 (nfraison)
[09:31:51] <icinga-wm>	 PROBLEM - Check if active EventStreams endpoint is delivering messages. on alert1001 is CRITICAL: CRITICAL: No EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[09:34:56] <nfraison>	     !log Reimage an-presto1005 to upgrade to bullseye T329361
[10:01:48] <icinga-wm>	 RECOVERY - Check if active EventStreams endpoint is delivering messages. on alert1001 is OK: OK: An EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[10:05:04] <icinga-wm>	 RECOVERY - Check systemd state on an-airflow1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:08:19] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Event partitions missing since  2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (dcausse) `org.wikimedia.analytics.refinery.job.ProduceCanaryEvents` seems stuck since yesterday maint operat...
[10:22:23] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 09): Upgrade Presto servers to Bullseye - https://phabricator.wikimedia.org/T329361 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by nfraison@cumin1001 for host an-presto1005.eqiad.wmnet with OS bullseye c...
[10:24:13] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 09): Upgrade Presto servers to Bullseye - https://phabricator.wikimedia.org/T329361 (nfraison)
[11:07:25] <nfraison>	 !log roll restart presto clusters to take into account the fix for the node.environment typo
[11:07:26] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:17:49] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Event partitions missing since  2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (gmodena) @dcausse taking a look. It seems adjacent to #event-platform_value_stream. You caught me in a blind...
[11:23:12] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Event partitions missing since  2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (dcausse) I think @Ottomata used to take care of this and was not sure what tags to add so please feel free t...
[13:24:00] <wikibugs>	 Data-Engineering-Planning, Event-Platform Value Stream (Sprint 09), Patch-For-Review: [Flink Operations] How to handle restarting a Flink application - https://phabricator.wikimedia.org/T328563 (gmodena) I have a working setup on minikube that manages restarts and HA using the flink k8s operator, min...
[13:35:54] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1108 is CRITICAL: CRITICAL - degraded: The following units failed: hadoop-yarn-nodemanager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:36:46] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1108 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[13:56:49] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Event partitions missing since  2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (lbowmaker) @mforns - maybe something for Ops week
[14:01:34] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1108 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:02:24] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1108 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[14:20:47] <wikibugs>	 Data-Engineering, Equity-Landscape: Affiliates input metrics - https://phabricator.wikimedia.org/T330295 (ntsako)
[14:27:36] <wikibugs>	 Data-Engineering-Planning, Event-Platform Value Stream: Refactor Image Suggestions Feedback > Cassandra Flink Job and Deploy to DSE k8s - https://phabricator.wikimedia.org/T329524 (gmodena) We'll need to do some ops work to run this application on k8s. Namely: 1. Provide a docker image; this requires set...
[14:29:15] <wikibugs>	 Data-Engineering, Event-Platform Value Stream, Epic: Make Realtime MediaWiki XML content dump available for external  consumption - https://phabricator.wikimedia.org/T330296 (lbowmaker)
[14:35:15] <wikibugs>	 Data-Engineering-Planning, Event-Platform Value Stream: Refactor Image Suggestions Feedback > Cassandra Flink Job and Deploy to DSE k8s - https://phabricator.wikimedia.org/T329524 (Ottomata) Are we sure we want to deploy to DSE k8s?  Which Cassandra clusters are we writing to?  Do we write in both DCs, o...
[15:13:11] <wikibugs>	 Analytics, Data-Engineering, Event-Platform Value Stream, SRE: > ~1 request/second to intake-logging.wikimedia.org times out at the traffic/service interface - https://phabricator.wikimedia.org/T264021 (CDanis)
[15:13:42] <wikibugs>	 Data-Engineering, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's  project-title-country data - https://phabricator.wikimedia.org/T267283 (Nuria) Open→Resolved
[15:13:49] <wikibugs>	 Analytics-Radar, Data-Engineering-Icebox, Data-release, Privacy Engineering, Privacy: An expert panel to produce recommendations on open data sharing for public good - https://phabricator.wikimedia.org/T189339 (Nuria)
[15:37:27] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[15:57:24] <ebernhardson>	 we're seeing a variety of problems with refined data from events not showing up in hdfs, the last event.mediawiki_revision_score for datacenter=codfw is 20230221T10 (yesterday 10am UTC). Is this also happening elsewhere?
[15:57:27] <jinxer-wm>	 (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[15:57:41] <ebernhardson>	 suggests that canary events aren't firing 
[16:03:25] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Event partitions missing since  2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (Ottomata) @dcausse yes the plan is to move this job to airflow eventually.    It's a little tricky because it...
[16:03:48] <inflatador>	 ottomata ^^ any idea what's happening w/the canary stuff ?
[16:04:53] <ottomata>	 looking, also see  davids ticket https://phabricator.wikimedia.org/T330236
[16:05:05] <ottomata>	 (sorry, a little slow, am catching up with emails and chats since last wed)
[16:05:21] <ottomata>	 if codfw was totally depooled
[16:05:26] <ottomata>	 codfw k8s
[16:05:29] <ottomata>	 this makes sense.
[16:06:06] <ottomata>	 canary events are explicitly produced to e.g. eventgate-main.svc.{eqiad,codfw}.wmnet, instead of .discovery.wmnet, so that we push canary events through both DCs
[16:06:18] <ottomata>	 if codfw is offline, then no canary event will be produced in codfw
[16:07:23] <ottomata>	 it looks like codfw is back up now
[16:08:49] <inflatador>	 ACK, thanks ottomata !
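Editor's note on the exchange above: ottomata's point is that canary events go to one DC-specific hostname per datacenter (eventgate-main.svc.{eqiad,codfw}.wmnet) rather than to a single discovery name, so an offline datacenter gets no canary event. A minimal sketch of that fan-out follows. The names DATACENTERS, canary_targets and datacenters_without_canary are hypothetical and are not taken from the ProduceCanaryEvents job; only the hostname pattern comes from the log.

```python
# Hypothetical illustration only; not the actual ProduceCanaryEvents code.
DATACENTERS = ("eqiad", "codfw")


def canary_targets(service, datacenters=DATACENTERS):
    """Build one DC-specific hostname per datacenter for `service`."""
    return [f"{service}.svc.{dc}.wmnet" for dc in datacenters]


def datacenters_without_canary(online, datacenters=DATACENTERS):
    """Datacenters that receive no canary event when only `online` are reachable."""
    return [dc for dc in datacenters if dc not in online]


if __name__ == "__main__":
    for host in canary_targets("eventgate-main"):
        print(host)
    # With codfw offline, only eqiad gets a canary event.
    print(datacenters_without_canary(online={"eqiad"}))
```

Under this model, a stream with no real traffic in codfw has nothing written for that datacenter while codfw is offline, which matches the missing datacenter=codfw partitions reported at 15:57.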
[16:10:10] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Event partitions missing since  2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (Ottomata) @gmodena for reference: - https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#...
[16:10:30] <icinga-wm>	 PROBLEM - Check systemd state on an-airflow1005 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_airflow-webserver@search.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:17:42] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Event partitions missing since  2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (Ottomata) This does indeed look like it was caused by {T329664}.  The ProduceCanaryEvents job is configured...
[16:40:55] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Event partitions missing since  2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (Ottomata) Also relevant context: - {T252585} - {T266798}
[16:59:48] <jinxer-wm>	 (GobblinLastSuccessfulRunTooLongAgo) firing: Last successful gobblin run of job webrequest was more than 2 hours ago. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest - https://alerts.wikimedia.org/?q=alertname%3DGobblinLastSuccessfulRunTooLongAgo
[17:01:37] <Lucas_WMDE>	 hi folks! I think I need help from someone with root rights on stat1007 to inspect / restart a systemd unit there: T330311
[17:01:38] <stashbot>	 T330311: wmde-analytics-minutely.service is no longer running on stat1007 - https://phabricator.wikimedia.org/T330311
[17:01:53] <Lucas_WMDE>	 (hoping this is the right channel for that sort of request, sorry if not)
[17:02:42] <sukhe>	 Lucas_WMDE: on it
[17:02:46] <Lucas_WMDE>	 thanks <3
[17:03:04] <sukhe>	 Lucas_WMDE: done
[17:03:27] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[17:03:41] <Lucas_WMDE>	 amazing
[17:05:34] <Lucas_WMDE>	 hm, but now it says the service is “active (exited) since … 2min 26s ago”
[17:06:09] <Lucas_WMDE>	 and I only see one new datapoint in grafana so far
[17:06:54] <sukhe>	 ah so it's being called by a timer
[17:07:12] <Lucas_WMDE>	 yeah
[17:07:31] <Lucas_WMDE>	 and I think as long as the service is active, even if exited, the timer won’t restart it
[17:07:40] <Lucas_WMDE>	 idk why it stays active though
[17:08:00] <Lucas_WMDE>	 I wonder if it’s the RemainAfterExit=yes
[17:08:26] <sukhe>	 timer is still running fwiw
[17:08:43] <Lucas_WMDE>	 the unit seems to have been changed, at least https://phabricator.wikimedia.org/T330311#8637991
[17:09:23] <Lucas_WMDE>	 ah yes, https://gerrit.wikimedia.org/r/c/operations/puppet/+/890843 looks quite related
[17:09:33] <Lucas_WMDE>	 I guess this is moving outside of the -analytics jurisdiction though
[17:09:48] <jinxer-wm>	 (GobblinLastSuccessfulRunTooLongAgo) firing: (2) Last successful gobblin run of job webrequest was more than 2 hours ago. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin  - https://alerts.wikimedia.org/?q=alertname%3DGobblinLastSuccessfulRunTooLongAgo
[17:09:51] <sukhe>	 ah
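Editor's note on the exchange above: Lucas_WMDE's hypothesis is that the service stays "active (exited)" because of RemainAfterExit=yes, so the timer's start requests do nothing. The log does not confirm the root cause. The toy model below only illustrates the general systemd behaviour that starting an already-active unit is a no-op. The class and function names are hypothetical and this is not systemd's implementation.

```python
from dataclasses import dataclass


@dataclass
class OneshotService:
    """Toy model of a oneshot service unit (hypothetical, for illustration)."""
    remain_after_exit: bool
    active: bool = False
    runs: int = 0

    def start(self):
        """Handle a start request; a no-op if the unit is already active."""
        if self.active:
            return False
        self.runs += 1
        # The process exits immediately in this model. The unit stays
        # active afterwards only when RemainAfterExit=yes is set.
        self.active = self.remain_after_exit
        return True


def simulate(remain_after_exit, timer_elapses):
    """Count how often the service actually runs over `timer_elapses` ticks."""
    service = OneshotService(remain_after_exit=remain_after_exit)
    for _ in range(timer_elapses):
        service.start()  # each timer elapse issues one start request
    return service.runs


if __name__ == "__main__":
    print(simulate(remain_after_exit=True, timer_elapses=5))
    print(simulate(remain_after_exit=False, timer_elapses=5))
```

In this model a minutely timer with RemainAfterExit=yes produces a single run, consistent with the single new datapoint Lucas_WMDE saw in grafana after the manual restart.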
[17:15:09] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Event partitions missing since  2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (dcausse) Are there ways to unblock it? It's causing plenty of hourly jobs to fail on our side.
[17:19:48] <jinxer-wm>	 (GobblinLastSuccessfulRunTooLongAgo) firing: (4) Last successful gobblin run of job event_default was more than 2 hours ago. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin  - https://alerts.wikimedia.org/?q=alertname%3DGobblinLastSuccessfulRunTooLongAgo
[17:24:48] <jinxer-wm>	 (GobblinLastSuccessfulRunTooLongAgo) firing: (5) Last successful gobblin run of job event_default was more than 2 hours ago. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin  - https://alerts.wikimedia.org/?q=alertname%3DGobblinLastSuccessfulRunTooLongAgo
[17:28:49] <wikibugs>	 Analytics, Analytics-Wikistats, Data-Engineering-Planning, Data Pipelines: Merge Ks-Arab and Ks-Deva to ks - https://phabricator.wikimedia.org/T314476 (Iflaq) The request is about only using "ks" language code for Kashmiri language and discontinuing "ks-deva". Currently "ks-arab", and "ks-deva" a...
[17:29:48] <jinxer-wm>	 (GobblinLastSuccessfulRunTooLongAgo) firing: (6) Last successful gobblin run of job event_default was more than 2 hours ago. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin  - https://alerts.wikimedia.org/?q=alertname%3DGobblinLastSuccessfulRunTooLongAgo
[17:32:53] <wikibugs>	 Data-Engineering-Planning: Data Engineering Pairing system - https://phabricator.wikimedia.org/T327790 (JArguello-WMF)
[17:34:48] <jinxer-wm>	 (GobblinLastSuccessfulRunTooLongAgo) firing: (7) Last successful gobblin run of job event_default was more than 2 hours ago. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin  - https://alerts.wikimedia.org/?q=alertname%3DGobblinLastSuccessfulRunTooLongAgo
[17:38:27] <jinxer-wm>	 (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[17:38:35] <ottomata>	 investigating ^ 
[17:38:37] <ottomata>	 thread in slack
[17:38:38] <ottomata>	 something is wrong.
[17:44:48] <jinxer-wm>	 (GobblinLastSuccessfulRunTooLongAgo) firing: (7) Last successful gobblin run of job event_default was more than 2 hours ago. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin  - https://alerts.wikimedia.org/?q=alertname%3DGobblinLastSuccessfulRunTooLongAgo
[17:49:48] <jinxer-wm>	 (GobblinLastSuccessfulRunTooLongAgo) firing: (7) Last successful gobblin run of job event_default was more than 2 hours ago. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin  - https://alerts.wikimedia.org/?q=alertname%3DGobblinLastSuccessfulRunTooLongAgo
[18:14:48] <jinxer-wm>	 (GobblinLastSuccessfulRunTooLongAgo) firing: (5) Last successful gobblin run of job event_default_test was more than 2 hours ago. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin  - https://alerts.wikimedia.org/?q=alertname%3DGobblinLastSuccessfulRunTooLongAgo
[18:19:48] <jinxer-wm>	 (GobblinLastSuccessfulRunTooLongAgo) firing: (5) Last successful gobblin run of job event_default_test was more than 2 hours ago. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin  - https://alerts.wikimedia.org/?q=alertname%3DGobblinLastSuccessfulRunTooLongAgo
[18:47:59] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Refine drops $schema field values - https://phabricator.wikimedia.org/T255818 (Ottomata) Not yet, still waiting on spark 3 upgrade.  See last comment.
[18:49:49] <wikibugs>	 Data-Engineering, Event-Platform Value Stream, MediaWiki-extensions-EventLogging: Decommission EventLogging backend components by migrating to MEP - https://phabricator.wikimedia.org/T238230 (Ottomata) Nope, not yet. {T259163} and {T282131}, and then also all the existent mobile app usages need to be...
[18:51:29] <wikibugs>	 Data-Engineering-Planning, Event-Platform Value Stream (Sprint 09), Patch-For-Review: [Flink Operations] How to handle restarting a Flink application - https://phabricator.wikimedia.org/T328563 (Ottomata) > permissions to create, edit, delete ConfigMaps Yes, we got it!  FWIW, we MAYYYBYE will want to...
[18:53:21] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Event partitions missing since  2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (Ottomata) It should be fixed for recent data, since they turned codfw back on.  There is no (easy) way to pr...
[19:24:00] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Event partitions missing since  2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (EBernhardson) I'm working through marking everything so airflow will run now. The basic idea is you have to...
[19:24:48] <jinxer-wm>	 (GobblinLastSuccessfulRunTooLongAgo) firing: (2) Last successful gobblin run of job event_default_test was more than 2 hours ago. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin  - https://alerts.wikimedia.org/?q=alertname%3DGobblinLastSuccessfulRunTooLongAgo
[19:39:45] <mforns>	 !log restarted the following an-launcher1002 timers, which seemed stuck (next run = n/a): gobblin-webrequest.timer, reportupdater-browser.timer, reportupdater-reference-previews.timer, refine_event.timer, refine_eventlogging_legacy.timer
[19:39:46] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:40:33] <jinxer-wm>	 (GobblinLastSuccessfulRunTooLongAgo) resolved: Last successful gobblin run of job webrequest was more than 2 hours ago. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest - https://alerts.wikimedia.org/?q=alertname%3DGobblinLastSuccessfulRunTooLongAgo
[22:27:28] <wikibugs>	 Analytics-Radar, Data-Engineering-Icebox, SRE, Traffic: Requests to (hard) redirect pages return their target's contents but are counted as pageviews to the redirect page - https://phabricator.wikimedia.org/T125015 (BCornwall) p:Medium→Triage
[22:39:20] <wikibugs>	 (PS1) Urbanecm: Add analytics/mediawiki/mentor_dashboard/personalized_praise [schemas/event/secondary] - https://gerrit.wikimedia.org/r/891368 (https://phabricator.wikimedia.org/T325117)
[22:40:18] <wikibugs>	 (CR) CI reject: [V: -1] Add analytics/mediawiki/mentor_dashboard/personalized_praise [schemas/event/secondary] - https://gerrit.wikimedia.org/r/891368 (https://phabricator.wikimedia.org/T325117) (owner: Urbanecm)
[22:41:08] <wikibugs>	 (PS2) Urbanecm: Add analytics/mediawiki/mentor_dashboard/personalized_praise [schemas/event/secondary] - https://gerrit.wikimedia.org/r/891368 (https://phabricator.wikimedia.org/T325117)
[22:41:39] <wikibugs>	 (CR) CI reject: [V: -1] Add analytics/mediawiki/mentor_dashboard/personalized_praise [schemas/event/secondary] - https://gerrit.wikimedia.org/r/891368 (https://phabricator.wikimedia.org/T325117) (owner: Urbanecm)
[22:44:11] <wikibugs>	 (PS3) Urbanecm: Add analytics/mediawiki/mentor_dashboard/personalized_praise [schemas/event/secondary] - https://gerrit.wikimedia.org/r/891368 (https://phabricator.wikimedia.org/T325117)
[22:57:28] <wikibugs>	 Data-Engineering, Data-Persistence, Discovery-Search, Infrastructure-Foundations, and 8 others: eqiad row B switches upgrade - https://phabricator.wikimedia.org/T330165 (colewhite)