[00:10:45] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[02:54:39] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[03:17:09] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[03:50:53] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[04:46:55] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[06:23:44] <wikibugs>	 10Data-Engineering, 10Event-Platform Value Stream (Sprint 02), 10Spike: [SPIKE] Build simple stateless service using Flink SQL - https://phabricator.wikimedia.org/T318856 (10gmodena) A summary of this spike, and evaluation of the approach, can be found at https://www.mediawiki.org/wiki/Platform_Engineering_T...
[07:24:05] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[07:44:36] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:37:28] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:39:01] <wikibugs>	 10Data-Engineering, 10API Platform, 10GraphQL, 10Pageviews-API: Responses on pageview API should be lighter - https://phabricator.wikimedia.org/T145935 (10EChetty)
[08:40:36] <wikibugs>	 10Analytics-Radar, 10Data-Engineering, 10API Platform, 10Pageviews-API, 10Tool-Pageviews: 429 Too Many Requests hit despite throttling to 100 req/sec - https://phabricator.wikimedia.org/T219857 (10EChetty)
[09:09:26] <icinga-wm>	 PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[09:20:12] <icinga-wm>	 RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[09:22:03] <wikibugs>	 10Data-Engineering, 10Event-Platform Value Stream (Sprint 02), 10Spike: [SPIKE] Build simple stateless service using Flink SQL - https://phabricator.wikimedia.org/T318856 (10gmodena) >>! In T318856#8314516, @Ottomata wrote: > This is SO COOL.  (btw, no code in https://gitlab.wikimedia.org/gmodena/flink-media...
[09:45:52] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[10:18:28] <icinga-wm>	 PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:20:18] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:29:11] <joal>	 btullis: I'm having an issue with kerberos - have we done anything in that realm lately?
[10:29:23] <joal>	 btullis: oh, and hi - my apologies
[10:31:32] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:37:14] <joal>	 nevermind btullis - I managed to make it work - I had a corrupted ticket it seems - no idea why
[10:40:56] <icinga-wm>	 RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[11:20:20] <wikibugs>	 10Data-Engineering-Kanban, 10Data Engineering Planning, 10SRE, 10serviceops, and 2 others: eventgate chart should use common_templates - https://phabricator.wikimedia.org/T303543 (10Clement_Goubert) Just for confirmation before diving into it on Monday, the list of services to re-deploy is:  ` deployment-c...
[13:02:09] <wikibugs>	 10Data-Engineering-Kanban, 10Data Engineering Planning, 10SRE, 10serviceops, and 2 others: eventgate chart should use common_templates - https://phabricator.wikimedia.org/T303543 (10Ottomata) Correct!
[13:03:12] <jinxer-wm>	 (VarnishkafkaNoMessages) firing: varnishkafka on cp2035 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2035%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[13:05:25] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[13:08:12] <jinxer-wm>	 (VarnishkafkaNoMessages) resolved: varnishkafka on cp2035 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2035%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[13:45:14] <wikibugs>	 10Data-Engineering, 10Equity-Landscape: Editorship Output Rank Metrics - https://phabricator.wikimedia.org/T306618 (10KCVelaga_WMF) @JAnstee_WMF All the regional aggregations also now fully align. You can see the full comparison at https://docs.google.com/spreadsheets/d/1B9vZc8BI7zLrZhyM7XNAYQXN_XSMLyw5CJgBjxO...
[13:45:30] <wikibugs>	 10Data-Engineering, 10Equity-Landscape: Readership Output Rank Metrics - https://phabricator.wikimedia.org/T306617 (10KCVelaga_WMF) @JAnstee_WMF All the regional aggregations also now fully align. You can see the full comparison at https://docs.google.com/spreadsheets/d/1B9vZc8BI7zLrZhyM7XNAYQXN_XSMLyw5CJgBjxO...
[13:46:47] <wikibugs>	 10Data-Engineering, 10Equity-Landscape: Editorship Output Rank Metrics - https://phabricator.wikimedia.org/T306618 (10KCVelaga_WMF) @JAnstee_WMF All regional aggregations also align well now. You can see my comparisons at: https://docs.google.com/spreadsheets/d/1LhdxdVUCMXfmvK2xubnbvB7s4mRR5xEsnb2w1QUggmU/edit...
[13:50:21] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[14:38:35] <wikibugs>	 10Data-Engineering, 10Event-Platform Value Stream (Sprint 03), 10Platform Team Initiatives (Modern Event Platform (TEC2)): Allow disabling/enabling configured streams via wgEventStreams config - https://phabricator.wikimedia.org/T259712 (10lbowmaker)
[14:39:01] <wikibugs>	 10Data-Engineering, 10Event-Platform Value Stream (Sprint 03): Refactor EventBus extension Hooks to use new hook system - https://phabricator.wikimedia.org/T320655 (10lbowmaker)
[15:32:45] <wikibugs>	 (03PS7) 10Mforns: Fix end-of-month/year allowed_interval issue [analytics/refinery] - 10https://gerrit.wikimedia.org/r/836295 (https://phabricator.wikimedia.org/T316746)
[15:50:27] <wikibugs>	 (03CR) 10Mforns: [V: 03+2] "OK, I think this time it's good!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/836295 (https://phabricator.wikimedia.org/T316746) (owner: 10Mforns)
[16:03:28] <wikibugs>	 (03CR) 10Joal: Fix end-of-month/year allowed_interval issue (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/836295 (https://phabricator.wikimedia.org/T316746) (owner: 10Mforns)
[16:03:39] <joal>	 mforns: couple of comments to help my understanding :)
[16:03:46] <mforns>	 joal: sure!
[16:03:52] <mforns>	 looking
[16:16:09] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[16:18:44] <joal>	 hey mforns - would you give me a few minutes in da cave?
[16:18:51] <mforns>	 yessss joal 
[16:27:19] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[16:41:25] <wikibugs>	 (03CR) 10Joal: Fix end-of-month/year allowed_interval issue (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/836295 (https://phabricator.wikimedia.org/T316746) (owner: 10Mforns)
[16:44:55] <wikibugs>	 (03PS8) 10Mforns: XFix end-of-month/year allowed_interval issue [analytics/refinery] - 10https://gerrit.wikimedia.org/r/836295 (https://phabricator.wikimedia.org/T316746)
[16:55:55] <joal>	 Hi dcausse - your job is using 1/2 of the cluster RAM :)
[16:56:20] <joal>	 dcausse: no big deal so far, but that's more than what we usually accept from users
[16:56:29] <joal>	 I need to leave now, I'll recheck later on 
[17:45:53] <wikibugs>	 10Data-Engineering, 10Data-Persistence, 10Image-Suggestions: Section Level Image Suggestions - Data Persistence Request - https://phabricator.wikimedia.org/T320831 (10lbowmaker)
[17:46:51] <wikibugs>	 (03PS3) 10Milimetric: [WIP] Collaborate on a new editors dataset [analytics/refinery] - 10https://gerrit.wikimedia.org/r/838256
[18:13:14] <dcausse>	 joal: sorry about that, will stop it
[19:06:20] <joal>	 thanks dcausse :)
[19:15:53] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[19:38:23] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[19:40:12] <jinxer-wm>	 (VarnishkafkaNoMessages) firing: (4) varnishkafka on cp5015 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka  - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:45:12] <jinxer-wm>	 (VarnishkafkaNoMessages) resolved: (4) varnishkafka on cp5015 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka  - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:34:25] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[21:19:25] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[21:31:11] <wikibugs>	 10Data-Engineering, 10Event-Platform Value Stream (Sprint 02), 10Spike: [SPIKE] Build simple stateless service using PyFlink - https://phabricator.wikimedia.org/T318859 (10tchin) [[ https://gitlab.wikimedia.org/tchin/stateless-pyflink-examples/-/tree/main/ | Here's the with example datastream and table equiv...
[22:37:59] <icinga-wm>	 PROBLEM - MegaRAID on analytics1068 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[23:34:05] <icinga-wm>	 RECOVERY - MegaRAID on analytics1068 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring