[00:18:13] Data-Engineering, Equity-Landscape: Load country data - https://phabricator.wikimedia.org/T310712 (Mayakp.wiki) Hi @JAnstee_WMF and @ntsako I have provided my suggestions in T318850#8400320 . pls let me know if you need clarifications on any of them. I will be on vacation starting Monday Nov 21, 2022 a...
[00:37:42] PROBLEM - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[00:48:39] RECOVERY - MegaRAID on an-worker1094 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[03:11:07] PROBLEM - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[04:06:02] RECOVERY - MegaRAID on an-worker1094 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[04:38:57] PROBLEM - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[05:31:09] RECOVERY - MegaRAID on an-worker1094 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[06:25:01] PROBLEM - MegaRAID on an-worker1094 is
CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:07:04] Data-Engineering-Planning, Event-Platform Value Stream, Machine-Learning-Team, Observability-Logging, and 2 others: Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (fgiunchedi) >>! In T319214#8400080, @Volans wrote: > @fgiunchedi @elukey I'm seeing some strange beha...
[08:14:41] RECOVERY - MegaRAID on an-worker1094 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:47:02] Data-Engineering-Planning, Event-Platform Value Stream, Machine-Learning-Team, Observability-Logging, and 2 others: Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (Volans) >>! In T319214#8401405, @fgiunchedi wrote: > Yes very much possible, I believe the lag has...
[08:47:39] PROBLEM - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:48:13] Data-Engineering, Event-Platform Value Stream (Sprint 05): [NEEDS GROOMING][SPIKE} Evaluate a pyflink version of Mediawiki Stream Enrichment - https://phabricator.wikimedia.org/T323217 (gmodena) a:gmodena
[09:03:57] Data-Engineering-Planning, Event-Platform Value Stream, Machine-Learning-Team, Observability-Logging, and 2 others: Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (Volans) And now (before the merge of the above patch) the data is back in sync. Hence it looks to m...
[09:10:44] Data-Engineering-Planning, Event-Platform Value Stream, Machine-Learning-Team, Observability-Logging, and 2 others: Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (elukey) The other actor that plays a role in this pipeline is Druid: https://grafana.wikimedia.org/...
[09:20:39] RECOVERY - MegaRAID on an-worker1094 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[09:27:42] Data-Engineering-Planning, Event-Platform Value Stream, Machine-Learning-Team, Observability-Logging, and 2 others: Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (elukey) Another thing that we should discuss is how many Kafka partitions of `webrequest_{upload,te...
[09:33:34] Data-Engineering-Planning, Data Pipelines, Foundational Technology Requests, Traffic, and 2 others: Add a webrequest sampled topic and ingest into druid/turnilo - https://phabricator.wikimedia.org/T314981 (elukey) Status: We deployed benthos on two centrallog nodes, and we are now evaluating its...
[09:58:13] Data-Engineering-Planning, Event-Platform Value Stream, Machine-Learning-Team, Observability-Logging, and 2 others: Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (fgiunchedi) >>! In T319214#8401493, @Volans wrote: > And now (before the merge of the above patch)...
[10:00:28] Data-Engineering, Equity-Landscape: Grants input metric - https://phabricator.wikimedia.org/T309276 (KCVelaga_WMF) @ntsako ah, my bad. I misread that. Thanks for clarifying.
[10:01:20] Data-Engineering-Planning, Event-Platform Value Stream, Machine-Learning-Team, Observability-Logging, and 2 others: Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (Volans) Ok, after refreshing a bit the query in this [[ https://superset.wikimedia.org/superset/exp...
[10:35:25] PROBLEM - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:11:37] ACKNOWLEDGEMENT - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough Btullis T318659 - Adding this server to the list of other servers exhibiting this behaviour.
https://wikitech.wikimedia.org/wiki/
[11:11:37] %23Monitoring
[11:31:13] Data-Engineering, AQS 2.0 Roadmap, API Platform (API Platform Roadmap), Epic, and 2 others: AQS 2.0: Editors service - https://phabricator.wikimedia.org/T288305 (SGupta-WMF)
[11:41:05] RECOVERY - MegaRAID on an-worker1094 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[12:03:08] (PS1) Milimetric: Script a way to search oozie lineage [analytics/refinery] - https://gerrit.wikimedia.org/r/858289
[12:07:38] (CR) Milimetric: "just fyi in case you need to parse through oozie coordinators, it wasn't too bad" [analytics/refinery] - https://gerrit.wikimedia.org/r/858289 (owner: Milimetric)
[12:44:22] Data-Engineering, Equity-Landscape: Affiliates input metric - https://phabricator.wikimedia.org/T309275 (KCVelaga_WMF) affiliates grants QA comments | **column** | **alignment** | **reason** | total_calendar_year_grants| not aligned | not sure what's causing the issue here, but @ntsako the interim table...
[13:20:19] Quarry: Add an option to export result in Wikilist - https://phabricator.wikimedia.org/T137268 (rook) I'm going to close this for now. If this is still desired please re-open with clarification on if the above is what is desired.
[13:20:28] Quarry: Add an option to export result in Wikilist - https://phabricator.wikimedia.org/T137268 (rook) Open→Declined
[13:52:25] PROBLEM - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[14:10:27] Data-Engineering, Event-Platform Value Stream, MW-1.40-notes (1.40.0-wmf.8; 2022-10-31): EventBus' stream config destination_event_service setting should move into producers.mediawikI_eventbus specific settings. - https://phabricator.wikimedia.org/T321557 (JArguello-WMF)
[14:10:43] Data-Engineering-Planning, Event-Platform Value Stream: Create a shared flink docker image - https://phabricator.wikimedia.org/T316519 (Ottomata) a:Ottomata
[14:34:45] !log deploying updated hadoop packages to analytics-presto hosts
[14:34:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:37:51] !log deploying updated hadoop packages to hue and yarn webservers
[14:37:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:51:12] !log deploying updated hadoop packages to druid-analytics
[14:51:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:52:08] !log deploying updated hadoop packages to druid-public
[14:52:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:53:16] Data-Engineering-Planning: Security Related Task [Placeholder] - https://phabricator.wikimedia.org/T312620 (BTullis)
[14:55:11] Data-Engineering, SRE-Access-Requests: Grant ssh access to analytics-admins to dcausse and gmodena - https://phabricator.wikimedia.org/T323280 (Ottomata)
[14:58:34] (PS1) Milimetric: [WIP] Stream revision topics into
iceberg table [analytics/refinery/source] - https://gerrit.wikimedia.org/r/858344
[15:02:19] Data-Engineering-Planning, Cassandra, Data Pipelines (Sprint 04), Patch-For-Review: Write dedicated cassandra authorization code to read password from file when loading - https://phabricator.wikimedia.org/T306895 (Ottomata) We can, but it isn't done with the Puppet file resources, so there isn't...
[15:03:10] (CR) CI reject: [V: -1] [WIP] Stream revision topics into iceberg table [analytics/refinery/source] - https://gerrit.wikimedia.org/r/858344 (owner: Milimetric)
[15:03:15] Data-Engineering, SRE-Access-Requests: Grant ssh access to analytics-admins to dcausse and gmodena - https://phabricator.wikimedia.org/T323280 (BTullis) No objection from me. Do we need any additional approval from elsewhere in #sre or can we just go ahead and make the change? Maybe @odimitrijevic could...
[15:04:34] Data-Engineering-Planning, Machine-Learning-Team, Research: Proposal: deprecate the mediawiki.revision-score stream in favour of more streams like mediawiki-revision-score- - https://phabricator.wikimedia.org/T317768 (Ottomata) @Isaac, for sake of continuity, let's have this discussion over on...
[15:06:25] Data-Engineering-Planning, Cassandra, Data Pipelines (Sprint 04), Patch-For-Review: Write dedicated cassandra authorization code to read password from file when loading - https://phabricator.wikimedia.org/T306895 (BTullis) >>! In T306895#8402421, @Ottomata wrote: > We can, but it isn't done with...
[15:12:04] Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Ottomata) 2 more questions to answer: ===== Nested vs flat/top level fields Right now, this schema uses Rows AKA S...
[15:18:52] Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Ottomata) In https://phabricator.wikimedia.org/T317768#8400702 @Isaac wrote: > @Ottomata recognizing that this mig...
[15:27:22] Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Ottomata)
[15:28:00] Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Ottomata)
[15:42:30] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/853376 (https://phabricator.wikimedia.org/T322379) (owner: Jenniferwang)
[15:43:30] (CR) Milimetric: [C: -1] "(downgrading eventutilities' guava version seems to work, this does not)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/857075 (owner: Ottomata)
[15:49:42] elukey: hi, what's up?! :D I'm about to deploy refinery and it includes 2 changes to druid loading from you. Is there anything that needs to be done simultaneously with the deployment? or any job restarts?
[15:52:44] mforns: o/ please go ahead, already applied them manually (to kick off the supervisors)
[15:52:55] cool elukey thanks!
[15:53:21] following https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid#Realtime_indexation_to_Druid
[15:53:23] !log started refinery deployment for weekly train (accompanying refinery-source 0.2.9)
[15:53:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:07:14] !log finished refinery deployment
[16:07:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:35:23] RECOVERY - MegaRAID on an-worker1094 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[16:43:42] Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Isaac) > It is not! We still want and need your feedback. That's why this is currently an 'rc0' stream, and the sch...
[16:43:56] Quarry, good first task: Define in a single place the pseudoname of unnamed queries - https://phabricator.wikimedia.org/T197029 (rook) https://github.com/toolforge/quarry/pull/13
[16:52:03] Data-Engineering, Event-Platform Value Stream, Wikimedia-production-error: EventBus: Error: Call to a member function isCurrent() on null - https://phabricator.wikimedia.org/T323294 (brennen)
[16:52:23] Data-Engineering, Event-Platform Value Stream, User-brennen, Wikimedia-production-error: EventBus: Error: Call to a member function isCurrent() on null - https://phabricator.wikimedia.org/T323294 (brennen)
[17:06:00] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: refine_event_sanitized_analytics_immediate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:06:42] PROBLEM - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough,
WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[17:14:12] !log restarted mediawiki-denormalize-coord as part of weekly deployment train
[17:14:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:17:32] (PS2) Krinkle: navigationtiming: Add skin field [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857493 (https://phabricator.wikimedia.org/T323124) (owner: Phedenskog)
[17:23:08] Data-Engineering, Equity-Landscape: Affiliates input metric - https://phabricator.wikimedia.org/T309275 (JAnstee_WMF) >count_affiliates_in_country not aligned With reference to 2021 Data Reference for QA workbook, Ntsako's calculations are based on Official Affiliate Information 2021 (Mirror) whereas Jai...
[17:32:29] Data-Engineering, Equity-Landscape: Grants input metric - https://phabricator.wikimedia.org/T309276 (JAnstee_WMF) >hist.total_historical_grants_to_date is for affiliated grants whereas the data.historical_grants_to_date is for both affiliate and non-affiliate grants. I'll edit the code on phabricator to...
[18:02:58] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:08:52] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: refine_event_sanitized_analytics_immediate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:19:32] Data-Engineering-Planning, Product-Analytics (Kanban): Superset Date Filter fix needed - https://phabricator.wikimedia.org/T318299 (Mayakp.wiki) Confirmed! Superset will be upgraded to v2.0. @BTullis will create a new task and DE will try to prioritize it to be done by end of Q2. Request to DE: Please...
[18:22:40] RECOVERY - MegaRAID on an-worker1094 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[18:23:30] Data-Engineering, Data-Engineering-Kanban, Superset: Superset SQL Lab fails to stop query - https://phabricator.wikimedia.org/T293083 (Mayakp.wiki) Noting here that this happened with @SNowick_WMF and I, recently. Closing the query (not STOP button) does work to kill the query.
[19:08:06] Data-Engineering, Event-Platform Value Stream, User-brennen, Wikimedia-production-error: EventBus: Error: Call to a member function isCurrent() on null - https://phabricator.wikimedia.org/T323294 (Ottomata) Interesting! I see there are some checks in the older EventBusHooks that guard against th...
[19:26:54] Data-Engineering-Planning, Cassandra, Data Pipelines (Sprint 04), Patch-For-Review: Write dedicated cassandra authorization code to read password from file when loading - https://phabricator.wikimedia.org/T306895 (Ottomata) > Yes, that's what I was thinking. Make it so that the exec in the define...
[19:27:18] Data-Engineering-Planning, Cassandra, Data Pipelines (Sprint 04), Patch-For-Review: Write dedicated cassandra authorization code to read password from file when loading - https://phabricator.wikimedia.org/T306895 (Ottomata) OR!
You could get fancier and make an HDFS puppet file provider :)
[19:28:22] PROBLEM - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[19:34:09] Data-Engineering, SRE, SRE-Access-Requests: Grant ssh access to analytics-admins to dcausse and gmodena - https://phabricator.wikimedia.org/T323280 (jcrespo) > Do we need any additional approval from elsewhere in SRE or can we just go ahead and make the change Regarding approvals, if the change is j...
[19:38:47] Data-Engineering, SRE, SRE-Access-Requests: Grant ssh access to analytics-admins to dcausse and gmodena - https://phabricator.wikimedia.org/T323280 (jcrespo) According to Namely, Will and Guillome should approve for each + either Otto or Olja from your side (let me know if that is up to date).
[19:39:20] RECOVERY - MegaRAID on an-worker1094 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[19:52:19] Data-Engineering, Event-Platform Value Stream, Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Ottomata) > I'm not following the aspect about page properties not being persisted through edits I don't know if I...
[19:55:16] (CR) Ottomata: [C: +1] "Huh, cool! Fine with me!"
[analytics/refinery] - https://gerrit.wikimedia.org/r/858289 (owner: Milimetric)
[19:58:14] (Abandoned) Ottomata: Bump guava version to match wikimedia-event-utiltiies version [analytics/refinery/source] - https://gerrit.wikimedia.org/r/857075 (owner: Ottomata)
[20:00:37] (CR) Ottomata: [WIP] Stream revision topics into iceberg table (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/858344 (owner: Milimetric)
[20:02:25] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:10:20] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: refine_event_sanitized_analytics_immediate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:20:32] PROBLEM - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[21:20:55] (CR) Milimetric: [C: +2] "merging since it's isolated" [analytics/refinery] - https://gerrit.wikimedia.org/r/858289 (owner: Milimetric)
[21:21:00] (CR) Milimetric: [V: +2 C: +2] Script a way to search oozie lineage [analytics/refinery] - https://gerrit.wikimedia.org/r/858289 (owner: Milimetric)
[22:02:57] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:08:53] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: refine_event_sanitized_analytics_immediate.service
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:39:15] RECOVERY - MegaRAID on an-worker1094 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[23:02:15] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:08:11] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: refine_event_sanitized_analytics_immediate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:23:09] PROBLEM - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring