[00:09:34] Data-Engineering-Planning, Event-Platform Value Stream, Machine-Learning-Team, Observability-Logging, observability: Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (Volans) @elukey thanks a lot for this live data! That's awesome! I went to the Data Engineerin...
[06:01:12] (VarnishkafkaNoMessages) firing: (5) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[06:06:12] (VarnishkafkaNoMessages) resolved: (6) varnishkafka on cp2027 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[06:19:52] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:31:42] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:43:42] Data-Engineering, AQS 2.0 Roadmap, API Platform (API Platform Roadmap), Epic, and 2 others: AQS 2.0: Unique Devices service - https://phabricator.wikimedia.org/T288298 (SGupta-WMF)
[08:02:48] (PS1) Elukey: druid: add cache_status to the webrequest_sampled supervisor [analytics/refinery] - https://gerrit.wikimedia.org/r/857408 (https://phabricator.wikimedia.org/T314981)
[08:06:33] (PS2) Elukey: druid: add cache_status to the webrequest_sampled supervisor [analytics/refinery] - https://gerrit.wikimedia.org/r/857408 (https://phabricator.wikimedia.org/T314981)
[08:08:06] (CR) Filippo Giunchedi: [C: +1] "See inli" [analytics/refinery] - https://gerrit.wikimedia.org/r/857408 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[08:08:11] (CR) Volans: "Thanks for the patch, it would be great to have it. [non voting as I have zero context on this config]" [analytics/refinery] - https://gerrit.wikimedia.org/r/857408 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[08:36:59] (CR) Joal: [C: +1] "LGTM - merge at will :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/857408 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[09:10:09] (CR) Elukey: [V: +2 C: +2] druid: add cache_status to the webrequest_sampled supervisor [analytics/refinery] - https://gerrit.wikimedia.org/r/857408 (https://phabricator.wikimedia.org/T314981) (owner: Elukey)
[09:11:33] !log update the webrequest sampled live supervisor on Druid Analytics after https://gerrit.wikimedia.org/r/857408
[09:11:34] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:43:18] (PS1) Phedenskog: Add skin to navtiming. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857493 (https://phabricator.wikimedia.org/T323124)
[09:56:59] (PS4) Aqu: Put wikihadoop into refinery/source [analytics/refinery/source] - https://gerrit.wikimedia.org/r/856530 (https://phabricator.wikimedia.org/T321168)
[10:00:38] (CR) Aqu: Put wikihadoop into refinery/source (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/856530 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[10:57:33] Data-Engineering, Equity-Landscape: Grants input metric - https://phabricator.wikimedia.org/T309276 (KCVelaga_WMF) Thanks @ntsako there is mis-alignment at the historical and annual aggregation of grants, which probably rolling into the change as well. However, the issue doesn't seem to be with with the...
[10:58:22] Data-Engineering, Equity-Landscape: Grants input metric - https://phabricator.wikimedia.org/T309276 (KCVelaga_WMF) a:ntsako→JAnstee_WMF assigning back to Jaime for now, to check for differences in the base datasets.
[11:12:05] PROBLEM - MegaRAID on an-worker1093 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[11:33:02] RECOVERY - MegaRAID on an-worker1093 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[12:23:01] Data-Engineering-Planning, Shared-Data-Infrastructure: RAID battery alert in an-worker1085 - https://phabricator.wikimedia.org/T318659 (BTullis)
[12:23:03] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): RAID battery alert in an-worker1083 - https://phabricator.wikimedia.org/T321809 (BTullis)
[12:23:05] Data-Engineering-Planning, Shared-Data-Infrastructure: RAID battery alert in an-worker1093 - https://phabricator.wikimedia.org/T313130 (BTullis)
[12:23:07] Data-Engineering-Planning, Shared-Data-Infrastructure: RAID battery alert in an-worker1089 - https://phabricator.wikimedia.org/T314838 (BTullis)
[12:51:23] Data-Engineering-Planning, DC-Ops, Shared-Data-Infrastructure: Multiple RAID battery failures on hadoop worker hosts - https://phabricator.wikimedia.org/T318659 (BTullis)
[12:58:50] Data-Engineering: Add shell username ntsako to archiva-deployers - https://phabricator.wikimedia.org/T323213 (ntsako)
[13:06:07] PROBLEM - MegaRAID on an-worker1093 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[13:12:59] Data-Engineering, SRE-Access-Requests: Add shell username ntsako to archiva-deployers - https://phabricator.wikimedia.org/T323213 (BTullis) p:Triage→Medium a:BTullis I'm adding the #sre-access-requests tag for visibility, but I'll carry out this work
[13:16:41] RECOVERY - MegaRAID on an-worker1093 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[13:17:25] Data-Engineering-Planning, Shared-Data-Infrastructure, Event-Platform Value Stream (Sprint 04): [SPIKE] Deploy event driven stateless Flink service to DSE cluster - https://phabricator.wikimedia.org/T320812 (gmodena)
[13:17:27] Data-Engineering-Planning, Shared-Data-Infrastructure, Event-Platform Value Stream (Sprint 04): [SPIKE] Deploy event driven stateless Flink service to DSE cluster - https://phabricator.wikimedia.org/T320812 (gmodena)
[13:19:28] Data-Engineering, SRE-Access-Requests: Add shell username ntsako to archiva-deployers - https://phabricator.wikimedia.org/T323213 (BTullis) Hi @ntsako - I've added you to that group now. You should be able to deploy to archiva and verify your group membership here: https://ldap.toolforge.org/group/archiv...
[13:19:44] Data-Engineering, SRE-Access-Requests: Add shell username ntsako to archiva-deployers - https://phabricator.wikimedia.org/T323213 (BTullis) Open→Resolved
[13:21:01] Data-Engineering, SRE-Access-Requests: Add shell username ntsako to archiva-deployers - https://phabricator.wikimedia.org/T323213 (ntsako) Thank you for the prompt assistance @BTullis
[13:24:16] Data-Engineering, Event-Platform Value Stream: [NEEDS GROOMING][SPIKE} Evaluate a pyflink version of Mediawiki Stream Enrichment - https://phabricator.wikimedia.org/T323217 (gmodena)
[13:30:16] Data-Engineering-Planning, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05)): an-worker1090 MegaRaid issues - https://phabricator.wikimedia.org/T315748 (BTullis)
[13:46:33] Data-Engineering, Event-Platform Value Stream: [NEEDS GROOMING][SPIKE} Evaluate a pyflink version of Mediawiki Stream Enrichment - https://phabricator.wikimedia.org/T323217 (Ottomata) Best SQL Example [[ https://gist.github.com/ottomata/bc583fac4cafc4d7651db463dc755c9e | here ]]. Will be much better wit...
[13:57:45] (CR) Joal: "I have seen this package version causing issues, latest in Gobblin for instance. I think it's used in Hadoop, and possibly spark, and ther" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/857075 (owner: Ottomata)
[13:59:18] (CR) Joal: [C: +1] Put wikihadoop into refinery/source (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/856530 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[14:02:04] Data-Engineering-Planning, Cassandra, Data Pipelines (Sprint 04), Patch-For-Review: Write dedicated cassandra authorization code to read password from file when loading - https://phabricator.wikimedia.org/T306895 (JAllemandou) > My next question: What can we do to update to keep the contents of t...
[14:05:24] (CR) Urbanecm: [C: +1] "lgtm" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/850125 (https://phabricator.wikimedia.org/T320826) (owner: Kosta Harlan)
[14:27:41] (CR) Ottomata: Put wikihadoop into refinery/source (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/856530 (https://phabricator.wikimedia.org/T321168) (owner: Aqu)
[14:53:49] Data-Engineering, Equity-Landscape: Grants input metric - https://phabricator.wikimedia.org/T309276 (KCVelaga_WMF) @ntsako in the final SELECT statement, I see two columns with the same label, `total_historical_grants_to_date` i.e. ` data.historical_grants_to_date AS...
[14:54:00] Data-Engineering, Equity-Landscape: Affiliates input metric - https://phabricator.wikimedia.org/T309275 (KCVelaga_WMF) I was able to QA the metrics exclusively based on affiliate_data_csv (i.e. not anything related to grants), I will review them tomorrow, as they are tied to grants table. QC sheet: http...
[15:04:57] Data-Engineering, Event-Platform Value Stream (Sprint 05): [NEEDS GROOMING][SPIKE} Evaluate a pyflink version of Mediawiki Stream Enrichment - https://phabricator.wikimedia.org/T323217 (lbowmaker)
[15:14:58] Data-Engineering, Equity-Landscape: Grants input metric - https://phabricator.wikimedia.org/T309276 (ntsako) @KCVelaga_WMF the labelling does not correspond to the underlying table. The difference between the two is that the `hist.total_historical_grants_to_date` is for affiliated grants whereas the `da...
[15:38:14] Data-Engineering-Radar, Cassandra: Bootstrap new Cassandra nodes (eqiad) - https://phabricator.wikimedia.org/T307802 (Eevans)
[15:48:13] (CR) Phuedx: [C: +2] Add http client_ip to iOS schemas (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/855675 (https://phabricator.wikimedia.org/T322790) (owner: Mazevedo)
[15:48:49] (Merged) jenkins-bot: Add http client_ip to iOS schemas [schemas/event/secondary] - https://gerrit.wikimedia.org/r/855675 (https://phabricator.wikimedia.org/T322790) (owner: Mazevedo)
[15:56:11] (CR) Ottomata: Bump guava version to match wikimedia-event-utiltiies version (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/857075 (owner: Ottomata)
[15:56:35] (CR) Ottomata: Bump guava version to match wikimedia-event-utiltiies version (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/857075 (owner: Ottomata)
[16:03:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[16:08:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[17:18:22] Data-Engineering-Planning, Event-Platform Value Stream, Machine-Learning-Team, Observability-Logging, and 2 others: Evaluate Benthos as stream processor - https://phabricator.wikimedia.org/T319214 (Volans) @fgiunchedi @elukey I seeing some strange behaviour of the data in the dashboard, not sure...
[17:50:07] Data-Engineering-Planning, Data Pipelines, Release-Engineering-Team, serviceops-collab, GitLab (CI & Job Runners): Experiencing pipeline failure due to disk-space issues - https://phabricator.wikimedia.org/T310593 (brennen)
[18:02:13] (VarnishkafkaNoMessages) firing: varnishkafka on cp5007 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp5007%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:03:12] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp1076 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:07:13] (VarnishkafkaNoMessages) resolved: (3) varnishkafka on cp4037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:08:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp1076 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:11:07] ^ you will see a bunch of these messages
[18:11:14] we are restarting varnish fleet-wide so it's expected
[18:11:14] thanks
[18:24:12] (VarnishkafkaNoMessages) firing: (4) varnishkafka on cp1078 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:24:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2029 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2029%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:29:12] (VarnishkafkaNoMessages) resolved: (4) varnishkafka on cp1078 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:29:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp2029 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2029%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:45:12] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp2031 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:46:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4047 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4047%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:46:13] (VarnishkafkaNoMessages) firing: varnishkafka on cp5004 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5004%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:48:51] sukhe: Thanks for the heads-up.
[18:50:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp2031 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:51:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp4047 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:51:13] (VarnishkafkaNoMessages) resolved: varnishkafka on cp5004 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqsin%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp5004%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[18:58:42] (CR) Urbanecm: [C: +2] HomepageVisit: Add specialcontribute as valid referer_route [schemas/event/secondary] - https://gerrit.wikimedia.org/r/850125 (https://phabricator.wikimedia.org/T320826) (owner: Kosta Harlan)
[18:59:26] (Merged) jenkins-bot: HomepageVisit: Add specialcontribute as valid referer_route [schemas/event/secondary] - https://gerrit.wikimedia.org/r/850125 (https://phabricator.wikimedia.org/T320826) (owner: Kosta Harlan)
[19:06:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2033 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2033%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:06:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp6004 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=drmrs%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp6004%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:07:12] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp2034 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:11:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp1081 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:11:12] (VarnishkafkaNoMessages) resolved: (4) varnishkafka on cp2034 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:12:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp2034 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:21:09] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:28:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4049 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4049%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:28:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp6013 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=drmrs%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp6013%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:30:10] Data-Engineering-Planning, Cassandra, Data Pipelines (Sprint 04), Patch-For-Review: Write dedicated cassandra authorization code to read password from file when loading - https://phabricator.wikimedia.org/T306895 (BTullis) I'm coming to this a little late, but If have thought we could write a fil...
[19:31:03] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:33:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp4049 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:33:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp5011 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:40:52] (PS1) Mazevedo: Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759
[19:41:25] (CR) CI reject: [V: -1] Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (owner: Mazevedo)
[19:43:50] (PS2) Mazevedo: Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841)
[19:44:17] (CR) CI reject: [V: -1] Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841) (owner: Mazevedo)
[19:48:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2037%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:48:30] (PS3) Mazevedo: Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841)
[19:49:12] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp2038 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:49:47] (PS1) Mforns: Update changelog.md with v0.2.9 changes [analytics/refinery/source] - https://gerrit.wikimedia.org/r/857760
[19:51:04] (CR) Mforns: [V: +2 C: +2] "Merging for deployment train..." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/857760 (owner: Mforns)
[19:53:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp2037 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:54:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp2038 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[19:55:16] (CR) CI reject: [V: -1] Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841) (owner: Mazevedo)
[19:55:24] Starting build #114 for job analytics-refinery-maven-release-docker
[20:08:42] Project analytics-refinery-maven-release-docker build #114: SUCCESS in 13 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/114/
[20:09:57] (PS4) Mazevedo: Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841)
[20:10:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2040 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp2040%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:10:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp6015 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=drmrs%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp6015%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:10:47] (CR) CI reject: [V: -1] Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841) (owner: Mazevedo)
[20:11:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp6007 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=drmrs%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp6007%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:13:05] (PS5) Mazevedo: Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841)
[20:13:35] (CR) CI reject: [V: -1] Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841) (owner: Mazevedo)
[20:15:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp2040 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:15:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp6015 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=drmrs%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp6015%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:16:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp6007 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=drmrs%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp6007%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:17:11] Starting build #73 for job analytics-refinery-update-jars-docker
[20:17:25] (PS1) Maven-release-user: Add refinery-source jars for v0.2.9 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/857059
[20:17:26] Project analytics-refinery-update-jars-docker build #73: SUCCESS in 14 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/73/
[20:17:49] (PS6) Mazevedo: Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841)
[20:19:06] (CR) CI reject: [V: -1] Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841) (owner: Mazevedo)
[20:20:59] (PS7) Mazevedo: Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841)
[20:24:12] (PS8) Mazevedo: Add ios talk page interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/857759 (https://phabricator.wikimedia.org/T321841)
[20:25:35] PROBLEM - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[20:30:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp2041 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=codfw%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp2041%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:32:12] (VarnishkafkaNoMessages) firing: (2) varnishkafka on cp1090 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:34:02] Data-Engineering-Planning, Machine-Learning-Team, Research: Proposal: deprecate the mediawiki.revision-score stream in favour of more streams like mediawiki-revision-score- - https://phabricator.wikimedia.org/T317768 (Isaac) @Ottomata recognizing that this might be long past the time when you'...
[20:35:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp2041 is not sending enough cache_text requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:36:31] RECOVERY - MegaRAID on an-worker1094 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[20:36:45] (CR) Mforns: [V: +2 C: +2] "MERging for deployment train..." [analytics/refinery] - https://gerrit.wikimedia.org/r/857059 (owner: Maven-release-user)
[20:37:12] (VarnishkafkaNoMessages) resolved: (2) varnishkafka on cp1090 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[20:37:48] !log deployed refinery-source 0.2.9 as part of weekly deployment train
[20:37:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:51:09] (PS1) Mforns: Bump up MediawikiHistoryRunner jar versions to refinery-source 0.2.9 [analytics/refinery] - https://gerrit.wikimedia.org/r/857775 (https://phabricator.wikimedia.org/T320860)
[20:52:20] (CR) Mforns: [V: +2 C: +2] "Merging for weekly deployment train..." [analytics/refinery] - https://gerrit.wikimedia.org/r/857775 (https://phabricator.wikimedia.org/T320860) (owner: Mforns)
[21:40:07] !log deployed airflow up to e08e32e83b519dee214b7177bbe0fd3ac5a0be3c
[21:40:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[23:09:59] PROBLEM - MegaRAID on an-worker1094 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[23:42:53] RECOVERY - MegaRAID on an-worker1094 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring