[01:11:31] RECOVERY - MegaRAID on an-worker1082 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:52:20] Data-Engineering, Cassandra, Epic, Platform Team Workboards (Platform Engineering Reliability): Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (BTullis)
[09:33:21] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): dbstore1007 is swapping heavilly, potentially soon killing mysql services due to OOM error - https://phabricator.wikimedia.org/T290841 (BTullis) I //always// listen to your suggestions @Jcrespo :-) Do yo...
[09:55:13] Data-Engineering, Product-Analytics, Patch-For-Review: Request for SQL Templating to be enabled in Superset - https://phabricator.wikimedia.org/T312134 (BTullis) a:BTullis I'm happy with this and since you've already done the work in writing the patch, I'll merge and test it today.
[10:00:44] Data-Engineering, Product-Analytics, Patch-For-Review: Request for SQL Templating to be enabled in Superset - https://phabricator.wikimedia.org/T312134 (BTullis) OK, @EBernhardson, @mpopov - The patch has been applied and superset has been restarted with the feature flag enabled. It looks OK to me, b...
[10:15:37] Data-Engineering, Data-Services, Patch-For-Review: Move wikireplicas dbproxy haproxy config to etcd - https://phabricator.wikimedia.org/T304478 (BTullis) In progress→Declined Thanks @Joe for your input on this ticket. I think that we've decided it's too much work for us to take on at the mome...
[10:18:13] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, SRE Observability, and 2 others: Migrate the majority of the analytics cluster alerts from Icinga to AlertManager - https://phabricator.wikimedia.org/T293399 (BTullis) Open→Resolved I'm re-resolving this ticket now @fgiunchedi -...
[11:51:45] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[11:52:09] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:02:07] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:03:15] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:55:10] (PS3) NOkafor: Minor trailing space and back slash adjustments Cassandra Loading HQL files [Draft] Bug: T311507 [analytics/refinery] - https://gerrit.wikimedia.org/r/812095 (https://phabricator.wikimedia.org/T311507)
[13:01:06] (CR) Ottomata: (DO NOT SUBMIT) spark: support "const" in json schema (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/813342 (https://phabricator.wikimedia.org/T311615) (owner: Hashar)
[13:02:15] (CR) NOkafor: "Resolved the trailing spaces and unescaped characters" [analytics/refinery] - https://gerrit.wikimedia.org/r/812095 (https://phabricator.wikimedia.org/T311507) (owner: NOkafor)
[13:02:39] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation): dbstore1007 is swapping heavilly, potentially soon killing mysql services due to OOM error - https://phabricator.wikimedia.org/T290841 (Ottomata) Not that I know of! But I probably wouldn't know either...
[13:45:50] Analytics, Analytics-Wikistats, Data-Engineering: "Pages to date" not loading with "daily" metric - https://phabricator.wikimedia.org/T312717 (Milimetric) I'm not sure @Nevmit, we triage again on July 25th, and if this is prioritized then, that week but probably the next at the earliest. So early Au...
[14:47:52] PROBLEM - IPMI Sensor Status on aqs2004 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical, PS Redundancy = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[14:57:24] PROBLEM - IPMI Sensor Status on aqs2003 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical, PS Redundancy = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:01:54] Data-Engineering, Product-Analytics, Patch-For-Review: Request for SQL Templating to be enabled in Superset - https://phabricator.wikimedia.org/T312134 (EBernhardson) @BTullis thanks! Loading my old dashboard it looks to be working same as it did before. Looks complete to me.
[15:02:03] PROBLEM - IPMI Sensor Status on aqs2001 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical, PS Redundancy = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:04:25] FYI Power supply is maint if not known
[15:13:23] PROBLEM - IPMI Sensor Status on aqs2002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical, PS Redundancy = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[15:48:43] RECOVERY - IPMI Sensor Status on aqs2004 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[16:03:05] RECOVERY - IPMI Sensor Status on aqs2001 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[16:06:20] (CR) Michael Große: "This is just preparatory work for the follow-up patch" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/813919 (owner: Michael Große)
[16:08:55] RECOVERY - IPMI Sensor Status on aqs2003 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[16:09:10] (CR) Michael Große: "I think this should just work. If desired, we can backfill this for a few days by creating a maintenance script" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/813920 (https://phabricator.wikimedia.org/T303394) (owner: Michael Große)
[16:15:08] (CR) Lucas Werkmeister (WMDE): [C: +2] Extract collecting the LexemePagePropsStats into private method [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/813919 (owner: Michael Große)
[16:15:27] RECOVERY - IPMI Sensor Status on aqs2002 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[16:15:46] (Merged) jenkins-bot: Extract collecting the LexemePagePropsStats into private method [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/813919 (owner: Michael Große)
[16:15:52] (CR) Lucas Werkmeister (WMDE): [C: +2] Track number of created Lexemes via ui per day [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/813920 (https://phabricator.wikimedia.org/T303394) (owner: Michael Große)
[16:19:20] (PS1) Lucas Werkmeister (WMDE): Extract collecting the LexemePagePropsStats into private method [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/813946
[16:19:32] (CR) Lucas Werkmeister (WMDE): [V: +2 C: +2] Extract collecting the LexemePagePropsStats into private method [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/813946 (owner: Lucas Werkmeister (WMDE))
[16:19:42] (Merged) jenkins-bot: Track number of created Lexemes via ui per day [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/813920 (https://phabricator.wikimedia.org/T303394) (owner: Michael Große)
[16:24:34] (PS1) Michael Große: Track number of created Lexemes via ui per day [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/813947 (https://phabricator.wikimedia.org/T303394)
[16:30:11] (CR) Michael Große: [C: +2] "self +2 since this is only a cherry-pick of an already merged patch to production" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/813947 (https://phabricator.wikimedia.org/T303394) (owner: Michael Große)
[16:31:12] (Merged) jenkins-bot: Track number of created Lexemes via ui per day [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/813947 (https://phabricator.wikimedia.org/T303394) (owner: Michael Große)
[17:53:01] Data-Engineering, Product-Analytics, Patch-For-Review: Request for SQL Templating to be enabled in Superset - https://phabricator.wikimedia.org/T312134 (mpopov) Can confirm templating works! Thank you @EBernhardson @BTullis !
[19:27:48] Data-Engineering, Event-Platform, Technical-Debt: Migrate usage of Database::select to SelectQueryBuilder in EventStreamConfig - https://phabricator.wikimedia.org/T312398 (Southparkfan) Open→Invalid Impacted code is not used by this extension.
[19:37:21] ottomata: are you working tomorrow? If not, can you please review https://gerrit.wikimedia.org/r/c/operations/puppet/+/813925? :]
[19:49:44] mforns: +1 should I merge?
[19:50:01] yes, please, if you think it's ok
[19:50:13] thanks!
[19:51:37] done!
[20:08:06] Analytics, Analytics-Wikistats, Data-Engineering: "Pages to date" not loading with "daily" metric - https://phabricator.wikimedia.org/T312717 (Nevmit) Thank you @Milimetric
[20:14:39] Data-Engineering-Kanban, Data Engineering Planning (Sprint 01): Create conda-base-env with last pyspark - https://phabricator.wikimedia.org/T309227 (Ottomata) I was really hoping to get this finished today, but I'm still having issues. ####### Unpacking the conda packed tarball has an error @Antoine_Qu...
[20:20:08] Data-Engineering-Kanban, Data Engineering Planning (Sprint 01): Create conda-base-env with last pyspark - https://phabricator.wikimedia.org/T309227 (Ottomata) I created a Draft MR with the stuff I'm trying: https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/merge_requests/3
[20:57:50] (CR) Ottomata: Schemas for Gerrit (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/811302 (https://phabricator.wikimedia.org/T311615) (owner: Hashar)
[22:43:49] (CR) Hashar: Schemas for Gerrit (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/811302 (https://phabricator.wikimedia.org/T311615) (owner: Hashar)
[23:09:14] (PS7) Hashar: Schemas for Gerrit [schemas/event/secondary] - https://gerrit.wikimedia.org/r/811302 (https://phabricator.wikimedia.org/T311615)
[23:11:37] (CR) Hashar: Schemas for Gerrit (3 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/811302 (https://phabricator.wikimedia.org/T311615) (owner: Hashar)