[01:30:23] PROBLEM - Check unit status of monitor_refine_event on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:25:39] (CR) Sharvaniharan: [C: +2] "Merging this... Thank you for the reviews." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/794023 (https://phabricator.wikimedia.org/T305575) (owner: Sharvaniharan)
[04:26:20] (Merged) jenkins-bot: New schema for android app breadcrumbs [schemas/event/secondary] - https://gerrit.wikimedia.org/r/794023 (https://phabricator.wikimedia.org/T305575) (owner: Sharvaniharan)
[08:10:55] Data-Engineering: krb1001's auth.log grows a lot causing disk space issues for the root partition - https://phabricator.wikimedia.org/T302518 (Aklapper) a:razzi→None Resetting inactive task assignee
[08:11:04] Data-Engineering, Data-Engineering-Kanban, Data-Services, Platform Engineering, and 2 others: Log_param is redacted in wiki replica when only comment and/or user should be - https://phabricator.wikimedia.org/T301943 (Aklapper) a:razzi→None Resetting inactive task assignee
[08:11:23] Data-Engineering, Data-Engineering-Kanban, Product-Analytics, Superset, Patch-For-Review: Upgrade Superset to 1.4.2 - https://phabricator.wikimedia.org/T304972 (Aklapper) a:razzi→None Resetting inactive task assignee
[08:11:35] Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation), Data-Services, and 2 others: Toolforge db: View 'fiwiki_p.flaggedrevs' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T302233 (Aklapper) a:razzi→None Resetting inactive task...
[08:11:40] Data-Engineering, Data-Services, cloud-services-team (Kanban): Recreate views for globalblocks table - https://phabricator.wikimedia.org/T300988 (Aklapper) a:razzi→None Resetting inactive task assignee
[08:45:14] Data-Engineering, Data-Engineering-Kanban, Data-Persistence (Consultation), Data-Services, and 2 others: Toolforge db: View 'fiwiki_p.flaggedrevs' references invalid table/column/rights to use them - https://phabricator.wikimedia.org/T302233 (Ladsgroup) Open→Resolved I just close it, if i...
[08:57:14] Data-Engineering-Kanban, Data-Catalog: User Experience: Authentication - https://phabricator.wikimedia.org/T307711 (BTullis) >> Without this, I'm not sure that there is currently any more work to be done on this ticket. > > I agree. What was that hiccup the other day, where I didn't have access and the...
[09:21:28] Data-Engineering, Data-Engineering-Kanban: Add the conftool pooled/depooled status and weight into prometheus for each service - https://phabricator.wikimedia.org/T309189 (BTullis) I now have a draft CR for this, thanks to @jbond for his help. However, whilst working on this, John identified a potential...
[09:34:23] Data-Engineering, Data-Catalog: Resolve 500 errors when browsing Kafka datasets - https://phabricator.wikimedia.org/T308736 (BTullis) Open→Resolved
[09:56:14] (CR) Joal: [C: +1] "LGTM! Minimal change ideas, can be merged as is." [analytics/refinery] - https://gerrit.wikimedia.org/r/791394 (https://phabricator.wikimedia.org/T300021) (owner: Snwachukwu)
[09:57:44] (CR) Joal: [C: +1] "I forgot to mention in my previous CR: This is +1 if queries have been tested :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/791394 (https://phabricator.wikimedia.org/T300021) (owner: Snwachukwu)
[09:59:35] Data-Engineering, Data-Engineering-Kanban, Epic: Data Infrastructure as a Service MVP - https://phabricator.wikimedia.org/T308317 (BTullis)
[09:59:37] Data-Engineering, Data-Engineering-Kanban, Epic: Analytics Platform Future State Planing - https://phabricator.wikimedia.org/T302728 (BTullis)
[10:18:38] Data-Engineering, Data-Engineering-Kanban, Epic: Assess existing and in-development storage platforms for suitability - https://phabricator.wikimedia.org/T309509 (BTullis)
[10:35:07] Data-Engineering, Data-Engineering-Kanban, Epic: Assess existing and in-development storage platforms for suitability - https://phabricator.wikimedia.org/T309509 (BTullis) p:Triage→High a:BTullis
[11:28:30] !log deploy airflow spark3 aqs_hourly
[11:28:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:12:17] btullis: hey, if you're around today, please downtime analytics1068 as mega raid is flapping and that's expected
[12:12:21] I left a note on the ticket
[12:14:04] RhinosF1: Thanks yes, I will. I tried to ack it today but for some reason Icinga gave me a permission denied error and I hadn't got around to investigating further.
[12:14:09] https://usercontent.irccloud-cdn.com/file/xallYtxZ/image.png
[12:18:35] btullis: are you logged in with the right name / case ?
[12:18:54] Remember icinga lets you login case insensitive but gives no perms
[12:21:17] I think I was, but I forced a logout of CAS and logged in again. Done now.
[13:28:02] Data-Engineering, Data-Engineering-Kanban, DBA, Data-Services, and 2 others: XTools:500 error - https://phabricator.wikimedia.org/T309531 (Stang) ` MariaDB [zhwiktionary_p]> select * from page limit 10; ERROR 1356 (HY000): View 'zhwiktionary_p.page' references invalid table(s) or column(s) or fun...
[13:29:28] Data-Engineering, Data-Engineering-Kanban, DBA, Data-Services, and 2 others: XTools:500 error - https://phabricator.wikimedia.org/T309531 (Stang)
[13:30:19] Data-Engineering, Data-Engineering-Kanban, DBA, Data-Services, and 2 others: XTools:500 error - https://phabricator.wikimedia.org/T309531 (Stang)
[13:30:33] Data-Engineering, Data-Engineering-Kanban, DBA, Data-Services, and 2 others: XTools:500 error - https://phabricator.wikimedia.org/T309531 (Stang) duplicate→Open
[13:34:44] Data-Engineering, Data-Engineering-Kanban, DBA, Data-Services, and 2 others: XTools:500 error - https://phabricator.wikimedia.org/T309531 (Marostegui) I am recreating all the views on s3 for the page table
[13:35:12] Data-Engineering, Data-Engineering-Kanban, DBA, Data-Services, and 2 others: XTools:500 error - https://phabricator.wikimedia.org/T309531 (Marostegui) For what is worth this is caused by a schema change that is being run in production: T60674
[13:36:40] Data-Engineering, Data-Engineering-Kanban, DBA, Data-Services, and 2 others: XTools:500 error - https://phabricator.wikimedia.org/T309531 (Marostegui) Open→Resolved a:Marostegui All views are recreated - this https://xtools.wmflabs.org/ec-namespacetotals/zh.wiktionary.org/DinoWP works...
[13:38:49] Data-Engineering, Data-Engineering-Kanban, DBA, Data-Services, and 2 others: XTools:500 error - https://phabricator.wikimedia.org/T309531 (SD_hehua) Good job,Thank you!
[13:44:33] Data-Engineering, Data-Engineering-Kanban, SRE, Traffic, User-zeljkofilipin: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (akosiaris) >>! In T306181#7963731, @phuedx wrote: >>>! In T306181#7914450, @akosiaris wrote...
[14:48:29] Data-Engineering, Data-Engineering-Kanban, SRE, Traffic, and 2 others: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (BTullis) Thanks @phuedx and @akosiaris for that information and for the patch. That's a great find ab...
[14:54:24] (CR) Mforns: [C: +1] Add HQL scripts for wikidata graphite metrics (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/791394 (https://phabricator.wikimedia.org/T300021) (owner: Snwachukwu)
[15:16:20] Data-Engineering: Check home/HDFS leftovers of razzi - https://phabricator.wikimedia.org/T309000 (odimitrijevic) p:Triage→High
[15:17:12] Data-Engineering: Migrate eventlogging check_prometheus checks to alertmanager - https://phabricator.wikimedia.org/T309007 (odimitrijevic) p:Triage→Medium
[16:38:50] Data-Engineering-Kanban, Airflow: Airflow upgrade - https://phabricator.wikimedia.org/T309552 (Antoine_Quhen)
[17:17:30] hey mforns - I found the bug for the test failure of my PR - do you agree me deploying it?
[17:17:39] yes!
[17:17:46] what was it?
[17:18:07] protobuf released a new major version - the bump is from 3.20 to 4.12
[17:18:19] And skein depends on protobuf > 3.5
[17:18:37] I added protobuf <4.0 in our config
[17:22:15] (CR) Joal: [C: +1] Add HQL scripts for wikidata graphite metrics (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/791394 (https://phabricator.wikimedia.org/T300021) (owner: Snwachukwu)
[17:38:59] (PS1) Joal: Update the browser_general hql to use spark hints [analytics/refinery] - https://gerrit.wikimedia.org/r/801416
[18:20:31] mforns, aqu: I actually separated my PRs - the main one that fixes the dependencies is here: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/68
[18:20:44] I then have following ones about spark3, but less urgent )
[18:21:28] ok joal, in meeting now, but will review! If you want to merge, please go ahead!
[18:21:50] mforns: I'd rather wait for reviews :) it's not that much of an emergency :)
[18:22:14] ok
[19:02:32] Data-Engineering, Airflow: [Airflow] URLSensor might be preventing alerts to fire correctly - https://phabricator.wikimedia.org/T309563 (mforns)
[20:19:49] !log Restarted oozie job pageview-druid-daily-coord
[20:19:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log