[07:04:09] Analytics-Radar, SRE, Patch-For-Review, Services (watching), User-herron: Replace and expand kafka main hosts (kafka[12]00[123]) with kafka-main[12]00[12345] - https://phabricator.wikimedia.org/T225005 (elukey) @razzi @herron do you think that we can setup a quick meeting to discuss the next...
[09:39:22] Analytics-Clusters: Disk filling up on `/` on an-coord1001 - https://phabricator.wikimedia.org/T279304 (BTullis) I'm happy to take a look at this one this week if it helps @razzi It looks like it will be back in a warning state in about a week and critical in just over three weeks, at the current rate. {F3...
[09:54:23] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-MoritzMuehlenhoff: Reduce manual kinit frequency on stat100x hosts - https://phabricator.wikimedia.org/T268985 (BTullis) Having looked in detail at the kstart package we can see that it does not install any daemon, nor run any pre/post...
[10:05:38] Analytics, Patch-For-Review: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (elukey) Adding some notes collected in several meetings with Joseph during these months, plus related tasks. The architecture that we have in mind for the Alluxio/Presto cluster is the follow...
[10:29:14] Analytics, DBA, Infrastructure-Foundations, SRE, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (MoritzMuehlenhoff)
[10:52:00] a-team just wondering about the matomo1002 VM.. this is in eqiad row C which we are doing maintenance on on Thursday.
[10:52:19] I sent an email there on Friday, if someone can let me know anything we need to consider in advance for it let me know :)
[11:08:17] topranks: I will look into this for you, but I would be surprised if an outage of a few 10s of seconds would be an issue anyway. If it's a Ganeti VM I'll see if it's possible to live-migrate it away from row C before Thursday.
[11:08:41] btullis: great thanks, appreciate that.
[11:09:11] Pleasure. I'll reply to the email thread as well.
[12:46:02] topranks, btullis - for the matomo VM there shouldn't be any issue, we can leave it there and take a few seconds of network maintenance without problems. SPOF but not critical, also only used to track analytics for microsites.
[12:48:17] (we allow minutes of downtime when we upgrade matomo for example, to upgrade the db schema etc..)
[12:49:03] elukey, thanks for the info. That's what I thought re: usage and a network blip. Am I right that we can't do VM-level live-migrations in our Ganeti clusters then?
[12:51:14] I've used it before using the feature of DRBD replication of a guest's disk between two hosts, so that a running VM can be migrated between two hosts. Had assumed that this was the case here too, but maybe not.
[12:54:20] btullis: lollop
[12:58:23] btullis: there are some limitations in live migration, not sure if we use drbd everywhere due to past issues, Moritz knows best (but I have never done it in the past so my experience is limited)
[12:59:10] o/
[13:00:29] 👍 Great, thanks all.
[13:00:53] Morning millimetric.
[13:15:16] hi btullis!
[13:16:25] I spelt your nick incorrectly. Schoolboy error.
[13:33:04] oh :) that's partially my fault, I use the Romanian spelling
[14:26:38] Analytics, DBA, Infrastructure-Foundations, SRE, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney)
[14:33:22] heya teammm!
[14:34:44] Hiya!
[14:35:05] milimetric: I'm here, I can do the SUCCESS_ file thing!
[14:35:09] :]
[14:35:14] mforns: I'm in the middle of it
[14:35:26] good morning, starting a lil late today :)
[14:35:38] though it took me like 10 minutes to convince myself not to just rewrite it in AirFlow :)
[14:35:38] ok, wanna pair milimetric? hi fdans!
[14:35:51] to the batcave!
[14:35:57] omw
[14:36:23] Analytics, DBA, Infrastructure-Foundations, SRE, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney)
[14:38:44] Analytics, DBA, Infrastructure-Foundations, SRE, and 2 others: Switch buffer re-partition - Eqiad Row C - https://phabricator.wikimedia.org/T286065 (cmooney)
[14:40:15] Analytics, DBA, Infrastructure-Foundations, SRE, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney)
[14:52:46] ottomata: do you have 3 mins before standup?
[14:53:17] Analytics-Radar, SRE, Patch-For-Review, Services (watching), User-herron: Replace and expand kafka main hosts (kafka[12]00[123]) with kafka-main[12]00[12345] - https://phabricator.wikimedia.org/T225005 (herron) sure, sounds good to me!
[15:10:20] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Top edited pages list on enwiktionary contains nonexistent pages with titles made up of question marks - https://phabricator.wikimedia.org/T284623 (Milimetric) Open→Resolved resolving this, but feel free to open subtasks
[15:14:01] Analytics, Analytics-Kanban, WMDE-TechWish: Deployment access request for some analytics repos - https://phabricator.wikimedia.org/T274880 (Milimetric) This should be done, but I saw reports of folks not being able to +2 despite being in the proper gerrit groups, @Andrew-WMDE I believe. Can someone...
[15:34:14] Analytics-Radar, Product-Analytics: Investigate running Stan models on GPU - https://phabricator.wikimedia.org/T286493 (mforns)
[15:37:25] Analytics: Push Gobblin import metrics to Prometheus and add alerts on some critical imports - https://phabricator.wikimedia.org/T286503 (mforns)
[15:37:27] Analytics: When gobblin fails, we should know about it - https://phabricator.wikimedia.org/T286559 (mforns)
[15:39:14] Analytics: Push Gobblin import metrics to Prometheus and add alerts on some critical imports - https://phabricator.wikimedia.org/T286503 (mforns) p:Triage→High
[15:39:32] Analytics: When gobblin fails, we should know about it - https://phabricator.wikimedia.org/T286559 (mforns) p:Triage→High
[15:44:32] Analytics: Refinery python code should use anaconda-wmf - https://phabricator.wikimedia.org/T286743 (mforns) p:Triage→Medium
[15:46:28] Analytics: [EventGate] Failures when getting stream config from MediaWiki API - https://phabricator.wikimedia.org/T286793 (mforns) p:Triage→High
[15:47:11] Analytics: [EventGate] Failures when getting stream config from MediaWiki API - https://phabricator.wikimedia.org/T286793 (mforns) a:mforns
[15:49:51] Analytics, Analytics-EventLogging, Wikimedia-production-error: '.event.pageViewId' should be string, '.event.subTest' should be string, '.event.searchSessionId' should be string - https://phabricator.wikimedia.org/T286814 (mforns) Thanks for posting this @cjming. We have added the Search team to the...
[15:51:11] Analytics-EventLogging, Analytics-Radar, Wikimedia-production-error: '.event.pageViewId' should be string, '.event.subTest' should be string, '.event.searchSessionId' should be string - https://phabricator.wikimedia.org/T286814 (mforns)
[15:55:04] Analytics, Analytics-EventLogging, Wikimedia-production-error: '.event.abort_timing' should be integer - https://phabricator.wikimedia.org/T286815 (mforns) Adding the editing team, because the schema editattemptstep is the one with instrumentation issues. cc @nshahquinn-wmf @nettrom_WMF
[15:57:05] Analytics-EventLogging, Analytics-Radar, Wikimedia-production-error: '.event.abort_timing' should be integer - https://phabricator.wikimedia.org/T286815 (mforns)
[16:02:59] Analytics, DBA, Infrastructure-Foundations, SRE, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (cmooney)
[16:10:04] Analytics-EventLogging, Analytics-Radar, Product-Analytics, Wikimedia-production-error: '.event.abort_timing' should be integer - https://phabricator.wikimedia.org/T286815 (Mayakp.wiki)
[16:14:29] Analytics-Clusters: Disk filling up on `/` on an-coord1001 - https://phabricator.wikimedia.org/T279304 (BTullis) a:razzi→BTullis
[16:17:52] Analytics-EventLogging, Analytics-Radar, Editing-team, Product-Analytics, Wikimedia-production-error: '.event.abort_timing' should be integer - https://phabricator.wikimedia.org/T286815 (ldelench_wmf)
[16:23:18] Analytics-EventLogging, Analytics-Radar, Editing-team, Product-Analytics, Wikimedia-production-error: '.event.abort_timing' should be integer - https://phabricator.wikimedia.org/T286815 (nshahquinn-wmf) >>! In T286815#7221661, @mforns wrote: > Adding the editing team, because the schema edita...
[16:26:55] Analytics-Radar, Event-Platform, Product-Analytics, Product-Data-Infrastructure: Draft of full process for instrumentation using new client libraries - https://phabricator.wikimedia.org/T275694 (ldelench_wmf)
[16:30:53] razzi: one thing that I just realized, tomorrow there will be network maintenance in eqiad in row D for https://phabricator.wikimedia.org/T284592
[16:31:11] timeline is 15:00 UTC (08:00 PDT / 11:00 EDT / 17:00 CEST)
[16:31:43] it is the same starting time as ours, but in theory it should be fine since it will last some seconds
[16:31:53] but let's make sure that we don't start until it is finished
[16:32:05] (it may impact analytics nodes in row d)
[16:33:09] Gotcha, good catch elukey
[16:44:52] Analytics, Product-Analytics, Structured-Data-Backlog: Create a Commons equivalent of the wikidata_entity table in the Data Lake - https://phabricator.wikimedia.org/T258834 (nettrom_WMF) a:nettrom_WMF→None
[16:52:59] Analytics-EventLogging, Analytics-Radar, Editing-team, Product-Analytics, Wikimedia-production-error: '.event.abort_timing' should be integer - https://phabricator.wikimedia.org/T286815 (DLynch) This would be a duplicate of T237063, I think?
[16:58:12] Analytics, Inuka-Team, Product-Analytics (Kanban): Superset timeouts for KaiOS dashboard - https://phabricator.wikimedia.org/T277320 (nshahquinn-wmf)
[18:26:39] Analytics-EventLogging, Analytics-Radar, Editing-team, Product-Analytics, Wikimedia-production-error: '.event.abort_timing' should be integer - https://phabricator.wikimedia.org/T286815 (mforns) Thanks @nshahquinn-wmf for pointing to the right doc :-)
[19:45:07] hi! is it possible to see the page load times for a certain namespace at enwikisource (the Page: NS)?
[19:45:34] is there a graph for that, or do I need to dig about in a DB?
[19:53:22] motivation: seeing how much time is saved by https://phabricator.wikimedia.org/T230689
[19:58:28] (PS1) MewOphaswongse: Suggested Edits: Update homepagemodule schema to support new mobile navigation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/705493 (https://phabricator.wikimedia.org/T268708)
[20:15:12] Analytics, Analytics-Kanban, Platform Engineering, Research: Create airflow instances for Platform Engineering and Research - https://phabricator.wikimedia.org/T284225 (razzi)
[20:30:28] * razzi taking a siesta for an hour
[20:33:43] (PS2) MewOphaswongse: Add a link: Update schema to support edit mode toggle [schemas/event/secondary] - https://gerrit.wikimedia.org/r/704402 (https://phabricator.wikimedia.org/T278115)
[20:33:55] Analytics-EventLogging, Analytics-Radar, Editing-team, Product-Analytics, Wikimedia-production-error: '.event.abort_timing' should be integer - https://phabricator.wikimedia.org/T286815 (DLynch)
[22:25:53] Analytics-Radar, SRE, Patch-For-Review, Services (watching), User-herron: Replace and expand kafka main hosts (kafka[12]00[123]) with kafka-main[12]00[12345] - https://phabricator.wikimedia.org/T225005 (razzi) Yes please invite me to a meeting @elukey! Thanks for keeping things moving on this...
[22:56:25] inductiveload: perf metrics are in grafana https://grafana.wikimedia.org/d/000000050/performance-metrics?orgId=1&refresh=5m
[22:57:00] inductiveload: but I am not sure they are as detailed as what you are looking for
[23:28:34] Analytics, DBA, Infrastructure-Foundations, SRE, and 3 others: Switch buffer re-partition - Eqiad Row D - https://phabricator.wikimedia.org/T286069 (Bstorm)