[06:19:34] Data-Engineering, DBA, Infrastructure-Foundations, Patch-For-Review, Puppet: Split mariadb::dbstore_multiinstance into 2 separate roles (backup sources and analytics) - https://phabricator.wikimedia.org/T296285 (Marostegui) p:Triage→Medium
[07:08:05] good morning!
[07:08:06] elukey@stat1006:~$ sudo du -hs /tmp
[07:08:06] 61G /tmp
[07:09:43] there is a 37G directory with timestamp Nov 9th, that I believe is before the spark scratch dir change
[07:09:46] going to drop it
[07:09:59] !log drop /tmp/blockmgr-20fe4b2b-31fb-4a85-b5b1-bebe254120f8 on stat1006 to free space on the root partition
[07:10:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:14:45] Analytics-Radar, WMDE-GeoInfo-FocusArea, WMDE-TechWish-Sprint-2021-11-10: Review existing dashboards and metrics for maps - https://phabricator.wikimedia.org/T295315 (lilients_WMDE) a:lilients_WMDE→None
[10:30:38] elukey: Thanks very much. FYI the spark local directory change was reverted because it broke Jupyter. I'll have another look at it this week.
[10:31:36] ahhhh snap didn't know it
[11:20:18] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) Loading of the 11th snapshot has finished successfully and all instances are now compa...
[12:14:04] mforns: joal: I have deployed that patch to the timings of delayed sanitization.
[12:29:23] PROBLEM - Check unit status of monitor_refine_event_sanitized_analytics_delayed on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_event_sanitized_analytics_delayed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:31:26] I will reset the failed state of this alert --^ Deploying the new timer caused the service to re-run.
[12:31:57] !log btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed.service
[12:32:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:40:29] RECOVERY - Check unit status of monitor_refine_event_sanitized_analytics_delayed on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_event_sanitized_analytics_delayed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:43:26] Analytics, CheckUser, Patch-For-Review, Platform Team Workboards (Clinic Duty Team), Schema-change: Schema changes for `cu_changes` and `cu_log` table - https://phabricator.wikimedia.org/T233004 (Ladsgroup) @Rxy hi, I'm willing to help getting this done (review, scripts, etc.). Are you still...
[12:57:27] hey teamm :]
[12:58:21] btullis: I understand you deployed the fix after refine ran? Or before? I've seen the alert triggered again...
[13:11:58] Hi mforns :)
[13:12:11] heyyy :]
[14:16:47] Hi mforns: I think it's nothing to worry about. It seems that deploying the updated timer causes it to fire. So the monitor job ran today at 12:15 UTC.
[14:16:52] https://www.irccloud.com/pastebin/W9lErNT4/
[14:17:21] ah! btullis I get it now. Thanks!
[14:17:53] A pleasure.
[15:00:47] Data-Engineering, DBA, Infrastructure-Foundations, Patch-For-Review, Puppet: Split mariadb::dbstore_multiinstance into 2 separate roles (backup sources and analytics) - https://phabricator.wikimedia.org/T296285 (jcrespo) Deployment went as expected- but now that I thought a bit, I think btull...
[15:12:45] (PS1) Bearloga: movement_metrics: Cleanup notebooks dir [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/741688 (https://phabricator.wikimedia.org/T296397)
[15:14:06] (CR) Bearloga: [V: +2 C: +2] movement_metrics: Cleanup notebooks dir [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/741688 (https://phabricator.wikimedia.org/T296397) (owner: Bearloga)
[15:27:09] I've been asked to reboot the two schema servers in codfw. schema2003 and schema2004.
[15:28:12] I thought I'd check there to see whether I should depool them individually before rebooting them, or whether rebooting them without depooling is fine.
[15:28:45] I'd depool them before rebooting, IIRC the cookbook takes care of it if the node is behind LVS (to double check)
[15:28:53] they are basically nginx nodes
[15:30:48] Cool, thanks. The cookbook that I'll be using is `sre.ganeti.reboot-vm` because the VM needs a cold boot.
[15:32:06] ...but yeah it looks like that cookbook can depool and repool them.
[15:33:07] perfect :)
[15:55:23] Both rebooted. There's one Icinga alert as a result on schema2004, which I think is the result of the `up` script in /etc/network/interfaces.
[15:55:28] https://www.irccloud.com/pastebin/xLRWHbWP/
[15:56:16] yep it is https://phabricator.wikimedia.org/T273026
[15:56:27] a reset-failed is usually ok
[15:57:01] Thanks elukey - I thought as much.
[15:59:41] Analytics-Radar, Data-Engineering, Event-Platform: Move Kafka Jumbo's TLS clients to the new bundle - https://phabricator.wikimedia.org/T296064 (elukey) Open→Stalled Setting this to stalled until we agree on https://phabricator.wikimedia.org/T296089
[15:59:47] Analytics-Radar, Data-Engineering, Event-Platform, SRE, Patch-For-Review: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (elukey)
[16:14:27] (CR) Mforns: [V: +2 C: +2] "Merging for deployment train" [analytics/refinery] - https://gerrit.wikimedia.org/r/739922 (https://phabricator.wikimedia.org/T290516) (owner: Milimetric)
[16:52:38] razzi: I have another pending wiki replica patch that I would enjoy dropping in your (or your team's) lap. Any interest? https://gerrit.wikimedia.org/r/c/operations/puppet/+/732740
[16:53:23] andrewbogott: sounds great, I can take a look now!
[16:54:32] thanks!
[16:55:36] andrewbogott: do you think you could give me a quick rundown of localuser and globaluser / point me to docs?
[16:58:00] I don't think I know enough to provide a useful rundown. My first step would be to ask Reedy for help (because he's helpful!)
[16:58:33] * Reedy looks in
[16:58:44] * razzi waves
[16:59:22] There's https://www.mediawiki.org/wiki/Toolserver:Database_schema/Global#Globaluser_table
[16:59:30] But I don't see the equivalent local table yet
[16:59:53] anything in Toolserver: is horribly outdated I think
[17:00:17] It's slightly better than https://www.mediawiki.org/wiki/Extension:CentralAuth/globaluser_table is
[17:00:37] What server / db could I connect a cli client to and poke around?
[17:00:43] Only for reads :)
[17:00:50] `sql centralauth` will work fine from mwmaint*
[17:01:05] razzi: you mean for live data or replica data?
[17:01:54] https://github.com/wikimedia/mediawiki-extensions-CentralAuth/blob/master/central-auth.sql won't answer all questions... But at least gives some context as to what the tables' purpose is
[17:01:56] either, mostly looking at the schema really
[17:02:13] ty Reedy
[17:03:04] Incredible I may end up understanding the magic of how users log in to multiple wikis as a result
[17:03:19] You might've gone too far down the rabbit hole at that point
[17:03:19] (of course it's implemented as an extension ... xD)
[17:04:25] so basically the only code to review in this patch is
[17:04:25] ```
[17:04:26] view: >
[17:04:26] select lu_wiki, lu_name, lu_attached_timestamp, lu_attached_method, lu_local_id, lu_global_id
[17:04:26] where: lu_global_id = gu_id AND gu_hidden=''```
[17:04:53] if you compare that to a select * from localuser... it should obviously have fewer rows
[17:05:19] gu_hidden default is '' which means the user is not hidden
[17:06:26] am I understanding the join in that it's simply saying "don't show me stuff hidden in globaluser" ?
[17:06:45] And other than that, it's restricting the columns in localuser
[17:06:56] restricting the rows in localuser
[17:07:25] if you look at globaluser, we also do the same where `where: gu_hidden=''`
[17:08:14] but yeah, pretty much that
[17:08:32] if we're not exposing that user/row in globaluser, we shouldn't be exposing the corresponding rows in localuser
[17:15:47] Data-Engineering, Product-Analytics: [REQUEST] Notebook for testing with wmfdata - https://phabricator.wikimedia.org/T296420 (mpopov)
[17:16:04] Data-Engineering, Product-Analytics: [REQUEST] Notebook for testing with wmfdata - https://phabricator.wikimedia.org/T296420 (mpopov) p:Triage→Medium
[17:18:58] Cool thanks Reedy !
[17:18:58] Data-Engineering, Product-Analytics: [REQUEST] Notebook for testing with wmfdata - https://phabricator.wikimedia.org/T296420 (mpopov) @BTullis: would it make sense to have it query via Hive as well? (by the way, [[ https://github.com/wikimedia/wmfdata-python/blob/master/wmfdata/hive.py | wmfdata's hive m...
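Note on the view discussed at 17:04-17:08: the filter can be tried out locally with a rough sketch like the one below. It assumes an in-memory SQLite database rather than the real MariaDB replicas, trims both tables to the columns named in the chat, and uses a made-up view name (`localuser_public`), made-up rows, and a made-up non-empty `gu_hidden` value (`'hidden-demo'`). It is only an illustration of the join, not the deployed view definition.

```python
import sqlite3

# Hypothetical, trimmed-down tables: only the columns mentioned in the chat.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE globaluser (
        gu_id     INTEGER PRIMARY KEY,
        gu_hidden TEXT NOT NULL DEFAULT ''   -- '' means "not hidden"
    );
    CREATE TABLE localuser (
        lu_wiki TEXT,
        lu_name TEXT,
        lu_attached_timestamp TEXT,
        lu_attached_method TEXT,
        lu_local_id INTEGER,
        lu_global_id INTEGER
    );
    -- View name is an assumption; the select list and where clause
    -- follow the snippet pasted in the chat.
    CREATE VIEW localuser_public AS
        SELECT lu_wiki, lu_name, lu_attached_timestamp,
               lu_attached_method, lu_local_id, lu_global_id
        FROM localuser, globaluser
        WHERE lu_global_id = gu_id AND gu_hidden = '';
    """
)

# Global user 1 is visible; global user 2 carries a placeholder hidden value.
conn.execute("INSERT INTO globaluser (gu_id) VALUES (1)")
conn.execute("INSERT INTO globaluser (gu_id, gu_hidden) VALUES (2, 'hidden-demo')")
conn.executemany(
    "INSERT INTO localuser VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("enwiki", "Alice", "20211124000000", "login", 10, 1),
        ("dewiki", "Alice", "20211124000001", "login", 11, 1),
        ("enwiki", "Hidden", "20211124000002", "login", 12, 2),
    ],
)

# Row count of the raw table versus the rows the view exposes.
all_rows = conn.execute("SELECT COUNT(*) FROM localuser").fetchone()[0]
visible = conn.execute(
    "SELECT lu_wiki, lu_name FROM localuser_public ORDER BY lu_wiki"
).fetchall()
print(all_rows, visible)
```

The view returns fewer rows than the raw table because the local account attached to the hidden global user is dropped by the join condition, which matches the "don't show me stuff hidden in globaluser" reading given in the chat.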
[17:19:25] oop looks like we lost andrewbogott in this channel
[17:21:04] Data-Engineering, Product-Analytics: [REQUEST] Notebook for testing with wmfdata - https://phabricator.wikimedia.org/T296420 (mpopov)
[17:27:41] Hi andrewbogott , channel disconnected for a second, gave your patch a +1, want to pair on rolling that out?
[17:30:09] !log Deployed refinery using scap, then deployed onto hdfs
[17:30:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:30:59] razzi: sure, want to do that now or schedule for later?
[17:31:29] andrewbogott: How about after lunch? (I think we're both in central time)
[17:31:52] Sure. 1pm? (I'm free from 1 to 2)
[17:31:57] perf
[19:24:27] heya razzi, could you merge a puppet change that belongs to this week's deployment train?
[19:25:34] if you don't feel comfortable merging without another +1, I can leave it for next week
[19:25:42] https://gerrit.wikimedia.org/r/c/operations/puppet/+/739923/
[19:38:47] (CR) Mforns: Add skin_diff schema to sanitize allowlist (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/737471 (https://phabricator.wikimedia.org/T287255) (owner: Jenniferwang)
[19:38:56] Yeah I can do that mforns , give me a few minutes
[19:39:12] \o/
[19:46:28] Alright I'm here mforns , what's the scoop? :P
[19:46:38] hehe
[19:47:11] I'm not sure if that has been reviewed by us, but doesn't have any +1 or +2
[19:47:57] Ah I see, it's a patch that milimetric wants deployed but he's on vacay?
[19:48:17] razzi: it's a new scoop job that will load some extra tables for topic subscription data
[19:48:23] yes
[19:49:57] hi! Sorry, I'm here for any questions razzi
[19:50:27] hey milimetric :]
[19:50:38] If the patch looks ok you can merge, if there are any problems you can revert. I'll log in and check on it. If you're not comfortable because it's turkey day feel free to skip
[19:51:07] And it didn't get reviewed, so skipping is totally ok
[19:51:09] Let's just skip :)
[19:51:29] Thanks for chiming in milimetric , have a good 🦃📅
[19:52:17] lmk if you need anything else mforns; I'll be signing off for the holiday pretty soon myself
[21:05:48] good call, happy turkey day everyone
[21:31:45] Analytics, Data-Engineering, Data-Engineering-Kanban, Desktop Improvements, and 4 others: Add agent_type and access_method to sticky header instrumentation - https://phabricator.wikimedia.org/T294246 (cjming) a:cjming→Edtadros