[08:54:19] o/ I am not able to log into logstash anymore: Access denied due to missing permissions. I ran an LDAP search and apparently I am not a member of any of the required groups (logstash-access, nda, ops). Since I was able to log in in the past, I wonder what happened since.
[08:57:06] pfischer: there was an announcement, trying to find the email, will forward it to you
[08:57:37] dcausse: found it
[08:57:54] I think there's a page to request perms now
[09:00:55] Yep, done. Thank you.
[10:46:13] errand+lunch
[11:23:42] lunch
[11:46:49] q: is there an easy way to regenerate fixtures in mjolnir? I refactored cli for arg passing, and the checks on expected are quite strict
[11:47:37] i was hoping that pasting the output of mjolnir-utils.py -h would be enough, but indentation is broken
[11:47:50] no biggie, I can figure out how tests generate the command doc in the first place
[11:48:37] just wanted to check if I'm maybe missing something obvious (e.g. some REBUILD_FIXTURES conf)
[12:08:33] ebernhardson i attempted the mjolnir cli refactoring we were talking about last night https://gitlab.wikimedia.org/repos/search-platform/mjolnir/-/merge_requests/14/diffs
[12:08:56] tagged as draft because I need to test if all works as expected :)
[12:09:06] lunch + doc appointment
[12:55:39] o/
[13:16:48] gmodena: mjolnir should inspect the REBUILD_FIXTURES env var IIRC
[13:17:02] REBUILD_FIXTURES=yes
[13:41:48] Patch for adding OpenSearch profile to beta cluster if anyone has time to look: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1131098
[13:51:21] ^^ self-merged
[13:56:04] inflatador: ack, sorry missed the ping
[13:58:46] No problem, we're making progress
[14:06:26] \o
[14:06:46] gmodena: hmm, there is a way to rebuild fixtures but i'm completely forgetting right now. I'll look over it today
[14:07:01] (it's been years)
[14:12:31] looks like i implemented rebuild on a per-fixture basis :S Some rebuild when the file is deleted, some have a magic file that was externally built
[14:15:08] ebernhardson no worries, I managed to inspect the test output and re-generate things manually
[14:22:42] o/
[14:59:46] o/
[15:00:45] ebernhardson this is fresh off my linkedin feed https://arxiv.org/abs/2503.19092
[15:00:48] ^ Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation
[15:01:28] interesting, and that is something i was curious about: "We provide the first empirical evidence of LLM judges exhibiting significant bias towards LLM-based rankers"
[15:01:40] exactly!
[15:01:56] To this end, we offer initial guidelines and a research agenda to ensure the reliable use of LLMs in IR evaluation.
[16:46:02] heading out
[17:03:01] CR to start migrating the CODFW Elastic hosts here if anyone has time to look: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1129264
[17:03:04] cc ryankemper ^^
[17:03:20] whoops, that is the wrong CR. 1 sec
[17:03:39] OK, for real this time: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1131087
[17:24:08] got past the puppet CA errors in deployment-prep with this guide https://wikitech.wikimedia.org/wiki/Help:Project_puppetserver . Puppet still unhappy though, it's complaining about some nagios checks
[17:33:15] if/when we stand up net-net opensearch clusters, it seems we might have to set `cluster.initial_master_nodes`. the new DP cluster is complaining about it, and I remember it was complaining on relforge alpha as well
[17:33:54] net-new, that is
[17:35:38] ahh, i suppose that makes sense. I haven't set up a new cluster in ages
[17:39:23] I wish I knew more about the effect it has on existing hosts. Seems harmless enough based on the comment `# Set to ensure a node sees N other master eligible nodes to be considered operational within the cluster.`
[17:39:33] regardless, easy enough to one-off in beta
[18:01:16] * inflatador loves how the journalctl equivalent to `tail -f` is `journalctl -fu`
[18:09:50] lunch, back in ~40
[18:36:00] for building the opensearch plugins deb and restarting the relevant clusters, do we prefer creating a ticket just for that, or using the ticket that led to the changes? I've been creating new tickets and it seems to work, but wondering if that's preferred
[18:36:07] re: T389812
[18:36:08] T389812: cirrussearch-opensearch-image should provide an opensearch image that offers the same functionnalities as the production opensearch cluster - https://phabricator.wikimedia.org/T389812
[18:44:07] I tested https://gitlab.wikimedia.org/repos/search-platform/mjolnir/-/merge_requests/14 on sat hosts and I _think_ that now the date filtering works as we expected
[18:44:36] as a side effect now --start-date and --end-date flags are available for all commands that require an input table
[18:45:06] i prepped a sister MR in airflow dags (requires a new mjolnir conda env)
[18:50:16] looking
[18:54:19] seems reasonable, thanks!
[19:05:33] back
[19:06:10] ebernhardson A new ticket is probably better, but I'm fine if you want to stick the Data-Platform-SRE label on an existing ticket and ping us
[19:06:19] whichever's easier
[19:10:02] inflatador: it's not hard to make a new ticket, i filed T390100
[19:10:03] T390100: Build and deploy updated opensearch plugins deb - https://phabricator.wikimedia.org/T390100
[19:23:18] OK, dp cluster is up and running. Puppet is still unhappy, but it's related to icinga checks so I think we can ignore those
[19:31:51] we'll probably never have to do it again, but here's a quick and dirty playbook to re-init the cluster: https://gitlab.wikimedia.org/repos/search-platform/sre/ansible-playbooks/cirrussearch-deployment-prep/-/tree/main?ref_type=heads
[19:31:51] \o/
[19:32:41] will probably have to wait for tomorrow to ship config patches, most days i can do the 1pm deploy window but school gets out early (at 1pm) on wednesdays and i can never make it
[19:33:00] Or you or ryan can do it, the config patch is a prod no-op so should be trivial
[19:33:03] ACK, I assume we can't rebuild the indices until that config patch is active?
[19:34:04] right, you need https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1131333 so that we can rebuild using `--cluster=eqiad-opensearch`
[19:34:37] Is it too late to schedule for today? I'll give it a shot
[19:35:05] as long as the window isn't full you can sign up, lemme check
[19:35:07] it is **not** too late
[19:35:13] just signed up
[19:35:18] nice :)
[19:37:20] if we're using search-psi in the patch, do we need to use port 9643?
[19:37:28] re https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1131333/2/wmf-config/CirrusSearch-labs.php
[19:37:47] or is that part of the hack?
[19:37:49] inflatador: no, the -psi is just kinda cheating a bit
[19:37:57] inflatador: we want all the indices on :9243
[19:38:37] ACK, should be ready in a few then
[19:39:02] ebernhardson thanks! Ok with you if I merge and re-release mjolnir?
[19:39:10] gmodena: yea go for it
[19:39:40] ebernhardson ack
[19:40:36] ebernhardson actually... we could push a .dev conda env instead of a release one. i see a deploy option on the current MR
[19:41:07] let me see if it works. Could help to run an integration test, and avoid version spam
[19:41:40] sounds reasonable, i don't think i've tried that yet
[19:49:40] it worked https://gitlab.wikimedia.org/repos/search-platform/mjolnir/-/packages/1054
[20:12:21] the mw-config patch for beta cluster was merged, LMK next steps
[20:16:49] back
[20:17:43] inflatador: next would be from an mwmaint-style server. lemme see in horizon what that is
[20:18:10] ebernhardson ACK, https://horizon.wikimedia.org/project/prefixpuppet/ might be a good place to look
[20:18:10] looks like deployment-mwmaint03
[20:18:24] i usually look in the instance list for something named mwmaint :)
[20:18:33] (the number changes over time)
[20:20:06] for instructions, looks like maybe we should copy them forward somewhere. But looking at https://wikitech.wikimedia.org/w/index.php?title=Search&oldid=2164435
[20:21:14] we want to first try `mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki testwiki --cluster=eqiad-opensearch`
[20:21:45] and if that works, `mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki testwiki --cluster=eqiad-opensearch`
[20:21:55] assuming that's all fine, we wrap it into a foreachwiki and run it everywhere
[20:22:53] ebernhardson nice, shall I do the honors? We can do it in a Meet if you like
[20:23:08] sure, go for it. I can join a meet
[20:24:33] cool, I'm in https://meet.google.com/fde-tbpf-wqh?authuser=0 ...cc ryankemper in case he wants to join
[20:34:36] ebernhardson `RuntimeException from line 87 of /srv/mediawiki/php-master/extensions/CirrusSearch/includes/Job/JobTraits.php: Received cirrusSearchElasticaWrite job with page updates for an unwritable cluster eqiad-opensearch.`
[20:37:51] inflatador: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1131440
[20:48:46] inflatador: otw back with dog, will join in 12’
[20:53:01] inflatador: ebernhardson@deployment-mwmaint03:/srv/mediawiki$ watch grep eqiad-opensearch wmf-config/InitialiseSettings-labs.php
[20:55:26] https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/elasticsearch/rolling-operation.py#L290
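
The reindex sequence given at 20:21 (UpdateSearchIndexConfig.php first, then ForceSearchIndex.php, both with `--cluster=eqiad-opensearch`) can be sketched as a dry-run planner. This is only an illustration, assuming a hand-supplied wiki list: the two script paths, the `--wiki` and `--cluster` flags, and the cluster name come from the messages above, while `reindex_commands`, `plan`, and the per-wiki ordering are hypothetical and do not model the real foreachwiki wrapper. It only builds command lines and never runs them.

```python
import shlex

# Cluster name and script paths are taken from the 20:21 messages.
CLUSTER = "eqiad-opensearch"
SCRIPTS = [
    "extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php",
    "extensions/CirrusSearch/maintenance/ForceSearchIndex.php",
]


def reindex_commands(wiki, cluster=CLUSTER):
    """Return the two mwscript invocations for one wiki, in order (hypothetical helper)."""
    return [
        ["mwscript", script, "--wiki", wiki, f"--cluster={cluster}"]
        for script in SCRIPTS
    ]


def plan(wikis, cluster=CLUSTER):
    """Return shell-quoted command lines for every wiki; nothing is executed."""
    lines = []
    for wiki in wikis:
        for argv in reindex_commands(wiki, cluster):
            lines.append(shlex.join(argv))
    return lines


if __name__ == "__main__":
    # Assumption: start with testwiki only, as suggested in the chat.
    for line in plan(["testwiki"]):
        print(line)
```

Printing the plan before running anything matches the "first try testwiki, then run it everywhere" approach: the testwiki lines can be pasted and checked by hand before extending the wiki list.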