[03:49:00] something seems very wrong with the search on wikipedia. please have a look at https://phabricator.wikimedia.org/T393663 as soon as you can.
[08:20:12] search flowing to opensearch@codfw
[08:38:03] lesson learned, it's not good to leave completion index updates disabled for too long :/
[09:31:25] wdqs users are starting to notice the impact of a change we made to the rdf output ( https://www.wikidata.org/wiki/Wikidata:Report_a_technical_problem/WDQS_and_Search#SPARQL_query_for_family_name_(P734)_of_Adam_(Q70899)_returns_both_%22no_value%22_and_%22unknown_value%22 )
[09:31:42] dcausse: any follow up on the completion issues? or are we good
[09:31:50] gehel: we're good
[09:32:37] I'll write a small script to reconcile affected wdqs items, but it might be good to schedule T386098 in the next couple of weeks
[09:32:38] T386098: Run a full data-reload on wdqs-main, wdqs-scholarly and wdqs to capture new blank node labels - https://phabricator.wikimedia.org/T386098
[09:40:53] err... the completion index problem might not be solely related to updates being disabled...
[09:41:14] eqiad has only 5000886, codfw 10336191
[09:43:43] could be a bad run and then we stopped updating... but leaving a partial index live is not good :/
[09:44:11] will file a task to investigate what happened
[09:48:28] sigh... not finding logs of the past eqiad run on mwmaint1002...
[09:51:39] we had weird behaviors in the past with scrolls in mixed clusters (https://github.com/elastic/elasticsearch/issues/25158), perhaps something similar happened and a bunch of pages got skipped?
[09:59:30] perhaps T363521?
[09:59:31] T363521: Completion suggester can promote a bad build - https://phabricator.wikimedia.org/T363521
[10:07:17] only seeing https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2025.05.07?id=OJUeq5YBfOjk-Vo1yy77 but that should have stopped the script and not promoted the index
[10:21:33] lunch
[10:30:45] lunch 2
[13:17:31] o/
[13:54:36] Created T393709 to talk about hosting autocomplete indices somewhere else
[13:54:37] T393709: Consider hosting autocomplete indices in a separate OpenSearch cluster - https://phabricator.wikimedia.org/T393709
[14:06:25] \o
[14:07:10] o/
[14:10:54] .o/
[14:19:45] CR for fixing up conftool after it changes: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1143589
[14:20:14] err... with the changes for cirrussearch eqiad, that is
[14:39:00] trying to use pyspark as a quick&dirty script to extract a couple of lines to stdout... not really great, passing the script as stdin but it's still in a kind of interactive mode
[14:40:48] I'm not seeing our `cirrus_check_settings.json` files in `/etc/opensearch/production-${n}` anymore. Checked 4 hosts so far. Wonder if we goofed up https://gerrit.wikimedia.org/r/c/operations/puppet/+/1140519 somehow?
[14:41:48] oh wait, nm. That file's only on the masters
[14:44:59] realized that I can just run spark3-submit script.py...
[14:54:07] should probably use pyarrow and open the parquet files directly for these simple data extractions...
[15:05:00] yea spark3-submit is the ticket, but indeed parquet can be even easier, although sometimes HADOOP_CLASSPATH needs to be set for it
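For the pyarrow route mentioned at [14:54:07], a minimal sketch might look like the following; the file path and column names are hypothetical placeholders, not the actual dataset layout.

```python
# Quick data extraction from a parquet file without spinning up a Spark
# session. The path and column names below are hypothetical.
import pyarrow.parquet as pq

# Read only the columns we care about.
table = pq.read_table("hdfs_export/data.parquet", columns=["page_id", "title"])

# Dump the first few rows to stdout.
for row in table.slice(0, 10).to_pylist():
    print(row)
```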
[15:49:06] I finally got tired of typing `curl xyz` over and over again to get node status/health and made a few crappy bash functions. This is just a starting point, if y'all think of a better way to do this LMK. PRs welcome! https://gitlab.wikimedia.org/repos/search-platform/searchme#
[15:51:00] pondering https://gerrit.wikimedia.org/r/c/operations/puppet/+/1142693 ... it seems the intent is that we should have a discovery-dns record for each cluster? so search.discovery.wmnet, search-chi.discovery.wmnet, etc.
[15:55:33] yes.. port is not a thing apparently?
[15:56:27] i think joe is saying port is a thing, but discovery-dns should be per-cluster. If they are different clusters, they should use different names
[15:56:43] sure
[16:07:17] workout, back in ~40
[16:54:01] back
[16:58:53] re: the envoy patch, I think setting up multiple discovery records is a good idea as well. I'll get a ticket started for that
[16:59:27] inflatador: i made a patch already, the docs are pretty short and suggest this is all that's needed: https://gerrit.wikimedia.org/r/c/operations/dns/+/1143617
[17:00:11] {◕ ◡ ◕}
[17:01:07] * inflatador wonders if this means we'll need new SAN names
[17:01:57] sadly, probably yes. more names
[17:02:53] Yup, confirmed. No problem, CFSSL makes it easy
[17:05:35] i waffled, but in the end added search-chi as well, it seems like ideally we should move away from the unprefixed name
[17:13:56] Yeah, agreed
[17:35:15] 3k completion/s, 1.2k fulltext/s, not sure we can sustain that :)
[17:36:06] yeah I wish the vatican could have waited for the opensearch upgrade to finish up
[17:45:33] :P
[17:46:13] Looks like things are on the down-trend. Interestingly, we never got the hot spot problems on individual hosts like we've been seeing lately in eqiad
[17:55:41] dinner
[17:59:25] how odd... https://commons.wikimedia.org/wiki/Special:MediaSearch?search=deepcategory%3A%22Manufacturing+by+product%22&type=image works, but https://commons.wikimedia.org/w/index.php?search=deepcategory%3A%22Manufacturing+by+product%22&title=Special:MediaSearch&type=image does not
[18:08:06] hmm, it's not the url that matters, it can fail on the same url. I suspect it's failing and the warning isn't being forwarded to the user
[18:13:10] lunch, back in ~40
[20:30:39] inflatador: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1143670 patch to shift over some wdqs-full hosts to wdqs-main
[20:31:03] 👀
[21:53:10] ryankemper here's my handoff, I'm heading out for the day: https://etherpad.wikimedia.org/p/handoff-wdqs-T388134
[21:53:10] T388134: Drop support for the full Wikidata graph from query.wikidata.org - https://phabricator.wikimedia.org/T388134
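For the node status/health helpers described at [15:49:06], a rough illustration of the idea in Python; this is not the actual searchme code, and the host and port defaults are assumptions.

```python
# A stand-in for repeatedly typing `curl xyz`: fetch cluster health from an
# OpenSearch node. Not the searchme repo's code; host/port are assumptions.
import json
import urllib.request

def cluster_health(host: str = "localhost", port: int = 9200) -> dict:
    """Return the /_cluster/health document as a dict."""
    url = f"http://{host}:{port}/_cluster/health"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(json.dumps(cluster_health(), indent=2))
```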