[08:25:18] Skipping triage tmrw. Just doing graph split stuff mon/tues so not much to update
[13:09:59] pfischer: I'm looking at new tasks in our board and found T380572. Do you have a bit more context?
[13:09:59] T380572: SUP: Reduce Metrics - https://phabricator.wikimedia.org/T380572
[13:58:47] o/
[14:16:05] I was playing around with opensearch security over the weekend...I didn't have much luck running on baremetal as opposed to docker-compose. Might have to compare notes with e-bernhardson
[14:16:26] Or if anyone else has done it LMK.
[14:48:31] Created T380752 for the Relforge opensearch migration...I also associated this chain of patches ( https://gerrit.wikimedia.org/r/c/operations/puppet/+/1090529 ) with the new task. Happy to re-organize if anyone objects
[14:48:31] T380752: Migrate Relforge to Opensearch - https://phabricator.wikimedia.org/T380752
[14:51:44] \o
[14:52:50] hmm, 1068 emails in my data-engineering alerts folder. sounds like a good weekend :P
[14:55:06] I cleared some tasks (mainly *_streaming_updater_reconcile_hourly) because of canary events issue
[15:12:09] .o/
[15:24:51] Any objections to removing Search Platform from ProbeDown alerts such as T379182 ? Data Platform SRE would still get these
[15:24:52] T379182: ProbeDown - wdqs1015 - https://phabricator.wikimedia.org/T379182
[15:28:28] seems reasonable to me
[15:28:39] +1
[15:34:37] Cool, will get started on a patch
[16:02:02] dcausse: we're in https://meet.google.com/eki-rafx-cxi
[16:02:09] oops, joining
[16:17:03] pfischer: is there more work to do on T374702 ?
[16:17:04] T374702: Cleanup: Remove deprecated weighted tag methods - https://phabricator.wikimedia.org/T374702
[16:57:53] sometimes i start the mediawiki test suite without a filter...and then i remember why not: 59 / 52384
[16:58:23] :)
[17:03:49] errand, back in ~40
[17:37:26] ryankemper: found out we need some followup on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1094484 . Removing the resource didn't delete it from the server. I could put up an ensure=>absent patch, but it should be just one host
[17:37:49] * ebernhardson should know by now that removing a thing in puppet doesn't remove it
[17:40:00] ebernhardson: so on `snapshot1016.eqiad.wmnet` i manually delete `cirrussearch-dump-s11.[timer,service]`?
[17:40:46] ryankemper: yea, please
[17:41:32] ebernhardson: oh wait is it all the shards or just 11
[17:42:02] oh yeah looks like just 11
[17:42:03] inflatador: when you have a moment could you deploy 0.3.150 (https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/1091290) on wcqs* machines
[17:42:13] ryankemper: yea just 11
[17:42:33] inflatador: I believe we deployed that last week. are you seeing that missing?
[17:42:39] er
[17:42:42] s/inflatador/dcausse
[17:43:03] ryankemper: yes, wcqs machines were not updated
[17:43:23] they're still running old artifacts for both the blazegraph service and the updater
[17:44:10] ryankemper yeah we talked about doing it at one of our pairing sessions last wk
[17:44:46] looking at wcqs1003 I still see: blazegr+ 68231 1 0 Nov20 ? 00:29:10 java -cp /srv/deployment/wdqs/wdqs/lib/streaming-updater-consumer-0.3.147-jar-with-dependencies.jar
[17:51:05] ebernhardson: done
[17:51:15] looking at wcqs, very confused because im pretty sure i deployed it last week
[17:51:42] ryankemper the SAL says you did https://sal.toolforge.org/log/ABmfRpMBKFqumxvtF-WJ
[17:52:42] oof i found my terminal window i never pressed c to continue beyond the canary
[17:52:51] big fail
[17:54:54] okay, deploy will be done in a few mins
[17:55:41] thanks!
[17:56:16] dcausse: done
[17:56:20] thx!
[17:56:46] ryankemper: might doing a quick rolling-restart of the updater service?
[17:57:11] s/might/mind
[18:00:23] dcausse: done
[18:00:28] thanks!
[18:04:37] Re: T219507 , I think we can close as the new SUP has a non-cookbook way of reindexing? https://wikitech.wikimedia.org/wiki/Search/CirrusStreamingUpdater LMK if y'all disagree
[18:04:37] T219507: Create cookbook to reindex into elasticsearch / cirrus - https://phabricator.wikimedia.org/T219507
[18:05:26] inflatador: yes, cirrus-reindex-orchestrator is pretty much a "cookbook"
[18:06:51] we need to update the doc tho
[18:07:08] Taking dog out
[18:07:32] dcausse ACK, will point docs to https://gitlab.wikimedia.org/repos/search-platform/cirrus-reindex-orchestrator unless you wanted to handle it
[18:09:23] inflatador: sounds good to me, thanks for taking care of this!
[18:09:47] dinner
[18:20:29] ebernhardson re backfill process above, I updated https://wikitech.wikimedia.org/wiki/Search/CirrusStreamingUpdater#Backfilling with a link to https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/helmfile.d/services/cirrus-streaming-updater/README.md ...it doesn't specifically call out reindexing as opposed to backfilling, LMK if we should add something for reindexing
[18:22:18] inflatador: i guess i could improve the docs, there is the top level entrypoint (python -m cirrus_reindexer) for backfilling, and then another one (python -m cirrus_reindexer.reindex_all) for reindex+backfill. The names are all terrible :P
[18:23:11] in theory you can also manually issue a backfill with helmfile, that's probably documented somewhere as well
[18:30:32] ebernhardson ACK, I pointed the docs to the backfill via helmfile instructions in the README, but I wasn't sure about reindexing. Don't worry too much about the names :P
[19:38:11] Just removed the last graphite-based panel from https://grafana.wikimedia.org/goto/bf6KnLnHR?orgId=1
[19:51:01] \o/
[19:58:09] https://grafana.wikimedia.org/goto/kA7E4L7Ng?orgId=1 needs a little work, after that we should be good
[19:58:16] appointment, back in ~90
[21:36:05] back
[21:59:13] re: metrics update, I'm seeing slight discrepancies between https://grafana.wikimedia.org/goto/-X01pY7Hg?orgId=1 (prom) and https://grafana.wikimedia.org/goto/6fObpL7NR?orgId=1 (graphite). LMK if y'all think this needs tweaking, or if it's good enough. Will raise again with d-causse when he's back tomorrow