[04:07:01] :Q
[06:18:20] hello folks
[06:18:42] an-airflow1001 is again showing the root partition filled at ~94%
[06:19:05] this time apt-get clean doesn't help since there is not much left to clean
[06:19:40] the space used seems mostly log-related, and we have a systemd timer on the vm that cleans files older than a month
[06:19:53] we could reduce the retention to, say, 21 days?
[06:21:02] I don't really know the situation, but Erik mentioned yesterday that there's an issue with airflow db reconnect and logging - is this the same thing?
[06:21:50] ah yes, there is also https://phabricator.wikimedia.org/T283856
[06:22:09] yep, I was just looking for that ticket
[06:22:49] I can manually run the find + delete command as a one-off to clear out some logs now, say go down to 27 days or similar
[06:23:02] just to avoid issues until the whole team decides what to do
[06:23:05] does it sound ok?
[06:23:15] sure, let's do that
[06:23:17] it will basically drop the oldest 2-3 days of logs
[06:23:19] ack
[06:23:21] doing it
[06:23:51] it won't solve the issue in the long run, since, as I understand it, logs can grow in size very rapidly, but that's what the ticket is for
[06:23:56] thx
[06:24:13] I am basically copying /usr/local/bin/airflow-clean-log-dirs into my home and reducing the retention a little until we get under 90%
[06:24:20] ah yes sure, it is a sneaky issue
[06:26:23] root partition usage is now at 88%, I had to lower the retention to 21 days
[06:26:34] (as a one-off)
[06:26:37] all good
[06:53:44] thx, we'll take care of the root cause
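
(A minimal sketch of the kind of one-off cleanup described above, assuming the Airflow logs live under /srv/airflow/logs; the real /usr/local/bin/airflow-clean-log-dirs script, its paths, and its retention setting are not shown in the log and may differ.)

    #!/bin/bash
    # One-off cleanup: delete Airflow log files older than 21 days, then
    # remove any directories left empty. Path and retention are assumptions.
    LOG_DIR="/srv/airflow/logs"
    RETENTION_DAYS=21

    find "$LOG_DIR" -type f -mtime +"$RETENTION_DAYS" -delete
    find "$LOG_DIR" -mindepth 1 -type d -empty -delete

    # Confirm the root partition is back under 90%
    df -h /
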
[08:25:23] tangentially related to our work: https://techblog.wikimedia.org/2021/06/07/searching-for-wikipedia/
[09:01:53] relocating
[09:04:25] errand
[09:43:40] lunch
[10:16:30] lunch
[11:14:43] dcausse: I'm not sure how the script I modified to have the query completion template in place factors into the tests - can I assume that a correct template is there, or should I run the script during the test?
[12:28:29] zpapierski: you'll probably have to orchestrate all that in the scripts run by cindy
[12:29:20] I could use some help with that, once I'm done with writing the scripts
[12:29:37] but that's probably for tomorrow, for now I know how to proceed
[12:30:00] see tests/jenkins/
[12:53:15] meal 3 break
[13:52:18] errand
[14:56:06] \o
[14:56:11] o/
[14:56:15] zpapierski: for cindy, the easiest place would probably be the resetMwv.sh script
[14:56:35] that's supposed to reset the vagrant instance to a state that can run the suite
[14:57:01] o/
[14:57:02] that's not an issue, reloading the index itself was trivial
[14:57:18] zpapierski: I mean that's what should create the template indexes
[14:57:30] or the templates themselves, I mean
[14:57:32] aaa, right, I was asking that question
[14:57:41] does this happen during the jenkins run?
[14:57:54] jenkins can't run our tests, it doesn't have multi-wiki support
[14:58:09] I mean, I have no issue with running the template update on test
[14:58:21] cindy runs these, and uses resetMwv.sh to reset the instance before running a patch
[14:58:23] ah, right, Cindy comments on CI like jenkins, but isn't jenkins
[14:58:49] yup
[14:59:00] ok, so I should add the template update there, if it doesn't happen already, noted
[14:59:58] also for extra fun, releng forced the cirrus nodejs jobs from node 10 to node 12, which means cindy and that whole suite have to upgrade from node 10 to node 12. And for even more fun, making the repo mergeable means cindy can't run
[15:00:12] (you can't merge anything to cirrussearch today)
[15:01:15] I wasn't going to, but what's next?
[15:01:49] we have to upgrade the test suite from node 10 to 12
[15:02:30] :/
[15:03:13] it's extra fun, because the upgrade through webdriverio v5 was a heavy rewrite and the quickstart guide says "start a new project and copy the structure, but not the files, from your pre-v5 project"
[15:03:19] but I'm going to try not to do that :P
[15:05:14] good luck...
[15:05:41] dropping off to learn some weird language
[15:12:12] ryankemper: https://gerrit.wikimedia.org/r/c/operations/puppet/+/698546 is almost ready to merge. I think we usually want a linked phab task to track approvals for permissions changes. Not sure what the current process is. Could you check? (cc tanny411)
[15:17:26] gehel: I didn't create any phab task for this, just had a chat with dcausse.
[15:18:24] tanny411: I think we usually want an audit trail for permission changes. Ryan should know. And sorry for the delay on this :/
[15:19:08] tanny411: it's probably the last permission change you need, but if there is a next one, please harass Ryan until he makes it happen!
[15:19:12] :)
[15:19:50] no problem :)
[15:20:15] gehel: a task for the work requiring this perm, or a dedicated task for the perm change?
[15:20:53] dcausse: we used to require a dedicated phab task. Let me see if I can find some doc
[15:21:13] sure, I'll start one
[15:23:22] Oh, but this is only access to the analytics cluster, so not production access as such
[15:25:03] yes, this is "just" granting another group on the analytics cluster
[15:27:30] filed T284575
[15:27:31] T284575: Add akhatun to analytics-search-users group - https://phabricator.wikimedia.org/T284575
[15:28:36] Andrew is merging it
[15:28:42] thanks!
[17:27:40] dinner
[17:51:28] * ebernhardson closes yesterday's hatjitsu room, and notes he's not the last one there :)
[17:51:49] it wasn't me this time for once :P
[19:31:40] ebernhardson: I'm thinking of removing the ip block in an hour or so and seeing if we get swarmed with traffic again. Any objections?
[19:34:53] ryankemper: should be ok
[22:12:54] ebernhardson: Tried lifting the block and we're still getting flooded. We'll definitely have to figure out a mitigation before we lift the block
[22:16:20] :(
[22:22:37] https://grafana-rw.wikimedia.org/d/qrOStmdGk/elasticsearch-pool-counters?viewPanel=2&orgId=1&from=1623188913102&to=1623190914952 Can see how fast it spikes back up lol
[22:23:00] Guess they didn't get the hint
[22:24:43] heh, indeed.
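
(Going back to the resetMwv.sh / query completion template exchange earlier in the log: a minimal sketch of what a template-creation step added to that script might look like, assuming the template is stored as an Elasticsearch search template over HTTP. The actual CirrusSearch setup may create it via a maintenance script instead, and the endpoint, template id, and body below are purely illustrative assumptions.)

    #!/bin/bash
    # Hypothetical step for resetMwv.sh: make sure a query completion search
    # template exists before Cindy runs the browser suite. Template id and
    # body are illustrative, not the real CirrusSearch configuration.
    ES_URL="http://localhost:9200"

    curl -s -XPUT "$ES_URL/_scripts/example_completion_template" \
      -H 'Content-Type: application/json' \
      -d '{
        "script": {
          "lang": "mustache",
          "source": {
            "suggest": {
              "example": {
                "prefix": "{{query_string}}",
                "completion": { "field": "suggest" }
              }
            }
          }
        }
      }'
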