[02:16:42] wrt the streaming updater: all of codfw internal is done, which leaves the rest of codfw public wdqs and all of eqiad
[05:57:57] awesome :)
[07:09:46] gehel: we could trigger the next transfers
[07:10:07] I still need to finish my breakfast, but if you have time, we can sync up
[07:10:17] with dcausse as well, once he's not away
[07:13:01] dcausse: ping us when around!
[07:13:27] it looks like T288231 has not been updated with completed transfers
[07:13:28] T288231: Deploy the wdqs streaming updater to production - https://phabricator.wikimedia.org/T288231
[07:15:05] it has been now
[07:15:35] you were faster than me!
[07:16:01] :)
[07:16:25] wdqs1009 isn't ready yet, so codfw is the only option
[07:17:04] https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater?orgId=1&var-site=codfw&var-k8sds=codfw%20prometheus%2Fk8s&var-opsds=codfw%20prometheus%2Fops
[07:17:05] and it's probably going to need 5 or 6 h to be ready
[07:17:12] more or less, yeah
[07:17:47] we estimated around 5PM UTC
[07:29:22] gehel: I'm around
[07:31:49] regarding data-transfer, at a glance I agree with what has been said: it looks like codfw can continue, eqiad is still stuck on wdqs1009 catching up
[07:34:55] I'll start the transfer from 2008 -> 2001
[07:35:20] ryankemper: I'm reusing your existing tmux session so you can easily take over later today
[07:35:35] I'm here as well
[07:35:54] do we need to discuss anything else before starting this transfer?
[07:35:58] yep
[07:36:12] https://etherpad.wikimedia.org/p/streaming_updater_cutover
[07:36:40] we switched the order a bit, since running puppet during the cookbook run broke the process
[07:37:08] the idea now is to stop the updater from starting, merge & apply the patch, and start the cookbook
[07:37:26] ok, will do
[07:37:32] it means we'll be stopping an update process for a pooled instance, but it's only for a few seconds before depooling
[07:38:34] rebasing https://gerrit.wikimedia.org/r/c/operations/puppet/+/730796
[07:40:46] patch merged, running puppet-agent
[07:41:20] cookbook started
[07:41:45] ok, it should take about 6h, so we can do another one later on
[07:43:24] we estimated that codfw should be done by Tuesday, later on we can fire up the eqiad transfer as well :)
[07:44:30] looking at the previous run, I think it takes 1h30: 1h15 for the transfer and 15min to catch up
[07:44:43] really? that's waay shorter than we thought
[07:44:55] in that case, we can finish codfw today
[07:45:09] I'm guessing eqiad will take more time, though
[07:46:29] hard to tell, eqiad machines are older so perhaps
[07:46:43] that and there are more of them
[07:46:52] ah yes sure
[08:05:51] I have to say Gnome is growing on me super quickly, though it required some extensions first
[08:08:05] it's seamless, I rarely feel like I'm using a desktop env, but maybe that's just because I'm used to it
[08:08:35] I suppose that's probably the best thing you can say about a DE: it doesn't get in your way
[08:09:53] anyway, with per-screen scaling (especially fractional scaling) this is all finally comfortable again
[08:10:42] yes... these are the "hard features"
[09:06:15] transfer completed for wdqs2001
[09:06:39] but not catchup, right?
[09:07:21] It does not look like it, but I thought that the cookbook would be waiting for catchup before repooling
[09:07:27] yep, both are catching up right now, according to graphs
[09:07:36] so did I, it doesn't?
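For reference, the revised estimate at 07:44:30 (1h15 of transfer plus about 15 min of catch-up, so roughly 1h30 per host instead of the 6h assumed at 07:41:45) is what moved codfw from "done by Tuesday" to "done today". A minimal sketch of that arithmetic, assuming every host takes the same time and each datacenter runs one transfer at a time; `estimate_finish` and its parameters are hypothetical, not part of any existing tooling:

```python
from datetime import datetime, timedelta

# Per-host durations taken from the 07:44:30 message; real runs may differ
# (eqiad machines are older, so they may not match these figures).
TRANSFER = timedelta(hours=1, minutes=15)
CATCH_UP = timedelta(minutes=15)


def estimate_finish(start: datetime, hosts: int, parallel: int = 1) -> datetime:
    """Hypothetical helper: when would `hosts` transfers be done?

    Assumes transfers run in batches of `parallel` hosts and that every
    batch takes TRANSFER + CATCH_UP before the next one can start.
    """
    per_batch = TRANSFER + CATCH_UP
    batches = -(-hosts // parallel)  # ceiling division
    return start + batches * per_batch
```

For example, four hosts (an illustrative count) started at 07:41 would finish around 13:41 under these assumptions, versus a full day at the original 6h-per-host guess.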
[09:08:17] https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/wdqs/data-transfer.py#L183
[09:08:27] perhaps there's a tolerance
[09:08:43] does not look caught up to me: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=8&orgId=1&refresh=1m&from=now-24h&to=now&var-cluster_name=wdqs
[09:08:52] it's not
[09:09:13] ok, so the script is supposed to wait, from what I see
[09:09:16] it does not wait in fact
[09:09:19] pool only happens after the wait
[09:09:25] depooling 2001 until it catches up
[09:09:45] and 2008 as well
[09:09:47] yep
[09:09:57] it just throws an exception if the lag is too high, to let the operator repool manually I guess
[09:10:26] well, 2008 is already at lag < 10'
[09:10:41] as it should be, process took 1.5h
[09:10:43] gehel: it's already back to normal, not sure it's worthwhile to depool
[09:10:44] and 2001 as well
[09:10:51] damn, that's fast! repooling all that
[09:11:01] whoa
[09:11:12] nice work, well, us
[09:11:45] yeah! that's pretty cool!
[09:12:14] let's start 2008->2002
[09:12:24] agreed
[09:14:26] is there an easy way to force a backend, or do I need to tunnel?
[09:15:03] you need to tunnel
[09:15:48] data transfer started for wdqs2008 -> wdqs2002
[09:15:56] thanks!
[09:33:31] early lunch
[09:33:46] ejoseph: I'll be back around 2pm CEST
[09:45:09] taking a break as well
[10:33:57] lunch
[11:44:18] wdqs2002 transfer completed
[11:44:40] including catchup
[11:44:45] we can start another one
[11:44:56] time for 2003!
[11:45:43] it turns out we will actually do the codfw transfer before eqiad :)
[11:45:49] so much for plans
[12:27:36] ejoseph: are you back? how's the laptop?
[12:33:10] They changed it
[12:33:22] I’m still on my way back home
[12:33:24] cool! So brand new one!
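The exchange between 09:08 and 09:10 is about what the data-transfer cookbook (cookbooks/sre/wdqs/data-transfer.py) does after the copy: pool only once lag has dropped, or raise so the operator repools by hand. The chat leaves open whether the real script polls or checks once, so this is only a sketch of the wait-then-pool pattern under stated assumptions: `get_lag` and `pool` are hypothetical callables supplied by the caller, and the 600 s tolerance mirrors the "lag < 10'" remark rather than any value read from the cookbook.

```python
import time
from typing import Callable


class LagTooHighError(RuntimeError):
    """Raised when the host has not caught up in time; the operator then repools manually."""


def wait_for_catch_up_then_pool(
    get_lag: Callable[[], float],  # hypothetical: current update lag of the host, in seconds
    pool: Callable[[], None],  # hypothetical: puts the host back behind LVS
    max_lag: float = 600.0,  # assumed tolerance, mirroring the "lag < 10'" remark
    timeout: float = 3600.0,  # assumed upper bound on catch-up time
    poll_interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.monotonic,
) -> None:
    """Poll the lag and pool the host only once it is at or under max_lag."""
    deadline = now() + timeout
    while True:
        lag = get_lag()
        if lag <= max_lag:
            pool()  # caught up: safe to serve queries again
            return
        if now() >= deadline:
            # Leave the host depooled and let a human decide.
            raise LagTooHighError(
                f"lag {lag:.0f}s still above {max_lag:.0f}s after {timeout:.0f}s; repool manually"
            )
        sleep(poll_interval)
```

Passing `sleep` and `now` in makes the loop easy to exercise with a fake clock; raising on timeout, rather than pooling anyway, keeps a lagging host out of rotation until someone looks at it.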
[12:34:12] with 32GB of RAM, which I think we all agree should be a bare minimum for any developer :P
[12:34:20] (yes, I'm very subtle)
[12:46:06] lunch break
[13:20:11] transfer completed for 2002
[13:20:35] s/2002/2003/
[13:21:18] and wdqs1009 is caught up
[13:21:27] cool!
[13:22:06] getting ready for 2004
[13:24:55] dcausse, zpapierski: are you feeling brave enough to start a transfer in eqiad as well?
[13:25:27] gehel: starting the internal cluster should be low risk
[13:25:46] it's in this order in the patch chain anyways
[13:33:14] starting transfer wdqs1009 -> wdqs1003
[13:33:23] thanks!
[13:33:57] slightly annoying: 1009 isn't behind LVS, but 1003 is, and that's a case the cookbook does not support: depooling 1003 manually (and let's not forget to re-pool)
[13:34:48] ah, we should probably use 1003 as a source for the next ones
[13:43:37] yeah, that would be simpler
[13:54:35] too late to answer, but definitely brave enough
[13:54:59] I see our estimate on wdqs1009 was a bit too conservative
[14:06:41] gehel: I did this - https://gerrit.wikimedia.org/r/c/wikimedia/irc/ircservserv-config/+/731106 - what are the next steps for irc configuration?
[14:07:31] done!
[14:07:34] ok, very little steps then :)
[14:07:42] how is it applied?
[14:07:56] s/little/few
[14:08:11] There might be a command to send to ircservserv-wm to get the config change applied
[14:09:38] !isspull
[14:09:45] ok, from what I see, only channel founders can do this
[14:09:48] !issync
[14:09:49] Syncing #wikimedia-search (requested by gehel)
[14:09:50] No updates for #wikimedia-search
[14:10:06] huh
[14:10:23] no changes, maybe it takes ime
[14:10:24] !isspull
[14:10:27] s/ime/time
[14:11:06] read the instructions please https://meta.wikimedia.org/wiki/IRC/Bots/ircservserv#Making_changes
[14:11:12] this !isspull needs to be executed in the ops channel
[14:12:18] * gehel is reading the instructions and still isn't super clear on what's missing
[14:13:03] gehel: isspull goes into the -ops channel
[14:13:18] Oh, is that in the doc?
[14:13:21] yep
[14:13:21] and you would need to be in this array https://github.com/wikimedia/wikimedia-irc-ircservserv-config/blob/8c553ef393c093fe001464427ff4df6916cba010/config.toml#L9 to do it
[14:13:22] * gehel needs to learn to read
[14:13:22] !isspull: Pull configuration updates that have been merged into Git. Should tell you which channels potentially need syncing. This command should be run in #wikimedia-ops.
[14:14:14] Oh, ok, I misread that doc, I am only eligible to be trusted, but not actually trusted
[14:14:34] majavah: any chance you could pull that config for us?
[14:14:39] {{done}}
[14:14:43] thanks!
[14:14:43] ah, I misread that part as well
[14:14:50] !issync
[14:14:51] Syncing #wikimedia-search (requested by gehel)
[14:14:52] Set /cs flags #wikimedia-search ejoseph +AVfiortv
[14:15:30] -NickServ- ejoseph is not registered.
[14:15:35] huh?
[14:15:43] yeah, we'll get him to register his nick
[14:15:47] ejoseph: did you register your nick with NickServ?
[14:15:48] he's still onboarding
[14:16:04] if not, you should do so as quickly as possible, for various reasons
[14:16:26] ejoseph: and you should probably ping one of us for an introduction to the joys of IRC
[14:16:59] do we issue a stern warning about IRC when people apply? I think it should be in the job description :)
[14:17:05] quick errand to get some food for dinner
[14:17:27] I did not @zpapierski
[14:18:07] there is a point about it on the checklist, but if you still need an assist, I'm happy to help
[14:18:13] I am home now
[14:18:30] I am trying to set up
[14:18:32] with your new and shiny mac I presume :)
[14:21:21] zpapierski: it should be in the standard Wikimedia benefits footer :)
[14:21:36] it == irc
[14:21:52] what is it that they say about one man's treasure ;) ?
[14:22:22] https://xkcd.com/1782/
[14:22:43] one of my favs :)
[14:46:14] * gehel did not think that getting a COVID test for a 6 year old would be that complicated
[14:48:40] going offline, have a nice week-end!
[14:49:42] dcausse: enjoy!
[15:00:32] \o
[15:00:37] o/
[15:00:59] zpapierski: oauth proxy works locally, ship it? :P
[15:01:41] I'm doing the review (sorry, sat down to gerrit a bit too late today)
[15:02:02] tis ok, i'm sure there's a mountain of problems still
[15:03:31] I'm so happy to see that session store gone
[15:09:10] 2004 and 1003 completed transfers, waiting for catchup
[15:13:13] nice, it's going really well
[15:13:39] if we automated it, we could migrate eqiad over the weekend ;)
[15:14:47] keep that idea for next time!
[15:15:01] But yes, it would be nice to have a fully automated system
[15:17:33] re-pooling 1003
[15:18:08] and getting ready for the next batch: 1008 and 2007
[15:19:54] will use 1003 as a source instead of 1009 to work around the depool limitations of the cookbook
[15:24:07] data transfer started for 1008 and 2007
[15:34:32] ryankemper: I'll let you take over from there
[15:35:00] gehel: thanks!
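The 15:13:39 idea of automating the whole migration, and the way batches were actually run (one transfer per datacenter at a time, e.g. 1008 and 2007 together at 15:18:08), can be sketched as a small batch planner. `plan_batches` is a hypothetical helper, not part of the cookbook: it only pairs a chosen source with its targets and leaves the source choice to the operator. That matters because, per 13:33:57, the cookbook did not handle a source outside LVS (1009) feeding a target behind it (1003).

```python
from __future__ import annotations

from itertools import zip_longest


def plan_batches(
    queues: dict[str, tuple[str, list[str]]],
) -> list[list[tuple[str, str]]]:
    """Group per-datacenter transfers into batches (hypothetical helper).

    `queues` maps a datacenter name to (source_host, [target_hosts]).
    Each batch holds at most one (source, target) pair per datacenter, so
    datacenters run in parallel while each source feeds one target at a time.
    """
    per_dc = [[(source, target) for target in targets] for source, targets in queues.values()]
    batches = []
    # zip_longest pads shorter queues with None once a datacenter runs out of targets.
    for batch in zip_longest(*per_dc):
        batches.append([pair for pair in batch if pair is not None])
    return batches
```

Each returned batch corresponds to one round like the one started at 15:24:07; depooling, waiting for catch-up and repooling would still be handled by the cookbook or by the operator.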
[15:35:01] ejoseph: we are in our unmeeting if you want to join: https://meet.google.com/xgq-wvik-dkp
[15:40:17] my internet sucks
[15:40:29] i can't load simple pages
[15:40:39] :(
[15:50:12] ebernhardson: apart from the test patch and maven stuff I'm referring to gehel, LGTM
[15:50:24] test I don't yet fully understand, but it is marked WIP
[15:52:41] zpapierski: it's not any kind of automated testing, it's just a script to start kask+cassandra+jetty
[23:29:37] It's absolutely insane how fast the new streaming updater catches up on lag
[23:29:39] very exciting