[03:45:54] Looks like Europe DST switched today, so for me the triage meeting and the SRE meeting are at the same time
[03:46:36] SRE meeting is important so I'll probably just send an email w/ some async update notes wrt triage
[06:58:06] hello folks!
[06:58:19] there are some little issues with:
[06:58:38] 1) wdqs1007 - blazegraph seems to be returning 503s (at least from icinga's perspective)
[06:59:18] 2) relforge nodes have warnings in puppet due to the absence of the /usr/share/kibana/optimize dir (needed to create /usr/share/kibana/optimize/bundles/stateSessionStorageRedirect.style.css)
[06:59:39] could you check when you have a moment?
[06:59:43] Cc: inflatador, ryankemper --^
[07:11:04] elukey: thanks for the heads up! I went ahead and restarted blazegraph on `wdqs1007`. I'll defer the investigation on `2)` until the morning since it's not immediately obvious what's going on there (and relforge isn't userfacing etc) cc: inflatador
[07:16:40] ack!
[09:49:46] Don't know if y'all saw this but it looks like wdqs1007 is lagged a lot after the restart; it would probably be good to depool it while it catches up :) https://phabricator.wikimedia.org/T322010
[09:52:47] tarrow: sure depooling
[09:52:48] tarrow: Thanks! I've just depooled it
[09:52:55] cheers!
[09:53:07] but a single host lagging should not cause the max lag to kick in
[09:55:04] dcausse: interesting; I'm now rather detached from the details but my understanding was we used the most lagged pooled server to calculate it
[09:55:37] ah ok so something has changed, I think it was the median of the servers in the past
[09:55:47] see https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikidata.org/+/845016/ / https://phabricator.wikimedia.org/T238751
[09:58:17] o/
[09:58:26] but I also know basically nothing, I just spotted that maxlag was high this morning; maybe Lucas_WMDE has some actual knowledge :D
[09:58:44] unfortunately I have a meeting now, but I could talk about it in an hour or so ^^
[09:58:49] sure :)
[09:59:05] > but a single host lagging should not cause the max lag to kick in
[09:59:25] > [...] something has changed, I think it was the median of the servers in the past
[09:59:59] (just from git log) maybe https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikidata.org/+/589873 ?
[10:00:10] * Lucas_WMDE afk now
[10:01:15] I see https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikidata.org/+/845016/1/src/QueryServiceLag/MostLaggedPooledServerProvider.php
[10:01:25] so perhaps this was on purpose
[10:55:39] lunch
[10:55:57] dcausse: I might ping you early afternoon to catch up after 2 weeks of vacation...
[10:56:04] Or pfischer_
[10:59:06] gehel: sure
[11:00:58] * Lucas_WMDE catches up
[11:02:03] ok looks like you found out what’s going on (and https://phabricator.wikimedia.org/T322010#8355973 sounds reasonable to me)
[11:03:11] Lucas_WMDE: yes I think this explains what changed, and why we've hit max lag today (such cases should not have triggered max lag previously)
[11:54:44] lunch
[13:03:42] dcausse / pfischer_ : ping me if you have a moment to bring me up to speed on what happened for the last 2 weeks
[13:16:10] gehel: I'm around
[13:16:47] dcausse: meet.google.com/zjj-fuax-gqr
[13:20:09] inflatador: p/
[13:20:35] inflatador: o/
[13:20:48] gehel welcome back!
[13:20:55] glad to be back!
[13:55:56] inflatador: https://meet.google.com/uxq-smei-nsa
[15:00:42] \o
[15:01:35] o/
[15:13:10] will not make sprint planning, I have a personal development course at that time
[16:03:09] ryankemper, pfischer_ : triage meeting?
[16:03:56] gehel: See my irc message from 9 hours ago, due to the DST change the SRE meeting is at the same time so I'm in that meeting. I sent some updates via email a few minutes ago
[16:04:08] Oh right!
[17:10:44] cindy is almost working...except one test that fails in the test suite but looks fine when i pull up the search page :S
[17:11:04] tried a few things but no luck yet...will ponder and get back to it
[17:13:18] oh, actually just saying that made it more obvious :P It's looking for the result to have an image, and it does, but it's different now because of the new thumbnailing bits in core
[17:46:25] dinner
[18:25:52] ryankemper: too many meetings today already. Any objection to skipping our 1:1 later today?
[18:26:35] gehel: no better welcome back than a slew of meetings :P no objection
[18:31:48] thanks!
[20:15:55] back
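
Editor's note on the maxlag thread (09:53 to 11:03): the discussion contrasts two ways of turning per-server lag into a single value, the median across servers (the older behaviour as recalled in the log) and the lag of the most lagged pooled server. The sketch below is only an illustration of that difference. The function names, the host names other than wdqs1007, and all lag values are made up, and this is not the Wikidata.org extension's actual code.

```python
from statistics import median

def median_lag(lags_by_host, pooled):
    """Median lag (seconds) across pooled hosts; None if no host is pooled."""
    values = [lag for host, lag in lags_by_host.items() if host in pooled]
    return median(values) if values else None

def most_lagged_pooled(lags_by_host, pooled):
    """Lag (seconds) of the single most lagged pooled host; None if none pooled."""
    values = [lag for host, lag in lags_by_host.items() if host in pooled]
    return max(values) if values else None

# Hypothetical lag values in seconds; only wdqs1007 is a host named in the log.
lags = {"host-a": 12, "host-b": 9, "host-c": 15, "wdqs1007": 5400}
pooled = set(lags)

print(median_lag(lags, pooled))          # barely moved by one outlier
print(most_lagged_pooled(lags, pooled))  # dominated by the lagging host

# Depooling the lagging host removes it from the "most lagged pooled" value.
pooled_after = pooled - {"wdqs1007"}
print(most_lagged_pooled(lags, pooled_after))
```

Under these assumptions a single lagging host hardly changes the median but fully determines the "most lagged pooled" value, which fits the observation in the log that depooling wdqs1007 was the quick remedy.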