[05:53:41] How do I add a proper license to a file on Wikitech? https://wikitech.wikimedia.org/wiki/File_talk:Wikimedia_network_overview.png ?
[06:53:18] Is this entirely your creation?
[06:54:24] yes
[06:56:06] then it is as easy as this: https://phab.wmfusercontent.org/file/data/4pu24fdpvf32ncdrqmvq/PHID-FILE-ttmypxinorlkbgkbzeko/license.png
[06:56:19] I recommend cc-by-sa-4.0
[06:57:44] done, thanks!
[06:58:07] please note that if you did it as part of your work, at least for me, its copyright is owned by the foundation and and yourself in its entirety
[06:58:55] yep, I don't draw WMF network diagrams outside of my working hours :)
[06:59:42] dunno if I can use https://en.wikipedia.org/wiki/WTFPL :)
[07:01:35] I think CC0 would be preferred over that :-P
[07:37:24] I think one is required to use an OSI-approved licence
[08:45:10] Heads up, we'll be rebooting the phabricator hosts in 15 minutes. They should be unavailable for <10 minutes.
[09:07:12] eoghan: is it already rebooted? can I still quickly deploy the latest OpenSSL debs on those, then the reboot directly reload them via the reboot
[09:07:36] No, I was just about to do it. I can wait if you want to go
[09:07:42] give me 20s
[09:07:45] Sure thing
[09:08:03] done, please proceed
[09:08:12] Good timing. Rebooting now!
[09:11:58] phabricator is back now.
[09:20:06] debmonitor will be unavailable for ~ 5m
[09:28:48] https://www.irccloud.com/pastebin/6BXh7sDt/
[09:28:58] Emperor: ^^ non-ATS test confirming the 502s from swift
[10:05:33] Emperor: regarding the logs
[10:05:42] https://www.irccloud.com/pastebin/hXAXZwKe/
[10:06:00] that's nginx complaining about not being able to connect to swift-fe on time to serve my request
[10:25:36] weirdly distributed amongst the servers (I managed to look at ms-fe2009 and ms-fe2012 which have 0 such errors; ms-fe2010 had 12,445 today)
[12:58:36] vgutierrez: Emperor while it is most probably not related, while investigating the thanos-swift and cfssl and tegola issue, I noticed that on lvs2013 there were some messages from time to time such as Could not depool server thanos-fe2001.codfw.wmnet because of too many down
[13:06:33] effie: different cluster, right? :)
[13:07:47] effie: that's pybal complaining about pool health, basically the depool threshold is keeping servers pooled even if healthchecks said otherwise
[13:08:49] <_joe_> so yeah that is indeed an issue
[13:09:05] <_joe_> effie: are those messages ongoing?
[13:11:31] _joe_: I think I would see them like 2-3 times in a 24h log, the frequency was not alarming, so I put a pin on it
[13:11:50] vgutierrez: sure sure, it is a similar cluster, though, so I thought about mentioning it
[13:12:21] IMHO those kind of messages shouldn't be there at all
[13:12:38] <_joe_> effie: 1 time is too many
[13:12:43] <_joe_> but at least it's not down
[13:13:27] uh.. it's expected that thanos-web only has one server pooled per DC?
[13:13:46] https://www.irccloud.com/pastebin/zO8dYKkC/
[13:13:49] <_joe_> vgutierrez: who are you asking that to?
[13:14:09] <_joe_> godog / cwhite I guess?
[13:14:19] I was in a few rabbitholes at the same time at the, since the timestamps were not lining up with mine, I just made a note to have a look later
[13:14:24] that smells like misconfiguration to me
[13:16:06] vgutierrez: yes expected, a limitation with mod_sso and not sharing sessions https://phabricator.wikimedia.org/T331512 has more context for the curious
[13:16:12] thank you for the ping _joe_
[13:16:13] vgutierrez: I do not know
[13:16:17] no wonder lvs is screaming.. 25% servers pooled and depool threshold at 50% :)
[13:17:10] <_joe_> maybe we should just set the other servers to pooled=inactive then
[13:17:22] godog: "because varnish -> ats backend selection is random," that's definitely not true
[13:18:44] <_joe_> especially in esams
[13:18:46] <_joe_> :P
[13:19:46] on any given DC
[13:20:23] in some like esams or eqsin, varnish->ats traffic is always on the same server
[13:20:45] on others chashing is used to pick the backend server
[13:20:50] but never randomly
[13:22:42] vgutierrez: ah! my mistake, thank you for the clarification
[13:23:27] vgutierrez: I'll followup next week with more questions/details, I'm definitely interested in having multiple backends for thanos-web (or in general anything that does sso)
[20:41:48] anyone else having problems with CI ?
[20:42:43] Gearman queue looks high and I see alerts in releng https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=21
[20:52:23] inflatador: I pushed a stack of patches which makes it slow, sorry.
[20:54:33] James_F no problem, thanks for letting me know. Now I have an excuse to quit working ;P
[20:54:39] * James_F grins.
[20:57:46] Ah James answered here
[20:57:55] Blame James is good
[20:58:19] It's generally correct, in all things.
[20:58:37] Heh
[20:58:53] If it's not you then it's often Re.edy
[20:58:59] At this time you though :)
[20:59:10] Fair.
[21:08:01] when all else fails, blame Domas
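
Note on the depool threshold exchange (13:07 to 13:17): the arithmetic behind "25% servers pooled and depool threshold at 50%" can be sketched as below. This is only an illustration of the general idea, not pybal's actual code. The function name `can_depool`, its parameters, the choice of counting all configured servers in the denominator, and the pool size of four (inferred from the 25% figure) are all assumptions.

```python
def can_depool(pooled_count: int, total_count: int, threshold: float) -> bool:
    """Hypothetical check: may one more server be removed from the pool?

    Allowed only if the servers left pooled afterwards would still make up
    at least `threshold` (a fraction between 0 and 1) of `total_count`.
    """
    if total_count <= 0:
        return False
    remaining = pooled_count - 1
    return remaining >= threshold * total_count


# Assumed scenario from the chat: 1 of 4 servers pooled, threshold 50%.
# Removing the last pooled server would leave 0, which is below 2, so the
# depool is refused even though the health check failed.
print(can_depool(pooled_count=1, total_count=4, threshold=0.5))

# With all 4 servers pooled, one of them can be depooled (3 >= 2).
print(can_depool(pooled_count=4, total_count=4, threshold=0.5))
```

Under these assumptions a pool that is already below its threshold can never depool anything, which would match the "Could not depool server ... because of too many down" messages quoted above.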