[01:58:37] legoktm: I'm late, finally watching https://commons.wikimedia.org/wiki/File:How_to_Run_a_Top-10_Website,_Publicly_and_Transparently.webm and it's a great talk! thanks for doing it
[01:58:52] :))) thanks
[08:51:19] I keep running into this dashboard, which now links to a non-existent dashboard, can that be deleted? CC vgutierrez
[08:51:27] https://grafana.wikimedia.org/d/000000557/prometheus-varnish-http-errors
[08:52:20] what's pointing you to that dashboard?
[08:52:30] first time in 4.5 years that I see it
[08:52:31] nothing
[08:52:41] ah
[08:52:45] you mean how I reach it?
[08:52:48] yes
[08:53:02] I search for http or varnish or errors and it is the top result every time
[08:54:18] jynus: I guess this is the one you're looking for (young padawan): https://grafana.wikimedia.org/d/000000503/varnish-http-errors?orgId=1
[08:54:58] what I mean is- if the redirect is there for a reason, it should probably be fixed, otherwise maybe deleted?
[08:55:17] (I can do it myself)
[08:55:48] dunno why the redirect dashboard is there
[08:55:52] so I cannot answer that question
[08:56:17] sure, just ask it around on your team, no rush
[09:06:21] I'm guessing that's from the migration to prometheus
[09:06:31] so at some point in time we had two versions of the same dashboard
[09:10:42] and after the migration the old one was removed and the prometheus one took the name of the original dashboard
[10:07:42] anyone is doing anything with phabricator? I'm getting logged out after any action
[10:10:00] dcaro_away: which cp servers are you hitting?
[10:11:21] vgutierrez: no idea, how do I check?
[10:11:53] it seems I'm not the only one also, dhinus are you still having the logout issue?
[10:12:18] yes same problem here
[10:13:16] should we open an incident, is this causing to make impossible any interaction with phabricator?
[10:14:09] any logged in interaction yes (adding comments, etc)
[10:15:41] dcaro_away: It did the same to me, but now it's working
[10:16:10] using chrome works for me
[10:16:20] oh, maybe it's sorted now then xd
[10:16:23] Oh, nope, it screwed up again
[10:16:39] i'm logged in still here
[10:16:49] I managed to squeeze a comment and close a task in the same action, then I reloaded another page and I was logged out
[10:16:51] https://www.irccloud.com/pastebin/VkwOnhJZ/response-headers-rhinosf1
[10:17:31] yep, now it works
[10:18:17] xd, seems intermittent
[10:18:28] Ticket roulette
[10:18:29] vgutierrez: do you believe it is something you are testing, or just did it out of precaution
[10:18:47] out of precaution
[10:18:56] if RhinosF1 is affected, my experiment is unrelated
[10:19:11] as from his headers he is reaching phabricator via esams
[10:19:16] and I was running the experiment in a single drmrs node
[10:19:52] ori: ^^ could this be related to the ramp up in query parameters sorting
[10:20:13] https://docs.google.com/document/d/1Ka9MQB8OwdzAzJVfZuaIGo5VfnyRNRr_WxLPZ6YFMkE
[10:20:27] who is knowledgeable about phabricator?
[10:20:28] vgutierrez: i am still logged in
[10:20:41] RhinosF1: ok
[10:20:54] dcaro_away: could you paste x-cache / x-cache-status headers from your browser?
[10:21:16] ack
[10:21:54] please someone log in to phabricator host and check apache logs / error logs
[10:22:24] moritzm: ^^
[10:22:27] someone to check nel
[10:22:50] and I will try to contact someone from releng
[10:24:12] x-cache: cp6010 miss, cp6012 pass
[10:24:14] x-cache-status: pass
[10:24:31] (it was way harder to copy the headers from firefox than expected xd)
[10:25:12] dcaro_away: please use phabricator with the inspect tab open and paste the headers when the issue is triggered (if that happens)
[10:25:53] cp6012 is the varnish instance that you're getting and it will be the same one as long as your IP doesn't change
[10:26:12] but varnish picks ats-be instances from the same DC based on the URL
[10:26:31] vgutierrez: I'll try, seems to be working now for me
[10:26:56] vgutierrez: Phab also works fine for me, I had just left a comment there and I'm also going via Esams from Germany
[10:27:00] dcaro_away: you can log int?
[10:27:13] log in you mean?
[10:27:19] yes, sorry
[10:27:31] and perform actions?
[10:27:45] right now yes
[10:28:10] Me too.
[10:28:23] I won't try to log out. But it appears fine at the moment.
[10:29:02] can confirm
[10:29:17] https://phabricator.wikimedia.org/T316337
[10:29:32] I will reuse that to report about the temporary issue
[10:30:29] jynus:thanks
[10:30:32] so seeing that affected folks are being mapped to drmrs it could been triggered by my experiment
[10:30:45] ok, let's follow up on ticket vgutierrez
[10:31:04] I don't think this was an outage, given the limited impact and time
[10:31:11] but worth following on a ticket
[10:31:27] o/
[10:31:40] jynus poked me and I reached out to andre cause well he knows more than a few things about Phabricator
[10:31:46] I felt it could be helpful
[10:32:08] logged out and in again, seems to be working fine now
[10:32:29] thanks, seems to be a temporary glitch, please help me report on ticket (writing)
[10:33:14] the one above is closed as invalid (test ticket) do you have another one?
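For anyone triaging from the pasted headers: a minimal sketch of splitting an x-cache value into per-host entries, using the value pasted above (cp6010 miss, cp6012 pass). The function name `parse_x_cache` is hypothetical, and treating the last entry as the varnish frontend is an assumption based only on the remark that cp6012 is the varnish instance.

```python
def parse_x_cache(value):
    """Split an x-cache value like 'cp6010 miss, cp6012 pass' into (host, status) pairs."""
    hops = []
    for part in value.split(","):
        fields = part.split()
        if len(fields) < 2:
            # Skip empty or malformed entries rather than guessing.
            continue
        hops.append((fields[0], fields[1]))
    return hops


hops = parse_x_cache("cp6010 miss, cp6012 pass")
# Assumption: the last entry is the frontend (varnish) the client talked to.
frontend_host, frontend_status = hops[-1]
print(hops, frontend_host, frontend_status)
```

Comparing the first entry across several requests would show whether the same backend host keeps appearing when the logout happens.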
[10:33:29] I don't see any pointers in https://phabricator.wikimedia.org/people/logs/query/all/ , but indeed a lot of folks seem to get stuck at "Login: Partial Login"
[10:33:40] dcaro_away: I am on it
[10:33:45] 👍
[10:35:15] andre: hashar thanks for the help, based on vgutierrez it seems at first instance traffic related
[10:35:29] but pinging for awareness, as you may get reports of people being logged out
[10:35:30] yep
[10:35:43] thanks
[10:35:51] cp6016 was stripping the phab session cookie and returning a hit for an anonymous user
[10:35:54] I wasn't even aware Phabricator was being the cache infra (which is a good thing!)
[10:36:01] so that's why you got unlogged
[10:36:10] dcaro_away: vgutierrez: https://phabricator.wikimedia.org/T316337
[10:36:10] hashar: 😅
[10:36:15] awesome
[10:36:29] I am very happy that got figured out so fast :]
[10:36:40] vgutierrez: impact only on phab, could other services have been affected?
[10:36:57] maybe add a test to cover that feature for the next time? :]
[10:37:25] jynus: any authenticated app that doesn't use Session|Token as part of the ir session cookie
[10:37:32] s/the ir/their/
[10:37:43] so, in other words, mw shouldn't be affected?
[10:37:49] it isn't affected
[10:37:51] ok
[10:37:55] that limits the impact
[10:38:23] vgutierrez: the phab issue?
[10:38:26] vgutierrez: if you can gather, assuming it was related to your test a timestamp of start and end?
[10:38:36] jynus: it's on SAL
[10:38:48] ok, I understand then start and end of test
[10:39:08] 09:56 - 10:13 UTC
[10:39:17] ori: yep.. totally unrelated, sorry :)
[10:45:40] andre: thank you to have showed up and found that login history link in phab :-]
[10:46:35] np :)
[10:47:14] outage considered resolved, ok? more followup on ticket
[11:31:51] afk for some time
[12:05:43] back
[15:06:40] hi, please read documented linked :-/
[15:06:48] *document
[15:08:37] reading
[15:10:25] glad it was resolved and identified.
thanks, IC
[15:10:39] you can call me jaime :-D
[15:10:55] :-P
[15:11:08] Mr. IC was my father
[15:12:30] on onfire we agreed backlogging on status page was not needed, as the status page was intended for wikis status or things with a wide audience
[15:12:35] thank you, Jaime
[15:58:48] jclark-ctr is asking to replace a couple of failed disks in ms-be1071, I think it would be wise for him to do it, but I am about to log out, so I won't be able to be around: https://phabricator.wikimedia.org/T315437#8188981
[15:59:29] this could make a bit better the situation of swift hosts (re recent outage), but on the other side, can cause extra alerts, etc.
[16:00:27] these were the very same disks that were discussed during the incident
[16:01:24] if someone wants to monitor that and tell him to proceed, please do, otherwise I don't think is wise for me to say so and immediately bail out
[16:03:19] Thanks @jynus
[16:03:58] sorry to leave you hanging, hopefully someone can volunteer, if not, we will do it next week?
[16:04:49] swift is jbod so from a pure technical perspective is safe to proceed, is the alerts that worry me
[16:05:28] jclark-ctr: sadly the 2 owners are this week on vacations
[16:19:14] No problem if it's best left till Monday we can
[19:13:22] robh: apergos: guess what, we are being told we can shut down Apple search bridge. that we still see this day
[19:13:38] holy moley
[19:14:43] https://phabricator.wikimedia.org/T316296 :)
[19:15:19] never thought I would see the day tbh
[19:15:36] uploaded patch to remove from ATS without hesitation (or we can first remove from DNS if that seems better than an error page)
[19:15:45] haha, yea, that's why I mention it
[19:16:51] https://w.wiki/5d3s
[19:20:59] What's left is a pretty slim wrapper :P
[19:21:53] sampled 128x ?
[19:26:44] <_joe_> mutante: are you kidding me?
that's amazing
[19:27:41] :)
[19:29:21] yeah someone should send some wikilove to mikhail
[19:29:25] _joe_: first thought was to just flip it in https://gerrit.wikimedia.org/r/c/operations/puppet/+/826884 and use that to get enough +1s to confirm it.. wait a bit to see if we get complaints .. then later think about removing from k8s. Could also be the first k8s service that gets decom'ed
[19:30:11] then I thought maybe remove it from DNS instead for the same effect. not sure.. there is still some minimal traffic, that is what the URL shortener link above is
[19:35:29] <_joe_> mutante: your patch wouldn't have the effect you want I fear
[19:36:03] <_joe_> mutante: I don't remember if we still have search.wikimedia.org configured on the appservers
[19:36:13] <_joe_> if not, it works :)
[19:36:35] oh, I see what you mean
[19:38:19] <_joe_> I just don't remember if we properly cleaned it up
[19:41:36] we still have it in httpbb tests in the repo but it's not a virtual host on apache (mwdebug1001), only namevhost searchcom.wikimedia.org, not search
[19:45:33] [deploy1002:~] $ httpbb --hosts mwdebug1001.eqiad.wmnet /srv/deployment/httpbb-tests/apple-search/test_search.yaml
[19:45:53] Body: expected to contain 'en:Cthulhu_Mythos', got '\n\n'
[19:46:05] Cthulhu :)
[20:19:54] mutante: wow. ref apple search
[20:20:40] thus ends the worst legal agreement ever that happened to be made by someone back when i first started
[20:20:40] robh: yea, remember it was an old thing back when we were at old SF office
[20:20:43] ie: NOT ME
[20:20:44] hehe
[20:20:56] lol
[20:25:15] Apparently it expired a while ago
[20:27:38] like one of those 99 year leases, I dont think we knew it could