[09:46:35] a bit late, but reminder to update our standup notes (cc inflatador, pfischer)
[12:40:43] gehel: done
[12:45:14] pfischer: thanks !
[13:45:09] pfischer: minor note on the standup notes: having links to phab tasks does help me navigate them (and might help others as well). Don't worry too much about those (I'm still able to find them when I need to), but if you have the link at hand, a copy / paste is welcomed!
[14:04:27] working on the update now
[14:21:25] ryankemper if you have time once you get in, take a look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/862369/
[14:26:42] Weekly update posted to https://app.asana.com/0/0/1203480989921452
[14:28:03] ACK, I finished the standup notes just now FWIW
[14:29:35] inflatador: thanks! I'll see if I can update the update
[14:33:16] those updates seem not entirely completed yet, so I'll use them as part of next week's update
[15:19:39] gehel and others, looks like we're having a WDQS incident again based on operations
[15:20:58] inflatador: scream if you need help !
[15:23:14] will do
[15:24:44] inflatador: if not all servers are restarted yet, could you get a few thread dumps for further analysis? https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Further_analysis
[15:24:59] We might want to add that as an option in the restart cookbook...
[15:25:43] roll-restart --collect-evidence :D
[15:26:53] +1
[15:27:21] think it's too late, but will work on volans' suggestion for the cookbook
[15:27:38] oops, gehel's suggestion that is ;)
[15:27:41] that was gehel's :D I just suggested the name :D
[15:28:26] without thread dumps, it is going to be difficult to investigate anything. Not the end of the world. Given we've had 2 such incidents in 2 weeks, sadly we'll probably have other occasions to gather evidence.
[15:29:14] mpham: ^ another WDQS incident. Tangentially related to your usual concerns, but we'll need to think hard about how much time we want to invest in increasing stability during next quarter.
[15:48:59] \o
[15:51:59] restarting cloudelastic1006 psi, it's still complaining
[16:14:11] gehel ebernhardson would it be appropriate to contact dcausse at this point?
[16:14:28] for wdqs? If it's restarted and fine shouldn't be necessary
[16:14:49] ebernhardson it's been restarted 3 times already and is still alerting
[16:14:50] at the moment, there isn't much that David would be able to help with
[16:15:13] we need to find the right traffic to block, it is unlikely that we'll be able to fix anything
[16:15:29] ahh, hmm. Yea i still doubt david would do much different. I would also lean towards spying the queries coming in and figuring out what to block
[16:15:37] which host?
[16:16:29] heh, all of them...
[16:17:19] so, you can port forward into a machine like so: ssh wdqs1016.eqiad.wmnet -L 9999:localhost:9999 then visit http://localhost:9999/bigdata/#query
[16:17:23] it will tell you the running queries
[16:17:27] ebernhardson yeah, was about to try checking the queries using the dashboards as described at https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Identifying_the_user_agent
[16:17:33] ah, even better
[16:17:35] will try that now
[16:18:19] can kill the query with the X button there, it may or may not actually stop the query though
[16:18:47] that query looks awfully familiar ...
[16:19:13] i removed it from the repo a few weeks ago as a test query: https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/858407
[16:19:48] which query? I've got the UI up
[16:20:42] oh, i'm totally misreading this :P i was expecting it to be listing running queries, it's showing a history of queries run through the bigdata api :P
[16:20:51] i'm sure this showed live queries before though ....sec
[16:20:58] hmm OK
[16:21:44] I see a lot of queries with VALUES ?occuptaions that seems a typo, not sure it can cause issues
[16:22:14] the ?foobar are arbitrary labels, they only matter in that you have to spell them the same in the same query
[16:22:30] ok, so red herring
[16:25:24] ebernhardson what about under the status tab? Are those currently running queries or just a history?
[16:25:28] i see lots of queries that seem to be asking something about imdb labels, could be a bot?
[16:25:43] inflatador: the `status` tab in the bigdata dashboard i linked earlier should list all running queries
[16:25:53] on :999/bigdata/
[16:25:57] :9999
[16:26:08] ebernhardson got it, then we're looking at the same thing
[16:26:54] and "cancel" will stop the query? Not that it helps much to stop it on a single host
[16:27:25] yes, but in the case of these imdb queries (no clue if they are the problem) the problem would be how many there are. If it was an overaggressive bot, stopping one query would just have the bot go to the next one
[16:27:48] i should check a few more hosts than just one though...poking around
[16:28:19] heh, 1013 won't even give me the frontend, 429 too many requests :P
[16:28:30] few retries and it came up
[16:29:10] i would lean towards these imdb ones being the issue and try and block them though...maybe we have enough info in the kafka query logs that come out, looking
[16:29:58] ebernhardson more context happening in #mediawiki_security, let's continue the conversation there if that is OK
[16:30:03] kk
[18:41:34] WDQS has been stable for 30m, heading to lunch...back in ~45
[19:22:09] * ebernhardson puzzles over why my dev env reports have wikibase and wbcs, but doesn't generate labels mappings :( must need more config...
[19:38:04] oh yea now it's fixed...takes many seconds now instead of instantly creating the index :P
[19:38:13] back
[20:47:49] oh how fun, it turns out the mediawiki html output for api responses doesn't include false values. Have to use `format=json&formatversion=2`. It's basically doing what the old json formatversion=1 did :S
[20:52:09] does anyone know where I can find the OKRs for our team? Office wiki says betterworks, but I don't see them there
[20:55:25] hmm, I'm guessing this is it (from Asana) The Search Platform team will primarily focus on updating our search and query service platform infrastructure to handle scaling requirements. To further the organization’s Thriving Movement objective, the team will also focus on improving the search experience for readers in emerging communities.
[20:57:16] sounds like a good guess :) I'm not sure either
[21:04:42] I found a bunch of stuff on "Thriving Movement" in the office wiki, so I'm going with it ;)
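
Note on the 16:17:19 port-forward tip: since several hosts had to be checked one after another, a small helper that prints the tunnel command per host may save some typing. This is only a sketch. The function names `tunnel_command` and `workbench_url` are made up for illustration; the host name, port 9999 and the `/bigdata/#query` path are the ones quoted in the log, and nothing here opens a connection by itself.

```python
import shlex


def tunnel_command(host: str, port: int = 9999) -> list[str]:
    """Build the ssh argv that forwards local `port` to the same port on `host`.

    Hypothetical helper; it mirrors the command pasted in the channel:
    ssh <host> -L 9999:localhost:9999
    """
    return ["ssh", host, "-L", f"{port}:localhost:{port}"]


def workbench_url(port: int = 9999) -> str:
    """URL of the bigdata UI as reached through the local end of the tunnel."""
    return f"http://localhost:{port}/bigdata/#query"


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(tunnel_command("wdqs1016.eqiad.wmnet")))
    print(workbench_url())
```

Per the later messages, the `status` tab of that UI is where running queries are listed; the query tab only shows history.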