[07:08:54] ebernhardson: good question. We'll probably get mostly zero traffic until we make an announcement (the traffic on the current beta endpoint is already very low). So I don't think that's going to make a huge difference. [07:09:34] In principle, it makes sense to have a clean go-live cutover. We can probably add an nginx flag to return 403 until we are ready [07:18:04] we can reuse the data reload flag for now, probably [07:33:51] gehel, mpham : I've added two subtickets to T280006 and modified the original description a bit [07:33:52] T280006: Set up the application authentication for WCQS on commons-query.wikimedia.org - https://phabricator.wikimedia.org/T280006 [07:34:30] point of the main one is to get ANY authentication to WCQS prod (it turned out not to be as easy as I originally thought, due to microsite deployment) [07:35:27] sub-tickets - one for replacing the current token cache with JWT, which will be less of a maintenance burden [07:35:45] second for exposing the sparql endpoint through api.wikimedia.org, somewhere down the line [07:36:18] errand (out to battle the bureaucracy) [08:19:16] zpapierski: thanks ! [09:39:38] Lunch [09:45:38] dcausse, zpapierski are you around ? [09:45:46] I'm around and looking [09:45:47] For the wdqs overload ? [09:46:09] seems someone is hammering wdqs and ending up reaching codfw [09:46:12] Thanks! I'm alone with Augustin for the next 15 minutes. Scream if you need me [09:47:16] That's the problem of not having throttling at the cluster level but only at the server level [09:52:22] not sure how to identify the culprit [10:01:06] I'm back [10:01:22] dcausse: can I help in some way? [10:01:45] I'm looking in the wild for a query or a pattern [10:02:19] best I have atm is in the nginx access logs [10:02:43] are they in logstash? [10:03:32] I don't know [10:04:08] I'll find out by myself [10:05:31] sigh... seeing user-agents like abot4bbw--obxzd :( [10:05:55] you think it's malicious? [10:06:36] it was yesterday [10:06:40] ebernhardson: zpapierski: I have updated the Jenkins job https://integration.wikimedia.org/ci/job/wikidata-query-gui-build/ [10:07:17] Erik changed some cleanup rule to keep a `custom-config.json` file https://gerrit.wikimedia.org/r/c/integration/config/+/714633/2/jjb/wikidata.yaml [10:07:43] yep, thanks! [10:08:00] now let's hope we didn't break anything already working :) [10:10:35] we will see ;] [10:12:23] I think I have a query [10:13:43] great, what next? [10:14:21] well not sure actually [10:15:00] we had a similar situation before, didn't we simply block the ip? [10:15:41] I mean I'm not sure it's the right query [10:15:50] ah, right [10:16:41] usage from a user-agent doing a query with a regex jumped from 1670 counts to 53790 today [10:18:19] sounds suspicious - was this user agent present for some time? [10:19:23] was here yesterday but not the day before, but this user-agent is quite generic [10:20:33] even more suspicious [11:09:35] I don't have other clues. gehel: if I have an ip, where should I put this? [11:10:25] dcausse: https://wikitech-static.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - there's a part about this [11:11:01] huh, it's only about ua actually [11:11:16] should have been caught by the ratelimiter on the ip then [11:11:27] but if it's a bot it's probably better to use UA ? 
[11:11:36] Probably varnish in the end, but we can do a quick hack on the nginx config [11:11:53] this mentions nginx only, unfortunately [11:13:07] for changes in varnish config, I would ping -traffic [11:13:17] ema is probably around at this time of day [11:13:52] but let's start by hacking the nginx config manually [11:16:15] original IP should be in X-Client-IP HTTP header [11:18:17] so probably: [11:19:07] ```if ($http_x_client_ip ~ 'some ip') { [11:19:07] return 429; [11:19:07] }``` [11:19:35] following the example on https://github.com/wikimedia/puppet/commit/d683630d32855fbe6b167d4d215d87d5aca61366 [11:19:56] how about ips - the user agent used only one? [11:20:13] zpapierski: not sure I understand [11:20:34] blocking a range of IPs? [11:20:57] following the advice from the runbook - if the code isn't malicious (as in, intentionally) it probably means some bot running somewhere [11:21:24] bots running from cloud instances may change ips rather quickly [11:21:38] also, might share ips with innocent applications [11:22:02] I'm just wondering if blocking ua instead of ip possibly makes more sense [11:22:19] if we have an identified UA, then yes, it makes a lot more sense [11:22:39] reading the backlog, I was under the impression that you identified an IP but no stable UA [11:23:14] dcausse: is there a stable UA? [11:23:27] ah, rate limit is on UA not ip? [11:23:49] maybe a pattern to UAs at least? [11:23:56] tbh I'm not sure that's it, it's a single ip with a single UA that was not here yesterday and hammered the query service today with 216k requests so far [11:24:20] the ratelimiter should have dealt with that [11:24:48] the rate limiter is based on clock time. If those queries generate enough parallelism, this isn't taken into account [11:25:26] we can ban on the UA even though it's quite generic [11:25:54] as long as we don't have a better option, we should try banning on the UA [11:27:46] * gehel has a meeting in a few, but can merge a puppet patch if someone is around to keep an eye on i [11:28:22] s/i$/it/ [11:29:27] there's a list of agents to ban in the deploy repo, I can use that I guess [11:31:16] gehel, zpapierski: https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/717283 [11:32:46] Lgtm [11:37:13] Jena ARQ, it's definitely generic :/ [11:37:23] but so are the others I see here [11:37:55] OTOH, banning generic UAs makes sense, developers should use specific UAs for their applications [11:38:14] that was the intent behind that ban-list [11:43:16] some servers seem to be recovering [11:43:44] should I do a rolling restart of blazegraph in codfw? [11:45:34] forgive my ignorance, but shouldn't they recover by themselves? [11:46:20] scap is doing it apparently [11:46:22] they **should**, but we know of cases of deadlocks under load in blazegraph, which don't recover [11:46:32] Oh right, scap would do that [11:47:20] the courier just told me he will be with 770kg of our bathroom tiles in my house in 10 min, I need to leave, be back in ~1h [11:47:40] zpapierski: good luck! [11:54:48] meh, not sure it worked, I still see 200 responses with this UA in nginx access logs... [11:56:51] we don't actually block those, but we have a single throttling bucket for them (if I'm reading that code correctly) [11:57:02] yes just saw that now [11:58:04] might make sense to still ban in nginx [12:00:38] meeting with Emmanuel. Scream if you need me [12:00:46] sure [12:00:53] I'll let the system run for a while [12:01:05] going to take a quick lunch [12:01:21] bon appétit! 
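(Side note on the "single throttling bucket" behaviour mentioned above: agents on the ban list are not rejected outright, they all just share one small rate budget. The sketch below only illustrates the idea - it is not the actual WDQS throttling code; it uses Guava's RateLimiter for brevity, and the patterns and rates are made up.)

```
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;

import com.google.common.util.concurrent.RateLimiter;

// Sketch of the "shared bucket" idea: generic/banned user agents all draw from one
// small rate budget, while everyone else gets a bucket per (agent, ip) pair.
public class AgentBucketing {
    // Illustrative patterns only; the real ban list lives in the deploy repo.
    private static final List<Pattern> BANNED_AGENTS = Arrays.asList(
            Pattern.compile("^Java/.*"),
            Pattern.compile("^Jena ARQ.*"));

    // Assumed rates, not the production values.
    private final RateLimiter sharedBucket = RateLimiter.create(1.0); // all banned agents combined
    private final ConcurrentMap<String, RateLimiter> perClientBuckets = new ConcurrentHashMap<>();

    public boolean allowRequest(String userAgent, String clientIp) {
        for (Pattern banned : BANNED_AGENTS) {
            if (banned.matcher(userAgent).matches()) {
                // Matching agents are not blocked, they just compete for one tiny shared budget.
                return sharedBucket.tryAcquire();
            }
        }
        RateLimiter bucket = perClientBuckets.computeIfAbsent(
                userAgent + "|" + clientIp, key -> RateLimiter.create(10.0));
        return bucket.tryAcquire();
    }
}
```

That is also why the UA kept showing up with 200s in the nginx access logs: requests trickle through at the shared rate instead of being refused, which is why an outright ban in nginx was still on the table.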
[12:45:53] does anyone know how search works on wikitech-static? It was mentioned in T290130. It would make no sense to have it based on CirrusSearch, so I'm wondering... [12:45:54] T290130: Incident response tools operational readiness review - https://phabricator.wikimedia.org/T290130 [12:51:31] gehel: seems to be using the default db search [12:53:01] I never heard about this page until yesterday and now I've heard about it three times - what is wikitech-static? [12:53:39] a "static" copy of wikitech, so that we have access to operational docs in case we fully lose access to our DC [12:53:55] ah, I see [12:54:03] makes sense to have something like that [12:54:11] it's hosted externally and shares no infrastructure with wikitech [13:08:35] sigh... codfw is hit again [13:18:42] complete ban then? [13:19:03] crap :/ [13:21:33] we could try, but it could also be a single query... [13:21:52] regexp one? [13:22:49] if we know what query it is [13:45:19] need to take a break - if I'm needed, ping me, I'll see a mention on my phone [13:46:18] dcausse: want to jump in a meet to exchange ideas? [13:46:24] sure [13:46:33] meet.google.com/ifn-rdjw-nrp [14:20:58] we do have a SystemOverloadFilter that is in place, but not active [14:21:13] seems that we invent a new mechanism each time we run into trouble :/ [14:26:32] :) [14:26:50] maybe we should just start by activating that one [14:28:58] we can try but not sure the load is a good indicator here [14:29:19] not sure either [14:35:00] 2189 threads in sun.misc.Unsafe.park [14:35:41] wouldn't that be just threads that have been returned to the pool, but not yet recycled? [14:39:15] yes, these should be threads doing nothing, either waiting for a job or something else [14:41:58] I'm stupid, the concurrency limit should be done in nginx [14:42:41] actually, not sure, we might get requests terminated by timeout at nginx level, but which still have something dangling on the blazegraph side [14:43:45] first version, without logging of long requests [14:44:15] yes nginx receiving a 502 might indicate leaked resources on the blazegraph side [14:58:17] anything I can help with? [15:00:38] zpapierski: review of https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/717436 ? [15:00:46] on it [15:02:02] thx [15:02:50] \o [15:04:16] gehel: I'm assuming that you left the creation of the Duration and then getSeconds due to JIT? [15:04:53] don't assume! [15:05:05] honestly, this should be a config parameter. Or at least a constant [15:05:09] good catch! [15:05:22] yep, that's what I had in my comment [15:05:30] o/ [15:05:42] but performance-wise it isn't super important - JIT will inline this immediately [15:07:05] corrected [15:08:14] gehel: I wonder if you should avoid this semaphore for "local" queries [15:08:42] local like the LVS checks? Or monitoring? [15:09:35] yes and the updater [15:09:48] gehel: done with the review [15:13:39] ok, after the latest patchset the build should be ok (spotbugs complained about the same thing I did) [15:16:05] Semaphore, a class I haven't seen in some time :) [15:16:41] I need to do some more low-level Java concurrency coding [15:16:53] frameworks like Flink make me forget about it [15:17:22] * gehel isn't sure he ever used a Semaphore in real code before! [15:19:16] I did, but rarely - CountdownLatch, otoh, was my very good friend for my Master's thesis [15:20:12] zpapierski, dcausse, mpham, ryankemper: would you have time to jump in a meet for a situation update? 
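(For context on the patch being reviewed above: the concurrency limit in https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/717436 is essentially gating requests on a java.util.concurrent.Semaphore. Below is a rough sketch of that pattern as a servlet filter - illustrative only, not the actual patch; the class name and limits are made up, and as discussed above the real change should also bypass the semaphore for "local" traffic such as the updater and LVS/monitoring checks.)

```
import java.io.IOException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

// Sketch only: cap how many queries run concurrently, shedding the excess with a 503
// instead of letting threads pile up inside Blazegraph.
public class ConcurrencyLimitFilter implements Filter {
    private static final int MAX_CONCURRENT_REQUESTS = 50;  // illustrative; should be a config parameter
    private static final long ACQUIRE_TIMEOUT_SECONDS = 1;  // illustrative

    private final Semaphore permits = new Semaphore(MAX_CONCURRENT_REQUESTS, true);

    @Override
    public void init(FilterConfig filterConfig) {
        // nothing to initialize in this sketch
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        boolean acquired = false;
        try {
            acquired = permits.tryAcquire(ACQUIRE_TIMEOUT_SECONDS, TimeUnit.SECONDS);
            if (!acquired) {
                // Too many queries already in flight: fail fast rather than queueing more work.
                ((HttpServletResponse) response).sendError(
                        HttpServletResponse.SC_SERVICE_UNAVAILABLE, "Too many concurrent requests");
                return;
            }
            chain.doFilter(request, response);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            ((HttpServletResponse) response).sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
        } finally {
            if (acquired) {
                permits.release();
            }
        }
    }

    @Override
    public void destroy() {
        // nothing to clean up
    }
}
```

The fairness flag on the Semaphore just keeps waiting requests in FIFO order; the more interesting knob is the tryAcquire timeout, which decides how long a request may wait before being shed.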
[15:20:19] sure [15:20:23] https://meet.google.com/ifn-rdjw-nrp [15:44:16] dcausse: do we have actual deadlocks in Blazegraph? [15:44:48] zpapierski: deadlocks like ones detected by jstack? [15:44:54] yep? [15:45:11] or do you suspect livelocks? [15:45:14] I found a few a long time ago but not in this instance [15:45:52] https://phabricator.wikimedia.org/T242453 [15:50:07] the Unsafe.park threads are not deadlocks [15:50:26] they look more like a thread pool that had to grow but hasn't shrunk back yet [15:50:31] yep, I know [15:50:58] but high numbers there may point to livelocks [15:51:08] might be [15:51:24] we do have resource contention between reads and writes [15:51:29] (live locks are like a nemesis of mine from the old days) [15:51:39] users start to notice: https://phabricator.wikimedia.org/T290330 [15:51:46] * gehel also has some memories of livelocks [15:52:10] the resource contention is probably exacerbated by the fact that the current updater does fairly large batches of writes [15:53:48] sounds ok? https://docs.google.com/document/d/1OQzlgbeE_pBds7XrcAusnWdZTAk9HBohhQDJojY7TiU/edit?usp=sharing [15:55:32] sounds good [15:56:00] LGTM [15:56:13] do we have a ticket already on this issue? [15:56:18] we should link it to the email [15:56:32] I'll create one [15:57:28] gehel: link this one T290330 [15:57:28] T290330: 502 Bad Gateway on WDQS - https://phabricator.wikimedia.org/T290330 [15:57:35] will do [15:58:05] Oh, I'll close mine as a duplicate [16:01:02] zpapierski: can you also add ops@ to that email? [16:01:16] it's a tad bit late, since I just sent it out [16:01:30] but I'll forward it there [16:01:33] what's that, btw? [16:02:07] ops@lists.wikimedia.org [16:02:24] I think I need to sign up there first, give me a sec [16:02:32] the list related to all things operational [16:02:45] it might need approval. I'll forward it right now. [16:02:55] ok, thx [16:07:24] wdqs2001 got back into the fold - somebody did something? [16:08:01] yes, blazegraph restart [16:09:13] no, I wonder what restarted it [16:09:19] me [16:09:24] ok :) [16:09:31] I should log that [16:10:12] Missed a few hours of #wikimedia-operations logs since nickserv booted me overnight, but I gather that codfw wdqs is getting slammed? [16:10:23] ryankemper: yes [16:10:34] ack, catching up [16:11:15] ryankemper: I've completely hacked a periodic restart by manually creating /etc/cron.hourly/restart-blazegraph on wdqs200[12347] [16:11:37] in a way that puppet won't override it? [16:11:58] gehel: ack. is it more valuable for me to be trying to look into what's causing it, or should I work on getting an actual puppet patch for a systemd timer to restart every 1hr? 
[16:12:01] I don't think puppet will clean up that dir, but I haven't tested yet, there is puppet maintenance in progress [16:12:16] cool, we'll know soon enough [16:12:17] ryankemper: focus on the puppet patch first [16:12:38] ack, we have that systemd timer for the elasticsearch madvise stuff so I should be able to steal what I did for that and tweak it very slightly [16:12:41] and try to randomize it so that not all servers restart at the same time [16:12:48] * gehel needs to prepare dinner for the kids, away for a bit [16:13:08] got it [16:13:15] I won't be here on Monday, at least not for the whole day - I can come in after my presentation [16:13:46] not sure if systemd timers have an elegant way to add randomness but if not then I can have the timer fire every hour but the actual command it would run would be like "sleep between [0-600]s then restart" [16:13:50] I'd try to be here before it, but from experience I know that before any public appearance I'm a completely unfocused mess [16:14:26] ryankemper: cron definitions allow for random values, e.g. for minutes [16:15:28] minute => fqdn_rand(60), [16:15:33] for example [16:16:05] you'll find more complete examples in modules/query_service/manifests/crontasks.pp [16:17:14] there's RandomizedDelaySec [16:18:23] yeah that's what i'm looking at too dcausse [16:18:37] and if for whatever reason the systemd timer way doesn't play nice i'll do an actual cron per zpapierski's ex [16:27:31] I have a meeting now, but I can jump in if required. I also should be around for at least another 1.5h (after that it might be difficult) [16:27:46] btw looks like bugreporter suggested depooling before restart: https://phabricator.wikimedia.org/T290330#7331288 [16:27:52] that will add complexity so we don't want to do that [16:28:48] (imo) [16:28:50] we thought about this, but basically agreed with what you said here [16:29:30] cool [16:29:54] yeah I think the impact of not depooling is pretty minor, we are gonna drop some in-flight requests but it should be a tiny % of the total if we're restarting every 60 mins [16:30:23] in other news, my current implementation will be restarting eqiad too, is it worth restricting to just codfw? [16:31:03] we're not really guaranteed that the issue won't crop up on eqiad too, but ofc it does mean a small # of eqiad queries will be impacted [16:31:03] ryankemper: yes I think so [16:31:31] I agree [16:31:45] sounds reasonable [16:35:53] should have the patch up in 2 mins [16:44:25] https://gerrit.wikimedia.org/r/c/operations/puppet/+/717494 [16:46:49] dcausse: zpapierski: any objection to me merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/717494 ? 
[16:47:09] +1 [16:47:22] cool, will merge and test it out [16:49:45] Bleh I can't stick RandomizedDelaySec where I did: [16:49:48] https://www.irccloud.com/pastebin/3ujIO55G/ [16:50:39] gonna see if I can fix it quickly, otherwise will just build the randomness into the shell script instead I guess [16:51:59] `start` can only be one of the settings in table 1 here: https://www.freedesktop.org/software/systemd/man/systemd.timer.html but it's not clear what syntax puppet will want for `RandomizedDelaySec` [16:54:44] Based off https://github.com/wikimedia/puppet/blob/d793d599e858e5d6f486b838017113e0cac80bd0/modules/systemd/manifests/timer.pp#L11-L16 ` it will probably be called `splay` [16:54:53] Based off https://github.com/wikimedia/puppet/blob/d793d599e858e5d6f486b838017113e0cac80bd0/modules/systemd/manifests/timer.pp#L11-L16 it will probably be called `splay`* [16:56:01] Reverting the initial patch while I hash this out [17:01:43] Gross, so `systemd::timer::job` is our (wmf) abstraction of a job implemented as a systemd timer. but it doesn't expose the part of the timer we need for `RandomizedDelaySec` [17:02:42] :/ [17:03:18] Probably should add that capability to `systemd::timer::job` in the future since we're replacing all crons...anyway, moving on [17:04:52] Choices are just do it the cron way or add a random sleep in the bash script [17:05:12] At this point the latter is probably quicker so will try that out real quick [17:05:20] agreed [17:12:19] Okay trying out https://gerrit.wikimedia.org/r/c/operations/puppet/+/717508 [17:15:08] Testing on `wdqs2001`, timer created properly, and I can see it waiting to finish its sleep (fortunately I got lucky and it's sleeping for `106s` whereas the expected value would be `300s`) [17:15:10] https://www.irccloud.com/pastebin/hrPfESY4/ [17:15:50] Currently with `sudo systemctl list-timers` it looks like this which looks "wrong" (the n/a implies it won't run again), but because it's still in the middle of the run of the timer that might be why [17:15:58] `n/a n/a Fri 2021-09-03 17:13:36 UTC 1min 40s ago wdqs-restart-hourly-w-random-delay.timer` [17:16:13] so if the `n/a` doesn't go away after the sleep & restart complete then I'll need to tweak the config [17:16:33] `Fri 2021-09-03 18:08:36 UTC 52min left Fri 2021-09-03 17:13:36 UTC 2min 38s ago wdqs-restart-hourly-w-random-delay.timer` okay perfect, it works, deploying to fleet [17:38:02] Stepping out for 5-10 mins to eat some quick food but we have the mitigation in place and working. I'll need to clean up g.ehel's manually created crons but besides that it sounds like https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/717436 is what to focus on next? [17:41:52] I +2ed, but it probably won't help [17:48:07] seems that blazegraph already has this kind of limit [17:49:33] which is set to 2000 threads (executorMaxThreads), emitting 503 when the number of active threads is past that [17:50:28] there are 2000+ threads created when the server hangs so it perhaps means that this limit is too high [18:00:57] related T206189 [18:00:58] T206189: Set sensible thread limit to Blazegraph - https://phabricator.wikimedia.org/T206189 [18:03:08] need to drop off to take care of the kids. if I'm needed, just ping me here [18:54:15] ryankemper: do you need any help with the cleanup ? Are we good for the weekend ? [18:54:33] I'll keep an eye on things tomorrow at least. 
[18:58:30] gehel: Yeah I think we're good going into the weekend, I'm not getting any query failures on the random codfw hosts I've been tunneling to [18:58:42] I'll be around this weekend to keep an eye on things as well [19:00:01] gehel: Do you know if we track WDQS frontend response codes anywhere? Looks like https://grafana.wikimedia.org/d/000000522/wikidata-query-service-frontend?orgId=1 is out of date [19:00:26] It'd be good if there were a simple way to see if we start getting `502` again...maybe logstash / kibana? [19:01:47] We probably have that in the webrequest logs in hive, but I'm not clear on how to dig into it. ebernhardson might know better. [19:02:05] Okay [19:02:18] all webrequests are available in a log, a little slow to access but not horrible. There might be something more wdqs-specific though [19:02:22] gehel: one last question, I noticed your cron restarted updater too but the fix I rolled out is just restarting blazegraph [19:02:30] s/wdqs-updater/updater [19:02:42] Was that just out of an abundance of caution or is there a reason I should be making sure to restart both? [19:03:25] The updater should be robust enough to recover, but if Blazegraph is down for too long, the updater will exit and not restart [19:03:57] ack [19:07:40] ryankemper: something like this to get status codes: https://superset.wikimedia.org/superset/sqllab/?savedQueryId=318 [19:08:25] those should be the status codes returned to users at the edge, if i remember right about how that log is collected [19:09:01] i guess thats not a simple way to see your answer though, it's hourly data and lagged [19:10:14] while not as easy as being able to see a full visualization, it's def good enough [19:10:18] ebernhardson: I get `(MySQLdb._exceptions.OperationalError) (2005, "Unknown MySQL server host 'analytics-slave.eqiad.wmnet' (-2)") (Background on this error at: http://sqlalche.me/e/13/e3q8)` when I try to run the query tho [19:10:46] ryankemper: hmm, maybe the link didn't maintain the database set? in top left it should say Database: (presto) presto_analytics_hive [19:11:14] ebernhardson: that's gotta be it. weirdly though not only is `database` not filled in but when I click the ui element there are no options to select [19:11:48] hmm, maybe it's something with ldap groups [19:12:13] these logs are PII so are gated by some groups, i suppose i thought ops would have that access anyways [19:12:44] we definitely *should*, perhaps it's rare enough for ops people to go spelunking thru hive that it was never noticed [19:13:07] / the subset of ops ppl who have access have probably been here for years and got perms at some point (just conjecture but it kinda makes sense) [19:13:57] ryankemper: as to the frontend logs being out of date, i just noticed the link you sent had 2018 selected. 
Choosing last 24 hours has some data, but doesn't seem to show this codfw problem [19:14:09] ebernhardson: https://superset.wikimedia.org/superset/profile/ebernhardson you're an `admin`, I'm not [19:14:10] https://grafana.wikimedia.org/d/000000522/wikidata-query-service-frontend?orgId=1&from=now-24h&to=now [19:14:35] oh haha how weird it defaults to 2018 [19:15:36] i suppose the only thing about codfw that stands out is codfw p50 hit 1s and stayed there, i wonder what that is though as 1s seems too low for the timeout, but it's a hard line [19:15:56] i guess 1.2s [19:17:28] are you just looking at a graphite metric directly (ie playing around in a temp grafana panel) [19:17:39] or is there a great of p50 somewhere [19:17:39] a graph* [19:17:52] ryankemper: bottom left of wikidata-query-service-frontend [19:17:55] err, bottom right [19:18:01] 'Varnish Latency' graph [19:18:38] i suppose i don't know what exactly that is reporting, was just the only metric i noticed on this dashboard that seems to have gone out of line in codfw today [19:19:38] very interesting [19:19:51] the 1.20s ceiling is more than just the last 24 hours, but if you go back a week you'll see periods where it's below that, and periods where it's at the ceiling [19:21:00] looks like that graph comes from trafficserver based on the names of the metrics, so it likely includes all requests [19:21:29] I don't understand how most of the traffic layer works but maybe it has a really aggressive "timeout" for getting an initial response back? like if it doesn't get any packets at all it just drops it? [19:21:40] 1.20s seems both way too small and arbitrary but you're right that there's definitely a hard ceiling there [19:21:48] hmm, yes there often is a separate connection timeout and a waiting-for-response timeout [19:22:01] basically i'm thinking somehow the fact that the backend blazegraph is completely nonresponsive is what is causing that [19:22:05] i'm not 100% sure on how the connect timeouts work, but in many cases thats how long the other side has to say 'i exist' [19:22:13] as opposed to a "just overloaded" scenario [19:22:17] yea that seems plausible [19:23:56] so maybe we have a 1.2s connection timeout and a 60s request timeout [19:23:56] not clear if thats a good number to alert on, seems plausible but i'd like to know more about the 1.2s :) [19:24:06] ok [19:24:07] yea [19:25:05] So this graph is actually quite useful to us, it looks like `p99` is always going to be 1.20 even with the hourly restarts, which is not too surprising, but we should expect to see p50 and probably p95 well below 1.20 [19:25:38] naturally if we see p50 at 1.20 then our service availability is really bad [19:27:04] Yeah this lines up with what I expect, if I look 6 hours back I see even p50 was at 1.20 thus why users were complaining [19:28:39] yea that sounds like a good route. 
I'm also a little sad about that p99, means at times we are probably dropping 1 in 100 requests [19:29:12] i wonder if retries at the trafficserver layer cover that up somehow, seems users would complain more [19:29:26] * ebernhardson then wonders if trafficserver does retries :P [19:31:13] > I'm also a little sad about that p99, means at times we are probably dropping 1 in 100 requests [19:31:15] yeah i had the same thought [19:31:25] and the same question about retries [19:34:18] And this is more of a longer-term thing, but sometimes I wonder if these dead/livelock issues aren't as load-based as we think (or they are correlated but will still happen after improving our throughput), in which case the streaming updater would still help us catch up lag faster but wouldn't help out with this problem (which imo is our biggest problem given how almost every wdqs production incident is some version of this dead/livelock problem) [19:39:16] it does seem possible. A subset of users are going to hate it but at some point I suspect the only way to reduce load will be to cut down the timeouts and prevent long-running queries from eating up compute [20:50:08] hmm, with the puppet patch merged to change wcqs from insetup to wcqs::public i had expected them to come up, but ssh to them still gives the Password: prompt we get from brand new instances, once a normal role is installed password auth gets disabled [20:51:21] hmm, actually no i'm totally forgetting and i guess we don't disable password auth. So it only means i don't have ssh access there yet :) [22:32:07] ebernhardson: around? [22:32:17] Do you want me to merge your gui-deploy patch? [22:32:28] (or anything you want from wmde in general)