[13:04:02] subbu: when you're around, could use info on what (if any) retries you're aware of for parsoid submissions (parse or save), e.g. from restbase, envoy, varnish, or specific clients like VE. https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-01_partial_parsoid_outage
[13:04:40] also, there was a 10% outage for 9 hours :)
[13:39:53] huh ... is php-fpm restarted after train deploys?
[13:41:12] "Impact: For 9 hours, 10% of submissions to Parsoid to parse or save wiki pages were failing on all wikis." ... that probably doesn't translate to the same user-facing impact because parse requests are retried by restbase ... and even if the parse request fails, on the next source edit of that page OR on a VE edit of the page, it will get reparsed.
[13:41:51] And for VE to save a page, it POSTs to RESTBase to get the wt to save .. and once again I expect RESTBase would retry.
[13:42:14] but, i don't know how many retries, and if it is uniform across all api endpoints or only for some endpoints
[13:42:32] I expect if we did have 10% user-facing impact on VE saves, we would have heard from users.
[13:43:31] and i don't know any more about retries from envoy or varnish or if VE itself retries.
[13:52:30] https://logstash.wikimedia.org/app/dashboards#/view/AW4Y6bumP44edBvO7lRc?_g=h@0e494b1&_a=h@8679c0e shows the parsoid-php fatals in the first chart.
[13:53:38] Krinkle: the 'parsoid' link on https://logstash.wikimedia.org/app/dashboards#/view/default should go to the above dashboard ^ without the query params (that link still goes to the now defunct parsoid-js one).
[13:56:40] subbu: is that link meant to be the default view of the Parsoid-PHP dash? (I get an error that my session is not your session and thus I don't see the modified query)
[13:58:22] right ... https://logstash.wikimedia.org/app/dashboards#/view/AW4Y6bumP44edBvO7lRc doesn't open for you?
[13:58:58] it does, shows me the same default view but without error message :)
[13:59:14] I went to last 24h which I'm guessing is what you were doing
[13:59:21] default is 15min
[13:59:27] right.
[13:59:29] if there were other changes or ad-hoc queries, I wouldn't see those
[13:59:37] unless you share>short permalink
[13:59:45] got it.
[14:00:05] the local query modification is in localStorage I think so those query params don't work for anyone else. It's a bit of a weird system.
[14:00:26] anyway, I didn't know restbase would re-try save attempts.
[14:00:34] that's cool.
[14:00:37] yes, i normally do that (share the short permalink) but i got distracted by sharing the link to fix the default link ..
[14:00:49] restbase isn't retrying 'save' attempts .. it is retrying a 'transform' request.
[14:01:01] it probably also is the only instance in prod I'm aware of where a user-facing request that involves MW backend db writes is re-tried.
[14:01:09] VE issues a save of the (transformed html to) wt it gets from restbase.
[14:01:41] all parsoid requests are format transforms. so, safe.
[14:01:43] ah right, so it doesn't use the restbase proxy for saving the wt as well.
[14:02:01] I believe restbase does have a save endpoint as well, but VE only calls RB/Parsoid for transform
[14:02:06] which should be safe to retry indeed
[14:02:20] makes sense
[14:02:31] i haven't kept up with restbase .. and its save endpoint ... but yes, afaik, VE doesn't use that.
[14:03:22] there's also the MW-side edit stashing optimization, which is another reason why VE's MW-API endpoint would want to get hold of the wikitext, so that it can feed it to that and store it under the user session.
[14:03:46] there is also a 'review changes' in ve which needs it.
[14:03:51] right
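
The point made in the log, that Parsoid requests are pure format transforms and therefore safe to retry, can be sketched in a few lines of Python. This is only an illustration and not RESTBase's actual retry logic: `TransientError`, `retry_idempotent` and `overall_failure_rate` are hypothetical names, the retry count is an assumption (the log says the real count is unknown), and the failure-rate arithmetic assumes attempts fail independently, which may not have held during the incident.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable backend failure (e.g. a 5xx)."""


def retry_idempotent(call: Callable[[], T], attempts: int = 3,
                     base_delay: float = 0.0) -> T:
    """Run `call` up to `attempts` times, retrying only on TransientError.

    Only appropriate when `call` has no side effects (like an html -> wt
    transform); a request that writes to the database should not be
    blindly retried this way.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff between attempts (no sleep if base_delay is 0).
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def overall_failure_rate(per_attempt: float, attempts: int) -> float:
    """Chance that every attempt fails, ASSUMING independent failures."""
    return per_attempt ** attempts
```

Under the independence assumption, a 10% per-request failure rate with a single retry (`overall_failure_rate(0.1, 2)`) leaves roughly 1% of transforms failing outright, which would be consistent with the guess that user-facing impact was well below 10%.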