[00:20:15] serviceops, All-and-every-Wikisource, Thumbor: Thumbor fails to render thumbnails of djvu/tiff/pdf files quite often in eqiad - https://phabricator.wikimedia.org/T337649 (Soda) >>! In T337649#8900787, @Xover wrote: > @Samwilson @Soda @Tpt @Inductiveload PRP + OSD seems to be causing 2 thumb loads per...
[07:11:52] serviceops, All-and-every-Wikisource, Thumbor: Thumbor fails to render thumbnails of djvu/tiff/pdf files quite often in eqiad - https://phabricator.wikimedia.org/T337649 (KTT-Commons) Not sure how this information is useful to you, but Soda suggested that my troubles in T338100 may relate to the same...
[08:12:15] FYI, I'll be moving the codfw URL downloaders to new bullseye replacement VMs in a bit (and then eqiad tomorrow)
[08:22:24] <_joe_> moritzm: uhm, you mean the IPs will change?
[08:22:45] <_joe_> we need to wait until we deploy a change to all k8s deployments, I think
[08:22:55] <_joe_> to add the new IPs to the egress rules
[08:23:01] <_joe_> jayme: ^^
[08:23:03] that's already done
[08:23:15] <_joe_> and we even redeployed everything already?
[08:23:34] <_joe_> (also janis is off, so I guess that leaves me)
[08:23:37] yeah, I lost track of which patches/tasks were involved, but it's all configured and redeployed
[08:23:45] <_joe_> ok
[08:24:14] but happy to wait if you want a second check, there's no need to rush
[08:24:39] <_joe_> just let me take a look very quickly
[08:24:44] <_joe_> what's the new hostname?
[08:25:54] urldownloader2003 and urldownloader2004, DNS will point to 2004
[08:28:22] <_joe_> ah we're using a globalnetworkpolicy now, great
[08:28:28] <_joe_> yeah it should just work(TM)
[08:35:25] thanks for double-checking, will merge in ~10m
[08:39:39] <_joe_> I would warn the oncall people anyways
[08:40:27] serviceops: restbase1027.eqiad.wmnet down - https://phabricator.wikimedia.org/T338122 (Clement_Goubert)
[08:41:49] will do
[08:51:08] serviceops: restbase1027.eqiad.wmnet down - https://phabricator.wikimedia.org/T338122 (Clement_Goubert) Open→Resolved Rebooted OK from console powercycle, all alerts cleared.
[09:15:05] serviceops, All-and-every-Wikisource, Thumbor: Thumbor fails to render thumbnails of djvu/tiff/pdf files quite often in eqiad - https://phabricator.wikimedia.org/T337649 (Xover) >>! In T337649#8901152, @Soda wrote: >The dual image load is being caused by a high dpi screen and overall should only hit...
[09:26:15] serviceops, All-and-every-Wikisource, Thumbor: Thumbor fails to render thumbnails of djvu/tiff/pdf files quite often in eqiad - https://phabricator.wikimedia.org/T337649 (Xover) The situation appears (subjectively) to have been worsening over the last few days. Over the last couple of hours I've bee...
[09:35:40] jelto: good morning, we have a `git::clone` failure on the new releases1003 because it tries to clone from Gerrit rather than GitLab. Could you puppet-merge Daniel's fix at https://gerrit.wikimedia.org/r/c/operations/puppet/+/925033 ? :)
[09:48:44] I'm going to switch the URL downloaders back to the old proxy for now; the Citoid requests don't seem to reach the new proxy yet. Will check with Janis when he's back
[09:52:26] hashar: I left a review. Also ccing eoghan as he worked mostly on releases servers
[09:53:26] serviceops, All-and-every-Wikisource, Thumbor: Thumbor fails to render thumbnails of djvu/tiff/pdf files quite often in eqiad - https://phabricator.wikimedia.org/T337649 (Xover) Also worth noting: the testing I reported above paints a completely different picture of the severity compared to what I am...
[09:56:51] serviceops, All-and-every-Wikisource, Thumbor: Thumbor fails to render thumbnails of djvu/tiff/pdf files quite often in eqiad - https://phabricator.wikimedia.org/T337649 (Joe) Yes, it's pretty clear there is a bug that, over time, is causing us to get to the point where djvus/tiffs/pdfs are unrenderabl...
[10:17:38] jelto: thanks :) I went ahead and manually fixed the glitch by doing a manual clone
[10:18:18] and poked the relevant task with the explanation. Looks like `git::clone` needs to take care of changing the remote URL when the repository name or the type (gerrit vs gitlab) is changed
[10:18:24] anyway puppet passes now :]
[10:28:43] release-tools on the old releases host still has an ownership issue. So you just ran a manual clone on the new releases host?
[10:29:33] I'd favor doing the setup with puppet instead of running manual commands, to be sure we cover everything in puppet
[11:00:33] serviceops, All-and-every-Wikisource, Thumbor: Thumbor fails to render thumbnails of djvu/tiff/pdf files quite often in eqiad - https://phabricator.wikimedia.org/T337649 (MatthewVernon) [I might be wrong, but I don't think in this particular case the swift and thumbor issues are related - in particul...
[11:31:29] serviceops, Data-Engineering, Event-Platform Value Stream (Sprint 14 A), Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (JArguello-WMF)
[11:31:39] serviceops, Data-Engineering, Event-Platform Value Stream (Sprint 14 A), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (JArguello-WMF) Open→Resolved
[11:31:53] serviceops, Data-Engineering, Event-Platform Value Stream (Sprint 14 A), Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (JArguello-WMF) Open→Resolved
[15:26:20] So, we're at a point where we're ready to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/924596 , which causes: https://puppet-compiler.wmflabs.org/output/924596/41506/mw1419.eqiad.wmnet/index.html (for T334703 )
[15:27:31] the new endpoints are functional everywhere on the pybal side now, e.g.:
[15:27:34] $ curl -I http://pybal-low-traffic.svc.eqiad.wmnet:9090/metrics
[15:27:36] HTTP/1.1 200 OK
[15:27:39] Date: Mon, 05 Jun 2023 15:27:23 GMT
[15:27:41] Content-Length: 1271741
[15:27:44] Content-Type: text/plain; version=0.0.4; charset=utf-8
[15:27:46] Server: TwistedWeb/18.9.0
[15:28:04] --
[15:28:34] I didn't really want to push it without checking in here, but we've also got codfw LVS hardware replacements going on today (and more this week I assume), so it's timely to avoid issues, too.
[15:30:30] but also, you're almost all in EU-side timezones, so I dunno if anyone's even around to check up on it
[15:30:45] worst case, we can revert, but it seems pretty sane to go now and has joe's +1 from last week.
[15:46:05] serviceops, All-and-every-Wikisource, Thumbor: Thumbor fails to render thumbnails of djvu/tiff/pdf files quite often in eqiad - https://phabricator.wikimedia.org/T337649 (Soda) >>! In T337649#8901633, @Xover wrote: >>>! In T337649#8901152, @Soda wrote: >>The dual image load is being caused by a high...
[15:53:12] bblack: sorry, we had our team meeting -- sounds good
[15:53:19] bblack: we're good with it, +1'd. Just check with rzl about doing a service restart afterwards to see if everything works well
[15:54:08] ack, thanks, you two :)
[15:56:38] bblack: yeah whenever you're done deploying, you can go ahead and `sudo restart-php7.4-fpm` on any appserver, and as long as that still works, we're happy
[15:59:56] ran agent + tested your command on mw1419, seemed to work fine
[16:00:03] gonna let the rest roll out naturally
[16:02:35] FYI, this morning I also pushed a similar fix for a wikidata maintenance script that polls pybal to deduce x-dc maxlag
[16:02:38] https://gerrit.wikimedia.org/r/c/operations/puppet/+/927200
[16:02:58] that one was pretty broken as soon as we started the lvs2009 decom, so we went with the Be Bold option to fix it back up
[18:15:10] serviceops, RESTbase Sunsetting, Parsoid (Tracking), Patch-For-Review: Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (daniel) p:Triage→High
[18:19:06] serviceops, RESTbase Sunsetting, API Platform (RESTbase Deprecation Roadmap), Epic, Platform Engineering Roadmap: Survey RESTBase services and find which ones accesses Parsoid via RESTBase - https://phabricator.wikimedia.org/T333536 (daniel) p:Triage→Medium