[03:59:29] serviceops, CX-cxserver, RESTBase Sunsetting, Language-Team (Language-2023-October-December): Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (santhosh) I think we have a serious problem here. At https://phabricator...
[07:29:34] serviceops, MW-on-K8s, Observability-Logging: Some apache access logs are invalid json - https://phabricator.wikimedia.org/T340935 (Joe) The change should now be live, I'm tentatively re-closing this task as I can only find truncated messages that are unparsed now, not any due to bad encoding.
[07:29:41] serviceops, MW-on-K8s, Observability-Logging: Some apache access logs are invalid json - https://phabricator.wikimedia.org/T340935 (Joe) Open→Resolved
[08:38:24] serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[09:05:25] serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[09:08:51] serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[09:13:26] serviceops, Data-Engineering, Event-Platform: Upgrade change propagation to nodejs18 - https://phabricator.wikimedia.org/T348950 (elukey)
[09:20:05] serviceops, MW-on-K8s, MediaWiki-Engineering: EtcdConfig using stale data: lost lock in /srv/mediawiki/php-1.42.0-wmf.1/includes/config/EtcdConfig.php on line 218 - https://phabricator.wikimedia.org/T349376 (Joe) @Krinkle the instrumentation added doesn't distinguish between failing to get a lock bec...
[09:25:27] serviceops, CX-cxserver, RESTBase Sunsetting, Language-Team (Language-2023-October-December): Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (daniel) >>! In T344982#9300282, @santhosh wrote: > CX specifically need t...
[09:57:34] serviceops, CX-cxserver, RESTBase Sunsetting, Language-Team (Language-2023-October-December): Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (santhosh) > If you need access to pagebundles or the transform endpoints,...
[09:59:14] <_joe_> heads up: I've just pushed a mass update of our docker images, as part of T344478
[09:59:27] <_joe_> this process will keep running every week on sunday now
[10:33:35] hnowlan: o/
[10:33:53] how are the beta changeprop values applied? Do we run cp via docker in there, or similar?
[10:34:47] elukey: it's quite manual https://wikitech.wikimedia.org/wiki/Changeprop#To_deployment-prep
[10:35:11] if the config values aren't changing you don't need to worry about rolling the change out, but I don't want to get caught out by them getting out of sync
[10:35:23] although i guess they are changing :)
[10:37:13] ah wow TIL
[10:38:57] it's not ideal but with no k8s it works surprisingly well
[10:40:38] okok I'll update the patch
[10:41:00] the only thing that I don't like is that one needs to remember to modify those, if we modify helmfile.d
[10:41:09] maybe I can add a comment in helmfile's values
[10:41:13] so we remember
[10:44:42] sounds good
[10:52:21] serviceops, CX-cxserver, RESTBase Sunsetting, Language-Team (Language-2023-October-December): Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (daniel) >>! In T344982#9300861, @santhosh wrote: >> If you need access to...
[11:07:46] <_joe_> uhm
[11:07:53] <_joe_> I see some failures to build images
[11:08:09] <_joe_> elukey: docker-registry.discovery.wmnet/kserve-build fails to build
[11:08:28] <_joe_> ottomata: docker-registry.discovery.wmnet/eventrouter fails to build
[11:08:33] <_joe_> I'll open a task
[11:10:30] _joe_ Tobias upgrade the docker images recently to a new version, but IIRC they were working
[11:13:46] <_joe_> err sorry, jayme
[11:13:52] <_joe_> for eventrouter
[11:14:56] serviceops, Machine-Learning-Team: Multiple images fail to build from sources - https://phabricator.wikimedia.org/T350366 (Joe)
[11:15:08] serviceops, Machine-Learning-Team: Multiple images fail to build from sources - https://phabricator.wikimedia.org/T350366 (Joe) p:Triage→High
[11:15:21] serviceops, Machine-Learning-Team: Multiple images fail to build from sources - https://phabricator.wikimedia.org/T350366 (Joe) a:Joe
[11:16:37] _joe_ are you filing a patch or should I ?
[11:17:31] <_joe_> elukey: doing so
[11:17:38] <3
[11:19:47] <_joe_> elukey: https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/971128
[11:21:17] already +1ed
[11:47:02] hnowlan: hopefully fixed https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/971113, lemme know when you have a moment :)
[11:53:42] serviceops, Machine-Learning-Team: Multiple images fail to build from sources - https://phabricator.wikimedia.org/T350366 (Joe)
[11:53:48] thanks for fixing _joe_
[12:00:25] serviceops, Machine-Learning-Team: Multiple images fail to build from sources - https://phabricator.wikimedia.org/T350366 (Joe) The error in the loki build is due to its dependency on the fact it depends on golang-1.13 which has been dismissed years ago. @colewhite do you think we can just remove the lok...
[12:01:04] <_joe_> btullis: do we really need 3 versions of spark images?
[12:01:27] <_joe_> they're extremely expensive to build and use a lot of space, can we maybe decommission at least 3.1?
[12:19:36] Ben is out this week, back next
[13:06:05] _joe_: maybe some context in and around https://phabricator.wikimedia.org/T344910 ?
[13:12:01] serviceops, CX-cxserver, RESTBase Sunsetting, Language-Team (Language-2023-October-December): Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (santhosh) http://parsoid-external-ci-access.beta.wmflabs.org - Does this...
[13:26:31] serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[13:32:41] serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[13:52:06] serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[14:20:19] serviceops, Parsoid: Puppetize parsing-qa-02 server config - https://phabricator.wikimedia.org/T295907 (MSantos) I believe we are blocked and need support from #serviceops, please let me know if I'm missing anything.
[14:20:38] serviceops, Content-Transform-Team, Parsoid, Parsoid-Read-Views: upgrade nodejs on parsing-qa-02 - https://phabricator.wikimedia.org/T349941 (MSantos) I believe we are blocked and need support from #serviceops, please let me know if I'm missing anything.
[15:10:25] serviceops, Machine-Learning-Team: Multiple images fail to build from sources - https://phabricator.wikimedia.org/T350366 (Joe)
[15:47:45] serviceops, CX-cxserver, RESTBase Sunsetting, Language-Team (Language-2023-October-December): Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (daniel) >>! In T344982#9301593, @santhosh wrote: > http://parsoid-extern...
[15:50:38] <_joe_> all of the cert-manager and the istio images are failing to build
[15:53:27] serviceops, Machine-Learning-Team: Multiple images fail to build from sources - https://phabricator.wikimedia.org/T350366 (Joe)
[16:28:40] serviceops, MediaWiki-Documentation, Documentation, Patch-Needs-Improvement, User-Dereckson: Repair "svn.wikimedia.org/doc/" redirect for doc.wikimedia.org - https://phabricator.wikimedia.org/T109950 (Dzahn) >>! In T109950#9175846, @Dereckson wrote: > Who are the ones responsible for this rev...
[16:56:07] serviceops, CX-cxserver, RESTBase Sunsetting, Language-Team (Language-2023-October-December): Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (cscott) VE already has a transformation endpoint exposed that you can use...
[17:01:13] serviceops, CX-cxserver, RESTBase Sunsetting, Language-Team (Language-2023-October-December): Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (daniel)
[17:45:40] serviceops, MW-on-K8s, MediaWiki-Engineering: EtcdConfig using stale data: lost lock in /srv/mediawiki/php-1.42.0-wmf.1/includes/config/EtcdConfig.php on line 218 - https://phabricator.wikimedia.org/T349376 (Krinkle) >>! In T349376#9300752, @Joe wrote: > @Krinkle the instrumentation added doesn't dis...
[17:50:33] serviceops, MW-on-K8s, MediaWiki-Engineering: EtcdConfig using stale data: lost lock in /srv/mediawiki/php-1.42.0-wmf.1/includes/config/EtcdConfig.php on line 218 - https://phabricator.wikimedia.org/T349376 (Krinkle)
[17:50:55] serviceops, MW-on-K8s, MediaWiki-Configuration, MediaWiki-Engineering, and 4 others: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T346971 (Krinkle)
[17:51:33] serviceops, MW-on-K8s, MediaWiki-Configuration, MediaWiki-Engineering, and 4 others: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T346971 (Krinkle) >>! In T349376#9271431, @Joe wrote: > EtcdConfig uses eventually APCUBagOfStuff, see https://...
[17:52:03] serviceops, MW-on-K8s, MediaWiki-Configuration, MediaWiki-Engineering, and 3 others: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T346971 (Krinkle)
[17:53:12] serviceops, MW-on-K8s, MediaWiki-Configuration, MediaWiki-Engineering, and 3 others: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T346971 (Krinkle)
[17:53:46] serviceops, MW-on-K8s, MediaWiki-Configuration, MediaWiki-Engineering, and 3 others: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T346971 (Krinkle)
[18:02:00] serviceops, MW-on-K8s, MediaWiki-Engineering: EtcdConfig using stale data: lost lock in /srv/mediawiki/php-1.42.0-wmf.1/includes/config/EtcdConfig.php on line 218 - https://phabricator.wikimedia.org/T349376 (Krinkle) duplicate→Open
[18:13:24] serviceops, MW-on-K8s, MediaWiki-Configuration, MediaWiki-Engineering, and 3 others: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T346971 (Krinkle)
[18:28:49] serviceops, MW-on-K8s, Observability-Logging, Developer Productivity, MediaWiki-Platform-Team (Radar): php-fpm logs from Kubernetes lack 'message' and 'normalized_message' - https://phabricator.wikimedia.org/T350430 (Krinkle)
[21:09:55] serviceops, Data Engineering and Event Platform Team, Data-Engineering, Event-Platform: [Event Platform] Gracefully handle pod termination in eventgate Helm chart - https://phabricator.wikimedia.org/T349823 (Ottomata) > But also - why are we not retrying to enqueue jobs if it fails? We should pro...
[21:10:50] serviceops, Data Engineering and Event Platform Team, Data-Engineering, Event-Platform: [Event Platform] Gracefully handle pod termination in eventgate Helm chart - https://phabricator.wikimedia.org/T349823 (Ottomata) envoy proxy request timeout to eventgate-main is currently 61 seconds though, a...
[21:20:35] serviceops, Data Engineering and Event Platform Team, Data-Engineering, Event-Platform: [Event Platform] Gracefully handle pod termination in eventgate Helm chart - https://phabricator.wikimedia.org/T349823 (Ottomata) Also q: `per_try_timeout: "20s" is set on eventgate-main services_proxy, but I...
[22:14:12] serviceops, MediaWiki-Engineering: EtcdConfig using stale data: lost lock in /srv/mediawiki/php-1.42.0-wmf.1/includes/config/EtcdConfig.php on line 218 - https://phabricator.wikimedia.org/T349376 (Krinkle) Un-merging. I see now that the issues don't overlap completely. It also isn't specific to Kubernete...
[23:36:35] serviceops, MW-on-K8s, MediaWiki-Configuration, MediaWiki-Engineering, and 3 others: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T346971 (Krinkle) >>! Task description: > ` > Uncaught ConfigException: Failed to load configuration from etcd:...