[00:13:50] PROBLEM - Check systemd state on doc1002 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-doc2001.codfw.wmnet.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:07:49] RECOVERY - Check systemd state on doc1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:29:46] PROBLEM - Check systemd state on contint2001 is CRITICAL: CRITICAL - degraded: The following units failed: docker-system-prune-dangling.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:55:08] (PS1) Krinkle: Emphasize "too long" warning label [releng/phatality] - https://gerrit.wikimedia.org/r/814009
[04:55:10] (PS1) Krinkle: Switch kbnDocViewerTable -> osdDocViewerTable [releng/phatality] - https://gerrit.wikimedia.org/r/814010
[06:54:42] Phabricator (Upstream), Upstream: About [[Phabricator:phabricator-people-148aaf2e06c62283/en]]: extremely unsecure suggestion! - https://phabricator.wikimedia.org/T313023 (Verdy_p) I did not know that Phorge.it ever existed, but it is developed as a recent fork of Wikimedia Phabricator, but this message...
[07:19:06] Phabricator (Upstream), Upstream: About [[Phabricator:phabricator-people-148aaf2e06c62283/en]]: extremely unsecure suggestion! - https://phabricator.wikimedia.org/T313023 (Aklapper) phorge.it is a fork of Phabricator (the general software, "upstream"). It is not a fork of Wikimedia's instance (installati...
[10:21:34] (CR) Hashar: tox: also exclude ruby files from ec lint checks (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/813893 (owner: Jbond)
[10:23:10] (PS2) Hashar: tox: also exclude ruby files from ec lint checks [integration/config] - https://gerrit.wikimedia.org/r/813893 (owner: Jbond)
[10:24:30] (PS11) Hashar: WIP: add files for custom image for beaker builds [integration/config] - https://gerrit.wikimedia.org/r/812463 (https://phabricator.wikimedia.org/T253635) (owner: Jbond)
[10:44:19] (CR) Hashar: [C: +2] "And that fixed child change https://gerrit.wikimedia.org/r/c/integration/config/+/812463 ;)" [integration/config] - https://gerrit.wikimedia.org/r/813893 (owner: Jbond)
[10:46:33] (CR) Hashar: [C: +2] "I have created the jobs:" [integration/config] - https://gerrit.wikimedia.org/r/813931 (https://phabricator.wikimedia.org/T313075) (owner: Jforrester)
[10:46:54] (Merged) jenkins-bot: tox: also exclude ruby files from ec lint checks [integration/config] - https://gerrit.wikimedia.org/r/813893 (owner: Jbond)
[10:46:58] (CR) Hashar: [C: +2] Zuul: Provide node16 generic jobs [integration/config] - https://gerrit.wikimedia.org/r/813932 (https://phabricator.wikimedia.org/T313075) (owner: Jforrester)
[10:47:32] (CR) Hashar: [C: +1] "Please roll whenever ready!" [integration/config] - https://gerrit.wikimedia.org/r/813933 (owner: Jforrester)
[10:48:31] (Merged) jenkins-bot: jjb: Provide basic node16 jobs [integration/config] - https://gerrit.wikimedia.org/r/813931 (https://phabricator.wikimedia.org/T313075) (owner: Jforrester)
[10:49:06] (Merged) jenkins-bot: Zuul: Provide node16 generic jobs [integration/config] - https://gerrit.wikimedia.org/r/813932 (https://phabricator.wikimedia.org/T313075) (owner: Jforrester)
[10:49:42] !log Reloaded Zuul for https://gerrit.wikimedia.org/r/813932
[10:49:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:50:14] (CR) Hashar: [C: +2] zuul: drop in-wikimedia-production template from CongressLookup [integration/config] - https://gerrit.wikimedia.org/r/813967 (https://phabricator.wikimedia.org/T312894) (owner: Zabe)
[10:50:40] (CR) Jbond: [C: -1] "gonna give this a self -1. for this to be useful we really need to build it frequently (i.e. daily) which isn't currently possible" [integration/config] - https://gerrit.wikimedia.org/r/812463 (https://phabricator.wikimedia.org/T253635) (owner: Jbond)
[10:51:33] (CR) Jbond: tox: also exclude ruby files from ec lint checks (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/813893 (owner: Jbond)
[10:52:11] (Merged) jenkins-bot: zuul: drop in-wikimedia-production template from CongressLookup [integration/config] - https://gerrit.wikimedia.org/r/813967 (https://phabricator.wikimedia.org/T312894) (owner: Zabe)
[10:54:39] (CR) Hashar: [C: +2] "This time I tested it with:" [integration/docroot] - https://gerrit.wikimedia.org/r/812953 (owner: Abijeet Patro)
[10:55:12] !log Reloaded Zuul for https://gerrit.wikimedia.org/r/813967
[10:55:13] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:55:26] (Merged) jenkins-bot: Add banana-i18n library [integration/docroot] - https://gerrit.wikimedia.org/r/812953 (owner: Abijeet Patro)
[11:15:34] (PS7) Hashar: dockerfiles: Install pcov in base PHP images [integration/config] - https://gerrit.wikimedia.org/r/694621 (https://phabricator.wikimedia.org/T280170) (owner: Daimona Eaytoy)
[11:20:31] (PS1) Hashar: dockerfiles: cascade for installing pcov in PHP images [integration/config] - https://gerrit.wikimedia.org/r/814124 (https://phabricator.wikimedia.org/T280170)
[11:20:48] (CR) Hashar: "Not sure why this one stayed idle so long. Since we now have pcov for our php 7.4 component, the image builds properly." [integration/config] - https://gerrit.wikimedia.org/r/694621 (https://phabricator.wikimedia.org/T280170) (owner: Daimona Eaytoy)
[11:22:08] (CR) Daimona Eaytoy: dockerfiles: Install pcov in base PHP images (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/694621 (https://phabricator.wikimedia.org/T280170) (owner: Daimona Eaytoy)
[11:28:28] hashar: thanks for taking a look at that patch! I'm really looking forward to having pcov everywhere, although I assume it won't truly be "everywhere" until we have PHP72 jobs around
[11:43:34] Daimona: hi! sorry it took so long
[11:43:48] looks like the coverage jobs are using php7.3 now
[11:45:09] I mean for mediawiki we have quibble-buster-php73-coverage
[11:45:59] which already has pcov
[11:54:26] (PS2) Hashar: dockerfiles: cascade for installing pcov in PHP images [integration/config] - https://gerrit.wikimedia.org/r/814124 (https://phabricator.wikimedia.org/T280170)
[11:57:19] Yeah, I'm more worried about libraries that run PHPUnit via composer.
[11:58:34] At least those that still support PHP 7.2
[11:58:43] But hopefully we'll stop supporting that dinosaur soon.
[11:59:25] (CR) Hashar: dockerfiles: Install pcov in base PHP images (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/694621 (https://phabricator.wikimedia.org/T280170) (owner: Daimona Eaytoy)
[11:59:54] Daimona: there is a memory issue which would affect our php7.4 image using pcov 1.0.6 https://github.com/krakjoe/pcov/issues/67
[12:00:04] +1 on having found the changelog hehe
[12:00:17] so I am guessing we can build the image as is and start experimenting
[12:00:34] + file a task for SRE to upgrade php7.4-cov and once that is done we can rebuild our php7.4 images
[12:00:39] then claim `success`
[12:01:03] Oh wow, 14 GB sounds like a terrible leak
[12:01:11] Yeah, makes sense to me!
[12:01:25] may you +1 the change? and I will build the images :]
[12:03:12] Daimona: and can we ignore having a php7.2-cov package?
[12:03:24] as I understand it we will fully migrate to php7.4 this quarter
[12:04:11] (CR) Daimona Eaytoy: [C: +1] dockerfiles: Install pcov in base PHP images [integration/config] - https://gerrit.wikimedia.org/r/694621 (https://phabricator.wikimedia.org/T280170) (owner: Daimona Eaytoy)
[12:04:15] (CR) Daimona Eaytoy: [C: +1] dockerfiles: cascade for installing pcov in PHP images [integration/config] - https://gerrit.wikimedia.org/r/814124 (https://phabricator.wikimedia.org/T280170) (owner: Hashar)
[12:04:54] (CR) Hashar: [C: +2] "There is no pcov for our php7.2 ( T243847#7689119 ) then we are supposed to drop php7.2 entirely this quarter in favor of php7.4." [integration/config] - https://gerrit.wikimedia.org/r/694621 (https://phabricator.wikimedia.org/T280170) (owner: Daimona Eaytoy)
[12:05:49] Well, having pcov in the php72 stuff would be great because there'd be no special cases. But since we're close to dropping support for 72, I guess it's not worth the time to package pcov72 for wikimedia apt
[12:06:17] yeap ;]
[12:06:18] Just to clarify, once we have pcov "everywhere" I really really want to get https://phabricator.wikimedia.org/T269489 done
[12:06:19] "close"
[12:06:56] maybe someone from SRE can build the package for php7.2 it would probably "just" work
[12:07:04] Right, I forgot the {{cn}}
[12:07:08] I don't know who is in charge of building them though
[12:07:15] I think moritz does most of them
[12:07:37] (Merged) jenkins-bot: dockerfiles: Install pcov in base PHP images [integration/config] - https://gerrit.wikimedia.org/r/694621 (https://phabricator.wikimedia.org/T280170) (owner: Daimona Eaytoy)
[12:08:29] Yeah and FTR there's also a task about it: https://phabricator.wikimedia.org/T243847. Maybe it'd be good to do //something// about it, be it decline the PHP72 part or get it done at some point
[12:09:14] Continuous-Integration-Infrastructure, Patch-For-Review: Add pcov to composer images - https://phabricator.wikimedia.org/T280170 (hashar)
[12:09:16] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, SRE, and 2 others: Add pcov PHP extension to wikimedia apt so it can be used in Wikimedia CI - https://phabricator.wikimedia.org/T243847 (hashar)
[12:11:49] Continuous-Integration-Config: composer-package-php72-docker runs with xdebug enabled - https://phabricator.wikimedia.org/T269489 (Daimona)
[12:11:51] Continuous-Integration-Infrastructure, Patch-For-Review: Add pcov to composer images - https://phabricator.wikimedia.org/T280170 (Daimona)
[12:13:53] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, SRE, and 2 others: Add pcov PHP extension to wikimedia apt so it can be used in Wikimedia CI - https://phabricator.wikimedia.org/T243847 (hashar) `pcov` got build and uploaded to `compo...
[12:14:06] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, SRE, and 2 others: Add pcov PHP extension to wikimedia apt so it can be used in Wikimedia CI - https://phabricator.wikimedia.org/T243847 (hashar) a:Legoktm→None
[12:14:10] I will drop an email to the ops list
[12:16:13] Continuous-Integration-Config: composer-package-phpXX-docker jobs run with xdebug enabled - https://phabricator.wikimedia.org/T269489 (Daimona)
[12:19:01] (PS1) Daimona Eaytoy: [WIP] Enable pcov instead of xdebug in composer-package-phpXX [integration/config] - https://gerrit.wikimedia.org/r/814154 (https://phabricator.wikimedia.org/T269489)
[12:20:28] !log rebuilding `php??` images for pcov https://gerrit.wikimedia.org/r/c/integration/config/+/694621 # T280170
[12:20:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:20:30] T280170: Add pcov to composer images - https://phabricator.wikimedia.org/T280170
[12:21:24] Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
[12:21:26] wtf
[12:22:38] i dont even understand how it is possible
[12:23:51] Continuous-Integration-Infrastructure, Release-Engineering-Team: Docker is not running on contint2001 - https://phabricator.wikimedia.org/T313119 (hashar)
[12:23:59] Continuous-Integration-Infrastructure, Release-Engineering-Team: Docker is not running on contint2001 - https://phabricator.wikimedia.org/T313119 (hashar) p:Triage→Unbreak!
[12:25:44] Continuous-Integration-Infrastructure, Release-Engineering-Team: Docker is not running on contint2001 - https://phabricator.wikimedia.org/T313119 (hashar)
[12:26:19] docker has been disabled bah
[12:30:17] Continuous-Integration-Infrastructure, Release-Engineering-Team: Docker is not running on contint2001 - https://phabricator.wikimedia.org/T313119 (hashar) The host has been rebooted on July 12th: ` reboot system boot 4.19.0-20-amd64 Tue Jul 12 15:50 still running ` But somehow the docker service di...
[12:30:27] !log Starting docker on contint2001.wikimedia.org # T313119
[12:30:29] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:30:29] T313119: Docker is not running on contint2001 - https://phabricator.wikimedia.org/T313119
[12:40:56] Continuous-Integration-Infrastructure, Release-Engineering-Team: Docker is not running on contint2001 - https://phabricator.wikimedia.org/T313119 (hashar) p:Unbreak!→Medium a:hashar docker.service requires docker.socket and both are marked as not enabled (ie disabled). On contint2001 I start...
[13:04:28] (CR) Jforrester: "C-1, this should have been moved out of the Wikimedia production section too." [integration/config] - https://gerrit.wikimedia.org/r/813967 (https://phabricator.wikimedia.org/T312894) (owner: Zabe)
[13:05:19] hashar: Thanks for the node16 merges/deploys, and the fixing of docker. I'm not actually building the node16 images.
[13:05:28] !log Docker: Building node16 images for CI for T313075, this time actually.
[13:05:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:05:30] T313075: Create WMF CI images and jobs for Node.js 16 - https://phabricator.wikimedia.org/T313075
[13:05:34] James_F: ;)))
[13:05:48] James_F: and we keep the same npm version shared with node 14/12 which I guess is fine
[13:05:52] or what is expected
[13:06:05] I could not remember what was our decision last time regarding which version of npm should be used
[13:06:06] We said we were going to try to switch to native npm.
[13:06:15] I think we settled on having the same everywhere
[13:06:19] ah
[13:06:20] ;D
[13:06:24] But I tried that and it didn't work (npm was "installed" but just instantly returned).
[13:06:31] bummer :-\
[13:06:33] So I gave up and used the old npm version for now. :-)
[13:06:34] Yeah.
[13:06:47] anyway we got node16 which is great
[13:06:50] hashar: https://gerrit.wikimedia.org/r/c/integration/config/+/813930/2/dockerfiles/node16/Dockerfile.template#21
[13:07:21] isn't there a SRE provided one for node14 nowadays ?
[13:07:43] There's a production image for node14, yes.
[13:07:48] But not 16 or 18.
[13:07:53] pulled and there is one for 16 as well apparently
[13:08:10] operations/docker-images/production-images/images/nodejs16/Dockerfile.template
[13:08:17] Oh, right, there's 16 but not 18, that's right.
[13:08:25] build by _joe_ in May
[13:08:28] Yup.
[13:08:48] so maybe we can consider moving the CI image from the upstream npm toward the SRE provided base images?
[13:20:30] Project beta-update-databases-eqiad build #60026: FAILURE in 18 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/60026/
[13:22:02] ... "Invalid MediaWiki configuration parameter "GroupInheritsPermissions": Array value found, but an array is required"
[13:22:51] https://github.com/wikimedia/mediawiki/commit/cf39a40f164799ffed328a5b8d7a42823441f507
[13:24:35] yeah, found that too, thanks
[13:25:16] duesen: ^ any ideas why we're seeing the above errors on beta? looks like a case mismatch somewhere?
[13:25:23] i pinged them in -core
[13:28:00] taavi: oh, that error message is stupid, it confuses json types and php types. it means "this array should not have string keys".
[13:28:15] Let me check what's going on there...
[13:28:28] duesen: that's not going to be very useful for 3rd parties
[13:29:13] RhinosF1: unfortunately, that message comes from the json schema validator. we don't control it.
[13:29:22] ah
[13:29:34] it has been reported upstream iirc
[13:29:46] Hm, the schema is actually wrong for GroupInheritsPermissions.
[13:29:51] I'll have a fix up in a minute.
[13:33:09] RhinosF1, taavi: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/814169
[13:33:16] * taavi looks
[13:34:15] duesen: ty, what about the other 4
[13:34:43] is it worth running the test against beta/prod wikis
[13:35:13] duesen: is this documented anywhere? in other words, how am I supposed to know what's the difference between 'list' and 'map'?
[13:35:17] 7 even
[13:35:17] RhinosF1: yea... for that i'd need a way to run it without running the updater, it's currently glued together.
[13:35:37] taavi: it's documented at the head of MainConfigSchema
[13:35:55] RhinosF1: I can't see any error message. Which other four?
[13:36:02] I only see the link to the diff
[13:36:09] https://www.irccloud.com/pastebin/mITVZdl7/
[13:36:18] ahh. thanks!
[13:36:20] duesen: ^
[13:36:34] looks like James_F already got it
[13:36:52] ty James_F
[13:38:11] duesen: wouldn't this check be useful in config CI?
[13:38:27] RhinosF1: we do check it in CI, but CI doesn't have prod's config
[13:38:44] Prod config's CI has prod's config, though.
[13:38:56] taavi: quick fix is adding --skip-config-validation to the updater
[13:38:57] But it doesn't have MW's code.
[13:39:13] James_F: heh
[13:39:19] true, pulling mediawiki would make config CI take ages
[13:39:31] Yes, please let's not slow down deploys.
[13:39:36] duesen: is adding that suggested?
[13:39:38] James_F: so, do you have a fix for each of them? I think AllowRequiringEmailForResets shouldn't be a real deprecation.
[13:40:00] I don't.
[13:40:49] that error sounds nonsense, it should probably say is unstable rather than deprecated because it then says it's a new feature.
[13:41:31] I think the CSP ones are wrong in the schema
[13:41:38] they are structures with known keys.
[13:42:15] duesen: Also https://gerrit.wikimedia.org/r/c/mediawiki/core/+/814169 is failing tests.
[13:42:36] SettingsTest::testSchemaCompleteness
[13:42:59] ok, give me a few more minutes.
[13:43:48] (Docker is still building images on contint2001.)
[13:43:53] James_F: I'll fix the schema issues, can you fix the config? AccountCreationThrottle is probably using the legacy format. IncludeLegacyJavaScript probably shouldn't be set.
[13:44:15] duesen: Dealing with Docker issues, sorry. :-(
[13:51:35] I am off to take care of kids, might come back later tonight
[13:53:27] taavi: i updated the patch. should pass tests now. it should fix most of the issues, but not all.
[13:53:35] the remaining ones are actual problems with the live config.
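Note on the 13:28 to 13:41 exchange: the confusing "Array value found, but an array is required" message comes from PHP arrays covering both JSON lists and JSON objects. The sketch below only illustrates that distinction and is not MediaWiki's validator; the helper names `json_shape` and `to_json_value` and the sample values are hypothetical, and a PHP array is modelled here as a Python dict.

```python
import json


def json_shape(php_array: dict) -> str:
    """Classify a PHP-style array (modelled as a dict) as a JSON 'list' or 'map'.

    Assumption for this sketch: keys 0..n-1 in order mean a list,
    anything else (for example string keys) means a map.
    """
    keys = list(php_array.keys())
    if keys == list(range(len(keys))):
        return "list"
    return "map"


def to_json_value(php_array: dict):
    """Convert the modelled PHP array into the JSON value it corresponds to."""
    if json_shape(php_array) == "list":
        return list(php_array.values())
    return {str(key): value for key, value in php_array.items()}


# Hypothetical sample values, not taken from any real configuration.
sequential = {0: "alpha", 1: "beta"}   # a JSON array, i.e. a 'list'
keyed = {"alpha": "beta"}              # a JSON object, i.e. a 'map'

if __name__ == "__main__":
    for name, value in [("sequential", sequential), ("keyed", keyed)]:
        print(name, json_shape(value), json.dumps(to_json_value(value)))
```

A schema that asks for a JSON array would accept `sequential` and reject `keyed`, even though both are "arrays" on the PHP side, which is the mismatch the error wording hides.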
[13:57:58] (PS1) Jforrester: Zuul: Move non-production code from production zone [integration/config] - https://gerrit.wikimedia.org/r/814174
[13:58:59] Continuous-Integration-Infrastructure: Create WMF CI images and jobs for Node.js 16 - https://phabricator.wikimedia.org/T313075 (Jdforrester-WMF) Open→Resolved
[14:02:44] (CR) Jforrester: [C: +2] Zuul: Move non-production code from production zone [integration/config] - https://gerrit.wikimedia.org/r/814174 (owner: Jforrester)
[14:03:20] (PS3) Jforrester: Zuul: [mediawiki/tools/wikilambda-cli] Switch to node16 jobs [integration/config] - https://gerrit.wikimedia.org/r/813933
[14:03:24] (CR) Jforrester: [C: +2] Zuul: [mediawiki/tools/wikilambda-cli] Switch to node16 jobs [integration/config] - https://gerrit.wikimedia.org/r/813933 (owner: Jforrester)
[14:05:03] (Merged) jenkins-bot: Zuul: Move non-production code from production zone [integration/config] - https://gerrit.wikimedia.org/r/814174 (owner: Jforrester)
[14:05:37] (Merged) jenkins-bot: Zuul: [mediawiki/tools/wikilambda-cli] Switch to node16 jobs [integration/config] - https://gerrit.wikimedia.org/r/813933 (owner: Jforrester)
[14:05:53] !log Zuul: [mediawiki/tools/wikilambda-cli] Switch to node16 jobs
[14:05:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:16:56] taavi: i ended up changing the schema so everything we have in production now passes as non-deprecated.
[14:17:12] I converted some deprecation to just warnings in the docs
[14:17:27] Basically, if we still use it, it's not deprecated, by definition...
[14:17:56] I also made a config patch, but it's not strictly needed: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/814169
[14:18:20] wrong link, sorry: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/814176
[14:18:30] RhinosF1: --^1
[14:20:01] Project beta-update-databases-eqiad build #60027: STILL FAILING in 1 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/60027/
[14:20:10] wmf-insecte: we know
[14:20:10] RhinosF1 you may not issue bot commands in this chat!
[14:20:52] duesen: thanks: unfortunately it's a friday today so the config patch needs to wait for next week
[14:20:53] duesen: Make $wgAccountCreationThrottle must be an array sounds weird, maybe "Make $wgAccountCreationThrottle an array."
[14:20:59] but otherwise +1
[14:21:26] taavi: is broken beta CI not an exception?
[14:21:32] RhinosF1, taavi: I have to go afk soon. If my patch for MainConfigSchema doesn't fix it, add --skip-config-validation to the updater, or revert the original patch.
[14:21:59] if not essential then it should wait
[14:22:10] duesen: ok, will try and keep an eye
[14:22:17] the config change isn't essential
[14:22:26] no idea though how to edit what the job does
[14:22:31] the issue only affects the updater, so no danger to prod
[14:22:50] me neither...
[14:25:41] https://github.com/wikimedia/puppet/blob/1364483a8622bf336af186065d985f821fc3f59e/modules/beta/files/wmf-beta-update-databases.py#L71
[14:36:30] taavi: can you trigger updates now merged or best to just wait 30 minutes
[14:37:29] Project beta-update-databases-eqiad build #60028: STILL FAILING in 1 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/60028/
[14:40:38] TheresNoTime: you'll need code-update first
[14:41:10] hm
[14:42:49] Continuous-Integration-Infrastructure: beta-update-databases-eqiad failing due to invalid MediaWiki configuration parameters - https://phabricator.wikimedia.org/T313128 (TheresNoTime)
[14:42:58] TheresNoTime: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/
[14:42:59] (I know its known ^)
[14:43:38] Project beta-update-databases-eqiad build #60029: STILL FAILING in 9.9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/60029/
[14:44:08] TheresNoTime: okay, now new errors
[14:44:31] TheresNoTime: UploadStashScalerBaseUrl is deprecated: since 1.36 Use thumbProxyUrl in
[14:44:32] 15:43:37 $wgLocalFileRepo
[14:44:44] I can send puppet patch to just skip until monday
[14:44:47] cc taavi duesen
[14:46:44] Continuous-Integration-Infrastructure: beta-update-databases-eqiad failing due to invalid MediaWiki configuration parameters - https://phabricator.wikimedia.org/T313128 (TheresNoTime) Post `beta-code-update-eqiad` run: https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad/60029/conso...
[14:47:39] TheresNoTime: https://gerrit.wikimedia.org/r/c/operations/puppet/+/814134
[14:51:33] RhinosF1: I'm not entirely sure `--skip-config-validation` is the best idea (just in case there is a legitimate config error?), but I honestly don't know enough to comment further ^^
[14:52:21] TheresNoTime: i mean it runs fine now and has for years
[14:52:44] i don't think it'll blow up over the weekend until someone can properly check all wikis
[14:53:59] mm I guess its just going to affect the beta cluster even if it *did* go wonky
[14:56:57] well I've done my https://bash.toolforge.org/quip/AU7VVae36snAnmqnK_xL :D
[14:57:25] lol
[14:57:31] TheresNoTime: now to find someone insane enough to +2 it
[14:57:42] hey Lucas_WMDE
[14:58:07] * Lucas_WMDE does not have puppet +2 rights
[14:58:29] (phew /j)
[14:58:38] this but unironically :P
[14:58:49] I don’t mind not having the power to break everything ;)
[15:00:05] breaking things on Fridays is praxis though :3
[15:01:44] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, MediaWiki-SettingsBuilder, Patch-For-Review, ci-test-error: beta-update-databases-eqiad failing due to invalid MediaWiki configuration parameters - https://phabricator.wikimedia.org/T313128 (RhinosF1) p:Triage→High
[15:01:58] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, MediaWiki-SettingsBuilder, Patch-For-Review, ci-test-error: beta-update-databases-eqiad failing due to invalid MediaWiki configuration parameters - https://phabricator.wikimedia.org/T313128 (RhinosF1)
[15:02:09] wouldn’t breaking things on Monday be more praxis
[15:02:24] bking is looking
[15:02:35] let people enjoy the depths of wikipedia during the weekend, then disrupt exploitative work during the week
[15:02:46] * Lucas_WMDE does not actually intend to break things on any day of the week
[15:03:19] sure sure
[15:06:31] TheresNoTime: see -sre
[15:06:41] you can magically make beta go green after
[15:11:21] TheresNoTime: if you run puppet on deployment-deploy-03 and then rerun, it should work
[15:11:52] ack
[15:14:00] Project beta-update-databases-eqiad build #60030: STILL FAILING in 0.58 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/60030/
[15:14:23] TheresNoTime: what
[15:14:27] how did it get worse
[15:15:09] `/bin/sh: 2: --wiki=aawiki: not found` o.o
[15:15:28] oh wait
[15:15:43] TheresNoTime: it's been stupid because it went line too long isn't it
[15:15:50] multiline treated as separate, yeah
[15:16:11] TheresNoTime: what's the best way to do this then
[15:16:50] i know
[15:17:36] """ at the end of line 71 and beginning of line 72?
[15:18:45] TheresNoTime: https://gerrit.wikimedia.org/r/c/operations/puppet/+/814135/2/modules/beta/files/wmf-beta-update-databases.py should work
[15:18:53] Lucas_WMDE: does ^ work
[15:18:58] abusing the same assumption
[15:19:10] probably, yeah
[15:19:49] Lucas_WMDE, TheresNoTime: can I get a +1 first
[15:20:02] Project beta-update-databases-eqiad build #60031: STILL FAILING in 1.7 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/60031/
[15:21:03] hm I always thought you did `\` on multilines in Python too o.o
[15:21:42] \ is when you don’t want the newline to appear
[15:21:49] but this one will actually be a newline in the string content
[15:21:52] that’s why the `;` can be removed
[15:22:05] ah!
[15:22:19] (though imho the `;` might as well be kept 🤷)
[15:22:57] (I just appreciate the opportunity to bash on Python tbh)
[15:23:04] RhinosF1: James_F: we can pull and checkout mediawiki/core in a few seconds by cloning from the local mirror typically mounted read-only at /srv/git. Not sure what was the use case mentioned above
[15:23:41] (*this* wouldn't have happened with PHP :> PHP would have failed *properly* and dropped the tables)
[15:23:42] TheresNoTime: still better than https://yaml-multiline.info/ tho
[15:23:46] (CR) Hashar: [C: +2] dockerfiles: cascade for installing pcov in PHP images [integration/config] - https://gerrit.wikimedia.org/r/814124 (https://phabricator.wikimedia.org/T280170) (owner: Hashar)
[15:24:01] TheresNoTime: try now
[15:24:29] ack
[15:25:21] hashar: my idea was to have a script run to validate this new config stuff once we fix the million errors, likely to take too long anyway though doing it via mediawiki
[15:26:26] (Merged) jenkins-bot: dockerfiles: cascade for installing pcov in PHP images [integration/config] - https://gerrit.wikimedia.org/r/814124 (https://phabricator.wikimedia.org/T280170) (owner: Hashar)
[15:27:09] Daimona: I am building the rest of the pcov related images
[15:27:27] then I am not sure I wanna update the jobs on a friday evening ;D
[15:28:03] RhinosF1: `sudo run-puppet-agent` isn't picking up those changes, and confirmed that they're not live by `more /usr/local/bin/wmf-beta-update-databases.py` :/
[15:28:25] is it not seeing a change?
[15:28:54] TheresNoTime: is there anything cloud side to update puppet
[15:29:00] maybe on the puppetmaster
[15:30:01] for deployment-prep instances, puppet is using a local puppet master which fetch the changes from operations/puppet every X minutes
[15:30:28] ah!
[15:30:45] deployment-puppetmaster04.deployment-prep.eqiad1.wikimedia.cloud
[15:31:05] I just forgot about that delay ^^
[15:31:29] repo is in /var/lib/git/operations/puppet and is at 6816c8bcee8 (origin/production, origin/HEAD) updating site.pp for cloudweb servers, setup incorrectly for private vlan
[15:31:35] + the ten or so cherry picks on top of that
[15:32:18] hashar: we're next commit after
[15:32:19] and the timer ran so it got updated!
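Note on the 15:15 to 15:22 exchange about `/bin/sh: 2: --wiki=aawiki: not found`: the behaviour can be reproduced with a small sketch. It is not the real `wmf-beta-update-databases.py`; `echo` stands in for the actual update command, the command text and the `run` helper are hypothetical, and a POSIX `/bin/sh` is assumed.

```python
import subprocess

wiki = "aawiki"  # wiki name taken from the error message in the log

# Hypothetical command: `echo` stands in for the real update script.
# The literal newline inside the triple-quoted string reaches the shell,
# which then treats the second line as a separate command.
broken = f"""echo update.php --quick
--wiki={wiki}"""

# A trailing backslash inside the literal makes Python drop the newline,
# so the shell receives a single command line.
joined = f"""echo update.php --quick \
--wiki={wiki}"""

# A literal newline is fine when each line is a complete command;
# here it plays the role a `;` would otherwise play.
two_commands = """echo first
echo second"""


def run(cmd: str) -> subprocess.CompletedProcess:
    """Run cmd through the system shell and capture its output as text."""
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)


if __name__ == "__main__":
    for name, cmd in [("broken", broken), ("joined", joined), ("two_commands", two_commands)]:
        result = run(cmd)
        print(name, result.returncode, repr(result.stdout))
```

With `broken`, the first line still runs and the shell then fails on `--wiki=aawiki` as if it were a command name, which matches the shape of the error quoted at 15:15:09.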
[15:32:25] TheresNoTime: ^
[15:32:45] ack
[15:33:10] cool that picked up it, yup :
[15:33:12] :)
[15:33:17] systemctl list-timers puppet-git-sync-upstream.timer
[15:33:17] NEXT LEFT LAST PASSED UNIT ACTIVATES
[15:33:17] Fri 2022-07-15 15:41:53 UTC 8min left Fri 2022-07-15 15:31:40 UTC 1min 27s ago puppet-git-sync-upstream.timer puppet-git-sync-upstream.service
[15:33:28] TheresNoTime: ok, let me know when it works
[15:33:47] RhinosF1: https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad/60032/console running
[15:33:49] hashar: must be 10 minutes then
[15:34:07] TheresNoTime: that's a good sign
[15:34:43] normally takes 10 minutes
[15:35:32] pfft is this all SRE does smh
[15:43:15] Yippee, build fixed!
[15:43:15] Project beta-update-databases-eqiad build #60032: FIXED in 9 min 44 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/60032/
[15:44:14] woo
[15:44:49] Yey!
[15:44:54] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, MediaWiki-SettingsBuilder, ci-test-error: beta-update-databases-eqiad failing due to invalid MediaWiki configuration parameters - https://phabricator.wikimedia.org/T313128 (TheresNoTime) @RhinosF1's patches have [[ https://integr...
[15:45:17] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, MediaWiki-SettingsBuilder, ci-test-error: beta-update-databases-eqiad failing due to invalid MediaWiki configuration parameters - https://phabricator.wikimedia.org/T313128 (RhinosF1) A hack is in place to stop CI failing. @danie...
[15:45:36] TheresNoTime: we both said same thing same time
[15:46:04] !log contint2001: `docker-system-prune-dangling.service` it failed overnight cause Docker was not running. That should clear Icinga state # T313119
[15:46:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:46:07] T313119: Docker is not running on contint2001 - https://phabricator.wikimedia.org/T313119
[15:46:09] at least it was said :p lest a "temporary fix" stay in place for years
[15:47:05] TheresNoTime: don't you love them
[15:47:25] no :>
[15:47:43] https://bash.toolforge.org/quip/AU7VTzhg6snAnmqnK_pc
[15:47:51] (obligatory)
[15:47:54] 10/10
[15:48:28] for the historical perspective, our previous bug tracker (bugzilla) had a feature to show up at the top of the page random quotes. I think the feature was named "quips"
[15:48:33] RECOVERY - Check systemd state on contint2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:48:54] they got conserved to bash.toolforge.org ;)
[15:50:24] and some of them are extremely good
[15:51:13] turns out people are saving some of my non sense writings
[15:52:07] https://bash.toolforge.org/quip/AU7VVmnh6snAnmqnK_yr that one is probably my favorite when it comes to welcome new comers
[15:52:45] hahaha
[15:53:22] hashar: that's ace
[15:53:28] Bash is lovely
[15:53:32] maybe I can get bd808 to add a similar system to show up the best phabricator comments
[15:53:38] there are a few gems
[15:54:00] I once had a coach ride 2.5 hours each way
[15:54:05] I read a lot of bash
[15:55:03] Bash has some gems
[15:59:11] (CR) Hashar: [C: +2] "Successfully published image docker-registry.discovery.wmnet/releng/composer-test-php81:0.0.1-s3" [integration/config] - https://gerrit.wikimedia.org/r/814124 (https://phabricator.wikimedia.org/T280170) (owner: Hashar)
[15:59:34] !log Built pcov php docker images T280170
[15:59:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:59:37] T280170: Add pcov to composer images - https://phabricator.wikimedia.org/T280170
[16:22:48] hashar: something that did mooeypoo's dramatic phabricator readings automatically would be ideal. You should nerd snipe her with that. :)
[16:27:13] Continuous-Integration-Infrastructure, Jenkins, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), and 2 others: Upgrade Jenkins Gearman plugin from a forked repo - https://phabricator.wikimedia.org/T271683 (hashar) @Tambura605 sorry I...
[16:42:14] Continuous-Integration-Infrastructure, Jenkins, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), and 2 others: Upgrade Jenkins Gearman plugin from a forked repo - https://phabricator.wikimedia.org/T271683 (hashar) @Tambura605 questi...
[16:53:21] bd808: dramatic phabricator readings?
[16:54:34] hashar: the all-hands talks Moriel has given where she reads comments from Phabricator (and sometimes other places) with very exaggerated emphasis.
[16:54:44] OHHH
[16:55:02] guess I was either asleep due to jetlag or missed that one somehow :-\
[16:57:16] I think they are usually in the talent show content
[16:57:48] bd808: any public videos?
[16:57:55] unlikely
[16:58:15] RhinosF1: not that I know of
[16:58:34] video available - please don't share outside WMF without permission.
[16:58:36] sorry :-\
[17:02:34] maybe we can get her to do some during the wikimania hackathon...
[17:02:37] bd808: :(
[17:02:39] Maybe
[17:02:51] I gonna try and turn up for hackathon
[17:04:14] Not sure how much peace and quiet I'll have for this one
[17:06:09] Hackathon was really good in May. Will try and get myself to an in person event one day.
[17:10:53] London meet up won't be too far when I move. Don't think I'll make September though.
[19:28:35] Continuous-Integration-Config, PostgreSQL, SQLite: Make postgres/sqlite CI jobs voting in gate-submit for wmf-deployed extensions with abstract schema files - https://phabricator.wikimedia.org/T313138 (Umherirrender)
[20:29:42] RhinosF1: moriel reading this comment made me laugh far too much: https://phabricator.wikimedia.org/T195906#4240768 cc thcipriani
[20:30:05] (tyler probably knows which one without clicking)
[20:38:57] :D
[20:39:12] tripleblock
[20:42:17] greg-g: we'll have to try at hackathon to get a public copy
[20:42:24] Hopefully I make it
[23:24:52] PROBLEM - Check systemd state on doc1002 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-doc2001.codfw.wmnet.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state