[00:30:25] Phabricator, Release-Engineering-Team, Patch-For-Review: Months of history missing from https://phabricator.wikimedia.org/source/phabricator-translations.git - https://phabricator.wikimedia.org/T309910 (thcipriani) Note: I was pairing with @brennen on this—we didn't `--force` push, we ran: `git push...
[01:43:59] Hey jeena: So the tests on I73905a446 are just the npm test for the node service, i.e. the standard service-template-node tests run via mocha. I can't quite do what I'd like with blubber and the available images under docker-registry, since there isn't a convenient node image that also runs mysql/mariadb, at least for now.
[01:46:10] The util/Dockerfile.needed dockerfile was kind of a rough (but not really functional) sketch of a new image I'd like to push up to docker-registry. Though it likely needs a bit more polishing along the lines of the existing civicrm image (which installs default-mysql-server via apt).
[02:38:25] Project-Admins: WIP request for new SDAW-SearchVue Project Tag - https://phabricator.wikimedia.org/T309934 (Seddon)
[07:59:11] Project-Admins: Create project tag for DSE-Kubernetes-Cluster (DSE-K8S) - https://phabricator.wikimedia.org/T309095 (Aklapper) Hi! See my previous comment; I've added you to #trusted-contributors
[08:04:28] (CR) Nikerabbit: "I was waiting for Idb4073950ae50a0d3378520a0e9ec3173d7e3e6e to be merged. Okay to merge now." [integration/config] - https://gerrit.wikimedia.org/r/792134 (owner: Nikerabbit)
[08:12:55] Project-Admins: Create project tag for DSE-Kubernetes-Cluster (DSE-K8S) - https://phabricator.wikimedia.org/T309095 (JArguello-WMF) Thanks a million @aklapper!
my apologies for not reading carefully your previous comment :)
[08:17:08] (PS1) Jaime Nuche: Release 4.9.0-1 [tools/scap] - https://gerrit.wikimedia.org/r/803867
[08:22:41] (CR) Jaime Nuche: [C: +2] Release 4.9.0-1 [tools/scap] - https://gerrit.wikimedia.org/r/803867 (owner: Jaime Nuche)
[08:26:46] (Merged) jenkins-bot: Release 4.9.0-1 [tools/scap] - https://gerrit.wikimedia.org/r/803867 (owner: Jaime Nuche)
[09:23:45] Phabricator, Release-Engineering-Team: Months of history missing from https://phabricator.wikimedia.org/source/phabricator-translations.git - https://phabricator.wikimedia.org/T309910 (hashar) > ` > git push origin +HEAD:refs/heads/wmf/stable` > ^^^ > ` Indeed that removes the non fast-fo...
[09:24:46] (CR) Hashar: [C: +2] Add doc publish for Translate [integration/config] - https://gerrit.wikimedia.org/r/792134 (owner: Nikerabbit)
[09:27:09] (Merged) jenkins-bot: Add doc publish for Translate [integration/config] - https://gerrit.wikimedia.org/r/792134 (owner: Nikerabbit)
[09:28:46] !log Reloaded Zuul for "Add doc publish for Translate" https://gerrit.wikimedia.org/r/792134
[09:28:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:48:16] (CR) Hashar: [C: +2] "I have triggered CI manually against https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/792194 from contint2001 with:" [integration/config] - https://gerrit.wikimedia.org/r/792134 (owner: Nikerabbit)
[11:16:23] (PS1) Jaime Nuche: install-world: ensure repo tags are fetched before selecting version [tools/scap] - https://gerrit.wikimedia.org/r/803890 (https://phabricator.wikimedia.org/T307081)
[11:19:08] (PS2) Jaime Nuche: install-world: ensure repo tags are fetched before selecting version [tools/scap] - https://gerrit.wikimedia.org/r/803890 (https://phabricator.wikimedia.org/T307081)
[11:24:17] (CR) Jaime Nuche: "Tested on deploy1002" [tools/scap] -
https://gerrit.wikimedia.org/r/803890 (https://phabricator.wikimedia.org/T307081) (owner: Jaime Nuche)
[11:24:30] (CR) Jaime Nuche: [C: +2] install-world: ensure repo tags are fetched before selecting version [tools/scap] - https://gerrit.wikimedia.org/r/803890 (https://phabricator.wikimedia.org/T307081) (owner: Jaime Nuche)
[11:28:27] (Merged) jenkins-bot: install-world: ensure repo tags are fetched before selecting version [tools/scap] - https://gerrit.wikimedia.org/r/803890 (https://phabricator.wikimedia.org/T307081) (owner: Jaime Nuche)
[11:35:08] (PS1) Jaime Nuche: Release 4.9.1-1 [tools/scap] - https://gerrit.wikimedia.org/r/803891
[11:41:28] (CR) Jaime Nuche: [C: +2] Release 4.9.1-1 [tools/scap] - https://gerrit.wikimedia.org/r/803891 (owner: Jaime Nuche)
[11:45:35] (Merged) jenkins-bot: Release 4.9.1-1 [tools/scap] - https://gerrit.wikimedia.org/r/803891 (owner: Jaime Nuche)
[11:49:18] Project-Admins: WIP request for new SDAW-SearchVue Project Tag - https://phabricator.wikimedia.org/T309934 (Seddon) This can now proceed :)
[12:05:22] GitLab (Infrastructure), serviceops, Patch-For-Review: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Jelto) Resolved→Open puppet runs on the test instance `gitlab-prod-1001` fail with ` Error: /File[/var/lib/puppet/facts.d]: Failed to generate additional...
[12:33:40] (PS1) Jaime Nuche: install-world: update minimum selectable version for install [tools/scap] - https://gerrit.wikimedia.org/r/803899
[12:56:21] Project-Admins: WIP request for new SDAW-SearchVue Project Tag - https://phabricator.wikimedia.org/T309934 (Aklapper) @Seddon: Feel free to update the task status, task title, and to add a code repo URL - thanks.
[14:33:30] GitLab (Infrastructure): Document and test failover for GitLab and GitLab Replica - https://phabricator.wikimedia.org/T296713 (Jelto) Open→In progress p:Triage→Medium a:Jelto We gathered some experience regarding failover when migrating GitLab to the new physical hosts in T307142. I used...
[14:45:58] Beta-Cluster-Infrastructure, Abstract Wikipedia team: Alerting for function-* services on Beta - https://phabricator.wikimedia.org/T310184 (ori)
[14:59:00] (CR) Ahmon Dancy: [C: +2] install-world: update minimum selectable version for install [tools/scap] - https://gerrit.wikimedia.org/r/803899 (owner: Jaime Nuche)
[15:08:50] Phabricator: Due Date stamp doesn't show on a Phab task even though the field is filled - https://phabricator.wikimedia.org/T310188 (MBinder_WMF)
[15:09:08] (Merged) jenkins-bot: install-world: update minimum selectable version for install [tools/scap] - https://gerrit.wikimedia.org/r/803899 (owner: Jaime Nuche)
[15:10:40] I want to set up simple uptime alerting to #wikipedia-abstract-tech on irc for two services running on the Beta Cluster (deployment-prep). Is there existing alerting for anything in deployment-prep I can use as a model?
[15:18:46] GitLab (Infrastructure), serviceops, Patch-For-Review: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Dzahn) Does this only affect this instance or maybe all users who have a local puppetmaster in their VPS project? It seems like we haven't touched anything and it...
[15:29:19] Release-Engineering-Team (Deployment Training Requests): Deployment training request for jnuche - https://phabricator.wikimedia.org/T310191 (jnuche)
[15:38:54] Phabricator, SRE, serviceops-radar: Switch phabricator from using apache to nginx - https://phabricator.wikimedia.org/T185644 (Dzahn) Open→Declined something between resolved and declined. please feel free to reopen though if you feel differently about it.
[15:39:04] Beta-Cluster-Infrastructure, Release-Engineering-Team (Seen), Scap, SRE, serviceops: Scap can't clear opcache on mw servers in Beta Cluster - https://phabricator.wikimedia.org/T237033 (dancy) Noting the following settings from the deployment-prep horizon project puppet config page: ` profile:...
[15:54:52] Beta-Cluster-Infrastructure, Release-Engineering-Team (Seen), Scap, SRE, serviceops: Scap can't clear opcache on mw servers in Beta Cluster - https://phabricator.wikimedia.org/T237033 (dancy) I'm going to change profile::mediawiki::php::restarts::ensure to true and see how things go.
[15:55:49] brennen: kudos on catching the leading `+` in `+HEAD:refs/heads/stable` is an alias to allow non fast forward updates
[15:56:10] and for the start of a Phabricator deployment runbook
[15:56:11] :]
[15:57:20] !log Set `profile::mediawiki::php::restarts::ensure: present` in deployment-prep hiera config for T237033
[15:57:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:57:22] T237033: Scap can't clear opcache on mw servers in Beta Cluster - https://phabricator.wikimedia.org/T237033
[16:01:27] Release-Engineering-Team (Doing), Scap, MediaWiki Train Development Environment: train-dev's Gerrit zuul plugin returns a different object than production Gerrit - https://phabricator.wikimedia.org/T308290 (hashar) I have filed T310192 to have Gerrit upgraded in train-dev before actually upgrade Gerr...
[16:39:20] (CR) Hashar: [C: -1] "There is a gotcha with "$(which composer)" I would rather hardcode /usr/bin/composer."
[integration/config] - https://gerrit.wikimedia.org/r/803525 (https://phabricator.wikimedia.org/T90875) (owner: Kosta Harlan)
[17:00:54] (CR) Hashar: [C: +2] zuul: Add Bluehill395 to the allowlist [integration/config] - https://gerrit.wikimedia.org/r/802912 (owner: Zabe)
[17:02:51] (Merged) jenkins-bot: zuul: Add Bluehill395 to the allowlist [integration/config] - https://gerrit.wikimedia.org/r/802912 (owner: Zabe)
[17:14:43] !log Reloaded Zuul for I39342265033e82ae13998f53defe6612dc6819b4
[17:14:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:15:07] (CR) Hashar: [C: +2] "Deployed!" [integration/config] - https://gerrit.wikimedia.org/r/802912 (owner: Zabe)
[17:19:58] (CR) Hashar: [C: +2] "I apologize for the delay, we had holidays here in France. I will deploy once it has merged then the frontend cache has a 1 hour TTL." [integration/docroot] - https://gerrit.wikimedia.org/r/802138 (owner: Lucas Werkmeister (WMDE))
[17:21:00] (Merged) jenkins-bot: Update Wikibase section [integration/docroot] - https://gerrit.wikimedia.org/r/802138 (owner: Lucas Werkmeister (WMDE))
[17:25:24] do changes to CommonSettings-labs.php require scap in prod, even though they don't affect anything?
[17:25:51] in other words, if I merge a change to that file, am I on the hook for syncing it in production?
[17:26:10] You can get away without syncing, but you should pull the change down to the production deploy server to avoid complaints.
[17:27:57] ack, thanks. Looks like James_F beat me to the punch, but good to know for next time.
[17:32:28] ori: Ha, sorry.
[17:33:54] James_F: I witnessed the weird scap php-rpm restart progress reporting. I'll see what I can do about it.
[17:33:58] (CR) Hashar: [C: +2] "Deployed and verified on https://doc.wikimedia.org/?cachekill which bypasses the frontend cache" [integration/docroot] - https://gerrit.wikimedia.org/r/802138 (owner: Lucas Werkmeister (WMDE))
[17:35:41] dancy: Thanks! Sorry for the lack of filing a Phab bug.
[18:31:00] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.39.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T308068 (dduvall)
[18:42:30] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.39.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T308068 (dduvall)
[18:54:49] Release-Engineering-Team, Data-Persistence (Consultation), Security-API-Service, Security-Team, and 3 others: Determine CI best practices for service which connects to MySQL - https://phabricator.wikimedia.org/T308789 (jeena) Another option is to use helm test (if you are planning to make a helm...
[18:59:59] A heads up that some code coverage jobs will fail (they are non voting) since https://gerrit.wikimedia.org/r/c/mediawiki/core/+/741970 merged
[19:01:50] I started patches to fix this in https://gerrit.wikimedia.org/r/c/integration/config/+/803487 and https://gerrit.wikimedia.org/r/c/integration/config/+/803525 but probably won’t be able to work on them before tomorrow
[19:01:57] cc hashar ^
[19:02:21] kostajh: noted ;)
[19:04:39] Release-Engineering-Team, Data-Persistence (Consultation), Security-API-Service, Security-Team, and 3 others: Determine CI best practices for service which connects to MySQL - https://phabricator.wikimedia.org/T308789 (sbassett) Hey @jeena - Thanks for the reply. So the image we're talking abo...
[19:06:45] (CR) Hashar: "When doing:" [integration/config] - https://gerrit.wikimedia.org/r/803487 (https://phabricator.wikimedia.org/T90875) (owner: Kosta Harlan)
[19:10:46] Release-Engineering-Team, Data-Persistence (Consultation), Security-API-Service, Security-Team, and 3 others: Determine CI best practices for service which connects to MySQL - https://phabricator.wikimedia.org/T308789 (jeena) I think the reason for helm test would be that you wouldn't have to cre...
[19:20:24] hey dduvall , RhinosF1 just made us aware of https://phabricator.wikimedia.org/T310216 ... we are working on restoring the broken index but it will be Friday at the earliest. I heard this is holding up the train, are there any workarounds possible on your end?
[19:20:51] ebernhardson has suggested turning off writes (see #wikimedia-search scroll for more context)
[19:21:22] thanks, inflatador. we can ignore the errors but it is a lot of logspam and i'm a little worried about all wiki promotion tomorrow increasing that further
[19:23:25] perhaps i can create a temporary filter in logstash
[19:29:17] (CR) Krinkle: jjb: Use composer phpunit:entrypoint (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/803525 (https://phabricator.wikimedia.org/T90875) (owner: Kosta Harlan)
[19:30:47] dduvall: the categorylinks issue is imho blocking and would be nice to the wikinews community if we rolled back (if not already) while we work on fixing it.
[19:32:17] * Krinkle sees this is already done
[19:32:21] thx :)
[19:34:02] dduvall, inflatador: shouldn't the search issue affect all deployed versions rather than just the current week?
[19:34:21] I wouldn't expect it to only affect one version so promoting shouldn't cause a further spike
[19:35:40] we have a 500 from a canary appserver and we are in the deployment window
[19:35:55] I'm afraid I don't know enough about the deployment process to give a good answer to that
[19:36:09] Project beta-scap-sync-world build #54655: FAILURE in 1 min 14 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/54655/
[19:37:22] I just wanted to raise that like a warning
[19:37:28] in case the deployer is in the middle of it
[19:37:47] and tried on the canary first
[19:43:31] mutante: Can you +2 https://gerrit.wikimedia.org/r/c/operations/puppet/+/803908 ? Undoing a prior change you merged earlier.
[19:44:08] dancy: yea! merged unseen because it was cloud-only. merging revert
[19:44:34] Nod. It was safe to merge so thanks for doing that. It just didn't work out. I need to figure out one more bit
[19:44:37] was thinking the easiest is to try and see. we should be able to compile those though
[19:44:45] ack, sec
[19:45:11] thx
[19:45:34] Release-Engineering-Team, Data-Persistence (Consultation), Security-API-Service, Security-Team, and 3 others: Determine CI best practices for service which connects to MySQL - https://phabricator.wikimedia.org/T308789 (sbassett) I think I'm just looking for whatever is the simplest solution to ge...
[19:45:37] done, all you need is the sync from prod master to beta master
[19:46:07] Project beta-scap-sync-world build #54656: STILL FAILING in 1 min 9 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/54656/
[19:46:36] ^ This will be resolved once the revert is applied
[19:50:34] re:search errors, those failures should all come from jobrunner instances.
I would be very surprised if the errors writing to the cloudelastic index can happen from a canary (although, maybe separate argument for jobrunner canarys)
[20:03:19] Project beta-scap-sync-world build #54657: STILL FAILING in 8 min 29 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/54657/
[20:08:24] Yippee, build fixed!
[20:08:25] Project beta-scap-sync-world build #54658: FIXED in 1 min 28 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/54658/
[20:18:14] Deployments, Wikimedia-production-error: mw1415: Wikimedia\Rdbms\DBQueryError: Error 1054: Unknown column 'page_restrictions' in 'field list' (dbXXXX)Function: MediaWiki\Page\PageStore::getPageByNameViaLinkCacheQuery: SELECT page_id,page_namespace,page_title,page_... - https://phabricator.wikimedia.org/T310225
[20:18:51] Deployments, Wikimedia-production-error: mw1415 fatals due to serving responses from 1.39.0-wmf.10 (was DBQueryError: Unknown column page_restrictions) - https://phabricator.wikimedia.org/T310225 (Krinkle)
[20:20:59] Deployments, Wikimedia-production-error: mw1415 fatals due to serving responses from 1.39.0-wmf.10 (was DBQueryError: Unknown column page_restrictions) - https://phabricator.wikimedia.org/T310225 (dancy) I looked at mw1415:/srv/mediawiki/wikiversions.json and all of its entries reference wmf.10. It's co...
[20:21:32] Deployments, Wikimedia-production-error: mw1415 fatals due to serving responses from 1.39.0-wmf.10 (was DBQueryError: Unknown column page_restrictions) - https://phabricator.wikimedia.org/T310225 (dancy)
[20:22:42] Deployments, serviceops, Wikimedia-production-error: mw1415 fatals due to serving responses from 1.39.0-wmf.10 (was DBQueryError: Unknown column page_restrictions) - https://phabricator.wikimedia.org/T310225 (Krinkle) https://sal.toolforge.org/production?p=0&q=mw1415&d= > 2022-05-09: > * dancy: well, at least it's not serving prod I think?
[20:24:10] unless it was silently repooled
[20:24:53] Unclear to me. I see a bunch of those errors continuing to arrive so something is accessing it.
[20:25:23] under https://config-master.wikimedia.org/pybal/eqiad/ it's not in api-https, jobrunenr or appserver
[20:26:03] not even pooled=false
[20:26:04] strange
[20:28:38] Krinkle: just got back from lunch. yes, rolled back group1. do you think it warrants a full rollback from group0 as well? i do see a handful of wikinews sites in group0
[20:28:57] i'm thinking might as well rollback group0 too
[20:29:42] unrelatedly, i'm seeing a whole bunch of wmf.10 (?) errors in logspam-watch. very strange
[20:29:52] https://phabricator.wikimedia.org/T310225 for the wmf.10 errors
[20:29:59] and -sre
[20:30:12] ah, thank you!
[20:30:21] I'm stepping away for a while. Good luck!
[20:30:30] ty!
[21:00:50] thcipriani: *hugs* for making the call to not go ahead with the Phab change because you weren't confident. Great to see professional judgements like this made, and they're too often ignored.
[21:15:46] Echoing James_F, deciding not to deploy until you're confident is hard but it's great to see people making those sort of calls when needed
[21:38:13] GitLab (CI & Job Runners), Release-Engineering-Team (Doing), Release Pipeline, Security-Team, and 2 others: Figure out the future of (or replacements for) PipelineLib in a GitLab world - https://phabricator.wikimedia.org/T287211 (dduvall) A lot of progress was made during Release Engineering's re...
[21:41:39] (PS1) C. Scott Ananian: Adding parsoid as a dependency is no longer required for CI [integration/config] - https://gerrit.wikimedia.org/r/803990
[21:46:27] Deployments, serviceops, Wikimedia-production-error: mw1415 fatals due to serving responses from 1.39.0-wmf.10 (was DBQueryError: Unknown column page_restrictions) - https://phabricator.wikimedia.org/T310225 (Dzahn) mw1415 does not service 500s anymore.
T307755#7990623
[21:50:51] Deployments, serviceops, Wikimedia-production-error: mw1415 fatals due to serving responses from 1.39.0-wmf.10 (was DBQueryError: Unknown column page_restrictions) - https://phabricator.wikimedia.org/T310225 (Dzahn) What happened here is: The machine died on May 5th. Ticket was opened with dcops to...
[21:51:10] GitLab (CI & Job Runners), Release-Engineering-Team (Doing), Release Pipeline, Security-Team, and 2 others: Figure out the future of (or replacements for) PipelineLib in a GitLab world - https://phabricator.wikimedia.org/T287211 (dduvall)
[21:55:53] GitLab (CI & Job Runners), Release-Engineering-Team (Doing), Release Pipeline, Security-Team, and 2 others: Figure out the future of (or replacements for) PipelineLib in a GitLab world - https://phabricator.wikimedia.org/T287211 (dduvall)
[22:25:58] Release-Engineering-Team, Data-Persistence (Consultation), Security-API-Service, Security-Team, and 3 others: Determine CI best practices for service which connects to MySQL - https://phabricator.wikimedia.org/T308789 (dduvall) >>! In T308789#7990305, @sbassett wrote: > I think I'm just looking f...
[22:29:08] Project-Admins: Request for new SDAW-SearchVue Project Tag - https://phabricator.wikimedia.org/T309934 (Seddon) Stalled→Open
[22:40:06] Deployments, serviceops, Wikimedia-production-error: mw1415 fatals due to serving responses from 1.39.0-wmf.10 (was DBQueryError: Unknown column page_restrictions) - https://phabricator.wikimedia.org/T310225 (dancy) Open→Resolved a:dancy Thanks for the summary @Dzahn .
[22:49:24] GitLab (Project Migration), Release-Engineering-Team: Create new GitLab project group: - https://phabricator.wikimedia.org/T310238 (Sabrecalyx)
[22:54:24] Project-Admins: Request for new SDAW-SearchVue Project Tag - https://phabricator.wikimedia.org/T309934 (Seddon)
[23:16:50] GitLab (Project Migration), Release-Engineering-Team: Create new GitLab project group: - https://phabricator.wikimedia.org/T310238 (Dzahn) @Sabrecalyx If this is a legit request, please replace with the actual group name requested and fill out the rationale section.