[00:01:09] dancy: so regarding /srv/mediawiki on deploy1002 which is blocking completion of T253547, what shall I/we do? Do you think we're in a state where we can remove deploy1002 from dsh and then turn it into a symlink and see where that leaves us? [00:01:10] T253547: Command line profiling not working on production - https://phabricator.wikimedia.org/T253547 [00:08:52] 10Phabricator, 10Phatality, 10Patch-For-Review: Phatality gives "Error: invalid date" - https://phabricator.wikimedia.org/T310937 (10colewhite) 05Openβ†’03Resolved a:03colewhite Unfortunately I am unable to test with the linked log because it has been removed per our data retention policy. We handle tim... [00:29:18] cwhite: nice work :) [00:30:10] Thanks! :) [03:42:54] (03PS1) 10Zoranzoki21: Revert "Archive the UnblockMe extension" [integration/config] - 10https://gerrit.wikimedia.org/r/889271 (https://phabricator.wikimedia.org/T323247) [03:43:01] (03PS2) 10Zoranzoki21: Revert "Archive the UnblockMe extension" [integration/config] - 10https://gerrit.wikimedia.org/r/889271 (https://phabricator.wikimedia.org/T323247) [08:40:45] 10Gerrit, 10serviceops-collab, 10Patch-For-Review: Issues with Gerrit test instance - https://phabricator.wikimedia.org/T329444 (10Ameisenigel) There is no need for a hurry. I just wanted to try out Gerrit without messing around at the real Gerrit instance. Yes, I know that we also have GitLab, but I have no... [08:57:58] 10Continuous-Integration-Config, 10Wikidata, 10Wikidata Dev Team, 10Wikidata Query Builder, and 4 others: [SW] Move build scripts from CI to the repository - https://phabricator.wikimedia.org/T328543 (10ItamarWMDE) [09:34:28] GitLab needs a short maintenance break in around 30 minutes [09:45:43] 10Gerrit, 10Pywikibot: 500 server error when pulling Pywikibot i18n - https://phabricator.wikimedia.org/T329452 (10binbot) Google my friend told me that RPC error may be connected to firewall, so I temporarily suspended my firewall. That did not help either. 
[10:13:51] done [10:19:25] (03PS1) 10Matthias Mullie: Add WikimediaMessages as dependency of SearchVue [integration/config] - 10https://gerrit.wikimedia.org/r/889499 [10:20:16] 10Continuous-Integration-Config, 10Wikidata, 10Wikidata Dev Team, 10Wikidata Query Builder, and 4 others: Move build scripts from CI to the repository - https://phabricator.wikimedia.org/T328543 (10ItamarWMDE) [10:24:12] 10Phabricator: 504 Gateway Time-out while executing some search queries - https://phabricator.wikimedia.org/T328865 (10I) [10:26:49] 10Phabricator: 504 Gateway Time-out while executing some search queries - https://phabricator.wikimedia.org/T328865 (10I) [10:29:07] 10Phabricator: Maximum execution time of 30 seconds exceeded /srv/deployment/phabricator/deployment-cache/revs/f68dc24350bc897955d57eeac681db872b0b9e61/phabricator/src/applications/calendar/parser/data - https://phabricator.wikimedia.org/T328118 (10I) [10:51:40] 10Phabricator: Calendar search query: Maximum execution time of 30 seconds exceeded - https://phabricator.wikimedia.org/T328118 (10Aklapper) [11:46:54] 10Gerrit: Code changes can no longer be saved due to 403 error - https://phabricator.wikimedia.org/T329726 (10I) [12:01:48] Hi releng o/ I've just added windows for the codfw row B switch upgrade as well as the upgrade of the codfw wikikube cluster to https://wikitech.wikimedia.org/wiki/Deployments#Tuesday,_February_21 [12:02:49] There is a train that day wich falls into the wikikube cluster upgrade window. I would assume that the cluster being offline would lead to errors [12:03:56] ^^ thcipriani, hashar, dduvall [12:05:50] I'm not 100% sure about that but would it be an option to run the train in the secondary timeslot that day? [12:06:52] hello! Is it possible for a service in the deployment pipeline to use a second container for tests (running integration tests against a dummy instance of a database for example)? 
[12:15:58] 10GitLab: The developer account named I can't log in to gitlab.wikimedia.org because its username is too short - https://phabricator.wikimedia.org/T329728 (10I) [12:20:20] 10GitLab: The developer account named I can't log in to gitlab.wikimedia.org because its username is too short - https://phabricator.wikimedia.org/T329728 (10I) p:05Triageβ†’03Unbreak! We haven't found this problem before, because our user names are long enough, and I happen to be a special case, it stopped wo... [12:21:04] 10GitLab: The developer account named I can't log in to gitlab.wikimedia.org because its username is too short - https://phabricator.wikimedia.org/T329728 (10taavi) p:05Unbreak!β†’03Triage [12:22:25] 10Gerrit: Code changes can no longer be saved due to 403 error - https://phabricator.wikimedia.org/T329726 (10Ammarpad) The error means you do not have access to do that. Why do you want edit it? [12:23:19] 10Gerrit: Code changes can no longer be saved due to 403 error - https://phabricator.wikimedia.org/T329726 (10I) >>! In T329726#8618174, @Ammarpad wrote: > The error means you do not have access to do that. Why do you want edit it? I found some minor errors in the script, but I couldn't find anywhere else to ed... [12:31:23] 10Gerrit: Code changes can no longer be saved due to 403 error - https://phabricator.wikimedia.org/T329726 (10Ammarpad) 05Openβ†’03Invalid You can add comment (if that's necessary) on the patch here https://gerrit.wikimedia.org/r/c/phabricator/extensions/+/236417 [12:55:27] (03CR) 10Simone Cuomo: [C: 03+1] "LGTM, unfortunately I do not have permission to merge this." 
[integration/config] - 10https://gerrit.wikimedia.org/r/889499 (owner: 10Matthias Mullie) [13:07:32] 10GitLab (CI & Job Runners), 10Release-Engineering-Team (GitLab V: Event Horizon πŸŒ„): Disallow direct access to the docker.io registry from gitlab runners - https://phabricator.wikimedia.org/T329679 (10Addshore) In practice I have never used `gitlab's registry proxy feature.`, and it appears that you may need t... [13:17:48] 10Release-Engineering-Team, 10Observability-Logging: Run logstash canary checks via logs-api.svc - https://phabricator.wikimedia.org/T329735 (10fgiunchedi) [13:20:04] 10Beta-Cluster-Infrastructure: beta cluster down - https://phabricator.wikimedia.org/T329592 (10Jgiannelos) From the CI on restbase that is using beta cluster I am getting this error for mathoid: ` { "type": "internal_http_error", "detail": "connect ECONNREFUSED 185.15.56.41:443", "interna... [13:26:49] 10GitLab: The developer account named I can't log in to gitlab.wikimedia.org because its username is too short - https://phabricator.wikimedia.org/T329728 (10Aklapper) @I: Please do not set any task priorities, per https://www.mediawiki.org/wiki/Phabricator/Project_management#Setting_task_priorities . Thanks. [13:38:28] 10GitLab: Cannot log into gitlab.wikimedia.org with LDAP username which has only one character - https://phabricator.wikimedia.org/T329728 (10Aklapper) [13:45:52] 10Gerrit, 10Pywikibot: 500 server error when pulling Pywikibot i18n - https://phabricator.wikimedia.org/T329452 (10Dalba) Does this error also occur if you clone mirror repos ( https://github.com/wikimedia/pywikibot.git and https://github.com/wikimedia/pywikibot-i18n.git )? [14:03:53] 10Gerrit, 10Pywikibot: 500 server error when pulling Pywikibot i18n - https://phabricator.wikimedia.org/T329452 (10binbot) >>! In T329452#8618432, @Dalba wrote: > Does this error also occur if you clone mirror repos ( https://github.com/wikimedia/pywikibot.git and https://github.com/wikimedia/pywikibot-i18n.gi... 
[14:15:37] 10GitLab (Infrastructure), 10serviceops-collab, 10Patch-For-Review: Align and refactor GitLab restore scripts - https://phabricator.wikimedia.org/T326315 (10Jelto) a:03Jelto [14:26:35] 10Beta-Cluster-Infrastructure: beta cluster down - https://phabricator.wikimedia.org/T329592 (10Vgutierrez) according to the DNS records associated with 185.15.56.41, mathoid.beta.math.wmflabs.org seems to be handled by math19.math.wmflabs.org, I don't have access to that WMCS project, somebody else can take a l... [14:36:31] 10Beta-Cluster-Infrastructure, 10MediaWiki-extensions-Score, 10SRE-swift-storage: "FileBackendError: Iterator page I/O error" on a page on Beta Cluster - https://phabricator.wikimedia.org/T329744 (10matmarex) [14:47:43] 10Beta-Cluster-Infrastructure: Mathoid on beta cluster is down - https://phabricator.wikimedia.org/T329747 (10Jgiannelos) [14:48:07] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: Mathoid on beta cluster is down - https://phabricator.wikimedia.org/T329747 (10Jgiannelos) [14:54:25] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: Mathoid on beta cluster is down - https://phabricator.wikimedia.org/T329747 (10taavi) possibly related: {T329653} [15:01:17] 10Phabricator: Mark ReleaseTaggerBot Phabricator account as bot in database - https://phabricator.wikimedia.org/T329748 (10Aklapper) p:05Triageβ†’03Low [15:08:13] 10Phabricator, 10Community-Tech, 10Product-Analytics: Track most-subscribed-to/most-tagged/high-priority tasks on Phabricator - https://phabricator.wikimedia.org/T329749 (10Lectrician1) [15:08:39] 10Phabricator, 10Community-Tech, 10Product-Analytics: Track most-subscribed-to/most-tokened/high-priority tasks on Phabricator - https://phabricator.wikimedia.org/T329749 (10Lectrician1) [15:09:07] (03PS1) 10Subramanya Sastry: Use single colons for nth-last-child psuedo-class [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/889555 [15:16:07] (03CR) 10Arlolra: [C: 03+2] Use single colons for nth-last-child psuedo-class 
[integration/visualdiff] - 10https://gerrit.wikimedia.org/r/889555 (owner: 10Subramanya Sastry) [15:16:45] (03Merged) 10jenkins-bot: Use single colons for nth-last-child psuedo-class [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/889555 (owner: 10Subramanya Sastry) [15:26:10] 10Beta-Cluster-Infrastructure: beta cluster down - https://phabricator.wikimedia.org/T329592 (10Physikerwelt) [15:26:12] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: Mathoid on beta cluster is down - https://phabricator.wikimedia.org/T329747 (10Physikerwelt) 05Openβ†’03Resolved a:03Physikerwelt sorry! >>! In T329747#8618722, @taavi wrote: > possibly related: {T329653} exactly. [15:33:22] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: Mathoid on beta cluster is down - https://phabricator.wikimedia.org/T329747 (10Jgiannelos) Thanks @Physikerwelt @taavi for sorting things out! Service now looks like its running but i am getting a different error related to SSL: ` { "type": "intern... [15:33:46] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: Mathoid on beta cluster is down - https://phabricator.wikimedia.org/T329747 (10Jgiannelos) 05Resolvedβ†’03Open [15:33:48] 10Beta-Cluster-Infrastructure: beta cluster down - https://phabricator.wikimedia.org/T329592 (10Jgiannelos) [15:34:00] (03PS2) 10Zfilipin: zuul: [mediawiki/extensions/CheckUser] Enable Sonar Codehealth Checks [integration/config] - 10https://gerrit.wikimedia.org/r/889182 (https://phabricator.wikimedia.org/T321837) (owner: 10Pwangai) [15:34:05] (03PS2) 10Zfilipin: zuul: [mediawiki/extensions/ContentTranslation] Enable Sonar Codehealth Checks [integration/config] - 10https://gerrit.wikimedia.org/r/889201 (https://phabricator.wikimedia.org/T321837) (owner: 10Pwangai) [15:34:23] (03PS2) 10Zfilipin: zuul: [mediawiki/extensions/DiscussionTools] Enable Sonar Codehealth Checks [integration/config] - 10https://gerrit.wikimedia.org/r/889187 (https://phabricator.wikimedia.org/T321837) (owner: 10Pwangai) [15:34:45] (03PS3) 
10Zfilipin: zuul: [mediawiki/extensions/DismissableSiteNotice] Enable Sonar Codehealth [integration/config] - 10https://gerrit.wikimedia.org/r/889197 (https://phabricator.wikimedia.org/T321837) (owner: 10Pwangai) [15:40:01] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: Mathoid on beta cluster is down - https://phabricator.wikimedia.org/T329747 (10Jgiannelos) Ok tests are passing now. Thanks again! [15:40:08] 10Beta-Cluster-Infrastructure: beta cluster down - https://phabricator.wikimedia.org/T329592 (10Jgiannelos) [15:40:10] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: Mathoid on beta cluster is down - https://phabricator.wikimedia.org/T329747 (10Jgiannelos) 05Openβ†’03Resolved [15:41:23] 10Beta-Cluster-Infrastructure: beta cluster down - https://phabricator.wikimedia.org/T329592 (10Physikerwelt) [15:41:37] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: Mathoid on beta cluster is down - https://phabricator.wikimedia.org/T329747 (10Physikerwelt) 05Resolvedβ†’03Open Sorry, somehow the SSL certificates expired as well:-/ There are new ones now [15:46:51] (03CR) 10Zfilipin: [C: 03+2] zuul: [mediawiki/extensions/DismissableSiteNotice] Enable Sonar Codehealth [integration/config] - 10https://gerrit.wikimedia.org/r/889197 (https://phabricator.wikimedia.org/T321837) (owner: 10Pwangai) [15:48:24] (03Merged) 10jenkins-bot: zuul: [mediawiki/extensions/DismissableSiteNotice] Enable Sonar Codehealth [integration/config] - 10https://gerrit.wikimedia.org/r/889197 (https://phabricator.wikimedia.org/T321837) (owner: 10Pwangai) [15:50:22] 10Beta-Cluster-Infrastructure: beta cluster down - https://phabricator.wikimedia.org/T329592 (10Jgiannelos) I am still getting this error both on tests and when i try to login: `[Y@z-FeLrRRvlf4qIMcOt4AAAAFc] /w/index.php?returnto=Main+Page&title=Special:UserLogin FileBackendError: Iterator page I/O error. Back... 
[15:53:39] 10Continuous-Integration-Config, 10MediaWiki-Configuration: diffConfig no longer detecs any changes in operations/mediawiki-config.git - https://phabricator.wikimedia.org/T329518 (10Krinkle) a:03Krinkle [15:56:55] 10Continuous-Integration-Config, 10Performance-Team, 10Wikimedia-Site-requests, 10Patch-For-Review: diffConfig no longer detecs any changes in operations/mediawiki-config.git - https://phabricator.wikimedia.org/T329518 (10Krinkle) [16:00:14] (03CR) 10Zfilipin: [C: 03+2] "Sorry, I've merged this thinking I'll be able to deploy it. Unfortunately, looks like my machine is not set up to be able to deploy zuul c" [integration/config] - 10https://gerrit.wikimedia.org/r/889197 (https://phabricator.wikimedia.org/T321837) (owner: 10Pwangai) [16:02:27] !log restart releases jenkins for updates [16:02:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:03:13] !log restart ci jenkins for updates [16:03:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:13:54] 10GitLab (Auth & Access): Cannot log into gitlab.wikimedia.org with LDAP username which has only one character - https://phabricator.wikimedia.org/T329728 (10brennen) [16:14:49] Yippee, build fixed! [16:14:49] Project beta-code-update-eqiad build #431032: 09FIXED in 1 min 48 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/431032/ [16:16:33] 10Release-Engineering-Team (Priority Backlog πŸ“₯), 10Security-Team, 10SecTeam-Processed, 10Security, 10Vuln-Misconfiguration: Debian security update for git silently broke mediawiki-i18n-check-docker - https://phabricator.wikimedia.org/T329266 (10sbassett) 05Openβ†’03Resolved [16:17:04] 10GitLab (Auth & Access), 10Release-Engineering-Team, 10Upstream: Cannot log into gitlab.wikimedia.org with LDAP username which has only one character - https://phabricator.wikimedia.org/T329728 (10brennen) 05Openβ†’03Stalled I agree that this is an unfortunate limitation. 
It looks like an upstream issue:... [16:18:21] Yippee, build fixed! [16:18:21] Project mediawiki-core-doxygen-docker build #41064: 09FIXED in 12 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/41064/ [16:19:20] 10GitLab (Infrastructure), 10serviceops-collab: Investigate incremental backups for GitLab - https://phabricator.wikimedia.org/T324506 (10brennen) [16:19:30] 10GitLab (Project Migration), 10Release-Engineering-Team (GitLab V: Event Horizon 🌄): Make a tool to convert .pipeline/config.yaml to .gitlab-ci.yaml - https://phabricator.wikimedia.org/T327332 (10brennen) [16:19:35] 10GitLab (Auth & Access): Updating Wikimedia Developer Account email does not propagate to Gitlab - https://phabricator.wikimedia.org/T323880 (10brennen) [16:33:56] 10Continuous-Integration-Infrastructure, 10Jenkins, 10Release-Engineering-Team, 10SecTeam-Processed, and 2 others: Jenkins plugins security advisory - 2023-02-15 - https://phabricator.wikimedia.org/T329755 (10sbassett) [16:34:02] 10Continuous-Integration-Infrastructure, 10Jenkins, 10Release-Engineering-Team, 10SecTeam-Processed, and 2 others: Jenkins plugins security advisory - 2023-02-15 - https://phabricator.wikimedia.org/T329755 (10sbassett) p:05Triage→03Low [16:39:07] 10GitLab (CI & Job Runners), 10Release-Engineering-Team (GitLab V: Event Horizon 🌄): Mitigate thundering herd on GitLab runners - https://phabricator.wikimedia.org/T327416 (10dduvall) Summary of our current solution: 1. We've installed Istio's service mesh for the purpose of managing traffic to buildkitd and... [16:39:12] 10Beta-Cluster-Infrastructure: beta cluster down - https://phabricator.wikimedia.org/T329592 (10Jdforrester-WMF) >>! In T329592#8618947, @Jgiannelos wrote: > I am still getting this error both on tests and when i try to login: > > `[Y@z-FeLrRRvlf4qIMcOt4AAAAFc] /w/index.php?returnto=Main+Page&title=Special:User... 
[16:47:36] 10GitLab (CI & Job Runners), 10Release-Engineering-Team (GitLab V: Event Horizon πŸŒ„): buildkitd: Require use of the blubber frontend when running on trusted runners. - https://phabricator.wikimedia.org/T329220 (10brennen) [16:48:04] 10GitLab (CI & Job Runners), 10Release-Engineering-Team (GitLab V: Event Horizon πŸŒ„): Increase poll_timeout on kubernetes gitlab runners - https://phabricator.wikimedia.org/T329196 (10brennen) [16:48:12] 10GitLab (Infrastructure), 10serviceops-collab: automatically check for new gitlab releases and send notifications - https://phabricator.wikimedia.org/T323932 (10brennen) [16:54:05] 10GitLab (Infrastructure), 10Release-Engineering-Team, 10MediaWiki-extensions-Gadgets, 10Security-Team, 10Security: Allow Javascript files from Wikimedia GitLab to be loaded as scripts in Wikimedia wikis - https://phabricator.wikimedia.org/T321458 (10brennen) [17:24:10] 10GitLab (Auth & Access), 10Release-Engineering-Team (Radar), 10Infrastructure-Foundations, 10SRE, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (10brennen) [17:24:24] 10GitLab (Auth & Access), 10Release-Engineering-Team (GitLab V: Event Horizon πŸŒ„), 10Infrastructure-Foundations, 10SRE, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (10demon) [17:27:30] 10GitLab (Project Migration), 10Release-Engineering-Team (GitLab V: Event Horizon πŸŒ„), 10User-brennen: Rename mainline branch from "master" to "main" in GitLab:repos/releng/release - https://phabricator.wikimedia.org/T329770 (10thcipriani) [17:30:25] 10GitLab (Project Migration), 10Release-Engineering-Team (GitLab V: Event Horizon πŸŒ„), 10User-brennen: Rename mainline branch from "master" to "main" in GitLab:repos/releng/release - https://phabricator.wikimedia.org/T329770 (10thcipriani) Steps for future migrations: 1. Delete master branch in Gerrit 2. Arc... 
[17:31:58] 10GitLab (CI & Job Runners), 10Release-Engineering-Team (GitLab V: Event Horizon 🌄): Increase poll_timeout on kubernetes gitlab runners - https://phabricator.wikimedia.org/T329196 (10thcipriani) 05In progress→03Resolved [17:42:41] 10GitLab (Project Migration), 10Release-Engineering-Team (GitLab V: Event Horizon 🌄), 10User-brennen: Rename mainline branch from "master" to "main" in GitLab:repos/releng/release - https://phabricator.wikimedia.org/T329770 (10thcipriani) [17:44:39] 10GitLab (Auth & Access), 10Release-Engineering-Team (GitLab V: Event Horizon 🌄), 10Infrastructure-Foundations, 10SRE, and 3 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (10thcipriani) @demon looking into what's needed on the GitLab side; maybe "just" configur... [17:53:10] 10Gerrit, 10Pywikibot: 500 server error when pulling Pywikibot i18n - https://phabricator.wikimedia.org/T329452 (10BCornwall) None of us can reproduce, you've attempted other networks, and your Git config seems reasonable. Perhaps this might be an OS-level issue. Do you have any other (non-vm) computers/OSes t... 
[17:55:08] 10Release-Engineering-Team (GitLab V: Event Horizon πŸŒ„): Run docker-gc on deploy servers - https://phabricator.wikimedia.org/T329678 (10thcipriani) a:03jnuche [18:08:15] (03PS2) 10Ahmon Dancy: Add WikimediaMessages as dependency of SearchVue [integration/config] - 10https://gerrit.wikimedia.org/r/889499 (owner: 10Matthias Mullie) [18:09:56] (03CR) 10Ahmon Dancy: [C: 03+2] Add WikimediaMessages as dependency of SearchVue [integration/config] - 10https://gerrit.wikimedia.org/r/889499 (owner: 10Matthias Mullie) [18:11:07] (03Merged) 10jenkins-bot: Add WikimediaMessages as dependency of SearchVue [integration/config] - 10https://gerrit.wikimedia.org/r/889499 (owner: 10Matthias Mullie) [18:12:48] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/889499 [18:12:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [18:13:07] (03CR) 10Ahmon Dancy: [C: 03+2] "deployed" [integration/config] - 10https://gerrit.wikimedia.org/r/889499 (owner: 10Matthias Mullie) [18:36:01] 10Gerrit, 10Pywikibot: 500 server error when pulling Pywikibot i18n - https://phabricator.wikimedia.org/T329452 (10binbot) >>! In T329452#8619592, @BCornwall wrote: > None of us can reproduce, you've attempted other networks, and your Git config seems reasonable. Perhaps this might be an OS-level issue. Do you... [18:36:07] 10Release-Engineering-Team (GitLab V: Event Horizon πŸŒ„), 10Scap: Scap: Don't transmit "aborted" message to IRC if no prior announcement has been made - https://phabricator.wikimedia.org/T329228 (10dancy) p:05Triageβ†’03Medium a:03dancy [18:51:35] 10Gerrit, 10Pywikibot: 500 server error when pulling Pywikibot i18n - https://phabricator.wikimedia.org/T329452 (10BCornwall) 05Openβ†’03Declined Unfortunately, since you're able to clone the mirror repositories via GitHub, this points to an issue with your computer. I'd love to help further but we're not eq... 
[18:53:36] 10Gerrit, 10Pywikibot: 500 server error when pulling Pywikibot i18n - https://phabricator.wikimedia.org/T329452 (10binbot) Thank you! :-( [18:59:53] 10Beta-Cluster-Infrastructure, 10Community-Tech, 10MediaWiki-extensions-Phonos: An unknown error occurred in storage backend "global-swift-eqiad" on Beta Cluster - https://phabricator.wikimedia.org/T329787 (10TheresNoTime) [19:00:49] 10Beta-Cluster-Infrastructure, 10Community-Tech, 10MediaWiki-extensions-Phonos: An unknown error occurred in storage backend "global-swift-eqiad" on Beta Cluster - https://phabricator.wikimedia.org/T329787 (10TheresNoTime) [19:13:27] hi folx, `samtar@deployment-ms-fe04:~$ swift list -A http://deployment-ms-fe04.deployment-prep.eqiad1.wikimedia.cloud/v1/AUTH_mw -U mw_media -K {a password which can be found}` is giving me a `401 Unauthorized`. I'm fairly sure this Worked Before(tm), but am I missing something? [19:14:11] 10Release-Engineering-Team (GitLab V: Event Horizon 🌄), 10Scap: Scap: Don't transmit "aborted" message to IRC if no prior announcement has been made - https://phabricator.wikimedia.org/T329228 (10dancy) https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/76 [19:23:24] * `-A http://deployment-ms-fe04.deployment-prep.eqiad1.wikimedia.cloud/auth/v1.0` [19:25:24] 10Release-Engineering-Team (Priority Backlog 📥), 10Patch-For-Review, 10Release, 10Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (10Zabe) [19:29:53] 10Release-Engineering-Team (Priority Backlog 📥), 10Patch-For-Review, 10Release, 10Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (10dduvall) @cscott I'm seeing a large spike in errors from Parsoid today. See https://logstash.wikimedia.org/goto/a7eb8ccc8cf7a... 
[19:40:48] TheresNoTime: my totally unverified guess is that something changed in the swift puppetization that broke things on beta, and then maybe the reboot applied those changes [19:41:03] iirc the swift authentication code was refactored fairly recently [19:42:47] AHAHAH GREAT :D [19:42:54] https://www.irccloud.com/pastebin/3CX59eHZ/ [19:43:57] https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/refs/heads/master/deployment-prep/_.yaml#419 I really hope those aren't the live keys.. [19:45:12] TheresNoTime: you fixing that or should I? [19:45:31] taavi: be my guest [19:45:44] * TheresNoTime is too busy sobbing /j [19:48:18] pretty sure those account keys have been there for ages fwiw [19:48:29] fixed puppet runs on the frontend host, backend has a different error [19:48:35] can you check if that command works now? [19:53:59] taavi: still not working, but I'm also only half sure I'm using the right format, sorry (: [20:04:52] 10Beta-Cluster-Infrastructure, 10Community-Tech, 10MediaWiki-extensions-Phonos: An unknown error occurred in storage backend "global-swift-eqiad" on Beta Cluster - https://phabricator.wikimedia.org/T329787 (10TheresNoTime) 05Openβ†’03Resolved a:03TheresNoTime with thanks to Taavi for resolving ` samtar@... [20:07:46] * dancy pat pats TheresNoTime [20:07:57] !log unbroke puppet on deployment-ms-* [20:07:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:08:19] Deep breaths [20:08:24] :D [20:09:01] * zabe is still somewhat suprised that beta cluster survived the cloud incident [20:10:22] 10Beta-Cluster-Infrastructure: beta cluster down - https://phabricator.wikimedia.org/T329592 (10thcipriani) >>! In T329592#8618947, @Jgiannelos wrote: > I am still getting this error both on tests and when i try to login: > > `[Y@z-FeLrRRvlf4qIMcOt4AAAAFc] /w/index.php?returnto=Main+Page&title=Special:UserLogi... 
[20:10:40] Spoke too soon /j [20:10:51] ^ you've jinxed it now for sure [20:13:07] 10Beta-Cluster-Infrastructure, 10Discovery-Search: Puppet not running on deployment-awx-* - https://phabricator.wikimedia.org/T329792 (10taavi) [20:14:43] heh, we were already lucky that only one of the two dbs got corrupted (as far as I can tell) [20:15:19] I'm waiting to hear the "why" of that. [20:16:02] esp since there was a claim of no data loss [20:16:26] dancy: databases don't like being stopped while they are writing [20:16:51] the hosts were hard-rebooted with mariadb still running, a risky thing [20:17:11] so I wrote a database engine in my old job and I wrote it to withstand mid-write interruptions (on a properly configured storage subsystem) [20:17:15] I would expect the same from mariadb [20:17:32] any db claiming to be ACID [20:17:35] zabe: was it completely broken or just transaction in flight? [20:18:16] dancy: it doesn't very well, replication can break very easy. Replicas are easier to just rebuild if they get messy too. [20:18:26] very sad. [20:18:32] it's all lies! 
[20:18:50] dancy: it's normally fairly recoverable tbh [20:19:05] 10GitLab (Integrations), 10Phabricator, 10Release-Engineering-Team, 10User-brennen: gitlab-phabricator may be missing posts for merge request changes - https://phabricator.wikimedia.org/T329793 (10brennen) [20:19:10] With handholding [20:19:15] 10GitLab (Integrations), 10Phabricator, 10Release-Engineering-Team, 10User-brennen: gitlab-phabricator may be missing posts for merge request changes - https://phabricator.wikimedia.org/T329793 (10brennen) a:03brennen [20:19:28] 10GitLab (Integrations), 10Phabricator, 10Release-Engineering-Team (GitLab V: Event Horizon 🌄), 10User-brennen: gitlab-phabricator may be missing posts for merge request changes - https://phabricator.wikimedia.org/T329793 (10brennen) [20:19:28] I believe zabe recloned though for ease [20:19:45] At first sign of trouble rather than attempt recovery [20:20:11] let's play a game of "which beta cluster instance is this": "The last Puppet run was at Tue Aug 16 23:01:44 UTC 2022 (263357 minutes ago)." [20:20:24] hehe [20:30:26] 10Beta-Cluster-Infrastructure, 10MediaWiki-extensions-Score, 10SRE-swift-storage: "FileBackendError: Iterator page I/O error" on a page on Beta Cluster - https://phabricator.wikimedia.org/T329744 (10TheresNoTime) 05Open→03Resolved Now seems to be working, and it probably(?) had something to do with {T329... [20:30:36] (I fixed that, btw.) [20:33:00] RhinosF1, not sure, it threw errors like "Table 'aawiki.revision' doesn't exist in engine" [20:35:14] fwiw I think zabe did the correct thing. there really is no point in making heroic efforts to repair a thing that can be easily replaced. 
[20:37:21] zabe: ye that's messed up [20:37:34] bd808: 100%, that's why replicas exist [20:37:40] One of two reasons for them [20:43:20] 10Release-Engineering-Team (Priority Backlog 📥), 10Patch-For-Review, 10Release, 10Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (10ssastry) @dduvall, we are looking. The ones from T329740 can all be simply suppressed since they don't have any user impact r... [20:44:52] 10Release-Engineering-Team (Priority Backlog 📥), 10Patch-For-Review, 10Release, 10Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (10cscott) The spike seems related to the transition between wmf.22 and wmf.23, perhaps? There are logs like https://logstash.w... [20:47:09] bd808: My comment wasn't about making heroic efforts to repair. My sadness was the fact that a database claiming to be ACID isn't. [20:48:11] the universe has hated MySQL/MariaDB since its invention ;) [20:50:30] But! If a program asked the OS to fdatasync() a file, and the storage subsystem claimed that it did flush pages to durable storage, but it really didn't.. there's nothing any software could do about that. [20:50:45] I'm wondering about that bit. [20:52:08] 10Release-Engineering-Team (Priority Backlog 📥), 10Patch-For-Review, 10Release, 10Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (10ssastry) Ya, so, 3 things: * Suppressing notices as in T329740#8618948 gets rid of the spike * The other errors Scott refere... [20:52:37] 10GitLab (CI & Job Runners), 10Release-Engineering-Team (GitLab V: Event Horizon 🌄): Disallow direct access to the docker.io registry from gitlab runners - https://phabricator.wikimedia.org/T329679 (10bd808) This is pretty much the same problem as {T254319} was. Can we instead/also setup default credentials li... 
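(Editor's sketch for the fdatasync() point raised at 20:50:30. This is a minimal illustration and not how MariaDB is implemented; `durable_write` is a hypothetical helper name, and the code assumes a POSIX-like OS. It shows the most an application can do: write the bytes, then ask the kernel to flush them before treating the data as committed. If the storage layer acknowledges a flush it never performed, nothing in this code can detect that.)

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then ask the OS to flush it to stable storage.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Prefer fdatasync() where the platform offers it; fall back to fsync().
        sync = getattr(os, "fdatasync", os.fsync)
        # Returns once the kernel reports the flush as done. Whether the
        # device really persisted the pages is outside the program's control.
        sync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "record.bin")
        durable_write(target, b"committed")
        print(os.path.getsize(target))
```

(A real database would also have to sync the containing directory after creating or renaming files, and would usually write through a journal; both are left out here to keep the sketch short.)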
[20:56:29] 10Release-Engineering-Team (Priority Backlog πŸ“₯), 10Patch-For-Review, 10Release, 10Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (10dduvall) >>! In T325586#8620127, @ssastry wrote: > Ya, so, 3 things: > > * Suppressing notices as in T329740#8618948 gets ri... [20:59:49] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Release-Engineering-Team-TODO (2020-04 to 2020-06 (Q4)): Could not authenticate against github.com - https://phabricator.wikimedia.org/T248387 (10bd808) >>! In T248387#6185787, @hashar wrote: > For histo... [21:00:33] 10Release-Engineering-Team (Priority Backlog πŸ“₯), 10Patch-For-Review, 10Release, 10Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (10dduvall) >>! In T325586#8620127, @ssastry wrote: > So, in terms of decision, what we need to figure out is whether we suppres... [21:01:50] 10Release-Engineering-Team (Priority Backlog πŸ“₯), 10Patch-For-Review, 10Release, 10Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (10ssastry) >>! In T325586#8620143, @dduvall wrote: > I'm usually fine with suppressing notices, but this seemed like a pretty h... [21:04:05] 10Release-Engineering-Team (Priority Backlog πŸ“₯), 10Patch-For-Review, 10Release, 10Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (10dduvall) >>! In T325586#8620165, @ssastry wrote: > Let me consult with Scott and we'll respond here with a proposal. Sounds... [21:04:20] 10Release-Engineering-Team (Priority Backlog πŸ“₯), 10Patch-For-Review, 10Release, 10Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (10ssastry) >>! In T325586#8620163, @dduvall wrote: > If there's a clear fix to be made in Parsoid (albeit with release overhead... 
[21:09:39] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (cscott) I'm starting the patch-and-tag-and-release-to-vendor process with https://gerrit.wikimedia.org/r/c/mediawiki/services... [21:11:50] Release-Engineering-Team (GitLab V: Event Horizon 🌄), Scap: Scap: Don't transmit "aborted" message to IRC if no prior announcement has been made - https://phabricator.wikimedia.org/T329228 (dancy) Open→Resolved Fixed and deployed in scap 4.35.0 [21:13:55] o/, looks like a stuck jenkins job in zuul? https://integration.wikimedia.org/zuul/ (for gerrit:889493) [21:15:16] 5 hours is a bit long [21:16:02] would a `recheck` on 889493 cancel that run and start over? [21:17:13] Hmm. I don't know. [21:17:25] I don't think it would hurt to try [21:18:22] didn't seem to [21:18:28] GitLab (CI & Job Runners), Release-Engineering-Team (GitLab V: Event Horizon 🌄): Disallow direct access to the docker.io registry from gitlab runners - https://phabricator.wikimedia.org/T329679 (bd808) I have a Public Repo Read-only token for use with `docker login` that I would like to share as soon as... [21:19:09] I'm looking for something to poke [21:19:14] rebase would though...? [21:20:29] I killed the two queued jobs in Jenkins. Try the rebase [21:20:39] zuul status hasn't updated though. :-/ [21:21:00] GitLab (CI & Job Runners), Release-Engineering-Team, mwbot-rs, mwcli: GitLab CI jobs failing with "You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit" - https://phabricator.wikimedia.org/T329216 (bd808) I m... [21:21:41] looks like it has now :) [21:21:49] Yes seems happier.
[21:21:54] nice teamwork [21:41:56] GitLab (CI & Job Runners): Secret storage for CI jobs - https://phabricator.wikimedia.org/T329798 (bd808) [22:13:48] GitLab (Infrastructure), serviceops-collab, Patch-For-Review: automatically check for new gitlab releases and send notifications - https://phabricator.wikimedia.org/T323932 (Dzahn) proof of concept / WIP: ` #!/usr/bin/python3 # check the version string of the latest gitlab security release # compare... [22:18:12] GitLab (Infrastructure), serviceops-collab, Patch-For-Review: automatically check for new gitlab releases and send notifications - https://phabricator.wikimedia.org/T323932 (Legoktm) Why not use LibUp's upstream release monitoring? See https://www.mediawiki.org/wiki/Libraryupgrader [22:26:16] GitLab (Infrastructure), serviceops-collab, Patch-For-Review: automatically check for new gitlab releases and send notifications - https://phabricator.wikimedia.org/T323932 (Dzahn) Thanks for the suggestion! I guess because upstream already provides a feed for us specifically for security releases... [22:37:55] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (cscott) Ok, https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/889641 is merged to mediawiki-vendor and https://gerrit.wikim... [22:43:21] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (dduvall) Thanks! I will merge and deploy the backport. I might wait until tomorrow to re-roll train to group1, depending on m... [22:47:28] GitLab (CI & Job Runners), Release-Engineering-Team, mwbot-rs, mwcli: GitLab CI jobs failing with "You have reached your pull rate limit.
You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit" - https://phabricator.wikimedia.org/T329216 (bd808) htt... [22:50:39] I may have broken firewall rule between CI hosts.. but it was like literally a minute or so and fixed again. [22:50:56] contint/doc [23:33:53] Release-Engineering-Team (Priority Backlog 📥), Patch-For-Review, Release, Train Deployments: 1.40.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T325586 (dduvall) Thanks very much for your help today, @cscott and @ssastry. I've re-rolled group1. > Let us know if the other one (... [23:41:32] Phabricator, Community-Tech, Product-Analytics: Track most-subscribed-to/most-tokened/high-priority tasks on Phabricator - https://phabricator.wikimedia.org/T329749 (Samwilson) The most-tokened tasks are listed here: https://phabricator.wikimedia.org/token/leaders/ (it's not possible to filter by pro... [23:53:25] GitLab (CI & Job Runners), Release-Engineering-Team, mwbot-rs, mwcli: GitLab CI jobs failing with "You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit" - https://phabricator.wikimedia.org/T329216 (bd808) >>!... [23:53:40] Continuous-Integration-Config, Performance-Team, Wikimedia-Site-requests, Patch-For-Review: diffConfig no longer detecs any changes in operations/mediawiki-config.git - https://phabricator.wikimedia.org/T329518 (Ladsgroup) Open→Resolved