[06:46:39] 10GitLab (Infrastructure), 06collaboration-services: Troubleshoot GitLab nftables throttling after switchover - https://phabricator.wikimedia.org/T400971#11117721 (10ABran-WMF) [06:46:42] 06Release-Engineering-Team, 06collaboration-services, 13Patch-For-Review: ProbeDown - gerrit1003 - https://phabricator.wikimedia.org/T402847#11117719 (10ABran-WMF) [06:47:29] 10GitLab (Infrastructure), 06collaboration-services: Troubleshoot GitLab nftables throttling after switchover - https://phabricator.wikimedia.org/T400971#11117724 (10ABran-WMF) [07:07:40] 10Release-Engineering-Team (Priority Backlog 📥), 05Release, 05Train Deployments: 1.45.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T396377#11117729 (10Aklapper) [07:35:15] (03PS1) 10Hashar: utils: add description to zuul-mw-jobs-runner [integration/config] - 10https://gerrit.wikimedia.org/r/1182036 (https://phabricator.wikimedia.org/T389998) [07:55:23] (03merge) 10aklapper: Diffusion file view: Restrict blame and coverage to logged-in users only [repos/phabricator/phabricator] (wmf/stable) - 10https://gitlab.wikimedia.org/repos/phabricator/phabricator/-/merge_requests/95 [08:03:59] (03CR) 10Hashar: "Sorry I do have python-debian installed globally and I thus execute the script without a virtualenv (I do `./utils/docker-updates`). Then" [integration/config] - 10https://gerrit.wikimedia.org/r/1181182 (owner: 10Krinkle) [08:35:37] (03CR) 10Hashar: "That is the script I used to run jobs without recursive dependencies." [integration/config] - 10https://gerrit.wikimedia.org/r/1182036 (https://phabricator.wikimedia.org/T389998) (owner: 10Hashar) [08:48:57] hashar, jnuche: Hey there. Trying to run the train but I cannot put a rebased patch in place. Can anyone of us run "chmod g+w" on deploy1003 or do I need to find SRE folks? [08:49:26] File is owned by `mwpresync` group and I'm only in the `deployment` group, and the file has `-rw-r--r--` instead of `-rw-rw-r--`. 
Same game as 14h ago [08:51:56] andre_: you can fix permissions with this script: https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Problem%3A_file_permissions_errors Sorry but I won't be able to support the train today, I'm actually out sick today [08:52:06] oh sorry. Thanks! [08:55:27] sending best wishes, get well soon [08:58:33] 10Release-Engineering-Team (Priority Backlog 📥), 05Release, 05Train Deployments: 1.45.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T396377#11118028 (10Aklapper) [08:59:20] 10Release-Engineering-Team (Priority Backlog 📥), 05Release, 05Train Deployments: 1.45.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T396377#11118038 (10Aklapper) Two rebased patches deployed via `scap update-patch` (and fiddling with permission errors), thus removed them as subtasks [09:13:49] 10Release-Engineering-Team (Doing 😎), 13Patch-For-Review, 05Release, 05Train Deployments: 1.45.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T396377#11118103 (10Aklapper) [09:29:29] 10Phabricator, 06Release-Engineering-Team, 06collaboration-services: Phorge setup check caching is misbehaving, leading to many duck-sound=quack requests - https://phabricator.wikimedia.org/T401157#11118178 (10Aklapper) FYI server logs mention this twice from last Friday around 1430UTC: `PHP Fatal error: Un... 
[09:31:34] (03open) 10aklapper: Revert "temporarily disable PhabricatorWebserverSetupCheck" [repos/phabricator/phabricator] (wmf/stable) - 10https://gitlab.wikimedia.org/repos/phabricator/phabricator/-/merge_requests/96 (https://phabricator.wikimedia.org/T401157) [09:44:04] (03PS1) 10Hashar: Zuul: remove archived GettingStarted from dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/1182099 (https://phabricator.wikimedia.org/T292654) [09:50:41] (03PS1) 10Hashar: Zuul: remove archived GlobalContribs from dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/1182103 (https://phabricator.wikimedia.org/T157240) [09:50:44] (03PS1) 10Hashar: Zuul: remove archived TranslateSvg from dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/1182104 (https://phabricator.wikimedia.org/T331817) [09:54:09] (03PS1) 10Samwilson: Zuul: [mediawiki/extensions/TemplateData] Add Wikibase for phan [integration/config] - 10https://gerrit.wikimedia.org/r/1182105 (https://phabricator.wikimedia.org/T398292) [10:03:20] (03PS1) 10Hashar: Zuul: remove WebDAVClientIntegration and WebDAVMinorSave [integration/config] - 10https://gerrit.wikimedia.org/r/1182110 [12:39:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [12:39:34] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T402919 (10wmcs-alerts) 03NEW [12:56:08] (03PS3) 10Hashar: Script to sync dependencies with extension registry [integration/config] - 10https://gerrit.wikimedia.org/r/1182115 (https://phabricator.wikimedia.org/T389998) [12:56:08] (03PS1) 10Hashar: Zuul: partially sort extension dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/1182135 (https://phabricator.wikimedia.org/T389998) [12:57:41] (03CR) 10CI 
reject: [V:-1] Script to sync dependencies with extension registry [integration/config] - https://gerrit.wikimedia.org/r/1182115 (https://phabricator.wikimedia.org/T389998) (owner: Hashar)
[13:00:24] (PS4) Hashar: Script to sync dependencies with extension registry [integration/config] - https://gerrit.wikimedia.org/r/1182115 (https://phabricator.wikimedia.org/T389998)
[13:36:57] (CR) Hashar: [C:+2] Zuul: remove archived TranslateSvg from dependencies [integration/config] - https://gerrit.wikimedia.org/r/1182104 (https://phabricator.wikimedia.org/T331817) (owner: Hashar)
[13:36:58] (CR) Hashar: [C:+2] Zuul: remove archived GlobalContribs from dependencies [integration/config] - https://gerrit.wikimedia.org/r/1182103 (https://phabricator.wikimedia.org/T157240) (owner: Hashar)
[13:36:59] (CR) Hashar: [C:+2] Zuul: remove archived GettingStarted from dependencies [integration/config] - https://gerrit.wikimedia.org/r/1182099 (https://phabricator.wikimedia.org/T292654) (owner: Hashar)
[13:37:38] (CR) Hashar: [C:+2] "I have deleted both repositories, they were empty."
[integration/config] - 10https://gerrit.wikimedia.org/r/1182110 (owner: 10Hashar) [13:38:37] (03Merged) 10jenkins-bot: Zuul: remove archived GettingStarted from dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/1182099 (https://phabricator.wikimedia.org/T292654) (owner: 10Hashar) [13:38:39] (03Merged) 10jenkins-bot: Zuul: remove archived GlobalContribs from dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/1182103 (https://phabricator.wikimedia.org/T157240) (owner: 10Hashar) [13:38:41] (03Merged) 10jenkins-bot: Zuul: remove archived TranslateSvg from dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/1182104 (https://phabricator.wikimedia.org/T331817) (owner: 10Hashar) [13:39:13] (03Merged) 10jenkins-bot: Zuul: remove WebDAVClientIntegration and WebDAVMinorSave [integration/config] - 10https://gerrit.wikimedia.org/r/1182110 (owner: 10Hashar) [13:41:35] (03CR) 10Hashar: "That is a partial sort to better match the output of the script at https://gerrit.wikimedia.org/r/c/integration/config/+/1182115/ which ou" [integration/config] - 10https://gerrit.wikimedia.org/r/1182135 (https://phabricator.wikimedia.org/T389998) (owner: 10Hashar) [13:44:55] (03CR) 10Hashar: [C:03+2] Zuul: [mediawiki/extensions/TemplateData] Add Wikibase for phan [integration/config] - 10https://gerrit.wikimedia.org/r/1182105 (https://phabricator.wikimedia.org/T398292) (owner: 10Samwilson) [13:46:37] (03Merged) 10jenkins-bot: Zuul: [mediawiki/extensions/TemplateData] Add Wikibase for phan [integration/config] - 10https://gerrit.wikimedia.org/r/1182105 (https://phabricator.wikimedia.org/T398292) (owner: 10Samwilson) [14:18:56] (03merge) 10brennen: Revert "temporarily disable PhabricatorWebserverSetupCheck" [repos/phabricator/phabricator] (wmf/stable) - 10https://gitlab.wikimedia.org/repos/phabricator/phabricator/-/merge_requests/96 (https://phabricator.wikimedia.org/T401157) (owner: 10aklapper) [14:23:54] 10Phabricator (phabricator-next), 
10Release-Engineering-Team (Doing 😎), 06collaboration-services: Deploy Phabricator/Phorge 2025-08-26 - https://phabricator.wikimedia.org/T402930 (10brennen) 03NEW [14:24:27] (03open) 10brennen: update submodules for 2025-08-26 deploy [repos/phabricator/deployment] (wmf/stable) - 10https://gitlab.wikimedia.org/repos/phabricator/deployment/-/merge_requests/78 (https://phabricator.wikimedia.org/T402930) [14:27:01] (03CR) 10Krinkle: [C:03+1] Zuul: partially sort extension dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/1182135 (https://phabricator.wikimedia.org/T389998) (owner: 10Hashar) [14:30:17] (03CR) 10Jforrester: Zuul: partially sort extension dependencies (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/1182135 (https://phabricator.wikimedia.org/T389998) (owner: 10Hashar) [14:30:26] (03CR) 10Jforrester: [C:03+1] Zuul: partially sort extension dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/1182135 (https://phabricator.wikimedia.org/T389998) (owner: 10Hashar) [14:30:53] (03CR) 10Jforrester: [C:03+1] utils: add description to zuul-mw-jobs-runner [integration/config] - 10https://gerrit.wikimedia.org/r/1182036 (https://phabricator.wikimedia.org/T389998) (owner: 10Hashar) [14:31:41] (03CR) 10Jforrester: Script to sync dependencies with extension registry (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/1182115 (https://phabricator.wikimedia.org/T389998) (owner: 10Hashar) [14:34:54] (03CR) 10Jforrester: [C:04-1] "Fundamentally this repo's scripts exist to be run by hashar with occasional support by mortals like me and you. 
The python-debian dependen" [integration/config] - 10https://gerrit.wikimedia.org/r/1181182 (owner: 10Krinkle) [14:59:11] (03CR) 10Hashar: Script to sync dependencies with extension registry (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/1182115 (https://phabricator.wikimedia.org/T389998) (owner: 10Hashar) [15:01:46] (03merge) 10brennen: update submodules for 2025-08-26 deploy [repos/phabricator/deployment] (wmf/stable) - 10https://gitlab.wikimedia.org/repos/phabricator/deployment/-/merge_requests/78 (https://phabricator.wikimedia.org/T402930) [15:21:54] 10Phabricator (2025-08-26), 10Release-Engineering-Team (Doing 😎), 06collaboration-services: Deploy Phabricator/Phorge 2025-08-26 - https://phabricator.wikimedia.org/T402930#11119612 (10Aklapper) 05Open→03Resolved Included in this amazing deployment were the following exciting custom changes: * {T3867... [15:28:52] (03update) 10dancy: make-container-image: Revised workflow [repos/releng/release] - 10https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/211 (https://phabricator.wikimedia.org/T402508) [15:28:58] (03update) 10dancy: make-container-image: Revised workflow [repos/releng/release] - 10https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/211 (https://phabricator.wikimedia.org/T402508) [15:29:29] (03update) 10dancy: make-container-image: Revised workflow [repos/releng/release] - 10https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/211 (https://phabricator.wikimedia.org/T402508) [15:36:24] 10Phabricator, 10Release-Engineering-Team (Priority Backlog 📥): Review and rename our existing Herald rules - https://phabricator.wikimedia.org/T402934 (10Aklapper) 03NEW p:05Triage→03Low [15:36:29] (03update) 10dancy: Update call to build-images.py [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/995 (https://phabricator.wikimedia.org/T402508) [15:36:34] 10Phabricator, 10Release-Engineering-Team (Priority Backlog 📥): 
Review and rename our existing Herald rules - https://phabricator.wikimedia.org/T402934#11119792 (10Aklapper) [15:36:36] 10Phabricator (2025-08-26), 10Release-Engineering-Team (Doing 😎): Allow admins to rename personal Herald rules - https://phabricator.wikimedia.org/T386703#11119793 (10Aklapper) [15:37:27] 10Phabricator (2025-08-26), 10Release-Engineering-Team (Doing 😎): Allow admins to rename personal Herald rules - https://phabricator.wikimedia.org/T386703#11119805 (10Aklapper) 05Open→03Resolved This seems to work - at least the "Edit Rule" button is now available to Phab admins on Herald Rules, before... [15:38:27] 10Phabricator, 10Release-Engineering-Team (Priority Backlog 📥): Review and rename our existing Herald rules - https://phabricator.wikimedia.org/T402934#11119812 (10Aklapper) [15:39:53] (03update) 10dancy: Update call to build-images.py [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/995 (https://phabricator.wikimedia.org/T402508) [15:40:07] 10Phabricator (2025-08-26), 10Release-Engineering-Team (Doing 😎), 06collaboration-services: Deploy Phabricator/Phorge 2025-08-26 - https://phabricator.wikimedia.org/T402930#11119821 (10Pppery) ... and also an update of the translations. [15:47:21] 10Scap, 13Patch-For-Review: Exception while building "next" image - https://phabricator.wikimedia.org/T402508#11119893 (10dancy) 05Open→03In progress p:05Triage→03High a:03dancy [16:42:09] 10Continuous-Integration-Infrastructure (Zuul upgrade), 10Release-Engineering-Team (Priority Backlog 📥), 06collaboration-services, 07Essential-Work, 13Patch-For-Review: Puppetize systemd unit for zuul's nodepool - https://phabricator.wikimedia.org/T401614#11120267 (10Dzahn) >>! In T401614#11099734, @bd80... 
[16:54:08] 10Phabricator, 10Release-Engineering-Team (Priority Backlog 📥): Review and rename our existing Herald rules - https://phabricator.wikimedia.org/T402934#11120343 (10Aklapper) Note to myself: Rule UI shows "Send an email to rule author" as action. In "Edit Rule" mode that action says "Send me an email". I only r... [16:56:10] 10Phabricator, 10Release-Engineering-Team (Priority Backlog 📥): Review and rename our existing Herald rules - https://phabricator.wikimedia.org/T402934#11120354 (10Aklapper) [17:11:28] FIRING: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance deployment-puppetserver-1 in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure [17:11:41] 10Beta-Cluster-Infrastructure: Failed to update Puppet repository /srv/git/operations/puppet on instance deployment-puppetserver-1 in project deployment-prep - https://phabricator.wikimedia.org/T402950 (10wmcs-alerts) 03NEW [17:14:15] (03CR) 10Hashar: [C:03+2] "I am deploying this now, the exact sorting can be improved/enforced later. 
Then with the new Zuul the system will be slightly different so" [integration/config] - 10https://gerrit.wikimedia.org/r/1182135 (https://phabricator.wikimedia.org/T389998) (owner: 10Hashar) [17:15:03] maintenance-disconnect-full-disks build 731561 integration-agent-docker-1041 (/: 29%, /srv: 97%, /var/lib/docker: 33%): OFFLINE due to disk space [17:15:40] (03Merged) 10jenkins-bot: Zuul: partially sort extension dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/1182135 (https://phabricator.wikimedia.org/T389998) (owner: 10Hashar) [17:16:39] !log Reloaded Zuul for "partially sort extension dependencies" - https://gerrit.wikimedia.org/r/c/integration/config/+/1182135 [17:16:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [17:20:03] maintenance-disconnect-full-disks build 731562 integration-agent-docker-1041 (/: 29%, /srv: 49%, /var/lib/docker: 33%): RECOVERY disk space OK [17:22:36] (03CR) 10Hashar: "The thing is Timo lacks the depencency and I agree it is annoying. I'd go with a testenv:docker-updates so one can:" [integration/config] - 10https://gerrit.wikimedia.org/r/1181182 (owner: 10Krinkle) [17:55:50] 06Project-Admins: Create a 'Rust' tag, like we have for JS? 
- https://phabricator.wikimedia.org/T402955 (10Jdforrester-WMF) 03NEW [17:59:09] (03PS1) 10Hashar: tox: lint with Python3, Zuul with Python 2.7 [integration/config] - 10https://gerrit.wikimedia.org/r/1182196 [18:12:42] 10Phabricator (Upstream), 07Upstream: After archiving a Phabricator project, all future edits to it generate an extra "[username] set this project's color to Red" entry in the project's history - https://phabricator.wikimedia.org/T398735#11120732 (10valerio.bozzolan) Ehm I mean this https://we.phorge.it/D26294 [18:21:49] (03approved) 10swfrench: make-container-image: Revised workflow [repos/releng/release] - 10https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/211 (https://phabricator.wikimedia.org/T402508) (owner: 10dancy) [18:26:12] (03PS1) 10Hashar: tox: detect and lint python scripts without .py [integration/config] - 10https://gerrit.wikimedia.org/r/1182200 [18:26:46] (03CR) 10Hashar: "https://gerrit.wikimedia.org/r/c/integration/config/+/1182196 splits python2 and python3 linting" [integration/config] - 10https://gerrit.wikimedia.org/r/1181182 (owner: 10Krinkle) [18:27:19] pfff [18:27:24] so many hacks [18:27:35] (03CR) 10CI reject: [V:04-1] tox: detect and lint python scripts without .py [integration/config] - 10https://gerrit.wikimedia.org/r/1182200 (owner: 10Hashar) [18:29:19] (03PS2) 10Hashar: tox: detect and lint python scripts without .py [integration/config] - 10https://gerrit.wikimedia.org/r/1182200 [18:29:34] (03CR) 10Krinkle: tox: lint with Python3, Zuul with Python 2.7 (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/1182196 (owner: 10Hashar) [18:31:26] (03CR) 10Hashar: tox: lint with Python3, Zuul with Python 2.7 (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/1182196 (owner: 10Hashar) [18:31:35] (03PS2) 10Hashar: tox: lint with Python3, Zuul with Python 2.7 [integration/config] - 10https://gerrit.wikimedia.org/r/1182196 [18:31:35] (03PS3) 10Hashar: tox: detect and lint 
python scripts without .py [integration/config] - 10https://gerrit.wikimedia.org/r/1182200 [18:32:36] hashar: but getting cleaner! [18:32:58] OK, so docker-updates and such are already python3 scripts. They weren't actually py2, but missing from linting entirely. [18:45:42] (03update) 10dancy: make-container-image: Revised workflow [repos/releng/release] - 10https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/211 (https://phabricator.wikimedia.org/T402508) [18:52:57] (03update) 10dancy: Update call to build-images.py [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/995 (https://phabricator.wikimedia.org/T402508) [18:56:18] (03open) 10dancy: mirror-repos.py: Reduce default worker count to 3 [repos/releng/train-dev] - 10https://gitlab.wikimedia.org/repos/releng/train-dev/-/merge_requests/153 [18:56:21] (03update) 10dancy: mirror-repos.py: Reduce default worker count to 3 [repos/releng/train-dev] - 10https://gitlab.wikimedia.org/repos/releng/train-dev/-/merge_requests/153 [18:57:16] (03merge) 10dancy: mirror-repos.py: Reduce default worker count to 3 [repos/releng/train-dev] - 10https://gitlab.wikimedia.org/repos/releng/train-dev/-/merge_requests/153 [19:04:25] (03approved) 10swfrench: Update call to build-images.py [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/995 (https://phabricator.wikimedia.org/T402508) (owner: 10dancy) [19:10:39] (03merge) 10dancy: Update call to build-images.py [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/995 (https://phabricator.wikimedia.org/T402508) [19:11:03] (03open) 10dancy: Release 4.210.0 [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/996 [19:13:18] (03merge) 10dancy: Release 4.210.0 [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/996 [19:18:41] (03merge) 10dancy: make-container-image: Revised workflow 
[repos/releng/release] - 10https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/211 (https://phabricator.wikimedia.org/T402508) [19:21:44] (03CR) 10Krinkle: [C:03+1] tox: lint with Python3, Zuul with Python 2.7 [integration/config] - 10https://gerrit.wikimedia.org/r/1182196 (owner: 10Hashar) [19:44:36] 06Project-Admins: Create a 'Rust' tag, like we have for JS? - https://phabricator.wikimedia.org/T402955#11121199 (10Aklapper) I'm fine with that (as long as we don't introduce `#PHP` :P ); question as usual is how to make folks aware of it and who'll set it regularly... [19:52:57] 10Beta-Cluster-Infrastructure, 07Epic, 13Patch-For-Review: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers) - https://phabricator.wikimedia.org/T393487#11121254 (10Daimona) [20:12:52] FYI, we got some more beta scraping today (which isn't too far away from scraping the bottom of the barrel, really): https://phabricator.wikimedia.org/T402971 [20:27:18] /rimshot [21:24:47] Daimona: thanks for the task and analysis. Blocking that range from Cyprus seems like a good place to start. I'll do the needful I guess. [21:25:54] Thank you! The other /24 range seems safe to block too, as well as the single IP. Then we can see how it goes. I'd do this myself but I don't think I have the needed permissions or knowledge. [21:26:59] Daimona: I can help you with both if you would like, but I can also just do the work if you would rather work on other things [21:27:18] what do you guys use to block? [21:27:56] mutante: the local CDN in deployment-prep. https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Blocking_and_unblocking [21:28:17] ah, thanks, I was just curious [21:28:17] Depends on the level of difficulty I guess. It's not the first time I come across beta scraping preventing me from doing something, and being able to do something about it would be useful for sure. 
[21:28:37] "something about it" --> besides waiting for you to be the hero and save the day, that is [21:28:47] I like how you have different size hammers [21:29:17] Daimona: that wikitech page should explain all the bits. The right needed is the "member" role in the deployment-prep project [21:30:13] I believe I have that. Maybe I could try this now? [21:30:53] Sure! I'm here if you get into a weird place and what some help. [21:31:01] *want [21:36:17] Okay, I was reading that page. Am I supposed to see "abuse_networks:blocked_nets:networks" in https://horizon.wikimedia.org/project/puppet/ ? [21:37:06] yes. it should be the first key if you have the deployment-prep project selected. [21:37:25] * bd808 wishes that deep links work in Horizon [21:37:47] Ahhhh yes I missed the project selector at the top. Found it now! [21:38:03] :+1: That's an easy miss. [21:44:34] I'm slightly confused - I see 85.0.0.0/8 is listed already. Shouldn't that include 85.208.96.0/24 already? [21:47:33] Daimona: yeah, the /8 would contain that /24. /me looks [21:48:00] Yeah and I just double-checked that we got fresh requests from 85.x [21:49:47] Blocked the other two for the time being [21:54:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:58:23] well fuck -- https://gerrit.wikimedia.org/r/c/operations/puppet/+/1175991 [21:58:42] Daimona: _joe_ removed our blocking method [21:59:17] rip [22:00:07] yup "no more used" [22:01:51] beta should be moved behind a walled garden. Maybe setting up a password would be good enough to defeat bots [22:02:39] or move the WMCS web proxy behind the same edge cache infra to benefit from the blocks in action in prod [22:03:06] hashar: who will rewrite every test in the universe to add password auth? 
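The containment question raised at 21:44 (whether the existing 85.0.0.0/8 entry already covers 85.208.96.0/24) can be checked with Python's standard `ipaddress` module. This is only an illustrative sketch: the helper name `is_covered` is hypothetical and is not part of the Horizon or puppet tooling discussed here.

```python
import ipaddress
from typing import Iterable


def is_covered(candidate: str, blocked: Iterable[str]) -> bool:
    """Return True if `candidate` lies entirely inside any network in `blocked`.

    Hypothetical helper for illustration; both arguments use CIDR notation.
    """
    net = ipaddress.ip_network(candidate)
    for entry in blocked:
        block = ipaddress.ip_network(entry)
        # subnet_of() raises TypeError when IPv4 and IPv6 are mixed,
        # so compare address families first.
        if net.version == block.version and net.subnet_of(block):
            return True
    return False


# The /24 from the chat against the /8 that was already listed.
print(is_covered("85.208.96.0/24", ["85.0.0.0/8"]))
```

A check like this only says the range is listed; as the conversation goes on to show, it does not say whether the block is actually being enforced.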
[22:03:51] Prod wants nothing to do with WMCS so moving behind the prod CDN is also a non-starter [22:04:28] for the tests, they use framworks which surely should know about using password auth. And I could see webdriver.io having a setting for that [22:04:32] The positive (for some definition of "positive") side is that it lets us see what would happen without all the blocks in place. https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&var-project=deployment-prep&var-instance=deployment-mediawiki14&from=now-30d&to=now&timezone=browser&viewPanel=panel-3 [22:04:39] or mwbot would have certainly [22:05:06] for prod vs WMCS yeah that is a trouble. Short of replicating the superset/requestctl infra in front of WMCS [22:05:15] but that sounds like a shit ton of duplicate work [22:05:33] hence why I tend to favor putting beta behind a wall [22:06:11] we could potentially exempt our traffic from requiring authentication (eg CI jobs running on a daily basis would be left untouched) [22:06:19] at the expense of people would require to use a password [22:07:23] but blocking whole /8 would keep causing issues and regardless IP blocking is a never ending task [22:08:20] anyway. I should be asleep really [22:08:23] hashar: if you are trying to convince me to delete all of Beta Cluster you are doing a good job [22:08:31] well [22:08:39] we can try putting it behind a password [22:08:48] how? [22:09:31] we had a lame but functional system and it was just deleted by the SREs [22:10:20] * bd808 walks away before he says things that will get him yelled at [22:10:48] I guess Guiseppe was not aware we blocked-net was used for beta [22:11:03] it can't be found in operations/puppet since the blocks are made in Horizon [22:11:17] (re: we don't have +2 on puppet to "upstream" those to operations/puppet) [22:11:21] but that one is a dead horse [22:11:23] anyway yeah [22:12:22] Well, if the removal wasn't intentional it could presumably be reverted? 
[22:12:33] presumably [22:13:07] for the password, my guess is AuthType Basic / AuthName "Enter the password: fabulous". I have no idea how poorly that would behave across multiple domains [22:13:56] or well [22:13:59] we could shut down beta [22:14:23] "Err impossible to maintain sorry" [22:14:38] anyway I should really sleep cause I start yelling non sense :] [22:15:04] but as a start, maybe that puppet patch can be revert [22:16:01] we'll have to figure it out. I don't think we can do a simple revert since then subsequent patches will fail. [22:16:25] maybe that would work for some hours, but not longer-term. [22:16:53] if we want to use vcl, then we'll need a dedicated spot for beta stuff. [22:17:26] and we'll probably need SRE's help to find where that spot is so we can avoid whoopsies [22:21:38] I'm unsure why blocked nets needed to be removed, other than it became redundant with the x-provenance header in prod [22:22:53] how about "block China" because if we have legit users in China they are already forced to use VPN anyways and that should solve most of the issue all at once [22:23:19] I know.. first need something to even be able to do that [22:23:30] also, in the case of beta, seems like we have different abusers [22:23:47] or additional abusers at least [22:25:18] also, did these rules just move? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1175989/3/modules/profile/templates/cache/haproxy/ipblocks-all.map.erb [22:53:09] 10Beta-Cluster-Infrastructure, 07Epic, 13Patch-For-Review: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers) - https://phabricator.wikimedia.org/T393487#11121800 (10thcipriani) >>! In T393487#11121664, @bd808 wrote: > https://gerrit.wikimedia.org/r... 
[22:55:43] Continuous-Integration-Infrastructure (Zuul upgrade), Release-Engineering-Team (Priority Backlog 📥), collaboration-services, Essential-Work, Patch-For-Review: Puppetize systemd unit for zuul's nodepool - https://phabricator.wikimedia.org/T401614#11121806 (Dzahn) After a `[zuul1001:~] $ sudo d...
[22:56:25] I got to "Nodepool launcher 0.0.0 096308f starting" on zuul1001 :)
[22:56:38] Beta-Cluster-Infrastructure, Epic, Patch-For-Review: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers) - https://phabricator.wikimedia.org/T393487#11121808 (bd808) >>! In T393487#11121800, @thcipriani wrote: > Looks like these all moved via...
[22:56:44] after getting past TLS cert issues, fixing config syntax and whatnot.
[22:57:02] mutante: w00t
[22:57:32] tomorrow I will continue on this a bit more. then I will be out until next week after the holiday
[22:57:39] thcipriani: I may have the blocking at haproxy applied... just need to figure out how to check it
[23:00:15] I added an ip to a server that I have, I could still curl it, so it doesn't look like it's applied
[23:02:07] well, the IP I added got added looking at your puppet run :)
[23:02:17] I think what I just got turned on is adding an x-provenance header?
[23:02:45] so maybe there is another knob to turn to block based on that...
[23:03:49] yeah, that looks like what the other patch you found does: block if that header is applied and matches abusers or phab_abusers
[23:04:15] but haproxy is failing to reload according to sudo systemctl status haproxy
[23:04:33] presumably, when that reloads, all those IPs will be added to abusers and be blocked
[23:05:46] ugh. I just checked that the 2nd puppet run was clean without looking at the service
[23:05:57] journalctl is giving me...nothing useful other than "soft-stop running for too long" baybe just kicking haproxy would fix it
[23:06:04] *maybe
[23:06:59] you wanna try?
We probably should not both restart it at the same time :)
[23:07:05] heh, sure
[23:07:26] > Job for haproxy.service failed because the control process exited with error code.
[23:07:36] kind of immediately after restart
[23:10:07] journalctl -u haproxy should show a bit more
[23:10:29] [ALERT] (2220049) : config : parsing [/etc/haproxy/haproxy.cfg:13] : Lua runtime error: Error opening the specified MaxMind DB file
[23:10:44] ah, I saw that in your puppet run
[23:10:46] ah, that's the geodns database
[23:10:59] `sudo haproxy -c -f /etc/haproxy/haproxy.cfg` is the magic to get error messages
[23:11:14] https://www.digitalocean.com/community/tutorials/how-to-troubleshoot-common-haproxy-errors#troubleshooting-with-haproxy
[23:11:16] puppet code has options to either install only the free DB files or the paid ones that WMF has a license for
[23:11:16] -rw-r--r-- 1 haproxy haproxy 715 Aug 26 22:49 /etc/haproxy/lua/maxmind-lookup.lua
[23:11:21] exists
[23:12:15] 5 local isp_dbpath = "/usr/share/GeoIP/GeoIP2-ISP.mmdb"
[23:12:19] is this missing?
[23:12:46] yep
[23:12:50] > /usr/share/GeoIP/GeoIP2-ISP.mmdb: cannot open `/usr/share/GeoIP/GeoIP2-ISP.mmdb'
[23:13:03] (No such file or directory)
[23:13:05] modules/profile/manifests/puppetserver/volatile.pp: 'GeoIP2-ISP',
[23:13:18] in prod this is on the puppetmasters in that volatile dir
[23:13:30] modules/puppetmaster/manifests/geoip.pp
[23:14:36] you can try to apply this puppet class:
[23:14:39] GeoIP2-ISP is only in paid
[23:14:48] class { 'geoip::data::maxmind':
[23:14:58] that is what I would figure: this is a paid service
[23:15:01] yea, but we have the license key
[23:15:19] and if we put it in beta...
[23:15:27] I dont know if we are allowed to use it here but WMF does pay
[23:15:33] beta's secrets are not so secret
[23:16:03] sounds a bit similar to the issue with providing TLS and where to put the keys
[23:16:33] we have a project local puppetserver that is fine for most things
[23:16:42] maybe the better fix is to make it configurable which DB file it wants
[23:16:46] but paid data maybe not
[23:16:48] and replace it with the free ones if in beta
[23:16:52] yeah
[23:17:00] it should just be less accurate city data
[23:17:40] free DB: class { 'geoip::data::maxmind':
[23:17:48] paid DB: class { 'geoip::data::maxmind::ipinfo'
[23:18:31] we'd still need a license key for the free DB?
[23:19:36] no, I dont think so. the class that installs just the free DBs does not have a parameter for license key
[23:19:48] it existed before the paid ones
[23:19:59] I added the ::ipinfo class later for https://phabricator.wikimedia.org/T288844
[23:20:50] # fall back to public legacy databases
[23:20:57] product_ids => [
[23:20:57] 'GeoLite2-ASN',
[23:20:57] 'GeoLite2-Country',
[23:20:57] 'GeoLite2-City',
[23:21:40] thcipriani: MaxMind license key. Defaults to 000000000000, :)
[23:23:17] there is a package called "geoipupdate" and an /etc/GeoIP.conf for that and then running geoipupdate downloads the files
[23:24:13] mutante: do you know where the geoip stuff is hidden when ::puppetserver is being used instead of the older ::puppetmaster module?
[23:25:12] bd808: [puppetserver1001:/srv/puppet_fileserver/volatile/GeoIP] $
[23:25:46] we could manually copy files but if we also want regular updates then we need the geoipupdate package
[23:26:28] hrm, modules/profile/files/cache/haproxy/tests/usr/share/GeoIP/update.sh
[23:26:51] Beta-Cluster-Infrastructure, Epic, Patch-For-Review: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers) - https://phabricator.wikimedia.org/T393487#11121892 (bd808) HAProxy is failing to start because /usr/share/GeoIP/GeoIP2-ISP.mmdb is miss...
[23:27:11] there is a local puppetserver in that project, right?
[23:27:26] first step might be to check if it has anything in that ./volatile/ path
[23:27:46] the directories are there, but they are empty
[23:27:57] how about: dpkg -l | grep geoip
[23:28:03] does it have geoipupdate package installed?
[23:28:12] deployment-puppetserver-1:/srv/puppet_fileserver/volatile/GeoIP is the path I'm looking at
[23:28:32] geoipupdate is installed
[23:29:21] try running it like this:
[23:29:22] $geoipupdate_command = "/usr/bin/geoipupdate -f ${config_file} -d ${data_directory} -v"
[23:29:24] (PS1) Samwilson: Zuul: [mediawiki/extensions/TemplateData] Fix typo in 'Wikibase' [integration/config] - https://gerrit.wikimedia.org/r/1182237 (https://phabricator.wikimedia.org/T398292)
[23:29:45] (CR) Samwilson: Zuul: [mediawiki/extensions/TemplateData] Add Wikibase for phan (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/1182105 (https://phabricator.wikimedia.org/T398292) (owner: Samwilson)
[23:29:51] where the config file is /etc/GeoIP.conf
[23:29:56] and the data_dir is that volatile dir
[23:30:58] error loading configuration file /etc/GeoIP.conf: geoipupdate requires a valid AccountID and LicenseKey combination
[23:31:33] :tableflip:
[23:31:46] is the Licensekey 000000 or something else?
[23:31:53] or just nothing
[23:31:59] it's the zeros
[23:32:24] and the AccountID?
[23:33:03] a terrible idea: stick https://github.com/maxmind/MaxMind-DB/blob/main/test-data/GeoIP2-ISP-Test.mmdb into /usr/share/GeoIP GeoIP2-ISP.mmdb
[23:33:24] yea, I was actually looking at that one too
[23:33:26] (PS2) Samwilson: Zuul: [mediawiki/extensions/TemplateData] Fix typo in 'Wikibase' [integration/config] - https://gerrit.wikimedia.org/r/1182237 (https://phabricator.wikimedia.org/T398292)
[23:33:30] it only has test data.. but so what
[23:34:05] oh wait.. I got something better
[23:34:09] thcipriani: it's worth a shot
[23:34:24] edit the /etc/GeoIP.conf and remove the non-free ones from the EditionIDs line
[23:34:29] we don't really need the data, we just need haproxy to start
[23:34:29] so that we don't try to download paid stuff
[23:34:52] instead of: EditionIDs GeoIP2-City GeoIP2-Connection-Type GeoIP2-Country GeoIP2-ISP
[23:34:58] something like: EditionIDs GeoIP2-City GeoIP2-Connection-Type GeoIP2-Country GeoIP2-ISP
[23:35:33] > The MaxMind DB file contains invalid metadata
[23:35:35] fun
[23:35:41] something like: GeoLite2-ASN GeoLite2-Country GeoLite2-City
[23:35:42] so my test data thing: no joy
[23:35:47] ^ these are free
[23:36:06] we have all those in /usr/share/geoip/ already
[23:36:19] should I just move one of those to where the lua script is looking?
[23:36:33] ok, then I think we are good for that part and "just" need to make it configurable which DB it looks for
[23:36:49] oh, sure, or do that :P
[23:37:10] of course the function is "fetch_isp" so...
[23:37:23] you mean just renaming it to something it is not?
well, you can try and see what happens
[23:38:07] thcipriani: I think we just need a different guard to block loading that lua file at all
[23:38:42] ^
[23:38:48] unless the whole blocking mechanism requires the ISP data for real
[23:39:02] well, I did: sudo ln -s /usr/share/GeoIP/GeoIP2-Country.mmdb /usr/share/GeoIP/GeoIP2-ISP.mmdb and then sudo systemctl restart haproxy
[23:39:14] and it's on
[23:39:39] thcipriani: the other cache host will need that hack too
[23:39:45] I see 403's in the logs again
[23:39:56] deployment-cache-upload08
[23:40:22] * thcipriani does that
[23:41:08] running puppet first
[23:42:09] it's gonna be interesting if geoipupdate runs and overwrites that link
[23:42:36] I note that that deployment-cache-upload08 doesn't have the /usr/share/GeoIP dir at all...
[23:42:41] because somehow you have the package installed but the files were not downloaded
[23:44:09] thcipriani: huh. I see the `profile::cache::varnish::frontend::fe_vcl_config::enable_geoiplookup: true` hiera for both
[23:44:26] ah, I think it's just disabled if it's not $is_active and that is if it thinks it's the "active puppetserver"
[23:45:45] I vaguely remember that feature flag having something to do with the license terms, but that may be an old hallucination
[23:46:26] faidon, geoip, and wmcs is a tuple in my head, but I can't recall the details
[23:47:07] WMF has some paid geoip license stuff too
[23:47:19] which has different paths, formats and other stuff from the free ones
[23:47:27] bd808: yeah, haproxy is failing in the same way, I guess I'll just make that dir and grab the file from cache-text08 and get it running then we can sort how to fix this for beta
[23:48:09] if you are looking at geoip::data::maxmind it's all free stuff, if you are looking at geoip::data::maxmind::ipinfo it's paid stuff
[23:48:21] scratch that, the directory exists, the country file doesn't exist.
[23:48:30] only one of the puppetservers is considered the "active" one that is the proxy that downloads the files
[23:48:35] all other nodes get it from that server
[23:49:07] so if the FQDN does not match that active server then you would get the package installed
[23:49:27] but the command to download will not be executed
[23:52:32] the parameter "ca_server" that is passed to geoip::data::maxmind needs to match the FQDN of the puppetserver. in production it's set to "$profile::puppetserver::ca_server" which is in Hiera as: puppet_ca_server: 'puppetserver1001.eqiad.wmnet'
[23:52:44] thcipriani: I need to go do other things. godspeed and please make some update on T393487 about where you end up. I can work on Puppet hacks tomorrow if needed.
[23:52:45] T393487: 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers) - https://phabricator.wikimedia.org/T393487
[23:52:53] in cloud.yaml you have: cloud.yaml:puppet_ca_server: "%{lookup('puppetmaster')}"
[23:53:21] bd808: will do, thanks!
[23:55:14] alright, restarted haproxy on deployment-cache-upload08
[23:55:20] beta appears to be working
[23:55:26] lemme test blocking
[23:58:03] neat, I'm blocked
[23:58:09] but not from home
[23:58:20] limited success.