[00:02:00] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[00:02:15] (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[00:02:30] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9691255 (bd808) test
[00:07:48] (PuppetFailure) firing: Puppet has failed on cloudbackup1002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[00:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:09:59] Wikibugs: Replace Redis queue with custom http solution - https://phabricator.wikimedia.org/T361518#9691268 (bd808) I have code running in my local environment for the whole stack without Redis anywhere! It needs a bit more polish before I push to gitlab and start testing it at scale in the wikibugs-testing...
[00:12:56] (SystemdUnitDown) firing: (3) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:13:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:23:41] (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:48:30] VPS-project-Codesearch, Patch-For-Review: Let codesearch-frontend reques to local Hound instances directly - https://phabricator.wikimedia.org/T361899#9691283 (cmooney) This rule in filter table / input chain is allowing the traffic to Hound, but only from two spectific IPs: ` 6 15097 906K ACCEPT...
[01:03:18] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T360279#9691320 (phaultfinder)
[01:25:59] Cloud-VPS: ssh to new instance "med.iiab.eqiad1.wikimedia.cloud" fails for user timmoody - https://phabricator.wikimedia.org/T361898#9691330 (Tim-moody) Resolved→Open just ran again twice ssh -J primary.bastion.wmflabs.org timmoody@med.iiab.eqiad1.wikimedia.cloud timmoody@med.iiab.eqiad1.wikimedia.c...
[02:17:56] (SystemdUnitDown) firing: (3) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:18:02] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T360279#9691359 (phaultfinder)
[03:13:11] VPS-project-Codesearch, Patch-For-Review: Let codesearch-frontend reques to local Hound instances directly - https://phabricator.wikimedia.org/T361899#9691420 (Krinkle) Thank you @cmooney, that's amazing. That rule is very specific to just port 3002. That explains a few other things I was struggling with.
[03:14:08] (Abandoned) Krinkle: frontend: Change Dockerport to expose port 3003 instead of port 80 [labs/codesearch] - https://gerrit.wikimedia.org/r/1016887 (https://phabricator.wikimedia.org/T361899) (owner: Krinkle)
[03:19:21] VPS-project-Codesearch, Patch-For-Review: Let codesearch-frontend reques to local Hound instances directly - https://phabricator.wikimedia.org/T361899#9691437 (Krinkle)
[03:20:43] VPS-project-Codesearch, Patch-For-Review: Let codesearch-frontend reques to local Hound instances directly - https://phabricator.wikimedia.org/T361899#9691439 (Krinkle) I've tested this locally on codesearch8 by using `iptables-save` and `iptables-restore` and adding `lang=diff krinkle@codesearch8:~$ vi...
[04:02:01] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:02:56] (SystemdUnitDown) firing: (5) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:03:04] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T360279#9691515 (phaultfinder)
[04:07:48] (PuppetFailure) firing: Puppet has failed on cloudbackup1002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[05:12:56] (SystemdUnitDown) firing: (5) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[07:17:56] (SystemdUnitDown) firing: (5) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[07:19:34] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9691719 (Slst2020) My preference would be to continue with what I perceive is the status quo, which I would describe as something like this: We are not following any explicit best pract...
[07:42:00] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9691762 (aborrero) +1 to option B5
[08:02:01] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[08:05:41] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9691822 (dcaro) > I think this means I'm most aligned with options B & 1, because I believe this approach ultimately leads (close enough) to the outcome of option 5. So, do you want opt...
[08:07:49] (PuppetFailure) firing: Puppet has failed on cloudbackup1002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[08:13:16] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9691843 (Slst2020) I want option 1 because it's the only one that is not prescriptive. To me it's a coincidence that option 5 aligns with what I think would happen by going with B1. That...
[08:19:49] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9691856 (dcaro) >>! In T361804#9691843, @Slst2020 wrote: > I want option 1 because it's the only one that is not prescriptive. To me it's a coincidence that option 5 aligns with what I t...
[08:21:13] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9691862 (dcaro) Note that the fact that 1 has less checks, does not mean it's not prescriptive, just that when strictly applied you would actually have less checks (ex. remove mypy from...
[08:29:05] Cloud-VPS: ssh to new instance "med.iiab.eqiad1.wikimedia.cloud" fails for user timmoody - https://phabricator.wikimedia.org/T361898#9691867 (taavi) Open→Resolved @bd808 if the first Puppet run is interrupted one also needs to remove the file blocking local logins: `lang=shell-session root...
[08:30:53] Cloud-VPS (Quota-requests), Community-Tech: xtools quota increase - https://phabricator.wikimedia.org/T361912 (TheresNoTime) NEW
[08:31:22] Cloud-Services: Migrate cloudceph servers to nftables - https://phabricator.wikimedia.org/T361913 (MoritzMuehlenhoff) NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific...
[08:32:04] cloud-services-team, Cloud-VPS: Migrate cloudceph servers to nftables - https://phabricator.wikimedia.org/T361913#9691895 (taavi)
[08:32:34] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9691896 (Slst2020) >>! In T361804#9691856, @dcaro wrote: > I think there might be something not clear in the options xd > > Anything with 'B' in front is non-prescriptive (A -> prescrip...
[08:35:21] Cloud-VPS (Quota-requests), Community-Tech: xtools quota increase - https://phabricator.wikimedia.org/T361912#9691898 (TheresNoTime) We're seeing some intermittent downtime ({T361876} related, but not directly) //possibly// caused by the increased API load — we'd like to do this increase sooner rather th...
[08:41:31] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9691906 (dcaro) > Indeed, I think that B is incompatible with any option that says "let's agree on doing this specific list of things". So (to me) B implies 1. In practice, this combinat...
[08:49:16] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9691946 (dcaro)
[08:50:39] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9691953 (Slst2020) I don't see that we have to be explicit in the wiki page about what we currently do, other than recommending "follow the conventions in whatever repository you are con...
[08:51:18] Cloud-VPS (Quota-requests), Community-Tech: xtools quota increase - https://phabricator.wikimedia.org/T361912#9691947 (TheresNoTime) Open→Invalid On discussion with @taavi (thank you), it appears that if //any// quota increase is needed, it'll be for the trove database which is [[ http...
[08:54:14] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9691960 (dcaro) >>! In T361804#9691953, @Slst2020 wrote: > I don't see that we have to be explicit in the wiki page about what we currently do, other than recommending "follow the conven...
[09:01:08] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9691984 (Slst2020) >>! In T361804#9691960, @dcaro wrote: I do think that it's good to keep the page somewhat updated, so if you are creating a new repo or similar you have a reference to...
[09:04:10] Cloud Services Proposals: Decision request - Update python team best practices - https://phabricator.wikimedia.org/T361804#9692003 (dcaro) >>! In T361804#9691984, @Slst2020 wrote: >>>! In T361804#9691960, @dcaro wrote: >> I do think that it's good to keep the page somewhat updated, so if you are creating a n...
[10:28:43] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: [trove] wrong quota_usages values in project tf-infra-test - https://phabricator.wikimedia.org/T359412#9692249 (fnegri) The values for `in_use` and `reserved` were again showing non-zero values even if there were no active database instances...
[10:36:05] Toolforge, Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#9692280 (dcaro) >>! In T306391#9691083, @AntiCompositeNumber wrote: > That's not particularly useful for this task about CronJobs. It can be converted to a ContinuousJob...
[10:39:49] (TfInfraTestApplyFailed) resolved: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[11:07:09] Tool-Global-user-contributions, Stewards-and-global-tools, XTools, Design, and 2 others: [Design] Communicate to users why there are gaps in IP data on Special:Contributions - https://phabricator.wikimedia.org/T360928#9692388 (matej_suchanek)
[11:17:56] (SystemdUnitDown) firing: (5) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:32:28] (PuppetStaleCertificates) resolved: Found non-revoked Puppet certificates for 2 deleted instances on cloudinfra-internal-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[11:54:21] Toolforge (Toolforge iteration 08), Patch-For-Review: [jobs-api,jobs-cli] Support job health checks - https://phabricator.wikimedia.org/T335592#9692471 (taavi) Resolved→Open Re-opening since I think documentation needs to be added to https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framewo...
[11:55:11] Toolforge, Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#9692477 (taavi)
[11:55:13] Toolforge (Toolforge iteration 08), Patch-For-Review: [jobs-api,jobs-cli] Support job health checks - https://phabricator.wikimedia.org/T335592#9692478 (taavi)
[11:58:43] Toolforge (Toolforge iteration 08), Patch-For-Review: [jobs-api,jobs-cli] Support job health checks - https://phabricator.wikimedia.org/T335592#9692482 (dcaro) >>! In T335592#9692471, @taavi wrote: > Re-opening since I think documentation needs to be added to https://wikitech.wikimedia.org/wiki/Help:Tool...
[12:02:01] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[12:07:49] (PuppetFailure) firing: Puppet has failed on cloudbackup1002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[12:08:10] Cloud-VPS: ssh to new instance "med.iiab.eqiad1.wikimedia.cloud" fails for user timmoody - https://phabricator.wikimedia.org/T361898#9692510 (Tim-moody) works now, thanks
[12:09:15] Toolforge, Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#9692512 (taavi) I don't see how I could use a continuous job to update a wiki page once a week.
[12:09:41] Toolforge (Toolforge iteration 08): [builds-api] replace all error message models with ResponseMessages - https://phabricator.wikimedia.org/T361901#9692519 (dcaro) p:Triage→High
[12:12:59] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[12:13:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:13:34] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[12:13:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:15:01] Cloud-VPS: project owidm volume owidm-static can not be removed from attachment to owidm-instance and can not be attached to another instance - https://phabricator.wikimedia.org/T361893#9692532 (Tim-moody) I tried df, blkid, and lsblk and there is no evidence of any storage other than the normal sda.
[12:15:14] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[12:15:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:15:50] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[12:15:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:27:41] cloud-services-team, Cloud-VPS: project owidm volume owidm-static can not be removed from attachment to owidm-instance and can not be attached to another instance - https://phabricator.wikimedia.org/T361893#9692556 (taavi) a:taavi Seems like Nova and Cinder are somehow out of sync here: `lang=shell-s...
[12:42:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:52:01] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[12:57:46] (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:16:13] Toolforge (Toolforge iteration 08): Upgrade Toolforge front proxies to Bookworm - https://phabricator.wikimedia.org/T361223#9692563 (taavi) Open→Resolved
[13:17:01] Toolforge (Toolforge iteration 08): Rust image build on toolforge fails - https://phabricator.wikimedia.org/T358552#9692571 (Magnus) Open→Resolved This appears to be reolved, and replaced by the `too many open files` bug (another Phab ticket is open).
[13:17:37] Toolforge (Toolforge iteration 08), Patch-For-Review: [harbor] upgrade to 2.10.1 - https://phabricator.wikimedia.org/T354507#9692584 (Slst2020)
[13:18:10] cloud-services-team, Cloud-VPS: project owidm volume owidm-static can not be removed from attachment to owidm-instance and can not be attached to another instance - https://phabricator.wikimedia.org/T361893#9692585 (Tim-moody) Open→Resolved Volume owidm-static is now detached from owid...
[13:18:18] Toolforge (Toolforge iteration 08), Patch-For-Review: [harbor] upgrade to 2.10.1 - https://phabricator.wikimedia.org/T354507#9692588 (Slst2020) In progress→Resolved
[13:18:30] Toolforge (Toolforge iteration 08): [harbor, maintain-harbor] Harbor upgrade 2.10 breaks delete-stale-toolforge-artifacts cron job - https://phabricator.wikimedia.org/T361842#9692598 (Slst2020) Indeed, this seems to have been an issue only in (my particular setup of) lima-kilo. Sorry for the confusion! Non-a...
[13:18:52] Toolforge (Toolforge iteration 08): [harbor, maintain-harbor] Harbor upgrade 2.10 breaks delete-stale-toolforge-artifacts cron job - https://phabricator.wikimedia.org/T361842#9692614 (Slst2020) Open→Invalid
[13:19:00] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9692596 (taavi) a:taavi Anything left to do here?
[13:19:14] Toolforge (Toolforge iteration 08): [harbor, maintain-harbor] Harbor upgrade 2.10 breaks delete-stale-toolforge-artifacts cron job - https://phabricator.wikimedia.org/T361842#9692618 (Slst2020) Invalid→Resolved
[13:23:22] Cloud-VPS: cloudcumin can't reach bastion-restricted itself - https://phabricator.wikimedia.org/T361831#9692688 (taavi) a:taavi
[13:25:45] cloud-services-team, wikitech.wikimedia.org, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#9692713 (taavi)
[13:34:00] Toolforge, Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#9692761 (dcaro) >>! In T306391#9692512, @taavi wrote: > I don't see how I could use a continuous job to update a wiki page once a week. I don't have a full picture of wh...
[14:00:59] Toolforge, Kubernetes: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime - https://phabricator.wikimedia.org/T306391#9692803 (dcaro) Did a quick test trying to reproduce the OOM issue, and I was unable to get a 'running' pod after being killed by OOM :/, I think the script @MusikAnimal...
[14:13:59] Data-Services: Toolforge view for blocks is very slow - https://phabricator.wikimedia.org/T361945 (MusikAnimal) NEW
[14:14:33] cloud-services-team, Data-Services: Toolforge view for blocks is very slow - https://phabricator.wikimedia.org/T361945#9692861 (taavi) a:taavi
[14:14:38] cloud-services-team, Data-Services: Toolforge view for blocks is very slow - https://phabricator.wikimedia.org/T361945#9692864 (MusikAnimal) @taavi reports that "looks like the index maintenance script for the index in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1016066 was never ran"
[14:22:38] Cloud-VPS: Request temporary storage quota increase for project iiab for migration to bookworm image - https://phabricator.wikimedia.org/T361946 (Tim-moody) NEW
[14:27:12] Toolforge: [jobs-api,jobs-cli] Support multiple replicas of continuous jobs - https://phabricator.wikimedia.org/T341066#9692931 (dcaro) >>! In T341066#9691067, @Raymond_Ndibe wrote: > how will this affect the current `3 continuous jobs` limit? does 2 replicas of a continuous job count as 1 or 2 when consider...
[14:31:31] cloud-services-team, Data-Services: Toolforge view for blocks is very slow - https://phabricator.wikimedia.org/T361945#9692940 (taavi)
[14:32:40] Data-Services: maintain-replica-indexes --help fails - https://phabricator.wikimedia.org/T361948 (taavi) NEW
[14:38:08] cloud-services-team, Data-Services: Toolforge view for blocks is very slow - https://phabricator.wikimedia.org/T361945#9692966 (taavi) Open→Resolved
[15:03:49] cloud-services-team, Cloud-VPS, Patch-For-Review: Use cloudbackup100[12]-dev for cinder backup test/dev - https://phabricator.wikimedia.org/T358855#9693051 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
[15:16:26] (SystemdUnitDown) resolved: (2) The systemd unit backup_cinder_volumes.service on node cloudbackup1001-dev has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1001-dev - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:40:00] PAWS: Reduce cluster size - https://phabricator.wikimedia.org/T361952 (rook) NEW
[15:40:37] PAWS: Reduce cluster size - https://phabricator.wikimedia.org/T361952#9693169 (rook)
[15:46:06] cloud-services-team, Cloud-VPS, Patch-For-Review: Use cloudbackup100[12]-dev for cinder backup test/dev - https://phabricator.wikimedia.org/T358855#9693185 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm c...
[15:58:38] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9693231 (Andrew) nope, thanks!
[16:01:06] cloud-services-team, wikitech.wikimedia.org, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#9693241 (bd808) > Connect active LDAP accounts with SUL accounts -- this can use Bitu/Striker account linking? Yes, the associations collected by Striker and Bitu would be part of...
[16:14:20] cloud-services-team, Cloud-VPS: Expired cert failure on cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361772#9693266 (taavi) Open→Resolved
[16:45:03] cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account Jaumeortola - https://phabricator.wikimedia.org/T361957 (Jaumeortola) NEW
[16:50:55] cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account Jaumeortola - https://phabricator.wikimedia.org/T361957#9693361 (Jaumeortola) Following the instructions here: https://wikitech.wikimedia.org/wiki/Password_and_2FA_reset#For_users This is...
[17:12:58] cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account Jaumeortola - https://phabricator.wikimedia.org/T361957#9693380 (bd808) Open→Resolved a:bd808 > I don't have 2FA enabled, and I need it to request a Toolforge membe...
[17:18:48] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudbackup1001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[17:19:52] cloud-services-team, Data-Services: Toolforge view for blocks is very slow - https://phabricator.wikimedia.org/T361945#9693425 (bd808) @taavi, do you have any theory about how/why @tstarling's actions in {T355034} weren't sufficient? >>! In T355034#9686749, @Stashbot wrote: > {nav icon=file,...
[17:20:15] cloud-services-team, Data-Services: Toolforge view for blocks is very slow - https://phabricator.wikimedia.org/T361945#9693429 (bd808)
[17:22:19] cloud-services-team, Data-Services: Toolforge view for blocks is very slow - https://phabricator.wikimedia.org/T361945#9693431 (taavi) `maintain-views` and `maintain-replica-indexes` are two separate script. https://gerrit.wikimedia.org/r/1016066 changed the configuration for both scripts, but...
[17:25:06] (ProbeDown) firing: (2) Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:30:06] (ProbeDown) resolved: (4) Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:25:18] (PuppetConstantChange) resolved: Puppet performing a change on every puppet run on cloudbackup1001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[19:45:00] (PS1) Andrew Bogott: openstack: move a bunch of codfw1dev passwords from 'codfw' to 'common' [labs/private] - https://gerrit.wikimedia.org/r/1017356 (https://phabricator.wikimedia.org/T358855)
[19:45:21] (CR) Andrew Bogott: [V:+2 C:+2] openstack: move a bunch of codfw1dev passwords from 'codfw' to 'common' [labs/private] - https://gerrit.wikimedia.org/r/1017356 (https://phabricator.wikimedia.org/T358855) (owner: Andrew Bogott)
[19:47:47] VPS-project-Codesearch, collaboration-services: Graduate codesearch to production - https://phabricator.wikimedia.org/T268199#9693785 (bd808) >>! In T268199#6635059, @Ladsgroup wrote: > given that we would eventually migrate to gitlab GitLab CE does not have **//any//** cross project code search capabil...
[19:58:11] VPS-project-Codesearch, Patch-For-Review: Let codesearch-frontend reques to local Hound instances directly - https://phabricator.wikimedia.org/T361899#9693932 (Dzahn) after the changes above were deployed, access to port 3002 is now allowed from 172.17.0.0/16 ` root@codesearch8:/# iptables -L | grep 300...
[19:59:04] (PS1) Krinkle: frontend: In prod mode, skip sleep, don't skip chunking [labs/codesearch] - https://gerrit.wikimedia.org/r/1017357
[20:06:53] (CR) Krinkle: [C:+2] frontend: In prod mode, skip sleep, don't skip chunking [labs/codesearch] - https://gerrit.wikimedia.org/r/1017357 (owner: Krinkle)
[20:07:44] (Merged) jenkins-bot: frontend: In prod mode, skip sleep, don't skip chunking [labs/codesearch] - https://gerrit.wikimedia.org/r/1017357 (owner: Krinkle)
[20:09:44] VPS-project-Codesearch, Patch-For-Review: Let codesearch-frontend reques to local Hound instances directly - https://phabricator.wikimedia.org/T361899#9693967 (Krinkle) Open→Resolved p:Triage→Medium a:Krinkle
[20:24:28] (InstanceDown) firing: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:24:37] (PS1) Krinkle: frontend: Fix undefined backendLabel on 'excludes' and 'repos' views [labs/codesearch] - https://gerrit.wikimedia.org/r/1017371
[20:24:37] (PS1) Krinkle: frontend: Distinguish between internal and public Hound base [labs/codesearch] - https://gerrit.wikimedia.org/r/1017372 (https://phabricator.wikimedia.org/T361899)
[20:25:33] (CR) CI reject: [V:-1] frontend: Fix undefined backendLabel on 'excludes' and 'repos' views [labs/codesearch] - https://gerrit.wikimedia.org/r/1017371 (owner: Krinkle)
[20:26:40] (CR) Krinkle: [C:+2] frontend: Fix undefined backendLabel on 'excludes' and 'repos' views [labs/codesearch] - https://gerrit.wikimedia.org/r/1017371 (owner: Krinkle)
[20:26:44] (CR) Krinkle: [C:+2] frontend: Distinguish between internal and public Hound base [labs/codesearch] - https://gerrit.wikimedia.org/r/1017372 (https://phabricator.wikimedia.org/T361899) (owner: Krinkle)
[20:27:44] (Merged) jenkins-bot: frontend: Fix undefined backendLabel on 'excludes' and 'repos' views [labs/codesearch] - https://gerrit.wikimedia.org/r/1017371 (owner: Krinkle)
[20:27:45] (Merged) jenkins-bot: frontend: Distinguish between internal and public Hound base [labs/codesearch] - https://gerrit.wikimedia.org/r/1017372 (https://phabricator.wikimedia.org/T361899) (owner: Krinkle)
[20:39:28] (InstanceDown) resolved: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:52:29] (PS1) Krinkle: frontend: Add robots.txt to discourage crawling beyond landing pages [labs/codesearch] - https://gerrit.wikimedia.org/r/1017377
[21:16:28] (PS1) Jforrester: releases: Upgrade jsdoc-wmf-theme to 0.0.13 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1017380
[21:16:36] (CR) Jforrester: [C:+2] releases: Upgrade jsdoc-wmf-theme to 0.0.13 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1017380 (owner: Jforrester)
[21:17:27] (Merged) jenkins-bot: releases: Upgrade jsdoc-wmf-theme to 0.0.13 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1017380 (owner: Jforrester)
[21:47:52] (PS1) Krinkle: app.py: remove favicon and open_search.xml from old UI [labs/codesearch] - https://gerrit.wikimedia.org/r/1017383
[21:47:52] (PS1) Krinkle: frontend: Implement /_health [labs/codesearch] - https://gerrit.wikimedia.org/r/1017384
[21:56:09] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9694270 (bd808) test
[22:15:57] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9694312 (bd808) test
[22:16:19] (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[22:51:32] (CR) BryanDavis: [C:-2] "test" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[23:21:36] (CR) Krinkle: [C:+2] frontend: Add robots.txt to discourage crawling beyond landing pages [labs/codesearch] - https://gerrit.wikimedia.org/r/1017377 (owner: Krinkle)
[23:21:55] (CR) Krinkle: [C:+2] app.py: remove favicon and open_search.xml from old UI [labs/codesearch] - https://gerrit.wikimedia.org/r/1017383 (owner: Krinkle)
[23:22:34] (Merged) jenkins-bot: frontend: Add robots.txt to discourage crawling beyond landing pages [labs/codesearch] - https://gerrit.wikimedia.org/r/1017377 (owner: Krinkle)
[23:22:49] (Merged) jenkins-bot: app.py: remove favicon and open_search.xml from old UI [labs/codesearch] - https://gerrit.wikimedia.org/r/1017383 (owner: Krinkle)