[00:06:49] Wikibugs, User-bd808: GitLab CI tests fail for MRs from forks because of missing secrets - https://phabricator.wikimedia.org/T358775#9594666 (bd808) In progress→Resolved >>! In T358775#9592144, @bd808 wrote: > Likely this will be easier once I land my still in progress refactor of doom® MR. /me g...
[00:09:50] (TfInfraTestApplyFailed) resolved: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:11:49] Wikibugs, User-bd808: wikibugs having a hard time staying connected to libera.chat IRC network - https://phabricator.wikimedia.org/T357729#9594671 (bd808) Open→Stalled Marking as stalled while we wait to see how the bot+znc combo behaves. I'm setting a calendar reminder for myself to check back o...
[00:11:56] (ProbeDown) firing: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:13:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:16:56] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:21:17] Data-Services: Make Dispenser's principle_links table accessible in new Wiki replica cluster - https://phabricator.wikimedia.org/T180636#9594678 (bd808)
[00:21:30] Data-Services, cloud-services-team (FY2017-18), DBA, Goal, and 2 others: Migrate all users to new Wiki Replica cluster and decommission old hardware - https://phabricator.wikimedia.org/T142807#9594679 (bd808)
[00:22:00] Data-Services, cloud-services-team, Data-Engineering-Icebox: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#9594675 (bd808) Open→Declined With no useful activity, including activism for implementation, in 7+ years let'...
[00:28:41] (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:34:17] Wikibugs: Store PhorgeFeedReader.poll_last_seen_chrono_key in Redis - https://phabricator.wikimedia.org/T359009 (bd808)
[00:40:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:45:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:19:56] (ProbeDown) firing: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:24:56] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:47:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:52:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[02:40:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tools-sgegrid-shadow in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[02:42:28] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate Puppet CA: cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs is about to expire in 27d 17h 54m 11s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[03:06:56] (ProbeDown) firing: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:11:56] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:03:51] Cloud-VPS (Project-requests): Request creation of wikiauthbot-ng VPS project - https://phabricator.wikimedia.org/T358427#9594743 (Andrew) >>! In T358427#9591405, @fnegri wrote: > @0xDeadbeef unfortunately we don't currently support Redis as a database type in Trove. @Andrew do you think it's something we cou...
[04:39:27] (CR) Eugene233: "recheck" [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1006842 (owner: Ketulucas)
[04:40:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:42:56] (ProbeDown) firing: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:47:56] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:50:41] (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:40:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tools-sgegrid-shadow in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[05:40:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:42:28] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate Puppet CA: cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs is about to expire in 27d 14h 54m 11s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[05:50:41] (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:50:56] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:51:11] (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:40:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:45:41] (CloudVPSDesignateLeaks) firing: (5) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:51:01] (CR) Eugene233: "recheck" [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1006865 (owner: Ketulucas)
[06:55:41] (CloudVPSDesignateLeaks) firing: (5) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:00:41] (CloudVPSDesignateLeaks) firing: (5) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:37:37] Tools, Wikidata, Wikidata Dev Team, wmde-wikidata-tech: [GENERAL] Deprecate connecting senses prototype - https://phabricator.wikimedia.org/T351829#9594951 (Michael)
[07:50:41] (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:36:18] Wikibugs: wikibugs test bug part II - https://phabricator.wikimedia.org/T90594#9594988 (taavi) test
[08:38:56] (ProbeDown) firing: Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:42:28] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate Puppet CA: cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs is about to expire in 27d 11h 54m 11s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[08:43:56] (ProbeDown) resolved: Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:45:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tools-sgegrid-shadow in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[10:53:34] Cloud-VPS, cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review, User-aborrero: Deploy OVS test setup in codfw1dev - https://phabricator.wikimedia.org/T358761#9595368 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by taavi@cumin1002 for host cloudnet2007-dev.codfw.wmne...
[10:53:56] Cloud-VPS, cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review, User-aborrero: Deploy OVS test setup in codfw1dev - https://phabricator.wikimedia.org/T358761#9595369 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by taavi@cumin1002 for host cloudnet2008-dev.codfw.wmne...
[11:42:28] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate Puppet CA: cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs is about to expire in 27d 8h 54m 11s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[11:43:02] Cloud-VPS, cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review, User-aborrero: Deploy OVS test setup in codfw1dev - https://phabricator.wikimedia.org/T358761#9595500 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by taavi@cumin1002 for host cloudnet2008-dev.codfw.wmnet wi...
[11:43:15] Cloud-VPS, cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review, User-aborrero: Deploy OVS test setup in codfw1dev - https://phabricator.wikimedia.org/T358761#9595501 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by taavi@cumin1002 for host cloudnet2007-dev.codfw.wmnet wi...
[11:45:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tools-sgegrid-shadow in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[11:47:34] (DiskSpace) firing: Disk space cloudbackup1004:9100:/ 5.307% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[12:10:41] (CloudVPSDesignateLeaks) firing: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:11:52] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9595629 (dcaro) I have started a PR with several improvements. https://github.com/Saisengen/wikibots/pull/2 Things it does now: * Gathers all static files from `web-servic...
[12:15:41] (CloudVPSDesignateLeaks) firing: (5) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:15:53] (CR) Brouberol: [C: +1] "Thanks!" [labs/private] - https://gerrit.wikimedia.org/r/1008408 (owner: Muehlenhoff)
[12:17:34] (DiskSpace) resolved: Disk space cloudbackup1004:9100:/ 5.933% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[12:18:25] (CR) Muehlenhoff: [V: +2 C: +2] Fix location of dummy keytab for an-airflow1007 [labs/private] - https://gerrit.wikimedia.org/r/1008408 (owner: Muehlenhoff)
[12:25:25] (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1008433 (owner: L10n-bot)
[12:25:30] (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1008436 (owner: L10n-bot)
[12:35:58] tool-wdlocator, translatewiki.net, Language-Team (Language-2024-January-March), Localization Infrastructure FY2023-24, Unplanned-Sprint-Work: Add wdlocator to translatewiki.net - https://phabricator.wikimedia.org/T357495#9595698 (Nikerabbit) https://translatewiki.net/wiki/Group_descriptions h...
[12:36:17] Cloud-VPS, cloud-services-team: cloudcephosd1017 /dev/sdg (osd.132) failed - https://phabricator.wikimedia.org/T358945#9595696 (dcaro) Open→In progress a:dcaro
[12:39:44] tool-wdlocator, translatewiki.net, Language-Team (Language-2024-January-March), Localization Infrastructure FY2023-24, Unplanned-Sprint-Work: Add wdlocator to translatewiki.net - https://phabricator.wikimedia.org/T357495#9595701 (Nikerabbit) `wdlocator` is not included in the `group` spec in...
[12:41:00] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1007600 (owner: L10n-bot)
[12:43:49] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1007599 (owner: L10n-bot)
[12:44:19] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9595715 (MBH) Thank you very much. In past, all of my bots was based on DNWB, I have rewritten them to not use DNWB one or two years ago. In cluster analysis scripts, only f...
[12:45:28] (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance tools-sgegrid-shadow in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[12:45:34] cloud-services-team (FY2023/2024-Q3-Q4), Infrastructure-Foundations, SRE-tools, Spicerack, Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337#9595718 (fnegri) @bking thanks for having a look! No rush really, I was...
[12:46:22] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1008433 (owner: L10n-bot)
[12:46:44] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1008436 (owner: L10n-bot)
[12:48:20] Cloud-VPS, cloud-services-team, DC-Ops: hw troubleshooting: /dev/sdg disk not working properly in cloudcephosd1017.eqiad.wmnet - https://phabricator.wikimedia.org/T359049 (dcaro)
[12:48:43] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.reboot_node (T359049)
[12:48:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:48:51] T359049: hw troubleshooting: /dev/sdg disk not working properly in cloudcephosd1017.eqiad.wmnet - https://phabricator.wikimedia.org/T359049
[12:53:04] Cloud-VPS (Project-requests): Request creation of wikiauthbot-ng VPS project - https://phabricator.wikimedia.org/T358427#9595748 (fnegri) > Best bet is probably for the project to run its own redis and make some kind of periodic backup dump Agreed. @0xDeadbeef I wonder if you prefer having a separate Cloud...
[12:53:07] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9595750 (MBH) > It also adds an entry point for each built binary in the procfile, so it's easy to use for jobs too: toolforge jobs run --command "my-script" myjobname I'm s...
[12:54:21] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9595760 (dcaro) >>! In T319883#9595715, @MBH wrote: > Thank you very much. In past, all of my bots was based on DNWB, I have rewritten them to not use DNWB one or two years...
[12:54:43] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.reboot_node (exit_code=0) (T359049)
[12:54:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[12:54:50] T359049: hw troubleshooting: /dev/sdg disk not working properly in cloudcephosd1017.eqiad.wmnet - https://phabricator.wikimedia.org/T359049
[12:55:10] (CephClusterInWarning) firing: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[12:56:55] Cloud-VPS, cloud-services-team, DC-Ops: hw troubleshooting: /dev/sdg disk not working properly in cloudcephosd1017.eqiad.wmnet - https://phabricator.wikimedia.org/T359049#9595776 (dcaro) After rebooting the hard drive came online, will try to repartition and see if it keeps failing
[12:58:18] Cloud-VPS (Project-requests): Request creation of wikiauthbot-ng VPS project - https://phabricator.wikimedia.org/T358427#9595777 (0xDeadbeef) Bundling would actually be nice. It would probably help with the bus factor since Legoktm (or any future collaborators) can access both if things happen. We can rename...
[13:00:10] (CephClusterInWarning) resolved: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[13:02:11] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9595788 (MBH) Okay, and how to set this envvar, could I do this from console after `become mbh`?
[13:03:13] Cloud-VPS, cloud-services-team, DC-Ops: hw troubleshooting: /dev/sdg disk not working properly in cloudcephosd1017.eqiad.wmnet - https://phabricator.wikimedia.org/T359049#9595791 (dcaro) Wait no, the hard drive did not show up anymore (just the sdX letters got re-shuffled). The drive is not appearin...
[13:04:42] Cloud-VPS, cloud-services-team, DC-Ops: hw troubleshooting: /dev/sdg disk not working properly in cloudcephosd1017.eqiad.wmnet - https://phabricator.wikimedia.org/T359049#9595792 (dcaro)
[13:06:40] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9595811 (dcaro) >>! In T319883#9595788, @MBH wrote: > Okay, and how to set this envvar, could I do this from console after `become mbh`? Yes, I added a note in the readme:...
[13:07:20] Toolforge Build Service, Technical-blog-posts: Publish a blog post about buildservice on the Tech Blog - https://phabricator.wikimedia.org/T350691#9595808 (user_123) Is this task still open
[13:08:25] Cloud-VPS, cloud-services-team, DC-Ops: hw troubleshooting: /dev/sdg disk not working properly in cloudcephosd1017.eqiad.wmnet - https://phabricator.wikimedia.org/T359049#9595817 (dcaro)
[13:09:06] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9595818 (MBH) Thank you. I'll ask again: where exactly should I execute or enter this command `dotnet sln add web-service//.csproj` , when I cre...
[13:09:26] (SystemdUnitDown) resolved: The systemd unit ceph-osd@132.service on node cloudcephosd1017 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1017 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:11:38] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9595825 (dcaro) >>! In T319883#9595818, @MBH wrote: > Thank you. I'll ask again: where exactly should I execute or enter this command `dotnet sln add web-service/
Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9595836 (MBH) I don't understand you. My development environment is a Visual Studio 2022 on my Windows PC desktop. I doesn't execute any console commands when I creating my...
[13:18:56] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9595840 (dcaro) >>! In T319883#9595836, @MBH wrote: > I don't understand you. My development environment is a Visual Studio 2022 on my Windows PC desktop. I doesn't execute...
[13:18:56] (ProbeDown) firing: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:20:21] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9595844 (dcaro) >>! In T319883#9595836, @MBH wrote: > I don't understand you. My development environment is a Visual Studio 2022 on my Windows PC desktop. I doesn't execute...
[13:20:41] (CloudVPSDesignateLeaks) firing: (5) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:23:56] (ProbeDown) resolved: (3) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:24:45] Cloud-VPS, cloud-services-team, DC-Ops: hw troubleshooting: /dev/sdg disk not working properly in cloudcephosd1017.eqiad.wmnet - https://phabricator.wikimedia.org/T359049#9595858 (dcaro)
[13:25:41] (CloudVPSDesignateLeaks) resolved: (5) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:29:13] Cloud-VPS, cloud-services-team: cloudcephosd1017 /dev/sdg (osd.132) failed - https://phabricator.wikimedia.org/T358945#9595869 (dcaro) I manually destroyed the osd: ` ceph osd destroy 132 ` So it will need to be recreated when the disk is fixed (or the new one arrives): ` ceph-volume lvm zap /dev/sdX ce...
[13:43:53] Toolforge, Technical-blog-posts: Publish a blog post about buildservice on the Tech Blog - https://phabricator.wikimedia.org/T350691#9595931 (dcaro) >>! In T350691#9595808, @user_123 wrote: > Is this task still open Yes, still in the backlog
[13:44:27] Toolforge, Technical-blog-posts: Publish a blog post about buildservice on the Tech Blog - https://phabricator.wikimedia.org/T350691#9595932 (dcaro) p:Triage→Low
[13:48:59] Grid-Engine-to-K8s-Migration, User-dcaro: Migrate kmlexport from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T356905#9595939 (dcaro) >>! In T356905#9592014, @Dvorapa wrote: > I see, thank you for the explanation. So if the /healthz endpoint is set correctly and probe...
[13:50:04] Grid-Engine-to-K8s-Migration, User-dcaro: Migrate kmlexport from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T356905#9595948 (dcaro)
[13:51:06] Toolforge (Toolforge iteration 06), Patch-For-Review: Support probes in kubernetes webservices - https://phabricator.wikimedia.org/T341919#9595945 (dcaro) In progress→Resolved Things seem stable, I'll close the task and reopen if any bugs arise.
[13:55:12] Toolforge (Toolforge iteration 06): [harbor, maintain-harbor] We seem to be cleaning up image tags that should not be cleaned up for the toolforge project - https://phabricator.wikimedia.org/T359052 (dcaro) Open→In progress p:Triage→High
[13:57:36] PAWS: Add wikibase-cli to paws - https://phabricator.wikimedia.org/T358649#9595982 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/381
[13:57:49] vivian-rook closed https://github.com/toolforge/paws/pull/381
[13:58:42] PAWS: Add wikibase-cli to paws - https://phabricator.wikimedia.org/T358649#9595984 (rook) Open→Resolved a:rook
[14:04:51] cloud-services-team (FY2023/2024-Q3-Q4), Infrastructure-Foundations, SRE-tools, Spicerack, Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337#9596015 (bking) > @bking what if we release spicerack with the change...
[14:13:13] Cloud-VPS (Project-requests): Request creation of wikiauthbot-ng VPS project - https://phabricator.wikimedia.org/T358427#9596044 (fnegri) I think renaming a project is complicated, I will create a new project `discordbots` and delete the existing `loggerdiscordbot` project. Once the new project is created, y...
[14:15:09] Cloud-VPS (Project-requests): Request creation of wikiauthbot-ng VPS project - https://phabricator.wikimedia.org/T358427#9596046 (fnegri)
[14:15:33] Cloud-VPS (Project-requests): Request creation of wikiauthbot-ng VPS project - https://phabricator.wikimedia.org/T358427#9596050 (fnegri)
[14:16:19] Cloud-VPS (Project-requests), cloud-services-team: Request creation of discordbots VPS project - https://phabricator.wikimedia.org/T358427#9596055 (fnegri)
[14:16:38] Cloud-VPS (Project-requests), cloud-services-team: Request creation of discordbots VPS project - https://phabricator.wikimedia.org/T358427#9596051 (fnegri) Open→In progress p:Triage→Medium a:fnegri
[14:23:10] Toolforge (Toolforge iteration 06), cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9596082 (fnegri) My vote goes for...
[14:27:36] !log fnegri@cloudcumin1001 discordbots START - Cookbook wmcs.vps.create_project for project discordbots in eqiad1 (T358427)
[14:27:39] fnegri@cloudcumin1001: Unknown project "discordbots"
[14:27:39] T358427: Request creation of discordbots VPS project - https://phabricator.wikimedia.org/T358427
[14:28:12] !log fnegri@cloudcumin1001 discordbots END (PASS) - Cookbook wmcs.vps.create_project (exit_code=0) for project discordbots in eqiad1 (T358427)
[14:28:12] fnegri@cloudcumin1001: Unknown project "discordbots"
[14:29:34] !log fnegri@cloudcumin1001 discordbots START - Cookbook wmcs.vps.add_user_to_project for user 'dbeef' in role 'member' (T358427)
[14:29:34] fnegri@cloudcumin1001: Unknown project "discordbots"
[14:29:40] !log fnegri@cloudcumin1001 discordbots END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'dbeef' in role 'member' (T358427)
[14:29:40] fnegri@cloudcumin1001: Unknown project "discordbots"
[14:30:13] !log fnegri@cloudcumin1001 discordbots START - Cookbook wmcs.vps.add_user_to_project for user 'legoktm' in role 'member' (T358427)
[14:30:13] fnegri@cloudcumin1001: Unknown project "discordbots"
[14:30:19] !log fnegri@cloudcumin1001 discordbots END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'legoktm' in role 'member' (T358427)
[14:30:19] fnegri@cloudcumin1001: Unknown project "discordbots"
[14:34:02] Toolforge (Toolforge iteration 06), Patch-For-Review: [harbor, maintain-harbor] We seem to be cleaning up image tags that should not be cleaned up for the toolforge project - https://phabricator.wikimedia.org/T359052#9596123 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/...
[14:42:28] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate Puppet CA: cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs is about to expire in 27d 5h 54m 11s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[14:52:51] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9596235 (MBH) Now I have more than 10 `csproj` files with exactly the same content like https://github.com/Saisengen/wikibots/blob/main/web-services/clusters5/clusters5.cspr...
[15:01:03] Cloud-VPS (Project-requests), cloud-services-team: Request creation of discordbots VPS project - https://phabricator.wikimedia.org/T358427#9596293 (fnegri) In progress→Resolved The new project `discordbots` was created, and I added @0xDeadbeef and @Legoktm has members. The default quotas should b...
[15:03:03] cloud-services-team: SystemdUnitDown Unit ceph-osd@132.service on node cloudcephosd1017 has been down for long. - https://phabricator.wikimedia.org/T358925#9596312 (fnegri)
[15:03:07] Cloud-VPS, cloud-services-team: cloudcephosd1017 /dev/sdg (osd.132) failed - https://phabricator.wikimedia.org/T358945#9596313 (fnegri)
[15:18:00] Cloud-VPS, cloud-services-team: Maybe decom cloudbackup100[12]-dev - https://phabricator.wikimedia.org/T358855#9596388 (Andrew) a:Andrew I think the right thing here is to update these to replicate the behavior of cloudbackup200[12]. I'll have a look at that.
[15:22:59] Toolforge (Quota-requests): Request increased quota for mbh Toolforge tool - https://phabricator.wikimedia.org/T359061 (MBH)
[15:32:24] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9596481 (MBH) > Here's a guide on using visual studio for the solutions file: https://learn.microsoft.com/en-us/visualstudio/get-started/tutorial-projects-solutions?view=vs-...
[15:36:29] Toolforge (Quota-requests): Request increased quota for mbh Toolforge tool - https://phabricator.wikimedia.org/T359061#9596513 (dcaro) +1
[15:37:50] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9596516 (MBH) This is my development environment. Could you explain, WHERE I should type `dotnet sln add web-service//.csproj` here? {F42395781}
[15:38:55] Cloud-VPS (Quota-requests): Request for more compute and storage for the GLAMS dashboard project - https://phabricator.wikimedia.org/T358477#9596519 (fnegri) > The DB's disk already got filled up once when it had 500GB. If I'm reading T355138 correctly, the disk filled 500GB only because of a Postgres WAL f...
[15:40:04] (CR) Jforrester: [C: +2] Add browserslist-config-wikimedia [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1007978 (owner: Jforrester)
[15:40:42] (Merged) jenkins-bot: Add browserslist-config-wikimedia [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1007978 (owner: Jforrester)
[15:44:11] Toolforge, cloud-services-team (FY2023/2024-Q3-Q4): [tools.meta] can't delete file inside cache/wikimedia-wikis.dat - https://phabricator.wikimedia.org/T357098#9596551 (fnegri)
[15:53:35] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9596602 (dcaro) >>! In T319883#9596516, @MBH wrote: > This is my development environment. Could you explain, WHERE I should type `dotnet sln add web-service//...
[16:09:04] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9596750 (dcaro) >>! In T319883#9596235, @MBH wrote: > Now I have more than 10 `csproj` files with exactly the same content like https://github.com/Saisengen/wikibots/blob/ma...
[16:15:09] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9596799 (MBH) Oh, looks like I understand. I should use a native `.sln` files from projects' folders from my PC, they contains these IDs. But I have many dozens of `sln` fil...
[16:36:59] Toolforge (Quota-requests), Patch-For-Review: Request increased quota for mbh Toolforge tool - https://phabricator.wikimedia.org/T359061#9597014 (CodeReviewBot) fnegri merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/212 maintain-kubeusers: increase quota for m...
[16:46:18] Toolforge (Quota-requests), Patch-For-Review: Request increased quota for mbh Toolforge tool - https://phabricator.wikimedia.org/T359061#9597069 (fnegri) Open→Resolved a:fnegri
[16:49:23] Cloud-VPS, cloud-services-team, Patch-For-Review, User-aborrero: Improve cloudgw filter between VM instances and cloud-private - https://phabricator.wikimedia.org/T356986#9597096 (aborrero) just sent an updated patch with a new approach for the firewall, let me know if that would work or you woul...
[16:57:11] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[16:57:21] !log fnegri@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[17:09:54] Grid-Engine-to-K8s-Migration: Migrate huggle from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319797#9597253 (dcaro) >>! In T319797#9593193, @Petrb wrote: > ` > tools.huggle@tools-sgebastion-10:~$ toolforge webservice --backend=gridengine stop > --mount is only for --bac...
[17:26:41] Toolforge (Toolforge iteration 06), Toolforge Jobs framework: Support job health checks - https://phabricator.wikimedia.org/T335592#9597412 (Raymond_Ndibe)
[17:29:52] Toolforge (Toolforge iteration 06), Toolforge Jobs framework: Support job health checks - https://phabricator.wikimedia.org/T335592#9597425 (CodeReviewBot) raymond-ndibe updated https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/63 [jobs-api] support job health checks
[17:30:36] Toolforge, Epic, User-Raymond_Ndibe: Run webservices via the jobs framework - https://phabricator.wikimedia.org/T348755#9597428 (Raymond_Ndibe)
[17:30:59] Toolforge (Toolforge iteration 06), Toolforge Jobs framework, Patch-For-Review: Support job health checks - https://phabricator.wikimedia.org/T335592#9597427 (Raymond_Ndibe) Open→In progress
[17:31:07] Toolforge (Toolforge iteration 06), Toolforge Jobs framework, Patch-For-Review: Support job health checks - https://phabricator.wikimedia.org/T335592#9597432 (Raymond_Ndibe)
[17:42:28] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate Puppet CA: cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs is about to expire in 27d 2h 54m 11s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[17:45:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:51:49] Toolforge (Quota-requests), Wikibugs, Patch-For-Review, User-bd808: Request increased quota for wikibugs-testing Toolforge tool - https://phabricator.wikimedia.org/T358968#9597516 (CodeReviewBot) bd808 merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/211...
[17:55:38] !log bd808@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[17:55:46] !log bd808@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[17:56:06] !log bd808@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[17:56:15] !log bd808@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[18:06:29] Grid-Engine-to-K8s-Migration: Migrate huggle from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319797#9597583 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/29 cli: don't require mount for gridengine
[18:07:22] Toolforge (Quota-requests), Wikibugs, Patch-For-Review, User-bd808: Request increased quota for wikibugs-testing Toolforge tool - https://phabricator.wikimedia.org/T358968#9597581 (bd808) In progress→Resolved ` starting a run Update quota for tool wikibugs-testing from version '2' to vers...
[18:10:41] (CloudVPSDesignateLeaks) firing: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:11:28] Grid-Engine-to-K8s-Migration: Migrate huggle from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319797#9597593 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/30 d/changelog: bump to 0.103.4
[18:13:39] Toolforge, Kubernetes: kubectl is quite slow the “first time” per user account - https://phabricator.wikimedia.org/T358976#9597603 (LucasWerkmeister) Interestingly, this even seems to be the case right after a `webservice restart`, so it’s probably not (as I first thought it might be) slowness on the sid...
[18:14:35] Grid-Engine-to-K8s-Migration: Migrate huggle from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319797#9597606 (dcaro) Thanks for your patience, deployed a fix and tested, can you verify that it works for you now?
[18:15:41] (CloudVPSDesignateLeaks) firing: (5) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:29:56] (ProbeDown) firing: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:34:56] (ProbeDown) firing: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:39:56] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:47:34] Wikibugs: Bot does not detect when ssh connection to Gerrit is interrupted - https://phabricator.wikimedia.org/T359096 (bd808)
[18:52:50] Wikibugs: Bot does not detect when ssh connection to Gerrit is interrupted - https://phabricator.wikimedia.org/T359096#9597740 (bd808) When {T335592} is ready, we could have a check for activity in general. Inside the container the bot could touch a file or write a timestamp when it processes an event. The h...
[19:01:38] (ProbeDown) firing: Service toolsbeta-test-k8s-haproxy-4:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:02:09] Wikibugs: Frequent "Redis listener crashed; restarting in a few seconds." errors logged - https://phabricator.wikimedia.org/T359097 (bd808)
[19:05:23] Wikibugs: Frequent "Redis listener crashed; restarting in a few seconds." errors logged - https://phabricator.wikimedia.org/T359097#9597794 (bd808) * What actually is the `TypeError: a coroutine was expected, got None` exception telling us about the state of the system? * Is connectivity to Redis actually lo...
[19:06:38] (ProbeDown) resolved: Service toolsbeta-test-k8s-haproxy-4:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:07:37] Wikibugs: Frequent "Redis listener crashed; restarting in a few seconds." errors logged - https://phabricator.wikimedia.org/T359097#9597805 (bd808)
[19:43:12] Grid-Engine-to-K8s-Migration: Migrate gergesbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357555#9597962 (Reedy)
[19:44:10] Grid-Engine-to-K8s-Migration: Migrate xiplus from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357567#9597967 (Reedy)
[19:44:20] Grid-Engine-to-K8s-Migration: Migrate wikiflix from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357566#9597968 (Reedy)
[19:44:29] Grid-Engine-to-K8s-Migration: Migrate updatewikiprojectmovies from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357563#9597969 (Reedy)
[19:44:54] Grid-Engine-to-K8s-Migration: Migrate addletterboxdfilmidbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357549#9597971 (Reedy)
[19:45:04] Grid-Engine-to-K8s-Migration: Migrate aka from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357550#9597972 (Reedy)
[19:45:28] Grid-Engine-to-K8s-Migration: Migrate arbclerkbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357551#9597973 (Reedy)
[19:45:37] Grid-Engine-to-K8s-Migration: Migrate backup-bot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357553#9597974 (Reedy)
[19:45:43] Grid-Engine-to-K8s-Migration: Migrate ganfilter from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357554#9597975 (Reedy)
[19:46:09] Grid-Engine-to-K8s-Migration: Migrate himowd from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357556#9597976 (Reedy)
[19:46:12] Grid-Engine-to-K8s-Migration: Migrate pagecounts from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357559#9597977 (Reedy)
[19:46:16] Grid-Engine-to-K8s-Migration: Migrate recoin from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357560#9597978 (Reedy)
[19:46:22] Grid-Engine-to-K8s-Migration: Migrate spacemedia from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357561#9597980 (Reedy)
[19:46:49] Grid-Engine-to-K8s-Migration: Migrate hnatsumi-bot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357558#9597984 (Reedy)
[19:47:10] Grid-Engine-to-K8s-Migration: Migrate unblock-zh-status from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357562#9597987 (Reedy)
[19:47:21] Grid-Engine-to-K8s-Migration, Chinese-Sites: Migrate zhwiki-perm-qualicheck from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357568#9597988 (Reedy)
[19:50:41] (CloudVPSDesignateLeaks) firing: (5) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:53:56] (ProbeDown) firing: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:55:41] (CloudVPSDesignateLeaks) firing: (5) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:58:56] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:00:41] (CloudVPSDesignateLeaks) resolved: (5) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:34:42] Toolforge, cloud-services-team, Capacity Exchange: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227#9598110 (Albertoleoncio)
[20:42:28] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate Puppet CA: cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs is about to expire in 26d 23h 54m 11s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[20:43:07] Toolforge, Technical-blog-posts: Publish a blog post about buildservice on the Tech Blog - https://phabricator.wikimedia.org/T350691#9598126 (user_123) Hi, I would love to work on this. Please can I get a rundown on what is expected? Also, I can't find any link to the tech blog. Thank you!
[20:58:48] Toolforge, Technical-blog-posts: Publish a blog post about buildservice on the Tech Blog - https://phabricator.wikimedia.org/T350691#9598201 (bd808) >>! In T350691#9598126, @user_123 wrote: > Hi, I would love to work on this. Please can I get a rundown on what is expected? Also, I can't find any link to...
[21:02:40] Grid-Engine-to-K8s-Migration, User-dcaro: Migrate kmlexport from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T356905#9598227 (Dvorapa) Open→Resolved a:Dvorapa
[21:47:30] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9598460 (MusikAnimal) p:Triage→Unbreak! I'll be bold and elevate this to UBN. I'm continually getting complaints from the communities and staff alike who say it is important to their daily work. At...
[21:49:56] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9598472 (bking) Hello everyone, I apologize for the delayed response. We fixed the cross cluster settings last week, which should have fixed the issue. I'll take another look now.
[22:06:54] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9598534 (1234qwer1234qwer4) The query in the task description now throws a "500: Internal Server Error" for me.
[22:07:04] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9598536 (MusikAnimal) I'm actually seeing it timeout now, so not the same issue as what was reported here, but it still breaks Global Search. ` tools.global-search@tools-sgebastion-10:~$ curl -XGET https:/...
[22:08:35] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9598547 (1234qwer1234qwer4) >>! In T358541#9598534, @1234qwer1234qwer4 wrote: > The query in the task description now throws a "500: Internal Server Error" for me. On the other hand, the link shared by Kri...
[22:14:45] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9598581 (bking) Per IRC conversation with @MusikAnimal , we believe this to be fixed now. Please respond here if this is not the case. Apologies, as this was a missed step during the migration. We believe...
[22:16:18] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9598582 (MusikAnimal) Open→Resolved a:bking Thanks @bking! Resolving.
[22:25:16] Cloud-VPS, cloud-services-team (FY2023/2024-Q3-Q4), Goal, Puppet (Puppet 7.0): Migrate Cloud VPS puppet infrastructure to Puppet 7 - https://phabricator.wikimedia.org/T351450#9598611 (Andrew) a:Andrew
[22:28:01] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9598631 (Varnent) Confirming this is working for me - thank you all! :)
[23:00:14] Toolforge, Technical-blog-posts: Publish a blog post about buildservice on the Tech Blog - https://phabricator.wikimedia.org/T350691#9598718 (user_123) @bd808 Thank you for the response and clarification.
[23:13:15] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9598737 (T) Thanks to everyone for the fix. I use it every day, I realized how important it was when it was broken.
[23:25:12] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9598745 (T) I don't know if it's related but there many duplicate items in the results now.
[23:42:28] (PuppetCertificateAboutToExpire) firing: Puppet CA certificate Puppet CA: cloudinfra-internal-puppetmaster01.cloudinfra.eqiad.wmflabs is about to expire in 26d 20h 54m 11s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire