[00:04:03] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:09:03] (TfInfraTestDestroyFailed) resolved: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:10:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:15:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:14:24] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[01:20:27] PROBLEM - Check unit status of backup_cinder_volumes on cloudbackup2001 is CRITICAL: CRITICAL: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[03:06:15] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[03:33:15] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[04:14:24] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[04:47:00] Grid-Engine-to-K8s-Migration: Migrate wikitasks from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320168 (komla) @Vort can we hear from you as to whether T292289 resolves your issue
[04:48:47] Grid-Engine-to-K8s-Migration, Pywikibot: Migrate pywikibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319981 (komla) @Dvorapa pywikibot should work. have you given it another try?
[05:03:30] Grid-Engine-to-K8s-Migration: Migrate steve-adder from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320062 (komla) Can we say this is resolved?
[05:05:17] Grid-Engine-to-K8s-Migration: Migrate wd-flaw-finder from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320138 (komla) > Your webservice of type python3.9 is running on backend kubernetes > ` It means your tool is running on kubernetes. You can mark this ticket as 'reso...
[07:06:15] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[07:14:24] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[07:19:33] Grid-Engine-to-K8s-Migration: Migrate wikitasks from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320168 (Vort) >>! In T320168#9307499, @komla wrote: > @Vort can we hear from you as to whether T292289 resolves your issue No. I made T295220 instead and it there was no sol...
[07:33:15] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[07:54:10] (PS1) VolkerE: build, styles: Replace WikimediaUI Base with Codex design tokens [labs/tools/meetingtimes] - https://gerrit.wikimedia.org/r/971743
[07:54:49] (CR) CI reject: [V: -1] build, styles: Replace WikimediaUI Base with Codex design tokens [labs/tools/meetingtimes] - https://gerrit.wikimedia.org/r/971743 (owner: VolkerE)
[08:43:18] Tool-bub2: Use .jsx for files that containt JSX syntax - https://phabricator.wikimedia.org/T348505 (Aklapper) @SamMintah: Hi! This task has been assigned to you a while ago. Could you maybe share an update? Do you still plan to work on this task, or [do you need any help](https://www.mediawiki.org/wiki/New_D...
[08:43:23] Tool-bub2: Bulk Upload Functionality - https://phabricator.wikimedia.org/T344118 (Aklapper) @Akanksha.t05: Hi! This task has been assigned to you a while ago. Could you maybe share an update? Do you still plan to work on this task, or [do you need any help](https://www.mediawiki.org/wiki/New_Developers/Commu...
[09:38:18] Tool-bub2: Bulk Upload Functionality - https://phabricator.wikimedia.org/T344118 (Akanksha.t05) >>! In T344118#9307700, @Aklapper wrote: > @Akanksha.t05: Hi! This task has been assigned to you a while ago. Could you maybe share an update? Do you still plan to work on this task, or [do you need any help](http...
[09:50:38] Data-Services, DBA: Prepare and check storage layer for zghwiki - https://phabricator.wikimedia.org/T350240 (Zabe) wiki has been created
[09:50:42] Data-Services, DBA: Prepare and check storage layer for bjnwikiquote - https://phabricator.wikimedia.org/T350234 (Zabe) wiki has been created
[09:50:48] Data-Services, DBA: Prepare and check storage layer for dgawiki - https://phabricator.wikimedia.org/T350228 (Zabe) wiki has been created
[09:50:54] Data-Services, DBA: Prepare and check storage layer for bbcwiki - https://phabricator.wikimedia.org/T350372 (Zabe) wiki has been created
[10:14:24] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[10:21:46] (PS1) Aklapper: Set code repository URI when creating project tags [labs/striker] - https://gerrit.wikimedia.org/r/971912 (https://phabricator.wikimedia.org/T320915)
[10:22:46] (CR) Aklapper: "Note: Untested. Plus I am clueless in Python." [labs/striker] - https://gerrit.wikimedia.org/r/971912 (https://phabricator.wikimedia.org/T320915) (owner: Aklapper)
[10:24:10] Striker, Patch-For-Review: Set code repository URI when creating project tags - https://phabricator.wikimedia.org/T320915 (Aklapper) p:Triage→Low a:Aklapper
[10:24:57] (CR) CI reject: [V: -1] Set code repository URI when creating project tags [labs/striker] - https://gerrit.wikimedia.org/r/971912 (https://phabricator.wikimedia.org/T320915) (owner: Aklapper)
[10:26:02] (CR) Aklapper: "Guess I need to concatenate the URI outside of the API call as the apostrophes would make the JSON invalid" [labs/striker] - https://gerrit.wikimedia.org/r/971912 (https://phabricator.wikimedia.org/T320915) (owner: Aklapper)
[10:35:39] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Use API:EmailUser to send Emails to the users - https://phabricator.wikimedia.org/T338267 (Maryann-Onyinye)
[10:35:46] Tool-bub2, Outreach-Programs-Projects, Outreachy (Round 27): Use API:EmailUser to send Emails to the users - https://phabricator.wikimedia.org/T338267 (Maryann-Onyinye)
[11:08:11] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[11:34:41] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[12:03:04] Tools, MediaViewer, Thumbor: Explore moving the Panoviewer gadget/Toolforge tool into production - https://phabricator.wikimedia.org/T138933 (TheDJ) I've updated my Panoviewer extension a bit https://www.mediawiki.org/wiki/User:TheDJ/panoviewer It's now self contained. Still needs a bit of cleanup b...
[12:11:02] PAWS: New upstream release 8.5.0 for Pywikibot - https://phabricator.wikimedia.org/T350552 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/345
[12:11:06] vivian-rook opened https://github.com/toolforge/paws/pull/345
[13:14:07] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032 (SD0001) The gunicorn migration sounds like an unlikely culprit, since it's the db connections referenced here - which are managed by pymysql in any case.
[13:14:24] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[14:01:37] (CephSlowOps) firing: Ceph cluster in eqiad has 63 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[14:01:44] cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T349502 (phaultfinder)
[14:06:37] (CephSlowOps) resolved: Ceph cluster in eqiad has 25 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[14:24:14] Cloud-VPS, Moderator-Tools-Team (Kanban): cinder volumes stuck in detaching, deleting states - https://phabricator.wikimedia.org/T350586 (jsn.sherman)
[14:24:18] Quarry: Deploy magnum cluster for quarry - https://phabricator.wikimedia.org/T349032 (rook) >>! In T349032#9308383, @SD0001 wrote: > The gunicorn migration sounds like an unlikely culprit, since it's the db connections referenced here - which are managed by pymysql in any case. Did I install the db correctl...
[14:24:32] Cloud-VPS, Moderator-Tools-Team (Kanban): cinder volumes stuck in detaching, deleting states - https://phabricator.wikimedia.org/T350586 (jsn.sherman)
[14:25:25] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1001 for host cloudnet1005.eqiad.wmnet with OS bookworm
[14:29:56] cloud-services-team (FY2023/2024-Q1), Infrastructure-Foundations, Packaging: wmfbackups packages for Debian Bookworm - https://phabricator.wikimedia.org/T347740 (fnegri) Thanks @jcrespo, I have just installed the new packages in our dev cluster (`cloudservices200[45]-dev.codfw.wmnet`) and I will inst...
[14:30:06] cloud-services-team (FY2023/2024-Q1), Infrastructure-Foundations, Packaging: wmfbackups packages for Debian Bookworm - https://phabricator.wikimedia.org/T347740 (fnegri) In progress→Resolved
[14:30:12] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade codfw hosts to bookworm - https://phabricator.wikimedia.org/T345810 (fnegri)
[15:00:39] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1001 for host cloudnet1005.eqiad.wmnet with OS bookworm completed: - cloudnet1005 (**PA...
[15:08:11] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[15:16:22] Toolforge, cloud-services-team: Weird error HTTP 405 Method Not Allowed on Toolforge - https://phabricator.wikimedia.org/T349452 (Albertoleoncio)
[15:17:12] Toolforge, cloud-services-team: Weird error HTTP 405 Method Not Allowed on Toolforge - https://phabricator.wikimedia.org/T349452 (taavi) Open→Resolved > Well... the error continues, but now it's a little different: This is an error with the tool, not with the Toolforge infrastructure.
[15:26:23] PROBLEM - Host cloudcephosd1032 is DOWN: PING CRITICAL - Packet loss = 100%
[15:28:02] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (fnegri)
[15:30:03] PROBLEM - Host cloudcephosd1033 is DOWN: PING CRITICAL - Packet loss = 100%
[15:30:55] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1001 for host cloudrabbit1001.wikimedia.org with OS bookworm
[15:32:59] RECOVERY - Host cloudcephosd1032 is UP: PING OK - Packet loss = 0%, RTA = 0.20 ms
[15:33:11] PROBLEM - Host cloudcephosd1034 is DOWN: PING CRITICAL - Packet loss = 100%
[15:36:45] Toolforge: Re-visit Toolforge Kubernetes default quotas (April 2023) - https://phabricator.wikimedia.org/T333979 (taavi) a:taavi
[15:37:05] RECOVERY - Host cloudcephosd1033 is UP: PING OK - Packet loss = 0%, RTA = 0.89 ms
[15:38:11] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[15:40:37] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (Jclark-ctr)
[15:40:59] RECOVERY - Host cloudcephosd1034 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[15:42:54] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node (348643)
[15:43:35] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0) (348643)
[15:43:40] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node (348643)
[15:44:09] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0) (348643)
[15:46:31] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node (348643)
[15:47:13] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0) (348643)
[15:52:12] Data-Services, DBA: Prepare and check storage layer for zghwiki - https://phabricator.wikimedia.org/T350240 (Marostegui) Sanitizing
[15:52:20] Data-Services, DBA: Prepare and check storage layer for bjnwikiquote - https://phabricator.wikimedia.org/T350234 (Marostegui) Sanitizing
[15:52:23] Data-Services, DBA: Prepare and check storage layer for dgawiki - https://phabricator.wikimedia.org/T350228 (Marostegui) Sanitizing
[15:52:27] Data-Services, DBA: Prepare and check storage layer for bbcwiki - https://phabricator.wikimedia.org/T350372 (Marostegui) Sanitizing
[15:56:44] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Patch-For-Review: [openstack] prometheus exporter broken in bookworm - https://phabricator.wikimedia.org/T350154 (fnegri) The prometheus exporter is now running in codfw on `cloudcontrol2005-dev`: ` root@cloudcontrol2005-dev:~# systemctl status promet...
[15:56:53] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (fnegri)
[15:56:58] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Patch-For-Review: [openstack] prometheus exporter broken in bookworm - https://phabricator.wikimedia.org/T350154 (fnegri) In progress→Resolved
[16:00:49] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (fnegri)
[16:00:59] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Network tests are failing in eqiad - https://phabricator.wikimedia.org/T350466 (fnegri) Resolved→Invalid
[16:01:01] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (fnegri)
[16:01:05] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Network tests are failing in eqiad - https://phabricator.wikimedia.org/T350466 (fnegri) Invalid→Resolved
[16:03:05] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1001 for host cloudrabbit1001.wikimedia.org with OS bookworm completed: - cloudrabbit10...
[16:14:24] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[16:24:37] (CephClusterInWarning) firing: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:28:31] Data-Services, DBA, Data-Platform-SRE: Prepare and check storage layer for zghwiki - https://phabricator.wikimedia.org/T350240 (Marostegui) All done, ready for the views creation.
[16:28:40] Data-Services, DBA, Data-Platform-SRE: Prepare and check storage layer for bjnwikiquote - https://phabricator.wikimedia.org/T350234 (Marostegui) All done, ready for the views creation.
[16:28:46] Data-Services, DBA, Data-Platform-SRE: Prepare and check storage layer for dgawiki - https://phabricator.wikimedia.org/T350228 (Marostegui) All done, ready for the views creation.
[16:28:54] Data-Services, DBA, Data-Platform-SRE: Prepare and check storage layer for bbcwiki - https://phabricator.wikimedia.org/T350372 (Marostegui) All done, ready for the views creation.
[16:29:37] (CephClusterInWarning) resolved: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:33:34] Data-Services, DBA, Data-Platform-SRE: Prepare and check storage layer for zghwiki - https://phabricator.wikimedia.org/T350240 (BTullis) The views were generated, but the cookbook failed when attempting to run the DNS step. ` ----- OUTPUT of 'source /root/nov...ca-dns --aliases' -----...
[16:33:37] (CephClusterInWarning) firing: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:33:46] Data-Services, DBA, Data-Platform-SRE: Prepare and check storage layer for zghwiki - https://phabricator.wikimedia.org/T350240 (BTullis) a:BTullis
[16:37:32] Data-Services, DBA, Data-Platform-SRE: Prepare and check storage layer for zghwiki - https://phabricator.wikimedia.org/T350240 (Marostegui) Confirmed that I can query the view just fine and see all the rows. So it might be indeed just related to the DNS and not affecting the data underneath. Once it...
[16:43:37] (CephClusterInWarning) resolved: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:47:15] VPS-project-Wikistats: Add bjnwikiquote to wikistats - https://phabricator.wikimedia.org/T350239 (Dzahn) ` MariaDB [wikistats]> insert into wikiquotes (prefix, lang, loclang, method) select prefix,lang,loclang,method from wikipedias where prefix='bjn'; `
[16:47:20] (PS2) VolkerE: build, styles: Replace WikimediaUI Base with Codex design tokens [labs/tools/meetingtimes] - https://gerrit.wikimedia.org/r/971743
[16:47:57] VPS-project-Wikistats: Add bjnwikiquote to wikistats - https://phabricator.wikimedia.org/T350239 (Dzahn) Open→Resolved ` /usr/lib/wikistats/update.php wq prefix bjn sent query: 'select * from wikiquotes where prefix="bjn"'. A(1/96) - bjn.wikiquote.org - calling API: https://bjn.wikiquote.org/w/api.p...
[16:50:17] VPS-project-Wikistats: Add dgawiki to wikistats - https://phabricator.wikimedia.org/T350233 (Dzahn) Open→Resolved ` MariaDB [wikistats]> insert into wikipedias (prefix, lang, loclang, method) values ("dga", "Dagaare", "Dagaaba", 8); ` ` /usr/lib/wikistats/update.php wp prefix dga sent query: 'sele...
[16:53:01] VPS-project-Wikistats: Add bbcwiki to wikistats - https://phabricator.wikimedia.org/T350377 (Dzahn) Open→Resolved ` MariaDB [wikistats]> insert into wikipedias (prefix, lang, loclang, method) values ("bbc", "Toba Batak", "ᯅᯖᯂ᯲ ᯖᯬᯅ / ...
[16:53:56] (CR) CI reject: [V: -1] build, styles: Replace WikimediaUI Base with Codex design tokens [labs/tools/meetingtimes] - https://gerrit.wikimedia.org/r/971743 (owner: VolkerE)
[16:59:53] (PS1) EoghanGaffney: [apt-staging] Add fake secrets fro rsyncd secrets [labs/private] - https://gerrit.wikimedia.org/r/971993
[17:01:51] (CR) EoghanGaffney: [V: +2 C: +2] [apt-staging] Add fake secrets fro rsyncd secrets [labs/private] - https://gerrit.wikimedia.org/r/971993 (owner: EoghanGaffney)
[17:03:22] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudrabbit1003.wikimedia.org with OS bookworm
[17:03:27] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudrabbit1002.wikimedia.org with OS bookworm
[17:10:23] PAWS: New upstream release 8.5.0 for Pywikibot - https://phabricator.wikimedia.org/T350552 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/345
[17:10:28] vivian-rook closed https://github.com/toolforge/paws/pull/345
[17:10:36] PAWS: New upstream release 8.5.0 for Pywikibot - https://phabricator.wikimedia.org/T350552 (rook) Open→Resolved a:rook
[17:22:26] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[17:35:56] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudrabbit1002.wikimedia.org with OS bookworm completed: - cloudrabbit10...
[17:37:23] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (Andrew) {F41459130}
[17:37:27] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[17:54:48] Grid-Engine-to-K8s-Migration: Migrate wd-flaw-finder from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320138 (tidoni_t) Open→Resolved
[18:01:01] (PS2) Aklapper: Set code repository URI when creating project tags [labs/striker] - https://gerrit.wikimedia.org/r/971912 (https://phabricator.wikimedia.org/T320915)
[18:04:01] (CR) CI reject: [V: -1] Set code repository URI when creating project tags [labs/striker] - https://gerrit.wikimedia.org/r/971912 (https://phabricator.wikimedia.org/T320915) (owner: Aklapper)
[18:07:40] (CR) BryanDavis: [C: -1] Set code repository URI when creating project tags (1 comment) [labs/striker] - https://gerrit.wikimedia.org/r/971912 (https://phabricator.wikimedia.org/T320915) (owner: Aklapper)
[18:12:28] PROBLEM - Check systemd state on cloudrabbit1002 is CRITICAL: CRITICAL - degraded: The following units failed: rabbitmq_detect_partition.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:14:48] RECOVERY - Check systemd state on cloudrabbit1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:16:29] (CR) BryanDavis: [C: -1] Set code repository URI when creating project tags (1 comment) [labs/striker] - https://gerrit.wikimedia.org/r/971912 (https://phabricator.wikimedia.org/T320915) (owner: Aklapper)
[18:18:27] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudrabbit1003.wikimedia.org with OS bookworm executed with errors: - cl...
[18:19:17] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudrabbit1003.wikimedia.org with OS bookworm
[18:19:43] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[18:19:44] PROBLEM - Check systemd state on cloudrabbit1002 is CRITICAL: CRITICAL - degraded: The following units failed: rabbitmq_detect_partition.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:20:58] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99)
[18:22:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[18:22:26] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[18:23:34] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99)
[18:24:35] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[18:25:56] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99)
[18:27:23] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[18:33:17] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[18:50:10] RECOVERY - Check systemd state on cloudrabbit1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:54:26] PROBLEM - Check systemd state on cloudrabbit1002 is CRITICAL: CRITICAL - degraded: The following units failed: rabbitmq_detect_partition.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:55:07] VPS-project-Wikistats, collaboration-services, User-RhinosF1: Move from wikistats-bullseye to wikistats-bookworm - https://phabricator.wikimedia.org/T342813 (Dzahn) Thank you for this !:))
[18:58:32] VPS-project-Wikistats: wikistats: mediawiki version shown incorrectly in "largest wikis" listing - https://phabricator.wikimedia.org/T317241 (Dzahn) Oh, I forgot about this, thanks RhinosF1. Someone needs to upload the change (to gitlab).
[18:58:51] VPS-project-Wikistats, patch-welcome: wikisite table - status of updates - https://phabricator.wikimedia.org/T111592 (Dzahn) Maybe we should make one last attempt to contact them directly and otherwise drop the table.
[19:01:50] RECOVERY - Check unit status of backup_cinder_volumes on cloudbackup2001 is OK: OK: Status of the systemd unit backup_cinder_volumes https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[19:02:35] Toolforge (Toolforge iteration 02), Patch-For-Review: [envvars-api] avoid invalidating go mod download cache on each code change - https://phabricator.wikimedia.org/T350193 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/18 [envvars-ap...
[19:08:11] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[19:14:24] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[19:30:38] Tools: 'wikitanvirbot' tool missing pywikibot config - https://phabricator.wikimedia.org/T349916 (Wikitanvir) Thanks for notifying @taavi. I'm on it.
[19:38:11] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[19:44:01] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudrabbit1003.wikimedia.org with OS bookworm executed with errors: - cl...
[19:44:58] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudrabbit1003.wikimedia.org with OS bookworm
[19:53:31] PAWS: Is PAWS culler workng? - https://phabricator.wikimedia.org/T345838 (rook) This seems resolved with T349545
[19:53:38] PAWS: Is PAWS culler workng? - https://phabricator.wikimedia.org/T345838 (rook) In progress→Resolved
[19:54:08] PAWS: update build-and-push action - https://phabricator.wikimedia.org/T348874 (rook) Open→Resolved
[19:55:12] PAWS: Reduce memory request for singleuser - https://phabricator.wikimedia.org/T345467 (rook) Open→Resolved
[19:57:44] RECOVERY - Check systemd state on cloudrabbit1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:58:44] (CR) Mess: "This change is ready for review." [labs/tools/lists] - https://gerrit.wikimedia.org/r/972028 (owner: Mess)
[19:59:54] (CR) Mess: "It's OK" [labs/tools/lists] - https://gerrit.wikimedia.org/r/972028 (owner: Mess)
[20:04:04] (CR) Mess: "This change is ready for review." [labs/tools/lists] - https://gerrit.wikimedia.org/r/972028 (owner: Mess)
[20:04:41] PAWS: PAWS shell - lack of i18n submodule or files or an outdated submodule - https://phabricator.wikimedia.org/T343676 (rook) @Info-farmer thank you for your response. To verify is this still happening in paws?
[20:07:23] PAWS: [bug] PAWS has problems loading - https://phabricator.wikimedia.org/T343054 (rook) Sounds like this might have been resolved. Please re-open if not found to be the case.
[20:07:33] PAWS: [bug] PAWS has problems loading - https://phabricator.wikimedia.org/T343054 (rook) Open→Resolved [20:21:40] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudrabbit1003.wikimedia.org with OS bookworm completed: - cloudrabbit10... [20:26:19] Toolforge (Toolforge iteration 02), Patch-For-Review, User-Raymond_Ndibe: move from single script to multi-script approach in maintain-harbor - https://phabricator.wikimedia.org/T350410 (CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_req... [20:27:40] Toolforge (Toolforge iteration 02), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Patch-For-Review, User-dcaro: [builds-api] catch harbor timeout when creating repository - https://phabricator.wikimedia.org/T345903 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.o... [20:36:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [20:38:48] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (Andrew) All rabbit nodes (cloudrabbit100[123]) are now running Bullseye and are clustered properly. 
[20:40:03] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811 (Andrew) [20:40:52] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [21:24:20] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (Andrew) P53144 [21:38:16] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (Andrew) No new errors reported since the 1st. I'm not clear on if that means we've fixed the problem or not; there a... [22:14:24] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [22:22:27] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [23:08:12] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [23:39:41] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange