[00:00:37] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW
[00:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[00:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[00:04:18] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99)
[00:04:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[00:08:03] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:16:05] Toolforge (Toolforge iteration 00): [tbs.build.logs] Show a more user-friendly error message when logs are not ready - https://phabricator.wikimedia.org/T341059 (Raymond_Ndibe) Open→In progress
[00:18:03] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:20:52] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Allow multi-lingual books to be uploaded to Internet Archive - https://phabricator.wikimedia.org/T346388 (Okerekechinweotito) @wassan.anmol117 INPUT NEEDED Currently I have reused already present component to impleme...
[00:22:34] (ProbeDown) firing: (2) Service toolsbeta-test-k8s-haproxy-3:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:24:23] Toolforge (Toolforge iteration 00), Patch-For-Review, User-dcaro: `toolforge build logs`: add follow options - https://phabricator.wikimedia.org/T339922 (CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/12 [build.logs]: add --follow o...
[00:25:13] Toolforge (Toolforge iteration 00), Patch-For-Review: [tbs.build.logs] Show a more user-friendly error message when logs are not ready - https://phabricator.wikimedia.org/T341059 (CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/12 [build...
[00:44:05] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): For PDL, download and stream the PDF if available - https://phabricator.wikimedia.org/T348188 (Ibinaboadiela) a:Ibinaboadiela
[00:49:15] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): For PDL, download and stream the PDF if available - https://phabricator.wikimedia.org/T348188 (Ibinaboadiela) a:Ibinaboadiela→None
[02:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[02:38:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[02:56:27] Toolforge (Toolforge iteration 00), Patch-For-Review, User-Raymond_Ndibe: toolforge build start: default to tailing the build as it progresses with the option of -d/--detached - https://phabricator.wikimedia.org/T340079 (CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/too...
[03:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[03:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[03:04:31] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[03:22:34] (ProbeDown) firing: (2) Service toolsbeta-test-k8s-haproxy-3:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:29:47] Quarry: Quarry exports integers as floats to wikitable - https://phabricator.wikimedia.org/T151106 (Audiodude) Documenting my investigation (no solution found). With this query against mywiki in dev: ` SELECT @rownum := @rownum + 1 AS rank, page_title FROM (SELECT page_title FROM page) t, (SELECT @rownum...
[04:33:41] Quarry: Quarry exports integers as floats to wikitable - https://phabricator.wikimedia.org/T151106 (Audiodude) Another puzzling part is that MariaDB doesn't appear to be returning results as floats. I exposed the mywiki MariaDB in docker and ran this: ` -------------- SELECT @rownum := @rownum + 1 AS rank,...
[04:34:45] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (Audiodude) I assume we need some kind of access to the Github repo too? https://github.com/toolforge/quarry
[04:56:11] (PS2) Krinkle: Improve 'selected namespace' styling [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963400
[04:56:13] (CR) Krinkle: [C: +2] Improve 'selected namespace' styling [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963400 (owner: Krinkle)
[05:11:17] (Merged) jenkins-bot: Improve 'selected namespace' styling [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963400 (owner: Krinkle)
[05:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[05:21:06] (PS1) Krinkle: Fix 'p' parameter [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963431
[05:21:18] (CR) Krinkle: [C: +2] Fix 'p' parameter [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963431 (owner: Krinkle)
[05:21:50] (Merged) jenkins-bot: Fix 'p' parameter [labs/tools/blankpages] - https://gerrit.wikimedia.org/r/963431 (owner: Krinkle)
[05:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[06:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[06:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[06:12:13] tool-wscontest: Incorrect stats on landing page - https://phabricator.wikimedia.org/T348210 (PMenon-WMF)
[06:22:34] (ProbeDown) firing: (2) Service toolsbeta-test-k8s-haproxy-3:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:48:58] tool-wscontest: Incorrect stats on landing page - https://phabricator.wikimedia.org/T348210 (Samwilson) I guess something like `SELECT COUNT(DISTINCT user_id) FROM scores` would be more what we want?
[07:04:47] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[07:15:03] Tool-bub2, Internet-Archive, Outreach-Programs-Projects, Outreachy (Round 27): Add max character limit while creating identifier in Internet Archive and remove some special characters - https://phabricator.wikimedia.org/T348192 (Shreyashidabral) Hey @wassan.anmol117 I have opened a [[ https://gi...
[07:18:38] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[07:18:55] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
[07:31:23] Toolforge (Toolforge iteration 00), Patch-For-Review: [tbs][buildpacks] ensure apt buildpack runs before others - https://phabricator.wikimedia.org/T347985 (CodeReviewBot) sstefanova merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/109 builds-builder: bump to 0...
[07:32:54] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.drain_node
[07:32:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:34:38] Toolforge (Toolforge iteration 00), Patch-For-Review: [tbs][buildpacks] ensure apt buildpack runs before others - https://phabricator.wikimedia.org/T347985 (Slst2020) In progress→Resolved
[07:36:55] Toolforge (Toolforge iteration 00): [tbs] migrate sample tools to Gitlab - https://phabricator.wikimedia.org/T348213 (Slst2020)
[07:42:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[07:49:31] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99)
[07:49:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:19:21] Cloud Services Proposals, Toolforge (Toolforge iteration 00), cloud-services-team, Cloud-Services-Origin-Team, and 2 others: Decision request – Toolforge (re)architecture - https://phabricator.wikimedia.org/T346153 (dcaro) >>! In T346153#9224779, @fnegri wrote: > I think I have a //slight// prefe...
[08:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[08:22:22] Cloud-VPS, observability, Security: Ingest Cloud VPS audit logs into production logging pipeline - https://phabricator.wikimedia.org/T348075 (fgiunchedi) Thank you for following up @Southparkfan ! Some thoughts/clarifications: the model I proposed to pull logs is indeed to avoid cloud vps-initiated...
[08:32:34] (ProbeDown) resolved: (2) Service toolsbeta-test-k8s-haproxy-3:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[08:44:13] (CR) Arturo Borrero Gonzalez: [C: +1] cli: Use toolforge-weld config system [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/962628 (owner: Majavah)
[08:46:33] (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM. Seems elegant. Thanks." [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/963295 (https://phabricator.wikimedia.org/T336057) (owner: Majavah)
[08:50:28] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.drain_node
[08:50:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:56:47] (CR) Majavah: [C: +2] cli: Use toolforge-weld config system [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/962628 (owner: Majavah)
[08:58:31] (Merged) jenkins-bot: cli: Use toolforge-weld config system [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/962628 (owner: Majavah)
[09:01:24] (CR) Majavah: [C: +2] Add support for querying logs (2 comments) [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/963295 (https://phabricator.wikimedia.org/T336057) (owner: Majavah)
[09:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[09:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[09:02:19] (Merged) jenkins-bot: Add support for querying logs [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/963295 (https://phabricator.wikimedia.org/T336057) (owner: Majavah)
[09:06:28] Cloud-VPS, cloud-services-team, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (dcaro)
[09:16:11] (PS1) Majavah: d/changelog: prepare for new release 13 [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/963668
[09:19:27] (PS2) Majavah: d/changelog: prepare for new release 13 [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/963668
[09:22:24] (CR) Majavah: [C: +2] d/changelog: prepare for new release 13 [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/963668 (owner: Majavah)
[09:23:37] (Merged) jenkins-bot: d/changelog: prepare for new release 13 [cloud/toolforge/jobs-framework-cli] - https://gerrit.wikimedia.org/r/963668 (owner: Majavah)
[09:32:59] !log admin fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
[09:33:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:33:05] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[09:34:31] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[09:37:37] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-jobs-framework-cli' version '13'
[09:37:52] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-jobs-framework-cli' version '13'
[09:39:08] Striker, cloud-services-team, Bitu, Infrastructure-Foundations: Unable to apply for Toolforge access - https://phabricator.wikimedia.org/T347631 (MoritzMuehlenhoff) >> @MoritzMuehlenhoff @SLyngshede-WMF Should Bitu create users with more object classes, or should we remove some unnecessary ones f...
[09:40:01] !log admin fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
[09:40:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:40:07] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[09:40:50] Striker, cloud-services-team, Bitu, Infrastructure-Foundations: Unable to apply for Toolforge access - https://phabricator.wikimedia.org/T347631 (MoritzMuehlenhoff) >>! In T347631#9210445, @taavi wrote: > This is indeed it, adding it to my testing account made it able to log in to Toolforge. Alth...
[09:44:52] (PS1) Majavah: Relax LDAP objectClass requirements [labs/striker] - https://gerrit.wikimedia.org/r/963671 (https://phabricator.wikimedia.org/T347631)
[09:47:28] !log admin fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
[09:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:47:33] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[09:49:03] Striker, cloud-services-team, Bitu, Infrastructure-Foundations, Patch-For-Review: Unable to apply for Toolforge access - https://phabricator.wikimedia.org/T347631 (SLyngshede-WMF) We can just add it as an auxiliary class. I just tested and Bitu has a bug where it gets confused if new schemas...
[09:54:03] cloud-services-team, affects-Kiwix-and-openZIM: Read-only access to Wikimedia mirror of Kiwix data in dumps.wikimedia.org/kiwix/ - https://phabricator.wikimedia.org/T348226 (Benoit74)
[09:54:03] (PuppetAgentFailure) firing: Puppet agent failure detected on instance tools-prometheus-6 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:54:42] !log admin fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
[09:54:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:54:47] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285
[09:59:34] Striker, cloud-services-team, Bitu, Infrastructure-Foundations, Patch-For-Review: Unable to apply for Toolforge access - https://phabricator.wikimedia.org/T347631 (SLyngshede-WMF) It's the "shadowAccount" class you want to add, and not a "person" class correct? Because I can't seem to find an...
[10:17:16] !log impactvisualizer dcaro@urcuchillay START - Cookbook wmcs.vps.create_project for project impactvisualizer in eqiad1 (T347905)
[10:17:17] wm-bot2: Unknown project "impactvisualizer"
[10:17:17] T347905: Request creation of impact-visualizer VPS project - https://phabricator.wikimedia.org/T347905
[10:19:20] !log impactvisualizer dcaro@urcuchillay END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for project impactvisualizer in eqiad1 (T347905)
[10:19:20] wm-bot2: Unknown project "impactvisualizer"
[10:22:22] !log impactvisualizer dcaro@urcuchillay START - Cookbook wmcs.vps.add_user_to_project for user 'Ragesoss' in role 'member' (T347905)
[10:22:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Impactvisualizer/SAL
[10:22:25] T347905: Request creation of impact-visualizer VPS project - https://phabricator.wikimedia.org/T347905
[10:22:31] !log impactvisualizer dcaro@urcuchillay END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'Ragesoss' in role 'member' (T347905)
[10:22:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Impactvisualizer/SAL
[10:22:44] !log impactvisualizer dcaro@urcuchillay START - Cookbook wmcs.vps.add_user_to_project for user 'Megannewsome1' in role 'member' (T347905)
[10:22:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Impactvisualizer/SAL
[10:22:54] !log impactvisualizer dcaro@urcuchillay END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'Megannewsome1' in role 'member' (T347905)
[10:22:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Impactvisualizer/SAL
[10:24:38] Cloud-VPS (Project-requests): Request creation of impact-visualizer VPS project - https://phabricator.wikimedia.org/T347905 (dcaro) Open→Resolved Done! Let me know if you have any issues, enjoy!
[10:25:03] (InstanceDown) firing: Project tools instance tools-sgeweblight-10-28 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:29:19] Toolforge, cloud-services-team, Epic: Provide modern, non-NFS error log solution for Toolforge webservices and bots - https://phabricator.wikimedia.org/T127367 (taavi)
[10:30:03] (InstanceDown) resolved: Project tools instance tools-sgeweblight-10-28 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:30:44] Toolforge (Toolforge iteration 00), cloud-services-team, Patch-For-Review: Add commands to `webservice` and `jobs` to query logs from Kubernetes - https://phabricator.wikimedia.org/T336057 (taavi) Open→Resolved also wrote documentation: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Logs
[10:47:07] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW
[10:57:20] !log admin dcaro@urcuchillay END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99)
[10:57:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:01:38] Striker, cloud-services-team, Bitu, Infrastructure-Foundations, Patch-For-Review: Unable to apply for Toolforge access - https://phabricator.wikimedia.org/T347631 (SLyngshede-WMF) Okay... So I was missing the core.schema in my test setup, that contains "person". The description field is also...
[11:18:47] Toolforge (Toolforge iteration 00), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [tbs][builder] Inject nodejs buildpack - https://phabricator.wikimedia.org/T346635 (Slst2020) I've tested with both, and in this particular case, it...
[11:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[11:24:12] Striker, cloud-services-team, Bitu, Infrastructure-Foundations, Patch-For-Review: Unable to apply for Toolforge access - https://phabricator.wikimedia.org/T347631 (MoritzMuehlenhoff) >>! In T347631#9227685, @SLyngshede-WMF wrote: > Okay... So I was missing the core.schema in my test setup, th...
[11:28:36] (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/map-of-monuments] - https://gerrit.wikimedia.org/r/963698 (owner: L10n-bot)
[11:28:39] (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/weapon-of-mass-description] - https://gerrit.wikimedia.org/r/963701 (owner: L10n-bot)
[11:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[11:54:06] !log admin dcaro@urcuchillay START - Cookbook wmcs.ceph.osd.drain_rack
[11:54:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:54:31] (OpenstackAPIResponse) firing: (4) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[12:01:48] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye
[12:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed
[12:01:54] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1063.eqiad.wmnet with OS bullseye
[12:02:01] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS bullseye
[12:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown
[12:03:30] Data-Services, DBA, Data Engineering and Event Platform Team: Prepare and check storage layer for fonwiki - https://phabricator.wikimedia.org/T347938 (Ladsgroup)
[12:03:38] Data-Services, DBA, Data Engineering and Event Platform Team: Prepare and check storage layer for fonwiki - https://phabricator.wikimedia.org/T347938 (Ladsgroup) Ready for data engineering.
[12:04:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[12:05:04] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (rook) >>! In T348184#9226989, @Audiodude wrote: > I assume we need some kind of access to the Github repo too? https://github.com/toolforge/quarry Oh that would be helpful, wouldn't it :) What are yinz github accounts?
[12:09:03] Cloud-VPS, cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [cookbooks] Remove all the duplicated code now that we can use spicerack one - https://phabricator.wikimedia.org/T319438 (fnegri) p:High→Low
[12:09:27] Cloud-VPS, cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [cookbooks] Remove all the duplicated code now that we can use spicerack one - https://phabricator.wikimedia.org/T319438 (fnegri) @Volans noticed another duplication that is probably unnec...
[12:11:58] Cloud-VPS, cloud-services-team, SRE: cloudlb2001-dev and cloudlb2002-dev connected at different speeds - https://phabricator.wikimedia.org/T348173 (LSobanski)
[12:34:32] Toolforge Jobs framework: Show a job status when a job is being deleted - https://phabricator.wikimedia.org/T348242 (taavi)
[12:34:37] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW
[12:47:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar
[13:00:58] Toolforge (Toolforge iteration 00), cloud-services-team (FY2023/2024-Q1), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 3 others: [tbs][builder] Inject nodejs buildpack - https://phabricator.wikimedia.org/T346635 (CodeReviewBot) sstefanova opened https://gitlab.wikimedia.org/re...
[13:17:37] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW
[13:22:06] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye executed with erro...
[13:22:14] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1063.eqiad.wmnet with OS bullseye executed with erro... [13:22:17] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1064.eqiad.wmnet with OS bullseye executed with erro... [13:22:41] 10Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (10SD0001) My github id is `siddharthvp`. Also, how do we login to the instances where quarry runs? Doesn't seem to be documented on wikitech. [13:25:18] (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] Container dashboard: inject polite error page for projects w/out object support [openstack/horizon/horizon] - 10https://gerrit.wikimedia.org/r/963323 (https://phabricator.wikimedia.org/T341509) (owner: 10Andrew Bogott) [13:29:34] (03PS1) 10Andrew Bogott: Container dashboard: inject polite error page for projects w/out object support [openstack/horizon/horizon] (2023.1) - 10https://gerrit.wikimedia.org/r/963728 (https://phabricator.wikimedia.org/T341509) [13:30:03] (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] Container dashboard: inject polite error page for projects w/out object support [openstack/horizon/horizon] (2023.1) - 10https://gerrit.wikimedia.org/r/963728 (https://phabricator.wikimedia.org/T341509) (owner: 10Andrew Bogott) [13:41:58] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye [13:43:55] (03CR) 
10Muehlenhoff: [C: 03+1] "Looks good to me" [labs/striker] - 10https://gerrit.wikimedia.org/r/963671 (https://phabricator.wikimedia.org/T347631) (owner: 10Majavah) [13:44:14] 10Toolforge (Toolforge iteration 00), 10Toolforge Jobs framework: jobs: Add option to disable NFS mounts - https://phabricator.wikimedia.org/T348250 (10taavi) [13:44:27] 10Toolforge (Toolforge iteration 00), 10Toolforge Jobs framework: jobs: Add option to disable NFS mounts - https://phabricator.wikimedia.org/T348250 (10taavi) a:03taavi [13:45:37] (CephClusterInWarning) firing: The ceph cluster in is in warning status, that means that it's high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWar [13:56:45] (SystemdUnitCrashLoop) firing: crashloop on cloudweb1003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop [14:00:45] (SystemdUnitCrashLoop) firing: crashloop on cloudweb1004:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop [14:04:10] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye [14:04:17] 10cloud-services-team (Hardware), 10DC-Ops, 10SRE, 10ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye executed with 
erro... [14:07:09] !log admin dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_rack (exit_code=99) [14:07:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [14:08:42] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye [14:15:42] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (rook) >>! In T348184#9228143, @SD0001 wrote: > My github id is `siddharthvp`. Also, how do we login to the instances where quarry runs? Doesn't seem to be documented on wikitech. I've sent a github invite for 'read' could you ver... [14:18:14] Cloud-Services: regenerate sql credentials for tool wosretbot - https://phabricator.wikimedia.org/T348259 (Janui) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project tag...
[14:19:18] Toolforge: regenerate sql credentials for tool wosretbot - https://phabricator.wikimedia.org/T348259 (Janui) [14:20:15] Tool-bub2: Fix README.md - https://phabricator.wikimedia.org/T344123 (Akanksha.t05) In progress→Resolved [14:20:45] (SystemdUnitCrashLoop) resolved: crashloop on cloudweb1004:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop [14:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [14:21:45] (SystemdUnitCrashLoop) resolved: crashloop on cloudweb1003:9100 - TODO - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitCrashLoop [14:23:03] (PuppetAgentNoResources) firing: No Puppet resources found on instance tools-puppetdb-1 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [14:23:10] Tool-bub2: Fix CONTRIBUTING.MD - https://phabricator.wikimedia.org/T344122 (Akanksha.t05) Open→Resolved [14:36:03] (PuppetAgentNoResources) firing: No Puppet resources found on instance toolsbeta-puppetdb-02 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [14:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [15:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed [15:02:18] (CodesearchBackendDown)
firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown [15:06:38] Cloud-VPS, cloud-services-team, Infrastructure-Foundations, SRE, and 3 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544 (dcaro) [15:10:07] (CephClusterInWarning) resolved: The ceph cluster in is in warning status, that means that its high availability is compromised, things should still be working as expected. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInW [15:16:08] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api [15:16:22] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api [15:28:54] cloud-services-team (Hardware), DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install cloudvirt10[62-67] - https://phabricator.wikimedia.org/T342537 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host cloudvirt1067.eqiad.wmnet with OS bullseye executed with erro... [15:49:18] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api [15:49:31] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api [15:57:09] Cloud-VPS, Infrastructure-Foundations, SRE, netops: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (aborrero) The cloudgw side is now completed. We may want to refresh the neutron side as well: `lang=shell-session aborrero@cloudcontrol100...
[15:59:37] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (Audiodude) I'm `audiodude` on github. Thanks! [16:06:59] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (bking) a:VRiley-WMF [16:11:32] !log admin fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285) [16:11:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:11:38] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 [16:24:09] !log admin fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285) [16:24:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:24:14] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 [16:26:37] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (rook) >>! In T348184#9228819, @Audiodude wrote: > I'm `audiodude` on github. Thanks! Added. Similarly please confirm here that I added the right person and I'll up the permission.
[16:29:31] !log admin fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285) [16:29:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:29:37] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 [16:35:02] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Goal: keystone: segfaults in debian bookworm - https://phabricator.wikimedia.org/T348157 (fnegri) Open→Resolved This was fixed by @Andrew in https://gerrit.wikimedia.org/r/c/operations/puppet/+/963378 [16:35:10] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Goal, Patch-For-Review: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 (fnegri) [16:35:18] Cloud-VPS, cloud-services-team (FY2023/2024-Q1): keystone: segfaults in debian bookworm - https://phabricator.wikimedia.org/T348157 (fnegri) [16:36:22] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (SD0001) Confirming that I got the invite. (And am able to login to the instances now.) Thanks. [16:41:36] !log admin fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285) [16:41:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:41:41] T341285: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 [16:44:40] tool-wscontest: Incorrect stats on landing page - https://phabricator.wikimedia.org/T348210 (PMenon-WMF) a:PMenon-WMF [17:09:17] Cloud-VPS, cloud-services-team (FY2023/2024-Q1), Goal, Patch-For-Review: Upgrade cloud-vps openstack to version 'Antelope' - https://phabricator.wikimedia.org/T341285 (fnegri) The cookbook has been run successfully on the following nodes: * cloudcontrol2001-dev * cloudcontrol2004-dev * cloudcont...
[17:15:18] (CR) Majavah: [C: +2] Relax LDAP objectClass requirements [labs/striker] - https://gerrit.wikimedia.org/r/963671 (https://phabricator.wikimedia.org/T347631) (owner: Majavah) [17:16:56] (Merged) jenkins-bot: Relax LDAP objectClass requirements [labs/striker] - https://gerrit.wikimedia.org/r/963671 (https://phabricator.wikimedia.org/T347631) (owner: Majavah) [17:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [17:21:57] tool-wscontest: The score command throws deprecated warning - https://phabricator.wikimedia.org/T348270 (PMenon-WMF) [17:23:03] (PuppetAgentNoResources) firing: No Puppet resources found on instance tools-puppetdb-1 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [17:29:02] Striker, cloud-services-team, Bitu, Infrastructure-Foundations, Patch-For-Review: Unable to apply for Toolforge access - https://phabricator.wikimedia.org/T347631 (taavi) Open→Resolved a:taavi https://gerrit.wikimedia.org/r/c/labs/striker/+/963671/ seems to have fixed the issue. [17:30:03] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (Audiodude) Confirmed: I got the github invite. I can also access the instances with my wikitech account, thanks!
[17:36:03] (PuppetAgentNoResources) firing: No Puppet resources found on instance toolsbeta-puppetdb-02 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [17:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [17:48:03] (PuppetAgentNoResources) resolved: No Puppet resources found on instance tools-puppetdb-1 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [17:51:03] (PuppetAgentNoResources) resolved: No Puppet resources found on instance toolsbeta-puppetdb-02 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [18:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed [18:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown [18:33:58] superset.wmcloud.org: Verify superset works in minikube - https://phabricator.wikimedia.org/T348273 (rook) [19:17:35] Tool-inteGraality: Add integraality column for sitelinks - https://phabricator.wikimedia.org/T312726 (JeanFred) Syntax-wise, I checked for inspiration the JSON rendering ([[ https://www.wikidata.org/wiki/Special:EntityData/Q40.json | example ]]) − the sitelinks are keyed as “frwiki” or “bnwikivoyage”. That l...
[20:09:45] (ProbeDown) firing: Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [20:14:45] (ProbeDown) resolved: Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [20:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [20:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [21:01:48] (CodesearchConfigWriteFailed) firing: codesearch-write-config.service failed on codesearch8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchConfigWriteFailed [21:02:18] (CodesearchBackendDown) firing: (2) Codesearch backend design is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DCodesearchBackendDown [21:26:58] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (rook) Awesome, Ok I think everyone is all connected. Let me know if I missed anything, feel free to poke me with any questions.
[21:27:06] Quarry: Add maintainers to quarry - https://phabricator.wikimedia.org/T348184 (rook) Open→Resolved [22:37:27] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1001 for host cloudelastic1007.... [22:59:39] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1001 for host cloudelastic1007.eqia... [23:00:20] cloud-services-team (Hardware), DC-Ops, Data-Platform-SRE, SRE, ops-eqiad: Q1:rack/setup/install cloudelastic10[07-10].wikimedia.org - https://phabricator.wikimedia.org/T342538 (Papaul) @bking I tried to do the re-images on cloudelastic1007, the re-image finished with the OS install without a... [23:21:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [23:43:04] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 17 deleted instances on integration-puppetmaster-02 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates