[00:10:49] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:10:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:15:49] (TfInfraTestDestroyFailed) resolved: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:18:44] Toolforge: [envvars] Enable use of `toolforge envvar` managed data on bastions - https://phabricator.wikimedia.org/T358537#9578688 (bd808)
[00:43:30] Toolforge (Quota-requests), Wikibugs: Request increased quota for wikibugs Toolforge tool - https://phabricator.wikimedia.org/T358538#9578709 (bd808)
[00:46:31] Toolforge (Quota-requests), Wikibugs: Request increased quota for wikibugs Toolforge tool - https://phabricator.wikimedia.org/T306322#9578728 (bd808)
[00:51:55] Toolforge (Quota-requests), Wikibugs, Patch-For-Review: Request increased quota for wikibugs Toolforge tool - https://phabricator.wikimedia.org/T358538#9578731 (CodeReviewBot) bd808 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/210 maintain-kubeusers: bu...
[01:05:12] Wikibugs, User-bd808: wikibugs having a hard time staying connected to libera.chat IRC network - https://phabricator.wikimedia.org/T357729#9578750 (bd808)
[01:05:16] Toolforge (Quota-requests), Wikibugs, Patch-For-Review: Request increased quota for wikibugs Toolforge tool - https://phabricator.wikimedia.org/T358538#9578751 (bd808)
[01:39:15] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9578936 (Varnent)
[01:46:56] (CloudVPSDesignateLeaks) firing: (2) Detected 20 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:10:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[04:08:59] (PS4) BryanDavis: wikibugs: get tag icon and color via Conduit API [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1004326 (https://phabricator.wikimedia.org/T1175)
[05:51:41] (CloudVPSDesignateLeaks) firing: (2) Detected 20 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:10:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[07:12:25] tool-wdlocator, translatewiki.net, Language-Team (Language-2024-January-March), Localization Infrastructure FY2023-24, and 2 others: Add wdlocator to translatewiki.net - https://phabricator.wikimedia.org/T357495#9579184 (Wangombe)
[07:57:59] Toolforge Build Service: Rust image build on toolforge fails - https://phabricator.wikimedia.org/T358552#9579281 (Magnus)
[08:27:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:32:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:36:38] cloud-services-team, WMDE-Analytics-Engineering, Wikibase - Federated Properties, Wikidata, Cloud-Services-Origin-Alert: Wikidata-related Cloud VPS alerts about puppet - https://phabricator.wikimedia.org/T354268#9579353 (taavi) `wikidata-analytics-1.wmdeanalytics.eqiad1.wikimedia.cloud` still...
[08:38:31] (ToolsNfsAlmostFull) firing: Toolforge NFS is 0.8659424977930517/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[08:38:59] Cloud-VPS, cloud-services-team (FY2023/2024-Q3-Q4), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9579360 (taavi)
[08:39:24] cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T352570#9579358 (taavi) Open→Resolved a:taavi
[08:42:19] Tools: wiki-osm.pl: Use of uninitialized value within @kml in lc at /data/project/osm4wiki/public_html/cgi-bin/wiki/wiki-osm.pl line 166. - https://phabricator.wikimedia.org/T357899#9579361 (taavi) Resolved→Open `lang=shell-session root@tools-nfs-2:~# ls -lah /srv/tools/project/osm4wiki/error.log -r...
[08:42:23] Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9579363 (taavi)
[08:42:32] Toolforge, cloud-services-team: 2024-02-27: toolforge NFS cleanup - https://phabricator.wikimedia.org/T358554#9579364 (taavi)
[08:42:43] Toolforge, cloud-services-team: 2024-02-27: toolforge NFS cleanup - https://phabricator.wikimedia.org/T358554#9579376 (taavi)
[08:42:46] Tools: wiki-osm.pl: Use of uninitialized value within @kml in lc at /data/project/osm4wiki/public_html/cgi-bin/wiki/wiki-osm.pl line 166. - https://phabricator.wikimedia.org/T357899#9579377 (taavi)
[08:43:49] Tools: eatchabot using a lot of NFS storage - https://phabricator.wikimedia.org/T284968#9579383 (taavi)
[08:43:51] Toolforge, cloud-services-team: 2024-02-27: toolforge NFS cleanup - https://phabricator.wikimedia.org/T358554#9579364 (taavi)
[08:44:08] Tools: 'hoiscript' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349913#9579384 (taavi)
[08:44:10] Toolforge, cloud-services-team: 2024-02-27: toolforge NFS cleanup - https://phabricator.wikimedia.org/T358554#9579364 (taavi)
[08:45:00] Tools: eatchabot using a lot of NFS storage - https://phabricator.wikimedia.org/T284968#7156929 (taavi) `lang=shell-session root@tools-nfs-2:~# du -sh /srv/tools/project/eatchabot/ 67G /srv/tools/project/eatchabot/ `
[08:49:56] Toolforge, cloud-services-team: cewbot k8s-20230418.fix-redirected-wikilinks-of-templates.out is unreasonably large - https://phabricator.wikimedia.org/T358555#9579392 (taavi)
[08:53:31] (ToolsNfsAlmostFull) resolved: Toolforge NFS is 0.8565375062842955/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[08:56:47] Toolforge (Quota-requests), Wikibugs, Patch-For-Review: Request increased quota for wikibugs Toolforge tool - https://phabricator.wikimedia.org/T358538#9579421 (dcaro) +1
[09:10:50] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[09:28:00] Toolforge Build Service: Rust image build on toolforge fails - https://phabricator.wikimedia.org/T358552#9579573 (dcaro) The compilation is for the build step (it compiled the rust code), it failed during export so it never sent the built image to the registry. The current 'latest' is the previous successful...
[09:29:08] Toolforge (Toolforge iteration 06): Rust image build on toolforge fails - https://phabricator.wikimedia.org/T358552#9579576 (dcaro) p:Triage→High
[09:29:37] Toolforge (Toolforge iteration 06): Rust image build on toolforge fails - https://phabricator.wikimedia.org/T358552#9579281 (dcaro)
[09:31:40] Toolforge (Toolforge iteration 06): Rust image build on toolforge fails - https://phabricator.wikimedia.org/T358552#9579596 (taavi) This is {T354116} most likely? `lang=shell-session taavi@proxy-03:~$ sudo grep tools-harbor /var/log/nginx/error.log 2024/02/27 07:52:25 [crit] 2458144#2458144: *44076682 pwrite...
[09:32:25] Toolforge (Toolforge iteration 06), cloud-services-team: Harbor uploads sometimes fail due to tmpfs space on project-proxy - https://phabricator.wikimedia.org/T354116#9579599 (dcaro)
[09:34:22] Toolforge (Toolforge iteration 06), cloud-services-team: Harbor uploads sometimes fail due to tmpfs space on project-proxy - https://phabricator.wikimedia.org/T354116#9579602 (taavi) https://gerrit.wikimedia.org/r/c/operations/puppet/+/998659 should have fixed this, but based on T358552 it did not?
[09:51:41] (CloudVPSDesignateLeaks) firing: (2) Detected 20 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:02:17] Tools, cloud-services-team: cewbot k8s-20230418.fix-redirected-wikilinks-of-templates.out is unreasonably large - https://phabricator.wikimedia.org/T358555#9579670 (taavi)
[10:24:15] cloud-services-team, WMDE-Analytics-Engineering, Wikibase - Federated Properties, Wikidata, Cloud-Services-Origin-Alert: Wikidata-related Cloud VPS alerts about puppet - https://phabricator.wikimedia.org/T354268#9579732 (AndrewTavis_WMDE) Hey @taavi 👋 One thing to note is that a decision was...
[10:26:41] (CloudVPSDesignateLeaks) firing: (2) Detected 20 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:36:47] Toolforge (Toolforge iteration 06), cloud-services-team: Harbor uploads sometimes fail due to tmpfs space on project-proxy - https://phabricator.wikimedia.org/T354116#9579766 (dcaro) Looks like it :/, will have to experiment a bit more.
[10:41:30] (PS1) Ketulucas: Bug:T358396. Typo in Top contributor campaign page [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1006865
[10:43:38] Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9579796 (DB111)
[10:43:40] Toolforge, cloud-services-team: 2024-02-27: toolforge NFS cleanup - https://phabricator.wikimedia.org/T358554#9579793 (DB111)
[10:44:37] Tools: wiki-osm.pl: Use of uninitialized value within @kml in lc at /data/project/osm4wiki/public_html/cgi-bin/wiki/wiki-osm.pl line 166. - https://phabricator.wikimedia.org/T357899#9579791 (DB111) Open→Resolved Thank you, so I could have a look on the first lines in the error log. The problem was an...
[10:51:41] (CloudVPSDesignateLeaks) resolved: Detected 10 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:10:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:15:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:17:54] Cloud-VPS, cloud-services-team, User-aborrero: Some VPS instances still using ns-recursor0 - https://phabricator.wikimedia.org/T346426#9579893 (cmooney) >>! In T346426#9529977, @Andrew wrote: > - 5 have 127.0.0.53 in resolv.conf Those are probably running systemd-resolved, which creates a caching re...
[11:20:57] Cloud-VPS, cloud-services-team, User-aborrero: Some VPS instances still using ns-recursor0 - https://phabricator.wikimedia.org/T346426#9579914 (cmooney) >>! In T346426#9529977, @Andrew wrote: > The 13 with the proper reesolv.conf presumably have nameservers cached or configured someplace else. I'm no...
[11:29:56] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
[11:32:50] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 26920 bytes in 1.084 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[12:13:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:15:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[12:18:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:22:34] (CR) FNegri: "Thank, this looks really promising! I'm still unsure if using the hiera key is the best way, or if we should ask mariadb directly." [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992932 (owner: David Caro)
[12:35:00] Toolforge (Toolforge iteration 06), cloud-services-team: Harbor uploads sometimes fail due to tmpfs space on project-proxy - https://phabricator.wikimedia.org/T354116#9580068 (dcaro) There's documentation about the buffering being disabled if the client uses HTTP 1.1 with chunked transfer: """ When HTTP...
[13:53:48] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[14:21:04] Toolforge (Toolforge iteration 06), cloud-services-team: Harbor uploads sometimes fail due to tmpfs space on project-proxy - https://phabricator.wikimedia.org/T354116#9580445 (Raymond_Ndibe) >>! In T354116#9580068, @dcaro wrote: > There's documentation about the buffering being disabled if the client use...
[15:02:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:07:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:15:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[15:23:57] (SystemdUnitDown) firing: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:28:56] (SystemdUnitDown) resolved: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:23:49] Toolforge (Toolforge iteration 06), Patch-For-Review: Support probes in kubernetes webservices - https://phabricator.wikimedia.org/T341919#9580919 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/24 k8s: allow passing the http probe path
[16:27:31] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), User-dcaro: [wmcs-cookbooks] tox is failing - https://phabricator.wikimedia.org/T348726#9580930 (fnegri)
[16:27:59] cloud-services-team (FY2023/2024-Q3-Q4), Infrastructure-Foundations, SRE-tools, Spicerack, Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337#9580929 (fnegri) Open→Stalled
[17:39:37] PAWS: Update labpawspublic extension to jupyterlab 4 system - https://phabricator.wikimedia.org/T358604#9581217 (rook)
[17:53:48] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[18:04:22] (HAProxyBackendUnavailable) firing: (2) HAProxy service designate-api_backend backend cloudservices1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[18:05:22] (HAProxyServiceUnavailable) firing: (2) HAProxy service designate-api_backend has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[18:05:27] cloud-services-team: HAProxyServiceUnavailable - https://phabricator.wikimedia.org/T358607#9581307 (phaultfinder)
[18:08:56] (SystemdUnitDown) firing: The service unit designate-api.service is in failed status on host cloudservices1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudservices1005 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:09:22] (HAProxyBackendUnavailable) resolved: (2) HAProxy service designate-api_backend backend cloudservices1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[18:10:22] (HAProxyServiceUnavailable) resolved: (2) HAProxy service designate-api_backend has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[18:13:56] (SystemdUnitDown) firing: (6) The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:15:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[18:26:40] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[18:28:48] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[18:45:26] (SystemdUnitDown) resolved: (5) The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1005. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:46:28] Cloud-VPS, cloud-services-team, User-aborrero: [openstack] cloudservices + Designate are using different source addresses for local vs. remote updates - https://phabricator.wikimedia.org/T350995#9581395 (Andrew) Open→Resolved a:Andrew I've moved Designate to cloudcontrol nodes, with pdns...
[18:50:40] (CR) BryanDavis: [C: +2] wikibugs: Get anchors from API instead of screen scraping [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1004317 (https://phabricator.wikimedia.org/T1177) (owner: BryanDavis)
[18:50:43] (CR) BryanDavis: [C: +2] wikibugs: get tag icon and color via Conduit API [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1004326 (https://phabricator.wikimedia.org/T1175) (owner: BryanDavis)
[18:53:19] (CR) David Caro: "We had a chat and decided to:" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992932 (owner: David Caro)
[19:03:00] (Merged) jenkins-bot: wikibugs: Get anchors from API instead of screen scraping [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1004317 (https://phabricator.wikimedia.org/T1177) (owner: BryanDavis)
[19:03:02] (Merged) jenkins-bot: wikibugs: get tag icon and color via Conduit API [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1004326 (https://phabricator.wikimedia.org/T1175) (owner: BryanDavis)
[19:50:15] Striker: ACCOUNT_SSH.html links to obsolete help page - https://phabricator.wikimedia.org/T358615 (Ahecht)
[20:05:16] (PS1) Dzahn: delete passwords::etherpad [labs/private] - https://gerrit.wikimedia.org/r/1006988
[20:10:18] (PS2) Dzahn: delete passwords::etherpad [labs/private] - https://gerrit.wikimedia.org/r/1006988
[20:13:41] (CloudVPSDesignateLeaks) firing: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:18:41] (CloudVPSDesignateLeaks) firing: (4) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:43:41] (CloudVPSDesignateLeaks) firing: (5) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:45:27] Cloud-VPS, User-aborrero: Cloud VPS Designate setup improvements - https://phabricator.wikimedia.org/T340446#9581733 (Andrew)
[20:46:49] Wikibugs, User-bd808: wikibugs test bug part II - https://phabricator.wikimedia.org/T90594#9581748 (bd808) test
[20:53:18] Wikibugs, User-bd808: Get icon and color from API instead of screen scraping - https://phabricator.wikimedia.org/T1176#9581757 (bd808) In progress→Resolved I'm seeing lots of colors in `#wikimedia-dev` now. `lang=irc [19:57] < taavi> ooh, another wikibugs "bugfix" for something that's been b...
[20:53:20] Wikibugs: All phabricator tags emitted with blue color - https://phabricator.wikimedia.org/T357828#9581759 (bd808)
[20:53:22] Wikibugs: Some projects get lost - https://phabricator.wikimedia.org/T90267#9581761 (bd808)
[20:53:24] Wikibugs: Tasks with reduced visibility (logged-in-only) are reported incorrectly - https://phabricator.wikimedia.org/T90488#9581760 (bd808)
[20:53:26] Wikibugs: Match on usage of Additional Hashtags, so that project renames don't break the bot - https://phabricator.wikimedia.org/T87825#9581762 (bd808)
[20:53:28] Wikibugs, Goal, User-bd808: Get rid of screen scraping in Wikibugs - https://phabricator.wikimedia.org/T1175#9581763 (bd808)
[20:55:14] Wikibugs, User-bd808: Get anchors from API instead of screen scraping - https://phabricator.wikimedia.org/T1177#9581766 (bd808) In progress→Resolved This is implemented and deployed. I'm not sure that we won't walk the change back because of the lack of a clear algorithm for selecting which trans...
[20:55:15] Wikibugs, Goal, User-bd808: Get rid of screen scraping in Wikibugs - https://phabricator.wikimedia.org/T1175#9581768 (bd808)
[20:55:55] Wikibugs, Goal, User-bd808: Get rid of screen scraping in Wikibugs - https://phabricator.wikimedia.org/T1175#9581772 (bd808) In progress→Resolved We may end up bringing HTML parsing back for anchor selection, but for now we have achieved this long imagined goal.
[21:07:48] PAWS: remove jupyter-dash - https://phabricator.wikimedia.org/T358621 (rook)
[21:15:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[21:29:55] Wikibugs, User-bd808: Move wikibugs git hosting from Gerrit to GitLab - https://phabricator.wikimedia.org/T357850#9581839 (bd808) a:bd808
[21:39:22] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9581845 (Krinkle) I've tried a number of variations and simpler queries, but it seems nothing is getting through, e.g.
[21:53:48] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[22:45:25] Wikibugs, User-bd808: All phabricator tags emitted with blue color - https://phabricator.wikimedia.org/T357828#9581984 (bd808) Open→Resolved a:bd808 Fixed by {T1176} {F42191735, size=full}
[22:49:30] Wikibugs, User-bd808: Move wikibugs git hosting from Gerrit to GitLab - https://phabricator.wikimedia.org/T357850#9581994 (CodeReviewBot) bd808 merged https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/2 Setup GitLab CI and expand tests that run automatically
[23:17:05] (PS1) Dzahn: delete passwords::racktables [labs/private] - https://gerrit.wikimedia.org/r/1007008 (https://phabricator.wikimedia.org/T327405)
[23:19:45] (PS1) Dzahn: delete passwords::tendril and passwords::bugzilla [labs/private] - https://gerrit.wikimedia.org/r/1007009
[23:22:58] (PS1) Dzahn: delete passwords::mysql::wikimania_scholarships and passwords::tor [labs/private] - https://gerrit.wikimedia.org/r/1007010
[23:24:26] (PS1) Dzahn: delete grafana password classes [labs/private] - https://gerrit.wikimedia.org/r/1007011
[23:29:09] Toolforge iteration 06: Rust image build on toolforge fails - https://phabricator.wikimedia.org/T358552#9582048 (Raymond_Ndibe) I think I need help reproducing this. So far multiple runs on tools and toolsbeta failed to trigger this error. Is it likely that this is a combination of more than 1 factor, maybe...
[23:35:44] Wikibugs, User-bd808: Move wikibugs git hosting from Gerrit to GitLab - https://phabricator.wikimedia.org/T357850#9582070 (bd808) {rTWBT} now mirrors https://gitlab.wikimedia.org/toolforge-repos/wikibugs2
[23:36:36] Wikibugs, User-bd808: Move wikibugs git hosting from Gerrit to GitLab - https://phabricator.wikimedia.org/T357850#9582078 (bd808)
[23:43:01] Wikibugs, Patch-For-Review, User-bd808: Move wikibugs git hosting from Gerrit to GitLab - https://phabricator.wikimedia.org/T357850#9582081 (CodeReviewBot) bd808 opened https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/3 .gitreview: convert for gerritlab usage
[23:46:00] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[23:50:55] Wikibugs, Patch-For-Review, User-bd808: Move wikibugs git hosting from Gerrit to GitLab - https://phabricator.wikimedia.org/T357850#9582100 (CodeReviewBot) bd808 merged https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/3 .gitreview: convert for gerritlab usage
[23:56:25] Wikibugs, Projects-Cleanup: Archive labs/tools/wikibugs2 Gerrit repository - https://phabricator.wikimedia.org/T358630 (bd808)
[23:56:46] Wikibugs, Projects-Cleanup: Archive labs/tools/wikibugs2 Gerrit repository - https://phabricator.wikimedia.org/T358630#9582128 (bd808)
[23:56:49] Wikibugs, Patch-For-Review, User-bd808: Move wikibugs git hosting from Gerrit to GitLab - https://phabricator.wikimedia.org/T357850#9582129 (bd808)