[00:04:56] (MaxConntrack) firing: Max conntrack at 81.14% on cloudvirt1042:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:07:37] (Abandoned) BryanDavis: Update channel config for #mediawiki-parsoid [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/987992 (owner: Reedy)
[00:39:50] !log bd808@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[00:39:59] !log bd808@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[00:43:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:46:41] !log bd808@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[00:46:52] !log bd808@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[00:50:25] (MaxConntrack) resolved: Max conntrack at 81.56% on cloudvirt1042:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:55:44] Wikibugs, User-bd808: wikibugs having a hard time staying connected to libera.chat IRC network - https://phabricator.wikimedia.org/T357729#9582248 (bd808)
[00:56:16] Wikibugs, Quota-requests, User-bd808: Request increased quota for wikibugs Toolforge tool - https://phabricator.wikimedia.org/T358538#9582245 (bd808) Open→Resolved a:bd808 Followed https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Deploy_new_version to deploy the chang...
[01:11:56] Wikibugs, Projects-Cleanup, Patch-For-Review: Archive labs/tools/wikibugs2 Gerrit repository - https://phabricator.wikimedia.org/T358630#9582272 (bd808)
[01:11:58] Wikibugs, User-bd808: Move wikibugs git hosting from Gerrit to GitLab - https://phabricator.wikimedia.org/T357850#9582273 (bd808)
[01:12:09] Wikibugs, User-bd808: Move wikibugs git hosting from Gerrit to GitLab - https://phabricator.wikimedia.org/T357850#9582275 (bd808)
[01:12:12] Wikibugs, Projects-Cleanup, Patch-For-Review: Archive labs/tools/wikibugs2 Gerrit repository - https://phabricator.wikimedia.org/T358630#9582274 (bd808)
[01:17:59] Wikibugs, User-bd808: Move wikibugs git hosting from Gerrit to GitLab - https://phabricator.wikimedia.org/T357850#9582279 (bd808) https://www.mediawiki.org/w/index.php?title=Wikibugs&diff=6388946&oldid=6388439
[01:18:10] Cloud-VPS, serviceops: OOM livelock stalls - https://phabricator.wikimedia.org/T358634 (tstarling)
[01:21:05] Wikibugs, User-bd808: Move wikibugs git hosting from Gerrit to GitLab - https://phabricator.wikimedia.org/T357850#9582296 (bd808) In progress→Resolved
[01:21:09] Wikibugs, Projects-Cleanup, Patch-For-Review: Archive labs/tools/wikibugs2 Gerrit repository - https://phabricator.wikimedia.org/T358630#9582297 (bd808)
[01:21:11] Wikibugs: bd808's big pile of refactoring ideas - https://phabricator.wikimedia.org/T357851#9582298 (bd808)
[01:21:15] Wikibugs, User-bd808: Move wikibugs git hosting from Gerrit to GitLab - https://phabricator.wikimedia.org/T357850#9582294 (bd808) {T358630} will track the remaining Gerrit/Zuul/Jenkins cleanup steps.
[01:53:48] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[03:46:00] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:23:41] (CloudVPSDesignateLeaks) firing: (5) Detected 8 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:28:41] (CloudVPSDesignateLeaks) resolved: (5) Detected 8 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:53:48] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[06:26:00] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[06:45:21] tool-wdlocator, translatewiki.net, Language-2024-January-March, Localization Infrastructure FY2023-24, and 2 others: Add wdlocator to translatewiki.net - https://phabricator.wikimedia.org/T357495#9582430 (Wangombe) Does https://gitlab.wikimedia.org/l10n-bot have commit access to this project? If...
[06:50:29] tool-wdlocator, translatewiki.net, Language-2024-January-March, Localization Infrastructure FY2023-24, and 2 others: Add wdlocator to translatewiki.net - https://phabricator.wikimedia.org/T357495#9582435 (Samwilson) @Wangombe not yet, but I've invited that user to be a developer; can you accept t...
[06:56:00] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[07:05:41] Cloud-VPS, serviceops: OOM livelock stalls - https://phabricator.wikimedia.org/T358634#9582468 (Joe) Focusing on the swap part of the problem, for posterity: I think it's a valid point for backend/async processing systems or systems that have a lot of noisy neighbours and are not latency-critical. I do...
[07:05:47] Cloud-VPS, serviceops: OOM livelock stalls - https://phabricator.wikimedia.org/T358634#9582471 (Joe) I also want to note that on kubernetes memory is mostly managed by the k8s scheduler on top of the kernel one, so that we never have overflowing use of memory and we OOM containers (which are nothing more...
[07:06:00] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[07:06:58] tool-wdlocator, translatewiki.net, Language-2024-January-March, Localization Infrastructure FY2023-24, and 2 others: Add wdlocator to translatewiki.net - https://phabricator.wikimedia.org/T357495#9582470 (abi_) >>! In T357495#9582435, @Samwilson wrote: > @Wangombe not yet, but I've invited that u...
[07:11:28] tool-wdlocator, translatewiki.net, Language-2024-January-March, Localization Infrastructure FY2023-24, and 2 others: Add wdlocator to translatewiki.net - https://phabricator.wikimedia.org/T357495#9582479 (Samwilson) Yes, 'Developers + Maintainers' groups are 'Allowed to merge' and 'Allowed to pus...
[07:40:41] (CloudVPSDesignateLeaks) firing: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:45:41] (CloudVPSDesignateLeaks) firing: (5) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:00:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:05:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:07:45] tool-wdlocator: tomba Kanssa - https://phabricator.wikimedia.org/T358400#9582685 (Samwilson) @Berete5212 I'm not sure what issue you're trying to report here, sorry. Could you elaborate?
[09:09:39] Striker, Toolforge iteration 06: ACCOUNT_SSH.html links to obsolete help page - https://phabricator.wikimedia.org/T358615#9582686 (taavi) a:taavi
[09:34:28] tool-wdlocator: tomba Kanssa - https://phabricator.wikimedia.org/T358400#9582728 (Berete5212) hello i wanted this wikipedia article be displayed https://wiwosm.toolforge.org/osm-on-ol/kml-on-ol.php?la=en&uselang=en&lon=-10.183333&lat=11.716667&rang=50&map=1 be displayed on the Wikimedia map like this https:/...
[09:53:49] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:11:06] Cloud-VPS, cloud-services-team: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138#9582807 (taavi) Open→Resolved And the disk space usage still looks okay: `lang=shell-session ubuntu@dbapp:~$ df -h /var/lib/postgresql Filesystem Size Used Av...
[10:11:11] VPS-Projects: GLAMWiki Dashboard not loading - https://phabricator.wikimedia.org/T355082#9582809 (taavi)
[10:12:26] tool-wdlocator: tomba Kanssa - https://phabricator.wikimedia.org/T358400#9582825 (Samwilson) You might be looking to report against the wiwosm tool; it looks like their bug tracker is at https://github.com/wikiosm/osm-on-ol/issues If it's wdlocator you're interested in, I think the area you're looking at is...
[10:13:11] Cloud-Services, PAWS: Add wikibase-cli to paws - https://phabricator.wikimedia.org/T358649 (hubaishan) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project tag to this...
[10:14:41] PAWS: Add wikibase-cli to paws - https://phabricator.wikimedia.org/T358649#9582841 (hubaishan)
[10:32:14] Cloud-VPS, cloud-services-team, Patch-For-Review, User-aborrero: Improve cloudgw filter between VM instances and cloud-private - https://phabricator.wikimedia.org/T356986#9582874 (cmooney) Had a first stab at this in the patch above. @aborrero @taavi it will need some input from you guys regardi...
[10:33:41] Cloud-VPS, cloud-services-team: sssd permanent failure on integration-agent-docker-1029 - https://phabricator.wikimedia.org/T324934#9582879 (hashar) For `sssd` I have found another issue via T349681#9278821 which is that the service unit has a restart counter limit and once reached it is never restarted...
[10:35:22] Cloud-VPS, serviceops: OOM livelock stalls - https://phabricator.wikimedia.org/T358634#9582894 (dcaro) @tstarling thanks for the task! :), I was sitting a couple of desks away from Cris in London when he wrote that post xd, it circulated widely among production engineering To clarify, this task is to re...
[10:41:58] Wikibugs, User-bd808: Get icon and color from API instead of screen scraping - https://phabricator.wikimedia.org/T1176#9582909 (Peachey88) Looks like the API is only grabbing the milestone name, instead of the full project name. Two examples below, their full names would be `Wikibase Suite Team (Sprint-∞...
[10:52:15] Toolforge iteration 04, Patch-For-Review: [webservice] php 7.4 containers don't pass through the environment variables to the scripts - https://phabricator.wikimedia.org/T354320#9582947 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/25 d...
[10:53:00] Toolforge iteration 06, Patch-For-Review: Support probes in kubernetes webservices - https://phabricator.wikimedia.org/T341919#9582945 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/25 d/changelog: bump to 0.103.2
[11:06:00] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[11:18:00] Cloud-VPS, DC-Ops, SRE, ops-eqiad, FY2023/2024-Q3-Q4: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9582996 (dcaro)
[11:21:27] Wikibugs: wikibugs only shows milestone name without parent project name - https://phabricator.wikimedia.org/T358653 (taavi)
[11:30:26] Cloud-VPS, Tool-spacemedia, cloud-services-team, video2commons, Upstream: Cloud Services shared IP (static NAT for external communications) often rate limited by YouTube for video downloads - https://phabricator.wikimedia.org/T236446#9583045 (Yann) It seems transfer from YT works again, i.e....
[11:45:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:03:57] Toolforge iteration 06, Patch-For-Review: Support probes in kubernetes webservices - https://phabricator.wikimedia.org/T341919#9583119 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/25 d/changelog: bump to 0.103.2
[12:04:04] Toolforge iteration 04, Patch-For-Review: [webservice] php 7.4 containers don't pass through the environment variables to the scripts - https://phabricator.wikimedia.org/T354320#9583120 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/25 d...
[12:09:19] Toolforge, Epic, User-Raymond_Ndibe: Run webservices via the jobs framework - https://phabricator.wikimedia.org/T348755#9583144 (taavi)
[12:09:21] Toolforge Jobs framework: Toolforge jobs: consider having a way for jobs to report their liveness status to kubernetes - https://phabricator.wikimedia.org/T335592#9583145 (taavi)
[12:09:42] Toolforge Jobs framework: Support job health checks - https://phabricator.wikimedia.org/T335592#9583150 (taavi)
[12:12:23] Toolforge iteration 06, Patch-For-Review: Support probes in kubernetes webservices - https://phabricator.wikimedia.org/T341919#9583155 (dcaro) This is available for use now, I'll leave the task open to do a bit of following for the next few days monitoring https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforg...
[12:12:51] Grid-Engine-to-K8s-Migration, User-dcaro: Migrate kmlexport from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T356905#9583171 (dcaro) @Dvorapa This is unblocked now by allowing to specify an http probe from the webservice cli (`toolforge webservice start --health-check...
[12:20:22] (PS1) Jaime Nuche: jenkins: add security patch bot token to releases instance secrets [labs/private] - https://gerrit.wikimedia.org/r/1007319 (https://phabricator.wikimedia.org/T350065)
[12:48:43] tool-wdlocator, translatewiki.net, Language-2024-January-March, Localization Infrastructure FY2023-24, Unplanned-Sprint-Work: Add wdlocator to translatewiki.net - https://phabricator.wikimedia.org/T357495#9583370 (Wangombe)
[12:51:04] tool-wdlocator, translatewiki.net, Language-2024-January-March, Localization Infrastructure FY2023-24, Unplanned-Sprint-Work: Add wdlocator to translatewiki.net - https://phabricator.wikimedia.org/T357495#9583378 (Wangombe)
[13:47:22] Toolforge, cloud-services-team: Toolforge: systemd monitoring - https://phabricator.wikimedia.org/T215155#9583542 (MoritzMuehlenhoff)
[13:47:55] Cloud-VPS, cloud-services-team, Infrastructure-Foundations, Patch-For-Review, Puppet: wmf_auto_restart_cron.service failing in Cloud VPS bookworm instances - https://phabricator.wikimedia.org/T358343#9583540 (MoritzMuehlenhoff) Open→Resolved I added a new Hiera option for this: profi...
[13:53:14] (PS1) Jelto: passwords: update etherpad labs [labs/private] - https://gerrit.wikimedia.org/r/1007331 (https://phabricator.wikimedia.org/T316421)
[13:53:28] (PS2) Jelto: passwords: update etherpad labs [labs/private] - https://gerrit.wikimedia.org/r/1007331 (https://phabricator.wikimedia.org/T316421)
[13:53:49] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[14:33:20] tool-wdlocator: tomba Kanssa - https://phabricator.wikimedia.org/T358400#9583783 (Berete5212) @Samwilson So we can't do it ourselves to make sure that the article https://en.wikipedia.org/wiki/Tomba_Kanssa is displayed on the map like this https://wiwosm.toolforge.org/osm-on-ol/kml-on-ol.php?la=en&uselang=en&...
[14:41:22] Cloud-VPS, cloud-services-team: cloud-vps dynamic proxy returns 500 - https://phabricator.wikimedia.org/T358672 (Andrew)
[15:06:00] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[15:14:45] Cloud-Services: Deploy a Hasura endpoint for Wikimedia wiki databases - https://phabricator.wikimedia.org/T358677 (Lectrician1) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specifi...
[15:16:27] Cloud Services Proposals: Deploy a Hasura endpoint for Wikimedia wiki databases - https://phabricator.wikimedia.org/T358677#9583987 (Lectrician1)
[15:28:38] Cloud Services Proposals: Deploy a Hasura endpoint for Wikimedia wiki databases - https://phabricator.wikimedia.org/T358677#9584121 (taavi) Open→Declined https://hasura.io/docs/latest/databases/mariadb/index/ says support for MariaDB is not present on the free software versions which makes this a non...
[15:37:34] (CR) Jelto: [V: +2 C: +2] passwords: update etherpad labs [labs/private] - https://gerrit.wikimedia.org/r/1007331 (https://phabricator.wikimedia.org/T316421) (owner: Jelto)
[15:40:59] Cloud Services Proposals: Deploy a Hasura endpoint for Wikimedia wiki databases - https://phabricator.wikimedia.org/T358677#9584205 (Lectrician1) 😢
[15:54:28] Cloud-VPS, cloud-services-team, Patch-For-Review: cloud-vps dynamic proxy returns 500 - https://phabricator.wikimedia.org/T358672#9584246 (taavi) Open→Resolved I'm optimistically closing this, but please re-open if this comes back.
[16:06:47] (PS2) Dzahn: delete passwords::racktables [labs/private] - https://gerrit.wikimedia.org/r/1007008 (https://phabricator.wikimedia.org/T327405)
[16:07:41] Toolforge iteration 06: Support probes in kubernetes webservices - https://phabricator.wikimedia.org/T341919#9584300 (bd808) @dcaro, this is awesome and I think also completely worthy of a message to cloud-announce. :)
[16:11:50] Toolforge: Expose Toolforge service names via environment variables - https://phabricator.wikimedia.org/T151002#9584308 (dcaro) > I worry that if we try to expose some set of known services with globally provisioned envvars this will actually only be of use to a small number of tools that were explicitly wri...
[16:49:34] Tools, Tech-Docs-Team, Documentation, Wikimedia-Hackathon-2024: [Hackathon 2024] Improve technical documentation of tools - https://phabricator.wikimedia.org/T358040#9584481 (TBurmeister)
[16:49:49] Tools, Tech-Docs-Team, Documentation, Wikimedia-Hackathon-2024: [Hackathon 2024] Improve technical documentation of tools - https://phabricator.wikimedia.org/T358040#9584483 (TBurmeister) Project page is now live at https://www.mediawiki.org/wiki/Doc_Your_Tool:_Creating_user-friendly_documentation
[17:07:38] (CR) Dzahn: [V: +2 C: +2] delete passwords::racktables [labs/private] - https://gerrit.wikimedia.org/r/1007008 (https://phabricator.wikimedia.org/T327405) (owner: Dzahn)
[17:17:26] Cloud-VPS, Data-Services, FY2023/2024-Q3-Q4, Patch-For-Review: [toolsdb] [cinder] [ceph] Deleting snapshot does not work - https://phabricator.wikimedia.org/T356904#9584635 (fnegri) There are potentially other edge cases or race conditions. I created a new test volume `volume-8f14e78f-8c95-4bf4-b...
[17:53:49] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[18:02:36] (PS2) Dzahn: delete grafana password classes [labs/private] - https://gerrit.wikimedia.org/r/1007011
[18:05:35] (CR) Dzahn: [C: +1] "still exist in private repo with real passwords and a comment "Deprecated 2017-01-18"" [labs/private] - https://gerrit.wikimedia.org/r/1007011 (owner: Dzahn)
[18:12:48] (PuppetFailure) firing: Puppet has failed on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:12:53] cloud-services-team: PuppetFailure Puppet failure on cloudcumin1001:9100 - https://phabricator.wikimedia.org/T358702 (phaultfinder)
[18:17:07] (CR) Dzahn: "thanks, yep. btw I also deleted the "etherpad" (not etherpad_lite) passwords class in the actually private repo" [labs/private] - https://gerrit.wikimedia.org/r/1007331 (https://phabricator.wikimedia.org/T316421) (owner: Jelto)
[18:27:48] (PuppetFailure) firing: (2) Puppet has failed on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:27:51] cloud-services-team: PuppetFailure - https://phabricator.wikimedia.org/T358705 (phaultfinder)
[18:54:04] Toolforge iteration 06: Support probes in kubernetes webservices - https://phabricator.wikimedia.org/T341919#9585090 (LucasWerkmeister) Seems to work like a charm, thanks a lot! The “only terminate old pod once new one is ready” seems to behave as expected: {F42220354} (The fact that the pods often seem to n...
[18:57:48] (PuppetFailure) firing: (2) Puppet has failed on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[19:02:11] (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:04:27] (CR) Dzahn: [C: +2] "entire module was deleted in https://gerrit.wikimedia.org/r/c/operations/puppet/+/739658 and passwords::tor is also gone and deleted in pr" [labs/private] - https://gerrit.wikimedia.org/r/1007010 (owner: Dzahn)
[19:06:00] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[19:07:48] (PuppetFailure) resolved: (2) Puppet has failed on cloudcumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[19:10:22] (PS2) Dzahn: delete passwords::tendril and passwords::bugzilla [labs/private] - https://gerrit.wikimedia.org/r/1007009
[19:11:23] (CR) Dzahn: [C: +2] "service gone and passwords don't exist anymore in the private repo" [labs/private] - https://gerrit.wikimedia.org/r/1007009 (owner: Dzahn)
[19:11:28] (CR) Dzahn: [V: +2 C: +2] delete passwords::tendril and passwords::bugzilla [labs/private] - https://gerrit.wikimedia.org/r/1007009 (owner: Dzahn)
[19:15:21] Toolforge iteration 06: Support probes in kubernetes webservices - https://phabricator.wikimedia.org/T341919#9585211 (LucasWerkmeister) FYI, I added this to the [cookiecutter-toolforge](https://github.com/lucaswerkmeister/cookiecutter-toolforge) template / boilerplate for new tools now: https://github.com/lu...
[19:52:02] Grid-Engine-to-K8s-Migration, Growth-Team: Migrate ERANBOT project off of Grid Engine - https://phabricator.wikimedia.org/T306888#9585346 (MusikAnimal) @komla Going by the [[ https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#Timeline | timeline ]], I understand you're aiming to s...
[21:12:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:17:41] (CloudVPSDesignateLeaks) firing: (4) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:18:04] Grid-Engine-to-K8s-Migration, Growth-Team: Migrate ERANBOT project off of Grid Engine - https://phabricator.wikimedia.org/T306888#9585540 (bd808) >>! In T306888#9585346, @MusikAnimal wrote: > Unfortunately, CopyPatrol can't be migrated until a legal agreement with Turnitin (the copyvio detection service...
[21:21:16] Grid-Engine-to-K8s-Migration, Growth-Team: Migrate ERANBOT project off of Grid Engine - https://phabricator.wikimedia.org/T306888#9585564 (bd808) >>! In T306888#9585540, @bd808 wrote: >>>! In T306888#9585346, @MusikAnimal wrote: >> Unfortunately, CopyPatrol can't be migrated until a legal agreement with...
[21:31:21] Wikibugs: wikibugs only shows milestone name without parent project name - https://phabricator.wikimedia.org/T358653#9585576 (bd808) Sprint tags seem to be controversial in a number of ways: * {T94761} * {T94318} * {T161249} Those tasks are all advocating for them to be ignored or shortened. This one is adv...
[21:34:52] Wikibugs: wikibugs only shows milestone name without parent project name - https://phabricator.wikimedia.org/T358653#9585585 (bd808) >>! In T1176#9582909, @Peachey88 wrote: > Looks like the API is only grabbing the milestone name, instead of the full project name. Two examples below, their full names would b...
[21:39:04] (CR) BryanDavis: [C: +2] templates: Update links pointing to obsolete Help:Tool_Labs pages [labs/striker] - https://gerrit.wikimedia.org/r/1007289 (https://phabricator.wikimedia.org/T358615) (owner: Majavah)
[21:40:48] (Merged) jenkins-bot: templates: Update links pointing to obsolete Help:Tool_Labs pages [labs/striker] - https://gerrit.wikimedia.org/r/1007289 (https://phabricator.wikimedia.org/T358615) (owner: Majavah)
[21:42:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:53:49] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[22:08:56] Tools, cloud-services-team: cewbot k8s-20230418.fix-redirected-wikilinks-of-templates.out is unreasonably large - https://phabricator.wikimedia.org/T358555#9585658 (bd808) I don't think the truncate worked @taavi: `lang=shell-session tools-sgebastion-11 tools.cewbot 22:06:27 ~ ls -lh k8s-20230418.fix-red...
[22:10:49] Tools, cloud-services-team: cewbot k8s-20230418.fix-redirected-wikilinks-of-templates.out is unreasonably large - https://phabricator.wikimedia.org/T358555#9585663 (bd808) >>! In T358555#9585658, @bd808 wrote: > I don't think the truncate worked @taavi: > `lang=shell-session > tools-sgebastion-11 tools.c...
[22:20:40] Tools: eatchabot using a lot of NFS storage - https://phabricator.wikimedia.org/T284968#9585692 (bd808) The tool is disabled per {T319713} (in the Disabled column). There is apparently a pending adoption request at {T338555}.
[22:26:26] Tools: eatchabot using a lot of NFS storage - https://phabricator.wikimedia.org/T284968#9585701 (bd808) The `lr` sub-tool that is consuming the disk space has a main process that downloads files listed in https://commons.wikimedia.org/wiki/Category:License_review_needed and never deletes them. Seems like we...
[22:30:48] Tools: eatchabot using a lot of NFS storage - https://phabricator.wikimedia.org/T284968#9585711 (bd808) `lang=shell-session bd808@tools-nfs-2:/srv/tools/project/eatchabot/lr/downloaded_images$ du -sh . 58G . bd808@tools-nfs-2:/srv/tools/project/eatchabot/lr/downloaded_images$ sudo find . -type f -delete...
[22:32:42] Tools: eatchabot using a lot of NFS storage - https://phabricator.wikimedia.org/T284968#9585713 (bd808) Most of the remaining file usage for this tool is in `$HOME/www/python/src/image_hash_db.sqlite`, a 7.3G database stored on NFS. :/
[22:35:22] Cloud-VPS, FY2023/2024-Q3-Q4, Goal, Puppet 7.0: Migrate Cloud VPS puppet infrastructure to Puppet 7 - https://phabricator.wikimedia.org/T351450#9585724 (Andrew)
[22:36:24] Cloud-VPS, cloud-services-team, Patch-For-Review, Puppet 7.0: Andrew tries to make a cloud-vps puppet7 server - https://phabricator.wikimedia.org/T351468#9585722 (Andrew) Open→Invalid The fix for this is to first switch the VM to puppet7 via hiera (profile::puppet::agent::force_puppet7: t...
[22:51:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance clouddb-services-puppetmaster-2 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[23:01:28] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance clouddb-services-puppetmaster-2 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[23:06:01] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse