[00:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:16:56] FIRING: [2x] SystemdUnitDown: The systemd unit kiwix-mirror-update.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[02:41:29] FIRING: NodeTextfileStale: Stale textfile for an-redacteddb1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[06:16:56] FIRING: [2x] SystemdUnitDown: The systemd unit kiwix-mirror-update.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[06:41:29] FIRING: NodeTextfileStale: Stale textfile for an-redacteddb1001:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[06:47:31] cloud-services-team, Data-Services, Data-Platform-SRE, DBA: Prepare and check storage layer for idwikivoyage - https://phabricator.wikimedia.org/T381079#10369965 (Marostegui) a:ABran-WMF→None Wiki created by @tstarling I have: Redacted Grants created `_p` database created. Ready for v...
[07:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:12:27] Tool-schedule-deployment: ScheduleDeploymentBot should refuse to add more than 6 patches to a backport window - https://phabricator.wikimedia.org/T367229#10370039 (kostajh) >>! In T367229#9887013, @bd808 wrote: > The implementation for this may end up being fragile, but I think it is worth attempting. The fr...
[08:14:02] Tool-schedule-deployment: Allow scheduling for current backport window - https://phabricator.wikimedia.org/T381237 (kostajh) NEW
[08:46:59] (PS1) Urbanecm: [DNM] Test CI [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099640
[08:47:42] (CR) CI reject: [V:-1] [DNM] Test CI [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099640 (owner: Urbanecm)
[08:50:53] (PS2) Urbanecm: [DNM] Test CI [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099640
[08:52:05] (CR) CI reject: [V:-1] [DNM] Test CI [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099640 (owner: Urbanecm)
[08:53:28] (PS3) Urbanecm: [DNM] Test CI [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099640
[08:54:49] (CR) CI reject: [V:-1] [DNM] Test CI [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099640 (owner: Urbanecm)
[08:55:48] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, Release-Engineering-Team, ci-test-error (WMF-deployed Build Failure): Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org (2024-11-27) - https://phabricator.wikimedia.org/T380991#10370170 (hashar...
[08:57:40] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10370168
[08:59:23] (PS4) Urbanecm: [DNM] Test CI [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099640
[09:00:44] (CR) CI reject: [V:-1] [DNM] Test CI [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099640 (owner: Urbanecm)
[09:04:21] (PS5) Urbanecm: [DNM] Test CI [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099640
[09:07:20] (PS6) Urbanecm: Fix CI for Wikinity [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099640
[09:07:30] (PS3) Pppery: Fix admin-description-item to replace 'item number' with 'item ID' [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099358 (https://phabricator.wikimedia.org/T331193) (owner: Ravitej Neeli)
[09:10:34] (CR) Urbanecm: [C:+2] Fix CI for Wikinity [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099640 (owner: Urbanecm)
[09:10:54] (CR) Urbanecm: [C:+2] "I just went ahead and fixed it :). Passes now." [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099358 (https://phabricator.wikimedia.org/T331193) (owner: Ravitej Neeli)
[09:12:55] (Merged) jenkins-bot: Fix CI for Wikinity [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099640 (owner: Urbanecm)
[09:12:56] (Merged) jenkins-bot: Fix admin-description-item to replace 'item number' with 'item ID' [labs/tools/wikinity] - https://gerrit.wikimedia.org/r/1099358 (https://phabricator.wikimedia.org/T331193) (owner: Ravitej Neeli)
[09:17:22] (update) sstefanova: add prometheus stats [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/10 (https://phabricator.wikimedia.org/T320284 https://phabricator.wikimedia.org/T379924) (owner: dcaro)
[09:23:21] (update) sstefanova: add prometheus stats [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/10 (https://phabricator.wikimedia.org/T320284 https://phabricator.wikimedia.org/T379924) (owner: dcaro)
[09:23:30] (approved) sstefanova: add prometheus stats [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/10 (https://phabricator.wikimedia.org/T320284 https://phabricator.wikimedia.org/T379924) (owner: dcaro)
[09:30:15] (PS1) Urbanecm: deploy: Do not load venv-bastion [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/1099648
[09:30:42] (CR) Urbanecm: [C:+2] deploy: Do not load venv-bastion [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/1099648 (owner: Urbanecm)
[09:31:03] (Merged) jenkins-bot: deploy: Do not load venv-bastion [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/1099648 (owner: Urbanecm)
[09:44:04] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 16): [components-api] Add functional tests for the components api - https://phabricator.wikimedia.org/T379092#10370318 (Slst2020) a:Slst2020
[09:45:18] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 16): [components-api] Add functional tests for the components api - https://phabricator.wikimedia.org/T379092#10370315 (Slst2020) Open→In progress
[09:49:46] cloud-services-team, Toolforge: Can't pip install mysqlclient on Toolforge - https://phabricator.wikimedia.org/T349341#10370341 (Urbanecm) I just ran into this as well. For some reason, this works for me from Python 3.9, but not Python3.11.
[09:56:45] cloud-services-team, Toolforge: [jobs-api] Cleanup deprecated blueprints - https://phabricator.wikimedia.org/T365015#10370382 (Slst2020) Open→Resolved a:Slst2020 I did this as part of the migration to the standardized path naming a few months ago. Only the current blueprints are left: `...
[09:57:21] cloud-services-team (FY2024/2025-Q1-Q2), SRE, SRE-Access-Requests, Patch-For-Review: Add permissions for Komla to run WMCS cookbooks - https://phabricator.wikimedia.org/T379159#10370412 (elukey) Stalled→Declined Setting it to declined for the moment, please re-open when a final decisi...
[10:13:08] cloud-services-team, Toolforge: [components-api] Add minimal cli with build-only features - https://phabricator.wikimedia.org/T362082#10370459 (Slst2020) We pivoted to implementing end-to-end deployment of a continuous job before adding build-related features. The minimal cli already exists, but build fe...
[10:16:56] FIRING: [2x] SystemdUnitDown: The systemd unit kiwix-mirror-update.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:19:53] Toolforge (Toolforge iteration 16): [components-api] add basic prometheus instrumentation - https://phabricator.wikimedia.org/T381249 (Slst2020) NEW
[11:49:22] cloud-services-team: SystemdUnitDown kiwix-mirror-update.service - https://phabricator.wikimedia.org/T381212#10370962 (fnegri) p:Triage→High
[11:52:21] cloud-services-team: SystemdUnitDown The systemd unit kiwix-mirror-update.service on node clouddumps1002 has been failing for more than two hours. - https://phabricator.wikimedia.org/T381211#10370967 (fnegri) This unit is now failing on both clouddumps nodes: {T381212}
[11:53:09] cloud-services-team: SystemdUnitDown kiwix-mirror-update.service - https://phabricator.wikimedia.org/T381212#10370972 (fnegri)
[11:53:19] cloud-services-team: SystemdUnitDown The systemd unit kiwix-mirror-update.service on node clouddumps1002 has been failing for more than two hours. - https://phabricator.wikimedia.org/T381211#10370970 (fnegri) →Duplicate dup:T381212
[12:09:21] (open) sstefanova: metrics: add prometheus instrumentation [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/46 (https://phabricator.wikimedia.org/T381249)
[12:21:19] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/watch-translations] - https://gerrit.wikimedia.org/r/1099680 (owner: L10n-bot)
[12:24:44] cloud-services-team, Toolforge, Design-Research, Design: Toolforge UI: Publish newcomer experience and recruitment survey - https://phabricator.wikimedia.org/T381266 (Sarai-WMF) NEW
[12:24:49] cloud-services-team, Toolforge, Design-Research, Design: Toolforge UI: Publish newcomer experience and recruitment survey - https://phabricator.wikimedia.org/T381266#10371114 (Sarai-WMF) a:Sarai-WMF
[12:41:42] cloud-services-team, Toolforge, Design-Research, Design: Toolforge UI: Publish newcomer experience and recruitment survey - https://phabricator.wikimedia.org/T381266#10371173 (Sarai-WMF)
[13:25:03] (update) sstefanova: metrics: add prometheus instrumentation [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/46 (https://phabricator.wikimedia.org/T381249)
[13:27:14] Toolforge (Toolforge iteration 16), Patch-For-Review: [components-api] add basic prometheus instrumentation - https://phabricator.wikimedia.org/T381249#10371262 (Slst2020) Open→In progress
[13:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:11:43] cloud-services-team, Toolforge: [toolsdb] Remove floating IP - https://phabricator.wikimedia.org/T381272 (fnegri) NEW
[14:16:57] FIRING: [2x] SystemdUnitDown: The systemd unit kiwix-mirror-update.service on node clouddumps1001 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[14:18:06] cloud-services-team, Toolforge: toolforge jobs load errors with 404 repetatively - https://phabricator.wikimedia.org/T381273 (Urbanecm) NEW
[14:24:27] (PS1) Urbanecm: cswiki: Add ukoly/bezNadstranky [labs/tools/urbanecmbot] - https://gerrit.wikimedia.org/r/1099705
[14:24:36] (CR) Urbanecm: [C:+2] cswiki: Add ukoly/bezNadstranky [labs/tools/urbanecmbot] - https://gerrit.wikimedia.org/r/1099705 (owner: Urbanecm)
[14:24:57] (Merged) jenkins-bot: cswiki: Add ukoly/bezNadstranky [labs/tools/urbanecmbot] - https://gerrit.wikimedia.org/r/1099705 (owner: Urbanecm)
[14:28:47] cloud-services-team, Toolforge: toolforge jobs load errors with 404 repetatively - https://phabricator.wikimedia.org/T381273#10371500 (Slst2020) Has this worked for you in the past, then started failing today?
[14:33:14] cloud-services-team, affects-Kiwix-and-openZIM: SystemdUnitDown kiwix-mirror-update.service - https://phabricator.wikimedia.org/T381212#10371542 (Andrew) The logs show this as the issue: ` rsync: [Receiver] failed to connect to master.download.kiwix.org (135.181.224.247): Connection refused ` @Benoit7...
[14:37:03] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T381275 (phaultfinder) NEW
[14:41:23] VPS-project-Wikistats: Add idwikivoyage to wikistats - https://phabricator.wikimedia.org/T381084#10371579 (Dzahn) a:Dzahn
[14:45:46] cloud-services-team, Cloud-VPS, VPS-Projects: [trove] Database quota values are not updating correctly - https://phabricator.wikimedia.org/T373348#10371612 (fnegri)
[14:54:01] cloud-services-team, Cloud-VPS, VPS-Projects: [trove] Database quota values are not updating correctly - https://phabricator.wikimedia.org/T373348#10371631 (fnegri) tofuinfratest is again failing to create Trove instances because of this. But this time there were also some Trove instances stuck in `E...
[15:07:13] cloud-services-team: Kernel error Server an-redacteddb1001 may have kernel errors - https://phabricator.wikimedia.org/T379571#10371686 (fnegri) This alert is now firing: `Stale textfile for an-redacteddb1001:9100 /var/lib/prometheus/node.d/kernel-panic.prom`. I just deleted the file and the alert went a...
[15:27:31] wikitech.wikimedia.org, Wikimedia-Site-requests, MW-1.44-notes (1.44.0-wmf.5; 2024-11-25), Patch-For-Review: fold contentadmin group to sysop in Wikitech - https://phabricator.wikimedia.org/T375950#10371897 (taavi) Open→Resolved
[16:25:48] cloud-services-team, Cloud-VPS: cloudgw: suspected network problems - https://phabricator.wikimedia.org/T381078#10372160 (Andrew) I believe paws is running on the following neutron IPs: 172.16.5.253 172.16.5.100 172.16.1.161 172.16.0.46 172.16.5.229 172.16.1.198
[16:31:07] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), Release-Engineering-Team (Seen): Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup... - https://phabricator.wikimedia.org/T374830#10372189
[16:31:19] cloud-services-team, Cloud-VPS: cloudgw: suspected network problems - https://phabricator.wikimedia.org/T381078#10372199 (fnegri) The list from kubectl is slightly different: ` paws-127a-m3mctzr7itba-master-0 Ready master 95d v1.27.8 172.16.1.198 paws-127a-m3mctzr7itba-node-0 Ready FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:40:06] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:41:41] Tool-schedule-deployment: Allow scheduling for current backport window - https://phabricator.wikimedia.org/T381237#10372236 (bd808) How many minutes of slack in adding content to a deployment window are reasonable? The current logic for displaying windows available for scheduling uses the window start time a...
[16:53:18] Toolforge-standards-committee, Tools, SecTeam-Processed, Security, Vuln-Infoleak: OAuth credentials of Cradle tool are world-readable on Toolforge - https://phabricator.wikimedia.org/T314135#10372270 (sbassett) p:Triage→High
[16:58:28] cloud-services-team, Toolforge, Design-Research, Design: Toolforge UI: Publish newcomer experience and recruitment survey - https://phabricator.wikimedia.org/T381266#10372316 (Sarai-WMF)
[16:58:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-worker-nfs-12 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[17:13:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-worker-nfs-12 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[17:19:25] cloud-services-team, Tools, Puppet: Too many puppet facts on toolforge k8s workers - https://phabricator.wikimedia.org/T381293 (Andrew) NEW
[17:23:25] cloud-services-team, Toolforge, Puppet: Too many puppet facts on toolforge k8s workers - https://phabricator.wikimedia.org/T381293#10372452 (taavi)
[17:23:42] cloud-services-team, Data-Services, DBA, Privacy Engineering: Create views for SecurePoll db tables in Toolforge replicas - https://phabricator.wikimedia.org/T381197#10372446 (sbassett) Rerouting to #privacy_engineering, though this will likely be very low-priority for them.
[17:47:39] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:52:39] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:56:54] Tool-Pageviews, Tool-wikistatistics2-0, Data Products, Data-Engineering, and 2 others: Pageviews Analysis 3.0 (Vue + Codex) - https://phabricator.wikimedia.org/T378549#10372625 (Ottomata)
[18:04:24] cloud-services-team, Cloud-VPS: cloudgw: suspected network problems - https://phabricator.wikimedia.org/T381078#10372693 (fnegri) The "Network I/O" graph from the [[ https://grafana.wmcloud.org/d/eV0M3UyVk/paws-usage-statistics?orgId=1 | Paws usage statistics ]] Grafana dashboard confirms big spikes of "...
[18:16:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:17:34] Toolforge-standards-committee, Tools, SecTeam-Processed, Security, Vuln-Infoleak: OAuth credentials of Cradle tool are world-readable on Toolforge - https://phabricator.wikimedia.org/T314135#10372730 (LucasWerkmeister) >>! In T314135#10370162, @Magnus wrote: > Fixed now. Great, thanks!
[18:17:37] Tool-bluehillbotb: Migrate BluehillBot B from pywikibot to .NET - https://phabricator.wikimedia.org/T381303 (Bluehill395) NEW
[18:21:06] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:26:08] Tool-wikiqanda, Future-Audiences: Test out logging before publicly announcing internal Slack launch - https://phabricator.wikimedia.org/T381010#10372804 (etz) a:etz
[18:26:13] Tool-wikiqanda, Future-Audiences, Epic: Create documentation & qual feedback form for internal bot release - https://phabricator.wikimedia.org/T379791#10372797 (Maryana) Open→Resolved
[18:27:25] Tool-bluehillbotb: Migrate BluehillBot B from pywikibot to .NET - https://phabricator.wikimedia.org/T381303#10372806 (Bluehill395) p:Triage→Medium
[18:34:24] cloud-services-team, Cloud-VPS: cloudgw: suspected network problems - https://phabricator.wikimedia.org/T381078#10372849 (cmooney) @aborrero might it be an idea to put a special rule on the cloudgw to NAT these IPs to a different outside, public IPv4? That way we could identify them in our netflow logs...
[18:40:35] Tools, SecTeam-Processed, Security, Vuln-Infoleak: 3 public Python tool configuration files - https://phabricator.wikimedia.org/T381027#10372856 (sbassett) Open→Resolved p:Triage→Medium
[18:46:07] (open) raymond-ndibe: [maintain-harbor] avoid graphql query complexity error [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/39 (https://phabricator.wikimedia.org/T358225)
[18:46:17] (update) raymond-ndibe: [maintain-harbor] avoid graphql query complexity error [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/39 (https://phabricator.wikimedia.org/T358225)
[18:56:38] (open) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[18:56:44] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[19:08:47] cloud-services-team, Cloud-VPS: cloudgw: suspected network problems - https://phabricator.wikimedia.org/T381078#10373006 (cmooney) I took some liberties and created a dashboard on the wmcloud.org grafana to show stats for each of those instances: https://grafana.wmcloud.org/goto/9fr1GSVHz One finding i...
[19:20:44] (PS1) Raymond Ndibe: [wmcs-cookbooks] make extraction of toolforge_get_versions less brittle [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099776 (https://phabricator.wikimedia.org/T358225)
[19:21:08] (PS1) Raymond Ndibe: [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225)
[19:22:12] (update) stwalkerster: Allow configuring the provider by environment variables [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/2
[19:24:10] (update) raymond-ndibe: [maintain-harbor] avoid graphql query complexity error [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/39 (https://phabricator.wikimedia.org/T358225)
[19:24:36] (CR) CI reject: [V:-1] [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[19:24:36] (update) raymond-ndibe: [toolforge-deploy] more bug fixes [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/630 (https://phabricator.wikimedia.org/T358225)
[19:31:40] (PS2) Raymond Ndibe: [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225)
[19:35:55] cloud-services-team, affects-Kiwix-and-openZIM: SystemdUnitDown kiwix-mirror-update.service - https://phabricator.wikimedia.org/T381212#10373184 (Benoit74) Our server is down, we are restoring. You can watch status at https://status.kiwix.org/781867852 and updates at https://mastodon.social/@kiwix/
[19:35:55] (CR) CI reject: [V:-1] [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[19:42:30] (PS3) Raymond Ndibe: [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225)
[19:42:59] (CR) CI reject: [V:-1] [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225) (owner: Raymond Ndibe)
[19:49:02] (PS4) Raymond Ndibe: [wmcs-cookbooks] pass --branch to run_functional_tests.sh [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1099777 (https://phabricator.wikimedia.org/T358225)
[19:52:42] cloud-services-team, Cloud-VPS: cloudgw: suspected network problems - https://phabricator.wikimedia.org/T381078#10373279 (cmooney) As it turns out all of these devices are in rack D5, which explains why looking at the sflow data (which we only have for E4/F4) didn't show me much. In fact they are all on...
[20:00:43] Tool-wikiqanda, Future-Audiences, Design: Bot user personas - https://phabricator.wikimedia.org/T381009#10373302 (Maryana)
[20:24:06] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:29:06] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:37:18] VPS-project-Codesearch: Allow searching for plain text in CodeSearch - https://phabricator.wikimedia.org/T381325 (SomeRandomDeveloper) NEW
[21:46:39] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:51:39] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:52:39] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:57:39] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[22:51:23] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10373975 (amastilovic) |**Wikitech account/LDAP:**| AMastilovic-WMF| |**SUL account**| AMastilovic-WMF| |**Account linked on [[ https://idm.wikimedia.org/ | IDM ]]** |Y| |**I have visited [...
[23:21:23] VPS-Projects, fundraising-tech-ops, Puppet (Puppet 7.0): Update puppet civicrm-prototype puppetmaster - https://phabricator.wikimedia.org/T361595#10374114 (Dwisehaupt) @Andrew Thanks so much for that. I'm sure some of that was my doing or due to how my workflow was. I'll work through the changes and...
[23:21:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:30:57] Tool-wikiqanda, Future-Audiences: Test out logging before publicly announcing internal Slack launch - https://phabricator.wikimedia.org/T381010#10374153 (etz) Open→Resolved
[23:31:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks