[00:31:27] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[00:33:47] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[01:28:45] PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[01:36:45] (JobUnavailable) firing: (2) Reduced availability for job redis_gitlab in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:41:45] (JobUnavailable) firing: (8) Reduced availability for job nginx in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:46:45] (JobUnavailable) firing: (10) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:49:57] RECOVERY - SSH on db1101.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:51:45] (JobUnavailable) firing: (10) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:06:45] (JobUnavailable) firing: (6) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:07:50] Hey guys. Can I get some clarification on something?
[02:08:06] I got this while trying to diagnose erratic IABot behavior
[02:08:07] If you report this error to the Wikimedia System Administrators, please include the details below.

Request from 185.15.56.22 via cp1085 cp1085, Varnish XID 1062733635
Upstream caches: cp1085 int
Error: 429, Too Many Requests at Mon, 19 Sep 2022 02:02:55 GMT
[02:08:41] I know IABot can be a bit heavy on I/O sometimes, but it shouldn't be THAT heavy to trigger this.
[02:11:45] (JobUnavailable) resolved: (5) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:17:42] Cyberpower678: rate limits are often dynamic, changing based on whatever is going on. I would recommend that your bot just wait for the length specified in the `retry-after` and retry
[02:19:07] legoktm: It would be nice to know if I can get an idea of just how much the bot is pushing the production servers though. It shouldn't ever be hitting it hard to warrant a 429. Except maybe, when it initializes and tries to import a bunch of template metadata.
[02:19:13] PROBLEM - SSH on mw1311.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:19:37] I can't answer that part, but I think the 429 is the sign to slow down
[02:20:29] It's a little harder than that. The bot is a global bot running on 140+ ish wikis concurrently. Each wiki runs on a different process, and they don't really communicate with each other.
[02:20:44] (This is something that will be addressed in the IABot rewrite being planned)
[02:21:39] So another process won't know that it's rate limiting production and keep retrying.
[02:22:31] But either way, legoktm do you think you could poke an op to maybe feed me some request logs originating from an IABot UA?
[02:22:55] rate limits are global
[02:23:05] Well that's unfortunate.
[02:23:06] which is why building in 429 handling is important
[02:23:15] I thought it was local
[02:23:33] This also presents somewhat of a scaling issue.
[02:23:33] MediaWiki rate limits are per-wiki (though some are global!), but these are enforced by the caching layer
[02:23:54] if you need logs, it would be best to file a task under SRE in Phab
[02:24:07] Link?
[02:24:23] https://phabricator.wikimedia.org/project/view/1025/
[02:24:41] * Cyberpower678 notes that the new version will have much more extensive logging built in.
[02:25:03] Thank you
[02:27:48] :)
[02:32:36] SRE, InternetArchiveBot: IABot is encountering 429 on Wikimedia Production - https://phabricator.wikimedia.org/T318065 (Cyberpower678)
[02:32:45] SRE, InternetArchiveBot: IABot is encountering 429 on Wikimedia Production - https://phabricator.wikimedia.org/T318065 (Cyberpower678) p:Triage→High
[02:36:01] legoktm: ^
[02:36:40] ok, it'll get triaged by whoever is on clinic duty this week
[02:36:47] Let's hope that there's just some big inefficiency going on that can easily be dealt with. Otherwise, the bot's going to be down for a bit while I work to implement 429 handling
[03:20:27] RECOVERY - SSH on mw1311.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:54:25] PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[05:02:56] SRE, Commons, WMF-General-or-Unknown: Upload to Commons fails with a common ADSL connection in Taiwan - https://phabricator.wikimedia.org/T205619 (YitNat) Same issue. Uploading 20 MB webm file on 80 KB/s speed upload connection. On Chromium (Brave Browser) it shows: > ERR_HTTP2_PROTOCOL_ERROR On...
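The advice given at 02:17:42 (wait for the period named in `Retry-After`, then retry) can be illustrated with a small Python sketch. This is not IABot's code and not a Wikimedia API: `parse_retry_after`, `SharedCooldown`, and the 60-second fallback are hypothetical names and values chosen for illustration. The HTTP specification allows `Retry-After` to carry either a number of seconds or an HTTP-date, so both forms are handled. Because the limits are global, the sketch also keeps one cooldown that every worker honours after any of them receives a 429; note that it only coordinates threads inside a single process, so the separate per-wiki processes described at 02:20:29 would need some shared store instead, which is not shown.

```python
import email.utils
import threading
import time
from datetime import datetime, timezone
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None,
                      default: float = 60.0) -> float:
    """Return how many seconds to wait, based on a Retry-After header value.

    Accepts delay-seconds ("120") or an HTTP-date
    ("Mon, 19 Sep 2022 02:03:55 GMT"). Falls back to `default`
    (a hypothetical value) when the header is missing or unparsable.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return default
    if when.tzinfo is None:  # treat dates without a zone as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())  # a past date means "retry now"


class SharedCooldown:
    """Hypothetical cooldown shared by all worker threads of one process.

    After any worker gets a 429, every worker waits until the deadline
    before sending its next request.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._not_before = 0.0  # deadline on the time.monotonic() clock

    def back_off(self, seconds: float) -> None:
        # Only ever extend the deadline; a shorter Retry-After must not shrink it.
        with self._lock:
            self._not_before = max(self._not_before, time.monotonic() + seconds)

    def wait(self) -> None:
        with self._lock:
            delay = self._not_before - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # sleep outside the lock so other workers are not blocked
```

In use, each worker would call `cooldown.wait()` before every request and, on a 429, call `cooldown.back_off(parse_retry_after(...))` with the header value taken from whatever HTTP client the bot uses.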
[05:57:01] (PS2) ArielGlenn: switch snapshot hosts to use php7.4 [puppet] - https://gerrit.wikimedia.org/r/827954 (https://phabricator.wikimedia.org/T271736)
[05:59:01] (CR) ArielGlenn: [C: +2] switch snapshot hosts to use php7.4 [puppet] - https://gerrit.wikimedia.org/r/827954 (https://phabricator.wikimedia.org/T271736) (owner: ArielGlenn)
[06:17:50] (CR) Urbanecm: [C: +1] "LGTM!" [mediawiki-config] - https://gerrit.wikimedia.org/r/832715 (https://phabricator.wikimedia.org/T314318) (owner: Arlolra)
[06:49:43] PROBLEM - Check systemd state on logstash1026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:00:04] Amir1 and Urbanecm: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220919T0700).
[07:00:04] MdsShakil: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[07:01:08] o/
[07:01:10] hi MdsShakil, around?
[07:01:21] urbanecm: yes
[07:01:45] great!
[07:04:00] (CR) Urbanecm: [C: +2] Remove unnecessary wgNamespaceAliases from bnwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/832683 (https://phabricator.wikimedia.org/T318003) (owner: MdsShakil)
[07:04:46] (Merged) jenkins-bot: Remove unnecessary wgNamespaceAliases from bnwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/832683 (https://phabricator.wikimedia.org/T318003) (owner: MdsShakil)
[07:05:37] MdsShakil: your patch is at mwdebug1001, please test
[07:08:59] MdsShakil: how is it going?
[07:09:32] urbanecm: sorry, looking good to me
[07:10:23] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:11:20] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:11:21] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:11:47] MdsShakil: great, syncing
[07:12:10] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:16:46] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 4a6c1ddf5cd1a46ab05f5d6fda4b938a3ee37238: Remove unnecessary wgNamespaceAliases from bnwiki (T318003) (duration: 04m 16s)
[07:16:50] T318003: Remove unnecessary wgNamespaceAliases from bnwiki - https://phabricator.wikimedia.org/T318003
[07:16:52] And, done.
[07:16:59] Took bit longer than expected, but succeeded.
[07:17:52] urbanecm: Thank you
[07:22:16] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:26:41] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:26:42] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:30:47] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:32:42] hi, I might add a config patch to this window, if there's time
[07:49:34] (PS1) Kosta Harlan: GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro [mediawiki-config] - https://gerrit.wikimedia.org/r/832959 (https://phabricator.wikimedia.org/T314518)
[07:50:43] (PS2) Kosta Harlan: GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro [mediawiki-config] - https://gerrit.wikimedia.org/r/832959 (https://phabricator.wikimedia.org/T314518)
[07:51:01] cc urbanecm ^
[07:51:15] I can wait until later if we're too close to end of the window
[07:58:56] meh, let's leave it for later
[08:01:45] PROBLEM - SSH on restbase2012.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:03:15] PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:17:55] PROBLEM - Check systemd state on logstash1010 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:40:02] (CR) Gergő Tisza: [C: +1] GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro [mediawiki-config] - https://gerrit.wikimedia.org/r/832959 (https://phabricator.wikimedia.org/T314518) (owner: Kosta Harlan)
[08:52:56] kostajh: sorry, i wasnt monitoring IRC after the deployment :/
[08:53:12] Looking forward to seeing that project on more wikis though!
[08:53:55] urbanecm: it's ok, ran into some issues with updating MediaWiki:NewcomerTasks.json anyway
[09:12:59] RECOVERY - Check systemd state on logstash1010 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:04:23] kostajh: i see. lmk if i can help with those issues somehow
[10:36:59] (PS1) Andrew Bogott: Cloudvirts: remove libguestfs-tools dependency [puppet] - https://gerrit.wikimedia.org/r/832977 (https://phabricator.wikimedia.org/T317344)
[10:46:07] (CR) Andrew Bogott: "This is causing a very noisy type mismatch on all the gitlab-runner nodes, maybe an encoding issue?" [puppet] - https://gerrit.wikimedia.org/r/832584 (https://phabricator.wikimedia.org/T317904) (owner: Dduvall)
[10:49:03] (ProbeDown) firing: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:54:03] (ProbeDown) resolved: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:19:35] (PS1) KartikMistry: Update cxserver to 2022-09-15-113346-production [deployment-charts] - https://gerrit.wikimedia.org/r/832989 (https://phabricator.wikimedia.org/T317289)
[11:23:53] (PS1) KartikMistry: testwiki: Enable Section Translation on haw, la, ps and, xh Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/832993 (https://phabricator.wikimedia.org/T317289)
[11:59:45] (CR) Awight: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/832999 (https://phabricator.wikimedia.org/T316676) (owner: Awight)
[12:13:21] SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users and Kerberos identity for CMyrick-WMF - https://phabricator.wikimedia.org/T317996 (CMyrick-WMF) HI Brett, I would like to use to the [[ https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter | JupyterH...
[12:33:48] (CR) Thiemo Kreuz (WMDE): [C: +1] Enable Tech Wishes survey on dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/832999 (https://phabricator.wikimedia.org/T316676) (owner: Awight)
[12:36:46] (Traffic bill over quota) firing: Alert for device cr2-drmrs.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[12:39:43] (PS3) Abijeet Patro: Add editcontentmodel right for translation administrators [mediawiki-config] - https://gerrit.wikimedia.org/r/830817 (https://phabricator.wikimedia.org/T311587)
[12:44:00] (CR) Urbanecm: [C: -1] "code lgtm, see inline comment for commit message." [mediawiki-config] - https://gerrit.wikimedia.org/r/830817 (https://phabricator.wikimedia.org/T311587) (owner: Abijeet Patro)
[12:49:25] (CR) Thcipriani: buildkitd: Support configuration of OCI executor nameservers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/832584 (https://phabricator.wikimedia.org/T317904) (owner: Dduvall)
[12:56:46] (Traffic bill over quota) resolved: Alert for device cr2-drmrs.wikimedia.org - Traffic bill over quota - https://alerts.wikimedia.org/?q=alertname%3DTraffic+bill+over+quota
[13:00:04] RoanKattouw, Lucas_WMDE, Urbanecm, and awight: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC afternoon backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220919T1300).
[13:00:04] kostajh: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:11] I can deploy today!
[13:00:12] hi kostajh
[13:00:36] \o/
[13:01:16] hello
[13:01:26] thanks urbanecm
[13:01:48] kostajh: afaics, all wikis but plwiki don't have image-recommendation in NewcomerTasks.json. Is that intended/ok?
[13:02:02] urbanecm: eh, no... let me look again
[13:02:06] sure
[13:02:31] urbanecm: they should all have it, am I missing something?
[13:03:19] kostajh: oh, my mistake. i was checking the page history, didn't realize it might get there via Special:EditGrowthConfig.
[13:03:25] urbanecm: see https://phabricator.wikimedia.org/T314518#8245095 for edits made to support this patch. Only plwiki needed the addition of image-recommendation, the others already had it
[13:03:34] yep, missed that. sorry :)
[13:03:35] let's go ahead!
[13:03:39] whew :)
[13:03:39] (CR) Urbanecm: [C: +2] GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro [mediawiki-config] - https://gerrit.wikimedia.org/r/832959 (https://phabricator.wikimedia.org/T314518) (owner: Kosta Harlan)
[13:04:39] (Merged) jenkins-bot: GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro [mediawiki-config] - https://gerrit.wikimedia.org/r/832959 (https://phabricator.wikimedia.org/T314518) (owner: Kosta Harlan)
[13:05:16] kostajh: pulled to mwdebug1001. can you check please?
[13:05:23] urbanecm: yep, looking
[13:08:31] RECOVERY - SSH on restbase2012.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:09:35] urbanecm: the feature looked good for existing users, but seeing a JS error on task type selection for new accounts. Not sure if it's related, need a few minutes
[13:09:45] sure, waiting
[13:10:05] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:10:11] (PS1) MVernon: hieradata: remove ms-be20[28-39] from swift::storagehosts [puppet] - https://gerrit.wikimedia.org/r/833007 (https://phabricator.wikimedia.org/T294549)
[13:11:43] urbanecm: looks good
[13:11:54] i see a DB error in logstash: `Error connecting to db1189 as user wikiuser202206: :real_connect(): (HY000/2002): Connection refused`
[13:12:20] there is no way how a GE config change can cause that, but it is suspicious anyway
[13:12:32] urbanecm: the error I got was attempting to set task filters as a logged-out user. I guess I was somehow logged-out.
[13:12:40] SRE, MediaWiki-extensions-CodeReview, Platform Engineering, serviceops-radar, Patch-For-Review: Make an HTML dump of the output of the CodeReview extension on MediaWiki.org - https://phabricator.wikimedia.org/T205361 (Jdforrester-WMF) >>! In T205361#8060540, @gerritbot wrote: > Change 774943...
[13:12:48] ack. syncing.
[13:12:49] probably another manifestation of T299193
[13:12:50] T299193: MediaWiki login failure due to race condition with session cookie - https://phabricator.wikimedia.org/T299193
[13:14:16] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:14:17] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:17:09] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: cbf161d148228e0e706813f923ab1a5d4b42757a: GrowthExperiments: Enable image recommendations for el/pl/zh/id/ro (T314518) (duration: 04m 01s)
[13:17:12] T314518: Scale: deploy "add an image" to el, pl, zh, id, ro - https://phabricator.wikimedia.org/T314518
[13:17:15] kostajh: and should be live
[13:17:16] anything else?
[13:17:23] \o/
[13:17:47] urbanecm: I don't think so. I had a question about whether we should roll out mentor overview Vue to all wikis, but we can leave that for another time, if you want to wait longer
[13:18:17] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:18:48] I'm leaving that to Sergio. Personally, I didn't see any complaints, and I'm comfortable rolling out, but I think it should be mainly Sergio's call, as he's working on the migration.
[13:20:20] urbanecm: ack
[13:52:53] urbanecm, still around?
[13:52:59] zabe: yes, what's up?
[13:53:29] would you have time to deploy https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/832623? the window isn't over yet ;)
[13:53:33] sure
[13:53:55] cool :)
[13:54:02] (CR) Urbanecm: [C: +2] Regenerate ukwikivoyage logo [mediawiki-config] - https://gerrit.wikimedia.org/r/832623 (https://phabricator.wikimedia.org/T317718) (owner: Zabe)
[13:54:05] (PS2) Urbanecm: Regenerate ukwikivoyage logo [mediawiki-config] - https://gerrit.wikimedia.org/r/832623 (https://phabricator.wikimedia.org/T317718) (owner: Zabe)
[13:54:09] (CR) Urbanecm: [C: +2] Regenerate ukwikivoyage logo [mediawiki-config] - https://gerrit.wikimedia.org/r/832623 (https://phabricator.wikimedia.org/T317718) (owner: Zabe)
[13:55:32] (Merged) jenkins-bot: Regenerate ukwikivoyage logo [mediawiki-config] - https://gerrit.wikimedia.org/r/832623 (https://phabricator.wikimedia.org/T317718) (owner: Zabe)
[13:57:11] zabe: pulled to mwdebug1001, can you verify,
[13:57:12] ?
[13:57:51] lgtm
[13:57:57] urbanecm, ^
[13:58:04] thanks, deploying
[13:58:37] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:59:36] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:59:37] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[14:00:32] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[14:02:10] !log urbanecm@deploy1002 Synchronized static/images/project-logos/: 6c7151d969b6997bd9cce042b7bc78c282dd9b26: Regenerate ukwikivoyage logo (T317718) (duration: 03m 46s)
[14:02:14] T317718: Logo of Ukrainian Wikivoyage differs on different resolutions - https://phabricator.wikimedia.org/T317718
[14:02:15] zabe: and, live
[14:02:21] purging the logo files now
[14:03:12] thanks!
[14:03:33] !log Purge https://en.wikipedia.org/static/images/project-logos/ukwikivoyage{.png,-1.5x.png,-2x.png} (T317718)
[14:03:35] and, done
[14:03:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:47] PROBLEM - Debian mirror in sync with upstream on mirror1001 is CRITICAL: /srv/mirrors/debian is over 14 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
[15:07:39] SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users and Kerberos identity for CMyrick-WMF - https://phabricator.wikimedia.org/T317996 (odimitrijevic) Approved!
[15:07:51] PROBLEM - SSH on db1101.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:17:51] SRE, SRE-swift-storage, Data Engineering Planning, Wikidata, and 3 others: Clean up the rdf-streaming-updater-codfw container from thanos-swift. - https://phabricator.wikimedia.org/T316031 (bking)
[15:25:47] (PS3) BCornwall: admin: Add cmyrick to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/832716 (https://phabricator.wikimedia.org/T317996)
[15:30:05] jan_drewniak: I, the Bot under the Fountain, call upon thee, The Deployer, to do Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220919T1530).
[15:30:51] SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users and Kerberos identity for CMyrick-WMF - https://phabricator.wikimedia.org/T317996 (BCornwall)
[15:39:36] (PS2) KartikMistry: testwiki: Enable Section Translation on haw, la, ps and, xh Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/832993 (https://phabricator.wikimedia.org/T317289)
[15:56:20] (CR) Ssingh: "krb: present should be added since Kerberos identity was requested." [puppet] - https://gerrit.wikimedia.org/r/832716 (https://phabricator.wikimedia.org/T317996) (owner: BCornwall)
[15:56:48] (PS4) BCornwall: admin: Add cmyrick to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/832716 (https://phabricator.wikimedia.org/T317996)
[15:58:43] (CR) Ssingh: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/832716 (https://phabricator.wikimedia.org/T317996) (owner: BCornwall)
[15:59:45] (PS1) Ebernhardson: Add token_count subfield to outgoing_link [extensions/CirrusSearch] (wmf/1.40.0-wmf.1) - https://gerrit.wikimedia.org/r/833031 (https://phabricator.wikimedia.org/T317546)
[16:02:53] (CR) BCornwall: [C: +2] admin: Add cmyrick to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/832716 (https://phabricator.wikimedia.org/T317996) (owner: BCornwall)
[16:04:55] (PS1) PipelineBot: blubberoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/833020
[16:12:23] SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users and Kerberos identity for CMyrick-WMF - https://phabricator.wikimedia.org/T317996 (BCornwall) The request has been merged and you should have received the Kerberos password through email. @CMyrick-WMF Can you...
[16:15:51] RECOVERY - Check systemd state on snapshot1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:23:25] (PS1) Dduvall: P:gitlab::runner: $nameservers parameter type should match aliased [puppet] - https://gerrit.wikimedia.org/r/833046 (https://phabricator.wikimedia.org/T317904)
[16:24:25] PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:44:25] SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users and Kerberos identity for CMyrick-WMF - https://phabricator.wikimedia.org/T317996 (BCornwall) a:BCornwall→CMyrick-WMF
[16:50:48] (CR) Dduvall: "Cherry picked on the standalone puppetmaster and seems to fix the noisy type mismatch errors." [puppet] - https://gerrit.wikimedia.org/r/833046 (https://phabricator.wikimedia.org/T317904) (owner: Dduvall)
[16:51:09] (CR) Ebernhardson: [V: +1 C: -1] "this might not be needed, we are alternatively considering munging the dumps in yarn and uploading the results to swift, this would cut 20" [puppet] - https://gerrit.wikimedia.org/r/832543 (https://phabricator.wikimedia.org/T222349) (owner: Ebernhardson)
[16:56:19] (PS3) Jforrester: ExtensionDistributor: Add REL1_39 [mediawiki-config] - https://gerrit.wikimedia.org/r/829877 (https://phabricator.wikimedia.org/T313925)
[16:57:59] jouncebot: next
[16:57:59] In 0 hour(s) and 2 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220919T1700)
[16:58:09] Oh well.
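The `!log Purge` entry at 14:03:33 uses shell-style brace notation to name several files in one line. As a sketch of what that shorthand stands for, the hypothetical Python helper `expand_braces` below expands `{a,b,c}` groups the way a shell would for simple, non-nested cases, yielding the base, `-1.5x` and `-2x` logo URLs. It only illustrates the notation; it says nothing about how the purge itself was issued.

```python
import re
from typing import List


def expand_braces(pattern: str) -> List[str]:
    """Expand shell-style {a,b,c} groups (hypothetical helper).

    Several groups in sequence are expanded left to right, giving every
    combination; nested braces are not supported.
    """
    match = re.search(r"\{([^{}]*)\}", pattern)  # first brace group in the string
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded: List[str] = []
    for option in match.group(1).split(","):
        # Recurse so that any later groups in `tail` are expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


logo = ("https://en.wikipedia.org/static/images/project-logos/"
        "ukwikivoyage{.png,-1.5x.png,-2x.png}")
for url in expand_braces(logo):
    print(url)
```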
[17:00:05] ryankemper: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220919T1700).
[17:01:57] (PS1) Zabe: build: Upgrade composer testing stack to latest as used Wikimedia-wide [mediawiki-config] - https://gerrit.wikimedia.org/r/833057
[17:20:37] PROBLEM - Check systemd state on logstash2026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:36:04] !log dancy@deploy1002 Started scap: testing, disregard
[17:36:26] !log dancy@deploy1002 dancy: testing, disregard synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
[17:36:30] !log dancy@deploy1002 Sync cancelled.
[17:40:11] SRE, SRE-Access-Requests: Requesting access to Analytics for devnull - https://phabricator.wikimedia.org/T318104 (Devnull)
[17:42:52] !log dancy@deploy1002 Installing scap version "4.21.0" for 561 hosts
[17:43:11] !log dancy@deploy1002 Installation of scap version "4.21.0" completed for 561 hosts
[17:45:58] (KubernetesAPILatency) firing: High Kubernetes API latency (PATCH nodes) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[17:49:31] RECOVERY - Debian mirror in sync with upstream on mirror1001 is OK: /srv/mirrors/debian is over 0 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
[17:50:58] (KubernetesAPILatency) resolved: High Kubernetes API latency (PATCH nodes) on k8s@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[17:51:13] SRE, MediaWiki-extensions-CodeReview, Platform Engineering, serviceops-radar, Patch-For-Review: Make an HTML dump of the output of the CodeReview extension on MediaWiki.org - https://phabricator.wikimedia.org/T205361 (Krinkle) >>! In T205361#7945628, @Legoktm wrote: >>>! In T205361#7815573, @...
[17:55:56] (PS1) Dduvall: buildkitd: Install wmf-certificates for registry CA [docker-images/production-images] - https://gerrit.wikimedia.org/r/833067 (https://phabricator.wikimedia.org/T318019)
[17:58:14] (CR) Dduvall: buildkitd: Bump version to 0.10.4 (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/830909 (owner: Dduvall)
[18:03:05] PROBLEM - Check systemd state on ms-be1057 is CRITICAL: CRITICAL - degraded: The following units failed: rsync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:12:41] RECOVERY - Check systemd state on ms-be1057 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:49:34] SRE, Traffic, Performance-Team (Radar): Review socket balancing in ATS/Varnish traffic layers - https://phabricator.wikimedia.org/T248522 (Krinkle)
[18:53:38] (PS8) BCornwall: Unlink certificate renewal and OCSP handling [software/acme-chief] - https://gerrit.wikimedia.org/r/820795 (https://phabricator.wikimedia.org/T244232)
[19:00:09] (PS1) Ebernhardson: sre.wdqs.data-reload: Simplify passing a timestamp for kafka [cookbooks] - https://gerrit.wikimedia.org/r/833082
[19:01:10] (PS9) BCornwall: Unlink certificate renewal and OCSP handling [software/acme-chief] - https://gerrit.wikimedia.org/r/820795 (https://phabricator.wikimedia.org/T244232)
[19:01:23] (CR) BCornwall: Unlink certificate renewal and OCSP handling (3 comments) [software/acme-chief] - https://gerrit.wikimedia.org/r/820795 (https://phabricator.wikimedia.org/T244232) (owner: BCornwall)
[19:04:55] (CR) CI reject: [V: -1] sre.wdqs.data-reload: Simplify passing a timestamp for kafka [cookbooks] - https://gerrit.wikimedia.org/r/833082 (owner: Ebernhardson)
[19:10:03] (CR) Andrew Bogott: [C: +2] Cloudvirts: remove libguestfs-tools dependency [puppet] - https://gerrit.wikimedia.org/r/832977 (https://phabricator.wikimedia.org/T317344) (owner: Andrew Bogott)
[19:11:51] (PS2) Ebernhardson: sre.wdqs.data-reload: Simplify passing a timestamp for kafka [cookbooks] - https://gerrit.wikimedia.org/r/833082
[19:14:17] PROBLEM - SSH on mw1316.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:14:45] SRE, DNS, Domains, WMF-Legal: Point wikipedia.in to 205.147.101.160 instead of URL forward - https://phabricator.wikimedia.org/T144508 (BCornwall)
[19:14:53] SRE, DNS, Domains, WMF-Legal: Point wikipedia.in to 205.147.101.160 instead of URL forward - https://phabricator.wikimedia.org/T144508 (BCornwall) Open→Invalid As the server appears to be dead and nearly all of those domain names being removed, I think this can safely be closed. If there'...
[19:19:25] (CR) Bking: [C: +2] sre.wdqs.data-reload: Simplify passing a timestamp for kafka [cookbooks] - https://gerrit.wikimedia.org/r/833082 (owner: Ebernhardson)
[19:22:39] (CR) Bking: [V: +2 C: +1] sre.wdqs.data-reload: Simplify passing a timestamp for kafka [cookbooks] - https://gerrit.wikimedia.org/r/833082 (owner: Ebernhardson)
[19:22:41] (CR) Bking: [V: +2 C: +2] sre.wdqs.data-reload: Simplify passing a timestamp for kafka [cookbooks] - https://gerrit.wikimedia.org/r/833082 (owner: Ebernhardson)
[19:28:49] PROBLEM - SSH on analytics1077.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:30:30] !log bking@cumin2002 START - Cookbook sre.wdqs.data-reload
[19:30:30] !log bking@cumin2002 END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
[19:31:04] !log bking@cumin2002 START - Cookbook sre.wdqs.data-reload
[19:33:11] !log bking@cumin2002 END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
[19:33:17] !log bking@cumin2002 START - Cookbook sre.wdqs.data-reload
[19:33:29] !log bking@cumin2002 END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
[19:35:18] SRE, PyBal, Traffic-Icebox: Backport ipvsadm - https://phabricator.wikimedia.org/T171850 (BCornwall) Open→Invalid ` $ sudo cumin '*lvs*' 'grep VERSION= /etc/os-release' [...] ----- OUTPUT of 'grep VERSION= /etc/os-release' -----...
[19:35:20] SRE, PyBal, Traffic-Icebox: PyBal Feature: progressive depooling strategy for monitored failures - https://phabricator.wikimedia.org/T172124 (BCornwall)
[19:35:24] SRE, PyBal, Traffic-Icebox: IPVS issues with UDP services, pybal depooling strategy - https://phabricator.wikimedia.org/T172103 (BCornwall)
[20:00:05] RoanKattouw, Urbanecm, and cjming: Time to snap out of that daydream and deploy UTC late backport window. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220919T2000).
[20:00:05] arlolra, ebernhardson, James_F, and zabe: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:07] Adsum.
[20:01:27] Anyone around to deploy? I can do it if needed.
[20:01:53] hey o/
[20:01:58] i can deploy - just getting set up
[20:02:06] Sure, thanks cjming.
[20:02:35] here
[20:02:48] (CR) Jforrester: [C: +1] build: Upgrade composer testing stack to latest as used Wikimedia-wide [mediawiki-config] - https://gerrit.wikimedia.org/r/833057 (owner: Zabe)
[20:03:02] hi Arlolra - starting with yours
[20:03:07] thanks
[20:03:16] (PS2) Clare Ming: Disable wgParserEnableLegacyMediaDOM on cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/832715 (https://phabricator.wikimedia.org/T314318) (owner: Arlolra)
[20:04:49] (CR) TrainBranchBot: [C: +2] "Approved by cjming@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/832715 (https://phabricator.wikimedia.org/T314318) (owner: Arlolra)
[20:05:17] ebernhardson: are you around for your cirrus patch?
[20:05:57] (Merged) jenkins-bot: Disable wgParserEnableLegacyMediaDOM on cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/832715 (https://phabricator.wikimedia.org/T314318) (owner: Arlolra)
[20:06:15] !log cjming@deploy1002 Started scap: Backport for [[gerrit:832715|Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]]
[20:06:19] T314318: Disable wgParserEnableLegacyMediaDOM on all wikis - https://phabricator.wikimedia.org/T314318
[20:06:36] !log cjming@deploy1002 cjming and arlolra: Backport for [[gerrit:832715|Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
[20:06:37] Arlolra: can you verify on one of the test servers?
[20:06:53] yup, one sec
[20:08:07] Ok, looks good
[20:08:31] great - going live
[20:09:33] James_F: would you like to self-deploy your patches? (i'll do Erik's later if/when he shows up)
[20:09:57] cjming: Sure!
[20:10:04] And I can take zabe's whilst I'm at it?
[20:10:15] be my guest
[20:10:20] (CR) Jforrester: [C: +2] ExtensionDistributor: Add REL1_39 [mediawiki-config] - https://gerrit.wikimedia.org/r/829877 (https://phabricator.wikimedia.org/T313925) (owner: Jforrester)
[20:10:27] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:11:10] (Merged) jenkins-bot: ExtensionDistributor: Add REL1_39 [mediawiki-config] - https://gerrit.wikimedia.org/r/829877 (https://phabricator.wikimedia.org/T313925) (owner: Jforrester)
[20:11:26] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:11:27] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:12:22] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:12:47] !log cjming@deploy1002 Finished scap: Backport for [[gerrit:832715|Disable wgParserEnableLegacyMediaDOM on cswiki (T314318)]] (duration: 06m 31s)
[20:12:50] T314318: Disable wgParserEnableLegacyMediaDOM on all wikis - https://phabricator.wikimedia.org/T314318
[20:12:53] Arlolra: your patch should be live
[20:12:59] thanks
[20:13:24] Yup, looks that way
[20:13:46] (PS2) Jforrester: Wikifunctions: Drop two config items moved to docker [mediawiki-config] - https://gerrit.wikimedia.org/r/820459
[20:13:50] (CR) Jforrester: [C: +2] Wikifunctions: Drop two config items moved to docker [mediawiki-config] - https://gerrit.wikimedia.org/r/820459 (owner: Jforrester)
[20:13:54] James_F: ping me when you're done if you don't mind
[20:14:36] (Merged) jenkins-bot: Wikifunctions: Drop two config items moved to docker [mediawiki-config] - https://gerrit.wikimedia.org/r/820459 (owner: Jforrester)
[20:14:48] cjming: Of course!
[20:15:01] ty
[20:15:30] * James_F twiddles thumbs waiting for fpm-restart.
[20:16:49] !log jforrester@deploy1002 Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:829877|ExtensionDistributor: Add REL1_39 (T313925)]] (duration: 03m 38s)
[20:16:53] T313925: Add REL1_39 to ExtensionDistributor as development snapshot - https://phabricator.wikimedia.org/T313925
[20:17:28] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:17:54] (PS2) Jforrester: build: Upgrade composer testing stack to latest as used Wikimedia-wide [mediawiki-config] - https://gerrit.wikimedia.org/r/833057 (owner: Zabe)
[20:17:56] (CR) Jforrester: [C: +2] build: Upgrade composer testing stack to latest as used Wikimedia-wide [mediawiki-config] - https://gerrit.wikimedia.org/r/833057 (owner: Zabe)
[20:18:44] (Merged) jenkins-bot: build: Upgrade composer testing stack to latest as used Wikimedia-wide [mediawiki-config] - https://gerrit.wikimedia.org/r/833057 (owner: Zabe)
[20:20:07] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:20:08] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:20:19] SRE, SRE-swift-storage, Data Engineering Planning, Wikidata, and 3 others: Clean up the rdf-streaming-updater-codfw container from thanos-swift. - https://phabricator.wikimedia.org/T316031 (bking) Swiftly dies every few days due to 404s (a fairly common response from Swift when you ask it to dele...
[20:20:30] (PS1) Jcrespo: dbbackups: Disable notifcations and prepare db2100 for s7 [puppet] - https://gerrit.wikimedia.org/r/833099 (https://phabricator.wikimedia.org/T318062)
[20:20:38] SRE, SRE-swift-storage, Data Engineering Planning, Wikidata, and 4 others: wdqs space usage on thanos-swift - https://phabricator.wikimedia.org/T314835 (bking)
[20:21:10] !log bking@cumin2002 START - Cookbook sre.wdqs.data-reload
[20:21:21] !log jforrester@deploy1002 Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:820459|Wikifunctions: Drop two config items moved to docker]] (duration: 03m 38s)
[20:21:29] SRE, SRE-swift-storage, Data Engineering Planning, Wikidata, and 3 others: Clean up the rdf-streaming-updater-codfw container from thanos-swift. - https://phabricator.wikimedia.org/T316031 (bking) Open→Resolved p:Medium→Lowest
[20:21:33] cjming: All done from my end.
[20:21:37] zabe: Thanks, BTW. :-)
[20:21:50] thanks!
[20:21:58] thanks for deploying! :)
[20:22:14] i'll hang out for a bit longer before closing backport window
[20:22:41] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:27:44] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:30:00] cjming: hi! sorry i got distracted. i can deploy mine
[20:30:27] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:30:28] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:30:48] ebernhardson: as you wish - good timing - i was about to close the backport window
[20:31:22] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:31:47] (CR) Ebernhardson: "backport window" [extensions/CirrusSearch] (wmf/1.40.0-wmf.1) - https://gerrit.wikimedia.org/r/833031 (https://phabricator.wikimedia.org/T317546) (owner: Ebernhardson)
[20:31:55] (CR) Ebernhardson: [C: +2] "backport window" [extensions/CirrusSearch] (wmf/1.40.0-wmf.1) - https://gerrit.wikimedia.org/r/833031 (https://phabricator.wikimedia.org/T317546) (owner: Ebernhardson)
[20:51:47] (Merged) jenkins-bot: Add token_count subfield to outgoing_link [extensions/CirrusSearch] (wmf/1.40.0-wmf.1) - https://gerrit.wikimedia.org/r/833031 (https://phabricator.wikimedia.org/T317546) (owner: Ebernhardson)
[20:51:49] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:54:32] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:54:33] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:55:10] SRE, Traffic-Icebox, Wikimedia-Incident: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335 (BCornwall) Open→Invalid I believe this to be the case. From https://wikitech.wikimedia.org/wiki/Caching_overview#2022: > In April 2022, we replaced ATS with HAProxy for TLS term...
[20:55:25] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:59:11] !log ebernhardson@deploy1002 Synchronized php-1.40.0-wmf.1/extensions/CirrusSearch/includes/Maintenance/MappingConfigBuilder.php: Backport: [[gerrit:833031|Add token_count subfield to outgoing_link (T317546)]] (duration: 03m 51s)
[20:59:14] T317546: Add new elasticsearch field to index the number of outgoing links - https://phabricator.wikimedia.org/T317546
[20:59:56] !log end of UTC late backport window
[20:59:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:05] Reedy, sbassett, Maryum, and manfredi: Time to snap out of that daydream and deploy Weekly Security deployment window. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220919T2100).
[21:00:32] !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[21:03:09] !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[21:03:10] !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[21:03:51] !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[21:08:24] preparing to deploy two security patches
[21:12:26] SRE, Analytics-Radar, Domains, Traffic-Icebox, WMF-General-or-Unknown: Don't set cookies in traffic layer for non-user facing domains (avoid false third-party cookie warning) - https://phabricator.wikimedia.org/T262996 (BCornwall) a:BCornwall
[21:15:05] !log Deployed security patch for T312820
[21:15:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:15:59] RECOVERY - SSH on db1101.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:16:49] RECOVERY - SSH on mw1316.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:20:55] (CR) MusikAnimal: rewrite.py: changes for Phonos deployment (2 comments) [puppet] - https://gerrit.wikimedia.org/r/831955 (https://phabricator.wikimedia.org/T317417) (owner: MusikAnimal)
[21:21:51] !log mstyles@deploy1002 Synchronized php-1.40.0-wmf.1/extensions/Translate/src/: (no justification provided) (duration: 03m 40s)
[21:21:57] !log Deployed security patch for T302479
[21:21:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:50:37] (CR) Jcrespo: [C: +2] dbbackups: Disable notifcations and prepare db2100 for s7 [puppet] - https://gerrit.wikimedia.org/r/833099 (https://phabricator.wikimedia.org/T318062) (owner: Jcrespo)
[21:50:55] (PS2) Jcrespo: dbbackups: Disable notifications and prepare db2100 for s7 [puppet] - https://gerrit.wikimedia.org/r/833099 (https://phabricator.wikimedia.org/T318062)
[21:51:05] (CR) Jcrespo: [V: +2] dbbackups: Disable notifications and prepare db2100 for s7 [puppet] - https://gerrit.wikimedia.org/r/833099 (https://phabricator.wikimedia.org/T318062) (owner: Jcrespo)
[21:56:57] RECOVERY - SSH on analytics1077.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:20:32] (PS1) Jcrespo: dbbackups: Reenable notifications on db2100 after adding s7 [puppet] - https://gerrit.wikimedia.org/r/833124 (https://phabricator.wikimedia.org/T318062)
[22:20:56] (CR) Jcrespo: [C: -1] "Waiting for replication to catch up." [puppet] - https://gerrit.wikimedia.org/r/833124 (https://phabricator.wikimedia.org/T318062) (owner: Jcrespo)
[22:37:12] (CR) Thcipriani: [C: +1] P:gitlab::runner: $nameservers parameter type should match aliased [puppet] - https://gerrit.wikimedia.org/r/833046 (https://phabricator.wikimedia.org/T317904) (owner: Dduvall)
[22:39:18] (PS1) Dduvall: P:gitlab::runner: Provide proxy variables to runner jobs [puppet] - https://gerrit.wikimedia.org/r/833125 (https://phabricator.wikimedia.org/T317997)
[22:49:17] PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[22:59:04] !log T317200 start cirrussearch in-place reindex process for eqiad, codfw and cloudelastic
[22:59:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:59:08] T317200: Reindex all wikis to fix nnbsp regression - https://phabricator.wikimedia.org/T317200
[23:30:42] (PS1) Zabe: DNM: switch to /dev/sdb [puppet] - https://gerrit.wikimedia.org/r/833128
[23:40:47] (Abandoned) Zabe: DNM: switch to /dev/sdb [puppet] - https://gerrit.wikimedia.org/r/833128 (owner: Zabe)
[23:44:42] (PS1) Zabe: WIP: Don't create a second disk through lvm [puppet] - https://gerrit.wikimedia.org/r/833130