[00:06:42] PROBLEM - BFD status on cr1-eqiad is CRITICAL: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [00:07:18] PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [00:07:20] PROBLEM - Router interfaces on cr1-drmrs is CRITICAL: CRITICAL: host 185.15.58.128, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [00:38:25] (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/942790 [00:38:31] (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/942790 (owner: 10TrainBranchBot) [00:55:03] (03Merged) 10jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/942790 (owner: 10TrainBranchBot) [01:48:34] 10ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (10phaultfinder) [02:07:33] (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:17:40] RECOVERY - BFD status on cr1-eqiad is OK: UP: 24 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [02:18:18] RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [02:18:34] RECOVERY - Router interfaces on cr1-drmrs is OK: OK: host 185.15.58.128, interfaces up: 58, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [02:18:48] PROBLEM - 
Check systemd state on gitlab1003 is CRITICAL: CRITICAL - degraded: The following units failed: sync-gitlab-group-with-ldap.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:18:48] PROBLEM - Check systemd state on gitlab2002 is CRITICAL: CRITICAL - degraded: The following units failed: sync-gitlab-group-with-ldap.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:30:52] RECOVERY - Check systemd state on gitlab1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:30:52] RECOVERY - Check systemd state on gitlab2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [02:32:33] (JobUnavailable) resolved: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:42:40] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [03:42:52] PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [03:45:32] RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.265 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [03:45:42] RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 50276 bytes in 0.100 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring [04:34:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - 
https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [04:44:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [05:15:03] (ProbeDown) firing: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [05:20:03] (ProbeDown) resolved: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [05:27:07] (03PS1) 10Muehlenhoff: Update MOU dates [puppet] - 10https://gerrit.wikimedia.org/r/942824 [05:27:36] (03CR) 10CI reject: [V: 04-1] Update MOU dates [puppet] - 10https://gerrit.wikimedia.org/r/942824 (owner: 10Muehlenhoff) [05:28:33] 10SRE, 10LDAP-Access-Requests: Grant Access to Turnilo for Mpossoupe - https://phabricator.wikimedia.org/T342335 (10MoritzMuehlenhoff) 05Resolved→03Open >>! In T342335#9034501, @andrea.denisse wrote: > Glad to read. I'll close this as resolved but feel free to reach out if there's anything else I can help... 
[05:30:36] (03CR) 10KartikMistry: [C: 03+1] Update cxserver to 2023-07-13-063245-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/937578 (https://phabricator.wikimedia.org/T340953) (owner: 10Santhosh) [05:31:36] (03PS2) 10KartikMistry: Update cxserver to 2023-07-13-063245-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/937578 (https://phabricator.wikimedia.org/T340953) (owner: 10Santhosh) [05:46:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [05:51:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [05:56:55] (03CR) 10Stevemunene: [C: 03+1] "LGTM!" 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/942687 (https://phabricator.wikimedia.org/T341194) (owner: 10Btullis) [05:59:16] (MediaWikiLatencyExceeded) firing: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [06:06:41] !log imported jenkins 2.401.3 to thirdparty/ci for buster-wikimedia T342572 [06:06:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:14:16] (MediaWikiLatencyExceeded) resolved: Average latency high: eqiad parsoid GET/200 - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook#Average_latency_exceeded - https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-site=eqiad&var-cluster=parsoid&var-method=GET - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLatencyExceeded [06:27:31] (03PS4) 10Giuseppe Lavagetto: noc: don't use on-disk files but etcd directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942672 (https://phabricator.wikimedia.org/T341859) [06:27:33] (03PS5) 10Giuseppe Lavagetto: noc: centralize file list management [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942673 (https://phabricator.wikimedia.org/T341859) [06:27:35] (03PS5) 10Giuseppe Lavagetto: noc: add static file server [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942674 (https://phabricator.wikimedia.org/T341859) [06:27:37] (03PS5) 10Giuseppe Lavagetto: noc: remove symlinks and also neutralize createTxtFileSymlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942675 (https://phabricator.wikimedia.org/T341859) [06:27:39] (03PS1) 10Giuseppe Lavagetto: update noc README [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/943036 [06:27:54] (03PS3) 10Giuseppe Lavagetto: noc: unify methods to fetch the current wiki versions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942671 (https://phabricator.wikimedia.org/T341859) [06:27:56] (03PS5) 10Giuseppe Lavagetto: noc: don't use on-disk files but etcd directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942672 (https://phabricator.wikimedia.org/T341859) [06:27:58] (03PS6) 10Giuseppe Lavagetto: noc: centralize file list management [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942673 (https://phabricator.wikimedia.org/T341859) [06:28:00] (03PS6) 10Giuseppe Lavagetto: noc: add static file server [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942674 (https://phabricator.wikimedia.org/T341859) [06:28:02] (03PS6) 10Giuseppe Lavagetto: noc: remove symlinks and also neutralize createTxtFileSymlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942675 (https://phabricator.wikimedia.org/T341859) [06:28:04] (03PS2) 10Giuseppe Lavagetto: update noc README [mediawiki-config] - 10https://gerrit.wikimedia.org/r/943036 [06:28:06] (03CR) 10CI reject: [V: 04-1] noc: remove symlinks and also neutralize createTxtFileSymlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942675 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto) [06:28:08] (03CR) 10CI reject: [V: 04-1] update noc README [mediawiki-config] - 10https://gerrit.wikimedia.org/r/943036 (owner: 10Giuseppe Lavagetto) [06:28:48] (03CR) 10CI reject: [V: 04-1] noc: add static file server [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942674 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto) [06:28:58] (03CR) 10CI reject: [V: 04-1] noc: remove symlinks and also neutralize createTxtFileSymlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942675 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto) [06:29:02] (03CR) 10CI reject: [V: 04-1] update 
noc README [mediawiki-config] - 10https://gerrit.wikimedia.org/r/943036 (owner: 10Giuseppe Lavagetto) [06:29:05] (03PS7) 10Giuseppe Lavagetto: noc: add static file server [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942674 (https://phabricator.wikimedia.org/T341859) [06:29:07] (03PS7) 10Giuseppe Lavagetto: noc: remove symlinks and also neutralize createTxtFileSymlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942675 (https://phabricator.wikimedia.org/T341859) [06:29:09] (03PS3) 10Giuseppe Lavagetto: update noc README [mediawiki-config] - 10https://gerrit.wikimedia.org/r/943036 [06:29:48] (03CR) 10CI reject: [V: 04-1] noc: add static file server [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942674 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto) [06:29:55] (03CR) 10CI reject: [V: 04-1] noc: remove symlinks and also neutralize createTxtFileSymlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942675 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto) [06:30:04] (03CR) 10CI reject: [V: 04-1] update noc README [mediawiki-config] - 10https://gerrit.wikimedia.org/r/943036 (owner: 10Giuseppe Lavagetto) [06:30:46] (03PS1) 10Elukey: services: upgrade changeprop instances to Buster [deployment-charts] - 10https://gerrit.wikimedia.org/r/943037 (https://phabricator.wikimedia.org/T341140) [06:30:48] (03PS1) 10Elukey: changeprop: allow to tune monitoring container's resources [deployment-charts] - 10https://gerrit.wikimedia.org/r/943038 (https://phabricator.wikimedia.org/T328683) [06:30:50] (03PS1) 10Elukey: services: shift cpu resources from main app to the prometheus container [deployment-charts] - 10https://gerrit.wikimedia.org/r/943039 (https://phabricator.wikimedia.org/T328683) [06:36:39] (03PS2) 10Elukey: services: shift cpu resources from main app to the prometheus container [deployment-charts] - 10https://gerrit.wikimedia.org/r/943039 (https://phabricator.wikimedia.org/T328683) 
[06:39:22] (03PS2) 10Muehlenhoff: Rename Ferm::Hosts type to Wmflib::Firewall::Hosts and move to wmflib [puppet] - 10https://gerrit.wikimedia.org/r/937488 (https://phabricator.wikimedia.org/T336497) [06:45:17] (03PS3) 10Elukey: services: shift changeprop's cpu resources from main app to the prometheus [deployment-charts] - 10https://gerrit.wikimedia.org/r/943039 (https://phabricator.wikimedia.org/T328683) [06:53:01] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/937488 (https://phabricator.wikimedia.org/T336497) (owner: 10Muehlenhoff) [07:00:06] Amir1, Urbanecm, and taavi: I, the Bot under the Fountain, call upon thee, The Deployer, to do UTC morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T0700). [07:00:06] aanzx: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[07:01:38] o/ I can deploy [07:01:49] o/ taavi [07:02:50] (03CR) 10TrainBranchBot: [C: 03+2] "Approved by taavi@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942470 (https://phabricator.wikimedia.org/T342800) (owner: 10Anzx) [07:03:52] (03Merged) 10jenkins-bot: ruwikibooks: Set wgRestrictDisplayTitle to false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942470 (https://phabricator.wikimedia.org/T342800) (owner: 10Anzx) [07:04:20] !log taavi@deploy1002 Started scap: Backport for [[gerrit:942470|ruwikibooks: Set wgRestrictDisplayTitle to false (T342800)]] [07:04:25] T342800: ruwikibooks: Set wgRestrictDisplayTitle to false - https://phabricator.wikimedia.org/T342800 [07:06:07] (03PS1) 10Marostegui: db1130: Update notes [puppet] - 10https://gerrit.wikimedia.org/r/943481 (https://phabricator.wikimedia.org/T343077) [07:11:05] (03CR) 10Marostegui: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/943481 (https://phabricator.wikimedia.org/T343077) (owner: 10Marostegui) [07:13:29] !log taavi@deploy1002 anzx and taavi: Backport for [[gerrit:942470|ruwikibooks: Set wgRestrictDisplayTitle to false (T342800)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option) [07:13:33] T342800: ruwikibooks: Set wgRestrictDisplayTitle to false - https://phabricator.wikimedia.org/T342800 [07:13:36] Testing [07:13:37] aanzx: please test [07:14:23] (03CR) 10Marostegui: [C: 03+2] db1130: Update notes [puppet] - 10https://gerrit.wikimedia.org/r/943481 (https://phabricator.wikimedia.org/T343077) (owner: 10Marostegui) [07:14:32] taavi: working fine [07:14:59] syncing [07:15:01] !log taavi@deploy1002 anzx and taavi: Continuing with sync [07:16:56] RECOVERY - MariaDB Replica SQL: s2 on dbstore1007 is OK: OK slave_sql_state Slave_SQL_Running: Yes 
https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [07:21:23] !log taavi@deploy1002 Finished scap: Backport for [[gerrit:942470|ruwikibooks: Set wgRestrictDisplayTitle to false (T342800)]] (duration: 17m 02s) [07:21:29] T342800: ruwikibooks: Set wgRestrictDisplayTitle to false - https://phabricator.wikimedia.org/T342800 [07:21:30] all done [07:21:46] Thanks taavi [07:22:09] (03PS1) 10Muehlenhoff: Remove LDAP access for rbrounley [puppet] - 10https://gerrit.wikimedia.org/r/943503 [07:25:06] (03CR) 10Muehlenhoff: [C: 03+2] Remove LDAP access for rbrounley [puppet] - 10https://gerrit.wikimedia.org/r/943503 (owner: 10Muehlenhoff) [07:36:28] RECOVERY - MariaDB Replica Lag: s2 on dbstore1007 is OK: OK slave_sql_lag Replication lag: 0.21 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [07:41:29] 10SRE, 10Traffic, 10observability: HAProxy metrics go down on config reload - https://phabricator.wikimedia.org/T343000 (10Vgutierrez) Regarding HAProxy reload process, basically HAProxy spawns a new process and hands over all the file descriptors to the new process (that's been started with the new configur... [07:45:56] (03PS3) 10Ilias Sarantopoulos: ores-extension: enable Lift Wing for most wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942649 (https://phabricator.wikimedia.org/T342115) [07:48:22] (03PS2) 10Amire80: Remove ak from wgImportSources [mediawiki-config] - 10https://gerrit.wikimedia.org/r/941372 (https://phabricator.wikimedia.org/T333765) [07:54:05] 10SRE, 10SRE-swift-storage, 10Performance-Team, 10Traffic, 10Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (10MatthewVernon) Here are slightly nicer figures (more sf, which means the lines are rather more accurate) - the frequency distribu... 
[08:06:27] (03PS8) 10Giuseppe Lavagetto: noc: add static file server [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942674 (https://phabricator.wikimedia.org/T341859) [08:06:29] (03PS8) 10Giuseppe Lavagetto: noc: remove symlinks and also neutralize createTxtFileSymlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942675 (https://phabricator.wikimedia.org/T341859) [08:06:31] (03PS4) 10Giuseppe Lavagetto: update noc README [mediawiki-config] - 10https://gerrit.wikimedia.org/r/943036 [08:10:18] (03PS1) 10Vgutierrez: prometheus::ops: Add liberica demo exporter [puppet] - 10https://gerrit.wikimedia.org/r/943505 (https://phabricator.wikimedia.org/T342618) [08:10:42] (03CR) 10CI reject: [V: 04-1] prometheus::ops: Add liberica demo exporter [puppet] - 10https://gerrit.wikimedia.org/r/943505 (https://phabricator.wikimedia.org/T342618) (owner: 10Vgutierrez) [08:13:01] (03PS2) 10Vgutierrez: prometheus::ops: Add liberica demo exporter [puppet] - 10https://gerrit.wikimedia.org/r/943505 (https://phabricator.wikimedia.org/T342618) [08:13:25] (03CR) 10CI reject: [V: 04-1] prometheus::ops: Add liberica demo exporter [puppet] - 10https://gerrit.wikimedia.org/r/943505 (https://phabricator.wikimedia.org/T342618) (owner: 10Vgutierrez) [08:15:29] (03PS1) 10Vgutierrez: haproxy: Disable KA on stats frontend [puppet] - 10https://gerrit.wikimedia.org/r/943506 (https://phabricator.wikimedia.org/T343000) [08:16:26] (03CR) 10Muehlenhoff: [C: 03+2] Rename Ferm::Hosts type to Wmflib::Firewall::Hosts and move to wmflib [puppet] - 10https://gerrit.wikimedia.org/r/937488 (https://phabricator.wikimedia.org/T336497) (owner: 10Muehlenhoff) [08:19:06] (03PS3) 10Vgutierrez: prometheus::ops: Add liberica demo exporter [puppet] - 10https://gerrit.wikimedia.org/r/943505 (https://phabricator.wikimedia.org/T342618) [08:19:08] (03CR) 10Kaleem Bhatti: "anyone what's meaning of merge conflict" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/937922 
(https://phabricator.wikimedia.org/T268203) (owner: 10Kaleem Bhatti) [08:19:13] (03CR) 10Fabfur: [C: 03+2] Release 0.6.4 [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/942414 (https://phabricator.wikimedia.org/T342154) (owner: 10Fabfur) [08:21:51] (03PS3) 10Muehlenhoff: Move Ferm::Protocol to wmflib (as generic Wmflib::Protocol) [puppet] - 10https://gerrit.wikimedia.org/r/937489 (https://phabricator.wikimedia.org/T336497) [08:25:33] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/937489 (https://phabricator.wikimedia.org/T336497) (owner: 10Muehlenhoff) [08:30:26] (03PS4) 10Slyngshede: Facter: Python version [puppet] - 10https://gerrit.wikimedia.org/r/942641 (https://phabricator.wikimedia.org/T271196) [08:31:03] (03CR) 10CI reject: [V: 04-1] Facter: Python version [puppet] - 10https://gerrit.wikimedia.org/r/942641 (https://phabricator.wikimedia.org/T271196) (owner: 10Slyngshede) [08:39:07] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance [08:39:20] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance [08:39:23] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance [08:39:35] (03CR) 10Slyngshede: Facter: Python version (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/942641 (https://phabricator.wikimedia.org/T271196) (owner: 10Slyngshede) [08:39:36] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance [08:39:42] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2109 (T342617)', diff saved to https://phabricator.wikimedia.org/P49802 and previous config saved to /var/cache/conftool/dbconfig/20230731-083941-ladsgroup.json [08:39:46] T342617: Make old columns of 
externallinks nullable - https://phabricator.wikimedia.org/T342617 [08:42:28] !log fabfur@cumin1001 START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm [08:46:09] (03CR) 10Vgutierrez: "exporters working as expected:" [puppet] - 10https://gerrit.wikimedia.org/r/943505 (https://phabricator.wikimedia.org/T342618) (owner: 10Vgutierrez) [08:52:42] !log fabfur@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1016.eqiad.wmnet with OS bookworm [08:53:31] !log fabfur@cumin1001 START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm [08:57:22] 10SRE, 10SRE-Access-Requests: Requesting access to releasers-wikibase for adee_wmde - https://phabricator.wikimedia.org/T342969 (10adee_wmde) a:05adee_wmde→03None [08:58:33] (03CR) 10Jelto: [V: 03+1 C: 03+2] gitlab: auto_sign_in_with openid_connect on all instances [puppet] - 10https://gerrit.wikimedia.org/r/942646 (https://phabricator.wikimedia.org/T320390) (owner: 10Jelto) [08:58:48] (03CR) 10Filippo Giunchedi: [C: 03+1] haproxy: Disable KA on stats frontend (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/943506 (https://phabricator.wikimedia.org/T343000) (owner: 10Vgutierrez) [08:59:45] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance [08:59:58] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance [09:00:57] (03CR) 10Filippo Giunchedi: "LGTM, missing job definition" [puppet] - 10https://gerrit.wikimedia.org/r/943505 (https://phabricator.wikimedia.org/T342618) (owner: 10Vgutierrez) [09:09:28] (03CR) 10Hnowlan: [C: 03+1] services: upgrade changeprop instances to Buster [deployment-charts] - 10https://gerrit.wikimedia.org/r/943037 (https://phabricator.wikimedia.org/T341140) (owner: 10Elukey) [09:09:52] (03CR) 10Ladsgroup: [C: 03+1] "tested some pieces locally too. Looks great." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/942671 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto) [09:10:31] !log fabfur@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage [09:10:51] 10SRE-tools, 10DC-Ops, 10Infrastructure-Foundations: sre.hosts.reimage: fails to get uptime in debian installer - https://phabricator.wikimedia.org/T342345 (10Fabfur) Thanks @jbond I confirm the installer now can correctly install the base system without any error on lvs1016! The "minor annoyance" now is th... [09:13:36] !log fabfur@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage [09:13:58] 10SRE, 10CAS-SSO, 10Infrastructure-Foundations, 10collaboration-services, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (10Jelto) OIDC is enabled instance-wide now. >>! In T320390#9052117, @dancy wrote: > In https://gitlab.wikimedia.org/repos/releng/g... [09:14:58] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-wmde-users (no kerberos, with ssh) for karapayneWMDE - https://phabricator.wikimedia.org/T342546 (10BTullis) [09:17:11] (03CR) 10Hnowlan: [C: 03+1] services: shift changeprop's cpu resources from main app to the prometheus (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/943039 (https://phabricator.wikimedia.org/T328683) (owner: 10Elukey) [09:17:49] 10SRE, 10CAS-SSO, 10Infrastructure-Foundations, 10collaboration-services, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (10CodeReviewBot) jelto opened https://gitlab.wikimedia.org/repos/releng/gitlab-settings/-/merge_requests/38 use provider openid_con... 
[09:18:02] jouncebot: nowandnext [09:18:02] No deployments scheduled for the next 0 hour(s) and 41 minute(s) [09:18:02] In 0 hour(s) and 41 minute(s): MediaWiki infrastucture (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T1000) [09:18:32] (03CR) 10Ladsgroup: [C: 03+2] Remove ak from wgImportSources [mediawiki-config] - 10https://gerrit.wikimedia.org/r/941372 (https://phabricator.wikimedia.org/T333765) (owner: 10Amire80) [09:18:57] (03CR) 10TrainBranchBot: [C: 03+2] "Approved by ladsgroup@deploy1002 using scap backport" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/941372 (https://phabricator.wikimedia.org/T333765) (owner: 10Amire80) [09:19:13] (03Merged) 10jenkins-bot: Remove ak from wgImportSources [mediawiki-config] - 10https://gerrit.wikimedia.org/r/941372 (https://phabricator.wikimedia.org/T333765) (owner: 10Amire80) [09:19:31] !log ladsgroup@deploy1002 Started scap: Backport for [[gerrit:941372|Remove ak from wgImportSources (T333765)]] [09:19:35] T333765: Remove Akan support from MediaWiki, ULS, and Wikimedia servers - https://phabricator.wikimedia.org/T333765 [09:20:56] !log ladsgroup@deploy1002 amire80 and ladsgroup: Backport for [[gerrit:941372|Remove ak from wgImportSources (T333765)]] synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option) [09:21:28] !log ladsgroup@deploy1002 amire80 and ladsgroup: Continuing with sync [09:21:47] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-wmde-users (no kerberos, with ssh) for karapayneWMDE - https://phabricator.wikimedia.org/T342546 (10BTullis) a:03Stevemunene Here is the LDAP `uidNumber` for kpayne ` btullis@seaborgium:~$ ldapsearch -x uid=kpayne uidNumber # extended LDIF # # LDA... 
[09:22:51] 10SRE, 10SRE-Access-Requests, 10Data-Platform-SRE: Requesting access to analytics-wmde-users (no kerberos, with ssh) for karapayneWMDE - https://phabricator.wikimedia.org/T342546 (10BTullis) p:05Triage→03Medium [09:23:53] (03CR) 10Ladsgroup: [C: 03+2] ores-extension: enable Lift Wing for most wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942649 (https://phabricator.wikimedia.org/T342115) (owner: 10Ilias Sarantopoulos) [09:24:35] (03Merged) 10jenkins-bot: ores-extension: enable Lift Wing for most wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942649 (https://phabricator.wikimedia.org/T342115) (owner: 10Ilias Sarantopoulos) [09:27:01] 10SRE-OnFire, 10Discovery-Search (Current work), 10Sustainability: WDQS: Document procedure for switching between Kubernetes and Yarn Streaming Updater - https://phabricator.wikimedia.org/T337801 (10dcausse) Added few notes at: https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater#Runn... [09:27:04] (03CR) 10Elukey: services: shift changeprop's cpu resources from main app to the prometheus (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/943039 (https://phabricator.wikimedia.org/T328683) (owner: 10Elukey) [09:27:42] !log ladsgroup@deploy1002 Finished scap: Backport for [[gerrit:941372|Remove ak from wgImportSources (T333765)]] (duration: 08m 10s) [09:27:46] T333765: Remove Akan support from MediaWiki, ULS, and Wikimedia servers - https://phabricator.wikimedia.org/T333765 [09:28:07] !log fabfur@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001" [09:28:32] !log ladsgroup@deploy1002 Started scap: Backport for [[gerrit:942649|ores-extension: enable Lift Wing for most wikis (T342115)]] [09:28:36] T342115: Deployment of Lift Wing usage to all wikis that use ores extension - https://phabricator.wikimedia.org/T342115 [09:28:43] * elukey drum rolls [09:29:02] 
!log fabfur@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001" [09:29:07] !log fabfur@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1016.eqiad.wmnet with OS bookworm [09:29:56] !log ladsgroup@deploy1002 ladsgroup and isaranto: Backport for [[gerrit:942649|ores-extension: enable Lift Wing for most wikis (T342115)]] synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option) [09:30:16] isaranto, elukey live in mwdebug [09:32:34] !log Unblock stuck global rename by running `extensions/CentralAuth/maintenance/fixStuckGlobalRename.php` (T343099) [09:32:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:32:38] T343099: Unblock stuck global rename of أبو آسر - https://phabricator.wikimedia.org/T343099 [09:32:52] 10SRE, 10SRE-Access-Requests: Requesting access to releasers-wikibase for darthmon_wmde - https://phabricator.wikimedia.org/T342968 (10WMDE-leszek) Once this is done, I boldly suggest to mark @darthmon_wmde as approval person for that group: https://gerrit.wikimedia.org/r/c/operations/puppet/+/943511 [09:33:08] Amir1: anything that we need to check/do? 
[09:33:30] elukey: some small stuff, running a job locally in mwdebug [09:33:35] not much [09:35:09] ahhh okok I was wondering if you were waiting for us [09:35:22] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: hw troubleshooting: CPU machine check failure for parse1002.eqiad.wmnet - https://phabricator.wikimedia.org/T339340 (10Clement_Goubert) a:05akosiaris→03Clement_Goubert [09:36:22] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: hw troubleshooting: CPU machine check failure for parse1002.eqiad.wmnet - https://phabricator.wikimedia.org/T339340 (10Clement_Goubert) I'm going to put it back in the pool to see if that fixed it, thanks @Jclark-ctr [09:36:24] !log btullis@cumin1001 START - Cookbook sre.wikireplicas.add-wiki [09:36:45] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops: hw troubleshooting: CPU machine check failure for parse1002.eqiad.wmnet - https://phabricator.wikimedia.org/T339340 (10Clement_Goubert) [09:37:03] !log fabfur@cumin1001 START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm [09:41:44] !log cgoubert@cumin1001 conftool action : set/pooled=no; selector: dc=eqiad,name=parse1002.eqiad.wmnet [09:42:12] (03CR) 10Btullis: [C: 03+2] Use the new DataHub images built with GitLab-CI [deployment-charts] - 10https://gerrit.wikimedia.org/r/942687 (https://phabricator.wikimedia.org/T341194) (owner: 10Btullis) [09:42:13] isaranto: my test in fawiki works fine, shall I move forward? 
[09:42:44] just a sec, I am also testing in dewiki [09:43:06] cool, ping me once done [09:43:10] (03Merged) 10jenkins-bot: Use the new DataHub images built with GitLab-CI [deployment-charts] - 10https://gerrit.wikimedia.org/r/942687 (https://phabricator.wikimedia.org/T341194) (owner: 10Btullis) [09:43:13] Ah, deployments in progress [09:43:18] I'll wait with parse1002 [09:43:31] thanks, we will be done soon [09:43:36] Should have checked beforehand, it shouldn't break anything [09:43:45] I'll pause just in case :) [09:43:51] (03CR) 10Muehlenhoff: [C: 03+2] Move Ferm::Protocol to wmflib (as generic Wmflib::Protocol) [puppet] - 10https://gerrit.wikimedia.org/r/937489 (https://phabricator.wikimedia.org/T336497) (owner: 10Muehlenhoff) [09:45:39] Amir1: let's go! [09:45:46] awesome [09:45:49] !log ladsgroup@deploy1002 ladsgroup and isaranto: Continuing with sync [09:46:35] (03PS2) 10Muehlenhoff: Update MOU dates [puppet] - 10https://gerrit.wikimedia.org/r/942824 [09:46:50] (03PS3) 10Muehlenhoff: Update MOU dates [puppet] - 10https://gerrit.wikimedia.org/r/942824 [09:49:42] claime: if you get in the middle of a ORES-related deployment you'll get a curse, and you'll become an ML team member for life [09:49:50] Amir1 knows something about it :D [09:49:52] elukey: nooooooooooooooooooo [09:50:04] lol [09:50:08] (03CR) 10Muehlenhoff: [C: 03+2] Update MOU dates [puppet] - 10https://gerrit.wikimedia.org/r/942824 (owner: 10Muehlenhoff) [09:50:23] !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main [09:51:32] !log ladsgroup@deploy1002 Finished scap: Backport for [[gerrit:942649|ores-extension: enable Lift Wing for most wikis (T342115)]] (duration: 23m 00s) [09:51:36] T342115: Deployment of Lift Wing usage to all wikis that use ores extension - https://phabricator.wikimedia.org/T342115 [09:51:50] wow is it done? [09:51:54] yup [09:51:55] Can't really believe it [09:52:15] can you see the change in LW side? 
[09:52:18] I see moar traffic to Lift Wing [09:52:34] btw, I think we should stop running CP for ores precache [09:52:35] (03CR) 10Hnowlan: [C: 03+1] "sgtm for staging, but need to be careful about rolling further." [deployment-charts] - 10https://gerrit.wikimedia.org/r/939292 (https://phabricator.wikimedia.org/T339865) (owner: 10Jgiannelos) [09:52:51] https://grafana.wikimedia.org/d/vAN_bQemz/ores-advanced-metrics?orgId=1&refresh=1m&viewPanel=70 [09:53:18] elukey: only enwiki and wikidatawiki left, tiny little wikis [09:53:20] Amir1: yeah but that thing generates the revision-score stream as well, we are trying to deprecate it but several teams/clients use it (like WME etc..) [09:53:58] oh joy [09:54:32] !log fabfur@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage [09:54:35] we are planning to have one stream for each "model", so more "specialized" revision-score-etc.. [09:55:17] makes sense [09:56:10] but on the bright side, we already have a happy bot running on liftwing with new Research models (so they also discarded revscoring goodfaith/damaging) [09:56:58] !log fabfur@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage [09:57:07] see ~4x traffic from mediawiki on liftwing, so far all HTTP 200s [09:57:26] nice, the rate? 
[09:58:11] 400/500 requests/minute afaics [09:58:17] (I am checking from logstash) [09:59:01] https://logstash.wikimedia.org/goto/b18e6593753f893b4865fb908cdd08cb [09:59:14] (03PS1) 10Btullis: Bump all datahub chart versions [deployment-charts] - 10https://gerrit.wikimedia.org/r/943515 (https://phabricator.wikimedia.org/T341194) [09:59:36] isaranto: congrats :) [09:59:44] this is a huge milestone [10:00:06] Deploy window MediaWiki infrastucture (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T1000) [10:00:06] nice [10:01:31] elukey: Congrats to the team and Amir1: for all the help <3. Will be monitoring, hope it continues like this 🤞 [10:02:06] (03CR) 10Btullis: [C: 03+1] "Thanks Filippo." [puppet] - 10https://gerrit.wikimedia.org/r/942426 (https://phabricator.wikimedia.org/T108027) (owner: 10Filippo Giunchedi) [10:02:29] checked on the ores dashboard (logstash) and I see only mw traffic for enwiki and wikidata [10:02:32] !log btullis@cumin1001 Added views for new wiki: btmwiktionary T342670 [10:02:32] !log btullis@cumin1001 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) [10:02:36] T342670: Prepare and check storage layer for btmwiktionary - https://phabricator.wikimedia.org/T342670 [10:02:56] (03CR) 10Btullis: [C: 03+2] Bump all datahub chart versions [deployment-charts] - 10https://gerrit.wikimedia.org/r/943515 (https://phabricator.wikimedia.org/T341194) (owner: 10Btullis) [10:03:40] (03Merged) 10jenkins-bot: Bump all datahub chart versions [deployment-charts] - 10https://gerrit.wikimedia.org/r/943515 (https://phabricator.wikimedia.org/T341194) (owner: 10Btullis) [10:11:27] 10SRE-tools, 10DC-Ops, 10Infrastructure-Foundations: sre.hosts.reimage: fails to get uptime in debian installer - https://phabricator.wikimedia.org/T342345 (10Volans) From the test made with @Fabfur I'm convinced we're hitting the timeout of spicerack checking the uptime, it runs: ` transports.Command("cat /... 
[10:11:57] !log fabfur@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1016.eqiad.wmnet with OS bookworm [10:16:22] (03PS1) 10Jforrester: Function impl marked dirty when labels change [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942735 (https://phabricator.wikimedia.org/T342687) [10:20:02] (03CR) 10CI reject: [V: 04-1] Function impl marked dirty when labels change [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942735 (https://phabricator.wikimedia.org/T342687) (owner: 10Jforrester) [10:20:41] !log installing bind9 security updates (client-side tools/libs) [10:20:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:27:37] <_joe_> Amir1: want to test noc on mwdebug2002? [10:27:46] sure [10:27:50] <_joe_> jouncebot: nowandnext [10:27:50] For the next 0 hour(s) and 32 minute(s): MediaWiki infrastucture (UTC mid-day) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T1000) [10:27:50] In 2 hour(s) and 32 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T1300) [10:28:02] <_joe_> we can make do :) [10:28:09] <_joe_> ok, I'll start working on it [10:28:26] <_joe_> !log disabling puppet on mwdebug2002, testing noc.wikimedia.org [10:28:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:28:56] <_joe_> Amir1: how do you apply the patches btw? [10:29:15] sudo -u mwdeploy vim /srv/... 
[10:29:40] sorry, the user is wrong [10:29:42] <_joe_> heh it won't cut it here [10:29:51] <_joe_> www-data, yes [10:30:10] yeah, I actually wanted a way to download a patch for really long time [10:30:19] (03PS1) 10Filippo Giunchedi: clinic-duty: add de-cix support [software] - 10https://gerrit.wikimedia.org/r/943518 [10:30:21] (03PS1) 10Filippo Giunchedi: clinic-duty: add digitalrealty [software] - 10https://gerrit.wikimedia.org/r/943519 [10:30:21] <_joe_> git fetch https://gerrit.wikimedia.org/r/operations/mediawiki-config refs/changes/71/942671/3 && git format-patch -1 --stdout FETCH_HEAD should work [10:30:23] (03PS1) 10Filippo Giunchedi: clinic-duty: add GTT [software] - 10https://gerrit.wikimedia.org/r/943520 [10:30:25] (03PS1) 10Filippo Giunchedi: clinic-duty: update NTT [software] - 10https://gerrit.wikimedia.org/r/943521 [10:30:28] (03CR) 10CI reject: [V: 04-1] clinic-duty: add digitalrealty [software] - 10https://gerrit.wikimedia.org/r/943519 (owner: 10Filippo Giunchedi) [10:30:30] (03CR) 10CI reject: [V: 04-1] clinic-duty: add GTT [software] - 10https://gerrit.wikimedia.org/r/943520 (owner: 10Filippo Giunchedi) [10:30:32] (03CR) 10CI reject: [V: 04-1] clinic-duty: update NTT [software] - 10https://gerrit.wikimedia.org/r/943521 (owner: 10Filippo Giunchedi) [10:30:36] _joe_: /srv/mediawiki is not a git repo [10:31:10] <_joe_> Amir1: yeah, on your computer, then upload it [10:31:22] <_joe_> or in a clone in your homedir [10:31:38] a bit more terrifying way to do it (I did it when I had to test on live traffic) is to cherry-pick it on deploy1002 (treat it like security patch) and then scap pull in mwdebug [10:32:09] (03CR) 10Filippo Giunchedi: [C: 03+2] clinic-duty: add de-cix support [software] - 10https://gerrit.wikimedia.org/r/943518 (owner: 10Filippo Giunchedi) [10:34:04] <_joe_> Amir1: I think it's the only way in this case, sigh [10:34:20] <_joe_> well no, let me manually install patch(1) on mwdebug [10:34:37] !log cgoubert@cumin1001 START - Cookbook 
sre.hosts.remove-downtime for parse1002.eqiad.wmnet [10:34:37] !log cgoubert@cumin1001 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet [10:34:59] (03PS2) 10Filippo Giunchedi: clinic-duty: add digitalrealty [software] - 10https://gerrit.wikimedia.org/r/943519 [10:35:01] (03PS2) 10Filippo Giunchedi: clinic-duty: add GTT [software] - 10https://gerrit.wikimedia.org/r/943520 [10:35:03] (03PS2) 10Filippo Giunchedi: clinic-duty: update NTT [software] - 10https://gerrit.wikimedia.org/r/943521 [10:35:12] (03CR) 10jenkins-bot: clinic-duty: add GTT [software] - 10https://gerrit.wikimedia.org/r/943520 (owner: 10Filippo Giunchedi) [10:35:14] (03CR) 10jenkins-bot: clinic-duty: update NTT [software] - 10https://gerrit.wikimedia.org/r/943521 (owner: 10Filippo Giunchedi) [10:36:04] !log Repooling parse1002 following CPU replacement - T339340 [10:36:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:36:07] T339340: hw troubleshooting: CPU machine check failure for parse1002.eqiad.wmnet - https://phabricator.wikimedia.org/T339340 [10:36:24] !log cgoubert@cumin1001 conftool action : set/pooled=yes; selector: dc=eqiad,name=parse1002.eqiad.wmnet [10:37:04] (03CR) 10Filippo Giunchedi: [C: 03+2] clinic-duty: add digitalrealty [software] - 10https://gerrit.wikimedia.org/r/943519 (owner: 10Filippo Giunchedi) [10:37:51] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops: hw troubleshooting: CPU machine check failure for parse1002.eqiad.wmnet - https://phabricator.wikimedia.org/T339340 (10Clement_Goubert) 05Open→03Resolved Resolving for now, we will reopen if issues reappear. 
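Note on the `git fetch` command given at 10:30:21: Gerrit publishes each patchset under a ref of the form `refs/changes/<last two digits of the change number>/<change number>/<patchset>`, which is why change 942671, patchset 3 appears as `refs/changes/71/942671/3`. The sketch below only illustrates that mapping and rebuilds the command string from the log; the function names are hypothetical, are not part of any Wikimedia tooling, and nothing is executed.

```python
def gerrit_change_ref(change: int, patchset: int) -> str:
    """Build the Gerrit ref for one patchset of a change."""
    if change < 1 or patchset < 1:
        raise ValueError("change and patchset must be positive integers")
    # Gerrit shards change refs by the last two digits of the change number,
    # zero-padded (change 5 lives under "05").
    shard = f"{change % 100:02d}"
    return f"refs/changes/{shard}/{change}/{patchset}"


def fetch_patch_command(repo_url: str, change: int, patchset: int) -> str:
    """Return the fetch-and-format command quoted in the log (hypothetical helper)."""
    ref = gerrit_change_ref(change, patchset)
    return f"git fetch {repo_url} {ref} && git format-patch -1 --stdout FETCH_HEAD"


if __name__ == "__main__":
    # Repository URL, change number, and patchset are taken from the log above.
    print(fetch_patch_command(
        "https://gerrit.wikimedia.org/r/operations/mediawiki-config", 942671, 3))
```

As suggested in the channel, the command would be run in a clone (on a workstation or in a home directory), with the output saved to a file and applied on the test host, for example with patch(1).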
[10:38:58] <_joe_> Amir1: uhm noc is not supported in X-wikimedia-debug [10:39:13] 😭 [10:39:21] T_T [10:40:03] <_joe_> Amir1: so I can only make curl requests [10:40:15] <_joe_> or better, I can do an ssh tunnel [10:40:41] I can do curl, save and then maybe open them in browser :D [10:44:04] (03CR) 10Clément Goubert: [C: 03+1] noc: stop serving static files from symlinks [puppet] - 10https://gerrit.wikimedia.org/r/942607 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto) [10:45:12] !log btullis@cumin1001 START - Cookbook sre.wikireplicas.add-wiki [10:45:48] !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main [10:45:59] <_joe_> Amir1: as soon as I figure out how to make curl calls work, I'll let you know :D [10:46:14] haha, no worries [10:47:37] (03CR) 10Jbond: "LGTM excluding the rubocop issue which you should be able to fix with `bundle exec rubocop -A`" [puppet] - 10https://gerrit.wikimedia.org/r/942641 (https://phabricator.wikimedia.org/T271196) (owner: 10Slyngshede) [10:47:58] (03CR) 10Jforrester: "recheck" [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942735 (https://phabricator.wikimedia.org/T342687) (owner: 10Jforrester) [10:50:27] 10SRE-tools, 10DC-Ops, 10Infrastructure-Foundations: sre.hosts.reimage: fails to get uptime in debian installer - https://phabricator.wikimedia.org/T342345 (10jbond) nice hopefully a simple fix >>! In T342345#9054715, @Volans wrote: > After d-i ssh is normally quick, so not sure if it's worth investigating.... 
[10:51:33] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance [10:51:46] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance [10:55:10] !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main [10:55:43] (03CR) 10Giuseppe Lavagetto: [V: 03+1] "Tested with curl on mwdebug2002, the output doesn't change." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942671 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto) [10:55:54] (03PS1) 10Jforrester: Create standalone JS module with language selector for page header [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942738 (https://phabricator.wikimedia.org/T341500) [10:56:31] (03CR) 10Muehlenhoff: Credit logo artist. (032 comments) [software/bitu] - 10https://gerrit.wikimedia.org/r/934265 (https://phabricator.wikimedia.org/T338828) (owner: 10Slyngshede) [10:57:28] (03PS2) 10Vgutierrez: haproxy: Disable KA on stats frontend [puppet] - 10https://gerrit.wikimedia.org/r/943506 (https://phabricator.wikimedia.org/T343000) [10:57:47] (03CR) 10Vgutierrez: "Thanks for your review Filippo!" 
[puppet] - 10https://gerrit.wikimedia.org/r/943506 (https://phabricator.wikimedia.org/T343000) (owner: 10Vgutierrez) [10:59:15] !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main [11:00:27] (03PS3) 10Vgutierrez: haproxy: Disable KA on stats frontend [puppet] - 10https://gerrit.wikimedia.org/r/943506 (https://phabricator.wikimedia.org/T343000) [11:01:01] (03CR) 10Vgutierrez: [C: 03+2] haproxy: Disable KA on stats frontend [puppet] - 10https://gerrit.wikimedia.org/r/943506 (https://phabricator.wikimedia.org/T343000) (owner: 10Vgutierrez) [11:05:21] (03PS1) 10Btullis: Remove unnecessary datahub upgrade repository overrides [deployment-charts] - 10https://gerrit.wikimedia.org/r/943524 (https://phabricator.wikimedia.org/T341194) [11:08:03] (03CR) 10Btullis: [C: 03+2] Remove unnecessary datahub upgrade repository overrides [deployment-charts] - 10https://gerrit.wikimedia.org/r/943524 (https://phabricator.wikimedia.org/T341194) (owner: 10Btullis) [11:09:08] (03Merged) 10jenkins-bot: Remove unnecessary datahub upgrade repository overrides [deployment-charts] - 10https://gerrit.wikimedia.org/r/943524 (https://phabricator.wikimedia.org/T341194) (owner: 10Btullis) [11:09:20] 10SRE, 10Wikimedia-Site-requests, 10serviceops, 10Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (10Fuzzy) >>! In T275319#9054277, @stjn wrote: > Wikisource editors can absolutely split pages into smaller ones, since those longer... [11:10:08] 10SRE, 10Traffic, 10observability, 10Patch-For-Review: HAProxy metrics go down on config reload - https://phabricator.wikimedia.org/T343000 (10Vgutierrez) 05Open→03Stalled After disabling KA, `haproxy_frontend_connections_total{proxy="stats"}` starts to increase as expected: {F37156880} Let's wait 24h... 
[11:10:31] (03CR) 10Muehlenhoff: "Final nit" [software/bitu] - 10https://gerrit.wikimedia.org/r/934519 (https://phabricator.wikimedia.org/T340637) (owner: 10Slyngshede) [11:11:01] !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main [11:11:04] !log btullis@cumin1001 Added views for new wiki: wikifunctionswiki T289316 [11:11:04] !log btullis@cumin1001 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) [11:11:08] T289316: Prepare and check storage layer for Wikifunctions.org (new public content wiki) - https://phabricator.wikimedia.org/T289316 [11:16:35] (03PS4) 10Vgutierrez: prometheus::ops: Add liberica demo exporter [puppet] - 10https://gerrit.wikimedia.org/r/943505 (https://phabricator.wikimedia.org/T342618) [11:18:34] 10ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (10phaultfinder) [11:21:13] <_joe_> jouncebot: nowandnext [11:21:13] No deployments scheduled for the next 1 hour(s) and 38 minute(s) [11:21:13] In 1 hour(s) and 38 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T1300) [11:23:25] !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/datahub: sync on main [11:23:34] (HelmReleaseBadStatus) firing: Helm release datahub/main on k8s-staging@eqiad in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=datahub - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [11:25:07] (03PS6) 10Giuseppe Lavagetto: noc: don't use on-disk files but etcd directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942672 (https://phabricator.wikimedia.org/T341859) [11:25:09] (03PS7) 10Giuseppe Lavagetto: noc: centralize file list management [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942673 
(https://phabricator.wikimedia.org/T341859) [11:25:11] (03PS9) 10Giuseppe Lavagetto: noc: add static file server [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942674 (https://phabricator.wikimedia.org/T341859) [11:25:13] (03PS9) 10Giuseppe Lavagetto: noc: remove symlinks and also neutralize createTxtFileSymlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942675 (https://phabricator.wikimedia.org/T341859) [11:25:15] (03PS5) 10Giuseppe Lavagetto: update noc README [mediawiki-config] - 10https://gerrit.wikimedia.org/r/943036 [11:25:40] (03CR) 10Giuseppe Lavagetto: [V: 03+1] "Verified manually on mwdebug2002." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942672 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto) [11:25:59] (03PS1) 10PipelineBot: mobileapps: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/942791 [11:28:34] (HelmReleaseBadStatus) resolved: Helm release datahub/main on k8s-staging@eqiad in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=datahub - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [11:31:31] (03CR) 10MSantos: [C: 03+2] mobileapps: Add core parsoid HTML support config [deployment-charts] - 10https://gerrit.wikimedia.org/r/939292 (https://phabricator.wikimedia.org/T339865) (owner: 10Jgiannelos) [11:32:26] (03Merged) 10jenkins-bot: mobileapps: Add core parsoid HTML support config [deployment-charts] - 10https://gerrit.wikimedia.org/r/939292 (https://phabricator.wikimedia.org/T339865) (owner: 10Jgiannelos) [11:36:59] (03PS1) 10Btullis: Add a missing environment variable to datahub/mae-consumer [deployment-charts] - 10https://gerrit.wikimedia.org/r/943549 (https://phabricator.wikimedia.org/T329514) [11:37:12] 10SRE, 10ops-eqiad, 10DBA: db1130 crash memory errors - 
https://phabricator.wikimedia.org/T343076 (10Jclark-ctr) @Marostegui i do have a few decom host i can pull from is this server down? I would like to do it today [11:37:37] (03CR) 10CI reject: [V: 04-1] Add a missing environment variable to datahub/mae-consumer [deployment-charts] - 10https://gerrit.wikimedia.org/r/943549 (https://phabricator.wikimedia.org/T329514) (owner: 10Btullis) [11:37:40] 10SRE, 10ops-eqiad, 10DBA: db1130 crash memory errors - https://phabricator.wikimedia.org/T343076 (10Marostegui) Let me depool it for you, give me 5 minutes [11:39:03] (03CR) 10Giuseppe Lavagetto: [V: 03+1] "I tested this patch and it works as expected. There is a small change in the handling of files in subdirectories that I might want to unif" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/942673 (https://phabricator.wikimedia.org/T341859) (owner: 10Giuseppe Lavagetto) [11:39:28] (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/943505 (https://phabricator.wikimedia.org/T342618) (owner: 10Vgutierrez) [11:39:48] (03PS2) 10Btullis: Add a missing environment variable to datahub/mae-consumer [deployment-charts] - 10https://gerrit.wikimedia.org/r/943549 (https://phabricator.wikimedia.org/T329514) [11:40:17] 10SRE, 10ops-eqiad, 10DBA: db1130 crash memory errors - https://phabricator.wikimedia.org/T343076 (10Marostegui) @Jclark-ctr host down, you can proceed whenever you want during the day. 
Thank you [11:46:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2114 T334650', diff saved to https://phabricator.wikimedia.org/P49803 and previous config saved to /var/cache/conftool/dbconfig/20230731-114645-root.json [11:46:50] T334650: Migrate s6 to MariaDB 10.6 - https://phabricator.wikimedia.org/T334650 [11:47:44] (03PS1) 10Marostegui: db2114: Migrate to MariaDB 10.6 [puppet] - 10https://gerrit.wikimedia.org/r/943550 (https://phabricator.wikimedia.org/T334650) [11:48:12] (03CR) 10Marostegui: [C: 03+2] db2114: Migrate to MariaDB 10.6 [puppet] - 10https://gerrit.wikimedia.org/r/943550 (https://phabricator.wikimedia.org/T334650) (owner: 10Marostegui) [11:51:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 1%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49804 and previous config saved to /var/cache/conftool/dbconfig/20230731-115133-root.json [12:03:33] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2109 (T342617)', diff saved to https://phabricator.wikimedia.org/P49805 and previous config saved to /var/cache/conftool/dbconfig/20230731-120332-ladsgroup.json [12:03:38] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617 [12:04:23] question: is it possible to give an ETA on when the LDAP req for cn=wmf will be fulfilled (namely T342230, which has been waiting for almost 3 weeks w/o any SRE reply)? It would help my team's work if the req could be expedited somehow. Thanks for any info! 
[12:04:24] T342230: Grant Access to wmf for Cyndymediawiksim - https://phabricator.wikimedia.org/T342230 [12:05:49] godog: ^ [12:06:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 3%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49806 and previous config saved to /var/cache/conftool/dbconfig/20230731-120638-root.json [12:08:37] RhinosF1: thank you for the hilight [12:08:42] urbanecm: I'll take a look today [12:08:46] thank you [12:09:40] (03CR) 10David Caro: "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/942778 (https://phabricator.wikimedia.org/T341509) (owner: 10Andrew Bogott) [12:09:44] (03CR) 10David Caro: [C: 03+1] Revert "keystone: hack to reject all new non-alphanumerical project or domain names" [puppet] - 10https://gerrit.wikimedia.org/r/942778 (https://phabricator.wikimedia.org/T341509) (owner: 10Andrew Bogott) [12:10:14] Np [12:11:06] 10SRE, 10Infrastructure-Foundations: Integrate Bookworm 12.1 point update - https://phabricator.wikimedia.org/T343121 (10MoritzMuehlenhoff) [12:11:13] (03PS1) 10Jforrester: SpecialViewObject: Catch errors from trying to create the language [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942739 (https://phabricator.wikimedia.org/T343006) [12:11:32] (03PS1) 10Jforrester: ApiPerformTest: Catch invalid-Object errors thrown from user input [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942740 (https://phabricator.wikimedia.org/T342901) [12:12:10] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM!" 
[puppet] - 10https://gerrit.wikimedia.org/r/943505 (https://phabricator.wikimedia.org/T342618) (owner: 10Vgutierrez) [12:13:28] 10SRE, 10Infrastructure-Foundations: Integrate Bookworm 12.1 point update - https://phabricator.wikimedia.org/T343121 (10MoritzMuehlenhoff) p:05Triage→03Medium [12:13:46] (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] clinic-duty: add GTT [software] - 10https://gerrit.wikimedia.org/r/943520 (owner: 10Filippo Giunchedi) [12:14:22] (03Merged) 10jenkins-bot: clinic-duty: add GTT [software] - 10https://gerrit.wikimedia.org/r/943520 (owner: 10Filippo Giunchedi) [12:14:51] (03CR) 10Filippo Giunchedi: [V: 03+2 C: 03+2] clinic-duty: update NTT [software] - 10https://gerrit.wikimedia.org/r/943521 (owner: 10Filippo Giunchedi) [12:14:56] I'm going to start landing wmf.19 patches for WikiLambda as there are so many and one merge-conflicts. [12:14:58] (Fun.) [12:15:27] (03Merged) 10jenkins-bot: clinic-duty: update NTT [software] - 10https://gerrit.wikimedia.org/r/943521 (owner: 10Filippo Giunchedi) [12:15:36] (03CR) 10Jforrester: [C: 03+2] SpecialViewObject: Don't load if action=edit etc. 
[extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942485 (https://phabricator.wikimedia.org/T342891) (owner: 10Jforrester) [12:15:42] (03CR) 10Jforrester: [C: 03+2] Function impl marked dirty when labels change [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942735 (https://phabricator.wikimedia.org/T342687) (owner: 10Jforrester) [12:15:48] (03CR) 10Jforrester: [C: 03+2] Create standalone JS module with language selector for page header [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942738 (https://phabricator.wikimedia.org/T341500) (owner: 10Jforrester) [12:15:54] (03CR) 10Jforrester: [C: 03+2] SpecialViewObject: Catch errors from trying to create the language [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942739 (https://phabricator.wikimedia.org/T343006) (owner: 10Jforrester) [12:16:00] (03CR) 10Jforrester: [C: 03+2] ApiPerformTest: Catch invalid-Object errors thrown from user input [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942740 (https://phabricator.wikimedia.org/T342901) (owner: 10Jforrester) [12:18:39] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P49807 and previous config saved to /var/cache/conftool/dbconfig/20230731-121839-ladsgroup.json [12:19:42] (03Merged) 10jenkins-bot: SpecialViewObject: Don't load if action=edit etc. 
[extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942485 (https://phabricator.wikimedia.org/T342891) (owner: 10Jforrester) [12:19:44] (03Merged) 10jenkins-bot: Function impl marked dirty when labels change [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942735 (https://phabricator.wikimedia.org/T342687) (owner: 10Jforrester) [12:19:47] (03Merged) 10jenkins-bot: Create standalone JS module with language selector for page header [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942738 (https://phabricator.wikimedia.org/T341500) (owner: 10Jforrester) [12:19:50] (03Merged) 10jenkins-bot: SpecialViewObject: Catch errors from trying to create the language [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942739 (https://phabricator.wikimedia.org/T343006) (owner: 10Jforrester) [12:19:53] (03Merged) 10jenkins-bot: ApiPerformTest: Catch invalid-Object errors thrown from user input [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942740 (https://phabricator.wikimedia.org/T342901) (owner: 10Jforrester) [12:21:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 5%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49808 and previous config saved to /var/cache/conftool/dbconfig/20230731-122142-root.json [12:22:20] (03PS1) 10Jforrester: PageRenderingHandler: Set href always, even when we don't set the label [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942741 (https://phabricator.wikimedia.org/T343041) [12:22:25] (03CR) 10Jforrester: [C: 03+2] PageRenderingHandler: Set href always, even when we don't set the label [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942741 (https://phabricator.wikimedia.org/T343041) (owner: 10Jforrester) [12:22:28] 10SRE, 10LDAP-Access-Requests: Grant Access to wmf for Cyndymediawiksim - 
https://phabricator.wikimedia.org/T342230 (10fgiunchedi) @Aklapper does the above look good to you ? thank you ! [12:23:51] !log installing mariadb-10.5 updates from Bullseye 11.7 point release (libs/tools, unrelated to wmf-mariadb packages) [12:23:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:24:41] (03CR) 10Btullis: [C: 03+2] Upgrade the analytics instance of airflow to version 2.6.3 [puppet] - 10https://gerrit.wikimedia.org/r/939347 (https://phabricator.wikimedia.org/T336286) (owner: 10Btullis) [12:24:53] (03PS1) 10Filippo Giunchedi: admin: add mpossoupe [puppet] - 10https://gerrit.wikimedia.org/r/943551 (https://phabricator.wikimedia.org/T342335) [12:27:19] (03Merged) 10jenkins-bot: PageRenderingHandler: Set href always, even when we don't set the label [extensions/WikiLambda] (wmf/1.41.0-wmf.19) - 10https://gerrit.wikimedia.org/r/942741 (https://phabricator.wikimedia.org/T343041) (owner: 10Jforrester) [12:27:42] 10SRE, 10LDAP-Access-Requests: Grant Access to wmde for Ifrahkhanyaree (Ifrah_WMDE) - https://phabricator.wikimedia.org/T341455 (10fgiunchedi) 05Open→03Resolved @Ifrahkhanyaree I'm optimistically resolving the task, assuming that access is working as expected. Please reach out and reopen if that is not the... [12:27:44] 10SRE, 10SRE-Access-Requests: Requesting access to restricted for dbrant - https://phabricator.wikimedia.org/T343122 (10Dbrant) [12:32:05] !log installing xapian-core bugfix updates on Bullseye [12:32:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:33:45] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P49809 and previous config saved to /var/cache/conftool/dbconfig/20230731-123345-ladsgroup.json [12:36:11] 10SRE, 10SRE-Access-Requests: Requesting access to restricted for dbrant - https://phabricator.wikimedia.org/T343122 (10Seddon) Approved. 
Noting that Dmitry has shell access https://phabricator.wikimedia.org/rOPUPf9d604a70a271f82b491803e4bafc8bfebc4a85a [12:36:14] (03CR) 10Vgutierrez: [C: 03+2] "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/943505 (https://phabricator.wikimedia.org/T342618) (owner: 10Vgutierrez) [12:36:19] 10SRE, 10LDAP-Access-Requests: Grant Access to wmf for Cyndymediawiksim - https://phabricator.wikimedia.org/T342230 (10Aklapper) Yes yes, thanks a lot again! [12:36:48] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49810 and previous config saved to /var/cache/conftool/dbconfig/20230731-123647-root.json [12:42:08] (03PS1) 10Filippo Giunchedi: admin: add cyndywikime [puppet] - 10https://gerrit.wikimedia.org/r/943555 (https://phabricator.wikimedia.org/T342230) [12:42:41] jouncebot: nowandnext [12:42:41] No deployments scheduled for the next 0 hour(s) and 17 minute(s) [12:42:41] In 0 hour(s) and 17 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T1300) [12:43:07] jnuche: I'll be grabbing backport, so do whatever you need to and I'll wait. [12:43:28] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/943551 (https://phabricator.wikimedia.org/T342335) (owner: 10Filippo Giunchedi) [12:44:22] James_F: thanks! I want to update the CI Jenkins [12:44:27] Fun times. [12:44:29] hopefully it should be quick enough [12:45:02] (03CR) 10Filippo Giunchedi: [C: 03+2] admin: add mpossoupe [puppet] - 10https://gerrit.wikimedia.org/r/943551 (https://phabricator.wikimedia.org/T342335) (owner: 10Filippo Giunchedi) [12:45:07] (03CR) 10Filippo Giunchedi: [C: 03+2] admin: add cyndywikime [puppet] - 10https://gerrit.wikimedia.org/r/943555 (https://phabricator.wikimedia.org/T342230) (owner: 10Filippo Giunchedi) [12:45:59] (03CR) 10Filippo Giunchedi: "oops! 
was a little too trigger-happy there with +2" [puppet] - https://gerrit.wikimedia.org/r/943555 (https://phabricator.wikimedia.org/T342230) (owner: Filippo Giunchedi)
[12:46:02] (PS1) Btullis: Update airflow sqlalchemy URI for all airflow instances [puppet] - https://gerrit.wikimedia.org/r/943557 (https://phabricator.wikimedia.org/T336286)
[12:46:05] SRE, SRE-swift-storage, Performance-Team, Traffic, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (ori) >>! In T211661#9054485, @MatthewVernon wrote: > The other thing I can't quite leave alone is - why are we being asked for so...
[12:46:11] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/943555 (https://phabricator.wikimedia.org/T342230) (owner: Filippo Giunchedi)
[12:46:38] (CR) Filippo Giunchedi: [C: +2] admin: add cyndywikime [puppet] - https://gerrit.wikimedia.org/r/943555 (https://phabricator.wikimedia.org/T342230) (owner: Filippo Giunchedi)
[12:47:36] (PS2) Filippo Giunchedi: admin: add cyndywikime [puppet] - https://gerrit.wikimedia.org/r/943555 (https://phabricator.wikimedia.org/T342230)
[12:47:41] SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to Turnilo for Mpossoupe - https://phabricator.wikimedia.org/T342335 (fgiunchedi) Open→Resolved `admin` module fixed in puppet, resolving.
[12:47:50] (PS8) Aklapper: sdwiki: set 'wgTranslateNumerals' to false [mediawiki-config] - https://gerrit.wikimedia.org/r/937922 (https://phabricator.wikimedia.org/T268203) (owner: Kaleem Bhatti)
[12:48:16] any deployers feeling bored?
i wonder if you could check the status of the last two maint script runs from https://phabricator.wikimedia.org/T315510 for me
[12:48:51] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2109 (T342617)', diff saved to https://phabricator.wikimedia.org/P49811 and previous config saved to /var/cache/conftool/dbconfig/20230731-124851-ladsgroup.json
[12:48:53] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
[12:48:55] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[12:49:06] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
[12:49:12] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49812 and previous config saved to /var/cache/conftool/dbconfig/20230731-124912-ladsgroup.json
[12:49:41] SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to wmf for Cyndymediawiksim - https://phabricator.wikimedia.org/T342230 (fgiunchedi) @Cyndymediawiksim you are now part of the `wmf` ldap group, please verify and confirm access works as expected!
[12:50:04] SRE, Infrastructure-Foundations, Epic: Tracking task for Bullseye migrations in production - https://phabricator.wikimedia.org/T291916 (Gehel)
[12:50:39] SRE, Wikimedia-Site-requests, serviceops, Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (Alexey_Skripnik) >>! In T275319#9054275, @stjn wrote: > For the record, I don't think that the need to be able to build even long...
[12:51:52] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49813 and previous config saved to /var/cache/conftool/dbconfig/20230731-125152-root.json
[12:52:53] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depool db1196 T342284', diff saved to https://phabricator.wikimedia.org/P49814 and previous config saved to /var/cache/conftool/dbconfig/20230731-125252-ladsgroup.json
[12:52:56] T342284: db1218 crashed - https://phabricator.wikimedia.org/T342284
[12:53:34] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1196.eqiad.wmnet with reason: Maint
[12:53:47] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1196.eqiad.wmnet with reason: Maint
[12:54:10] (CR) Kamila Součková: changeprop: allow to tune monitoring container's resources (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/943038 (https://phabricator.wikimedia.org/T328683) (owner: Elukey)
[12:54:55] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[12:55:08] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
[12:55:14] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1166 (T342617)', diff saved to https://phabricator.wikimedia.org/P49815 and previous config saved to /var/cache/conftool/dbconfig/20230731-125513-ladsgroup.json
[12:55:20] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[12:55:22] (CR) Btullis: [V: +1] "PCC SUCCESS (CORE_DIFF 6 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42725/console" [puppet] - https://gerrit.wikimedia.org/r/943557 (https://phabricator.wikimedia.org/T336286) (owner: Btullis)
[12:55:47] SRE,
serviceops-radar, Patch-For-Review: Drop the `deploy-service` right, move three included users to `deployment` (or drop access)? - https://phabricator.wikimedia.org/T340165 (fgiunchedi) I'm boldly removing sre-access-requests since I don't think there's anything actionable for clinic duty
[12:56:29] (CR) Btullis: [V: +1 C: +2] Update airflow sqlalchemy URI for all airflow instances [puppet] - https://gerrit.wikimedia.org/r/943557 (https://phabricator.wikimedia.org/T336286) (owner: Btullis)
[12:57:22] !log installing 6.1.38 kernels on Bookworm hosts
[12:57:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:57:39] SRE, Wikimedia-Site-requests, serviceops, Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (stjn) ‘Readers expect us to dump everything on one page’ is just your opinion, and so is ‘from usability standpoint, it’s better...
[12:57:47] (PS1) Esanders: Explicitly set DiscussionToolsAutoTopicSubEditor to discussiontoolsapi [mediawiki-config] - https://gerrit.wikimedia.org/r/943558
[12:58:30] (CR) CI reject: [V: -1] Explicitly set DiscussionToolsAutoTopicSubEditor to discussiontoolsapi [mediawiki-config] - https://gerrit.wikimedia.org/r/943558 (owner: Esanders)
[13:00:06] James_F: (Dis)respected human, time to deploy UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T1300). Please do the needful.
[13:00:06] James_F: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:10] !log CI Jenkins upgraded to 2.401.3: https://phabricator.wikimedia.org/T342572
[13:00:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:25] Perfect timing. jnuche: OK to proceed?
[13:00:36] James_F: indeed, please go ahead
[13:01:05] (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/941441 (https://phabricator.wikimedia.org/T342666) (owner: FNegri)
[13:01:06] * Lucas_WMDE is confused about this window
[13:01:29] (CR) Elukey: changeprop: allow to tune monitoring container's resources (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/943038 (https://phabricator.wikimedia.org/T328683) (owner: Elukey)
[13:02:53] (CR) Jforrester: [C: +2] DumpInterwiki: Add f: as interwiki for wikifunctions [extensions/WikimediaMaintenance] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/942482 (https://phabricator.wikimedia.org/T325908) (owner: Jforrester)
[13:02:59] (CR) Jforrester: [C: +2] DumpInterwiki: Set Forward=yes to wikifunctions: [extensions/WikimediaMaintenance] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/942483 (https://phabricator.wikimedia.org/T342909) (owner: Jforrester)
[13:03:07] SRE, SRE-Access-Requests: Requesting access to Wiki Replicas end-to-end tiers for dr0ptp4kt - https://phabricator.wikimedia.org/T343039 (fgiunchedi) Hi @dr0ptp4kt, could you help me understand what kind of access you are after (i.e. what hosts/service/commands) ? As far as I can see `analytics-privatedat...
[13:03:43] (CR) Btullis: [C: +2] Upgrade the search instance of airflow to version 2.6.3 [puppet] - https://gerrit.wikimedia.org/r/933088 (https://phabricator.wikimedia.org/T336286) (owner: Btullis)
[13:04:24] Lucas_WMDE: since you're here, can you check the progress on https://phabricator.wikimedia.org/T315510 for me?
thanks :)
[13:04:58] ah, I forgot I still had a script running ^^ sure
[13:05:19] (Merged) jenkins-bot: DumpInterwiki: Add f: as interwiki for wikifunctions [extensions/WikimediaMaintenance] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/942482 (https://phabricator.wikimedia.org/T325908) (owner: Jforrester)
[13:05:28] SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to wmf for Cyndymediawiksim - https://phabricator.wikimedia.org/T342230 (Cyndymediawiksim) >>! In T342230#9055196, @fgiunchedi wrote: > @Cyndymediawiksim you are now part of the `wmf` ldap group, please verify and confirm access works as expec...
[13:05:29] wait, do I? my tmux session is gone at least
[13:05:32] * Lucas_WMDE reads the comments
[13:06:01] (Merged) jenkins-bot: DumpInterwiki: Set Forward=yes to wikifunctions: [extensions/WikimediaMaintenance] (wmf/1.41.0-wmf.19) - https://gerrit.wikimedia.org/r/942483 (https://phabricator.wikimedia.org/T342909) (owner: Jforrester)
[13:06:32] MatmaRex: looks like urbanecm’s s7 instance is somewhere in viwiki territory
[13:06:57] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 50%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49816 and previous config saved to /var/cache/conftool/dbconfig/20230731-130657-root.json
[13:07:12] and according to /home/urbanecm/matmarex-T315510.log, “Processed 154100 (updated 332) of 8069193 rows” so far
[13:07:13] T315510: Start maintenance script to backfill talk page comment database - https://phabricator.wikimedia.org/T315510
[13:07:22] though I’m not 100% sure that’s correct
[13:07:30] (Not accepting/receiving prefixes from anycast BGP peer) firing: Alert for device cr2-codfw.wikimedia.org - Not accepting/receiving prefixes from anycast BGP peer - https://alerts.wikimedia.org/?q=alertname%3DNot+accepting%2Freceiving+prefixes+from+anycast+BGP+peer
[13:07:31] no, the file was last changed 2023-06-14, that’s definitely not up to date
[13:07:52] SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to wmf for Cyndymediawiksim - https://phabricator.wikimedia.org/T342230 (fgiunchedi) Open→Resolved a:fgiunchedi For sure, thank you @Cyndymediawiksim
[13:08:12] SRE, Wikimedia-Site-requests, serviceops, Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (Alexey_Skripnik) >>! In T275319#9055247, @stjn wrote: > From usability standpoint, it’s better to have a page that doesn’t weigh...
[13:08:18] ops-eqiad, Traffic: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (Fabfur)
[13:08:19] hmm
[13:08:24] ops-eqiad, Traffic: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (Fabfur) We finally managed to reinstall lvs1016, thanks for all the support!
[13:08:31] (that line also started with “arwiki”, I just wasn’t paying attention)
[13:08:37] (CR) Kamila Součková: [C: +1] changeprop: allow to tune monitoring container's resources (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/943038 (https://phabricator.wikimedia.org/T328683) (owner: Elukey)
[13:08:37] don’t think I can check the status in more detail then
[13:08:55] ops-eqiad, Traffic: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (Vgutierrez) Open→Resolved
[13:08:57] but it seems to be doing something, ca.
89% CPU consistently
[13:09:46] (CR) FNegri: [C: +2] irc: Handle custom logging formatters [software/pywmflib] - https://gerrit.wikimedia.org/r/940968 (https://phabricator.wikimedia.org/T341793) (owner: FNegri)
[13:09:49] !log jforrester@deploy1002 Synchronized php-1.41.0-wmf.19/extensions/WikiLambda/: (no justification provided) (duration: 07m 16s)
[13:10:34] (KubernetesAPILatency) firing: High Kubernetes API latency (POST pods) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[13:11:01] (PS2) Elukey: changeprop: allow to tune monitoring container's resources [deployment-charts] - https://gerrit.wikimedia.org/r/943038 (https://phabricator.wikimedia.org/T328683)
[13:11:15] (PS4) Elukey: services: shift changeprop's cpu resources from main app to the prometheus [deployment-charts] - https://gerrit.wikimedia.org/r/943039 (https://phabricator.wikimedia.org/T328683)
[13:11:42] (PS5) Elukey: services: shift changeprop's cpu resources from main app to the prometheus [deployment-charts] - https://gerrit.wikimedia.org/r/943039 (https://phabricator.wikimedia.org/T328683)
[13:11:55] (CR) Elukey: changeprop: allow to tune monitoring container's resources (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/943038 (https://phabricator.wikimedia.org/T328683) (owner: Elukey)
[13:12:30] (Not accepting/receiving prefixes from anycast BGP peer) firing: (2) Alert for device cr1-codfw.wikimedia.org - Not accepting/receiving prefixes from anycast BGP peer - https://alerts.wikimedia.org/?q=alertname%3DNot+accepting%2Freceiving+prefixes+from+anycast+BGP+peer
[13:12:50] (CR) TrainBranchBot: [C: +2] "Approved by jforrester@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/941995 (https://phabricator.wikimedia.org/T325910) (owner: EpicPupper)
[13:13:02] !log WikiLambda backport verified for T342891 T342687 T341500 T343006 T342901 and T343041
[13:13:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:17] T342901: MediaWiki\Extension\WikiLambda\ZErrorException: Not wellformed - https://phabricator.wikimedia.org/T342901
[13:13:18] T343006: Internal error when viewing /view/en/Special:WhatLinksHere - https://phabricator.wikimedia.org/T343006
[13:13:18] T343041: Links from wikitext to Objects should go to the /view/[lang]/ path not /wiki/ - https://phabricator.wikimedia.org/T343041
[13:13:18] T342687: Object editor: Altering the label (Z2K3) in edit mode activates "done" in the dialog but doesn't activate the page-level edit mode, but others (Z2K4/Z2K5) do - https://phabricator.wikimedia.org/T342687
[13:13:18] T342891: Under `view/lang/zid` routes, Edit pages are rendered with "viewmode=true" even if `?action=edit` is present - https://phabricator.wikimedia.org/T342891
[13:13:19] T341500: Implement UX to select natural language one reads and edits Wikifunctions - https://phabricator.wikimedia.org/T341500
[13:13:30] (Merged) jenkins-bot: Remove F: namespace alias [mediawiki-config] - https://gerrit.wikimedia.org/r/941995 (https://phabricator.wikimedia.org/T325910) (owner: EpicPupper)
[13:14:29] !log jforrester@deploy1002 Started scap: Backport for [[gerrit:941995|Remove F: namespace alias (T325910)]]
[13:14:33] T325910: Remove "F" namespace aliases - https://phabricator.wikimedia.org/T325910
[13:15:34] (KubernetesAPILatency) resolved: High Kubernetes API latency (POST pods) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[13:15:44] (CR) Btullis: [C: +2] Upgrade the research instance of airflow to version 2.6.3 [puppet] - https://gerrit.wikimedia.org/r/933089 (https://phabricator.wikimedia.org/T336286)
(owner: Btullis)
[13:15:57] (PS4) Btullis: Upgrade the research instance of airflow to version 2.6.3 [puppet] - https://gerrit.wikimedia.org/r/933089 (https://phabricator.wikimedia.org/T336286)
[13:16:45] (Merged) jenkins-bot: irc: Handle custom logging formatters [software/pywmflib] - https://gerrit.wikimedia.org/r/940968 (https://phabricator.wikimedia.org/T341793) (owner: FNegri)
[13:19:19] (Abandoned) Volans: constants: add knams as supported PoP datacenter [software/pywmflib] - https://gerrit.wikimedia.org/r/933431 (https://phabricator.wikimedia.org/T340465) (owner: Volans)
[13:20:37] (Abandoned) Volans: sre.discovery: add support for knams as PoP DC [cookbooks] - https://gerrit.wikimedia.org/r/933433 (https://phabricator.wikimedia.org/T340465) (owner: Volans)
[13:22:03] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49817 and previous config saved to /var/cache/conftool/dbconfig/20230731-132201-root.json
[13:23:01] (PS2) Volans: Install hosts: fallback to drmrs [cookbooks] - https://gerrit.wikimedia.org/r/933434 (https://phabricator.wikimedia.org/T340465)
[13:23:26] SRE, Infrastructure-Foundations: Integrate Bullseye 11.7 point update - https://phabricator.wikimedia.org/T335575 (MoritzMuehlenhoff)
[13:24:38] !log imported jenkins 2.401.3 to thirdparty/ci for bullseye-wikimedia T342572
[13:24:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:16] (PS4) FNegri: tcpircbot: add another port for cloud IRC logging [puppet] - https://gerrit.wikimedia.org/r/941441 (https://phabricator.wikimedia.org/T342666)
[13:26:31] (CR) Andrew Bogott: [C: +2] Revert "keystone: hack to reject all new non-alphanumerical project or domain names" [puppet] - https://gerrit.wikimedia.org/r/942778 (https://phabricator.wikimedia.org/T341509) (owner: Andrew Bogott)
[13:27:19] (CR) Volans:
[C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/941441 (https://phabricator.wikimedia.org/T342666) (owner: FNegri)
[13:28:46] (CR) Giuseppe Lavagetto: [V: +1] "Tested on mwdebug2002" [mediawiki-config] - https://gerrit.wikimedia.org/r/942674 (https://phabricator.wikimedia.org/T341859) (owner: Giuseppe Lavagetto)
[13:29:11] !log jforrester@deploy1002 jforrester and epicpupper: Backport for [[gerrit:941995|Remove F: namespace alias (T325910)]] synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
[13:29:15] T325910: Remove "F" namespace aliases - https://phabricator.wikimedia.org/T325910
[13:29:16] !log jforrester@deploy1002 jforrester and epicpupper: Continuing with sync
[13:29:20] (CR) Giuseppe Lavagetto: [V: +1 C: +1] "Tested on mwdebug, will need to fix the apache config change" [mediawiki-config] - https://gerrit.wikimedia.org/r/942675 (https://phabricator.wikimedia.org/T341859) (owner: Giuseppe Lavagetto)
[13:29:38] Oy: "13:29:11 Finished sync-testservers (duration: 05m 12s)"
[13:30:05] 5 minutes just to sync to five servers (and another 5 to build the docker image). Meh.
[13:32:01] (PS1) Volans: CHANGELOG: add changelogs for release v1.2.3 [software/pywmflib] - https://gerrit.wikimedia.org/r/943562
[13:32:19] (CR) Volans: [C: +2] CHANGELOG: add changelogs for release v1.2.3 [software/pywmflib] - https://gerrit.wikimedia.org/r/943562 (owner: Volans)
[13:32:58] (PS3) Giuseppe Lavagetto: noc: stop serving static files from symlinks [puppet] - https://gerrit.wikimedia.org/r/942607 (https://phabricator.wikimedia.org/T341859)
[13:33:37] (CR) FNegri: [C: +2] tcpircbot: add another port for cloud IRC logging [puppet] - https://gerrit.wikimedia.org/r/941441 (https://phabricator.wikimedia.org/T342666) (owner: FNegri)
[13:37:08] !log marostegui@cumin1001 dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49818 and previous config saved to /var/cache/conftool/dbconfig/20230731-133707-root.json
[13:38:54] !log jforrester@deploy1002 Finished scap: Backport for [[gerrit:941995|Remove F: namespace alias (T325910)]] (duration: 24m 24s)
[13:38:57] T325910: Remove "F" namespace aliases - https://phabricator.wikimedia.org/T325910
[13:39:15] (PS1) Jelto: gitlab: remove cas support [puppet] - https://gerrit.wikimedia.org/r/943563 (https://phabricator.wikimedia.org/T320390)
[13:40:17] (PS1) Jforrester: Update interwiki map to add f: as an interwiki prefix for Wikifunctions [mediawiki-config] - https://gerrit.wikimedia.org/r/943564 (https://phabricator.wikimedia.org/T325908)
[13:40:20] (PS1) Volans: Upstream release v1.2.3 [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/943565
[13:41:03] (CR) Jforrester: [C: +2] Update interwiki map to add f: as an interwiki prefix for Wikifunctions [mediawiki-config] - https://gerrit.wikimedia.org/r/943564 (https://phabricator.wikimedia.org/T325908) (owner: Jforrester)
[13:41:47] (CR) Btullis: [C: +2] Add a missing environment variable to datahub/mae-consumer
[deployment-charts] - https://gerrit.wikimedia.org/r/943549 (https://phabricator.wikimedia.org/T329514) (owner: Btullis)
[13:41:59] (Merged) jenkins-bot: Update interwiki map to add f: as an interwiki prefix for Wikifunctions [mediawiki-config] - https://gerrit.wikimedia.org/r/943564 (https://phabricator.wikimedia.org/T325908) (owner: Jforrester)
[13:42:46] !log imported fifo-log-demux package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/software/fifo-log-demux/+/942414) T342154
[13:42:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:42:50] (Merged) jenkins-bot: Add a missing environment variable to datahub/mae-consumer [deployment-charts] - https://gerrit.wikimedia.org/r/943549 (https://phabricator.wikimedia.org/T329514) (owner: Btullis)
[13:42:51] T342154: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154
[13:43:48] ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (phaultfinder)
[13:44:55] !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[13:44:57] SRE, Traffic, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur)
[13:45:06] !log install gtk+3.0 bugfix updates from Bullseye 11.7 point release
[13:45:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:07] (PS2) Jelto: gitlab: remove cas support [puppet] - https://gerrit.wikimedia.org/r/943563 (https://phabricator.wikimedia.org/T320390)
[13:47:49] (PS2) Jforrester: Wikifunctions: Disable the Collection extension for now, broken [mediawiki-config] - https://gerrit.wikimedia.org/r/942684 (https://phabricator.wikimedia.org/T342931)
[13:47:51] (PS3) Jforrester: tests: Add some PHP testing on logos/config.yaml [mediawiki-config] - https://gerrit.wikimedia.org/r/942463
[13:47:53] (PS2) Jforrester: Wikifunctions: Add WF as
alias for NS_PROJECT (and WT for its talk) [mediawiki-config] - https://gerrit.wikimedia.org/r/942685 (https://phabricator.wikimedia.org/T342964)
[13:47:57] (CR) Jforrester: [C: +2] tests: Add some PHP testing on logos/config.yaml [mediawiki-config] - https://gerrit.wikimedia.org/r/942463 (owner: Jforrester)
[13:48:55] (CR) Fabfur: [C: +2] Release 2.0.0-4 [debs/file-read-backwards] (debian) - https://gerrit.wikimedia.org/r/942491 (https://phabricator.wikimedia.org/T342154) (owner: Fabfur)
[13:49:01] !log jforrester@deploy1002 Synchronized wmf-config/interwiki.php: T325908 (duration: 06m 25s)
[13:49:05] T325908: Add f: as an interwiki prefix for Wikifunctions - https://phabricator.wikimedia.org/T325908
[13:49:08] (CR) Jelto: [V: +1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42727/console" [puppet] - https://gerrit.wikimedia.org/r/943563 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[13:49:18] (CR) TrainBranchBot: [C: +2] "Approved by jforrester@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/942684 (https://phabricator.wikimedia.org/T342931) (owner: Jforrester)
[13:49:59] (Merged) jenkins-bot: Wikifunctions: Disable the Collection extension for now, broken [mediawiki-config] - https://gerrit.wikimedia.org/r/942684 (https://phabricator.wikimedia.org/T342931) (owner: Jforrester)
[13:50:13] (Merged) jenkins-bot: tests: Add some PHP testing on logos/config.yaml [mediawiki-config] - https://gerrit.wikimedia.org/r/942463 (owner: Jforrester)
[13:50:16] !log jforrester@deploy1002 Started scap: Backport for [[gerrit:942684|Wikifunctions: Disable the Collection extension for now, broken (T342931)]]
[13:50:20] T342931: "Download as PDF" results in HTTP 502 error - https://phabricator.wikimedia.org/T342931
[13:50:34] (KubernetesAPILatency) firing: High Kubernetes API latency (POST pods) on k8s@eqiad -
https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[13:51:43] !log jforrester@deploy1002 jforrester: Backport for [[gerrit:942684|Wikifunctions: Disable the Collection extension for now, broken (T342931)]] synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
[13:51:54] !log jforrester@deploy1002 jforrester: Continuing with sync
[13:52:05] !log reprepro -C main include bullseye-wikimedia gdnsd_3.99.0~alpha2-1_amd64.changes
[13:52:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:14] (PS1) Btullis: Deploy new datahub images [deployment-charts] - https://gerrit.wikimedia.org/r/943567 (https://phabricator.wikimedia.org/T341194)
[13:54:10] (CR) Volans: [C: +2] Upstream release v1.2.3 [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/943565 (owner: Volans)
[13:54:15] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/942683 (https://phabricator.wikimedia.org/T342154) (owner: Ssingh)
[13:54:48] jouncebot: next
[13:54:48] In 1 hour(s) and 35 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T1530)
[13:54:58] OK, I'll go over but it won't clash.
[13:55:03] (PS1) Muehlenhoff: Add library hint for gtk+3.0 [puppet] - https://gerrit.wikimedia.org/r/943568
[13:55:15] (CR) Ssingh: [C: +2] aptrepo: add component/dnsdist to bookworm [puppet] - https://gerrit.wikimedia.org/r/942683 (https://phabricator.wikimedia.org/T342154) (owner: Ssingh)
[13:55:30] (PS4) Btullis: Update the platform_eng airflow instance to version 2.6.3 [puppet] - https://gerrit.wikimedia.org/r/933090 (https://phabricator.wikimedia.org/T336286)
[13:55:34] (KubernetesAPILatency) resolved: High Kubernetes API latency (POST pods) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[13:56:08] (CR) Btullis: [C: +2] Deploy new datahub images [deployment-charts] - https://gerrit.wikimedia.org/r/943567 (https://phabricator.wikimedia.org/T341194) (owner: Btullis)
[13:56:38] (PS3) Jelto: gitlab: remove cas support [puppet] - https://gerrit.wikimedia.org/r/943563 (https://phabricator.wikimedia.org/T320390)
[13:56:51] (Merged) jenkins-bot: Deploy new datahub images [deployment-charts] - https://gerrit.wikimedia.org/r/943567 (https://phabricator.wikimedia.org/T341194) (owner: Btullis)
[13:57:13] !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/datahub: sync on main
[13:57:19] (CR) Btullis: [C: +2] Update the platform_eng airflow instance to version 2.6.3 [puppet] - https://gerrit.wikimedia.org/r/933090 (https://phabricator.wikimedia.org/T336286) (owner: Btullis)
[13:57:34] (HelmReleaseBadStatus) firing: Helm release datahub/main on k8s-staging@eqiad in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=datahub - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[13:57:37] SRE, Infrastructure-Foundations: Integrate Bookworm 12.1 point update - https://phabricator.wikimedia.org/T343121 (MoritzMuehlenhoff)
[13:57:55] !log jforrester@deploy1002 Finished scap: Backport for [[gerrit:942684|Wikifunctions: Disable the Collection extension for now, broken (T342931)]] (duration: 07m 38s)
[13:57:56] (CR) Muehlenhoff: [C: +2] Add library hint for gtk+3.0 [puppet] - https://gerrit.wikimedia.org/r/943568 (owner: Muehlenhoff)
[13:57:59] T342931: "Download as PDF" results in HTTP 502 error - https://phabricator.wikimedia.org/T342931
[13:58:02] (Merged) jenkins-bot: Upstream release v1.2.3 [software/pywmflib] (debian) - https://gerrit.wikimedia.org/r/943565 (owner: Volans)
[13:58:14] (CR) TrainBranchBot: [C: +2] "Approved by jforrester@deploy1002 using scap backport" [mediawiki-config] - https://gerrit.wikimedia.org/r/942685 (https://phabricator.wikimedia.org/T342964) (owner: Jforrester)
[13:58:42] (CR) Jelto: [V: +1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42728/console" [puppet] - https://gerrit.wikimedia.org/r/943563 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[13:58:55] (Merged) jenkins-bot: Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) [mediawiki-config] - https://gerrit.wikimedia.org/r/942685 (https://phabricator.wikimedia.org/T342964) (owner: Jforrester)
[13:59:38] !log reprepro -C component/dnsdist include bookworm-wikimedia dnsdist_1.8.0-1+wmf12u1_amd64.changes: T342154
[13:59:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:42] T342154: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154
[14:01:48] SRE, ops-eqiad, DBA: db1130 crash memory errors - https://phabricator.wikimedia.org/T343076 (Jclark-ctr) Open→Resolved @Marostegui replaced Dimm B2
[14:01:59] !log jforrester@deploy1002 Started scap: Backport
for [[gerrit:942685|Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) (T342964)]]
[14:02:08] T342964: Add WF: as an alias of Wikifunctions namespace - https://phabricator.wikimedia.org/T342964
[14:02:34] (HelmReleaseBadStatus) resolved: Helm release datahub/main on k8s-staging@eqiad in state pending-upgrade - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s-staging&var-namespace=datahub - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus
[14:03:22] !log jforrester@deploy1002 jforrester: Backport for [[gerrit:942685|Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) (T342964)]] synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
[14:03:34] !log jforrester@deploy1002 jforrester: Continuing with sync
[14:03:46] SRE, ops-eqiad, DBA: db1130 crash memory errors - https://phabricator.wikimedia.org/T343076 (Marostegui) Thank you John! I will get the host back in production after a few days until making sure it is stable.
[14:04:55] (PS1) Marostegui: Revert "db1130: Update notes" [puppet] - https://gerrit.wikimedia.org/r/942742
[14:05:29] !log btullis@deploy1002 helmfile [staging] START helmfile.d/services/datahub: apply on main
[14:05:57] (CR) Marostegui: [C: +2] Revert "db1130: Update notes" [puppet] - https://gerrit.wikimedia.org/r/942742 (owner: Marostegui)
[14:06:32] (JobUnavailable) firing: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:07:14] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T342617)', diff saved to https://phabricator.wikimedia.org/P49819 and previous config saved to /var/cache/conftool/dbconfig/20230731-140713-ladsgroup.json
[14:07:18] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[14:08:39] SRE, Traffic, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ssingh)
[14:09:27] !log jforrester@deploy1002 Finished scap: Backport for [[gerrit:942685|Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) (T342964)]] (duration: 07m 27s)
[14:09:30] T342964: Add WF: as an alias of Wikifunctions namespace - https://phabricator.wikimedia.org/T342964
[14:10:03] !log btullis@deploy1002 helmfile [staging] DONE helmfile.d/services/datahub: sync on main
[14:11:21] (CR) Cory Massaro: "How do we feel about landing and deploying this?"
[deployment-charts] - https://gerrit.wikimedia.org/r/942017 (owner: Cory Massaro)
[14:11:32] (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:12:03] SRE, Infrastructure-Foundations: Integrate Bullseye 11.7 point update - https://phabricator.wikimedia.org/T335575 (MoritzMuehlenhoff)
[14:12:13] (Abandoned) Jforrester: Let wikifunctions.org use the Graph system [mediawiki-config] - https://gerrit.wikimedia.org/r/740795 (owner: Jforrester)
[14:13:40] ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (Jclark-ctr) @ayounsi What day would work best for you to assist trouble shoot
[14:16:32] (JobUnavailable) resolved: (2) Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:16:49] (PS3) Btullis: Upgrade the analytics_product airflow instance to version 2.6.3 [puppet] - https://gerrit.wikimedia.org/r/933091 (https://phabricator.wikimedia.org/T336286)
[14:19:02] PROBLEM - Check systemd state on gitlab2002 is CRITICAL: CRITICAL - degraded: The following units failed: sync-gitlab-group-with-ldap.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:21:03] (PS4) Btullis: Upgrade the analytics_product airflow instance to version 2.6.3 [puppet] - https://gerrit.wikimedia.org/r/933091 (https://phabricator.wikimedia.org/T336286)
[14:21:31] (CR) Btullis: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/933091 (https://phabricator.wikimedia.org/T336286) (owner: Btullis)
[14:21:51] !log btullis@deploy1002 helmfile [codfw] START
helmfile.d/services/datahub: apply on main
[14:22:20] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P49820 and previous config saved to /var/cache/conftool/dbconfig/20230731-142220-ladsgroup.json
[14:25:46] !log btullis@deploy1002 helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
[14:26:13] !log uploaded python3-wmflib_1.2.3 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
[14:26:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:31] (PS1) Filippo Giunchedi: admin: add dbrant to 'restricted' [puppet] - https://gerrit.wikimedia.org/r/943575 (https://phabricator.wikimedia.org/T343122)
[14:30:46] RECOVERY - Check systemd state on gitlab2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:31:24] !log btullis@deploy1002 helmfile [eqiad] START helmfile.d/services/datahub: apply on main
[14:32:45] !log imported file-read-backwards package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/file-read-backwards/+/942491) T342154
[14:32:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:50] T342154: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154
[14:34:00] PROBLEM - Check systemd state on db2141 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_prometheus-mysqld-exporter@s6.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:34:31] (CR) Vgutierrez: "looking good, could you add a test on modules/varnish/files/test/text/08-mobile-hostnames-rewrite.vtc?"
[puppet] - https://gerrit.wikimedia.org/r/942383 (https://phabricator.wikimedia.org/T342846) (owner: Majavah)
[14:34:39] (CR) Hnowlan: services: shift changeprop's cpu resources from main app to the prometheus (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/943039 (https://phabricator.wikimedia.org/T328683) (owner: Elukey)
[14:35:01] SRE, Traffic, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur)
[14:35:54] SRE, Traffic, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur) Just a reminder: for `file-read-backwards` package always build with the `-sa` option. ex. ` GIT_PBUILDER_AUTOCONF=no WIKIMEDIA=yes ARCH=amd64 GBP_PBUILDER_DIST=bookworm DIST=bookw...
[14:37:26] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P49821 and previous config saved to /var/cache/conftool/dbconfig/20230731-143725-ladsgroup.json
[14:37:47] (PS2) Majavah: varnish: rewrite m.wikifunctions.org correctly [puppet] - https://gerrit.wikimedia.org/r/942383 (https://phabricator.wikimedia.org/T342846)
[14:37:49] (CR) Fabfur: [C: +2] Release 0.4 [software/prometheus-rdkafka-exporter] - https://gerrit.wikimedia.org/r/942613 (https://phabricator.wikimedia.org/T342154) (owner: Fabfur)
[14:41:01] (CR) Majavah: [V: +1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/42730/console" [puppet] - https://gerrit.wikimedia.org/r/942383 (https://phabricator.wikimedia.org/T342846) (owner: Majavah)
[14:42:48] (CR) Kaleem Bhatti: "David please add review" [mediawiki-config] - https://gerrit.wikimedia.org/r/937922 (https://phabricator.wikimedia.org/T268203) (owner: Kaleem Bhatti)
[14:45:03] (CR) Clément Goubert: "This change is ready for review."
[deployment-charts] - https://gerrit.wikimedia.org/r/943560 (https://phabricator.wikimedia.org/T342748) (owner: Clément Goubert)
[14:45:34] !log imported prometheus-rdkafka-exporter package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/software/prometheus-rdkafka-exporter/+/942613) T342154
[14:45:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:38] T342154: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154
[14:45:53] (PS4) Clément Goubert: mediawiki: set requests based on php.workers [deployment-charts] - https://gerrit.wikimedia.org/r/943560 (https://phabricator.wikimedia.org/T342748)
[14:46:10] (CR) Majavah: [V: +1] varnish: rewrite m.wikifunctions.org correctly (1 comment) [puppet] - https://gerrit.wikimedia.org/r/942383 (https://phabricator.wikimedia.org/T342846) (owner: Majavah)
[14:47:23] SRE, Traffic, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur)
[14:47:31] !log finished rolling out gdnsd 3.99.0~alpha2 upgrade
[14:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:48:06] (PS1) Kamila Součková: benthos: temporarily disable readiness probe [deployment-charts] - https://gerrit.wikimedia.org/r/943578 (https://phabricator.wikimedia.org/T324200)
[14:50:02] (CR) Clément Goubert: [C: +1] benthos: temporarily disable readiness probe [deployment-charts] - https://gerrit.wikimedia.org/r/943578 (https://phabricator.wikimedia.org/T324200) (owner: Kamila Součková)
[14:50:24] SRE-swift-storage, collaboration-services: Investigate object storage for Gitlab - https://phabricator.wikimedia.org/T336234 (eoghan) Hey @MatthewVernon, we're picking up on some of this work again and we'd like to test migrating our object storage to thanos, with a view to running it in production short...
[14:51:47] (CR) Jelto: [V: +1] "I'm not if we should remove all cas3 config now. We could also start by removing it from the omniauth_providers and don't touch the remain" [puppet] - https://gerrit.wikimedia.org/r/943563 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[14:52:20] SRE-swift-storage, collaboration-services: Investigate object storage for Gitlab - https://phabricator.wikimedia.org/T336234 (eoghan) a:eoghan
[14:52:32] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1166 (T342617)', diff saved to https://phabricator.wikimedia.org/P49823 and previous config saved to /var/cache/conftool/dbconfig/20230731-145232-ladsgroup.json
[14:52:34] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[14:52:36] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[14:52:47] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
[14:52:50] (CR) Kamila Součková: [C: +2] benthos: temporarily disable readiness probe [deployment-charts] - https://gerrit.wikimedia.org/r/943578 (https://phabricator.wikimedia.org/T324200) (owner: Kamila Součková)
[14:52:53] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1175 (T342617)', diff saved to https://phabricator.wikimedia.org/P49824 and previous config saved to /var/cache/conftool/dbconfig/20230731-145252-ladsgroup.json
[14:52:55] (CR) Jelto: [V: +1] gitlab: remove cas support (1 comment) [puppet] - https://gerrit.wikimedia.org/r/943563 (https://phabricator.wikimedia.org/T320390) (owner: Jelto)
[14:53:45] (Merged) jenkins-bot: benthos: temporarily disable readiness probe [deployment-charts] - https://gerrit.wikimedia.org/r/943578 (https://phabricator.wikimedia.org/T324200) (owner: Kamila Součková)
[14:54:39] (PS1) Elukey:
aptrepo: add new key for ROCm repositories [puppet] - https://gerrit.wikimedia.org/r/943579
[14:57:37] (PS1) FNegri: tcpircbot: use same nickname for both bots [puppet] - https://gerrit.wikimedia.org/r/943580 (https://phabricator.wikimedia.org/T342666)
[14:58:00] (CR) CI reject: [V: -1] tcpircbot: use same nickname for both bots [puppet] - https://gerrit.wikimedia.org/r/943580 (https://phabricator.wikimedia.org/T342666) (owner: FNegri)
[15:00:55] (CR) Btullis: [C: +2] Upgrade the analytics_product airflow instance to version 2.6.3 [puppet] - https://gerrit.wikimedia.org/r/933091 (https://phabricator.wikimedia.org/T336286) (owner: Btullis)
[15:01:20] (PS2) FNegri: tcpircbot: use same nickname for both bots [puppet] - https://gerrit.wikimedia.org/r/943580 (https://phabricator.wikimedia.org/T342666)
[15:01:45] (CR) CI reject: [V: -1] tcpircbot: use same nickname for both bots [puppet] - https://gerrit.wikimedia.org/r/943580 (https://phabricator.wikimedia.org/T342666) (owner: FNegri)
[15:04:50] (PS3) FNegri: tcpircbot: use same nickname for both bots [puppet] - https://gerrit.wikimedia.org/r/943580 (https://phabricator.wikimedia.org/T342666)
[15:12:07] SRE, CAS-SSO, Infrastructure-Foundations, collaboration-services, and 4 others: migrate gitlab away from the CAS protocol - https://phabricator.wikimedia.org/T320390 (CodeReviewBot) dancy merged https://gitlab.wikimedia.org/repos/releng/gitlab-settings/-/merge_requests/38 use provider openid_con...
[15:12:28] (CR) Jforrester: [C: +1] varnish: rewrite m.wikifunctions.org correctly [puppet] - https://gerrit.wikimedia.org/r/942383 (https://phabricator.wikimedia.org/T342846) (owner: Majavah)
[15:12:40] (CR) Elukey: [C: +1] hieradata: complete cadvisor rollout on k8s [puppet] - https://gerrit.wikimedia.org/r/942426 (https://phabricator.wikimedia.org/T108027) (owner: Filippo Giunchedi)
[15:15:50] (CR) Volans: [C: +1] "LGMT I don't think it would hurt to use the same and we can change it later if needed." [puppet] - https://gerrit.wikimedia.org/r/943580 (https://phabricator.wikimedia.org/T342666) (owner: FNegri)
[15:19:12] (PS1) Ahmon Dancy: gitlab: Use gitlab-settings v1.2.0 [puppet] - https://gerrit.wikimedia.org/r/943583 (https://phabricator.wikimedia.org/T320390)
[15:19:42] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49825 and previous config saved to /var/cache/conftool/dbconfig/20230731-151942-ladsgroup.json
[15:19:50] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[15:20:01] !log deploying python3-wmflib fleet wide
[15:20:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:31] (CR) FNegri: [C: +2] tcpircbot: use same nickname for both bots [puppet] - https://gerrit.wikimedia.org/r/943580 (https://phabricator.wikimedia.org/T342666) (owner: FNegri)
[15:30:05] jan_drewniak: Time to snap out of that daydream and deploy Wikimedia Portals Update. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T1530).
[15:34:48] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P49826 and previous config saved to /var/cache/conftool/dbconfig/20230731-153448-ladsgroup.json
[15:39:36] (PS1) FNegri: Revert "tcpircbot: use same nickname for both bots" [puppet] - https://gerrit.wikimedia.org/r/943591
[15:41:24] (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/943591 (owner: FNegri)
[15:42:15] (CR) CI reject: [V: -1] Revert "tcpircbot: use same nickname for both bots" [puppet] - https://gerrit.wikimedia.org/r/943591 (owner: FNegri)
[15:42:50] (PS2) FNegri: Revert "tcpircbot: use same nickname for both bots" [puppet] - https://gerrit.wikimedia.org/r/943591
[15:46:26] (CR) FNegri: [C: +2] Revert "tcpircbot: use same nickname for both bots" [puppet] - https://gerrit.wikimedia.org/r/943591 (owner: FNegri)
[15:48:16] PROBLEM - Check systemd state on db2114 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_prometheus-mysqld-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:49:54] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P49827 and previous config saved to /var/cache/conftool/dbconfig/20230731-154954-ladsgroup.json
[15:51:42] (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/943585 (https://phabricator.wikimedia.org/T128546)
[15:52:56] (CR) Jdrewniak: [C: +2] Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/943585 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[15:53:35] ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (phaultfinder)
[15:53:48] (Merged) jenkins-bot: Bumping portals to master [mediawiki-config] -
https://gerrit.wikimedia.org/r/943585 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[15:57:00] (CR) Vgutierrez: [C: +1] "thanks! (tested locally as well, all good)" [puppet] - https://gerrit.wikimedia.org/r/942383 (https://phabricator.wikimedia.org/T342846) (owner: Majavah)
[15:58:57] (PS3) AOkoth: vrts: add test VM to site [puppet] - https://gerrit.wikimedia.org/r/939349 (https://phabricator.wikimedia.org/T340027)
[15:58:59] (PS1) AOkoth: vrts: add /var/log/clamav/{clamav,freshclam}.log to rsyslog [puppet] - https://gerrit.wikimedia.org/r/943607
[16:01:40] !log jdrewniak@deploy1002 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:943585| Bumping portals to master (T128546)]] (duration: 06m 44s)
[16:01:44] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[16:02:21] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T342617)', diff saved to https://phabricator.wikimedia.org/P49828 and previous config saved to /var/cache/conftool/dbconfig/20230731-160220-ladsgroup.json
[16:02:24] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[16:03:05] (CR) EoghanGaffney: [C: +2] vrts: add /var/log/clamav/{clamav,freshclam}.log to rsyslog [puppet] - https://gerrit.wikimedia.org/r/943607 (owner: AOkoth)
[16:03:34] (KubernetesAPILatency) firing: High Kubernetes API latency (POST pods) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[16:03:34] (CR) EoghanGaffney: [C: +1] vrts: add /var/log/clamav/{clamav,freshclam}.log to rsyslog [puppet] - https://gerrit.wikimedia.org/r/943607 (owner: AOkoth)
[16:05:01] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance
db2127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49829 and previous config saved to /var/cache/conftool/dbconfig/20230731-160500-ladsgroup.json
[16:05:03] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
[16:05:16] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
[16:07:48] !log btullis@deploy1002 helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
[16:08:05] !log jdrewniak@deploy1002 Synchronized portals: Wikimedia Portals Update: [[gerrit:943585| Bumping portals to master (T128546)]] (duration: 06m 24s)
[16:08:09] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[16:08:34] (KubernetesAPILatency) resolved: High Kubernetes API latency (POST pods) on k8s@eqiad - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=eqiad&var-cluster=k8s - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[16:10:08] (CR) Vgutierrez: [C: +2] varnish: rewrite m.wikifunctions.org correctly [puppet] - https://gerrit.wikimedia.org/r/942383 (https://phabricator.wikimedia.org/T342846) (owner: Majavah)
[16:14:48] (PS1) Hnowlan: rest-gateway: add citoid and wikifeeds egress [deployment-charts] - https://gerrit.wikimedia.org/r/943609 (https://phabricator.wikimedia.org/T339119)
[16:17:23] (PS1) Btullis: Remove an-airflow1003 and its role from puppet [puppet] - https://gerrit.wikimedia.org/r/943611 (https://phabricator.wikimedia.org/T315633)
[16:17:27] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P49830 and previous config saved to /var/cache/conftool/dbconfig/20230731-161726-ladsgroup.json
[16:19:42] !log btullis@cumin1001 START - Cookbook
sre.hosts.decommission for hosts an-airflow1003.eqiad.wmnet
[16:24:14] (CR) Vgutierrez: [C: +1] fifo-log-demux: Add socat as companion package [puppet] - https://gerrit.wikimedia.org/r/942446 (https://phabricator.wikimedia.org/T342154) (owner: Fabfur)
[16:25:24] !log btullis@cumin1001 START - Cookbook sre.dns.netbox
[16:25:38] (CR) Btullis: [C: +2] Remove an-airflow1003 and its role from puppet [puppet] - https://gerrit.wikimedia.org/r/943611 (https://phabricator.wikimedia.org/T315633) (owner: Btullis)
[16:28:22] !log btullis@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
[16:30:25] !log btullis@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
[16:30:25] !log btullis@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:30:25] !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-airflow1003.eqiad.wmnet
[16:32:33] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P49831 and previous config saved to /var/cache/conftool/dbconfig/20230731-163232-ladsgroup.json
[16:34:53] !log btullis@cumin1001 START - Cookbook sre.wikireplicas.add-wiki
[16:37:19] (PS1) Jdlrobson: Design: Provide wordmarks/taglines for Wikiversity projects [mediawiki-config] - https://gerrit.wikimedia.org/r/943614 (https://phabricator.wikimedia.org/T341256)
[16:38:50] (PS1) Btullis: Remove db1108 as it is being decommissioned [puppet] - https://gerrit.wikimedia.org/r/943615 (https://phabricator.wikimedia.org/T336254)
[16:40:34] (PS1) Hnowlan: wmnet: add
discovery record for aqs [dns] - https://gerrit.wikimedia.org/r/943616 (https://phabricator.wikimedia.org/T342213)
[16:41:32] (CR) CI reject: [V: -1] wmnet: add discovery record for aqs [dns] - https://gerrit.wikimedia.org/r/943616 (https://phabricator.wikimedia.org/T342213) (owner: Hnowlan)
[16:42:55] !log btullis@cumin1001 START - Cookbook sre.hosts.decommission for hosts db1108.eqiad.wmnet
[16:45:19] (CR) Btullis: [C: +1] "Thanks elukey. I agree it makes sense to apply this." [puppet] - https://gerrit.wikimedia.org/r/941840 (owner: Elukey)
[16:46:15] (CR) Btullis: [C: +2] Remove db1108 as it is being decommissioned [puppet] - https://gerrit.wikimedia.org/r/943615 (https://phabricator.wikimedia.org/T336254) (owner: Btullis)
[16:47:19] (PS1) Jdlrobson: Provide wordmarks for Wikivoyage projects [mediawiki-config] - https://gerrit.wikimedia.org/r/943617 (https://phabricator.wikimedia.org/T341259)
[16:47:23] (PS2) Hnowlan: wmnet: add discovery record for aqs [dns] - https://gerrit.wikimedia.org/r/943616 (https://phabricator.wikimedia.org/T342213)
[16:47:39] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1175 (T342617)', diff saved to https://phabricator.wikimedia.org/P49832 and previous config saved to /var/cache/conftool/dbconfig/20230731-164738-ladsgroup.json
[16:47:40] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
[16:47:43] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[16:47:53] !log btullis@cumin1001 START - Cookbook sre.dns.netbox
[16:47:53] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
[16:48:00] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1189 (T342617)', diff saved to https://phabricator.wikimedia.org/P49833 and previous config saved to
/var/cache/conftool/dbconfig/20230731-164759-ladsgroup.json
[16:50:11] SRE-tools, Infrastructure-Foundations, Patch-For-Review, cloud-services-team (FY2023/2024-Q1): Allow wmcs cookbooks running on cloudcuminXXXX to write to the SAL - https://phabricator.wikimedia.org/T325756 (Andrew) I don't think this is something we should implement right away, but I wonder if al...
[16:56:28] /29
[17:00:04] Deploy window MediaWiki infrastucture (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T1700)
[17:00:04] ryankemper: #bothumor My software never has bugs. It just develops random features. Rise for Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T1700).
[17:00:44] !log btullis@cumin1001 Added views for new wiki: gpewiki T338678
[17:00:44] !log btullis@cumin1001 END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
[17:00:50] T338678: Prepare and check storage layer for gpewiki - https://phabricator.wikimedia.org/T338678
[17:01:12] SRE-tools, Infrastructure-Foundations, Patch-For-Review, cloud-services-team (FY2023/2024-Q1): Allow wmcs cookbooks running on cloudcuminXXXX to write to the SAL - https://phabricator.wikimedia.org/T325756 (dcaro) From a conversation in a meet, in order to keep the ability to log messages when ru...
[17:02:00] !log btullis@cumin1001 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1108.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
[17:04:19] !log btullis@cumin1001 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1108.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
[17:04:19] !log btullis@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[17:04:19] !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1108.eqiad.wmnet
[17:07:54] ops-eqiad, Data-Platform-SRE, decommission-hardware, Patch-For-Review: decommission db1108.eqiad.wmnet - https://phabricator.wikimedia.org/T336254 (BTullis) a:BTullis→Jclark-ctr
[17:08:07] ops-eqiad, Data-Platform-SRE, decommission-hardware, Patch-For-Review: decommission db1108.eqiad.wmnet - https://phabricator.wikimedia.org/T336254 (BTullis)
[17:09:11] (CR) Krinkle: [C: +1] highlight.php: Remove ?blame=1 from URLs [mediawiki-config] - https://gerrit.wikimedia.org/r/940923 (owner: Reedy)
[17:12:30] (Not accepting/receiving prefixes from anycast BGP peer) firing: (2) Alert for device cr1-codfw.wikimedia.org - Not accepting/receiving prefixes from anycast BGP peer - https://alerts.wikimedia.org/?q=alertname%3DNot+accepting%2Freceiving+prefixes+from+anycast+BGP+peer
[17:12:42] er?
[17:12:49] looking
[17:13:08] doh2002
[17:17:30] (Not accepting/receiving prefixes from anycast BGP peer) firing: (2) Alert for device cr1-codfw.wikimedia.org - Not accepting/receiving prefixes from anycast BGP peer - https://alerts.wikimedia.org/?q=alertname%3DNot+accepting%2Freceiving+prefixes+from+anycast+BGP+peer
[17:18:00] should be resolving
[17:18:23] sukhe: what was the issue/resolution?
[17:18:32] restarting bird on doh2002
[17:18:45] this is sadly a known bug that we haven't found a resolution for but just the monitoring is in place
[17:19:28] what does worry me is that when and if we eventually move the authdns' behind BGP, we need to fix this before that
[17:20:24] ack, thanks
[17:22:30] (Not accepting/receiving prefixes from anycast BGP peer) resolved: Device cr1-codfw.wikimedia.org recovered from Not accepting/receiving prefixes from anycast BGP peer - https://alerts.wikimedia.org/?q=alertname%3DNot+accepting%2Freceiving+prefixes+from+anycast+BGP+peer
[17:48:21] (CR) Jforrester: [C: +1] "Per Dan." [puppet] - https://gerrit.wikimedia.org/r/942680 (https://phabricator.wikimedia.org/T342199) (owner: David Martin)
[17:51:08] (PS10) Herron: profile::pyrra::api: create profile [puppet] - https://gerrit.wikimedia.org/r/929729 (https://phabricator.wikimedia.org/T302995)
[17:53:16] (PS4) Jforrester: wikifunctions: Add timeout values in milliseconds as environment variables [deployment-charts] - https://gerrit.wikimedia.org/r/942017 (owner: Cory Massaro)
[17:53:21] jouncebot: nowandnext
[17:53:22] For the next 0 hour(s) and 6 minute(s): MediaWiki infrastucture (UTC late) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T1700)
[17:53:22] In 2 hour(s) and 6 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T2000)
[17:54:10] (PS9) Herron: profile::pyrra::filesystem: add profile [puppet] - https://gerrit.wikimedia.org/r/929731 (https://phabricator.wikimedia.org/T302995)
[17:54:28] (CR) Jforrester: [C: +2] wikifunctions: Add timeout values in milliseconds as environment variables [deployment-charts] - https://gerrit.wikimedia.org/r/942017 (owner: Cory Massaro)
[17:55:24] (Merged) jenkins-bot: wikifunctions: Add timeout values in milliseconds as environment variables [deployment-charts] - https://gerrit.wikimedia.org/r/942017 (owner:
Cory Massaro)
[17:56:22] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1189 (T342617)', diff saved to https://phabricator.wikimedia.org/P49834 and previous config saved to /var/cache/conftool/dbconfig/20230731-175621-ladsgroup.json
[17:56:26] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[17:56:26] !log jforrester@deploy1002 helmfile [staging] START helmfile.d/services/wikifunctions: apply
[17:56:54] (CR) RLazarus: [C: +1] "LGTM, one request below but I don't need to re-review unless you want to discuss it." [deployment-charts] - https://gerrit.wikimedia.org/r/943560 (https://phabricator.wikimedia.org/T342748) (owner: Clément Goubert)
[17:57:18] !log jforrester@deploy1002 helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
[17:57:39] !log jforrester@deploy1002 helmfile [codfw] START helmfile.d/services/wikifunctions: apply
[17:58:35] ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (phaultfinder)
[17:59:36] !log jforrester@deploy1002 helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
[17:59:47] !log jforrester@deploy1002 helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
[17:59:48] (PS10) Herron: profile::pyrra::filesystem: add profile [puppet] - https://gerrit.wikimedia.org/r/929731 (https://phabricator.wikimedia.org/T302995)
[18:00:02] (PS2) Herron: thanos-rule: add pyrra filesystem operator output dir to search path [puppet] - https://gerrit.wikimedia.org/r/930628 (https://phabricator.wikimedia.org/T302995)
[18:00:55] !log jforrester@deploy1002 helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
[18:00:59] (PS3) Herron: thanos-rule: add pyrra filesystem operator output dir to search path [puppet] - https://gerrit.wikimedia.org/r/930628 (https://phabricator.wikimedia.org/T302995)
[18:04:12] (PS1) BCornwall: __init__: Remove duplicate reboot check [cookbooks] -
https://gerrit.wikimedia.org/r/943620 (https://phabricator.wikimedia.org/T342182)
[18:11:28] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P49835 and previous config saved to /var/cache/conftool/dbconfig/20230731-181127-ladsgroup.json
[18:16:24] (CR) Herron: [V: +1] "This change is ready for review." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/929734 (https://phabricator.wikimedia.org/T302995) (owner: Herron)
[18:19:34] (PS2) BCornwall: init: Optimize puppet disabling on reboot [cookbooks] - https://gerrit.wikimedia.org/r/943620 (https://phabricator.wikimedia.org/T342182)
[18:20:03] (CR) BCornwall: [V: +1 C: +2] Allow disabling puppet on reboot (3 comments) [cookbooks] - https://gerrit.wikimedia.org/r/939377 (https://phabricator.wikimedia.org/T342182) (owner: BCornwall)
[18:20:55] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
[18:21:08] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
[18:21:14] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2149 (T342617)', diff saved to https://phabricator.wikimedia.org/P49836 and previous config saved to /var/cache/conftool/dbconfig/20230731-182114-ladsgroup.json
[18:21:18] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[18:26:34] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P49837 and previous config saved to /var/cache/conftool/dbconfig/20230731-182633-ladsgroup.json
[18:34:20] (PS1) Andrew Bogott: Horizon: update version in codfw1dev [puppet] - https://gerrit.wikimedia.org/r/943626 (https://phabricator.wikimedia.org/T328711)
[18:34:51] (CR) Andrew Bogott: [C:
03+2] Horizon: update version in codfw1dev [puppet] - 10https://gerrit.wikimedia.org/r/943626 (https://phabricator.wikimedia.org/T328711) (owner: 10Andrew Bogott) [18:41:40] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1189 (T342617)', diff saved to https://phabricator.wikimedia.org/P49838 and previous config saved to /var/cache/conftool/dbconfig/20230731-184140-ladsgroup.json [18:41:42] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance [18:41:45] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617 [18:41:55] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance [18:42:01] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1198 (T342617)', diff saved to https://phabricator.wikimedia.org/P49839 and previous config saved to /var/cache/conftool/dbconfig/20230731-184200-ladsgroup.json [18:59:13] (03CR) 10Cwhite: [C: 03+2] Wikifunctions sqoop job: Add missing commandline elements [puppet] - 10https://gerrit.wikimedia.org/r/942680 (https://phabricator.wikimedia.org/T342199) (owner: 10David Martin) [19:03:17] !log xcollazo@deploy1002 Started deploy [airflow-dags/analytics@47f9458]: (no justification provided) [19:03:33] !log xcollazo@deploy1002 Finished deploy [airflow-dags/analytics@47f9458]: (no justification provided) (duration: 00m 16s) [19:48:55] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1198 (T342617)', diff saved to https://phabricator.wikimedia.org/P49840 and previous config saved to /var/cache/conftool/dbconfig/20230731-194854-ladsgroup.json [19:48:59] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617 [20:00:06] RoanKattouw, Urbanecm, cjming, TheresNoTime, kindrobot, and taavi: #bothumor Q:How do functions break up? 
A:They stop calling each other. Rise for UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T2000). [20:00:06] koi and Jdlrobson: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [20:04:01] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P49841 and previous config saved to /var/cache/conftool/dbconfig/20230731-200401-ladsgroup.json [20:08:25] (I can't deploy this evening) [20:15:11] 10SRE-tools, 10Infrastructure-Foundations, 10cloud-services-team (FY2023/2024-Q1): tcpircbot: enable logging to #wikimedia-cloud-feed - https://phabricator.wikimedia.org/T342666 (10bd808) `lang=irc [20:09] < bd808> flags #wikimedia-cloud-feed logmsgbot_cloud +V [20:09] -ChanServ- logmsgbot_cloud is not r... [20:15:18] me neither, sorry [20:19:07] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P49842 and previous config saved to /var/cache/conftool/dbconfig/20230731-201907-ladsgroup.json [20:23:27] (03CR) 10Cwhite: [C: 03+2] hiera: actually delete chunks from loki [puppet] - 10https://gerrit.wikimedia.org/r/929749 (https://phabricator.wikimedia.org/T335610) (owner: 10Cwhite) [20:34:14] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1198 (T342617)', diff saved to https://phabricator.wikimedia.org/P49843 and previous config saved to /var/cache/conftool/dbconfig/20230731-203413-ladsgroup.json [20:34:16] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance [20:34:18] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617 [20:34:29] !log ladsgroup@cumin1001 END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance [20:34:31] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance [20:34:46] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance [20:34:52] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1212 (T342617)', diff saved to https://phabricator.wikimedia.org/P49844 and previous config saved to /var/cache/conftool/dbconfig/20230731-203451-ladsgroup.json [20:34:58] (03PS1) 10Volans: validators: temporary support for esams->knams [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/943637 (https://phabricator.wikimedia.org/T340465) [20:36:37] (03CR) 10Volans: [C: 03+2] "self-merging to unblock Rob" [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/943637 (https://phabricator.wikimedia.org/T340465) (owner: 10Volans) [20:37:11] (03Merged) 10jenkins-bot: validators: temporary support for esams->knams [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/943637 (https://phabricator.wikimedia.org/T340465) (owner: 10Volans) [20:37:31] !log volans@cumin1001 START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox [20:44:23] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T342617)', diff saved to https://phabricator.wikimedia.org/P49845 and previous config saved to /var/cache/conftool/dbconfig/20230731-204422-ladsgroup.json [20:44:27] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617 [20:45:09] !log volans@cumin1001 END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox [20:55:29] (03CR) 10Volans: "Code looks good, couple of questions inline." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/943620 (https://phabricator.wikimedia.org/T342182) (owner: 10BCornwall) [20:59:29] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P49846 and previous config saved to /var/cache/conftool/dbconfig/20230731-205928-ladsgroup.json [21:00:00] (03PS1) 10Ahmon Dancy: Review access change [dumps] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/943593 [21:00:05] Reedy, sbassett, Maryum, and manfredi: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230731T2100). [21:02:43] (03Abandoned) 10Ahmon Dancy: Review access change [dumps] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/943593 (owner: 10Ahmon Dancy) [21:14:35] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P49847 and previous config saved to /var/cache/conftool/dbconfig/20230731-211435-ladsgroup.json [21:18:34] 10ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (10phaultfinder) [21:23:22] 10SRE, 10Wikimedia-Site-requests, 10serviceops, 10Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (10Vladis13) >>! In T275319#9054277, @stjn wrote: > Wikisource editors can absolutely split pages into smaller ones, since those lon... 
[21:29:41] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2149 (T342617)', diff saved to https://phabricator.wikimedia.org/P49848 and previous config saved to /var/cache/conftool/dbconfig/20230731-212941-ladsgroup.json
[21:29:44] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
[21:29:47] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[21:29:57] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
[21:29:58] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
[21:30:11] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
[21:30:17] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2156 (T342617)', diff saved to https://phabricator.wikimedia.org/P49849 and previous config saved to /var/cache/conftool/dbconfig/20230731-213017-ladsgroup.json
[21:50:08] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1212 (T342617)', diff saved to https://phabricator.wikimedia.org/P49850 and previous config saved to /var/cache/conftool/dbconfig/20230731-215008-ladsgroup.json
[21:50:12] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[21:54:08] SRE, ops-codfw, DC-Ops, User-aborrero, cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (Jhancock.wm) a:Jhancock.wm
[21:55:03] SRE, ops-codfw, DC-Ops, cloud-services-team (Hardware): Q1:rack/setup/install cloudvirt200[4-6]-dev - https://phabricator.wikimedia.org/T342459 (Jhancock.wm) a:Jhancock.wm
[21:55:19] (CR) Andrea Denisse: [C: +1] "LGTM, thank you!!" [puppet] - https://gerrit.wikimedia.org/r/929734 (https://phabricator.wikimedia.org/T302995) (owner: Herron)
[21:56:58] SRE, ops-codfw, DC-Ops, Data-Persistence: Q1:rack/setup/install db21[88-95] - https://phabricator.wikimedia.org/T342174 (Jhancock.wm) a:Jhancock.wm
[21:58:10] SRE, ops-codfw, DC-Ops, Data-Persistence: Q1:rack/setup/install lists2001.codfw.wmnet - https://phabricator.wikimedia.org/T342375 (Jhancock.wm) a:Jhancock.wm
[22:00:22] SRE, ops-codfw, DC-Ops, Data-Persistence: Q1:rack/setup/install pc201[56] - https://phabricator.wikimedia.org/T342163 (Jhancock.wm) a:Jhancock.wm
[22:02:12] SRE, ops-codfw, DC-Ops, observability: Q1:rack/setup/install titan200[12] - https://phabricator.wikimedia.org/T342300 (Jhancock.wm) a:Jhancock.wm
[22:02:33] SRE, Wikimedia-Site-requests, serviceops, Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (stjn) >>! In T275319#9055301, @Alexey_Skripnik wrote: > Could you elaborate on why serving 2.3 Mb of HTML is bad from a usability...
[22:05:14] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P49851 and previous config saved to /var/cache/conftool/dbconfig/20230731-220514-ladsgroup.json
[22:06:32] SRE, ops-codfw, DC-Ops, User-aborrero, cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (Jhancock.wm)
[22:09:27] SRE, ops-knams, DC-Ops: Main Tracking Task for ESAMS Migration to KNAMS - https://phabricator.wikimedia.org/T329219 (wiki_willy)
[22:15:36] PROBLEM - Check systemd state on doc2002 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-host-data-sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:20:21] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P49852 and previous config saved to /var/cache/conftool/dbconfig/20230731-222020-ladsgroup.json
[22:33:34] ops-eqiad: Inbound interface errors - https://phabricator.wikimedia.org/T342502 (phaultfinder)
[22:35:27] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1212 (T342617)', diff saved to https://phabricator.wikimedia.org/P49853 and previous config saved to /var/cache/conftool/dbconfig/20230731-223526-ladsgroup.json
[22:35:28] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
[22:35:31] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[22:35:42] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
[22:35:48] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1223 (T342617)', diff saved to https://phabricator.wikimedia.org/P49854 and previous config saved to /var/cache/conftool/dbconfig/20230731-223547-ladsgroup.json
[22:45:00] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T342617)', diff saved to https://phabricator.wikimedia.org/P49855 and previous config saved to /var/cache/conftool/dbconfig/20230731-224500-ladsgroup.json
[22:45:05] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[22:49:34] SRE, Wikimedia-Site-requests, serviceops, Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (Alexey_Skripnik) >>! In T275319#9057235, @stjn wrote: > Because heavy pages load worse for readers, especially on poorer connecti...
[23:00:06] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P49856 and previous config saved to /var/cache/conftool/dbconfig/20230731-230006-ladsgroup.json
[23:13:08] RECOVERY - Check systemd state on doc2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:14:35] SRE, ops-codfw, DC-Ops, User-aborrero, cloud-services-team (Hardware): Q1:rack/setup/install cloudcontrol200[6-8]-dev, cloudnet200[7-8]-dev - https://phabricator.wikimedia.org/T342456 (Jhancock.wm)
[23:15:13] SRE, Wikimedia-Site-requests, serviceops, Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (Vladis13) >>! In T275319#9057235, @stjn wrote: > (@Vladis13 please keep in mind https://www.mediawiki.org/wiki/Bug_management/Pha...
[23:15:13] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P49857 and previous config saved to /var/cache/conftool/dbconfig/20230731-231512-ladsgroup.json
[23:15:23] SRE, ops-codfw, DC-Ops, cloud-services-team (Hardware): Q1:rack/setup/install cloudvirt200[4-6]-dev - https://phabricator.wikimedia.org/T342459 (Jhancock.wm)
[23:16:35] (CR) Cwhite: [C: +1] pyrra: deploy to thanos-fe hosts [puppet] - https://gerrit.wikimedia.org/r/929734 (https://phabricator.wikimedia.org/T302995) (owner: Herron)
[23:30:19] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2156 (T342617)', diff saved to https://phabricator.wikimedia.org/P49858 and previous config saved to /var/cache/conftool/dbconfig/20230731-233018-ladsgroup.json
[23:30:20] !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
[23:30:23] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617
[23:30:34] !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
[23:30:40] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2177 (T342617)', diff saved to https://phabricator.wikimedia.org/P49859 and previous config saved to /var/cache/conftool/dbconfig/20230731-233039-ladsgroup.json
[23:44:27] SRE, Wikimedia-Site-requests, serviceops, Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (Reedy) None of this is helping move the discussion forward. Timo's comment in T275319#7947012 is still relevant. And at the sam...
[23:45:57] SRE, Wikimedia-Site-requests, serviceops, Performance-Team (Radar): Raise limit of $wgMaxArticleSize for Hebrew Wikisource - https://phabricator.wikimedia.org/T275319 (Vladis13) >>! In T275319#9057297, @Alexey_Skripnik wrote: > Readers don't care directly about the weight of a webpage's HTML. Wha...
[23:54:42] !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1223 (T342617)', diff saved to https://phabricator.wikimedia.org/P49860 and previous config saved to /var/cache/conftool/dbconfig/20230731-235442-ladsgroup.json
[23:54:46] T342617: Make old columns of externallinks nullable - https://phabricator.wikimedia.org/T342617