[00:01:17] !log (following up previous SAL item) TrainBranchBot was removed from wmf-deployment group because of T285819
[00:01:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:01:25] T285819: TrainBranchBot merges code to mediawiki-config master branch, causing undeployed code problem - https://phabricator.wikimedia.org/T285819
[00:10:13] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:13:53] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:16:07] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:17:55] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:20:17] (PS4) Tim Starling: Include SQL queries in the debug log [mediawiki-config] - https://gerrit.wikimedia.org/r/701995
[00:22:50] (CR) Tim Starling: [C: +2] Include SQL queries in the debug log [mediawiki-config] - https://gerrit.wikimedia.org/r/701995 (owner: Tim Starling)
[00:23:35] (Merged) jenkins-bot: Include SQL queries in the debug log [mediawiki-config] - https://gerrit.wikimedia.org/r/701995 (owner: Tim Starling)
[00:27:08] !log tstarling@deploy1002 Synchronized wmf-config/logging.php: gerrit 701995 SQL query log (duration: 01m 15s)
[00:27:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:27:52] why is there still no multi-file variant of scap?
[00:28:56] TimStarling: scap sync-world doesn't count?
[00:29:23] does it let me specify a list of files to push out?
[00:29:46] not really, but it syncs multiple files 🙂
[00:29:54] then no
[00:30:25] the feature set is pretty much the same as the shell script days, but at least we had the excuse of it being hard to implement things in shell script
[00:30:50] I tend to just sync bigger directories when necessary
[00:31:06] that easily causes wrong order syncs :/
[00:31:32] TimStarling: thanks for getting this one out, btw.
[00:31:52] !log tstarling@deploy1002 Synchronized docroot/noc/db.php: gerrit 701995 SQL query log (duration: 01m 06s)
[00:31:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:32:16] I'm using sync-file in a for loop
[00:32:58] !log tstarling@deploy1002 Synchronized wmf-config/CommonSettings.php: gerrit 701995 SQL query log (duration: 01m 05s)
[00:33:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:34:05] !log tstarling@deploy1002 Synchronized wmf-config/db-codfw.php: gerrit 701995 SQL query log (duration: 01m 06s)
[00:34:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:35:11] !log tstarling@deploy1002 Synchronized wmf-config/db-eqiad.php: gerrit 701995 SQL query log (duration: 01m 06s)
[00:35:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:36:05] RECOVERY - rpki grafana alert on alert1001 is OK: OK: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is not alerting. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/
[00:36:17] !log tstarling@deploy1002 Synchronized wmf-config/db-labs.php: gerrit 701995 SQL query log (duration: 01m 05s)
[00:36:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:49:27] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs.
3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[00:51:23] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:01:05] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:04:59] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:18:06] (CR) Ryan Kemper: [C: +2] cloudelastic: Switch to nginx-light [puppet] - https://gerrit.wikimedia.org/r/702111 (https://phabricator.wikimedia.org/T164456) (owner: Muehlenhoff)
[01:18:35] (CR) Ryan Kemper: [C: +2] relforge: Switch to nginx-light [puppet] - https://gerrit.wikimedia.org/r/702109 (https://phabricator.wikimedia.org/T164456) (owner: Muehlenhoff)
[01:35:01] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 134, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:35:29] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:52:33] SRE, serviceops, Datacenter-Switchover, Performance-Team (Radar): June 2021 Datacenter switchover - https://phabricator.wikimedia.org/T281515 (Legoktm) Bugs filed as a result of today's switchover: * {T285802} * {T260297} * {T285806} * {T285804} * {T285803} * {T285800}
[02:07:17] (CR) Ottomata: "> A lot of people were happy about having info related to their session (kinit vs active session etc..)" [puppet] - https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) (owner: Muehlenhoff)
[02:11:02] SRE, decommission-hardware, serviceops, Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (Legoktm) @wiki_willy I think we can do this ASAP now that we've switched over to codfw.
[02:20:26] SRE, serviceops, Datacenter-Switchover, Performance-Team (Radar): June 2021 Datacenter switchover - https://phabricator.wikimedia.org/T281515 (Legoktm) Today's summary: https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/message/ENL3P5SA7RSOHPN4ILMXQ2BGBF5XR776/
[02:55:22] SRE, serviceops, Datacenter-Switchover: Document communication expectations around planning a DC switchover - https://phabricator.wikimedia.org/T285806 (wkandek)
[03:25:42] (CR) Krinkle: "For future reference, this follows-up d9d64bb041b43 (T271260)." [mediawiki-config] - https://gerrit.wikimedia.org/r/702136 (owner: Urbanecm)
[03:40:09] PROBLEM - SSH on mw1303.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:25:28] PROBLEM - SSH on bast5002 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring
[04:27:17] RECOVERY - SSH on bast5002 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[04:40:55] RECOVERY - SSH on mw1303.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:41:17] (PS1) Marostegui: mariadb: Set core sections to unidir replication.
[puppet] - https://gerrit.wikimedia.org/r/702255
[04:41:42] (CR) Marostegui: [C: -2] "Wait until Thursday" [puppet] - https://gerrit.wikimedia.org/r/702255 (owner: Marostegui)
[05:13:03] SRE, serviceops, Datacenter-Switchover: Document communication expectations around planning a DC switchover - https://phabricator.wikimedia.org/T285806 (Joe) A few points: * Anyone who works in the technical wikimedia community should be subscribed to wikitech-l * Anyone who releases software should...
[05:46:29] !log [Cirrus] Unbanned `elastic2045`; now only `elastic2033` is banned in `codfw`
[05:46:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:47] PROBLEM - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v2/suggest/source/{title}/{to} (Suggest a source title to use for translation) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[06:46:35] RECOVERY - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[06:50:22] (PS5) Effie Mouzeli: tegola-vector-tiles: add helmfile.d config [deployment-charts] - https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159)
[06:53:57] legoktm: correct, it's some perf issues that were discovered thanks to the switchover but not caused by it (putting aside warming up the indices)
[06:53:57] (CR) Elukey: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) (owner: Muehlenhoff)
[07:02:10] (Abandoned) Majavah: toolforge: Remove ingress-jobs [puppet] - https://gerrit.wikimedia.org/r/699436 (owner: Majavah)
[07:24:54] Puppet, Infrastructure-Foundations, MobileFrontend (Tracking), Readers-Web-Backlog (Tracking), User-Jdlrobson: Mobile site does not automatically redirect to desktop version (and not possible to use browser "use desktop view") -
https://phabricator.wikimedia.org/T60425 (Bugreporter)
[07:30:45] SRE, Datacenter-Switchover: switchdc cookbook should perform exponential backoff when checking DNS TTL - https://phabricator.wikimedia.org/T285800 (Volans) I took a look at the logs and in all runs it took 3 or at most 4 tries (2 or 3 retries) to find the updates, doesn't seem very noisy to me. So betwee...
[07:37:38] !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
[07:37:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:41:25] (PS2) Effie Mouzeli: tegola-vector-tiles: Add cronjob for tegola tiles pregeneration [deployment-charts] - https://gerrit.wikimedia.org/r/701938 (owner: Jgiannelos)
[07:41:39] (PS4) Effie Mouzeli: tegola-vector-tiles: add caching support [deployment-charts] - https://gerrit.wikimedia.org/r/701369 (owner: Jgiannelos)
[07:43:02] !log filippo@cumin1001 END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host thanos-be1003.eqiad.wmnet
[07:43:04] !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
[07:43:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:11] !log filippo@cumin1001 END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thanos-be1003.eqiad.wmnet
[07:43:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:41] (CR) Volans: [C: +2] "Thanks for the patch!"
[software/netbox-extras] - https://gerrit.wikimedia.org/r/702217 (owner: Faidon Liambotis)
[07:52:28] (Merged) jenkins-bot: Use allowlist/blocklist instead of whitelist/blacklist [software/netbox-extras] - https://gerrit.wikimedia.org/r/702217 (owner: Faidon Liambotis)
[07:54:25] (CR) Volans: [C: +2] "LGTM" [software/netbox-extras] - https://gerrit.wikimedia.org/r/702218 (owner: Faidon Liambotis)
[07:55:09] (Merged) jenkins-bot: Fix the wording on some of the reports output [software/netbox-extras] - https://gerrit.wikimedia.org/r/702218 (owner: Faidon Liambotis)
[07:55:27] (CR) Ema: [C: +2] varnish: add counters for Varnish SLI [puppet] - https://gerrit.wikimedia.org/r/701358 (https://phabricator.wikimedia.org/T284576) (owner: Ema)
[07:56:51] (CR) Muehlenhoff: [C: +2] Convert sretest-logout.py to wmflib.idm [puppet] - https://gerrit.wikimedia.org/r/702133 (owner: Muehlenhoff)
[07:58:54] (CR) Volans: "quick -1 for stretch" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/702133 (owner: Muehlenhoff)
[08:02:00] SRE, ops-eqiad: Disk failed on thanos-be1003 - https://phabricator.wikimedia.org/T285664 (fgiunchedi) Thank you @Jclark-ctr ! After a reboot the drive is back as healthy in the controller (but the OS fails to write to it) and shows media failure: ` Enclosure Device ID: 32 Slot Number: 3 Drive's position...
[08:02:08] (CR) Volans: "One additional comment" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/702133 (owner: Muehlenhoff)
[08:05:02] PROBLEM - Check systemd state on ms-fe2005 is CRITICAL: CRITICAL - degraded: The following units failed: swiftrepl-mw.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:09:01] SRE, Infrastructure-Foundations, SRE-tools: Broken disk on thanos-be1003 but not reported / task not opened - https://phabricator.wikimedia.org/T285662 (Volans) IIRC this already happened at least once.
I failed to find the phab task right now, but IIRC one of the suggestion (might be mine or not) wa...
[08:16:02] SRE, observability: Thanos bucket operations sporadic errors - https://phabricator.wikimedia.org/T285835 (fgiunchedi)
[08:17:49] (PS1) Muehlenhoff: Drop type hint, we still need to support Python 3.5/Stretch [puppet] - https://gerrit.wikimedia.org/r/702318
[08:19:25] (CR) Muehlenhoff: "> Patch Set 3: Code-Review+1" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/702133 (owner: Muehlenhoff)
[08:23:32] PROBLEM - MegaRAID on thanos-be1003 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:23:33] ACKNOWLEDGEMENT - MegaRAID on thanos-be1003 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T285836 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:23:37] SRE, ops-eqiad: Degraded RAID on thanos-be1003 - https://phabricator.wikimedia.org/T285836 (ops-monitoring-bot)
[08:24:01] (PS2) Marostegui: mariadb: Set core sections to unidir replication. [puppet] - https://gerrit.wikimedia.org/r/702255
[08:24:27] (PS1) Filippo Giunchedi: swift: set group 'swift' when filesystems are unmounted [puppet] - https://gerrit.wikimedia.org/r/702320
[08:24:29] (CR) jerkins-bot: [V: -1] mariadb: Set core sections to unidir replication. [puppet] - https://gerrit.wikimedia.org/r/702255 (owner: Marostegui)
[08:25:42] (CR) Muehlenhoff: [C: +2] Drop type hint, we still need to support Python 3.5/Stretch [puppet] - https://gerrit.wikimedia.org/r/702318 (owner: Muehlenhoff)
[08:25:46] (PS3) Marostegui: mariadb: Set core sections to unidir replication.
[puppet] - https://gerrit.wikimedia.org/r/702255
[08:31:17] !log remove sdf1 from thanos-be1003 in swift - T285835
[08:31:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:24] T285835: Thanos bucket operations sporadic errors - https://phabricator.wikimedia.org/T285835
[08:36:33] (CR) Volans: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/702320 (owner: Filippo Giunchedi)
[08:37:10] (CR) Filippo Giunchedi: [C: +2] swift: set group 'swift' when filesystems are unmounted [puppet] - https://gerrit.wikimedia.org/r/702320 (owner: Filippo Giunchedi)
[08:40:33] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/702224 (https://phabricator.wikimedia.org/T285326) (owner: Herron)
[08:41:23] (PS3) Jgiannelos: tegola: Add cronjob for tiles pregeneration [deployment-charts] - https://gerrit.wikimedia.org/r/701938
[08:41:50] (CR) Jbond: [C: +1] "LGTM but will need +1 from otto" [puppet] - https://gerrit.wikimedia.org/r/702229 (owner: Cwhite)
[08:47:01] SRE, User-jbond: Mapping of servers to stakeholders - https://phabricator.wikimedia.org/T216088 (jbond) tagging related task: T285058
[08:47:18] !log Removing BGP peers for AS48237 (Etihad Etisalat) and AS11404 (Wave Division Holdings) from cr2-eqiad (peers have left Equinix IX)
[08:47:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:44] SRE, serviceops, Datacenter-Switchover: Document communication expectations around planning a DC switchover - https://phabricator.wikimedia.org/T285806 (Kormat) >>! In T285806, @Legoktm wrote: > Personally, I (@Legoktm) don't really understand why people aren't subscribed wikitech-l given that's wher...
[08:51:37] !log jelto@puppetmaster1001:~$ sudo puppet cert -s gitlab2001.wikimedia.org # approve puppet certificate request for gitlab2001, fingerprint checked
[08:51:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:04] SRE, Scap, Python3-Porting, Release-Engineering-Team (Doing): Porting scap to Python 3 - https://phabricator.wikimedia.org/T279628 (Jdforrester-WMF)
[09:01:31] (CR) Volans: "Thanks for the replies" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/702133 (owner: Muehlenhoff)
[09:03:40] (PS1) Muehlenhoff: Require cn and uid being passed [software/pywmflib] - https://gerrit.wikimedia.org/r/702322
[09:06:00] (PS3) Jbond: O:puppetmaster::puppetdb: rename role to puppetdb [puppet] - https://gerrit.wikimedia.org/r/701931 (https://phabricator.wikimedia.org/T285666)
[09:06:02] (CR) Jbond: "> Patch Set 2: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/701931 (https://phabricator.wikimedia.org/T285666) (owner: Jbond)
[09:06:57] (CR) jerkins-bot: [V: -1] Require cn and uid being passed [software/pywmflib] - https://gerrit.wikimedia.org/r/702322 (owner: Muehlenhoff)
[09:08:26] (CR) Volans: "FYI this was discussed in" [software/pywmflib] - https://gerrit.wikimedia.org/r/702322 (owner: Muehlenhoff)
[09:10:01] (PS1) Muehlenhoff: Convert idm-logout.py to wmflib [puppet] - https://gerrit.wikimedia.org/r/702323
[09:10:37] (CR) jerkins-bot: [V: -1] Convert idm-logout.py to wmflib [puppet] - https://gerrit.wikimedia.org/r/702323 (owner: Muehlenhoff)
[09:12:39] (PS2) Muehlenhoff: Convert idm-logout.py to wmflib [puppet] - https://gerrit.wikimedia.org/r/702323
[09:14:20] PROBLEM - k8s API server requests latencies on ml-serve-ctrl1001 is CRITICAL: instance=10.64.16.202 verb=CREATE https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[09:18:38] (PS1) Jbond: cloud dev - hiera: add
wmflib::expand_path to codfw1dev hiera [puppet] - https://gerrit.wikimedia.org/r/702325 (https://phabricator.wikimedia.org/T285539)
[09:18:40] (PS1) Jbond: cloud - hiera: add wmflib::expand_path to hiera [puppet] - https://gerrit.wikimedia.org/r/702326 (https://phabricator.wikimedia.org/T285539)
[09:21:44] (CR) Filippo Giunchedi: [C: +1] "LGTM, thank you" [puppet] - https://gerrit.wikimedia.org/r/701931 (https://phabricator.wikimedia.org/T285666) (owner: Jbond)
[09:24:24] (CR) Muehlenhoff: "Tested on idp-test1001" [puppet] - https://gerrit.wikimedia.org/r/702323 (owner: Muehlenhoff)
[09:26:07] SRE, ops-eqiad, DC-Ops: (Need By: TBD) rack/setup/install (2) new 10G switches - https://phabricator.wikimedia.org/T277340 (cmooney) Thanks @Cmjohnson be great if we can get the ball rolling. It'd help a lot to get them online early the week starting July 13th, if @Jclark-ctr can help there it would...
[09:31:53] SRE, serviceops, Datacenter-Switchover: Document communication expectations around planning a DC switchover - https://phabricator.wikimedia.org/T285806 (LSobanski)
[09:33:03] (CR) Muehlenhoff: [C: +2] swift proxies: Switch to nginx-light [puppet] - https://gerrit.wikimedia.org/r/702113 (https://phabricator.wikimedia.org/T164456) (owner: Muehlenhoff)
[09:34:48] RECOVERY - Check systemd state on ms-fe2005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:35:19] !log start swiftrepl-mw on ms-fe2005 post-switchover (credentials were missing) - T162123
[09:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:35:28] T162123: Refactor swift credentials to be global rather than per-site - https://phabricator.wikimedia.org/T162123
[09:39:22] SRE, ops-eqiad: Degraded RAID on thanos-be1003 - https://phabricator.wikimedia.org/T285836 (fgiunchedi)
[09:39:55] SRE, ops-eqiad: Disk failed on thanos-be1003 -
https://phabricator.wikimedia.org/T285664 (fgiunchedi)
[09:40:02] SRE, ops-eqiad, User-fgiunchedi: Disk failed on thanos-be1003 - https://phabricator.wikimedia.org/T285664 (fgiunchedi)
[09:42:58] PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[09:44:29] SRE, serviceops, Datacenter-Switchover: Document communication expectations around planning a DC switchover - https://phabricator.wikimedia.org/T285806 (LSobanski) We are not yet at the point where DC switch is a non event and even when we get there, it's still an operation that can cause broad impac...
[09:46:59] SRE, observability: Thanos bucket operations sporadic errors - https://phabricator.wikimedia.org/T285835 (fgiunchedi)
[09:47:11] SRE, observability, User-fgiunchedi: Thanos bucket operations sporadic errors - https://phabricator.wikimedia.org/T285835 (fgiunchedi)
[09:52:14] SRE, serviceops, Datacenter-Switchover: Document communication expectations around planning a DC switchover - https://phabricator.wikimedia.org/T285806 (Kormat)
[10:09:55] (PS1) David Caro: wmcs.puppet_alert: Add more info and differentiate cases [puppet] - https://gerrit.wikimedia.org/r/702331 (https://phabricator.wikimedia.org/T285839)
[10:12:05] (CR) David Caro: wmcs.puppet_alert: Add more info and differentiate cases (1 comment) [puppet] - https://gerrit.wikimedia.org/r/702331 (https://phabricator.wikimedia.org/T285839) (owner: David Caro)
[10:13:35] (PS2) David Caro: wmcs.puppet_alert: Add more info and differentiate cases [puppet] - https://gerrit.wikimedia.org/r/702331 (https://phabricator.wikimedia.org/T285839)
[10:13:52] (CR) David Caro: wmcs.puppet_alert: Add more info and differentiate cases (1 comment) [puppet] - https://gerrit.wikimedia.org/r/702331 (https://phabricator.wikimedia.org/T285839) (owner: David Caro)
[10:15:58] SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[16-20].eqiad.wmnet - https://phabricator.wikimedia.org/T274945 (dcaro)
[10:16:01] SRE, ops-eqiad, cloud-services-team (Kanban): Degraded RAID on cloudcephosd1018 - https://phabricator.wikimedia.org/T285799 (dcaro)
[10:17:13] (PS2) Jbond: Require cn and uid being passed [software/pywmflib] - https://gerrit.wikimedia.org/r/702322 (owner: Muehlenhoff)
[10:17:46] (CR) Jbond: [C: +1] "> Patch Set 1:" [software/pywmflib] - https://gerrit.wikimedia.org/r/702322 (owner: Muehlenhoff)
[10:19:19] SRE, DBA, Datacenter-Switchover: switchdc should automatically downtime "Read only" checks on DB masters being switched - https://phabricator.wikimedia.org/T285803 (Kormat) To clarify, there's no way to only downtime a specific icinga check across multiple machines (that i know of). I used `sre.hosts...
[10:20:26] SRE, ops-eqiad, cloud-services-team (Kanban): Degraded RAID on cloudcephosd1018 - https://phabricator.wikimedia.org/T285799 (dcaro) Added the relation with the other one to keep track, but please redo to whatever workflow you prefer (maybe just commenting, inverting the parent-child, another task...).
[10:21:26] (CR) Hnowlan: [C: +1] "lgtm - I'm happy to do a staged rollout to the buster nodes first just to be extra safe if needs be." [puppet] - https://gerrit.wikimedia.org/r/702114 (https://phabricator.wikimedia.org/T164456) (owner: Muehlenhoff)
[10:21:58] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/702323 (owner: Muehlenhoff)
[10:24:01] (CR) Volans: [C: +2] "> Patch Set 2: Code-Review+1" [software/pywmflib] - https://gerrit.wikimedia.org/r/702322 (owner: Muehlenhoff)
[10:25:48] (CR) Kormat: [C: -1] mariadb: Set core sections to unidir replication.
(1 comment) [puppet] - https://gerrit.wikimedia.org/r/702255 (owner: Marostegui)
[10:26:27] (Merged) jenkins-bot: Require cn and uid being passed [software/pywmflib] - https://gerrit.wikimedia.org/r/702322 (owner: Muehlenhoff)
[10:26:59] (PS4) Marostegui: mariadb: Set core sections to unidir replication. [puppet] - https://gerrit.wikimedia.org/r/702255
[10:29:32] (CR) Muehlenhoff: [C: +2] Convert idm-logout.py to wmflib [puppet] - https://gerrit.wikimedia.org/r/702323 (owner: Muehlenhoff)
[10:31:29] !log add 200G to prometheus/eqiad for 'ops' instance
[10:31:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:31:36] (CR) Kormat: [C: +1] mariadb: Set core sections to unidir replication. [puppet] - https://gerrit.wikimedia.org/r/702255 (owner: Marostegui)
[10:35:34] (CR) Muehlenhoff: "> My 2c: let's add a specific hiera setting only for stat/airflow roles, and also for the roles that Moritz thinks it fits (like SRE-speci" [puppet] - https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) (owner: Muehlenhoff)
[10:43:02] SRE, serviceops: upgrade mwmaint servers to buster - https://phabricator.wikimedia.org/T267607 (MoritzMuehlenhoff) So to summarise; the plan is to reimage mwmaint1002 now that eqiad is passive and the reimage mwmaint2002 once eqiad is primary again?
[10:44:25] !log installing gnutls security updates on buster
[10:44:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:47:28] (Abandoned) Muehlenhoff: relforge::Switch to profile::nginx [puppet] - https://gerrit.wikimedia.org/r/702106 (https://phabricator.wikimedia.org/T164456) (owner: Muehlenhoff)
[10:51:17] (CR) Volans: "post-merge comment" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/702323 (owner: Muehlenhoff)
[10:55:44] (CR) Muehlenhoff: Convert idm-logout.py to wmflib (1 comment) [puppet] - https://gerrit.wikimedia.org/r/702323 (owner: Muehlenhoff)
[11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do European mid-day backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210630T1100).
[11:00:05] Lucas_WMDE and Lucas_WMDE: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:11] o/
[11:00:17] looks like nobody else wanted to deploy today
[11:00:23] *in this window
[11:01:07] (PS2) Lucas Werkmeister (WMDE): Stop setting Wikibase client repoConceptBaseUri [mediawiki-config] - https://gerrit.wikimedia.org/r/701504 (https://phabricator.wikimedia.org/T257260)
[11:01:13] (CR) Lucas Werkmeister (WMDE): [C: +2] Stop setting Wikibase client repoConceptBaseUri [mediawiki-config] - https://gerrit.wikimedia.org/r/701504 (https://phabricator.wikimedia.org/T257260) (owner: Lucas Werkmeister (WMDE))
[11:02:01] (Merged) jenkins-bot: Stop setting Wikibase client repoConceptBaseUri [mediawiki-config] - https://gerrit.wikimedia.org/r/701504 (https://phabricator.wikimedia.org/T257260) (owner: Lucas Werkmeister (WMDE))
[11:02:35] testing on mwdebug2001
[11:06:15] everything working as far as I can tell, syncing
[11:08:04] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/Wikibase.php: Config: [[gerrit:701504|Stop setting Wikibase client repoConceptBaseUri (T257260)]] (duration: 01m 24s)
[11:08:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:08:11] T257260: entitysources: Clean up any remainders of the legacy back/compat config in the mediawiki-config repository - https://phabricator.wikimedia.org/T257260
[11:09:03] !log lucaswerkmeister-wmde@mwdebug2001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1' # clean up old l10n cache
[11:09:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:42] alright, watching for any error reports before deploying the second config change
[11:10:00] (PS2) David Caro: wmcs-dns-floating-ip-updater: do a more granular retry [puppet] - https://gerrit.wikimedia.org/r/701506 (https://phabricator.wikimedia.org/T285537)
[11:10:02] (PS2) David Caro:
wmcs.labs-ip-alias-dump: add a retry [puppet] - https://gerrit.wikimedia.org/r/701515 (https://phabricator.wikimedia.org/T285537)
[11:11:45] !log installing libgcrypt security updates on buster
[11:11:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:19:28] (PS1) Jbond: P:multirrootca: Add ability for intermediates to define the default usage [puppet] - https://gerrit.wikimedia.org/r/702343 (https://phabricator.wikimedia.org/T285850)
[11:20:18] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30045/console" [puppet] - https://gerrit.wikimedia.org/r/702343 (https://phabricator.wikimedia.org/T285850) (owner: Jbond)
[11:35:55] !log rolling restart of FPM/Apache on mw canaries to pick up gnutls/gcrypt security updates
[11:36:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:32] haven’t noticed any breakage so I’ll proceed with my second config change
[11:38:08] (PS2) Lucas Werkmeister (WMDE): Remove $wmgWikibaseClientRepoConceptBaseUri [mediawiki-config] - https://gerrit.wikimedia.org/r/701505 (https://phabricator.wikimedia.org/T257260)
[11:38:32] (CR) Lucas Werkmeister (WMDE): [C: +2] Remove $wmgWikibaseClientRepoConceptBaseUri [mediawiki-config] - https://gerrit.wikimedia.org/r/701505 (https://phabricator.wikimedia.org/T257260) (owner: Lucas Werkmeister (WMDE))
[11:39:02] (CR) Lucas Werkmeister (WMDE): [C: -1] Remove $wmgWikibaseClientRepoConceptBaseUri [mediawiki-config] - https://gerrit.wikimedia.org/r/701505 (https://phabricator.wikimedia.org/T257260) (owner: Lucas Werkmeister (WMDE))
[11:39:16] (CR) Lucas Werkmeister (WMDE): [C: -1] "I missed a reference to the variable. One moment."
[mediawiki-config] - https://gerrit.wikimedia.org/r/701505 (https://phabricator.wikimedia.org/T257260) (owner: Lucas Werkmeister (WMDE))
[11:40:12] (PS1) KartikMistry: Update cxserver to 2021-06-30-112813-production [deployment-charts] - https://gerrit.wikimedia.org/r/702346 (https://phabricator.wikimedia.org/T284900)
[11:40:40] (PS3) Lucas Werkmeister (WMDE): Remove $wmgWikibaseClientRepoConceptBaseUri [mediawiki-config] - https://gerrit.wikimedia.org/r/701505 (https://phabricator.wikimedia.org/T257260)
[11:41:02] (CR) Lucas Werkmeister (WMDE): [C: +2] Remove $wmgWikibaseClientRepoConceptBaseUri [mediawiki-config] - https://gerrit.wikimedia.org/r/701505 (https://phabricator.wikimedia.org/T257260) (owner: Lucas Werkmeister (WMDE))
[11:41:47] (Merged) jenkins-bot: Remove $wmgWikibaseClientRepoConceptBaseUri [mediawiki-config] - https://gerrit.wikimedia.org/r/701505 (https://phabricator.wikimedia.org/T257260) (owner: Lucas Werkmeister (WMDE))
[11:42:13] quick check on mwdebug2001
[11:42:22] Lucas_WMDE: Let me know once you're done with backport window. I need to update cxserver change.
[11:42:27] ok sure
[11:43:00] SRE, DBA, Datacenter-Switchover: When switching DCs, update pc hosts in tendril - https://phabricator.wikimedia.org/T266723 (Kormat) @Legoktm : great, thanks!
[11:43:33] should I `scap sync-file tests/multiversion/StaticSettingsTest.php` or skip it?
[11:43:52] I assume that’s not actually loaded in prod but maybe we still want the latest version on the servers to avoid confusion [11:43:57] Lucas_WMDE: i usually skip tests [11:44:15] ok thanks [11:44:15] (no technical need to sync it, but also no harm in doing so) [11:44:30] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:701505|Remove $wmgWikibaseClientRepoConceptBaseUri (T257260)]] (1/3, prod) (duration: 01m 16s) [11:44:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:44:37] T257260: entitysources: Clean up any remainders of the legacy back/compat config in the mediawiki-config repository - https://phabricator.wikimedia.org/T257260 [11:46:04] (03PS2) 10Muehlenhoff: maps: Switch buster nodes to nginx-light [puppet] - 10https://gerrit.wikimedia.org/r/702114 (https://phabricator.wikimedia.org/T164456) [11:46:06] (03PS1) 10Muehlenhoff: Switch remaining (stretch) maps hosts to nginx-light [puppet] - 10https://gerrit.wikimedia.org/r/702347 (https://phabricator.wikimedia.org/T164456) [11:46:06] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:701505|Remove $wmgWikibaseClientRepoConceptBaseUri (T257260)]] (2/2, beta) (disregard the earlier /3, I’m skipping the test file after all) (duration: 01m 04s) [11:46:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:46:15] !log EU backport+config window done [11:46:17] kart_: go ahead [11:46:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:48:22] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/702114 (https://phabricator.wikimedia.org/T164456) (owner: 10Muehlenhoff) [11:48:34] Lucas_WMDE: :) [11:48:40] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/702347 (https://phabricator.wikimedia.org/T164456) (owner: 10Muehlenhoff) [11:51:25] (03CR) 
10JMeybohm: [C: 04-1] "You'll need to bump chart version" (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/701369 (owner: 10Jgiannelos) [11:52:00] (03PS1) 10Jbond: O:pki::multirootca: update the default usages for discovery [puppet] - 10https://gerrit.wikimedia.org/r/702348 (https://phabricator.wikimedia.org/T285850) [11:52:35] (03CR) 10Muehlenhoff: "As discussed on IRC, split for buster/stretch." [puppet] - 10https://gerrit.wikimedia.org/r/702114 (https://phabricator.wikimedia.org/T164456) (owner: 10Muehlenhoff) [11:52:46] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30046/console" [puppet] - 10https://gerrit.wikimedia.org/r/702348 (https://phabricator.wikimedia.org/T285850) (owner: 10Jbond) [11:56:52] (03PS1) 10Muehlenhoff: Rename logout class [puppet] - 10https://gerrit.wikimedia.org/r/702350 [11:57:12] (03CR) 10KartikMistry: [C: 03+2] Update cxserver to 2021-06-30-112813-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/702346 (https://phabricator.wikimedia.org/T284900) (owner: 10KartikMistry) [11:57:23] (03CR) 10Volans: [C: 03+1] "Thanks! 
lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/702350 (owner: 10Muehlenhoff) [11:59:35] (03Merged) 10jenkins-bot: Update cxserver to 2021-06-30-112813-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/702346 (https://phabricator.wikimedia.org/T284900) (owner: 10KartikMistry) [12:00:16] (03Abandoned) 10Muehlenhoff: Rename logout class [puppet] - 10https://gerrit.wikimedia.org/r/702350 (owner: 10Muehlenhoff) [12:00:22] (03CR) 10Jbond: [V: 03+1 C: 03+2] P:multirrootca: Add ability for intermediates to define the default usage [puppet] - 10https://gerrit.wikimedia.org/r/702343 (https://phabricator.wikimedia.org/T285850) (owner: 10Jbond) [12:00:25] (03CR) 10Jbond: [V: 03+1 C: 03+2] O:pki::multirootca: update the default usages for discovery [puppet] - 10https://gerrit.wikimedia.org/r/702348 (https://phabricator.wikimedia.org/T285850) (owner: 10Jbond) [12:01:36] !log kartik@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' . [12:01:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:03:09] 10SRE, 10DBA, 10Datacenter-Switchover, 10Patch-For-Review: Figure out how x2 should be handled in DC switchover - https://phabricator.wikimedia.org/T285519 (10Kormat) >>! In T285519#7182387, @Krinkle wrote: > Yes, this would be my preferred outcome as well. This would be least risky and easiest to reason a... [12:05:49] (03CR) 10JMeybohm: [C: 04-1] "Don't you need to define the `postgres:` stuff from charts/tegola-vector-tiles/values.yaml here somwhere?" (037 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) (owner: 10Effie Mouzeli) [12:06:47] !log kartik@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' . 
[12:06:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:11:07] !log kartik@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . [12:11:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:13:00] !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet [12:13:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:24] PROBLEM - SSH on mw1297.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [12:15:18] (03CR) 10JMeybohm: [C: 03+1] Switch docker registry to nginx-light [puppet] - 10https://gerrit.wikimedia.org/r/698803 (https://phabricator.wikimedia.org/T164456) (owner: 10Muehlenhoff) [12:17:40] !log Updated cxserver to 2021-06-30-112813-production (T284900, T284885) [12:17:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:48] T284885: Create Wikipedia Tachelhit - https://phabricator.wikimedia.org/T284885 [12:17:49] T284900: Support more language pairs with Elia MT - https://phabricator.wikimedia.org/T284900 [12:19:04] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet [12:19:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:19:36] 10SRE, 10observability, 10User-fgiunchedi: Thanos bucket operations sporadic errors - https://phabricator.wikimedia.org/T285835 (10fgiunchedi) I've removed the faulty disk from swift ring, however the errors didn't stop and multipart uploads for big blocks are still happening. The timing of this behavior sta... 
[12:24:12] !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet [12:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:29:24] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet [12:29:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:29:40] !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet [12:29:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:35:04] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet [12:35:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:15] !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet [12:39:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:34] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet [12:45:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:13] !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet [12:46:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:18] (03PS2) 10Jelto: fix cleanup of config backups, make script more robust [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/701068 (https://phabricator.wikimedia.org/T274463) [12:50:09] (03CR) 10JMeybohm: [C: 04-1] "Could you describe (maybe in commit message) how you want this job to behave?" 
(032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/701938 (owner: 10Jgiannelos) [12:52:52] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet [12:52:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:04] PROBLEM - SSH on logstash2021.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [12:53:55] !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet [12:53:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:54:24] (03CR) 10Jelto: "> Patch Set 1: Code-Review-1" [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/701068 (https://phabricator.wikimedia.org/T274463) (owner: 10Jelto) [12:55:35] 10SRE, 10DBA, 10Datacenter-Switchover, 10Patch-For-Review: Figure out how x2 should be handled in DC switchover - https://phabricator.wikimedia.org/T285519 (10Marostegui) Ok, so we need to treat it like parsercache in that regard. [12:57:52] (03PS3) 10Dzahn: gitlab: add parameter for active_host, limit backups to it [puppet] - 10https://gerrit.wikimedia.org/r/702126 (https://phabricator.wikimedia.org/T285456) [12:59:50] (03CR) 10JMeybohm: [C: 03+1] "This LGTM now." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/671204 (https://phabricator.wikimedia.org/T264006) (owner: 10Mstyles) [13:00:35] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet [13:00:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:04:06] (03CR) 10Dzahn: [V: 03+1 C: 03+2] Switch docker registry to nginx-light [puppet] - 10https://gerrit.wikimedia.org/r/698803 (https://phabricator.wikimedia.org/T164456) (owner: 10Muehlenhoff) [13:04:29] !log switching docker-registry to nginx light variant T164456 [13:04:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:04:36] T164456: Migrate to nginx-light - https://phabricator.wikimedia.org/T164456 [13:05:54] (03PS5) 10Effie Mouzeli: tegola-vector-tiles: add caching support [deployment-charts] - 10https://gerrit.wikimedia.org/r/701369 (owner: 10Jgiannelos) [13:06:39] (03CR) 10Effie Mouzeli: "> Patch Set 4: Code-Review-1" (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/701369 (owner: 10Jgiannelos) [13:10:51] (03CR) 10Ottomata: "+1 thats fine for sure. And yeah we only need it on client roles. 
So only on:" [puppet] - 10https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) (owner: 10Muehlenhoff) [13:11:10] (03CR) 10Ottomata: "Oh, and" [puppet] - 10https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) (owner: 10Muehlenhoff) [13:11:30] !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet [13:11:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:11:53] (03PS1) 10Jbond: P:pki::multirootca: enable default_usages feature [puppet] - 10https://gerrit.wikimedia.org/r/702362 (https://phabricator.wikimedia.org/T285850) [13:12:37] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30047/console" [puppet] - 10https://gerrit.wikimedia.org/r/702362 (https://phabricator.wikimedia.org/T285850) (owner: 10Jbond) [13:13:30] (03CR) 10Jbond: [V: 03+1 C: 03+2] P:pki::multirootca: enable default_usages feature [puppet] - 10https://gerrit.wikimedia.org/r/702362 (https://phabricator.wikimedia.org/T285850) (owner: 10Jbond) [13:14:08] (03PS6) 10Effie Mouzeli: tegola-vector-tiles: add helmfile.d config [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) [13:15:48] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[16-20].eqiad.wmnet - https://phabricator.wikimedia.org/T274945 (10dcaro) [13:16:03] (03CR) 10Ottomata: [C: 03+1] "cwhite do you have a Kerberos account yet? If so you'll need that too." 
[puppet] - 10https://gerrit.wikimedia.org/r/702229 (owner: 10Cwhite) [13:16:12] (03CR) 10Ottomata: [C: 03+2] admin: add cwhite to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/702229 (owner: 10Cwhite) [13:16:21] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[16-20].eqiad.wmnet - https://phabricator.wikimedia.org/T274945 (10dcaro) [13:16:30] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[16-20].eqiad.wmnet - https://phabricator.wikimedia.org/T274945 (10dcaro) [13:16:52] (03CR) 10Muehlenhoff: "Ack, I'll update the patch with the needed roles and will re-run PCC" [puppet] - 10https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) (owner: 10Muehlenhoff) [13:18:29] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet [13:18:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:51] !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet [13:18:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:10] (03CR) 10Jelto: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30048/console" [puppet] - 10https://gerrit.wikimedia.org/r/702123 (https://phabricator.wikimedia.org/T285456) (owner: 10Dzahn) [13:23:53] 10SRE, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic, 10Epic: Support maps serving for affiliate sites via an allow list - https://phabricator.wikimedia.org/T261694 (10valerio.bozzolan) >>! In T261694#7172223 Kind up to approve at least https://barriere.wikimedia.it/ :) It is a project from W... 
[13:24:28] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/701936 (https://phabricator.wikimedia.org/T285666) (owner: 10Jbond) [13:25:01] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet [13:25:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:25] !log installing fluidsynth security updates on stretch [13:26:28] !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet [13:26:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:54] (03CR) 10Dzahn: [C: 03+2] "thanks for compiling" [puppet] - 10https://gerrit.wikimedia.org/r/702123 (https://phabricator.wikimedia.org/T285456) (owner: 10Dzahn) [13:30:39] (03CR) 10Dzahn: "noop on gitlab1001 confirmed" [puppet] - 10https://gerrit.wikimedia.org/r/702123 (https://phabricator.wikimedia.org/T285456) (owner: 10Dzahn) [13:32:32] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet [13:32:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:13] (03CR) 10Jelto: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30049/console" [puppet] - 10https://gerrit.wikimedia.org/r/702126 (https://phabricator.wikimedia.org/T285456) (owner: 10Dzahn) [13:36:16] (03PS1) 10David Caro: wmcs.ceph: update disk models should disable write cache [puppet] - 10https://gerrit.wikimedia.org/r/702370 (https://phabricator.wikimedia.org/T285858) [13:37:26] PROBLEM - Thanos compact has not run on alert1001 is CRITICAL: 4.514e+05 ge 24 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact [13:41:02] (03CR) 10Jbond: "pupet side of things 
looks good, have added some picky comments but nothing critical or blocking. accept the ones around profile::pki::get" (0311 comments) [puppet] - 10https://gerrit.wikimedia.org/r/701530 (https://phabricator.wikimedia.org/T264209) (owner: 10JMeybohm) [13:41:11] (03PS1) 10David Caro: wmcs.ceph: Add cloudcephosd1016 to the ceph osd role [puppet] - 10https://gerrit.wikimedia.org/r/702374 (https://phabricator.wikimedia.org/T285858) [13:41:26] (03CR) 10Jelto: [V: 03+1 C: 03+1] "> Patch Set 3: Verified+1" [puppet] - 10https://gerrit.wikimedia.org/r/702126 (https://phabricator.wikimedia.org/T285456) (owner: 10Dzahn) [13:42:56] RECOVERY - k8s API server requests latencies on ml-serve-ctrl1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:47:16] PROBLEM - k8s API server requests latencies on ml-serve-ctrl1002 is CRITICAL: instance=10.64.48.64 verb=CREATE https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api [13:49:59] (03PS4) 10Dzahn: gitlab: add parameter for active_host, limit backups to it [puppet] - 10https://gerrit.wikimedia.org/r/702126 (https://phabricator.wikimedia.org/T285456) [13:52:43] (03CR) 10Dzahn: [C: 03+2] gitlab: add parameter for active_host, limit backups to it [puppet] - 10https://gerrit.wikimedia.org/r/702126 (https://phabricator.wikimedia.org/T285456) (owner: 10Dzahn) [13:53:10] (03CR) 10Dzahn: [C: 03+2] gitlab: add parameter for active_host, limit backups to it (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/702126 (https://phabricator.wikimedia.org/T285456) (owner: 10Dzahn) [13:53:52] RECOVERY - SSH on logstash2021.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [13:56:42] RECOVERY - Thanos compact has not run on alert1001 is OK: (C)24 ge (W)12 ge 0.009367 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts 
https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact [13:58:14] whois holger [13:59:50] jbond: a question for the ages [13:59:58] :) [14:01:43] https://wikimediafoundation.org/profile/holger-knust/ [14:07:49] (03CR) 10JMeybohm: [C: 04-1] tegola-vector-tiles: add caching support (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/701369 (owner: 10Jgiannelos) [14:09:15] (03PS6) 10Effie Mouzeli: tegola-vector-tiles: add caching support [deployment-charts] - 10https://gerrit.wikimedia.org/r/701369 (owner: 10Jgiannelos) [14:10:13] (03PS4) 10Jgiannelos: tegola: Add cronjob for tiles pregeneration [deployment-charts] - 10https://gerrit.wikimedia.org/r/701938 [14:10:25] (03CR) 10Andrew Bogott: [C: 03+1] wmcs.ceph: Add cloudcephosd1016 to the ceph osd role [puppet] - 10https://gerrit.wikimedia.org/r/702374 (https://phabricator.wikimedia.org/T285858) (owner: 10David Caro) [14:12:33] (03PS1) 10Muehlenhoff: Rename class [puppet] - 10https://gerrit.wikimedia.org/r/702377 [14:13:32] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics cluster for Ben Tullis - https://phabricator.wikimedia.org/T285754 (10Dzahn) getting requests for ops-maintenance group access. These things have been checkboxes in the general onboarding tickets but there doesn't seem to be... 
[14:13:58] (03CR) 10David Caro: [C: 03+2] wmcs.ceph: Add cloudcephosd1016 to the ceph osd role [puppet] - 10https://gerrit.wikimedia.org/r/702374 (https://phabricator.wikimedia.org/T285858) (owner: 10David Caro) [14:14:02] (03CR) 10David Caro: [C: 03+2] wmcs.ceph: update disk models should disable write cache [puppet] - 10https://gerrit.wikimedia.org/r/702370 (https://phabricator.wikimedia.org/T285858) (owner: 10David Caro) [14:14:16] (03PS7) 10Effie Mouzeli: tegola-vector-tiles: add helmfile.d config [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) [14:16:06] RECOVERY - SSH on mw1297.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [14:16:28] (03CR) 10jerkins-bot: [V: 04-1] tegola-vector-tiles: add helmfile.d config [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) (owner: 10Effie Mouzeli) [14:20:04] (03CR) 10JMeybohm: [C: 03+1] "lgtm" [deployment-charts] - 10https://gerrit.wikimedia.org/r/701369 (owner: 10Jgiannelos) [14:20:10] (03PS5) 10Jgiannelos: tegola: Add cronjob for tiles pregeneration [deployment-charts] - 10https://gerrit.wikimedia.org/r/701938 [14:21:41] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics cluster for Ben Tullis - https://phabricator.wikimedia.org/T285754 (10BTullis) @Dzahn - Thanks, yes I requested it, because it was an entry on [[ https://office.wikimedia.org/wiki/Technology/Onboarding/Checklists/Ben_Tullis#... [14:23:30] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics cluster for Ben Tullis - https://phabricator.wikimedia.org/T285754 (10Dzahn) @BTullis Thanks, yes, I approved it. Welcome to WMF! 
[14:25:34] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/702377 (owner: 10Muehlenhoff) [14:25:48] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics cluster for Ben Tullis - https://phabricator.wikimedia.org/T285754 (10BTullis) I also have an item on my checklist to say that I should be in the `cn=ops` LDAP group. There are instructions on how I can add myself to that g... [14:30:11] 10SRE, 10decommission-hardware, 10serviceops, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10wiki_willy) Thanks @Legoktm, much appreciated! [14:32:01] (03CR) 10Ema: varnish: Add listen on UDS support (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374) (owner: 10Vgutierrez) [14:32:18] (03PS8) 10Effie Mouzeli: tegola-vector-tiles: add helmfile.d config [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) [14:32:50] 10SRE, 10decommission-hardware, 10serviceops, 10Patch-For-Review: decom 44 eqiad appservers purchased on 2016-04-12/13 (mw1261 through mw1301) - https://phabricator.wikimedia.org/T280203 (10Dzahn) Yes, we can start with the lower hanging fruit like canaries here: https://gerrit.wikimedia.org/r/c/operatio... 
[14:33:14] (03PS1) 10Urbanecm: Initial configuration for dagwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702383 (https://phabricator.wikimedia.org/T284450) [14:37:41] (03PS1) 10Urbanecm: Initial configuration for shiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702384 (https://phabricator.wikimedia.org/T284885) [14:40:17] (03PS9) 10Effie Mouzeli: tegola-vector-tiles: add helmfile.d config [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) [14:42:18] 10SRE, 10GitLab, 10serviceops, 10vm-requests: codfw: 1 of VMs requested for gitlab - https://phabricator.wikimedia.org/T285456 (10Jelto) [14:43:04] 10SRE, 10GitLab, 10serviceops, 10vm-requests: codfw: 1 of VMs requested for gitlab - https://phabricator.wikimedia.org/T285456 (10Jelto) 05Open→03Resolved [14:43:06] (03CR) 10Muehlenhoff: [C: 03+2] Rename class [puppet] - 10https://gerrit.wikimedia.org/r/702377 (owner: 10Muehlenhoff) [14:46:02] (03PS10) 10Effie Mouzeli: tegola-vector-tiles: add helmfile.d config [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) [14:48:11] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/701936 (https://phabricator.wikimedia.org/T285666) (owner: 10Jbond) [14:49:22] PROBLEM - SSH on mw1303.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [14:50:17] (03PS6) 10Jgiannelos: tegola: Add cronjob for tiles pregeneration [deployment-charts] - 10https://gerrit.wikimedia.org/r/701938 [14:50:28] 10SRE, 10observability, 10User-fgiunchedi: Thanos bucket operations sporadic errors - https://phabricator.wikimedia.org/T285835 (10fgiunchedi) The compactor eventually does make progress and is able to upload a block, I'm assuming it has to do with large blocks (i.e. I've seen codfw/eqiad compactions fails b... 
[14:51:15] (03PS1) 10Urbanecm: Initial configuration for banwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702410 (https://phabricator.wikimedia.org/T284389) [14:51:57] jouncebot: next [14:51:58] In 0 hour(s) and 8 minute(s): New wikis creation (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210630T1500) [14:52:36] (03CR) 10Jgiannelos: "> Patch Set 3: Code-Review-1" (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/701938 (owner: 10Jgiannelos) [14:53:55] (03PS7) 10Jgiannelos: tegola: Add cronjob for tiles pregeneration [deployment-charts] - 10https://gerrit.wikimedia.org/r/701938 [14:56:14] (03PS11) 10Effie Mouzeli: tegola-vector-tiles: add helmfile.d config [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) [14:57:27] 10SRE, 10Cloud-VPS, 10cloud-services-team (Kanban): Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (10Bstorm) [14:58:14] (03PS12) 10Effie Mouzeli: tegola-vector-tiles: add helmfile.d config [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) [14:59:16] (03CR) 10JMeybohm: [C: 03+1] tegola-vector-tiles: add helmfile.d config [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) (owner: 10Effie Mouzeli) [15:00:05] Urbanecm and Amir1: Dear deployers, time to do the New wikis creation deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210630T1500). 
[15:00:08] \o [15:00:18] o/ [15:00:40] \o/ [15:01:02] (I’m not deploying, just excited for new wikis :P ) [15:01:39] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for dagwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702383 (https://phabricator.wikimedia.org/T284450) (owner: 10Urbanecm) [15:02:28] (03CR) 10DannyS712: Initial configuration for dagwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702383 (https://phabricator.wikimedia.org/T284450) (owner: 10Urbanecm) [15:02:59] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics cluster for Ben Tullis - https://phabricator.wikimedia.org/T285754 (10herron) >>! In T285754#7187160, @BTullis wrote: > I also have an item on my checklist to say that I should be in the `cn=ops` LDAP group. > > There are i... [15:03:01] (03CR) 10DannyS712: Initial configuration for shiwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702384 (https://phabricator.wikimedia.org/T284885) (owner: 10Urbanecm) [15:03:40] (03PS2) 10Urbanecm: Initial configuration for dagwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702383 (https://phabricator.wikimedia.org/T284450) [15:03:54] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for dagwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702383 (https://phabricator.wikimedia.org/T284450) (owner: 10Urbanecm) [15:04:08] (03PS2) 10Urbanecm: Initial configuration for shiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702384 (https://phabricator.wikimedia.org/T284885) [15:04:46] (03PS3) 10Urbanecm: Initial configuration for shiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702384 (https://phabricator.wikimedia.org/T284885) [15:05:18] (03Merged) 10jenkins-bot: Initial configuration for dagwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702383 (https://phabricator.wikimedia.org/T284450) (owner: 10Urbanecm) [15:05:33] 10SRE, 10Thumbor, 10Wikimedia-SVG-rendering, 
10Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987 (10JoKalliauer) 05Open→03Stalled [15:05:59] pulled to deployment host [15:06:16] pulling to maintenance host [15:06:32] (03CR) 10Volans: "Question inline" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/701876 (owner: 10David Caro) [15:06:57] !log sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1' [15:07:02] (thanks for the line to Lucas_WMDE) [15:07:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:26] running addwiki script [15:07:30] !log restarted dnsdist.service and pdns-recursor.service on O:wikidough to install gnutls/gcrypt updates [15:07:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:36] and...fatal! [15:07:58] :( [15:08:02] https://www.irccloud.com/pastebin/HddOlz9m/ [15:08:22] ... [15:08:30] because something expects eqiad ? [15:08:31] What is after main page? [15:08:41] Amir1: not sure, checking if the db got created properly, etc [15:08:44] mutante: nah, refactor wasn't reflected [15:09:06] cool. 
In the meantime I check what's going on with that [15:09:13] urbanecm: try that on mwmaint1002 [15:09:21] sql dagwiki --write says 10.192.16.12, which is db2123, which is s5 [15:09:22] sgtm [15:09:31] mutante: we definitely did create a wiki while at codfw before [15:10:02] urbanecm: ok, sorry, please ignore that comment, never said that :) [15:10:04] (plus on eqiad it'll probably complain about RO masters anyway) [15:10:16] Amir1: ok, thanks, yea [15:10:59] (03PS8) 10Vgutierrez: varnish: Add listen on UDS support [puppet] - 10https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374) [15:11:04] pulling wiki code to mwdebug2001 [15:11:27] (03CR) 10Vgutierrez: "Thanks for your review ema <3" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374) (owner: 10Vgutierrez) [15:11:56] livehacking wikiversions.php on mwdebug2001 to reflect new wiki [15:12:10] this looks like another classic case of services not being properly set on the context of the new wiki [15:12:26] wiki is live, but with no mainpage [15:12:31] https://gerrit.wikimedia.org/g/mediawiki/extensions/GlobalPreferences/+/452f856abd5d3653b06bf61ec79a4a80cda3abbf/includes/Hooks.php#281 [15:12:42] urbanecm: it'll survive [15:12:59] i'm just checking what happened :) [15:13:17] i don't think anything after mainpage was done though [15:13:42] we can pass --noedits to skip mainpage for the other two wikis [15:13:51] yeah, let's do that [15:13:59] (03CR) 10Muehlenhoff: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/698976 (https://phabricator.wikimedia.org/T164454) (owner: 10Muehlenhoff) [15:14:02] from scanning the exception, that should do the trick [15:14:33] ok, running rest of addWiki.php in a shell.php session [15:15:23] Amir1: can DannyS712's refactor be the cause? 
[15:15:47] not super sure tbh [15:16:05] it can be different timing of mw services being added [15:16:25] basically https://gerrit.wikimedia.org/g/mediawiki/extensions/GlobalPreferences/+/452f856abd5d3653b06bf61ec79a4a80cda3abbf/includes/Hooks.php#281 [15:16:26] ok [15:16:41] Amir1: `[urbanecm@mwmaint2002 ~]$ mwscript extensions/Wikibase/lib/maintenance/populateSitesTable.php --wiki=dagwiki --site-group=wikipedia --force-protocol=https`, does that look right? [15:16:49] (from https://github.com/wikimedia/mediawiki-extensions-WikimediaMaintenance/blob/master/addWiki.php#L184) [15:16:56] yes [15:17:00] ok, running [15:17:07] said "done" [15:17:43] GP replaces the main preference factory and for whatever reason we have the old factory [15:17:48] noted, good to go [15:18:06] ran `[urbanecm@mwmaint2002 ~]$ mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --wiki=dagwiki --backend=local-multiwrite`, no errors, usual output [15:19:41] so it wasn't my fault? [15:20:02] DannyS712: i don't think we're super sure. Depends why is that factory loaded [15:20:04] proceeding with syncs [15:21:16] Amir1: did you fill the addwiki bug as a task, or should I? 
[15:21:25] not yet [15:21:30] noted [15:21:33] !log urbanecm@deploy1002 Synchronized wmf-config/db-eqiad.php: Creating dagwiki (T284450) (duration: 01m 16s) [15:21:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:21:40] T284450: Create Wikipedia Dagbani - https://phabricator.wikimedia.org/T284450 [15:22:04] PROBLEM - Ensure local MW versions match expected deployment on mwmaint2002 is CRITICAL: CRITICAL: 1 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [15:22:16] ^^expected^^ [15:22:55] !log urbanecm@deploy1002 Synchronized wmf-config/db-codfw.php: Creating dagwiki (T284450) (duration: 01m 13s) [15:23:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:03] [Newprojects] New wiki: [15:23:03] Inbox [15:23:35] Once the wiki is fully set up, it'll be visible at https:// [15:23:36] mutante: yeah, that's because the email didn't come from addwiki, but my shell. Apparently I forgot to define a variable before running that part. [15:23:52] addwiki bug filled as T285878 [15:23:53] T285878: addWiki.php fatals - https://phabricator.wikimedia.org/T285878 [15:23:59] urbanecm: ACK, but the URL part there is also missing in the second mail, the one that actually says "dagwiki" [15:24:08] (03PS1) 10Matthias Mullie: Cleanup old mediainfo dumps [puppet] - 10https://gerrit.wikimedia.org/r/702413 (https://phabricator.wikimedia.org/T273266) [15:24:19] !log urbanecm@deploy1002 Synchronized dblists: Creating dagwiki (T284450) (duration: 01m 16s) [15:24:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:55] 10Puppet, 10Cloud Services Proposals, 10Cloud-VPS, 10Infrastructure-Foundations, and 3 others: Easing pain points caused by divergence between cloudservices and production puppet usecases - https://phabricator.wikimedia.org/T285539 (10fgiunchedi) >>! In T285539#7178969, @faidon wrote: > Thank you @jbond fo... 
[15:25:00] yup, i missed that when resending, and didn't want to send a third mail :D. I'll run addWiki with the noedits option for the next wiki, and hopefully that'll suppress the fatal. [15:25:01] 10SRE, 10Thumbor, 10serviceops, 10User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (10JoKalliauer) [15:25:48] !log [urbanecm@mwmaint2002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=dagwiki --cluster=all # T284450 [15:25:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:26:10] !log urbanecm@deploy1002 rebuilt and synchronized wikiversions files: Creating dagwiki (T284450) [15:26:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:30] !log urbanecm@deploy1002 Synchronized static/images/project-logos/: Creating dagwiki (T284450) (duration: 01m 16s) [15:27:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:37] T284450: Create Wikipedia Dagbani - https://phabricator.wikimedia.org/T284450 [15:27:52] RECOVERY - Ensure local MW versions match expected deployment on mwmaint2002 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [15:28:56] !log urbanecm@deploy1002 Synchronized wmf-config/logos.php: Creating dagwiki (T284450) (duration: 01m 16s) [15:29:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:30:13] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Creating dagwiki (T284450) (duration: 01m 14s) [15:30:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:30:53] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for shiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702384 (https://phabricator.wikimedia.org/T284885) (owner: 10Urbanecm) [15:31:10] (03PS3) 10Muehlenhoff: Don't show Kerberos ticket info in general [puppet] - 10https://gerrit.wikimedia.org/r/701512 
(https://phabricator.wikimedia.org/T244840) [15:31:47] !log urbanecm@deploy1002 Synchronized langlist: Creating dagwiki (T284450) (duration: 01m 12s) [15:31:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:16] (03Merged) 10jenkins-bot: Initial configuration for shiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702384 (https://phabricator.wikimedia.org/T284885) (owner: 10Urbanecm) [15:32:23] dagwiki is done [15:34:15] pulling shiwiki to mwmaint2002 [15:35:42] run `mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=muswiki --noedits shi wikipedia shiwiki shi.wikipedia.org`, no errors [15:36:05] db created in correct shard [15:36:54] wiki is live at mwdebug, syncing [15:38:18] !log urbanecm@deploy1002 Synchronized wmf-config/db-eqiad.php: Creating shiwiki (T284885) (duration: 01m 14s) [15:38:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:26] T284885: Create Wikipedia Tachelhit - https://phabricator.wikimedia.org/T284885 [15:40:21] !log urbanecm@deploy1002 Synchronized wmf-config/db-codfw.php: Creating shiwiki (T284885) (duration: 01m 14s) [15:40:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:40:53] afk for a bit [15:41:14] ack. 
I don't expect more errors at this point :) [15:41:39] !log urbanecm@deploy1002 Synchronized dblists: Creating shiwiki (T284885) (duration: 01m 14s) [15:41:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:14] !log urbanecm@deploy1002 rebuilt and synchronized wikiversions files: Creating shiwiki (T284885) [15:43:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:26] (03PS4) 10Muehlenhoff: Don't show Kerberos ticket info in general [puppet] - 10https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) [15:43:55] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) (owner: 10Muehlenhoff) [15:44:35] !log urbanecm@deploy1002 Synchronized static/images/project-logos/: Creating shiwiki (T284885) (duration: 01m 15s) [15:44:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:44:42] T284885: Create Wikipedia Tachelhit - https://phabricator.wikimedia.org/T284885 [15:46:00] !log urbanecm@deploy1002 Synchronized wmf-config/logos.php: Creating shiwiki (T284885) (duration: 01m 13s) [15:46:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:12] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install frdev1002 - https://phabricator.wikimedia.org/T282054 (10Cmjohnson) [15:46:45] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install frdev1002 - https://phabricator.wikimedia.org/T282054 (10Cmjohnson) @jgreen, networking is complete, feel free to do your install. 
If everything works ok, please close this task [15:47:27] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Creating shiwiki (T284885) (duration: 01m 16s) [15:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:48:39] Amir1: once you get back, can you +2 https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ProofreadPage/+/702409? :-) Related to banwikisource, but not going to backport it. [15:48:42] (just so it rides ASAP) [15:48:49] !log urbanecm@deploy1002 Synchronized langlist: Creating shiwiki (T284885) (duration: 01m 16s) [15:48:52] (or maybe ASAF, as soon as feasible) [15:48:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:03] (03PS2) 10Urbanecm: Initial configuration for banwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702410 (https://phabricator.wikimedia.org/T284389) [15:49:07] (03Abandoned) 10Ahmon Dancy: update train-versions.json (2) [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692721 (owner: 10Ahmon Dancy) [15:49:11] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for banwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702410 (https://phabricator.wikimedia.org/T284389) (owner: 10Urbanecm) [15:49:16] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics cluster for Ben Tullis - https://phabricator.wikimedia.org/T285754 (10Ottomata) @herron, so we should do step 1 and then help Ben to step 2? 
[15:50:04] RECOVERY - SSH on mw1303.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [15:50:55] urbanecm I'm not Amir but I +2'ed it [15:51:01] thanks [15:51:21] last but not least, let's ban wikisource [15:51:36] 👍 [15:51:56] (03Merged) 10jenkins-bot: Initial configuration for banwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702410 (https://phabricator.wikimedia.org/T284389) (owner: 10Urbanecm) [15:51:59] (03CR) 10Ottomata: Don't show Kerberos ticket info in general (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) (owner: 10Muehlenhoff) [15:53:08] pulling to mwmaint2002 [15:53:36] mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=muswiki --noedits ban wikisource banwikisource ban.wikisource.org finished successfully [15:53:45] but for shiwiki don't you still need to create the two onwiki pages (main page and MediaWiki:Sitesupport-url)?
Or do you want me to do that so you can focus on this one? [15:54:03] i'll do it after i start the syncs [15:54:10] or you can if you wish [15:54:20] banwikisource created at correct shard [15:54:30] I'll let you do it [15:55:29] banwikisource works on mwdebug [15:55:30] syncing [15:57:00] !log urbanecm@deploy1002 Synchronized wmf-config/db-eqiad.php: Creating banwikisource (T284389) (duration: 01m 13s) [15:57:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:09] T284389: Create Wikisource Balinese - https://phabricator.wikimedia.org/T284389 [15:58:20] !log urbanecm@deploy1002 Synchronized wmf-config/db-codfw.php: Creating banwikisource (T284389) (duration: 01m 14s) [15:58:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:59:41] (03PS1) 10SBassett: admin: replace existing ssh key for sbassett [puppet] - 10https://gerrit.wikimedia.org/r/702417 (https://phabricator.wikimedia.org/T285877) [16:00:18] !log urbanecm@deploy1002 Synchronized dblists: Creating banwikisource (T284389) (duration: 01m 17s) [16:00:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:01:06] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics cluster for Ben Tullis - https://phabricator.wikimedia.org/T285754 (10herron) >>! In T285754#7187636, @Ottomata wrote: > @herron, so we should do step 1 and then help Ben do step 2? I think so, I'm basing that on is the "wh...
[16:02:05] !log urbanecm@deploy1002 rebuilt and synchronized wikiversions files: Creating banwikisource (T284389) [16:02:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:02:12] T284389: Create Wikisource Balinese - https://phabricator.wikimedia.org/T284389 [16:03:31] !log urbanecm@deploy1002 Synchronized static/images/project-logos/: Creating banwikisource (T284389) (duration: 01m 17s) [16:03:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:04:54] !log urbanecm@deploy1002 Synchronized wmf-config/logos.php: Creating banwikisource (T284389) (duration: 01m 16s) [16:05:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:22] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Creating banwikisource (T284389) (duration: 01m 20s) [16:06:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:32] one last thing, the iw cache [16:06:45] (03PS1) 10Urbanecm: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702418 [16:06:48] (03CR) 10Urbanecm: [C: 03+2] Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702418 (owner: 10Urbanecm) [16:06:50] (03CR) 10Muehlenhoff: Don't show Kerberos ticket info in general (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/701512 (https://phabricator.wikimedia.org/T244840) (owner: 10Muehlenhoff) [16:07:31] (03Merged) 10jenkins-bot: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702418 (owner: 10Urbanecm) [16:08:18] (03CR) 10Ema: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30051/console" [puppet] - 10https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374) (owner: 10Vgutierrez) [16:08:40] !log urbanecm@deploy1002 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 01s) [16:08:45] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:10:11] (03CR) 10Ema: [V: 03+1 C: 04-1] varnish: Add listen on UDS support (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374) (owner: 10Vgutierrez) [16:10:53] !log urbanecm@deploy1002 update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 46s) [16:10:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:11:21] apparently i have to run the beta command at beta [16:12:50] (03PS1) 10Urbanecm: Update interwiki cache for Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702420 [16:13:07] (03CR) 10Urbanecm: [C: 03+2] Update interwiki cache for Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702420 (owner: 10Urbanecm) [16:13:30] (03PS9) 10Vgutierrez: varnish: Add listen on UDS support [puppet] - 10https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374) [16:14:13] (03Merged) 10jenkins-bot: Update interwiki cache for Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702420 (owner: 10Urbanecm) [16:14:24] okay, i'm done [16:14:27] cc Amir1 [16:14:59] (03CR) 10Vgutierrez: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30052/console" [puppet] - 10https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374) (owner: 10Vgutierrez) [16:16:50] (03CR) 10Elukey: "Hey folks, I think that we should add Ben to 'analytics-admins' as well (and later on to 'ops' but that requires probably an SRE meeting f" [puppet] - 10https://gerrit.wikimedia.org/r/702197 (https://phabricator.wikimedia.org/T285754) (owner: 10Herron) [16:18:16] (03PS1) 10Legoktm: Merge db-codfw.php and db-eqiad.php into db-production.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702421 (https://phabricator.wikimedia.org/T260297) [16:19:06] 10SRE, 10LDAP-Access-Requests,
10SRE-Access-Requests: Requesting access to analytics cluster for Ben Tullis - https://phabricator.wikimedia.org/T285754 (10Ottomata) Ok, done step 1. @btullis you are in the ops LDAP group now. I believe this means you can create a patch to add your user to the "ops" group... [16:19:06] (03CR) 10Vgutierrez: [V: 03+1] varnish: Add listen on UDS support (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374) (owner: 10Vgutierrez) [16:29:04] (03CR) 10Marostegui: [C: 04-1] "Let's not merge this until we are back in eqiad. We are going to move labswiki (wikitech) from s10 to s6. During a few days it will be in " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702421 (https://phabricator.wikimedia.org/T260297) (owner: 10Legoktm) [16:29:55] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics cluster for Ben Tullis - https://phabricator.wikimedia.org/T285754 (10Ottomata) [16:30:47] 10SRE: Request for more CPU and RAM for releases1002/2002 - https://phabricator.wikimedia.org/T284772 (10dancy) Most of the time releases1002 isn't doing much, just waiting for jobs to be triggered. One of the jobs (mediawiki-config-pipeline-wmf-publish) currently takes 16 minutes to complete. When this proce... 
[16:30:52] !log mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki 'Tech/Server_switch_2020' 'Tech/Server_switch' 'Martin Urbanec' --move-subpages --reason='per [[:phab:T285866]]' # T285866 [16:30:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:31:00] T285866: Move Tech/Server_switch_2020 to Tech/Server_switch at Metawiki - too many subpages - https://phabricator.wikimedia.org/T285866 [16:31:35] urbanecm: that didn't leave redirects behind :/ [16:31:40] 10SRE: Request for more CPU and RAM for releases1002/2002 - https://phabricator.wikimedia.org/T284772 (10dancy) [16:31:54] majavah: it never did, i always had to create them manually [16:32:24] having to do that for all the languages manually sounds not fun [16:33:26] tbh this is the normal translatable page move dialog https://usercontent.irccloud-cdn.com/file/nAyqMzlv/image.png [16:33:34] no option to suppress redirects at all [16:33:38] (03CR) 10Elukey: "What I said is wrong, Ben will end up in the ops group so no need for analytics-admins. Nevermind :)" [puppet] - 10https://gerrit.wikimedia.org/r/702197 (https://phabricator.wikimedia.org/T285754) (owner: 10Herron) [16:33:46] as i said, i think this is what it normally does :) [16:34:20] (03CR) 10Legoktm: "> Patch Set 1: Code-Review-1" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702421 (https://phabricator.wikimedia.org/T260297) (owner: 10Legoktm) [16:35:15] 10SRE, 10ops-eqiad, 10cloud-services-team (Hardware): labstore1007 crashed after storage controller errors--replace disk? - https://phabricator.wikimedia.org/T281045 (10Bstorm) 05Open→03Resolved Disk is OK now. I'm going to hope that was it! [16:45:55] (03PS1) 10Btullis: Add btullis to the ops security group [puppet] - 10https://gerrit.wikimedia.org/r/702424 [16:45:57] (03CR) 10Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! 
:) To learn how to get your code changes reviewed faster and more likely to get" [puppet] - 10https://gerrit.wikimedia.org/r/702424 (owner: 10Btullis) [16:46:41] (03PS1) 10Krinkle: mediawiki: Run purgeParserCache.php in parallel for each shard [puppet] - 10https://gerrit.wikimedia.org/r/702425 (https://phabricator.wikimedia.org/T282761) [16:47:53] (03PS2) 10Krinkle: mediawiki: Run purgeParserCache.php in parallel for each shard [puppet] - 10https://gerrit.wikimedia.org/r/702425 (https://phabricator.wikimedia.org/T282761) [16:48:40] (03CR) 10Btullis: "Adding myself to the ops group." [puppet] - 10https://gerrit.wikimedia.org/r/702424 (owner: 10Btullis) [16:49:02] (03PS3) 10Krinkle: mediawiki: Run purgeParserCache.php in parallel for each shard [puppet] - 10https://gerrit.wikimedia.org/r/702425 (https://phabricator.wikimedia.org/T282761) [16:49:53] (03PS2) 10Btullis: Add btullis to the ops security group [puppet] - 10https://gerrit.wikimedia.org/r/702424 (https://phabricator.wikimedia.org/T285754) [16:50:33] (03CR) 10jerkins-bot: [V: 04-1] Add btullis to the ops security group [puppet] - 10https://gerrit.wikimedia.org/r/702424 (https://phabricator.wikimedia.org/T285754) (owner: 10Btullis) [16:50:58] 10SRE, 10Commons, 10MediaWiki-Uploading, 10Structured Data Engineering, and 3 others: Various errors when trying to upload large files (Could not acquire lock, Service Temporarily Unavailable, 503 Backend fetch failed, 502 Next Hop Connection Failed) - https://phabricator.wikimedia.org/T280926 (10CBogen) [16:52:44] (03PS4) 10Krinkle: mediawiki: Run purgeParserCache.php in parallel for each shard [puppet] - 10https://gerrit.wikimedia.org/r/702425 (https://phabricator.wikimedia.org/T282761) [16:53:06] (03CR) 10ArielGlenn: [C: 03+1] "Looks good, merge at will or I'll get it tomorrow morning." 
[puppet] - 10https://gerrit.wikimedia.org/r/702413 (https://phabricator.wikimedia.org/T273266) (owner: 10Matthias Mullie) [16:53:41] (03CR) 10Kormat: [C: 03+1] "Let's try and see what happens :)" [puppet] - 10https://gerrit.wikimedia.org/r/702425 (https://phabricator.wikimedia.org/T282761) (owner: 10Krinkle) [16:54:00] (03PS5) 10Krinkle: mediawiki: Run purgeParserCache.php in parallel for each shard [puppet] - 10https://gerrit.wikimedia.org/r/702425 (https://phabricator.wikimedia.org/T282761) [16:54:05] (03CR) 10Krinkle: "(ok, done fiddling now)" [puppet] - 10https://gerrit.wikimedia.org/r/702425 (https://phabricator.wikimedia.org/T282761) (owner: 10Krinkle) [16:56:34] (03PS3) 10Zabe: Add btullis to the ops security group [puppet] - 10https://gerrit.wikimedia.org/r/702424 (https://phabricator.wikimedia.org/T285754) (owner: 10Btullis) [16:56:36] (03CR) 10Legoktm: mediawiki: Run purgeParserCache.php in parallel for each shard (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/702425 (https://phabricator.wikimedia.org/T282761) (owner: 10Krinkle) [16:57:40] (03PS1) 10Majavah: paws::haproxy: Allow specifying the port for ingress nodes [puppet] - 10https://gerrit.wikimedia.org/r/702426 (https://phabricator.wikimedia.org/T264221) [16:58:37] (03PS4) 10Btullis: Add btullis to the ops security group [puppet] - 10https://gerrit.wikimedia.org/r/702424 (https://phabricator.wikimedia.org/T285754) [16:58:38] legoktm: are you sure that's needed for this one, or is it out of caution that you'd recommend it? [16:58:46] (03CR) 10Razzi: [C: 03+1] Add btullis to the ops security group [puppet] - 10https://gerrit.wikimedia.org/r/702424 (https://phabricator.wikimedia.org/T285754) (owner: 10Btullis) [16:58:48] in my experience resources that are wholly puppet-managed tend to have some kind of temporary folder or in-memory list of what they will ensure and then remove anything that shouldn't be there in those areas. [16:58:53] I didn't check for this one though.
[16:59:12] (03CR) 10Ottomata: [C: 03+1] Add btullis to the ops security group [puppet] - 10https://gerrit.wikimedia.org/r/702424 (https://phabricator.wikimedia.org/T285754) (owner: 10Btullis) [17:01:40] Krinkle: I'm 90% sure, mostly because profile::mediawiki::periodic_job has a default parameter of ensure => present [17:02:58] ok. I've not seen a link between those two aspects. e.g. outside wmf, using puppet for nginx configs, I find they support ensure=>absent to let you disable code, but they also remove anything that wouldn't be tracked as well. [17:03:02] the two are not mutually exclusive [17:03:12] I don't see any kind of scanning or staging in our systemd module though [17:03:16] I'll absent it for now then [17:05:02] thx! [17:05:05] (03PS6) 10Krinkle: mediawiki: Run purgeParserCache.php in parallel for each shard [puppet] - 10https://gerrit.wikimedia.org/r/702425 (https://phabricator.wikimedia.org/T282761) [17:05:07] (03PS1) 10Krinkle: mediawiki: Remove old 'parser_cache_purging' job [puppet] - 10https://gerrit.wikimedia.org/r/702427 [17:05:15] (03CR) 10Krinkle: mediawiki: Run purgeParserCache.php in parallel for each shard (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/702425 (https://phabricator.wikimedia.org/T282761) (owner: 10Krinkle) [17:07:28] (03CR) 10Bstorm: [C: 03+2] paws::haproxy: Allow specifying the port for ingress nodes [puppet] - 10https://gerrit.wikimedia.org/r/702426 (https://phabricator.wikimedia.org/T264221) (owner: 10Majavah) [17:08:32] (03PS1) 10Majavah: paws::haproxy: Use correct port [puppet] - 10https://gerrit.wikimedia.org/r/702428 [17:10:38] (03CR) 10Bstorm: "Oops, double checking, I see it now, too. Good catch."
[puppet] - 10https://gerrit.wikimedia.org/r/702428 (owner: 10Majavah) [17:10:45] (03CR) 10Bstorm: [C: 03+2] paws::haproxy: Use correct port [puppet] - 10https://gerrit.wikimedia.org/r/702428 (owner: 10Majavah) [17:16:57] !log imported jenkins 2.289.2 to thirdparty/ci T285532 [17:17:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:18:30] (03CR) 10Kormat: [C: 03+2] mediawiki: Run purgeParserCache.php in parallel for each shard [puppet] - 10https://gerrit.wikimedia.org/r/702425 (https://phabricator.wikimedia.org/T282761) (owner: 10Krinkle) [17:18:36] (03CR) 10Marostegui: [C: 04-1] "> Patch Set 1:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702421 (https://phabricator.wikimedia.org/T260297) (owner: 10Legoktm) [17:24:42] (03CR) 10Krinkle: "We have a number of overrides like that already based on wmfDatacenter, which can be as simple as:" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702421 (https://phabricator.wikimedia.org/T260297) (owner: 10Legoktm) [17:47:34] (03PS1) 10Ottomata: Add gobblin_job define and declare first gobblin job in hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) [17:47:54] (03CR) 10jerkins-bot: [V: 04-1] Add gobblin_job define and declare first gobblin job in hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) (owner: 10Ottomata) [17:48:43] (03PS2) 10Ottomata: Add gobblin_job define and declare first gobblin job in hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) [17:49:00] (03CR) 10jerkins-bot: [V: 04-1] Add gobblin_job define and declare first gobblin job in hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) (owner: 10Ottomata) [17:49:36] (03PS3) 10Ottomata: Add gobblin_job define and declare first gobblin job in hadoop test 
cluster [puppet] - 10https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) [17:50:52] (03PS4) 10Ottomata: Add gobblin_job define and declare first gobblin job in hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) [17:51:00] (03CR) 10jerkins-bot: [V: 04-1] Add gobblin_job define and declare first gobblin job in hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) (owner: 10Ottomata) [17:52:00] (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30056/console" [puppet] - 10https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) (owner: 10Ottomata) [17:53:05] (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30057/console" [puppet] - 10https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) (owner: 10Ottomata) [17:54:01] !log restart releases-jenkins following upgrade [17:54:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:57:16] (03PS5) 10Ottomata: Add gobblin_job define and declare first gobblin job in hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) [17:57:24] (03CR) 10jerkins-bot: [V: 04-1] Add gobblin_job define and declare first gobblin job in hadoop test cluster [puppet] - 10https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) (owner: 10Ottomata) [17:57:32] !log restart ci jenkins following upgrade [17:57:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:57:56] (03PS1) 10Jdlrobson: Use Vue.js for QuickSurveys on available wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702434 (https://phabricator.wikimedia.org/T285890) [17:58:27] (03PS1) 
10Bstorm: cloud kubeadm: quote etcd parameters [puppet] - 10https://gerrit.wikimedia.org/r/702435 [17:59:41] (03CR) 10Bstorm: "This is already updated in the live configmap. This just fixes things in case we rebuilt the cluster from scratch." [puppet] - 10https://gerrit.wikimedia.org/r/702435 (owner: 10Bstorm) [17:59:51] (03CR) 10Bstorm: [C: 03+2] cloud kubeadm: quote etcd parameters [puppet] - 10https://gerrit.wikimedia.org/r/702435 (owner: 10Bstorm) [18:00:05] RoanKattouw, Niharika, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for Morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210630T1800). [18:00:05] No GERRIT patches in the queue for this window AFAICS. [18:00:05] brennen and marxarelli: It is that lovely time of the day again! You are hereby commanded to deploy Train log triage with CPT. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210630T1800). [18:01:42] (03CR) 10Ottomata: "https://puppet-compiler.wmflabs.org/compiler1002/30059/an-test-coord1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/702430 (https://phabricator.wikimedia.org/T271232) (owner: 10Ottomata) [18:04:14] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install frdev1002 - https://phabricator.wikimedia.org/T282054 (10Jgreen) [18:08:58] (03PS1) 10Jgreen: add A/PTR records for frdev1002 [dns] - 10https://gerrit.wikimedia.org/r/702437 (https://phabricator.wikimedia.org/T285892) [18:10:45] (03CR) 10Jgreen: [C: 03+2] add A/PTR records for frdev1002 [dns] - 10https://gerrit.wikimedia.org/r/702437 (https://phabricator.wikimedia.org/T285892) (owner: 10Jgreen) [18:11:01] PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes [18:12:30] (03PS1) 10Cathal Mooney: Modified version of LibreNMS Prometheus.php to add prefix [software/librenms] - 
10https://gerrit.wikimedia.org/r/702438 (https://phabricator.wikimedia.org/T229542) [18:12:38] !log authdns-update to deploy A/PTR records for frdev1002.frack.eqiad.wmnet [18:12:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:18:09] (03PS4) 10Ottomata: jupyter: simplify the cron script to clean up user Trash [puppet] - 10https://gerrit.wikimedia.org/r/677822 (owner: 10Elukey) [18:19:13] (03CR) 10Krinkle: "This fixed T190455." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701995 (owner: 10Tim Starling) [18:19:47] (03CR) 10Ottomata: [C: 03+2] jupyter: simplify the cron script to clean up user Trash [puppet] - 10https://gerrit.wikimedia.org/r/677822 (owner: 10Elukey) [18:24:52] (03CR) 10Ottomata: [C: 03+2] eventlogging: remove mariadb profile and create log dir [puppet] - 10https://gerrit.wikimedia.org/r/683831 (https://phabricator.wikimedia.org/T280679) (owner: 10Hnowlan) [18:25:55] 10SRE, 10Traffic, 10MW-1.35-notes (1.35.0-wmf.40; 2020-07-07), 10Patch-For-Review, and 4 others: Harmonise the identification of requests across our stack - https://phabricator.wikimedia.org/T201409 (10Krinkle) [18:33:50] (03PS2) 10Herron: add tchin to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/702224 (https://phabricator.wikimedia.org/T285326) [18:34:48] (03CR) 10Herron: [C: 03+2] add tchin to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/702224 (https://phabricator.wikimedia.org/T285326) (owner: 10Herron) [18:38:29] 10SRE, 10LDAP-Access-Requests, 10Patch-For-Review: Grant Access to ldap/wmf for TChin - https://phabricator.wikimedia.org/T285326 (10herron) 05Open→03Resolved Hi @tchin, your ldap account is now a member of the wmf group. I'll transition to resolved now but please don't hesitate to reopen if any follow-... 
[18:38:30] (03Abandoned) 10Ottomata: an-test - Declare camus jobs based on a new stream setting instead of destination_event_service [puppet] - 10https://gerrit.wikimedia.org/r/668125 (https://phabricator.wikimedia.org/T273901) (owner: 10Ottomata) [18:39:08] (03CR) 10Ottomata: [C: 03+2] service_auto_restart - match full line when ensuring absent [puppet] - 10https://gerrit.wikimedia.org/r/697605 (owner: 10Ottomata) [18:39:31] (03CR) 10Ottomata: [C: 03+1] service_auto_restart - match full line when ensuring absent [puppet] - 10https://gerrit.wikimedia.org/r/697605 (owner: 10Ottomata) [18:39:52] (03PS2) 10Ottomata: service_auto_restart - match full line when ensuring absent [puppet] - 10https://gerrit.wikimedia.org/r/697605 [18:41:21] (03CR) 10Ottomata: [V: 03+1 C: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/30060/console" [puppet] - 10https://gerrit.wikimedia.org/r/697605 (owner: 10Ottomata) [18:41:37] (03CR) 10Ottomata: [V: 03+1 C: 03+1] "I'd like to merge this but am a bit afraid it'll do some nasty stuff if it doesn't work right, like restarting all services everywhere."
[puppet] - 10https://gerrit.wikimedia.org/r/697605 (owner: 10Ottomata) [18:42:31] (03PS1) 10Herron: add fgoodwin (uid=frankie) to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/702439 (https://phabricator.wikimedia.org/T285580) [18:42:33] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install frdev1002 - https://phabricator.wikimedia.org/T282054 (10Jgreen) 05Open→03Resolved @Cmjohnson so far it looks good, I was able to log in, reset the drac password, and tweak BIOS settings [18:53:35] !log adding urbanecm as admin of newprojects mailing list [18:53:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:54:09] (03CR) 10Cwhite: [C: 03+2] logstash: transition aqs logs to ECS [puppet] - 10https://gerrit.wikimedia.org/r/701617 (https://phabricator.wikimedia.org/T234565) (owner: 10Cwhite) [18:54:46] !log legoktm@mwmaint2002:~$ sudo systemctl stop mediawiki_job_parser_cache_purging.service # to stop zombie service [18:54:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:57:40] !log legoktm@mwmaint2002:~$ sudo systemctl start mediawiki_job_purge_parsercache_pc[123] # to start split purge jobs ahead of the timers [18:57:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:05] brennen and marxarelli: I, the Bot under the Fountain, allow thee, The Deployer, to do MediaWiki train - American Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210630T1900). 
[19:02:13] rollin' [19:03:09] (03PS1) 10Brennen Bearnes: group1 wikis to 1.37.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702443 [19:03:11] (03CR) 10Brennen Bearnes: [C: 03+2] group1 wikis to 1.37.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702443 (owner: 10Brennen Bearnes) [19:04:03] (03Merged) 10jenkins-bot: group1 wikis to 1.37.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702443 (owner: 10Brennen Bearnes) [19:05:27] !log brennen@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12 [19:05:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:35] !log brennen@deploy1002 Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 07s) [19:06:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:39] (03PS6) 10Cwhite: logstash: add ecs migration config for sampled webrequest logs [puppet] - 10https://gerrit.wikimedia.org/r/699254 (https://phabricator.wikimedia.org/T234565) [19:19:08] (03CR) 10Cwhite: [C: 03+2] logstash: add ecs migration config for sampled webrequest logs [puppet] - 10https://gerrit.wikimedia.org/r/699254 (https://phabricator.wikimedia.org/T234565) (owner: 10Cwhite) [19:26:03] (03PS2) 10SBassett: admin: replace existing ssh key for sbassett [puppet] - 10https://gerrit.wikimedia.org/r/702417 (https://phabricator.wikimedia.org/T285877) [19:27:20] (03PS1) 10Cathal Mooney: Added optional ability to enable uRPF filtering on arbitary CR ints [homer/public] - 10https://gerrit.wikimedia.org/r/702446 (https://phabricator.wikimedia.org/T285461) [19:28:03] (03CR) 10jerkins-bot: [V: 04-1] Added optional ability to enable uRPF filtering on arbitary CR ints [homer/public] - 10https://gerrit.wikimedia.org/r/702446 (https://phabricator.wikimedia.org/T285461) (owner: 10Cathal Mooney) [19:39:21] (03CR) 10Herron: [C: 03+2] admin: replace existing ssh key for sbassett [puppet] - 
10https://gerrit.wikimedia.org/r/702417 (https://phabricator.wikimedia.org/T285877) (owner: 10SBassett) [19:49:19] (03PS1) 10Eevans: Create an aqs-roots group, analogous to restbase-roots [puppet] - 10https://gerrit.wikimedia.org/r/702452 (https://phabricator.wikimedia.org/T285899) [19:49:49] (03CR) 10jerkins-bot: [V: 04-1] Create an aqs-roots group, analogous to restbase-roots [puppet] - 10https://gerrit.wikimedia.org/r/702452 (https://phabricator.wikimedia.org/T285899) (owner: 10Eevans) [19:50:45] (03PS2) 10Cathal Mooney: Added optional ability to enable uRPF filtering on arbitary CR ints [homer/public] - 10https://gerrit.wikimedia.org/r/702446 (https://phabricator.wikimedia.org/T285461) [19:54:52] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install ganeti102[34] - https://phabricator.wikimedia.org/T283036 (10Jclark-ctr) [19:55:03] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install ganeti102[34] - https://phabricator.wikimedia.org/T283036 (10Jclark-ctr) Received added to netbox [19:55:23] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [19:55:25] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 221, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [19:56:17] (03PS2) 10Eevans: Create an aqs-roots group, analogous to restbase-roots [puppet] - 10https://gerrit.wikimedia.org/r/702452 (https://phabricator.wikimedia.org/T285899) [19:56:52] (03CR) 10jerkins-bot: [V: 04-1] Create an aqs-roots group, analogous to restbase-roots [puppet] - 10https://gerrit.wikimedia.org/r/702452 (https://phabricator.wikimedia.org/T285899) (owner: 10Eevans) [19:57:29] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 
0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [19:58:33] (03PS3) 10Eevans: Create an aqs-roots group, analogous to restbase-roots [puppet] - 10https://gerrit.wikimedia.org/r/702452 (https://phabricator.wikimedia.org/T285899) [19:59:24] (03PS1) 10Arlolra: Add Parsoid to wmgMonologChannels [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702454 [20:00:05] brennen and marxarelli: Your horoscope predicts another unfortunate MediaWiki train - American Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210630T1900). [20:00:05] chrisalbon and accraze: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210630T2000). [20:00:47] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:01:56] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:09:10] (03CR) 10Subramanya Sastry: Add Parsoid to wmgMonologChannels (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702454 (owner: 10Arlolra) [20:10:03] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:12:39] (03CR) 10Arlolra: Add Parsoid to wmgMonologChannels (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702454 (owner: 10Arlolra) [20:13:01] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 
69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:15:08] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install cloudcephosd102[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T284471 (10Jclark-ctr) [20:15:20] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install cloudcephosd102[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T284471 (10Jclark-ctr) Received added to netbox [20:25:28] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:28:50] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:30:00] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install pc1011-pc1014 - https://phabricator.wikimedia.org/T282484 (10Jclark-ctr) [20:30:15] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install pc1011-pc1014 - https://phabricator.wikimedia.org/T282484 (10Jclark-ctr) Received host added to netbox [20:30:17] (03Abandoned) 10Razzi: Add dbstore1006 to analytics vlan [homer/public] - 10https://gerrit.wikimedia.org/r/694002 (https://phabricator.wikimedia.org/T283125) (owner: 10Razzi) [20:32:36] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:33:54] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:38:03] (03CR) 10Arlolra: Add Parsoid to wmgMonologChannels (031 
comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702454 (owner: 10Arlolra) [20:43:24] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [20:45:04] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [20:47:46] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:51:32] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:57:17] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [21:00:00] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 
3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [21:00:18] PROBLEM - NFS Share Volume Space /srv/scratch on cloudstore1008 is CRITICAL: DISK CRITICAL - free space: /srv/scratch 607353 MB (15% inode=99%): https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage%23NFS_volume_cleanup https://grafana.wikimedia.org/d/50z0i4XWz/tools-overall-nfs-storage-utilization?orgId=1 [21:01:58] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [21:03:02] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [21:05:53] (03CR) 10Subramanya Sastry: Add Parsoid to wmgMonologChannels (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702454 (owner: 10Arlolra) [21:09:18] (03CR) 10Arlolra: Add Parsoid to wmgMonologChannels (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702454 (owner: 10Arlolra) [21:16:32] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [21:18:26] PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes [21:22:18] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [21:23:17] !log start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T284931 T284459 T284394) [21:23:28] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:23:30] T284459: Add Wikidata support for dagwiki - https://phabricator.wikimedia.org/T284459 [21:23:31] T284931: Add Wikidata support for shiwiki - https://phabricator.wikimedia.org/T284931 [21:23:31] T284394: Add Wikidata support for banwikisource - https://phabricator.wikimedia.org/T284394 [21:40:07] !log end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T284931 T284459 T284394) [21:40:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:40:18] T284459: Add Wikidata support for dagwiki - https://phabricator.wikimedia.org/T284459 [21:40:19] T284931: Add Wikidata support for shiwiki - https://phabricator.wikimedia.org/T284931 [21:40:19] T284394: Add Wikidata support for banwikisource - https://phabricator.wikimedia.org/T284394 [21:40:26] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 
3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [21:42:24] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [21:43:46] !log deleting auto-review logs from test2wiki (T285608) [21:43:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:43:54] T285608: Stop logging and clean up auto review logs - https://phabricator.wikimedia.org/T285608 [21:49:14] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [21:51:10] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [21:55:08] (03CR) 10Volans: "Looks mostly good to me, just a couple of nits inline" (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/701498 (owner: 10Jbond) [21:57:44] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [22:01:36] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [22:06:24] (03CR) 10Cwhite: "Thanks, all!" 
[puppet] - 10https://gerrit.wikimedia.org/r/702229 (owner: 10Cwhite) [22:06:47] (Primary outbound port utilisation over 80% #page) firing: Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org [22:06:47] (Primary outbound port utilisation over 80% #page) firing: Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org [22:07:22] o/ [22:07:26] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [22:11:16] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [22:26:44] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [22:28:40] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [22:38:09] (03PS1) 10Dave Pifke: Fix NavtimingStaleBeacon false alarms, add test [alerts] - 10https://gerrit.wikimedia.org/r/702477 [22:38:55] (03CR) 10jerkins-bot: [V: 04-1] Fix NavtimingStaleBeacon false alarms, add test [alerts] - 10https://gerrit.wikimedia.org/r/702477 (owner: 10Dave Pifke) [22:41:09] (03PS2) 10Dave Pifke: Fix NavtimingStaleBeacon false alarms, add test [alerts] - 10https://gerrit.wikimedia.org/r/702477 [22:54:19] (03PS1) 10Filippo Giunchedi: hiera: activate public_clouds rate limit for varnish upload [puppet] - 10https://gerrit.wikimedia.org/r/702480 [22:58:20] (03PS1) 10Cwhite: varnish: increase ratelimiting of non-compliant UAs [puppet] - 10https://gerrit.wikimedia.org/r/702481 [22:58:39] (03PS2) 10Cwhite: varnish: increase ratelimiting of non-compliant UAs [puppet] - 10https://gerrit.wikimedia.org/r/702481 [22:59:14] PROBLEM - Router interfaces on cr2-esams 
is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [22:59:50] (03PS3) 10Cwhite: varnish: increase ratelimiting of non-compliant UAs [puppet] - 10https://gerrit.wikimedia.org/r/702481 [23:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for Evening backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210630T2300). [23:00:05] Jdlrobson and arlolra: A patch you scheduled for Evening backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:12] (03CR) 10BBlack: [C: 03+1] varnish: increase ratelimiting of non-compliant UAs [puppet] - 10https://gerrit.wikimedia.org/r/702481 (owner: 10Cwhite) [23:00:19] I can deploy today [23:00:24] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [23:00:33] Jdlrobson: arlolra: hello, are you around? [23:00:54] (03CR) 10Cwhite: [C: 03+2] varnish: increase ratelimiting of non-compliant UAs [puppet] - 10https://gerrit.wikimedia.org/r/702481 (owner: 10Cwhite) [23:01:00] yup [23:01:29] urbanecm: although, I don't exactly have a requisite +1 on that patch [23:01:44] are you familiar with what it's chaning? [23:01:47] changing [23:02:01] it's not a hard requirement though -- and your patch looks like a good one [23:02:16] 10SRE, 10Traffic: Image load failing with 429 from varnish - https://phabricator.wikimedia.org/T285875 (10Aklapper) Interesting - the image URL is different here and works: https://upload.wikimedia.org/wikipedia/commons/a/a1/Young_Folks%27_History_of_Rome_illus090.png I have no good idea where a URL like http... 
[23:02:19] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [23:02:30] urbanecm: ok, thanks [23:03:13] arlolra: how verbose is Parsoid under the info level? AFAICS parsoid is a heavily used service, and I'm wondering if info can be _too_ verbose [23:03:44] I don't think there are many places where we log at that level [23:03:54] okay [23:03:58] (03CR) 10Urbanecm: [C: 03+2] Add Parsoid to wmgMonologChannels [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702454 (owner: 10Arlolra) [23:04:41] (03Merged) 10jenkins-bot: Add Parsoid to wmgMonologChannels [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702454 (owner: 10Arlolra) [23:05:36] I'm going to sync this one, as it can't be reasonably tested [23:06:06] Jdlrobson: hi, around? [23:06:10] ok, I'll try to keep an eye on the logs [23:06:16] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [23:06:23] that'd be appreciated arlolra [23:06:47] (Primary outbound port utilisation over 80% #page) resolved: Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org [23:06:47] (Primary outbound port utilisation over 80% #page) resolved: Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org [23:07:10] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 8e719d54baa4c26aaa090e02503b0d9473301cce: Add Parsoid to wmgMonologChannels (duration: 01m 07s) [23:07:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:08:28] arlolra: so far, 3734 entries, mostly `[info] Wikitext for this page has duplicate ids: {page_title}` [23:08:49] yes, that seems like a lot [23:09:47] it's growing a lot [23:10:10] maybe warning would have been a better place to start [23:10:19] do you want to revert that? 
[23:10:23] yeah, I'm reverting it [23:10:27] ok, thanks [23:10:44] do you have time to try again at another level? or leave it for another day [23:10:48] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [23:11:21] arlolra: we can try warning, sure. [23:11:23] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: REVERT: 8e719d54baa4c26aaa090e02503b0d9473301cce: Add Parsoid to wmgMonologChannels (duration: 00m 38s) [23:11:26] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [23:11:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:11:44] (03PS1) 10Urbanecm: Revert "Add Parsoid to wmgMonologChannels" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702483 [23:11:46] (03CR) 10Urbanecm: [C: 03+2] Revert "Add Parsoid to wmgMonologChannels" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702483 (owner: 10Urbanecm) [23:11:51] thanks [23:12:10] arlolra: could you please upload a patch for that? 
[23:12:36] (03Merged) 10jenkins-bot: Revert "Add Parsoid to wmgMonologChannels" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702483 (owner: 10Urbanecm) [23:13:00] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 69, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [23:13:01] logging file stopped growing (it's now 2.7MB) [23:13:52] working on it [23:13:56] thanks [23:14:33] (03PS1) 10Arlolra: Change Parsoid log level to warning [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702484 [23:15:29] urbanecm: ^ [23:15:33] thanks [23:15:36] (03PS2) 10Urbanecm: Add Parsoid to wmgMonologChannels with warning level [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702484 (owner: 10Arlolra) [23:15:38] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [23:15:51] (03CR) 10Urbanecm: [C: 03+2] Add Parsoid to wmgMonologChannels with warning level [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702484 (owner: 10Arlolra) [23:16:36] (03Merged) 10jenkins-bot: Add Parsoid to wmgMonologChannels with warning level [mediawiki-config] - 10https://gerrit.wikimedia.org/r/702484 (owner: 10Arlolra) [23:16:36] arlolra: is there a way to reach parsoid from mwdebug servers? [23:16:50] (is/should/meant to be) [23:17:30] oh [23:18:06] i'm not sure [23:21:06] looking at the logging file from the previous attempt, it has 15k rows total (over the few minutes the patch was in production) and 1116 warning rows (mostly from ptwiki, enwiki, ruwiki, dewiki). I'd call that a reasonable amount of new rows. [23:21:33] Krinkle: given your question, do you want me to wait with syncing the new patch? [23:22:41] No, go ahead. [23:22:49] ok, syncing [23:23:03] I was just wondering why you didn't find out until after syncing.
[23:23:10] which I assume to be not staging on mwdebug [23:23:20] and then realized most of rest.php is limited to internal requests last I checked [23:23:30] so chances are it can't be tested on mwdebug. [23:23:41] yes, parsoid is only enabled on its cluster [23:25:10] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 667d88054097b195208818aee15bb1eb58955437: Add Parsoid to wmgMonologChannels with warning level (duration: 01m 07s) [23:25:15] arlolra: live again [23:25:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:25:28] thanks [23:26:00] (03Abandoned) 10Filippo Giunchedi: hiera: activate public_clouds rate limit for varnish upload [puppet] - 10https://gerrit.wikimedia.org/r/702480 (owner: 10Filippo Giunchedi) [23:26:33] looks more reasonable now [23:26:37] yup [23:26:59] I'll continue to keep an eye on it for a bit, but thanks for the help [23:27:07] np [23:28:01] !log Evening B&C window finished [23:28:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:32:08] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [23:33:22] RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [23:41:29] Krinkle: you can't use mwdebug for testing Parsoid code, you just have to pick a wtp server and `scap pull` on that: https://phabricator.wikimedia.org/T279451#6982066 [23:42:52] legoktm: right, that's one way. [23:43:00] That's what I use for mwmaint1002 testing [23:43:03] (e.g. 
for noc) [23:46:17] urbanecm: apologies for missing the window today [23:46:33] I got sucked into something else [23:46:46] No problem :) [23:47:04] ACKNOWLEDGEMENT - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 221, down: 1, dormant: 0, excluded: 0, unused: 0: Cathal Mooney Lumen, BDFS2448 on the blink again. https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [23:47:04] ACKNOWLEDGEMENT - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces Cathal Mooney Lumen, BDFS2448 on the blink again. https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [23:47:36] have moved to tomorrow morning :) [23:47:44] Looks good! [23:58:08] PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [23:59:12] ok, those first few minutes still seem representative. I'm going to step away for a bit