[00:37:07] 10SRE, 10ops-eqdfw: cr2-eqdfw: PEM 1 Input Voltage Out Of Range flapping - https://phabricator.wikimedia.org/T294009 (10Papaul) re-seat of PEM1 by remote hands didn't clear the alarm. I will get back with Juniper to request a RMA. [01:13:24] RECOVERY - MariaDB Replica Lag: s1 on db2141 is OK: OK slave_sql_lag Replication lag: 0.40 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [01:35:04] 10SRE, 10MW-on-K8s, 10serviceops, 10Patch-For-Review, 10Performance-Team (Radar): Benchmark performance of MediaWiki on k8s - https://phabricator.wikimedia.org/T280497 (10aaron) Here are some stats from a custom script on mwdebug (per key "collection"). ` { "overall": { "num_slots": 4099,... [02:07:41] (03CR) 10Gergő Tisza: [C: 03+1] growthexperiments.pp: Run purgeExpiredMentorStatus.php twice a month [puppet] - 10https://gerrit.wikimedia.org/r/734568 (https://phabricator.wikimedia.org/T280307) (owner: 10Urbanecm) [02:09:03] (03PS1) 10Dzahn: mediawiki: add parameter to allow absenting fonts, remove them from appservers [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) [02:09:34] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: add parameter to allow absenting fonts, remove them from appservers [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [02:13:15] (03PS2) 10Dzahn: mediawiki: add parameter to allow absenting fonts, remove them from appservers [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) [02:14:59] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1001/31916/" [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [02:15:54] (03CR) 10Dzahn: "thumbor::mediawiki role has its own "include ::mediawiki::packages::fonts" and left a default value of present" [puppet] - 10https://gerrit.wikimedia.org/r/734798 
(https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [02:16:27] (03CR) 10AntiCompositeNumber: "Should this be applied to modules/profile/manifests/openstack/base/wikitech/web.pp too?" [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [02:18:01] (03CR) 10Dzahn: "We don't usually use ensure => absent with ensure_packages but this is a use case for it." [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [02:19:44] (03CR) 10Dzahn: "AntiCompositeNumber: good question, probably yes" [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [02:21:38] 10SRE, 10ops-codfw, 10DC-Ops, 10Wikidata, and 2 others: Q2:(Need By: End of Q2) rack/setup/install wdqs20[09,10,11] - https://phabricator.wikimedia.org/T294297 (10Papaul) [02:22:08] (03PS3) 10Dzahn: mediawiki: add parameter to allow absenting fonts, remove them from appservers [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) [02:22:48] (03PS4) 10Dzahn: mediawiki: add parameter to allow absenting fonts, remove them from appservers [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) [02:23:04] 10SRE, 10ops-codfw, 10DC-Ops, 10Wikidata, and 2 others: Q2:(Need By: End of Q2) rack/setup/install wdqs20[09,10,11] - https://phabricator.wikimedia.org/T294297 (10Papaul) [02:27:18] 10SRE, 10ops-codfw, 10DC-Ops, 10Discovery-Search, 10Elasticsearch: Q2:(Need By: TBD) rack/setup/install elastic20[61-72] - https://phabricator.wikimedia.org/T294154 (10Papaul) Is it possible here to have 2 hosts in the same rack? 
[04:06:10] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 133, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [04:07:56] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 232, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [04:39:50] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 134, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [04:41:32] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 233, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:12:37] (03PS1) 10Marostegui: report_users: Change destination email [software] - 10https://gerrit.wikimedia.org/r/734800 [05:13:20] (03CR) 10Marostegui: [C: 03+2] report_users: Change destination email [software] - 10https://gerrit.wikimedia.org/r/734800 (owner: 10Marostegui) [05:13:51] (03Merged) 10jenkins-bot: report_users: Change destination email [software] - 10https://gerrit.wikimedia.org/r/734800 (owner: 10Marostegui) [05:31:05] !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove logpager replicas from s6.codfw T263127', diff saved to https://phabricator.wikimedia.org/P17611 and previous config saved to /var/cache/conftool/dbconfig/20211027-053104-marostegui.json [05:31:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:31:13] T263127: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 [05:41:14] PROBLEM - SSH on bast3005 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [05:43:20] RECOVERY - SSH on bast3005 is OK: SSH OK - OpenSSH_7.9p1 
Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [05:47:19] (03CR) 10Giuseppe Lavagetto: [C: 03+1] mediawiki: Remove libvips-tools [puppet] - 10https://gerrit.wikimedia.org/r/732387 (https://phabricator.wikimedia.org/T290802) (owner: 10Legoktm) [05:48:41] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "While it's not from the HHVM early days (we stopped shelling out to tidy years into the HHVM experiment), it is definitely not used anymor" [puppet] - 10https://gerrit.wikimedia.org/r/732386 (owner: 10Legoktm) [05:49:05] (03CR) 10Giuseppe Lavagetto: [V: 03+2 C: 03+2] Add php 7.4 on buster images [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/732642 (https://phabricator.wikimedia.org/T293996) (owner: 10Giuseppe Lavagetto) [06:06:35] !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove recentchanges and recentchangeslinked replicas from s6.codfw T263127', diff saved to https://phabricator.wikimedia.org/P17612 and previous config saved to /var/cache/conftool/dbconfig/20211027-060634-marostegui.json [06:06:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:06:42] T263127: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 [06:17:14] (03PS1) 10Marostegui: wmnet: Replace m5-master with dbproxy1017 [dns] - 10https://gerrit.wikimedia.org/r/734806 (https://phabricator.wikimedia.org/T288093) [06:17:55] (03CR) 10Marostegui: [C: 04-2] "Wait for 14:00 UTC" [dns] - 10https://gerrit.wikimedia.org/r/734806 (https://phabricator.wikimedia.org/T288093) (owner: 10Marostegui) [06:23:54] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Overall LGTM; a Depends: field is needed in the base image control file." 
[docker-images/production-images] - 10https://gerrit.wikimedia.org/r/732722 (https://phabricator.wikimedia.org/T294034) (owner: 10Ahmon Dancy) [06:49:49] 10SRE, 10SRE-Access-Requests: (WIP) Requesting access to production for ejoseph - https://phabricator.wikimedia.org/T294379 (10RKemper) [06:58:10] (03PS4) 10ArielGlenn: add the Wikimedia Enterprise content downloader script [puppet] - 10https://gerrit.wikimedia.org/r/734622 (https://phabricator.wikimedia.org/T273585) [07:03:38] (03CR) 10ArielGlenn: add the Wikimedia Enterprise content downloader script (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/734622 (https://phabricator.wikimedia.org/T273585) (owner: 10ArielGlenn) [07:05:13] 10SRE, 10DBA, 10cloud-services-team (Kanban): db1112 (s3 contribs/rc replica) is down - https://phabricator.wikimedia.org/T294295 (10Marostegui) More recent errors from the B1 module: ` Oct 26 03:19:30 db1112 kernel: [29473.766843] mce: [Hardware Error]: Machine check events logged Oct 26 03:35:38 db1112 ker... [07:05:34] (03PS1) 10Ryan Kemper: mstyles no longer needs search access [puppet] - 10https://gerrit.wikimedia.org/r/734886 [07:05:36] (03PS1) 10Ryan Kemper: [WIP] Add ejoseph [puppet] - 10https://gerrit.wikimedia.org/r/734887 (https://phabricator.wikimedia.org/T294379) [07:06:36] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Add ejoseph [puppet] - 10https://gerrit.wikimedia.org/r/734887 (https://phabricator.wikimedia.org/T294379) (owner: 10Ryan Kemper) [07:07:15] (03CR) 10Ryan Kemper: "I spoke to maryum and she confirmed she's okay w/ us removing access" [puppet] - 10https://gerrit.wikimedia.org/r/734886 (owner: 10Ryan Kemper) [07:10:24] (03CR) 10Alexandros Kosiaris: "Is there any way we can stage this a bit? e.g. do canaries first? 
I 'd like us to limit the blast radius if it turns out that we overlooke" [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [07:16:05] 10SRE, 10Traffic, 10observability, 10Discovery-Search (Current work): flapping icinga Letsencrypt TLS cert alerts around renewal time - https://phabricator.wikimedia.org/T293826 (10elukey) @Dzahn do you know if the list[12]* nodes are scheduled to be upgraded to Bullseye during the next few weeks? It would... [07:17:38] 10SRE, 10DBA, 10cloud-services-team (Kanban): db1112 (s3 contribs/rc replica) is down - https://phabricator.wikimedia.org/T294295 (10Marostegui) I would say we shouldn't repool this host until the DIMM is changed, given the above errors happening again. [07:25:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove watchlist replicas from s6.codfw T263127', diff saved to https://phabricator.wikimedia.org/P17613 and previous config saved to /var/cache/conftool/dbconfig/20211027-072546-marostegui.json [07:27:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:27:02] T263127: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 [07:45:57] (03PS1) 10Ema: varnishxcache.mtail: avoid unnecessary filtering [puppet] - 10https://gerrit.wikimedia.org/r/734893 (https://phabricator.wikimedia.org/T293879) [07:49:36] !log marostegui@cumin1001 dbctl commit (dc=all): 'Contributions replicas from s6.codfw T263127', diff saved to https://phabricator.wikimedia.org/P17614 and previous config saved to /var/cache/conftool/dbconfig/20211027-074935-marostegui.json [07:49:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:49:43] T263127: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 [07:49:58] (03PS2) 10Ema: varnishxcache.mtail: avoid unnecessary filtering [puppet] - 10https://gerrit.wikimedia.org/r/734893 (https://phabricator.wikimedia.org/T293879) [07:54:02] (03CR) 
10Filippo Giunchedi: [C: 03+2] install_server: simplify custom prometheus.cfg [puppet] - 10https://gerrit.wikimedia.org/r/734564 (https://phabricator.wikimedia.org/T294302) (owner: 10Filippo Giunchedi) [08:02:48] (03CR) 10Filippo Giunchedi: [C: 03+1] varnishxcache.mtail: avoid unnecessary filtering [puppet] - 10https://gerrit.wikimedia.org/r/734893 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [08:05:35] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Couple of inlines comments. The premise LGTM." [deployment-charts] - 10https://gerrit.wikimedia.org/r/734692 (https://phabricator.wikimedia.org/T288851) (owner: 10Giuseppe Lavagetto) [08:10:09] (03CR) 10Ema: [C: 03+2] varnishxcache.mtail: avoid unnecessary filtering [puppet] - 10https://gerrit.wikimedia.org/r/734893 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [08:19:01] (03PS1) 10Urbanecm: Mentee overview: Exclude users without homepage enabled [extensions/GrowthExperiments] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/734751 (https://phabricator.wikimedia.org/T293434) [08:19:23] (03PS1) 10Urbanecm: Mentee overview: Exclude users without homepage enabled [extensions/GrowthExperiments] (wmf/1.38.0-wmf.6) - 10https://gerrit.wikimedia.org/r/734752 (https://phabricator.wikimedia.org/T293434) [08:24:26] 10SRE, 10Observability-Logging, 10Traffic, 10Patch-For-Review, 10User-ema: varnishmtail metric loss due to mtail not reading from pipe fast enough - https://phabricator.wikimedia.org/T293879 (10ema) >>! In T293879#7460663, @gerritbot wrote: > Change 734893 **merged** by Ema: > %%%[operations/puppet@produ... 
[08:41:07] (03CR) 10Kormat: [C: 03+1] wmnet: Replace m5-master with dbproxy1017 [dns] - 10https://gerrit.wikimedia.org/r/734806 (https://phabricator.wikimedia.org/T288093) (owner: 10Marostegui) [08:43:27] (03PS2) 10Giuseppe Lavagetto: mediawiki: add handling of php-fpm logs via rsyslogd [deployment-charts] - 10https://gerrit.wikimedia.org/r/734692 (https://phabricator.wikimedia.org/T288851) [08:43:29] (03PS1) 10Giuseppe Lavagetto: mediawiki: add the ability to turn off display_errors [deployment-charts] - 10https://gerrit.wikimedia.org/r/734895 [08:45:23] (03CR) 10Alexandros Kosiaris: [C: 03+1] mediawiki: add the ability to turn off display_errors [deployment-charts] - 10https://gerrit.wikimedia.org/r/734895 (owner: 10Giuseppe Lavagetto) [08:50:15] !log Enabling Telxius circuit from cr1-eqiad to asw1-b12-drmrs with homer. [08:50:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:51:24] RECOVERY - OSPF status on cr3-esams is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [08:53:46] (03PS1) 10Volans: network: add drmrs prefixes [puppet] - 10https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) [08:54:19] (03PS4) 10RhinosF1: Add echetty to product-users and ssh access [puppet] - 10https://gerrit.wikimedia.org/r/733916 (https://phabricator.wikimedia.org/T294229) [08:54:41] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki: add the ability to turn off display_errors [deployment-charts] - 10https://gerrit.wikimedia.org/r/734895 (owner: 10Giuseppe Lavagetto) [08:57:56] PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/6 UP : OSPFv3: 5/5 UP : 6 v2 P2P interfaces vs. 
5 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [08:58:06] (03CR) 10Ayounsi: network: add drmrs prefixes (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: 10Volans) [08:59:10] (03Merged) 10jenkins-bot: mediawiki: add the ability to turn off display_errors [deployment-charts] - 10https://gerrit.wikimedia.org/r/734895 (owner: 10Giuseppe Lavagetto) [09:00:49] (03PS2) 10Volans: network: add drmrs prefixes [puppet] - 10https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) [09:02:47] (03PS3) 10Volans: network: add drmrs prefixes [puppet] - 10https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) [09:03:01] (03CR) 10Volans: "addressed comments" [puppet] - 10https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: 10Volans) [09:04:08] !log oblivian@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [09:04:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:07:19] 10SRE, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (10hashar) In my experience it is better done during low CI traffic, start of morning in Dallas will work just fine. We would then send a... [09:09:00] (03CR) 10Filippo Giunchedi: [C: 03+1] add stack.head field for aggregating events by stack head [software/ecs] - 10https://gerrit.wikimedia.org/r/734698 (https://phabricator.wikimedia.org/T288851) (owner: 10Cwhite) [09:09:48] !log oblivian@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[09:09:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:03] (03PS1) 10David Caro: p:openstack::cinder::backup: add cinder user option [puppet] - 10https://gerrit.wikimedia.org/r/734902 [09:12:54] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [09:12:55] (03CR) 10David Caro: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31920/console" [puppet] - 10https://gerrit.wikimedia.org/r/734902 (owner: 10David Caro) [09:14:54] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [09:16:20] (03CR) 10Cathal Mooney: [C: 03+1] "Looks good to me!" [puppet] - 10https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: 10Volans) [09:19:24] (03CR) 10Arturo Borrero Gonzalez: p:openstack::cinder::backup: add cinder user option (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/734902 (owner: 10David Caro) [09:20:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove watchlist replicas from s6 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17615 and previous config saved to /var/cache/conftool/dbconfig/20211027-092043-marostegui.json [09:20:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:20:50] T263127: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 [09:21:13] I have made some changes to our databases that serve watchlist on frwiki, jawiki, ruwiki and wikitech. If you notice something (errors, timeouts, etc.), please ping me [09:22:44] (03PS2) 10David Caro: p:openstack::cinder::backup: add cinder user option [puppet] - 
10https://gerrit.wikimedia.org/r/734902 [09:22:48] (03CR) 10David Caro: p:openstack::cinder::backup: add cinder user option (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/734902 (owner: 10David Caro) [09:23:49] (03CR) 10David Caro: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31921/console" [puppet] - 10https://gerrit.wikimedia.org/r/734902 (owner: 10David Caro) [09:25:11] (03CR) 10David Caro: [V: 03+1 C: 03+2] p:openstack::cinder::backup: add cinder user option [puppet] - 10https://gerrit.wikimedia.org/r/734902 (owner: 10David Caro) [09:25:16] !log another run of backfill on graphite1004 - T294355 [09:25:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:25:23] T294355: Several Wikidata Grafana boards missing data before October 2021 - https://phabricator.wikimedia.org/T294355 [09:33:20] (03PS1) 10Giuseppe Lavagetto: mediawiki: add further error handling [deployment-charts] - 10https://gerrit.wikimedia.org/r/734904 [09:33:46] RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [09:34:33] (03CR) 10Ayounsi: [C: 03+1] network: add drmrs prefixes [puppet] - 10https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: 10Volans) [09:34:54] 10SRE, 10serviceops, 10Developer Productivity, 10Performance-Team (Radar), and 2 others: Debug hosts sometimes Fatal error: "The UdpSocket to 127.0.0.1:10514 has been closed" - https://phabricator.wikimedia.org/T214734 (10hashar) Spotted this on labweb1002 / lab1001 today, all messages referred to `127.0... 
[09:35:26] jouncebot: nowandnext [09:35:26] No deployments scheduled for the next 1 hour(s) and 24 minute(s) [09:35:26] In 1 hour(s) and 24 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211027T1100) [09:35:32] (03CR) 10Urbanecm: [C: 03+2] Mentee overview: Exclude users without homepage enabled [extensions/GrowthExperiments] (wmf/1.38.0-wmf.6) - 10https://gerrit.wikimedia.org/r/734752 (https://phabricator.wikimedia.org/T293434) (owner: 10Urbanecm) [09:35:36] (03CR) 10Urbanecm: [C: 03+2] Mentee overview: Exclude users without homepage enabled [extensions/GrowthExperiments] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/734751 (https://phabricator.wikimedia.org/T293434) (owner: 10Urbanecm) [09:36:02] (03PS1) 10Urbanecm: MentorFilterHooks: Account for no matching users [extensions/GrowthExperiments] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/734765 (https://phabricator.wikimedia.org/T294386) [09:36:07] (03CR) 10Urbanecm: [C: 03+2] MentorFilterHooks: Account for no matching users [extensions/GrowthExperiments] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/734765 (https://phabricator.wikimedia.org/T294386) (owner: 10Urbanecm) [09:36:16] (03PS1) 10Urbanecm: MentorFilterHooks: Account for no matching users [extensions/GrowthExperiments] (wmf/1.38.0-wmf.6) - 10https://gerrit.wikimedia.org/r/734906 (https://phabricator.wikimedia.org/T294386) [09:36:21] (03CR) 10Urbanecm: [C: 03+2] MentorFilterHooks: Account for no matching users [extensions/GrowthExperiments] (wmf/1.38.0-wmf.6) - 10https://gerrit.wikimedia.org/r/734906 (https://phabricator.wikimedia.org/T294386) (owner: 10Urbanecm) [09:37:34] 10SRE, 10serviceops, 10Developer Productivity, 10Performance-Team (Radar), and 2 others: Debug hosts sometimes Fatal error: "The UdpSocket to 127.0.0.1:10514 has been closed" - https://phabricator.wikimedia.org/T214734 (10hashar) I have forgot, a reqid example: 
https://logstash.wikimedia.org/app/dashboard... [09:52:54] (03CR) 10Filippo Giunchedi: "Can't meaningfully comment on the semantics since I'm not an exim expert but LGTM as an incremental improvement" [puppet] - 10https://gerrit.wikimedia.org/r/734391 (https://phabricator.wikimedia.org/T294166) (owner: 10Herron) [09:54:52] (03Merged) 10jenkins-bot: Mentee overview: Exclude users without homepage enabled [extensions/GrowthExperiments] (wmf/1.38.0-wmf.6) - 10https://gerrit.wikimedia.org/r/734752 (https://phabricator.wikimedia.org/T293434) (owner: 10Urbanecm) [09:54:54] (03Merged) 10jenkins-bot: Mentee overview: Exclude users without homepage enabled [extensions/GrowthExperiments] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/734751 (https://phabricator.wikimedia.org/T293434) (owner: 10Urbanecm) [09:55:07] * urbanecm is going to test ^^ [09:55:14] +1 :] [09:56:03] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki: add further error handling [deployment-charts] - 10https://gerrit.wikimedia.org/r/734904 (owner: 10Giuseppe Lavagetto) [09:56:51] (03Merged) 10jenkins-bot: MentorFilterHooks: Account for no matching users [extensions/GrowthExperiments] (wmf/1.38.0-wmf.5) - 10https://gerrit.wikimedia.org/r/734765 (https://phabricator.wikimedia.org/T294386) (owner: 10Urbanecm) [09:56:53] (03Merged) 10jenkins-bot: MentorFilterHooks: Account for no matching users [extensions/GrowthExperiments] (wmf/1.38.0-wmf.6) - 10https://gerrit.wikimedia.org/r/734906 (https://phabricator.wikimedia.org/T294386) (owner: 10Urbanecm) [09:57:02] (03PS1) 10Jelto: blubberoid: bump common_templates to 0.4 and chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/734926 (https://phabricator.wikimedia.org/T292390) [10:00:24] (03PS1) 10David Caro: ceph|openstack: add comment when generating ceph keyrings [puppet] - 10https://gerrit.wikimedia.org/r/734928 [10:00:26] (03Merged) 10jenkins-bot: mediawiki: add further error handling [deployment-charts] - 
10https://gerrit.wikimedia.org/r/734904 (owner: 10Giuseppe Lavagetto) [10:01:07] !log [urbanecm@mwmaint1002 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki --dbshard=s2 --verbose # testing 734752 [10:01:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:01:19] (should take half a minute or so) [10:01:39] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [10:01:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:02:32] !log [urbanecm@mwdebug1001 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki --dbshard=s2 --verbose # testing 734752 [10:02:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:03:01] * urbanecm is happy about the very low runtime of that script [10:03:07] used to take hours, testing'd be more difficult :D [10:03:31] <_joe_> urbanecm: it's fast because we moved from mysql to /dev/null as a database [10:03:42] heh [10:04:42] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[10:04:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:05:45] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] ceph|openstack: add comment when generating ceph keyrings [puppet] - 10https://gerrit.wikimedia.org/r/734928 (owner: 10David Caro) [10:06:00] (03CR) 10David Caro: [V: 03+1] "PCC SUCCESS (DIFF 3 NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31924/console" [puppet] - 10https://gerrit.wikimedia.org/r/734928 (owner: 10David Caro) [10:07:00] spot checked 734752, data matches, decrease of rows in the table also matches expectations [10:07:10] I'll test 734906 now and sync [10:07:45] hmm [10:08:02] it _does_ remove the fatal [10:08:13] but changes that with a feature bug [10:08:21] well, given it never properly worked, and now it at least doesn't fatal, I'm syncing [10:10:11] !log urbanecm@deploy1002 Synchronized php-1.38.0-wmf.5/extensions/GrowthExperiments/: 305e97a, 667a4be: GrowthExperiments backports (T293434, T294386) (duration: 01m 04s) [10:10:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:19] T294386: InvalidArgumentException: Wikimedia\Rdbms\Database::makeList: empty input for field recentchanges_actor.actor_user - https://phabricator.wikimedia.org/T294386 [10:10:19] T293434: Mentor dashboard: Mentee overview should not include users with homepage and/or mentorship disabled - https://phabricator.wikimedia.org/T293434 [10:11:58] !log urbanecm@deploy1002 Synchronized php-1.38.0-wmf.6/extensions/GrowthExperiments/: 305e97a, b9eaa20: GrowthExperiments backports (T293434, T294386) (duration: 01m 04s) [10:12:03] * urbanecm done [10:12:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:13:56] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[10:14:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:20] (03CR) 10Lucas Werkmeister (WMDE): Add missing termbox codes from Wikibase (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734722 (https://phabricator.wikimedia.org/T277836) (owner: 10Mbch331) [10:16:54] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [10:20:39] (03PS1) 10Arturo Borrero Gonzalez: puppetmaster: reorder private hiera lookup [puppet] - 10https://gerrit.wikimedia.org/r/734937 [10:21:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:31:37] filed T294421 for the bug I unraveled by making it not fatal [10:31:37] T294421: Recent changes mentorship filters: Starred mentees filter is ignored if user has no starred mentees - https://phabricator.wikimedia.org/T294421 [10:33:34] (03CR) 10Jbond: [C: 03+1] "LGTM, further the only hosts with an entry on the private repo are netbox-dev2001 which has no site equivalent hiera config and cloudbacku" [puppet] - 10https://gerrit.wikimedia.org/r/734937 (owner: 10Arturo Borrero Gonzalez) [10:37:16] PROBLEM - Widespread puppet agent failures- no resources reported on alert1001 is CRITICAL: 0.01529 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [10:37:44] (03PS2) 10Arturo Borrero Gonzalez: puppetmaster: reorder private hiera lookup [puppet] - 10https://gerrit.wikimedia.org/r/734937 [10:39:22] RECOVERY - Widespread puppet agent failures- no resources reported on alert1001 is OK: (C)0.01 ge (W)0.006 ge 0.002831 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [10:39:45] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/734937 (owner: 10Arturo Borrero Gonzalez) [10:42:23] !log disable puppet fleet wide to deploy a puppetmaster change 
[10:43:21] !log disable puppet fleet wide to deploy a puppetmaster change gerrit:734937 [10:43:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:44:21] (03CR) 10Jbond: [C: 03+2] puppetmaster: reorder private hiera lookup [puppet] - 10https://gerrit.wikimedia.org/r/734937 (owner: 10Arturo Borrero Gonzalez) [10:46:43] (03PS2) 10Urbanecm: Deploy Growth mentor dashboard to phase II wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732967 (https://phabricator.wikimedia.org/T278920) [10:53:45] !log enable puppet fleet wide post gerrit:734937 [10:53:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:05] Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211027T1100). [11:00:05] Urbanecm and zabe: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:11] o/ [11:00:20] o/ [11:00:22] i can deploy today [11:00:24] ok! [11:00:43] (03CR) 10Urbanecm: [C: 03+2] Deploy Growth mentor dashboard to phase II wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732967 (https://phabricator.wikimedia.org/T278920) (owner: 10Urbanecm) [11:01:40] (03Merged) 10jenkins-bot: Deploy Growth mentor dashboard to phase II wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/732967 (https://phabricator.wikimedia.org/T278920) (owner: 10Urbanecm) [11:02:26] zabe: did you make sure they're fine with _losing_ deleted pages in that NS? 
[11:02:35] see the following https://www.irccloud.com/pastebin/SkTaw2Fn/ [11:03:02] for count of deleted pages https://www.irccloud.com/pastebin/4zQgEEwg/ [11:04:41] I was in #wikipedia-es yesterday and asked to clean up those namespaces, but I didn't specifically ask about deleted pages. [11:05:04] can you check please? [11:05:10] yes [11:06:06] 10SRE, 10ops-codfw, 10DC-Ops, 10Platform Engineering, and 2 others: Q2:(Need By: TBD) rack/setup/install restbase202[456].codfw.wmnet - https://phabricator.wikimedia.org/T294377 (10hnowlan) a:05hnowlan→03Papaul [11:06:37] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [11:06:47] thanks zabe [11:06:47] (03CR) 10Jbond: [C: 03+1] mstyles no longer needs search access [puppet] - 10https://gerrit.wikimedia.org/r/734886 (owner: 10Ryan Kemper) [11:08:37] (03PS4) 10Jbond: hiera: Add hostname/certname based lookup to secret hierarchy under labs [puppet] - 10https://gerrit.wikimedia.org/r/734703 (https://phabricator.wikimedia.org/T294050) (owner: 10Dduvall) [11:09:37] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 3ef422703772621b8248a89724c4770a58f320f6: Deploy Growth mentor dashboard to phase II wikis (T278920) (duration: 01m 03s) [11:09:42] * urbanecm done [11:09:50] zabe: anything else from you? [11:09:54] (or Lucas_WMDE, or anyone :)) [11:09:54] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[11:09:55] (CR) Jbond: [C: +1] "LGTM, minor nit" [puppet] - https://gerrit.wikimedia.org/r/734703 (https://phabricator.wikimedia.org/T294050) (owner: Dduvall) [11:10:01] no [11:10:04] okay [11:11:06] !log UTC morning B&C done [11:17:56] (CR) Jbond: [C: +1] "LGTM minor nit" [software/pywmflib] - https://gerrit.wikimedia.org/r/734708 (owner: Volans) [11:18:44] (CR) Volans: "Compiler results available here:" [puppet] - https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: Volans) [11:19:18] (CR) Jbond: [C: +1] setup.py: include type hints for dependencies (1 comment) [software/homer] - https://gerrit.wikimedia.org/r/734648 (owner: Volans) [11:20:05] (CR) Volans: "question inline" [software/pywmflib] - https://gerrit.wikimedia.org/r/734708 (owner: Volans) [11:21:35] SRE, SRE-Access-Requests, Patch-For-Review: (WIP) Requesting access to production for ejoseph - https://phabricator.wikimedia.org/T294379 (jbond) > We probably want approval from @odimitrijevic for the analytics access or @Ottomata [11:22:25] (CR) Jbond: [WIP] Add ejoseph (1 comment) [puppet] - https://gerrit.wikimedia.org/r/734887 (https://phabricator.wikimedia.org/T294379) (owner: Ryan Kemper) [11:24:52] (CR) Jbond: [C: +1] Adopt pathlib.Path everywhere (1 comment) [software/pywmflib] - https://gerrit.wikimedia.org/r/734708 (owner: Volans) [11:31:54] (PS2) Volans: Adopt pathlib.Path everywhere [software/pywmflib] - https://gerrit.wikimedia.org/r/734708 [11:32:01] (CR) Volans: "addressed comments" [software/pywmflib] - https://gerrit.wikimedia.org/r/734708 (owner: Volans) [11:36:24] (CR) Jbond: [C: +1] "LGTM" [software/pywmflib] - https://gerrit.wikimedia.org/r/734708 (owner: Volans) [11:37:23] SRE, ops-eqiad, DC-Ops, Platform Engineering, and 2 others: Q2:(Need By: TBD) rack/setup/install restbase103[123].eqiad.wmnet - https://phabricator.wikimedia.org/T294372
(hnowlan) a:hnowlan→Papaul [11:37:59] SRE, Community-Tech, serviceops, wikidiff2, Platform Team Workboards (Platform Engineering Reliability): Deploy wikidiff2 1.13.0 - https://phabricator.wikimedia.org/T285857 (ldelench_wmf) [11:40:10] SRE, ops-eqiad, DC-Ops, Platform Engineering, and 2 others: Q2:(Need By: TBD) rack/setup/install restbase103[123].eqiad.wmnet - https://phabricator.wikimedia.org/T294372 (hnowlan) [11:40:22] SRE, ops-codfw, DC-Ops, Platform Engineering, and 2 others: Q2:(Need By: TBD) rack/setup/install restbase202[456].codfw.wmnet - https://phabricator.wikimedia.org/T294377 (hnowlan) [11:40:36] (CR) Jbond: [C: +2] puppetmaster: drop log messages from logstash reporter [puppet] - https://gerrit.wikimedia.org/r/719368 (https://phabricator.wikimedia.org/T222826) (owner: Jbond) [11:41:35] (PS7) Jbond: P:puppetmaster::common: Add back logstash support [puppet] - https://gerrit.wikimedia.org/r/719372 (https://phabricator.wikimedia.org/T222826) [11:46:22] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31928/console" [puppet] - https://gerrit.wikimedia.org/r/719372 (https://phabricator.wikimedia.org/T222826) (owner: Jbond) [11:53:00] (CR) Majavah: puppetboard: add puppetboard as an active/active service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/734263 (owner: Jbond) [11:53:05] (CR) Majavah: [C: -1] puppetboard: add puppetboard as an active/active service [puppet] - https://gerrit.wikimedia.org/r/734263 (owner: Jbond) [11:54:24] (PS8) Jbond: P:puppetmaster::common: Add back logstash support [puppet] - https://gerrit.wikimedia.org/r/719372 (https://phabricator.wikimedia.org/T222826) [11:57:04] (PS4) Jbond: puppetboard: add puppetboard as an active/active service [puppet] - https://gerrit.wikimedia.org/r/734263 [11:57:31] (CR) Jbond: "updated" [puppet] -
https://gerrit.wikimedia.org/r/734263 (owner: Jbond) [11:59:38] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31929/console" [puppet] - https://gerrit.wikimedia.org/r/719372 (https://phabricator.wikimedia.org/T222826) (owner: Jbond) [12:03:33] (CR) Jbond: [V: +1 C: +2] P:puppetmaster::common: Add back logstash support [puppet] - https://gerrit.wikimedia.org/r/719372 (https://phabricator.wikimedia.org/T222826) (owner: Jbond) [12:15:53] (PS1) Jbond: O:puppetmaster::standalon: add support for logstash report [puppet] - https://gerrit.wikimedia.org/r/734951 [12:16:51] (CR) Jbond: [C: +2] O:puppetmaster::standalon: add support for logstash report [puppet] - https://gerrit.wikimedia.org/r/734951 (owner: Jbond) [12:40:02] RECOVERY - snapshot of x1 in codfw on alert1001 is OK: Last snapshot for x1 at codfw (db2101.codfw.wmnet:3320) taken on 2021-10-27 12:09:56 (315 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [12:40:20] \o/ [12:49:01] (PS1) Elukey: Fix kerberos keytabs for an-druid nodes [labs/private] - https://gerrit.wikimedia.org/r/734959 [12:49:29] (CR) Elukey: [V: +2 C: +2] Fix kerberos keytabs for an-druid nodes [labs/private] - https://gerrit.wikimedia.org/r/734959 (owner: Elukey) [12:50:38] (CR) Jbond: [C: +1] "lgtm" [software/pywmflib] - https://gerrit.wikimedia.org/r/734707 (owner: Volans) [13:00:04] twentyafterfour and hashar: #bothumor I � Unicode. All rise for MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211027T1300).
[13:04:35] (CR) Ayounsi: [C: +1] network: add drmrs prefixes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: Volans) [13:07:49] (PS1) Jbond: puppetmaster: enable logstash reports [puppet] - https://gerrit.wikimedia.org/r/734961 (https://phabricator.wikimedia.org/T1) [13:09:09] (PS2) Jbond: puppetmaster: enable logstash reports [puppet] - https://gerrit.wikimedia.org/r/734961 (https://phabricator.wikimedia.org/T1) [13:09:54] (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31930/console" [puppet] - https://gerrit.wikimedia.org/r/734961 (https://phabricator.wikimedia.org/T1) (owner: Jbond) [13:11:07] SRE, ops-eqiad, Infrastructure-Foundations, netops: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (ayounsi) Circuit is up. [13:14:29] (PS3) Jbond: puppetmaster: enable logstash reports [puppet] - https://gerrit.wikimedia.org/r/734961 (https://phabricator.wikimedia.org/T1) [13:17:07] (CR) Jbond: [C: +2] puppetmaster: enable logstash reports [puppet] - https://gerrit.wikimedia.org/r/734961 (https://phabricator.wikimedia.org/T1) (owner: Jbond) [13:23:13] (CR) Volans: "Reply inline" [puppet] - https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: Volans) [13:28:29] (PS1) Majavah: remove role::beta::puppetmaster [puppet] - https://gerrit.wikimedia.org/r/734962 [13:30:48] (PS1) Hashar: Remove bot humors for deployers [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/734964 [13:31:21] (CR) Subramanya Sastry: [C: +1] mediawiki: Remove tidy binary [puppet] - https://gerrit.wikimedia.org/r/732386 (owner: Legoktm) [13:31:27] (CR) Jbond: [C: +1] "LGTM thanks" [puppet] - https://gerrit.wikimedia.org/r/734962 (owner: Majavah) [13:31:32] (CR) jerkins-bot:
[V: -1] Remove bot humors for deployers [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/734964 (owner: Hashar) [13:31:38] PROBLEM - Widespread puppet agent failures- no resources reported on alert1001 is CRITICAL: 0.01076 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [13:31:59] jbond: can you merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/734962/ as well or do you want reviews from other people? [13:32:08] majavah: sure will do [13:32:11] (CR) Jbond: [C: +2] remove role::beta::puppetmaster [puppet] - https://gerrit.wikimedia.org/r/734962 (owner: Majavah) [13:32:18] thanks! [13:32:23] np :) [13:33:26] PROBLEM - Widespread puppet agent failures on alert1001 is CRITICAL: 0.01416 ge 0.01 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [13:34:40] jbond: related to any recent change? ^^^ [13:34:41] puppet:///modules/mediawiki/mediawiki-converters.profile: Error 500 on SERVER [13:34:43] it is wip --^?
I see 500s logged by puppet while retrieving the catalog (on nodes) [13:34:46] Profile::Environment/File[/etc/profile.d/field.sh]) Could not evaluate: Could not retrieve file metadata for puppet:///modules/base/environment/field.sh: Error 500 on SERVER: [13:35:18] a subsequent puppet run on cp2030 worked, maybe the puppet masters are slowing down [13:36:55] * jbond looking [13:37:26] i think it's because i restarted apache without disabling puppet, it should be fine and recover shortly, sorry for the noise [13:39:07] (PS2) Hashar: Remove bot humors for deployers [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/734964 [13:39:09] (PS1) Hashar: Ignore flake8-bugbear B904 [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/734965 [13:40:50] jbond: ahhh ack nice, np :) thanks for checking [13:41:48] RECOVERY - Widespread puppet agent failures on alert1001 is OK: (C)0.01 ge (W)0.006 ge 0.001133 https://puppetboard.wikimedia.org/nodes?status=failed https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [13:45:25] SRE, SRE-Access-Requests, Patch-For-Review: (WIP) Requesting access to production for ejoseph - https://phabricator.wikimedia.org/T294379 (Ottomata) Approved. This will be ssh + kerberos access. [13:46:14] RECOVERY - Widespread puppet agent failures- no resources reported on alert1001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [13:57:24] Puppet, Infrastructure-Foundations, User-jbond: Fix errors in puppet config - https://phabricator.wikimedia.org/T294435 (jbond) p:Triage→Medium [13:58:11] !log removed /var/run/confd-template/.inference*.err files from puppetmaster2001 (backup saved in /home/elukey just in case) [13:58:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:04] !log Replace m5-master so it points to dbproxy1017 - T288093 [14:00:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:12] T288093: Place m5 proxies in codfw and eqiad - https://phabricator.wikimedia.org/T288093 [14:00:29] (CR) Marostegui: [C: +2] wmnet: Replace m5-master with dbproxy1017 [dns] - https://gerrit.wikimedia.org/r/734806 (https://phabricator.wikimedia.org/T288093) (owner: Marostegui) [14:00:41] (PS1) Jbond: Profile::Elasticsearch::Logstash: Ensure keys are strings [puppet] - https://gerrit.wikimedia.org/r/734968 (https://phabricator.wikimedia.org/T294435) [14:01:50] (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31932/console" [puppet] - https://gerrit.wikimedia.org/r/734968 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [14:02:01] (PS2) Jbond: Profile::Elasticsearch::Logstash: Ensure keys are strings [puppet] - https://gerrit.wikimedia.org/r/734968 (https://phabricator.wikimedia.org/T294435) [14:03:15] (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31933/console" [puppet] - https://gerrit.wikimedia.org/r/734968 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [14:04:50] (CR) Elukey: "Left some comments, my understanding is that f'..' is needed for all the f-strings right?"
[software/pywmflib] - https://gerrit.wikimedia.org/r/734707 (owner: Volans) [14:06:17] (PS1) Jbond: hieradata: remove empty files [puppet] - https://gerrit.wikimedia.org/r/734969 (https://phabricator.wikimedia.org/T294435) [14:06:34] (PS1) Marostegui: Revert "wmnet: Replace m5-master with dbproxy1017" [dns] - https://gerrit.wikimedia.org/r/734908 [14:07:17] (CR) Ayounsi: [C: +1] network: add drmrs prefixes (3 comments) [puppet] - https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: Volans) [14:08:14] (PS1) Ema: varnishreqstats.mtail: remove wildcard match [puppet] - https://gerrit.wikimedia.org/r/734970 (https://phabricator.wikimedia.org/T293879) [14:08:38] (CR) Marostegui: [C: +2] Revert "wmnet: Replace m5-master with dbproxy1017" [dns] - https://gerrit.wikimedia.org/r/734908 (owner: Marostegui) [14:09:54] (PS14) Jgiannelos: maps: Add script to send tile invalidation events [puppet] - https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) [14:11:01] (PS4) Volans: network: add drmrs prefixes [puppet] - https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) [14:11:10] (CR) Volans: "Included latest changes in network topology" [puppet] - https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: Volans) [14:13:20] (PS1) Jbond: metadata.json: drop metadata files [puppet] - https://gerrit.wikimedia.org/r/734971 [14:13:42] SRE, DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, Patch-For-Review: Wikibase dispatchChanges.php runs slow, creates an absurd amount of database connections - https://phabricator.wikimedia.org/T118162 (karapayneWMDE) [14:14:05] (CR) Jbond: [C: +2] hieradata: remove empty files [puppet] - https://gerrit.wikimedia.org/r/734969 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [14:14:09] (CR) Jbond: [C: +2]
metadata.json: drop metadata files [puppet] - https://gerrit.wikimedia.org/r/734971 (owner: Jbond) [14:19:12] (PS2) Volans: style: adopt f-strings [software/pywmflib] - https://gerrit.wikimedia.org/r/734707 [14:19:14] (PS3) Volans: Adopt pathlib.Path everywhere [software/pywmflib] - https://gerrit.wikimedia.org/r/734708 [14:19:16] (CR) Volans: "Nice catches, addressed comments" [software/pywmflib] - https://gerrit.wikimedia.org/r/734707 (owner: Volans) [14:24:11] (PS1) Jbond: R:acme_chief::cert: drop deprecated paramters [puppet] - https://gerrit.wikimedia.org/r/734973 (https://phabricator.wikimedia.org/T294435) [14:24:43] (CR) David Caro: [V: +1 C: +2] ceph|openstack: add comment when generating ceph keyrings [puppet] - https://gerrit.wikimedia.org/r/734928 (owner: David Caro) [14:25:17] (PS1) Jgiannelos: tile-pregeneration: Adapt to new event schema [software/tegola] (wmf/v0.14.x) - https://gerrit.wikimedia.org/r/734975 (https://phabricator.wikimedia.org/T293366) [14:26:01] (CR) Jbond: "PCC: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31934" [puppet] - https://gerrit.wikimedia.org/r/734973 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [14:28:28] (PS1) Legoktm: mariadb: Allow lists1001 to talk to m5's dbproxy [puppet] - https://gerrit.wikimedia.org/r/734976 [14:29:25] (CR) Legoktm: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31935/console" [puppet] - https://gerrit.wikimedia.org/r/734976 (owner: Legoktm) [14:29:27] (CR) Marostegui: "Thanks for putting the patch up!"
[puppet] - https://gerrit.wikimedia.org/r/734976 (owner: Legoktm) [14:29:50] (PS1) Jbond: P:redis::multidc: drop warning [puppet] - https://gerrit.wikimedia.org/r/734977 (https://phabricator.wikimedia.org/T294435) [14:29:56] (CR) Marostegui: mariadb: Allow lists1001 to talk to m5's dbproxy (1 comment) [puppet] - https://gerrit.wikimedia.org/r/734976 (owner: Legoktm) [14:32:31] (CR) BBlack: [C: +1] network: add drmrs prefixes [puppet] - https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: Volans) [14:33:33] (PS2) Legoktm: mariadb: Allow lists1001 to talk to m5's dbproxy [puppet] - https://gerrit.wikimedia.org/r/734976 (https://phabricator.wikimedia.org/T288093) [14:33:44] (CR) Legoktm: [V: +1] mariadb: Allow lists1001 to talk to m5's dbproxy (1 comment) [puppet] - https://gerrit.wikimedia.org/r/734976 (https://phabricator.wikimedia.org/T288093) (owner: Legoktm) [14:34:13] (CR) Marostegui: [C: +1] "Thank you" [puppet] - https://gerrit.wikimedia.org/r/734976 (https://phabricator.wikimedia.org/T288093) (owner: Legoktm) [14:35:39] (CR) Legoktm: [C: +2] mariadb: Allow lists1001 to talk to m5's dbproxy [puppet] - https://gerrit.wikimedia.org/r/734976 (https://phabricator.wikimedia.org/T288093) (owner: Legoktm) [14:36:40] (PS1) Jbond: R:scap::target: drop notice message [puppet] - https://gerrit.wikimedia.org/r/734979 [14:37:46] (PS1) Ayounsi: drmrs: Enlarge public from /28 to /27 [dns] - https://gerrit.wikimedia.org/r/734981 (https://phabricator.wikimedia.org/T283594) [14:38:28] (CR) jerkins-bot: [V: -1] drmrs: Enlarge public from /28 to /27 [dns] - https://gerrit.wikimedia.org/r/734981 (https://phabricator.wikimedia.org/T283594) (owner: Ayounsi) [14:41:14] (CR) Volans: [C: +1] "LGTM according to https://wikitech.wikimedia.org/wiki/DNS/Netbox#Atomically_deploy_auto-generated_records_and_a_manual_change plan" [dns] -
https://gerrit.wikimedia.org/r/734981 (https://phabricator.wikimedia.org/T283594) (owner: Ayounsi) [14:47:22] (PS1) Bearloga: statistics::product_analytics: Switch to monthly execution [puppet] - https://gerrit.wikimedia.org/r/734985 (https://phabricator.wikimedia.org/T291957) [14:47:56] (CR) jerkins-bot: [V: -1] statistics::product_analytics: Switch to monthly execution [puppet] - https://gerrit.wikimedia.org/r/734985 (https://phabricator.wikimedia.org/T291957) (owner: Bearloga) [14:48:00] (PS1) Jbond: hiara: use lookup instead of hiera [puppet] - https://gerrit.wikimedia.org/r/734986 (https://phabricator.wikimedia.org/T294435) [14:48:18] (PS2) Jbond: hiara: use lookup instead of hiera [puppet] - https://gerrit.wikimedia.org/r/734986 (https://phabricator.wikimedia.org/T294435) [14:50:30] (PS2) Bearloga: statistics::product_analytics: Switch to monthly execution [puppet] - https://gerrit.wikimedia.org/r/734985 (https://phabricator.wikimedia.org/T291957) [14:51:08] !log volans@cumin2002 START - Cookbook sre.dns.netbox [14:51:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:17] SRE, Infrastructure-Foundations, netbox, netops, Patch-For-Review: Stage drmrs in Netbox - https://phabricator.wikimedia.org/T283594 (ayounsi) Open→Resolved a:ayounsi Netbox now reflects reality. Only cable IDs and asset tags are missing.
[14:53:23] SRE, Commons, DBA, MediaWiki-extensions-WikibaseClient, and 3 others: Enable statement usage tracking on Commons and Co - https://phabricator.wikimedia.org/T188730 (Lucas_Werkmeister_WMDE) [14:53:51] (PS1) Jbond: R:cassandra::instance: explicitly cast string to int [puppet] - https://gerrit.wikimedia.org/r/734988 (https://phabricator.wikimedia.org/T294435) [14:55:51] (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31936/console" [puppet] - https://gerrit.wikimedia.org/r/734988 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [14:55:57] (CR) Elukey: "Nit on the commit msg, LGTM from my point of view!" [puppet] - https://gerrit.wikimedia.org/r/734986 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [14:56:36] !log volans@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [14:56:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:56] (CR) Volans: [C: +1] "recheck" [dns] - https://gerrit.wikimedia.org/r/734981 (https://phabricator.wikimedia.org/T283594) (owner: Ayounsi) [14:57:04] (CR) Ayounsi: [C: +1] "Thanks!"
[software/homer] - https://gerrit.wikimedia.org/r/734650 (owner: Volans) [14:57:51] (CR) Volans: [C: +2] drmrs: Enlarge public from /28 to /27 [dns] - https://gerrit.wikimedia.org/r/734981 (https://phabricator.wikimedia.org/T283594) (owner: Ayounsi) [15:00:22] (PS1) Jbond: R:profile::prometheus::redis_exporter: explicitly cast to int [puppet] - https://gerrit.wikimedia.org/r/734989 (https://phabricator.wikimedia.org/T294435) [15:01:32] (CR) Elukey: [C: +1] style: adopt f-strings [software/pywmflib] - https://gerrit.wikimedia.org/r/734707 (owner: Volans) [15:01:52] (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31937/console" [puppet] - https://gerrit.wikimedia.org/r/734989 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:01:58] (CR) Effie Mouzeli: [C: +1] "It was useful when we introduced it, but yes not it is useless." [puppet] - https://gerrit.wikimedia.org/r/734977 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:02:06] (CR) Volans: [C: +2] style: adopt f-strings [software/pywmflib] - https://gerrit.wikimedia.org/r/734707 (owner: Volans) [15:03:44] !log volans@cumin2002 START - Cookbook sre.dns.netbox [15:03:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:54] (PS1) Jbond: P:webperf::processors: explicitly cast port as a integer [puppet] - https://gerrit.wikimedia.org/r/734990 (https://phabricator.wikimedia.org/T294435) [15:05:18] (Merged) jenkins-bot: style: adopt f-strings [software/pywmflib] - https://gerrit.wikimedia.org/r/734707 (owner: Volans) [15:05:22] (PS1) BBlack: Fix comment for public1-b13-drmrs [dns] - https://gerrit.wikimedia.org/r/734991 [15:06:08] (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31938/console" [puppet] -
https://gerrit.wikimedia.org/r/734990 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:07:18] (PS3) Cwhite: Profile::Elasticsearch::Logstash: Ensure keys are strings [puppet] - https://gerrit.wikimedia.org/r/734968 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:07:19] !log volans@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [15:07:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:08:11] (PS1) Jbond: R:netops::check: drop is_array as its depreacated [puppet] - https://gerrit.wikimedia.org/r/734992 (https://phabricator.wikimedia.org/T294435) [15:08:29] (CR) BBlack: [C: +2] Fix comment for public1-b13-drmrs [dns] - https://gerrit.wikimedia.org/r/734991 (owner: BBlack) [15:08:58] !log cmooney@cumin1001 START - Cookbook sre.dns.netbox [15:09:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:11:19] (CR) Juan90264: [C: +1] "LGTM :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/734361 (https://phabricator.wikimedia.org/T294189) (owner: Odder) [15:12:33] !log cmooney@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [15:12:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:12] (CR) Jbond: "PCC https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31939" [puppet] - https://gerrit.wikimedia.org/r/734992 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:13:21] (CR) Filippo Giunchedi: [C: +1] varnishreqstats.mtail: remove wildcard match [puppet] - https://gerrit.wikimedia.org/r/734970 (https://phabricator.wikimedia.org/T293879) (owner: Ema) [15:14:40] (CR) Filippo Giunchedi: [C: +1] "I don't see the log message but LGTM" [puppet] - https://gerrit.wikimedia.org/r/734989 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:16:16] (PS1) Jbond: P:nftables::basefirewall: drop is_ip{4,6}_address as deprecated
[puppet] - https://gerrit.wikimedia.org/r/734995 (https://phabricator.wikimedia.org/T294435) [15:16:39] (PS4) Cwhite: hiera: logstash::curator_actions: Ensure keys are strings [puppet] - https://gerrit.wikimedia.org/r/734968 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:16:57] (CR) Elukey: [C: +1] Adopt pathlib.Path everywhere [software/pywmflib] - https://gerrit.wikimedia.org/r/734708 (owner: Volans) [15:17:01] (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31940/console" [puppet] - https://gerrit.wikimedia.org/r/734995 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:17:56] (CR) Volans: [C: +2] Adopt pathlib.Path everywhere [software/pywmflib] - https://gerrit.wikimedia.org/r/734708 (owner: Volans) [15:20:33] (Merged) jenkins-bot: Adopt pathlib.Path everywhere [software/pywmflib] - https://gerrit.wikimedia.org/r/734708 (owner: Volans) [15:20:41] (PS1) Jbond: C:scap::scripts: ensure we inlucde scap::master [puppet] - https://gerrit.wikimedia.org/r/734996 (https://phabricator.wikimedia.org/T294435) [15:21:11] (CR) BBlack: [C: +1] "Re-plus-one after looking again and making steely eyes at the diff" [puppet] - https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: Volans) [15:21:14] (CR) Ottomata: [C: +2] statistics::product_analytics: Switch to monthly execution [puppet] - https://gerrit.wikimedia.org/r/734985 (https://phabricator.wikimedia.org/T291957) (owner: Bearloga) [15:22:56] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 4 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31941/console" [puppet] - https://gerrit.wikimedia.org/r/734996 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:24:24] (CR) Jbond: [C: -1] "this is not correct we need another way to pass through the variables required"
[puppet] - https://gerrit.wikimedia.org/r/734996 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:24:51] (PS1) Jbond: C:aptrepo::rsync: explicitly include aptrepo::rsync [puppet] - https://gerrit.wikimedia.org/r/734997 (https://phabricator.wikimedia.org/T294435) [15:24:54] (CR) Ayounsi: [C: +1] "I looked at some more hosts from your list and can't see anything wrong." [puppet] - https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: Volans) [15:25:32] (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31942/console" [puppet] - https://gerrit.wikimedia.org/r/734997 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:26:08] (PS2) BBlack: Add digicert-2021-unified certs and intermediates [puppet] - https://gerrit.wikimedia.org/r/732009 (https://phabricator.wikimedia.org/T289507) [15:26:56] (CR) Volans: [C: +2] network: add drmrs prefixes [puppet] - https://gerrit.wikimedia.org/r/734897 (https://phabricator.wikimedia.org/T282787) (owner: Volans) [15:28:06] !log deployed new prefixes for drmrs in modules/network/data/data.yaml - T282787 [15:28:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:12] T282787: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 [15:28:42] (PS1) Jbond: P:httpbb: check for basicauth_credentials using defined [puppet] - https://gerrit.wikimedia.org/r/734999 (https://phabricator.wikimedia.org/T294435) [15:29:58] (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31943/console" [puppet] - https://gerrit.wikimedia.org/r/734999 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:31:52] (CR) BBlack: [C: +2] Add digicert-2021-unified certs and intermediates [puppet] -
https://gerrit.wikimedia.org/r/732009 (https://phabricator.wikimedia.org/T289507) (owner: BBlack) [15:32:26] SRE, SRE-Access-Requests, Patch-For-Review: (WIP) Requesting access to production for ejoseph - https://phabricator.wikimedia.org/T294379 (EJoseph) username: ejoseph email: ejoseph@wikimedia.org ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFQvFscO99eUfcg51aOxPekk5JW1QIBVMjhNbAEuIlR5 ejoseph@wikimedia.org [15:35:05] SRE, ops-eqiad, Infrastructure-Foundations, netops: Patch Telxius transport cross-connect to cr1-eqiad - https://phabricator.wikimedia.org/T293709 (RobH) Open→Resolved [15:36:01] (PS1) Jbond: R:varnish::wikimedia_vcl: drop undefined metaparameters [puppet] - https://gerrit.wikimedia.org/r/735005 (https://phabricator.wikimedia.org/T294435) [15:37:37] (CR) Cwhite: [C: +2] hiera: logstash::curator_actions: Ensure keys are strings [puppet] - https://gerrit.wikimedia.org/r/734968 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:37:40] (PS1) BBlack: Add digicert-2021 to available_unified_certs [puppet] - https://gerrit.wikimedia.org/r/735007 (https://phabricator.wikimedia.org/T289507) [15:37:42] (PS1) BBlack: Switch eqsin to digicert-2021 [puppet] - https://gerrit.wikimedia.org/r/735008 (https://phabricator.wikimedia.org/T289507) [15:37:44] (PS1) BBlack: Switch esams to digicert-2021 [puppet] - https://gerrit.wikimedia.org/r/735009 (https://phabricator.wikimedia.org/T289507) [15:38:31] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 8): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31944/console" [puppet] - https://gerrit.wikimedia.org/r/735005 (https://phabricator.wikimedia.org/T294435) (owner: Jbond) [15:43:02] ops-codfw, Continuous-Integration-Infrastructure, DC-Ops, netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul) [15:43:36]
ops-codfw, Continuous-Integration-Infrastructure, DC-Ops, netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul) @Dzahn mw2255 is done [15:47:52] (PS1) Jbond: R:cassandra::instance::monitoring: make sure cassandra is loaded [puppet] - https://gerrit.wikimedia.org/r/735012 (https://phabricator.wikimedia.org/T294435) [15:50:08] (PS2) Ryan Kemper: [WIP] Add ejoseph [puppet] - https://gerrit.wikimedia.org/r/734887 (https://phabricator.wikimedia.org/T294379) [15:50:41] (CR) Dduvall: hiera: Add hostname/certname based lookup to secret hierarchy under labs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/734703 (https://phabricator.wikimedia.org/T294050) (owner: Dduvall) [15:50:54] (CR) jerkins-bot: [V: -1] [WIP] Add ejoseph [puppet] - https://gerrit.wikimedia.org/r/734887 (https://phabricator.wikimedia.org/T294379) (owner: Ryan Kemper) [15:51:07] (CR) Ryan Kemper: [WIP] Add ejoseph (1 comment) [puppet] - https://gerrit.wikimedia.org/r/734887 (https://phabricator.wikimedia.org/T294379) (owner: Ryan Kemper) [15:54:34] (PS5) Dduvall: hiera: Add hostname/certname based lookup to secret hierarchy under labs [puppet] - https://gerrit.wikimedia.org/r/734703 (https://phabricator.wikimedia.org/T294050) [15:55:18] (PS1) Bartosz Dziewoński: Make 'sourcemodetoolbar' available by default on the beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/735014 [15:55:58] (PS1) Jbond: R:icingamonitor::elasticsearch::cirrus_settings_check [puppet] - https://gerrit.wikimedia.org/r/735015 (https://phabricator.wikimedia.org/T294435) [15:56:22] (CR) Dduvall: hiera: Add hostname/certname based lookup to secret hierarchy under labs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/734703 (https://phabricator.wikimedia.org/T294050) (owner: Dduvall) [15:56:32] (PS2) Bartosz Dziewoński:
DiscussionTools: Make 'sourcemodetoolbar' available on the beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735014 [15:57:26] (03CR) 10BBlack: [C: 03+2] Add digicert-2021 to available_unified_certs [puppet] - 10https://gerrit.wikimedia.org/r/735007 (https://phabricator.wikimedia.org/T289507) (owner: 10BBlack) [15:57:50] PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [15:58:12] 10Puppet, 10Cloud Services Proposals, 10Cloud-VPS, 10Infrastructure-Foundations, and 3 others: Easing pain points caused by divergence between cloudservices and production puppet usecases - https://phabricator.wikimedia.org/T285539 (10colewhite) As part of T288618, we started using production puppet to man... [15:59:40] (03PS1) 10Jbond: C:gitlab: drop undefined variables [puppet] - 10https://gerrit.wikimedia.org/r/735016 (https://phabricator.wikimedia.org/T294435) [16:00:38] (03CR) 10jerkins-bot: [V: 04-1] C:gitlab: drop undefined variables [puppet] - 10https://gerrit.wikimedia.org/r/735016 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:01:02] (03CR) 10Jbond: "havn;t checked why this is failing pcc yet" [puppet] - 10https://gerrit.wikimedia.org/r/735016 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:02:26] (03PS1) 10Jbond: P:druid::historical: make sure all variables are defined [puppet] - 10https://gerrit.wikimedia.org/r/735017 (https://phabricator.wikimedia.org/T294435) [16:02:32] (03PS3) 10DLynch: Add event stream config for discussiontools [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731854 (https://phabricator.wikimedia.org/T286076) [16:02:42] (03CR) 10Ayounsi: [C: 03+1] "Much cleaner!" 
[software/homer] - 10https://gerrit.wikimedia.org/r/734649 (owner: 10Volans) [16:03:11] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31948/console" [puppet] - 10https://gerrit.wikimedia.org/r/735017 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:05:54] (03PS1) 10Elukey: kserve: move prometheus annotations from Service to container [deployment-charts] - 10https://gerrit.wikimedia.org/r/735018 (https://phabricator.wikimedia.org/T289841) [16:08:14] (03CR) 10Elukey: [C: 03+1] R:profile::prometheus::redis_exporter: explicitly cast to int [puppet] - 10https://gerrit.wikimedia.org/r/734989 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:09:21] (03PS1) 10Jbond: R:ceph::keyring: ensure all variables are defined [puppet] - 10https://gerrit.wikimedia.org/r/735019 (https://phabricator.wikimedia.org/T294435) [16:10:35] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] P:nftables::basefirewall: drop is_ip{4,6}_address as deprecated [puppet] - 10https://gerrit.wikimedia.org/r/734995 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:10:44] (03CR) 10Elukey: [C: 03+2] kserve: move prometheus annotations from Service to container [deployment-charts] - 10https://gerrit.wikimedia.org/r/735018 (https://phabricator.wikimedia.org/T289841) (owner: 10Elukey) [16:11:07] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31949/console" [puppet] - 10https://gerrit.wikimedia.org/r/735019 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:13:10] (03PS3) 10Inductiveload: Allow copy-upload (by URL) for Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/730507 (https://phabricator.wikimedia.org/T293205) [16:13:12] (03PS1) 10Inductiveload: enwikisource: Enable copy-upload for autoconfirmed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735020 
(https://phabricator.wikimedia.org/T294447) [16:15:08] (03PS1) 10Jbond: R:mtail::program: WIP - i dont think we need to manage metaparamters [puppet] - 10https://gerrit.wikimedia.org/r/735021 (https://phabricator.wikimedia.org/T294435) [16:15:57] (03CR) 10Ottomata: [C: 03+1] Add event stream config for discussiontools [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731854 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch) [16:16:42] (03CR) 10David Caro: R:ceph::keyring: ensure all variables are defined (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/735019 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:17:35] (03PS1) 10Cwhite: Revert "hiera: logstash::curator_actions: Ensure keys are strings" [puppet] - 10https://gerrit.wikimedia.org/r/734917 [16:18:16] (03PS1) 10Jbond: bigtop::mysql_jdbc: ensure package_name variable is always defined [puppet] - 10https://gerrit.wikimedia.org/r/735023 (https://phabricator.wikimedia.org/T294435) [16:18:34] (03CR) 10jerkins-bot: [V: 04-1] Revert "hiera: logstash::curator_actions: Ensure keys are strings" [puppet] - 10https://gerrit.wikimedia.org/r/734917 (owner: 10Cwhite) [16:18:49] (03CR) 10jerkins-bot: [V: 04-1] bigtop::mysql_jdbc: ensure package_name variable is always defined [puppet] - 10https://gerrit.wikimedia.org/r/735023 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:19:09] (03PS2) 10Cwhite: Revert "hiera: logstash::curator_actions: Ensure keys are strings" [puppet] - 10https://gerrit.wikimedia.org/r/734917 [16:20:18] (03CR) 10Jbond: "pcc: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31951" [puppet] - 10https://gerrit.wikimedia.org/r/735023 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:20:25] (03PS2) 10Jbond: bigtop::mysql_jdbc: ensure package_name variable is always defined [puppet] - 10https://gerrit.wikimedia.org/r/735023 (https://phabricator.wikimedia.org/T294435) [16:20:43] (03CR) 10Cwhite: [C: 03+2] Revert 
"hiera: logstash::curator_actions: Ensure keys are strings" [puppet] - 10https://gerrit.wikimedia.org/r/734917 (owner: 10Cwhite) [16:21:05] (03CR) 10jerkins-bot: [V: 04-1] bigtop::mysql_jdbc: ensure package_name variable is always defined [puppet] - 10https://gerrit.wikimedia.org/r/735023 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:21:44] !log otto@deploy1002 Started deploy [analytics/refinery@0d79e18]: Regular analytics weekly train [analytics/refinery@0d79e18] [16:21:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:23:52] (03CR) 10Volans: [V: 03+2 C: 03+2] setup.py: include type hints for dependencies [software/homer] - 10https://gerrit.wikimedia.org/r/734648 (owner: 10Volans) [16:24:02] (03CR) 10Volans: [C: 03+2] pylint: fixed newly reported issues [software/homer] - 10https://gerrit.wikimedia.org/r/734649 (owner: 10Volans) [16:24:20] (03CR) 10Volans: [C: 03+2] transports: catch connection error [software/homer] - 10https://gerrit.wikimedia.org/r/734650 (owner: 10Volans) [16:25:02] (03PS1) 10Jbond: P:mariadb::grants::production: ensure we inlucde required classes [puppet] - 10https://gerrit.wikimedia.org/r/735025 (https://phabricator.wikimedia.org/T294435) [16:25:17] (03CR) 10Jbond: "pcc: https://puppet-compiler.wmflabs.org/compiler1002/31950/" [puppet] - 10https://gerrit.wikimedia.org/r/735021 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:27:16] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: (WIP) Requesting access to production for ejoseph - https://phabricator.wikimedia.org/T294379 (10RKemper) [16:27:40] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 7): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31952/console" [puppet] - 10https://gerrit.wikimedia.org/r/735025 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:27:44] (03PS1) 10Jbond: P:docker::engine: ensure we include all required classes [puppet] - 
10https://gerrit.wikimedia.org/r/735026 (https://phabricator.wikimedia.org/T294435) [16:28:05] (03Merged) 10jenkins-bot: pylint: fixed newly reported issues [software/homer] - 10https://gerrit.wikimedia.org/r/734649 (owner: 10Volans) [16:28:42] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31953/console" [puppet] - 10https://gerrit.wikimedia.org/r/735026 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:28:44] (03Merged) 10jenkins-bot: transports: catch connection error [software/homer] - 10https://gerrit.wikimedia.org/r/734650 (owner: 10Volans) [16:29:16] !log elukey@deploy1002 helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [16:29:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:32] !log elukey@deploy1002 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [16:29:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:54] (03PS1) 10Jbond: P:wmcs::monitoring: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/735027 (https://phabricator.wikimedia.org/T294435) [16:31:40] (03PS1) 10Jbond: C:bigtop::hue: ensure all variables are deifned [puppet] - 10https://gerrit.wikimedia.org/r/735028 (https://phabricator.wikimedia.org/T294435) [16:34:11] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to production for ejoseph - https://phabricator.wikimedia.org/T294379 (10RKemper) [16:35:02] (03PS1) 10Jbond: C:statistics::compute: correct user param [puppet] - 10https://gerrit.wikimedia.org/r/735029 (https://phabricator.wikimedia.org/T294435) [16:39:06] PROBLEM - k8s API server requests latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 verb={LIST,PATCH,PUT} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [16:39:38] PROBLEM - etcd request latencies on kubestagemaster1001 is CRITICAL: 
instance=10.64.16.203 operation={get,list,listWithCount,update} https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=28 [16:42:14] !log otto@deploy1002 Finished deploy [analytics/refinery@0d79e18]: Regular analytics weekly train [analytics/refinery@0d79e18] (duration: 20m 30s) [16:42:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:16] RECOVERY - k8s API server requests latencies on kubestagemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [16:43:50] RECOVERY - etcd request latencies on kubestagemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=28 [16:44:38] (03CR) 10Jbond: "https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31956" [puppet] - 10https://gerrit.wikimedia.org/r/735029 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [16:48:53] (03PS1) 10BryanDavis: toolhub: Add egress to m5-master dbproxy nodes [deployment-charts] - 10https://gerrit.wikimedia.org/r/735031 (https://phabricator.wikimedia.org/T294437) [17:29:45] !log otto@deploy1002 Started deploy [analytics/refinery@0d79e18] (thin): Regular analytics weekly train THIN [analytics/refinery@0d79e18] [17:29:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:29:52] !log otto@deploy1002 Finished deploy [analytics/refinery@0d79e18] (thin): Regular analytics weekly train THIN [analytics/refinery@0d79e18] (duration: 00m 07s) [17:29:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:34:26] !log otto@deploy1002 Started deploy [analytics/refinery@0d79e18] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0d79e18] [17:34:30] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:35:24] (03CR) 10Dzahn: "eh yea, I certainly don't act on it, barely aware of this. so I don't mind if we remove it. Mukunda,d o you know more?" [puppet] - 10https://gerrit.wikimedia.org/r/734979 (owner: 10Jbond) [17:36:26] (03CR) 10Herron: [C: 03+1] add stack.head field for aggregating events by stack head [software/ecs] - 10https://gerrit.wikimedia.org/r/734698 (https://phabricator.wikimedia.org/T288851) (owner: 10Cwhite) [17:39:04] (03CR) 1020after4: "Seems fine to remove it. The message might be slightly helpful when debugging puppet config but most likely in practice nobody would even " [puppet] - 10https://gerrit.wikimedia.org/r/734979 (owner: 10Jbond) [17:39:10] (03CR) 1020after4: [C: 03+1] R:scap::target: drop notice message [puppet] - 10https://gerrit.wikimedia.org/r/734979 (owner: 10Jbond) [17:39:27] (03CR) 10Herron: [C: 03+2] exim: aggressively retry messages to alert.victorops.com addresses [puppet] - 10https://gerrit.wikimedia.org/r/734391 (https://phabricator.wikimedia.org/T294166) (owner: 10Herron) [17:40:24] (03CR) 10Dzahn: [C: 03+2] "noop on all webperf https://puppet-compiler.wmflabs.org/compiler1002/31957/" [puppet] - 10https://gerrit.wikimedia.org/r/734990 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [17:40:56] !log otto@deploy1002 Finished deploy [analytics/refinery@0d79e18] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0d79e18] (duration: 06m 30s) [17:41:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:41:14] (03CR) 1020after4: [C: 03+1] "The main thing is for people using the '$manage_user' feature to be aware that the user _could_ be defined elsewhere with different permis" [puppet] - 10https://gerrit.wikimedia.org/r/734979 (owner: 10Jbond) [17:42:42] herron: expected that it changes exim defaults for QUEUEINTERVAL everywhere, not just mx? 
[17:43:03] mutante: yes that's expected [17:43:11] ACK, just noticed on unrelated change [17:43:14] ty [17:44:11] sure np, and also adds the VO retry specifics on the host MTAs since those are in the path for alerts as well fwiw [17:44:38] *nod* [17:45:06] (03CR) 10Dzahn: "noop confirmed in prod on webperf*" [puppet] - 10https://gerrit.wikimedia.org/r/734990 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [17:53:29] (03PS1) 10Ottomata: Bump Refine version to 0.1.20 to get JsonSchemaConvert map type logic change [puppet] - 10https://gerrit.wikimedia.org/r/735038 (https://phabricator.wikimedia.org/T263466) [17:57:08] (03PS1) 10Herron: default_mail_relay: add VO retry config to production host MTAs [puppet] - 10https://gerrit.wikimedia.org/r/735039 (https://phabricator.wikimedia.org/T294166) [17:59:04] (03PS5) 10Dzahn: mediawiki: allow absenting font packages via hiera, remove fonts from canaries [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) [17:59:37] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1002/31958/" [puppet] - 10https://gerrit.wikimedia.org/r/735039 (https://phabricator.wikimedia.org/T294166) (owner: 10Herron) [18:00:05] twentyafterfour and hashar: #bothumor My software never has bugs. It just develops random features. Rise for Train log triage with CPT. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211027T1800). [18:00:05] RoanKattouw and Urbanecm: That opportune time is upon us again. Time for a UTC evening backport window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211027T1800). [18:00:05] nn1l2 and zabe: A patch you scheduled for UTC evening backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[18:00:14] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: allow absenting font packages via hiera, remove fonts from canaries [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [18:00:18] o/ [18:00:28] Hi [18:01:54] (03CR) 10Dzahn: mediawiki: allow absenting font packages via hiera, remove fonts from canaries (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [18:02:06] (03CR) 10Herron: [C: 03+2] "adding VO retry config to default_mail_relay template that was missed in I18389cfe0b3f9acb802ddf91c8692df1f1cc34ec" [puppet] - 10https://gerrit.wikimedia.org/r/735039 (https://phabricator.wikimedia.org/T294166) (owner: 10Herron) [18:02:40] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to production for ejoseph - https://phabricator.wikimedia.org/T294379 (10ssingh) [18:02:49] Hello! I'll do the deployment today [18:03:42] Thanks [18:04:35] (03PS6) 10Dzahn: mediawiki: allow absenting font packages via hiera, remove fonts from canaries [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) [18:04:49] (03PS3) 10Catrope: Temporarily change the votewiki lang to fa [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734451 (https://phabricator.wikimedia.org/T292685) (owner: 104nn1l2) [18:04:54] (03CR) 10Catrope: [C: 03+2] Temporarily change the votewiki lang to fa [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734451 (https://phabricator.wikimedia.org/T292685) (owner: 104nn1l2) [18:06:29] (03Merged) 10jenkins-bot: Temporarily change the votewiki lang to fa [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734451 (https://phabricator.wikimedia.org/T292685) (owner: 104nn1l2) [18:07:56] while you're here, can i get a +2 on a beta-only config change? 
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/735014 [18:08:59] nn1l2: Your votewiki change is on mwdebug1002, please test [18:09:10] (03PS1) 10Dzahn: wikitech::web: remove font packages from wikitech servers [puppet] - 10https://gerrit.wikimedia.org/r/735042 (https://phabricator.wikimedia.org/T294378) [18:09:16] (03CR) 10Catrope: [C: 03+2] DiscussionTools: Make 'sourcemodetoolbar' available on the beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735014 (owner: 10Bartosz Dziewoński) [18:09:31] Okay, confirmed [18:09:47] It's Farsi now [18:09:48] (thanks!) [18:09:58] (03PS2) 10Catrope: Disable Education Program namespaces in eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734697 (https://phabricator.wikimedia.org/T294365) (owner: 10Zabe) [18:10:06] (03CR) 10Catrope: [C: 03+2] Disable Education Program namespaces in eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734697 (https://phabricator.wikimedia.org/T294365) (owner: 10Zabe) [18:10:09] (03PS3) 10Jbond: hiera: use lookup() instead of hiera() [puppet] - 10https://gerrit.wikimedia.org/r/734986 (https://phabricator.wikimedia.org/T294435) [18:10:23] (03CR) 10Jbond: "thanks" [puppet] - 10https://gerrit.wikimedia.org/r/734986 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [18:10:30] (03CR) 10Ottomata: [C: 03+2] Bump Refine version to 0.1.20 to get JsonSchemaConvert map type logic change [puppet] - 10https://gerrit.wikimedia.org/r/735038 (https://phabricator.wikimedia.org/T263466) (owner: 10Ottomata) [18:10:43] !log catrope@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734451|Temporarily change the votewiki lang to fa (T292685)]] (duration: 01m 04s) [18:10:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:10:49] (03Merged) 10jenkins-bot: DiscussionTools: Make 'sourcemodetoolbar' available on the beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735014 (owner: 10Bartosz 
Dziewoński) [18:10:50] T292685: Carry out the 2021 fawiki elections on votewiki - https://phabricator.wikimedia.org/T292685 [18:11:08] (03CR) 10Jbond: [C: 03+2] P:redis::multidc: drop warning (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/734977 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [18:11:27] (03Merged) 10jenkins-bot: Disable Education Program namespaces in eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734697 (https://phabricator.wikimedia.org/T294365) (owner: 10Zabe) [18:12:37] zabe: Your Education Program change is on mwdebug1002, please test [18:12:54] 10SRE, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (10Dzahn) >>! In T294271#7460829, @hashar wrote: >start of morning in Dallas will work just fine. Cool, thanks! So, @Papaul maybe you wa... [18:13:04] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[18:13:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:45] RoanKattouw: looks good [18:14:15] (03CR) 10Dzahn: mediawiki: allow absenting font packages via hiera, remove fonts from canaries (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [18:14:21] (03CR) 10BryanDavis: [C: 03+2] toolhub: Add egress to m5-master dbproxy nodes [deployment-charts] - 10https://gerrit.wikimedia.org/r/735031 (https://phabricator.wikimedia.org/T294437) (owner: 10BryanDavis) [18:14:42] (03CR) 10Dzahn: mediawiki: allow absenting font packages via hiera, remove fonts from canaries (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [18:14:47] (03CR) 10Jbond: [V: 03+1 C: 04-1] "need to dig into this one more" [puppet] - 10https://gerrit.wikimedia.org/r/735005 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [18:15:53] !log catrope@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734697|Disable Education Program namespaces in eswiki (T294365)]] (duration: 01m 04s) [18:15:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:59] T294365: Remove namespace 446 and 447 from eswiki - https://phabricator.wikimedia.org/T294365 [18:16:15] (03CR) 10Jbond: hiera: Add hostname/certname based lookup to secret hierarchy under labs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/734703 (https://phabricator.wikimedia.org/T294050) (owner: 10Dduvall) [18:16:22] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [18:16:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:37] (03CR) 10Dave Pifke: [C: 03+1] "LGTM." 
[puppet] - 10https://gerrit.wikimedia.org/r/734990 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [18:16:59] RoanKattouw: thanks for your help :) [18:17:23] (03CR) 10Jbond: [V: 03+1 C: 03+2] "thanks" [puppet] - 10https://gerrit.wikimedia.org/r/734989 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [18:17:50] Thank you nn1l2 and zabe for being here on time and having your patches in order, that made today's deployment super easy :) [18:18:05] (03PS1) 10Ottomata: Refine: now use -shaded jar [puppet] - 10https://gerrit.wikimedia.org/r/735043 [18:18:25] (03CR) 10Herron: [C: 03+2] rsyslog: centralize remote_syslog_tls lookups into single location in hiera [puppet] - 10https://gerrit.wikimedia.org/r/734401 (https://phabricator.wikimedia.org/T292196) (owner: 10Herron) [18:18:43] (03Merged) 10jenkins-bot: toolhub: Add egress to m5-master dbproxy nodes [deployment-charts] - 10https://gerrit.wikimedia.org/r/735031 (https://phabricator.wikimedia.org/T294437) (owner: 10BryanDavis) [18:18:58] (03PS2) 10Ssingh: mstyles no longer needs search access [puppet] - 10https://gerrit.wikimedia.org/r/734886 (owner: 10Ryan Kemper) [18:18:59] Thank you, RoanKattouw [18:19:00] (03PS3) 10Ssingh: admin: add ejoseph to shell, analytics-privatedata-users, cloudelastic-roots [puppet] - 10https://gerrit.wikimedia.org/r/734887 (https://phabricator.wikimedia.org/T294379) (owner: 10Ryan Kemper) [18:19:09] (03PS2) 10Ottomata: Refine: now use -shaded jar [puppet] - 10https://gerrit.wikimedia.org/r/735043 [18:19:19] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Refine: now use -shaded jar [puppet] - 10https://gerrit.wikimedia.org/r/735043 (owner: 10Ottomata) [18:22:13] !log bd808@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . 
[18:22:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:52] (03CR) 10Dzahn: [V: 03+1] "this is on 'C:profile::mediawiki::common': where the compiler picks the examples https://puppet-compiler.wmflabs.org/compiler1002/31960/ a" [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [18:23:12] (03PS2) 10Jbond: C:gitlab: drop undefined variables [puppet] - 10https://gerrit.wikimedia.org/r/735016 (https://phabricator.wikimedia.org/T294435) [18:23:27] !log bd808@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' . [18:23:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:55] (03CR) 10jerkins-bot: [V: 04-1] C:gitlab: drop undefined variables [puppet] - 10https://gerrit.wikimedia.org/r/735016 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [18:24:04] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [18:24:43] (03PS3) 10Jbond: C:gitlab: drop undefined variables [puppet] - 10https://gerrit.wikimedia.org/r/735016 (https://phabricator.wikimedia.org/T294435) [18:24:56] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:25:49] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [18:25:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:09] (03CR) 10Jbond: "CI all good now" [puppet] - 10https://gerrit.wikimedia.org/r/735016 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [18:26:18] !log bd808@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . 
[18:26:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:49] (03CR) 10Jbond: [V: 03+1 C: 03+2] P:nftables::basefirewall: drop is_ip{4,6}_address as deprecated [puppet] - 10https://gerrit.wikimedia.org/r/734995 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [18:29:02] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [18:29:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:57] (03CR) 10Dzahn: [V: 03+1 C: 03+1] "looks good to me: UID matches what I see in LDAP, keys in LDAP do NOT match this new production key, user is in corp-LDAP with the same ma" [puppet] - 10https://gerrit.wikimedia.org/r/734887 (https://phabricator.wikimedia.org/T294379) (owner: 10Ryan Kemper) [18:33:31] (03CR) 10Ssingh: admin: add ejoseph to shell, analytics-privatedata-users, cloudelastic-roots (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/734887 (https://phabricator.wikimedia.org/T294379) (owner: 10Ryan Kemper) [18:34:03] (03CR) 10Dzahn: [V: 03+1 C: 03+1] "it was important to have Guillaume's approval for cloudelastic-roots but that is on the ticket as well" [puppet] - 10https://gerrit.wikimedia.org/r/734887 (https://phabricator.wikimedia.org/T294379) (owner: 10Ryan Kemper) [18:34:21] (03PS2) 10Jbond: R:ceph::keyring: ensure all variables are defined [puppet] - 10https://gerrit.wikimedia.org/r/735019 (https://phabricator.wikimedia.org/T294435) [18:35:32] (03CR) 10Ssingh: [C: 03+2] admin: add ejoseph to shell, analytics-privatedata-users, cloudelastic-roots [puppet] - 10https://gerrit.wikimedia.org/r/734887 (https://phabricator.wikimedia.org/T294379) (owner: 10Ryan Kemper) [18:37:04] (03CR) 10Jbond: R:ceph::keyring: ensure all variables are defined (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/735019 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [18:37:15] (03CR) 10Ssingh: [C: 03+2] mstyles no longer 
needs search access [puppet] - 10https://gerrit.wikimedia.org/r/734886 (owner: 10Ryan Kemper) [18:38:11] (03CR) 10Jbond: R:ceph::keyring: ensure all variables are defined (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/735019 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [18:39:05] (03PS3) 10Jbond: bigtop::mysql_jdbc: ensure package_name variable is always defined [puppet] - 10https://gerrit.wikimedia.org/r/735023 (https://phabricator.wikimedia.org/T294435) [18:41:20] 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to production for ejoseph - https://phabricator.wikimedia.org/T294379 (10ssingh) 05Open→03Resolved ` sukhe@krb1001:~$ sudo manage_principals.py create ejoseph --email_address=ejoseph@wikimedia.org Principal successfully created. Make sur... [18:41:44] 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, 10netops: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (10RobH) [18:41:53] 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, 10netops: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (10RobH) [18:42:11] 10SRE, 10serviceops, 10Sustainability: Jobrunner on Buster occasional timeout on codfw file upload - https://phabricator.wikimedia.org/T275752 (10Legoktm) a:03Legoktm [18:42:31] (03CR) 10Jbond: [C: 03+2] R:netops::check: drop is_array as its depreacated [puppet] - 10https://gerrit.wikimedia.org/r/734992 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [18:42:44] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 44.82 ms [18:43:42] (03CR) 10Urbanecm: [C: 03+1] "ready to go any time" [puppet] - 10https://gerrit.wikimedia.org/r/731192 (https://phabricator.wikimedia.org/T293447) (owner: 10Urbanecm) [18:43:47] 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, 10netops: Q2:(Need By: TBD) replace mr1-eqiad - https://phabricator.wikimedia.org/T294474 (10RobH) a:03Jclark-ctr [18:46:20] !log installing 
python-swiftclient on mw1305 for debugging [18:46:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:19] (03CR) 10Dzahn: [V: 03+1] "validating calendar interval syntax:) V+1" [puppet] - 10https://gerrit.wikimedia.org/r/731192 (https://phabricator.wikimedia.org/T293447) (owner: 10Urbanecm) [18:53:09] (03CR) 10Jbond: [V: 03+1 C: 03+2] P:druid::historical: make sure all variables are defined [puppet] - 10https://gerrit.wikimedia.org/r/735017 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [18:54:16] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:57:43] (03CR) 10Jbond: [V: 03+1 C: 03+2] P:docker::engine: ensure we include all required classes [puppet] - 10https://gerrit.wikimedia.org/r/735026 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [19:00:04] twentyafterfour and hashar: I, the Bot under the Fountain, call upon thee, The Deployer, to do MediaWiki train - Utc-7+Utc-0 Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211027T1900). [19:00:54] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31962/console" [puppet] - 10https://gerrit.wikimedia.org/r/735028 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [19:02:04] (03CR) 10Jbond: C:bigtop::hue: ensure all variables are deifned [puppet] - 10https://gerrit.wikimedia.org/r/735028 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [19:07:05] (03CR) 10Jbond: [C: 04-1] "minor -1 but otherwise lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [19:11:40] ok deploying group1 train. [19:13:37] (03CR) 10Jbond: "im not sure why pcc shows no diff?" 
[puppet] - 10https://gerrit.wikimedia.org/r/735029 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [19:14:31] (03CR) 10Jbond: [V: 03+1 C: 03+2] C:aptrepo::rsync: explicitly include aptrepo::rsync [puppet] - 10https://gerrit.wikimedia.org/r/734997 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [19:19:35] (03CR) 10Dzahn: [V: 03+1 C: 03+2] growthexperiments/updatementeedata: Run updates every three hours [puppet] - 10https://gerrit.wikimedia.org/r/731192 (https://phabricator.wikimedia.org/T293447) (owner: 10Urbanecm) [19:20:00] (03CR) 10Dzahn: "per the "7 minutes for all growth wikis" comment on ticket etc" [puppet] - 10https://gerrit.wikimedia.org/r/731192 (https://phabricator.wikimedia.org/T293447) (owner: 10Urbanecm) [19:20:26] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 2 DIFF 5): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31963/console" [puppet] - 10https://gerrit.wikimedia.org/r/735012 (https://phabricator.wikimedia.org/T294435) (owner: 10Jbond) [19:21:59] (03PS2) 10Jbond: R:cassandra::instance::monitoring: make sure cassandra is loaded [puppet] - 10https://gerrit.wikimedia.org/r/735012 (https://phabricator.wikimedia.org/T294435) [19:24:25] (03CR) 10Dzahn: [V: 03+1] mediawiki: allow absenting font packages via hiera, remove fonts from canaries (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [19:24:34] (03PS2) 10Dzahn: wikitech::web: remove font packages from wikitech servers [puppet] - 10https://gerrit.wikimedia.org/r/735042 (https://phabricator.wikimedia.org/T294378) [19:24:44] !log cp5xxx: disabling puppet ahead of digicert unified certificate update rollout [19:24:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:24:59] (03CR) 10Dzahn: "deployed on mwmaint1002. I saw one timer per DB cluster change schedule." 
[puppet] - https://gerrit.wikimedia.org/r/731192 (https://phabricator.wikimedia.org/T293447) (owner: Urbanecm)
[19:26:02] (CR) jerkins-bot: [V: -1] wikitech::web: remove font packages from wikitech servers [puppet] - https://gerrit.wikimedia.org/r/735042 (https://phabricator.wikimedia.org/T294378) (owner: Dzahn)
[19:26:44] (CR) BBlack: [C: +2] Switch eqsin to digicert-2021 [puppet] - https://gerrit.wikimedia.org/r/735008 (https://phabricator.wikimedia.org/T289507) (owner: BBlack)
[19:28:06] (CR) Jbond: "PCC shows a diff here but im not sure if its fixing a bug or introducing one?" [puppet] - https://gerrit.wikimedia.org/r/735012 (https://phabricator.wikimedia.org/T294435) (owner: Jbond)
[19:28:55] !log cp5001: switching unified cert to digicert-2021
[19:29:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:50] (PS1) 20after4: group1 wikis to 1.38.0-wmf.6 refs T293947 [mediawiki-config] - https://gerrit.wikimedia.org/r/735051
[19:31:54] (CR) 20after4: [C: +2] group1 wikis to 1.38.0-wmf.6 refs T293947 [mediawiki-config] - https://gerrit.wikimedia.org/r/735051 (owner: 20after4)
[19:32:37] (Merged) jenkins-bot: group1 wikis to 1.38.0-wmf.6 refs T293947 [mediawiki-config] - https://gerrit.wikimedia.org/r/735051 (owner: 20after4)
[19:34:30] !log twentyafterfour@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.6 refs T293947
[19:34:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:34:37] T293947: 1.38.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T293947
[19:36:18] !log twentyafterfour@deploy1002 Synchronized php: group1 wikis to 1.38.0-wmf.6 refs T293947 (duration: 01m 47s)
[19:36:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:39:17] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[19:39:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:40:13] (PS1) BBlack: Make digicert-2021 cert available on cp nodes [puppet] - https://gerrit.wikimedia.org/r/735053 (https://phabricator.wikimedia.org/T289507)
[19:40:57] (CR) BBlack: [C: +2] Make digicert-2021 cert available on cp nodes [puppet] - https://gerrit.wikimedia.org/r/735053 (https://phabricator.wikimedia.org/T289507) (owner: BBlack)
[19:42:28] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[19:42:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:49:20] !log cp5007: switching unified cert to digicert-2021
[19:49:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:39] !log cp5xxx: switching unified cert to digicert-2021
[19:53:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:00:04] twentyafterfour and hashar: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for MediaWiki train - Utc-7+Utc-0 Version . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211027T1900).
[20:00:04] chrisalbon and accraze: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211027T2000).
[20:25:15] (PS2) Dzahn: add miscweb to LVS [puppet] - https://gerrit.wikimedia.org/r/694625 (https://phabricator.wikimedia.org/T281538)
[20:30:56] (PS2) BBlack: Switch esams to digicert-2021 [puppet] - https://gerrit.wikimedia.org/r/735009 (https://phabricator.wikimedia.org/T289507)
[20:42:24] PROBLEM - mediawiki-installation DSH group on mw2255 is CRITICAL: Host mw2255 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[20:45:45] I got that one
[20:47:02] (PS1) Reedy: Set $wgSecurePollShowErrorDetail on mwmaint1002 [mediawiki-config] - https://gerrit.wikimedia.org/r/735060 (https://phabricator.wikimedia.org/T294489)
[20:47:05] jouncebot: nowandnext
[20:47:06] For the next 0 hour(s) and 12 minute(s): MediaWiki train - Utc-7+Utc-0 Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211027T1900)
[20:47:06] For the next 0 hour(s) and 12 minute(s): Services – Graphoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211027T2000)
[20:47:06] In 2 hour(s) and 12 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211027T2300)
[20:47:30] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet
[20:47:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:47:43] !log mw2255 - scap pull, repooling - after DRAC firmware was upgraded - T283582
[20:47:48] (CR) Reedy: [C: +2] Set $wgSecurePollShowErrorDetail on mwmaint1002 [mediawiki-config] - https://gerrit.wikimedia.org/r/735060 (https://phabricator.wikimedia.org/T294489) (owner: Reedy)
[20:47:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:47:49] T283582: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582
[20:47:54] PROBLEM - Ensure local MW versions match
expected deployment on mw2255 is CRITICAL: CRITICAL: 658 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers
[20:48:11] ^ sync in progress
[20:48:30] (Merged) jenkins-bot: Set $wgSecurePollShowErrorDetail on mwmaint1002 [mediawiki-config] - https://gerrit.wikimedia.org/r/735060 (https://phabricator.wikimedia.org/T294489) (owner: Reedy)
[20:50:46] cannot delete non-empty directory: php-1.37.0-wmf.1
[20:51:16] RECOVERY - mediawiki-installation DSH group on mw2255 is OK: OK https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[20:51:16] RECOVERY - Ensure local MW versions match expected deployment on mw2255 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers
[20:51:29] !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=mw2255.codfw.wmnet
[20:51:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:51:36] !log reedy@deploy1002 Synchronized wmf-config/CommonSettings.php: T294489 (duration: 01m 59s)
[20:51:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:51:42] T294489: SecurePoll's tally.php throwing GPG error - https://phabricator.wikimedia.org/T294489
[20:52:16] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[20:52:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:53:24] mutante: what's left in it?
[20:55:13] SRE, MW-on-K8s, Traffic, serviceops, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (jijiki)
[20:55:32] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[20:55:36] Spookreeeno: cache/l10n/*.cdb
[20:55:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:55:47] SRE, MW-on-K8s, serviceops, Patch-For-Review, Performance-Team (Radar): Benchmark performance of MediaWiki on k8s - https://phabricator.wikimedia.org/T280497 (jijiki) Open→Resolved **Production URL **testing (1.929.416 URLs) results in https://people.wikimedia.org/~akosiaris/prod_urls...
[20:55:57] mutante: I assume they not much use for an undeployed version
[20:56:12] I recall a task might exist about that tbh
[20:56:14] Spookreeeno: yea, this is not a new problem, just a recoccuring one
[20:56:24] every once in a while someone manually deletes them
[20:56:34] there is one copy of each cdb file for each old version
[20:57:38] it was more than that one directory
[20:57:53] https://phabricator.wikimedia.org/T275826 is what I was thinking of
[20:58:16] that's it
[20:58:57] not just on deploy hosts, on the actual appservers too
[20:59:27] Maybe task needs update?
[20:59:48] * Spookreeeno notes it in his mental list for when Miraheze does multiversion
[21:03:49] ops-codfw, Continuous-Integration-Infrastructure, DC-Ops, netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) Thanks @Papaul ! it's back in service now I am not sure what is next exac...
[21:10:58] Spookreeeno: well, I left a comment at least
[21:11:17] not very high prio with the move to k8s I guess
[21:12:03] Ye I guess
[21:31:44] (PS5) Juan90264: Enable talk for mobile users on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/732705 (https://phabricator.wikimedia.org/T293946)
[21:38:08] !log run namespaceDupes.php for a bunch of Wikipedias
[21:38:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:42:26] !log [urbanecm@mwmaint1002 ~]$ foreachwikiindblist wikipedia namespaceDupes.php --fix | tee namespacedupes-wikipedia-real.log # run namespaceDupes.php for all Wikipedias
[21:42:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:43:29] ops-codfw, Continuous-Integration-Infrastructure, DC-Ops, netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Papaul) @Dzahn thank you. I think it is best to just close this task and go "on d...
[21:47:31] SRE, Continuous-Integration-Infrastructure, Release-Engineering-Team, serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Papaul) @Dzahn Next week Monday 1st at 9:30 am CT
[21:57:11] (CR) Cwhite: [C: +2] "PCC checks out: https://puppet-compiler.wmflabs.org/compiler1002/31965/" [puppet] - https://gerrit.wikimedia.org/r/732827 (https://phabricator.wikimedia.org/T240685) (owner: Cwhite)
[21:57:28] (PS1) Reedy: Pass ->restrict( Shell::RESTRICT_NONE ) to GPG Shell Command [extensions/SecurePoll] (wmf/1.38.0-wmf.6) - https://gerrit.wikimedia.org/r/734918 (https://phabricator.wikimedia.org/T294489)
[21:59:08] SRE, Continuous-Integration-Infrastructure, Release-Engineering-Team, serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (hashar) It is an holiday here in France (All-saints) , then I am not critical to the DRAC upgrade ;) I will make arrangement, it will b...
[22:00:42] SRE, Continuous-Integration-Infrastructure, Release-Engineering-Team, serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Dzahn) @hashar I am wondering if you need me around (for mgmt access / root / +2 ). I have a request to be off that day but it's not sur...
[22:01:42] (CR) Reedy: [C: +2] Pass ->restrict( Shell::RESTRICT_NONE ) to GPG Shell Command [extensions/SecurePoll] (wmf/1.38.0-wmf.6) - https://gerrit.wikimedia.org/r/734918 (https://phabricator.wikimedia.org/T294489) (owner: Reedy)
[22:03:10] SRE, SRE-swift-storage: The file "XXX" is in an inconsistent state within the internal storage backends - https://phabricator.wikimedia.org/T291137 (Urbanecm) >>! In T291137#7453171, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https://sal.toolforge.org/log/Yy7Tr...
[22:04:51] (Merged) jenkins-bot: Pass ->restrict( Shell::RESTRICT_NONE ) to GPG Shell Command [extensions/SecurePoll] (wmf/1.38.0-wmf.6) - https://gerrit.wikimedia.org/r/734918 (https://phabricator.wikimedia.org/T294489) (owner: Reedy)
[22:06:43] !log reedy@deploy1002 Synchronized php-1.38.0-wmf.6/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: T294489 (duration: 01m 15s)
[22:06:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:06:49] T294489: SecurePoll's tally.php throwing GPG error - https://phabricator.wikimedia.org/T294489
[22:10:32] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[22:10:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:13:38] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[22:13:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:27:28] ops-codfw, Continuous-Integration-Infrastructure, DC-Ops, netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn) Open→Resolved a:Dzahn I agree and boldly resolve it, expecting...
[22:31:23] SRE, Continuous-Integration-Infrastructure, Release-Engineering-Team, serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Dzahn) After re-thinking this and chatting some more on IRC I now think we should not do this and close my own request as invalid. It's...
[22:38:07] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[22:38:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:38:48] ops-codfw, Continuous-Integration-Infrastructure, DC-Ops, netops: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn)
[22:38:56] (PS2) Gergő Tisza: Enable GrowthExperiments Add Image feature on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/731880 (https://phabricator.wikimedia.org/T290949)
[22:39:13] SRE, Continuous-Integration-Infrastructure, Release-Engineering-Team, serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Dzahn) Open→Declined Suggesting to do this once T256422 is resolved or T294276 or CI does not run on contint* servers anymore, w...
[22:39:54] SRE, Continuous-Integration-Infrastructure, Release-Engineering-Team, serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Dzahn) Be bold and reopen if you really think otherwise.
[22:41:24] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[22:41:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:44:43] SRE, Continuous-Integration-Infrastructure, Release-Engineering-Team, serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Dzahn) P.S. The actual "contint2001.mgmt" alert in Icinga is actually quite some time ago.. not worth it. but there are other alerts (IP...
[22:55:30] SRE, Traffic, observability, Discovery-Search (Current work): flapping icinga Letsencrypt TLS cert alerts around renewal time - https://phabricator.wikimedia.org/T293826 (Dzahn) @elukey Not list* but we could potentially test it with librenms.wikimedia.org. That fulfills the requirements of "uses...
[23:00:05] RoanKattouw and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211027T2300).
[23:00:05] Juan_90264 and tgr: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:20] o/
[23:01:27] SRE, Beta-Cluster-Infrastructure, Wikimedia-Logstash, observability, Release-Engineering-Team (Radar): logstash-beta.wmflab throws multiple "Error: Could not locate that visualization" - https://phabricator.wikimedia.org/T204845 (colewhite) Open→Invalid There has been no DBQuery dashb...
[23:01:46] I can do the deployment today but I'm not at my desk right now, I'll be back in 10-15 minutes
[23:02:23] I'm present!
[23:02:36] Let's start?
[23:03:05] Urbanecm: ?
[23:03:50] RoanKattouw: ?
[23:04:36] RoanKattouw seconds before you joined said "I can do the deployment today but I'm not at my desk right now, I'll be back in 10-15 minutes"
[23:05:00] PROBLEM - MariaDB Replica IO: s6 on db2141 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2026, Errmsg: error reconnecting to master repl@db2129.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: SSL connection error00000000:lib(0):func(0):reason(0) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:06:03] Okay perryprog
[23:06:21] Waiting :)
[23:07:05] (CR) Dzahn: [V: +1] mediawiki: allow absenting font packages via hiera, remove fonts from canaries (1 comment) [puppet] - https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: Dzahn)
[23:07:52] I can do it if Roan does not have time / prefers not to be voluntold today, I'm just preoccupied for a few more minutes
[23:08:04] (PS7) Dzahn: mediawiki: allow absenting font packages via hiera,
remove fonts from canaries [puppet] - https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378)
[23:10:50] Great tgr
[23:11:16] RECOVERY - MariaDB Replica IO: s6 on db2141 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:13:53] (PS8) Dzahn: mediawiki: allow absenting font packages via hiera, remove fonts from canaries [puppet] - https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378)
[23:13:55] (PS3) Dzahn: wikitech::web: remove font packages from wikitech servers [puppet] - https://gerrit.wikimedia.org/r/735042 (https://phabricator.wikimedia.org/T294378)
[23:15:00] (CR) Juan90264: [C: +1] "LGTM :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/735020 (https://phabricator.wikimedia.org/T294447) (owner: Inductiveload)
[23:16:10] (CR) jerkins-bot: [V: -1] wikitech::web: remove font packages from wikitech servers [puppet] - https://gerrit.wikimedia.org/r/735042 (https://phabricator.wikimedia.org/T294378) (owner: Dzahn)
[23:17:34] PROBLEM - MariaDB Replica IO: s6 on db2141 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2013, Errmsg: error reconnecting to master repl@db2129.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: Lost connection to MySQL server at reading authorization packet, system error: 71 Protocol error https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:18:29] tgr: If Roan doesn't show up, at 23:30 UTC do you deploy?
[23:18:31] Juan_90264: You have requested deployment of https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/735020 , but not of its dependency https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/730507 . Do you want me to deploy both?
[23:19:40] RECOVERY - MariaDB Replica IO: s6 on db2141 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:19:47] Yes Roan
[23:19:47] (PS9) Dzahn: mediawiki: allow absenting font packages via hiera [puppet] - https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378)
[23:19:54] PROBLEM - MariaDB Replica Lag: s1 on db2141 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1281.19 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:20:49] RoanKattouw: Yes, but starting with https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/735020
[23:21:29] That's not possible, 735020 depends on 730507, so 730507 has to go first
[23:22:24] (PS5) Catrope: Add mobile wordmark for Meetei (Manipuri) Wikipedia to config [mediawiki-config] - https://gerrit.wikimedia.org/r/734361 (https://phabricator.wikimedia.org/T294189) (owner: Odder)
[23:22:41] (CR) Catrope: [C: +2] Add mobile wordmark for Meetei (Manipuri) Wikipedia to config [mediawiki-config] - https://gerrit.wikimedia.org/r/734361 (https://phabricator.wikimedia.org/T294189) (owner: Odder)
[23:23:36] (Merged) jenkins-bot: Add mobile wordmark for Meetei (Manipuri) Wikipedia to config [mediawiki-config] - https://gerrit.wikimedia.org/r/734361 (https://phabricator.wikimedia.org/T294189) (owner: Odder)
[23:24:06] Perfect
[23:24:12] RoanKattouw: So forget about deploying 735020, to take a closer look
[23:26:13] mwdebug1001 ou 1002?
[23:26:13] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[23:26:17] *or
[23:26:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:26:59] !log catrope@deploy1002 Synchronized static/images/mobile/copyright/wikipedia-wordmark-mni.svg: Config: [[gerrit:734361|Add mobile wordmark for Meetei (Manipuri) Wikipedia to config (T294189)]] (duration: 01m 03s)
[23:27:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:27:08] T294189: Add a mobile logo for the Meetei (Manipuri) Wikipedia - https://phabricator.wikimedia.org/T294189
[23:27:19] At first I thought it was just an image file so I deployed it immediately, oops
[23:27:26] I don't think new images are testable on mwdebug
[23:28:04] (CR) Dzahn: [C: -2] "depends on previous change, and should consists of just the Hiera line. will solve the rebase mess once the parent is merged (WIP)" [puppet] - https://gerrit.wikimedia.org/r/735042 (https://phabricator.wikimedia.org/T294378) (owner: Dzahn)
[23:28:05] Now deploying the InitialiseSettings.php change, and then it'll hopefully work in production
[23:28:06] !log catrope@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734361|Add mobile wordmark for Meetei (Manipuri) Wikipedia to config (T294189)]] (duration: 01m 02s)
[23:28:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:28:13] There we go, please test in production now
[23:28:40] Sometimes these changes need a cache purge, but I don't think that this would because it's a new image file, not a change to an existing one
[23:29:23] RoanKattouw: I also think it won't be necessary to purge
[23:29:30] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[23:29:32] Having taken a look at the copy-upload patches, I think those are fine, I'll just deploy them both
[23:29:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:29:51] (PS4) Catrope: Allow copy-upload (by URL) for Wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/730507 (https://phabricator.wikimedia.org/T293205) (owner: Inductiveload)
[23:29:56] (PS2) Catrope: enwikisource: Enable copy-upload for autoconfirmed [mediawiki-config] - https://gerrit.wikimedia.org/r/735020 (https://phabricator.wikimedia.org/T294447) (owner: Inductiveload)
[23:30:06] (CR) Catrope: [C: +2] Allow copy-upload (by URL) for Wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/730507 (https://phabricator.wikimedia.org/T293205) (owner: Inductiveload)
[23:30:11] (CR) Catrope: [C: +2] enwikisource: Enable copy-upload for autoconfirmed [mediawiki-config] - https://gerrit.wikimedia.org/r/735020 (https://phabricator.wikimedia.org/T294447) (owner: Inductiveload)
[23:30:11] images can be tested on mwdebug IIRC, although it wouldn't catch cache issues
[23:30:39] RoanKattouw: It's already up and running at https://mni.m.wikipedia.org/wiki/ꯃꯔꯨꯑꯣꯏꯕ_ꯂꯃꯥꯏ
[23:30:50] Nice!
[23:31:22] (Merged) jenkins-bot: Allow copy-upload (by URL) for Wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/730507 (https://phabricator.wikimedia.org/T293205) (owner: Inductiveload)
[23:31:32] (Merged) jenkins-bot: enwikisource: Enable copy-upload for autoconfirmed [mediawiki-config] - https://gerrit.wikimedia.org/r/735020 (https://phabricator.wikimedia.org/T294447) (owner: Inductiveload)
[23:32:27] Juan_90264: The copy-upload patches are now on mwdebug1002, please test
[23:32:40] Okay tgr
[23:33:16] RoanKattouw: Yes, I will testing
[23:33:59] (PS1) Dzahn: mediawiki: purge font packages from mwdebug1001 [puppet] - https://gerrit.wikimedia.org/r/735075 (https://phabricator.wikimedia.org/T294378)
[23:37:14] PROBLEM - k8s API server requests latencies on kubemaster2001 is CRITICAL: instance=10.192.0.56 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[23:38:35] RoanKattouw: I tested and approved
[23:38:41] OK great, deploying
[23:39:01] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[23:39:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:39:18] RECOVERY - k8s API server requests latencies on kubemaster2001 is OK: All metrics within thresholds.
https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[23:40:31] (PS3) Catrope: Enable GrowthExperiments Add Image feature on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/731880 (https://phabricator.wikimedia.org/T290949) (owner: Gergő Tisza)
[23:40:47] !log catrope@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Allow upload by URL for Wikisources (T293205), and enable it on enwikisource for autoconfirmed (T294447) (duration: 01m 03s)
[23:40:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:40:54] T294447: English Wikisource: Enable upload-by-url for autoconfirmed users - https://phabricator.wikimedia.org/T294447
[23:40:54] T293205: Wikisource config: Allow copy-uploads to Wikisources - https://phabricator.wikimedia.org/T293205
[23:41:38] (CR) Catrope: [C: +2] Enable GrowthExperiments Add Image feature on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/731880 (https://phabricator.wikimedia.org/T290949) (owner: Gergő Tisza)
[23:42:25] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[23:42:27] (Merged) jenkins-bot: Enable GrowthExperiments Add Image feature on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/731880 (https://phabricator.wikimedia.org/T290949) (owner: Gergő Tisza)
[23:42:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:45:01] tgr: Your GE config patch is on mwdebug1002, please test
[23:46:15] (CR) Legoktm: [C: +1] mediawiki: allow absenting font packages via hiera [puppet] - https://gerrit.wikimedia.org/r/734798 (https://phabricator.wikimedia.org/T294378) (owner: Dzahn)
[23:46:21] (CR) Legoktm: [C: +1] mediawiki: purge font packages from mwdebug1001 [puppet] - https://gerrit.wikimedia.org/r/735075 (https://phabricator.wikimedia.org/T294378) (owner: Dzahn)
[23:46:26] RoanKattouw: Both changes working correctly, thanks for your time :)
[23:46:42] Of course, happy to help!
[23:47:04] And thank you for writing config patches
[23:48:20] "HTTP request timed out."
[23:48:43] I guess it needs some kind of proxy setting?
[23:51:09] Yeah app servers can't access the internet without a proxy
[23:51:17] I forget how to configure this
[23:51:52] according to the docs...
[23:51:53] httpRequestFactory->create(
[23:51:56] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[23:52:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:52:06] alias set_proxy="export http_proxy=http://webproxy.eqiad.wmnet:8080; export HTTPS_PROXY=http://webproxy.eqiad.wmnet:8080;"
[23:52:06] * - proxy The proxy to use.
[23:52:06] * Otherwise it will use $wgHTTPProxy or $wgLocalHTTPProxy (if set)
[23:52:06] * Otherwise it will use the environment variable "http_proxy" (if set)
[23:52:37] $webproxy = "http://webproxy.${::site}.wmnet:8080"
[23:52:49] yeah, but shouldn't that just work?
[23:53:02] since it's in $wgHTTPProxy already
[23:53:22] what's the context?
[23:53:30] group0 uses the new LocalHTTPProxy thing
[23:53:42] >>> $wgLocalHTTPProxy
[23:53:42] => "http://localhost:6501"
[23:54:00] so some sort of auto-configuration exists
[23:54:15] I mean, what's trying to make requests?
[23:54:15] Nothing is listening on port 6501 on mwdebug1002
[23:54:21] hm
[23:54:43] GrowthExperiment's ServiceImageRecommendationProvider is what should've been enabled
[23:55:13] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[23:55:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:55:39] wfm?
[23:55:40] legoktm@mwdebug1002:~$ curl -H 'Host: meta.wikimedia.org' 'http://localhost:6501/w/api.php?action=query&format=json'
[23:55:40] {"batchcomplete":""}
[23:55:58] Oh huh
[23:56:17] I had just quickly tried to see if there was anything listening at all
[23:56:20] catrope@mwdebug1002:~$ nc -vvv localhost 6501
[23:56:21] localhost [127.0.0.1] 6501 (?) : Connection refused
[23:56:21] sent 0, rcvd 0
[23:56:31] some people have the export HTTP_PROXY settings in their dot files and others dont
[23:56:45] wait
[23:56:55] But you're right that curl works
[23:56:56] is this trying to talk to a wmcloud.org domain?
[23:57:00] Yes it is
[23:57:06] that most certainly won't work
[23:57:10] Are those prohibited from using the proxy?
[23:57:19] yeah, seems like that would use $wgHTTPProxy, not the local one
[23:57:38] we don't allow this, it's blocked at multiple levels
[23:58:35] there's no global $wgHTTPProxy set, on purpose. $wgLocalHTTPProxy is for internal requests to other MW wikis (e.g. InstantCommons/GlobalUserPage/cross-wiki notifs)
[23:59:16] so would that need a dedicated proxy like urldownloader does?