[04:16:35] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "LGTM overall but drop the proxyfetch checks." [puppet] - 10https://gerrit.wikimedia.org/r/721904 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [04:16:57] (03CR) 10Giuseppe Lavagetto: [C: 03+1] service: Switch new Shellboxes to lvs_setup [puppet] - 10https://gerrit.wikimedia.org/r/721905 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [04:17:10] (03CR) 10Giuseppe Lavagetto: [C: 03+1] service: Switch new Shellboxes to monitoring_setup [puppet] - 10https://gerrit.wikimedia.org/r/721906 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [04:17:24] (03CR) 10Giuseppe Lavagetto: [C: 03+1] service: Switch new Shellboxes to production [puppet] - 10https://gerrit.wikimedia.org/r/721907 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [04:21:47] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "As long as you add the IPs to netbox following https://wikitech.wikimedia.org/wiki/DNS/Netbox#How_to_manually_allocate_a_special_purpose_I" [dns] - 10https://gerrit.wikimedia.org/r/721908 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [04:23:15] (03PS2) 10Legoktm: Add LVS for new Shellboxes: media, syntaxhighlight & timeline [puppet] - 10https://gerrit.wikimedia.org/r/721904 (https://phabricator.wikimedia.org/T289226) [04:23:17] (03PS2) 10Legoktm: service: Switch new Shellboxes to lvs_setup [puppet] - 10https://gerrit.wikimedia.org/r/721905 (https://phabricator.wikimedia.org/T289226) [04:23:19] (03PS2) 10Legoktm: service: Switch new Shellboxes to monitoring_setup [puppet] - 10https://gerrit.wikimedia.org/r/721906 (https://phabricator.wikimedia.org/T289226) [04:23:21] (03PS2) 10Legoktm: service: Switch new Shellboxes to production [puppet] - 10https://gerrit.wikimedia.org/r/721907 (https://phabricator.wikimedia.org/T289226) [04:25:09] (03PS1) 10Legoktm: service: Remove ProxyFetch checks for shellbox & shellbox-constraints [puppet] - 10https://gerrit.wikimedia.org/r/722732 
[04:25:37] (03CR) 10Legoktm: Add *.svc.{codfw,eqiad}.wmnet entries for new Shellboxes (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/721908 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [04:27:04] (03CR) 10Legoktm: Add LVS for new Shellboxes: media, syntaxhighlight & timeline (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/721904 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [04:36:04] (03CR) 10Legoktm: Add wcqs.svc.{codfw,eqiad}.wmnet (032 comments) [dns] - 10https://gerrit.wikimedia.org/r/713929 (https://phabricator.wikimedia.org/T280001) (owner: 10Ebernhardson) [04:37:48] (03CR) 10Giuseppe Lavagetto: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31182/console" [puppet] - 10https://gerrit.wikimedia.org/r/721904 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [04:38:27] (03PS2) 10Legoktm: Add *.svc.{codfw,eqiad}.wmnet entries for new Shellboxes [dns] - 10https://gerrit.wikimedia.org/r/721908 (https://phabricator.wikimedia.org/T289226) [04:38:29] (03PS2) 10Legoktm: Add new Shellboxes to discovery [dns] - 10https://gerrit.wikimedia.org/r/721909 (https://phabricator.wikimedia.org/T289226) [04:39:06] (03CR) 10Legoktm: "Fixed in Change-Id: Id52c569f05a8c1245e238f6b3ca4456408c7384f" [dns] - 10https://gerrit.wikimedia.org/r/713929 (https://phabricator.wikimedia.org/T280001) (owner: 10Ebernhardson) [04:43:57] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Add *.svc.{codfw,eqiad}.wmnet entries for new Shellboxes [dns] - 10https://gerrit.wikimedia.org/r/721908 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [04:45:45] (03CR) 10Legoktm: [C: 03+2] Add *.svc.{codfw,eqiad}.wmnet entries for new Shellboxes [dns] - 10https://gerrit.wikimedia.org/r/721908 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [04:48:30] !log ran authdns-update for adding new shellbox svc entries https://gerrit.wikimedia.org/r/721908 [04:48:34] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:54:38] (03CR) 10Legoktm: [C: 03+2] Add LVS for new Shellboxes: media, syntaxhighlight & timeline [puppet] - 10https://gerrit.wikimedia.org/r/721904 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [04:59:05] (03CR) 10Giuseppe Lavagetto: [V: 03+1 C: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31184/console" [puppet] - 10https://gerrit.wikimedia.org/r/721905 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [05:01:32] (03CR) 10Marostegui: [C: 03+2] data.yaml: Add KCVelaga [puppet] - 10https://gerrit.wikimedia.org/r/722598 (https://phabricator.wikimedia.org/T291475) (owner: 10Marostegui) [05:03:22] marostegui: could you merge in my change too please? [05:03:39] yes! [05:03:50] legoktm: done! [05:03:53] ty [05:03:54] <_joe_> marostegui: you're late today heh [05:04:11] <_joe_> the usual slacker [05:04:53] I know, I slept in today! [05:09:43] 10SRE, 10LDAP-Access-Requests, 10Patch-For-Review: Grant Access to ldap/wmf for KCVelaga_(wikimf) - https://phabricator.wikimedia.org/T291475 (10Marostegui) 05Open→03Resolved a:03Marostegui Access granted! 
[05:10:37] 10SRE, 10LDAP-Access-Requests: Grant Access to LDAP/WMF for CBlanton - https://phabricator.wikimedia.org/T291518 (10Marostegui) p:05Triage→03Medium a:05dr0ptp4kt→03Marostegui [05:10:51] (03PS3) 10Legoktm: service: Switch new Shellboxes to lvs_setup [puppet] - 10https://gerrit.wikimedia.org/r/721905 (https://phabricator.wikimedia.org/T289226) [05:10:53] (03PS3) 10Legoktm: service: Switch new Shellboxes to monitoring_setup [puppet] - 10https://gerrit.wikimedia.org/r/721906 (https://phabricator.wikimedia.org/T289226) [05:10:55] (03PS3) 10Legoktm: service: Switch new Shellboxes to production [puppet] - 10https://gerrit.wikimedia.org/r/721907 (https://phabricator.wikimedia.org/T289226) [05:10:57] (03PS2) 10Legoktm: service: Remove ProxyFetch checks for shellbox & shellbox-constraints [puppet] - 10https://gerrit.wikimedia.org/r/722732 [05:11:13] 10SRE, 10LDAP-Access-Requests: Grant Access to ldap/wmf for KCVelaga_(wikimf) - https://phabricator.wikimedia.org/T291475 (10KCVelaga_WMF) Thanks @Marostegui for quickly processing this. 
[05:11:27] (03CR) 10Legoktm: [C: 03+2] service: Switch new Shellboxes to lvs_setup [puppet] - 10https://gerrit.wikimedia.org/r/721905 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [05:12:55] !log sudo cumin 'O:lvs::balancer' 'run-puppet-agent' [05:12:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:13:15] icinga is going to alert about PyBal diffs soon [05:13:41] <_joe_> that's ok and expected [05:14:45] puppet finished [05:16:21] (03PS1) 10Marostegui: data.yaml: Add Cai Blanton to ldap users [puppet] - 10https://gerrit.wikimedia.org/r/722733 (https://phabricator.wikimedia.org/T291518) [05:16:50] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-access for NRodriguez - https://phabricator.wikimedia.org/T291508 (10Marostegui) p:05Triage→03Medium [05:17:53] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-access for NRodriguez - https://phabricator.wikimedia.org/T291508 (10Marostegui) [05:17:54] !log restarting pybal on lvs1016 [05:17:56] PROBLEM - PyBal connections to etcd on lvs2010 is CRITICAL: CRITICAL: 80 connections established with conf2004.codfw.wmnet:4001 (min=83) https://wikitech.wikimedia.org/wiki/PyBal [05:17:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:18:05] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-access for NRodriguez - https://phabricator.wikimedia.org/T291508 (10Marostegui) L3 signed: Aug 16 2021, 23:11 [05:18:16] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-access for NRodriguez - https://phabricator.wikimedia.org/T291508 (10Marostegui) [05:18:20] PROBLEM - PyBal connections to etcd on lvs1015 is CRITICAL: CRITICAL: 68 connections established with conf1004.eqiad.wmnet:4001 (min=71) https://wikitech.wikimedia.org/wiki/PyBal [05:18:38] PROBLEM - PyBal IPVS diff check on lvs2009 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: 
set([10.2.1.65:4014, 10.2.1.64:4015, 10.2.1.66:4012]) https://wikitech.wikimedia.org/wiki/PyBal [05:19:40] (03CR) 10Effie Mouzeli: [C: 04-1] "Thank you for this work Petr! I think first of all we need to split this patch into 3 patches; starting with the common_templates one, the" [deployment-charts] - 10https://gerrit.wikimedia.org/r/722654 (https://phabricator.wikimedia.org/T291504) (owner: 10Ppchelko) [05:20:58] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-access for NRodriguez - https://phabricator.wikimedia.org/T291508 (10Marostegui) This needs approval from @Ottomata and @DannyH [05:22:04] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-access for NRodriguez - https://phabricator.wikimedia.org/T291508 (10Marostegui) I assume this also needs `analytics-privatedata-access` group with no SSH and no kerberos. [05:22:18] PROBLEM - PyBal IPVS diff check on lvs2010 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.1.65:4014, 10.2.1.64:4015, 10.2.1.66:4012]) https://wikitech.wikimedia.org/wiki/PyBal [05:22:45] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-access for NRodriguez - https://phabricator.wikimedia.org/T291508 (10Marostegui) [05:23:00] !log restarting pybal on lvs1015 [05:23:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:27:29] !log restarting pybal on lvs2010 [05:27:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:28:06] RECOVERY - PyBal connections to etcd on lvs2010 is OK: OK: 83 connections established with conf2004.codfw.wmnet:4001 (min=83) https://wikitech.wikimedia.org/wiki/PyBal [05:28:32] RECOVERY - PyBal connections to etcd on lvs1015 is OK: OK: 71 connections established with conf1004.eqiad.wmnet:4001 (min=71) https://wikitech.wikimedia.org/wiki/PyBal [05:31:13] !log restarting pybal on lvs2009 [05:31:16] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:32:36] RECOVERY - PyBal IPVS diff check on lvs2010 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal [05:32:59] (03CR) 10Legoktm: [C: 03+2] service: Switch new Shellboxes to monitoring_setup [puppet] - 10https://gerrit.wikimedia.org/r/721906 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [05:33:06] (03CR) 10Ryan Kemper: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/713959 (https://phabricator.wikimedia.org/T280001) (owner: 10Ebernhardson) [05:34:14] RECOVERY - PyBal IPVS diff check on lvs2009 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal [05:39:24] PROBLEM - SSH on bast5002 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [05:41:36] RECOVERY - SSH on bast5002 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [05:41:44] (03CR) 10Legoktm: [C: 03+2] service: Switch new Shellboxes to production [puppet] - 10https://gerrit.wikimedia.org/r/721907 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [05:45:40] (03CR) 10Legoktm: [C: 03+2] Add new Shellboxes to discovery [dns] - 10https://gerrit.wikimedia.org/r/721909 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [05:47:52] !log legoktm@cumin1001 conftool action : set/pooled=true; selector: dnsdisc=shellbox-media [05:47:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:48:29] !log legoktm@cumin1001 conftool action : set/pooled=true; selector: dnsdisc=shellbox-timeline [05:48:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:48:43] !log legoktm@cumin1001 conftool action : set/pooled=true; selector: dnsdisc=shellbox-syntaxhighlight [05:48:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:49:25] all done adding the new shellbox 
LVSes :) [05:51:43] (03PS3) 10Giuseppe Lavagetto: service::catalog: remove ProxyFetch checks from services on k8s [puppet] - 10https://gerrit.wikimedia.org/r/722278 [05:52:47] (03Abandoned) 10Legoktm: service: Remove ProxyFetch checks for shellbox & shellbox-constraints [puppet] - 10https://gerrit.wikimedia.org/r/722732 (owner: 10Legoktm) [05:56:44] (03PS1) 10Legoktm: services_proxy: Add envoy proxies for new Shellboxes [puppet] - 10https://gerrit.wikimedia.org/r/722736 (https://phabricator.wikimedia.org/T289226) [05:58:46] (03PS1) 10Legoktm: ProductionServices: Add new Shellboxes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722737 (https://phabricator.wikimedia.org/T289226) [06:00:01] (03CR) 10Legoktm: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31186/console" [puppet] - 10https://gerrit.wikimedia.org/r/722736 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [06:00:15] (03CR) 10jerkins-bot: [V: 04-1] ProductionServices: Add new Shellboxes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722737 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [06:01:30] (03PS2) 10Legoktm: ProductionServices: Add new Shellboxes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722737 (https://phabricator.wikimedia.org/T289226) [06:02:13] !log update pcc facts [06:02:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:02:26] (03CR) 10Legoktm: [V: 03+1 C: 03+2] services_proxy: Add envoy proxies for new Shellboxes [puppet] - 10https://gerrit.wikimedia.org/r/722736 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [06:04:57] (03CR) 10Effie Mouzeli: [C: 03+2] scaffold: add more options for PHP [deployment-charts] - 10https://gerrit.wikimedia.org/r/719973 (owner: 10Effie Mouzeli) [06:06:09] (03CR) 10Elukey: "Ping again to see if we can progress this :)" [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/708094 (owner: 
10R4q3NWnUx2CEhVyr) [06:08:52] (03Merged) 10jenkins-bot: scaffold: add more options for PHP [deployment-charts] - 10https://gerrit.wikimedia.org/r/719973 (owner: 10Effie Mouzeli) [06:43:03] (03CR) 10Jelto: [V: 03+1] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31191/console" [puppet] - 10https://gerrit.wikimedia.org/r/722370 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [06:43:26] (03CR) 10Elukey: [V: 03+1] "I missed profile::kubernetes::deployment_server::services for the releasesXXXX hosts, going to add it!" [puppet] - 10https://gerrit.wikimedia.org/r/720048 (https://phabricator.wikimedia.org/T286791) (owner: 10Elukey) [06:44:17] (03PS31) 10Elukey: kubernetes: add revscoring-editquality in the services configs [puppet] - 10https://gerrit.wikimedia.org/r/720048 (https://phabricator.wikimedia.org/T286791) [06:47:15] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31193/console" [puppet] - 10https://gerrit.wikimedia.org/r/720048 (https://phabricator.wikimedia.org/T286791) (owner: 10Elukey) [06:57:05] (03PS1) 10Muehlenhoff: dhcp: Switch mx1001 to bullseye [puppet] - 10https://gerrit.wikimedia.org/r/722816 (https://phabricator.wikimedia.org/T286911) [07:00:33] (03CR) 10Muehlenhoff: [C: 03+2] dhcp: Switch mx1001 to bullseye [puppet] - 10https://gerrit.wikimedia.org/r/722816 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [07:02:14] Hi everybody, I am going to make some changes to helmfile's secrets, please sync with me first if you need to deploy on k8s via helmfile :) [07:06:36] 10Puppet, 10Infrastructure-Foundations: Puppetdb: audit existing configuration - https://phabricator.wikimedia.org/T291538 (10Volans) p:05Triage→03Medium [07:22:19] 10Puppet, 10Infrastructure-Foundations: Numa fact: puppetdb has the fact for only ~60% of the fleet - https://phabricator.wikimedia.org/T291539 (10Volans) 
p:05Triage→03Medium [07:22:28] 10Puppet, 10Infrastructure-Foundations: Puppetdb: not refreshed on config change? - https://phabricator.wikimedia.org/T291540 (10Volans) p:05Triage→03Medium [07:22:47] 10Puppet, 10Infrastructure-Foundations: Numa fact: puppetdb has the fact for only ~60% of the fleet - https://phabricator.wikimedia.org/T291539 (10Volans) I've opened T291540 for the more general case. [07:27:02] 10Puppet, 10Infrastructure-Foundations: Host distribution across puppetmasters - https://phabricator.wikimedia.org/T291541 (10Volans) p:05Triage→03Medium [07:30:08] 10SRE-tools, 10Infrastructure-Foundations, 10cloud-services-team (Kanban): Cookbooks repository: avoid stale code in master branch - https://phabricator.wikimedia.org/T287465 (10Volans) 05Open→03Resolved AFAIK it's all working fine, resolving, feel free to re-open if you encounter any issue. [07:32:05] 10Puppet, 10Infrastructure-Foundations: Hosts distribution across puppetmasters - https://phabricator.wikimedia.org/T291541 (10Volans) [07:34:50] (03PS1) 10Marostegui: dumps-*-m5.sql: Remove labswiki grants [puppet] - 10https://gerrit.wikimedia.org/r/722817 (https://phabricator.wikimedia.org/T167973) [07:46:47] (03PS32) 10Elukey: kubernetes: add revscoring-editquality in the services configs [puppet] - 10https://gerrit.wikimedia.org/r/720048 (https://phabricator.wikimedia.org/T286791) [07:50:53] (03CR) 10Elukey: [C: 03+2] kubernetes: add revscoring-editquality in the services configs [puppet] - 10https://gerrit.wikimedia.org/r/720048 (https://phabricator.wikimedia.org/T286791) (owner: 10Elukey) [07:59:39] (03PS1) 10Elukey: Add missing dir declaration to helmfile private configurations [puppet] - 10https://gerrit.wikimedia.org/r/722818 (https://phabricator.wikimedia.org/T286791) [07:59:41] (03PS1) 10Hashar: logging: send DuplicateParse bucket to Logstash [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722819 [08:01:05] (03CR) 10Hashar: "We only send INFO level and more to Logstash but the 
DuplicateParse messages are sent at the DEBUG level. Fixed by https://gerrit.wikimedi" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722601 (owner: 10Hashar) [08:02:50] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/722733 (https://phabricator.wikimedia.org/T291518) (owner: 10Marostegui) [08:03:39] hi, could anyone check a logging config tweak I have made to send some events to Logstash please? https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/722819 :] [08:03:52] or well I can self deploy [08:03:53] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31194/console" [puppet] - 10https://gerrit.wikimedia.org/r/722818 (https://phabricator.wikimedia.org/T286791) (owner: 10Elukey) [08:03:58] 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10serviceops-radar: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071 (10Volans) Has been a while since we discussed this but the problem still stands and I think we need to get some progress here. What would be the best way... 
[08:04:56] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Add missing dir declaration to helmfile private configurations [puppet] - 10https://gerrit.wikimedia.org/r/722818 (https://phabricator.wikimedia.org/T286791) (owner: 10Elukey) [08:06:35] (03PS2) 10Jcrespo: dbbackups: Switch s1 backup generation from db2097 to db2141 [puppet] - 10https://gerrit.wikimedia.org/r/721285 (https://phabricator.wikimedia.org/T290865) [08:06:37] (03PS1) 10Jcrespo: dbbackups: Remove dump grants from m5 for labswiki [puppet] - 10https://gerrit.wikimedia.org/r/722820 (https://phabricator.wikimedia.org/T167973) [08:07:39] (03Abandoned) 10Jcrespo: dbbackups: Remove dump grants from m5 for labswiki [puppet] - 10https://gerrit.wikimedia.org/r/722820 (https://phabricator.wikimedia.org/T167973) (owner: 10Jcrespo) [08:09:57] (03CR) 10Effie Mouzeli: [C: 03+1] service::catalog: remove ProxyFetch checks from services on k8s [puppet] - 10https://gerrit.wikimedia.org/r/722278 (owner: 10Giuseppe Lavagetto) [08:10:06] (03CR) 10Jcrespo: [C: 03+1] "I've cleaned up the grants on the right servers already." 
[puppet] - 10https://gerrit.wikimedia.org/r/722817 (https://phabricator.wikimedia.org/T167973) (owner: 10Marostegui) [08:13:23] (03CR) 10Marostegui: dumps-*-m5.sql: Remove labswiki grants (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/722817 (https://phabricator.wikimedia.org/T167973) (owner: 10Marostegui) [08:13:26] (03CR) 10Marostegui: [C: 03+2] dumps-*-m5.sql: Remove labswiki grants [puppet] - 10https://gerrit.wikimedia.org/r/722817 (https://phabricator.wikimedia.org/T167973) (owner: 10Marostegui) [08:18:16] (03CR) 10Marostegui: [C: 03+2] data.yaml: Add Cai Blanton to ldap users [puppet] - 10https://gerrit.wikimedia.org/r/722733 (https://phabricator.wikimedia.org/T291518) (owner: 10Marostegui) [08:21:21] 10SRE, 10LDAP-Access-Requests, 10Patch-For-Review: Grant Access to LDAP/WMF for CBlanton - https://phabricator.wikimedia.org/T291518 (10Marostegui) 05Open→03Resolved This has been granted [08:22:32] (03CR) 10Giuseppe Lavagetto: [C: 03+1] mediawiki: Add request ID to php-wmerrors error page [puppet] - 10https://gerrit.wikimedia.org/r/721923 (https://phabricator.wikimedia.org/T291192) (owner: 10Krinkle) [08:29:26] (03CR) 10Volans: [C: 03+2] "I think we got enough consensus to move forward. I'd be happy to address any follow up comments." 
[software/spicerack] - 10https://gerrit.wikimedia.org/r/720993 (owner: 10Volans) [08:29:50] (03CR) 10Volans: [C: 03+2] "Trivial, self-merging" [software/spicerack] - 10https://gerrit.wikimedia.org/r/722376 (owner: 10Volans) [08:30:44] (03CR) 10Volans: [C: 03+2] dhcp: reduce verbosity of Cumin's output [software/spicerack] - 10https://gerrit.wikimedia.org/r/720994 (owner: 10Volans) [08:31:13] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/720993 (owner: 10Volans) [08:31:15] (03PS1) 10Elukey: role::deployment_server: add admin_ng service config [puppet] - 10https://gerrit.wikimedia.org/r/722824 [08:31:32] (03CR) 10Volans: [C: 03+2] icinga: reduce verbosity of Cumin's output [software/spicerack] - 10https://gerrit.wikimedia.org/r/720995 (owner: 10Volans) [08:31:54] (03CR) 10Volans: [C: 03+2] puppet: reduce verbosity of Cumin's output [software/spicerack] - 10https://gerrit.wikimedia.org/r/720996 (owner: 10Volans) [08:33:01] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (NOOP 2 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31195/console" [puppet] - 10https://gerrit.wikimedia.org/r/722824 (owner: 10Elukey) [08:34:04] 10SRE, 10MW-on-K8s, 10serviceops, 10Patch-For-Review, 10Performance-Team (Radar): Benchmark performance of MediaWiki on k8s - https://phabricator.wikimedia.org/T280497 (10jijiki) [08:34:26] 10SRE, 10MW-on-K8s, 10serviceops, 10Patch-For-Review, 10Performance-Team (Radar): Benchmark performance of MediaWiki on k8s - https://phabricator.wikimedia.org/T280497 (10jijiki) Thank you @ssastry, I updated the task descr to include them [08:35:36] (03PS2) 10Elukey: role::deployment_server: add admin_ng service config [puppet] - 10https://gerrit.wikimedia.org/r/722824 [08:36:13] (03Merged) 10jenkins-bot: pylint: fix newly reported issue [software/spicerack] - 10https://gerrit.wikimedia.org/r/722376 (owner: 10Volans) [08:36:15] (03Merged) 10jenkins-bot: remote: add support to enable/disable 
Cumin output [software/spicerack] - 10https://gerrit.wikimedia.org/r/720993 (owner: 10Volans) [08:36:19] (03Merged) 10jenkins-bot: dhcp: reduce verbosity of Cumin's output [software/spicerack] - 10https://gerrit.wikimedia.org/r/720994 (owner: 10Volans) [08:37:24] (03Merged) 10jenkins-bot: icinga: reduce verbosity of Cumin's output [software/spicerack] - 10https://gerrit.wikimedia.org/r/720995 (owner: 10Volans) [08:37:42] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (NOOP 2 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31196/console" [puppet] - 10https://gerrit.wikimedia.org/r/722824 (owner: 10Elukey) [08:37:50] (03PS2) 10Arturo Borrero Gonzalez: openstack: manila: install manila-data package [puppet] - 10https://gerrit.wikimedia.org/r/722645 (https://phabricator.wikimedia.org/T291257) [08:38:44] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: manila: install manila-data package [puppet] - 10https://gerrit.wikimedia.org/r/722645 (https://phabricator.wikimedia.org/T291257) (owner: 10Arturo Borrero Gonzalez) [08:39:12] (03Merged) 10jenkins-bot: puppet: reduce verbosity of Cumin's output [software/spicerack] - 10https://gerrit.wikimedia.org/r/720996 (owner: 10Volans) [08:41:10] 10SRE, 10Data-Persistence-Backup, 10Infrastructure-Foundations, 10bacula, 10netops: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (10jcrespo) I see a huge improvement on the "stability" (if you... [08:45:04] (03PS1) 10Jgiannelos: WIP: Send tile invalidation events [puppet] - 10https://gerrit.wikimedia.org/r/722825 [08:45:53] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-access for NRodriguez - https://phabricator.wikimedia.org/T291508 (10Marostegui) natalia-rodriguez is on the `wmf` ldap group. 
[08:46:11] !log upgrade php7.2 on api-canaries and restart service - T291052 [08:46:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:46:17] T291052: Deploy PHP patch for DOM replaceChild/removeChild performance - https://phabricator.wikimedia.org/T291052 [08:46:34] (03CR) 10Jgiannelos: "This change is ready for review." [puppet] - 10https://gerrit.wikimedia.org/r/722825 (owner: 10Jgiannelos) [08:50:13] (03PS4) 10Elukey: helmfile.d: move private dirs to the new format [deployment-charts] - 10https://gerrit.wikimedia.org/r/722276 (https://phabricator.wikimedia.org/T286791) [08:50:15] (03PS19) 10Elukey: Add revscoring-editquality as first ml-service to helmfile.d [deployment-charts] - 10https://gerrit.wikimedia.org/r/719128 (https://phabricator.wikimedia.org/T286791) [08:50:17] (03PS17) 10Elukey: Rakefile: change HELMFILE_GLOB to include ml-services [deployment-charts] - 10https://gerrit.wikimedia.org/r/719522 (https://phabricator.wikimedia.org/T286791) [08:50:19] (03PS11) 10Elukey: helmfile: add the ability to inject labels to Namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/720997 (https://phabricator.wikimedia.org/T290476) [08:50:21] (03PS7) 10Elukey: kubeflow-kfserving: move Namespace creation to helmfile [deployment-charts] - 10https://gerrit.wikimedia.org/r/721268 (https://phabricator.wikimedia.org/T288829) [08:51:57] (03PS5) 10Elukey: helmfile.d: move private dirs to the new format [deployment-charts] - 10https://gerrit.wikimedia.org/r/722276 (https://phabricator.wikimedia.org/T286791) [08:51:59] (03PS20) 10Elukey: Add revscoring-editquality as first ml-service to helmfile.d [deployment-charts] - 10https://gerrit.wikimedia.org/r/719128 (https://phabricator.wikimedia.org/T286791) [08:52:01] (03PS18) 10Elukey: Rakefile: change HELMFILE_GLOB to include ml-services [deployment-charts] - 10https://gerrit.wikimedia.org/r/719522 (https://phabricator.wikimedia.org/T286791) [08:52:02] sorryyy for the spam [08:52:03] 
(03PS12) 10Elukey: helmfile: add the ability to inject labels to Namespaces [deployment-charts] - 10https://gerrit.wikimedia.org/r/720997 (https://phabricator.wikimedia.org/T290476) [08:52:05] (03PS8) 10Elukey: kubeflow-kfserving: move Namespace creation to helmfile [deployment-charts] - 10https://gerrit.wikimedia.org/r/721268 (https://phabricator.wikimedia.org/T288829) [09:03:48] (03Abandoned) 10Elukey: role::deployment_server: add admin_ng service config [puppet] - 10https://gerrit.wikimedia.org/r/722824 (owner: 10Elukey) [09:05:05] finalizing in a couple of mins the helmfile maintenance [09:06:15] (03CR) 10Elukey: [C: 03+2] helmfile.d: move private dirs to the new format [deployment-charts] - 10https://gerrit.wikimedia.org/r/722276 (https://phabricator.wikimedia.org/T286791) (owner: 10Elukey) [09:11:14] maintenance should be over! [09:14:26] (03CR) 10DCausse: "Thanks for the review!" [alerts] - 10https://gerrit.wikimedia.org/r/720066 (https://phabricator.wikimedia.org/T276467) (owner: 10DCausse) [09:14:50] (03PS4) 10DCausse: search-platform: add flink alerts [alerts] - 10https://gerrit.wikimedia.org/r/720066 (https://phabricator.wikimedia.org/T276467) [09:14:52] (03PS4) 10DCausse: search-platform: Alert when blazegraph burns allocator too rapidly [alerts] - 10https://gerrit.wikimedia.org/r/720684 (https://phabricator.wikimedia.org/T284446) [09:15:25] 10Puppet, 10Infrastructure-Foundations: Numa fact: puppetdb has the fact for only ~60% of the fleet - https://phabricator.wikimedia.org/T291539 (10jbond) 05Open→03In progress [09:16:43] (03CR) 10Jgiannelos: [C: 03+1] "Looks good to me. Let's merge when things are deployed in k8s." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/722605 (https://phabricator.wikimedia.org/T291178) (owner: 10MSantos) [09:18:19] (03PS4) 10ZPapierski: Add kafka clusters' brokers to spicerack config [puppet] - 10https://gerrit.wikimedia.org/r/721857 (https://phabricator.wikimedia.org/T276469) [09:20:54] (03CR) 10Hashar: "That is to send DuplicateParse message to Logstash/Kibana" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722819 (owner: 10Hashar) [09:35:12] 10Puppet, 10Infrastructure-Foundations: Numa fact: puppetdb has the fact for only ~60% of the fleet - https://phabricator.wikimedia.org/T291539 (10jbond) I sent a `kill -HUP` to the puppetdb service; looks like I missed sending it on puppetdb2002. Sending HUP is AFAIK mostly undocumented. T... [09:36:21] (03CR) 10ZPapierski: "PCC is happy: https://puppet-compiler.wmflabs.org/compiler1001/31201/" [puppet] - 10https://gerrit.wikimedia.org/r/721857 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [09:38:15] !log jiji@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . [09:38:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:41:06] (03CR) 10Michael Große: query_service: support multiple variants of wdqs microsite (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/719502 (https://phabricator.wikimedia.org/T280247) (owner: 10Ebernhardson) [09:49:46] (03PS1) 10Elukey: helmfile.d: fix private paths of cxserver,mwdebug,proton [deployment-charts] - 10https://gerrit.wikimedia.org/r/722830 [09:51:28] !log jiji@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . 
[09:51:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:53:05] if you have to helmfile-deploy mwdebug,proton,cxserver please wait since I have to fix them first [09:54:47] (03CR) 10Elukey: [C: 03+2] helmfile.d: fix private paths of cxserver,mwdebug,proton [deployment-charts] - 10https://gerrit.wikimedia.org/r/722830 (owner: 10Elukey) [09:54:53] (03PS1) 10Muehlenhoff: Setup timer to generate OS reports daily [puppet] - 10https://gerrit.wikimedia.org/r/722832 [09:55:03] 10Puppet, 10Infrastructure-Foundations: Puppetdb: not refreshed on config change? - https://phabricator.wikimedia.org/T291540 (10jbond) Currently this is a conscious decision. When puppetdb is restarting, all submissions to it are rejected, which generally causes the "wide spread puppet issues" alert. Rece... [09:55:28] (03CR) 10jerkins-bot: [V: 04-1] Setup timer to generate OS reports daily [puppet] - 10https://gerrit.wikimedia.org/r/722832 (owner: 10Muehlenhoff) [09:59:49] (03PS1) 10Giuseppe Lavagetto: kubernetes::deployment_server: add general data [puppet] - 10https://gerrit.wikimedia.org/r/722833 [09:59:57] fixed! [10:08:02] (03PS2) 10Muehlenhoff: Setup timer to generate OS reports daily [puppet] - 10https://gerrit.wikimedia.org/r/722832 [10:08:44] (03CR) 10jerkins-bot: [V: 04-1] Setup timer to generate OS reports daily [puppet] - 10https://gerrit.wikimedia.org/r/722832 (owner: 10Muehlenhoff) [10:11:42] (03PS3) 10Muehlenhoff: Setup timer to generate OS reports daily [puppet] - 10https://gerrit.wikimedia.org/r/722832 [10:14:29] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/722832 (owner: 10Muehlenhoff) [10:20:35] !log jiji@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . 
[10:20:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:24:25] (03PS1) 10Jbond: C:puppetdb::app: drop and rename deprected config options [puppet] - 10https://gerrit.wikimedia.org/r/722838 (https://phabricator.wikimedia.org/T291538) [10:27:18] (03PS4) 10Muehlenhoff: Setup timer to generate OS reports daily [puppet] - 10https://gerrit.wikimedia.org/r/722832 [10:30:07] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/722832 (owner: 10Muehlenhoff) [10:31:43] !log jiji@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . [10:31:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:46] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31203/console" [puppet] - 10https://gerrit.wikimedia.org/r/722838 (https://phabricator.wikimedia.org/T291538) (owner: 10Jbond) [10:37:05] 10Puppet, 10Infrastructure-Foundations, 10Patch-For-Review: Puppetdb: audit existing configuration - https://phabricator.wikimedia.org/T291538 (10jbond) a:03jbond [10:37:17] 10Puppet, 10Infrastructure-Foundations, 10Patch-For-Review: Puppetdb: audit existing configuration - https://phabricator.wikimedia.org/T291538 (10jbond) 05Open→03In progress [10:38:19] 10Puppet, 10Infrastructure-Foundations: Numa fact: puppetdb has the fact for only ~60% of the fleet - https://phabricator.wikimedia.org/T291539 (10jbond) This should be fixed now, and i have re imported the facts to the compiler hosts [10:38:54] (03CR) 10ArielGlenn: [C: 03+2] snapshot: Change URL of xmldatadumps-l from mailman2 to mailman3 [puppet] - 10https://gerrit.wikimedia.org/r/721811 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [10:39:00] 10Puppet, 10Infrastructure-Foundations: Numa fact: puppetdb has the fact for only ~60% of the fleet - https://phabricator.wikimedia.org/T291539 (10jbond) 05In 
progress→03Resolved a:03jbond [10:39:02] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/722832 (owner: 10Muehlenhoff) [10:39:57] (03CR) 10ArielGlenn: [C: 03+2] snapshot: Drop flaggedimages from dumps [puppet] - 10https://gerrit.wikimedia.org/r/722026 (https://phabricator.wikimedia.org/T290340) (owner: 10Ladsgroup) [10:46:04] (03PS1) 10Volans: docs: add how to contribute section [software/spicerack] - 10https://gerrit.wikimedia.org/r/722841 [10:50:05] !log jgiannelos@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . [10:50:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:54:00] 10SRE, 10MW-on-K8s, 10Performance-Team, 10Release-Engineering-Team, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (10jijiki) >>! In T290536#7364817, @Joe wrote: > I have some alternative ideas. Specifically, right now we have a limited number of different... [10:54:26] (03PS5) 10Muehlenhoff: Setup timer to generate OS reports daily [puppet] - 10https://gerrit.wikimedia.org/r/722832 [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: May I have your attention please! European mid-day backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T1100) [11:00:05] No Gerrit patches in the queue for this window AFAICS. 
[11:00:11] yay [11:00:11] (03PS2) 10Volans: C:puppetdb::app: drop and rename deprecated config options [puppet] - 10https://gerrit.wikimedia.org/r/722838 (https://phabricator.wikimedia.org/T291538) (owner: 10Jbond) [11:00:15] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/722832 (owner: 10Muehlenhoff) [11:00:49] (03CR) 10Alexandros Kosiaris: [C: 04-1] docker: add security updates to Bullseye base image (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/720241 (owner: 10Hashar) [11:01:30] (03CR) 10Volans: [C: 03+1] "LGTM if the compiler is happy" [puppet] - 10https://gerrit.wikimedia.org/r/722838 (https://phabricator.wikimedia.org/T291538) (owner: 10Jbond) [11:01:48] (03CR) 10Alexandros Kosiaris: [C: 04-1] docker: add security updates to Bullseye base image (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/720241 (owner: 10Hashar) [11:08:52] (03CR) 10Jbond: "See inline comments. I appreciate that the changes i have suggested will make diffing the files a bit tricker however i think if we get t" [puppet] - 10https://gerrit.wikimedia.org/r/722370 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [11:11:00] (03CR) 10Jbond: [C: 03+1] docs: add how to contribute section [software/spicerack] - 10https://gerrit.wikimedia.org/r/722841 (owner: 10Volans) [11:11:39] !log jgiannelos@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . 
[11:11:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:40] (03CR) 10Volans: [C: 03+2] docs: add how to contribute section [software/spicerack] - 10https://gerrit.wikimedia.org/r/722841 (owner: 10Volans) [11:12:50] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [11:14:56] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [11:18:55] (03Merged) 10jenkins-bot: docs: add how to contribute section [software/spicerack] - 10https://gerrit.wikimedia.org/r/722841 (owner: 10Volans) [11:30:21] 10Puppet, 10Infrastructure-Foundations: Hosts distribution across puppetmasters - https://phabricator.wikimedia.org/T291541 (10jbond) > its resolution depends entirely on the order of the search Seems like hosts don't have the $site.wmnet as the first entry in the search path, this is probably an easy fix. bu... [11:32:44] (03CR) 10Jbond: [C: 03+2] C:puppetdb::app: drop and rename deprecated config options [puppet] - 10https://gerrit.wikimedia.org/r/722838 (https://phabricator.wikimedia.org/T291538) (owner: 10Jbond) [11:33:33] !log disable puppet fleet wide to perform puppetdb restart [11:33:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:38:45] !log enable puppet fleet wide post puppetdb restart [11:38:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:43:29] (03CR) 10Jbond: "lgtm, minor nit" [puppet] - 10https://gerrit.wikimedia.org/r/722832 (owner: 10Muehlenhoff) [11:46:59] !log jiji@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
[11:47:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:54] (03PS1) 10Effie Mouzeli: common_templates: allow custon routed_via value [deployment-charts] - 10https://gerrit.wikimedia.org/r/722845 [11:50:57] (03PS2) 10Effie Mouzeli: common_templates: allow custon routed_via value [deployment-charts] - 10https://gerrit.wikimedia.org/r/722845 [11:54:20] (03PS3) 10KartikMistry: Add support for SectionTranslationTargetLanguages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/720982 (https://phabricator.wikimedia.org/T290302) [12:00:45] (03PS6) 10Muehlenhoff: Setup timer to generate OS reports daily [puppet] - 10https://gerrit.wikimedia.org/r/722832 [12:01:02] (03CR) 10Muehlenhoff: Setup timer to generate OS reports daily (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/722832 (owner: 10Muehlenhoff) [12:03:56] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/722832 (owner: 10Muehlenhoff) [12:17:20] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/722832 (owner: 10Muehlenhoff) [12:26:53] (03CR) 10Muehlenhoff: [C: 03+2] Setup timer to generate OS reports daily [puppet] - 10https://gerrit.wikimedia.org/r/722832 (owner: 10Muehlenhoff) [12:30:21] (03Abandoned) 10Matthias Mullie: Add MediaSearch assessment filter map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689788 (https://phabricator.wikimedia.org/T276257) (owner: 10Matthias Mullie) [12:30:27] (03PS1) 10Muehlenhoff: Fix typo in group name [puppet] - 10https://gerrit.wikimedia.org/r/722853 [12:33:08] (03CR) 10Muehlenhoff: [C: 03+2] Fix typo in group name [puppet] - 10https://gerrit.wikimedia.org/r/722853 (owner: 10Muehlenhoff) [12:36:46] 10SRE, 10Wikimedia-Mailing-lists, 10I18n: mailman3 encoding issues on unsubscription emails - https://phabricator.wikimedia.org/T290613 (10akosiaris) I 've received the unredacted body of the message from @MarcoAurelio. 
It is typical [Quoted-Printable](https://en.wikipedia.org/wiki/Quoted-printable). This is... [12:40:11] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki: Set php-wmerrors reqId to "unknown" for Logstash [puppet] - 10https://gerrit.wikimedia.org/r/721924 (https://phabricator.wikimedia.org/T291192) (owner: 10Krinkle) [12:40:18] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki: Add request ID to php-wmerrors error page [puppet] - 10https://gerrit.wikimedia.org/r/721923 (https://phabricator.wikimedia.org/T291192) (owner: 10Krinkle) [12:41:25] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki: Move statsd call from php-wmerrors page to end of script [puppet] - 10https://gerrit.wikimedia.org/r/721925 (owner: 10Krinkle) [12:43:03] 10Puppet, 10Infrastructure-Foundations: Hosts distribution across puppetmasters - https://phabricator.wikimedia.org/T291541 (10Volans) >>! In T291541#7371589, @jbond wrote: >> its resolution depends entirely on the order of the search > Seems like hosts don't have the $site.wmnet as the first entry in the sear... [12:44:52] (03PS2) 10Giuseppe Lavagetto: mediawiki: Set "mwversion" for Logstash entries from php-wmerrors [puppet] - 10https://gerrit.wikimedia.org/r/722483 (https://phabricator.wikimedia.org/T253781) (owner: 10Krinkle) [12:46:01] (03CR) 10Giuseppe Lavagetto: [C: 04-2] "We should change the charts to use our own canary mechanism, rather than adapting to those." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/722845 (owner: 10Effie Mouzeli) [12:46:35] !log mbsantos@deploy1002 Started deploy [kartotherian/deploy@8765218]: change tegola uri to test single production node [12:46:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:49] !log mbsantos@deploy1002 Finished deploy [kartotherian/deploy@8765218]: change tegola uri to test single production node (duration: 00m 14s) [12:46:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:08] nemo-yiannis: [12:48:11] ^ [12:49:20] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Given all eventgate releases are located in different namespaces, having the same release name is not an issue." [deployment-charts] - 10https://gerrit.wikimedia.org/r/722654 (https://phabricator.wikimedia.org/T291504) (owner: 10Ppchelko) [12:49:38] (03CR) 10Herron: [C: 03+1] Temporarily filter port 25 on mx1001 for reimage [homer/public] - 10https://gerrit.wikimedia.org/r/722551 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [12:51:10] (03CR) 10Muehlenhoff: [C: 03+2] Temporarily filter port 25 on mx1001 for reimage [homer/public] - 10https://gerrit.wikimedia.org/r/722551 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [12:55:24] (03PS1) 10Muehlenhoff: Revert "Temporarily filter port 25 on mx1001 for reimage" [homer/public] - 10https://gerrit.wikimedia.org/r/722859 [12:55:49] !log mbsantos@deploy1002 Started deploy [kartotherian/deploy@5617839]: tegola: increase mirrored requests to 5% [12:55:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:04] !log mbsantos@deploy1002 Finished deploy [kartotherian/deploy@5617839]: tegola: increase mirrored requests to 5% (duration: 00m 15s) [12:56:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:14] (03CR) 10Alexandros Kosiaris: [C: 03+1] kubernetes::deployment_server: add general data [puppet] - 
10https://gerrit.wikimedia.org/r/722833 (owner: 10Giuseppe Lavagetto) [12:59:38] 10SRE, 10MW-on-K8s, 10serviceops, 10Patch-For-Review, 10Performance-Team (Radar): Benchmark performance of MediaWiki on k8s - https://phabricator.wikimedia.org/T280497 (10Joe) @jijiki I guess you want to rebuild the php base image to include the patches to optimize DOM performance before running the tests [13:00:05] dduvall and hashar: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for MediaWiki train - American+European Version (secondary timeslot). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T1300). [13:03:58] how painful [13:04:59] !log joal@deploy1002 Started deploy [analytics/refinery@b2ca54f]: Bugfix analytics deploy [analytics/refinery@b2ca54f] [13:05:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:31] (03CR) 10Ottomata: Eventgate: Symlink _helpers and _tls_helpers (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/722654 (https://phabricator.wikimedia.org/T291504) (owner: 10Ppchelko) [13:09:11] !log mbsantos@deploy1002 Started deploy [kartotherian/deploy@3293ce1]: tegola: increase mirrored requests to 10% [13:09:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:25] !log mbsantos@deploy1002 Finished deploy [kartotherian/deploy@3293ce1]: tegola: increase mirrored requests to 10% (duration: 00m 14s) [13:09:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:11:24] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-access for NRodriguez - https://phabricator.wikimedia.org/T291508 (10Ottomata) Approved. > I assume this also needs analytics-privatedata-access group with no SSH and no kerberos. Given the reason for access, I this will need SSH and Ker... 
[13:12:28] 10Puppet, 10Infrastructure-Foundations: Hosts distribution across puppetmasters - https://phabricator.wikimedia.org/T291541 (10jbond) > Basically instead of puppet have puppet.eqiad.wmnet, that might point to a different host, even in a difference datacenter for temporary failover purposes. Ahh i see, what you... [13:21:51] PROBLEM - etcd request latencies on kubemaster1001 is CRITICAL: instance=10.64.0.117 operation=list https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=28 [13:22:08] PROBLEM - etcd request latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 operation={get,list,listWithCount,update} https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=28 [13:22:12] (03PS1) 10Volans: CHANGELOG: add changelogs for release v1.0.0 [software/spicerack] - 10https://gerrit.wikimedia.org/r/722865 [13:23:25] !log joal@deploy1002 Finished deploy [analytics/refinery@b2ca54f]: Bugfix analytics deploy [analytics/refinery@b2ca54f] (duration: 18m 25s) [13:23:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:23:44] (03PS2) 10Volans: CHANGELOG: add changelogs for release v1.0.0 [software/spicerack] - 10https://gerrit.wikimedia.org/r/722865 [13:24:21] (03PS4) 10Jbond: git - schema: Add new schema for adding git information [software/ecs] - 10https://gerrit.wikimedia.org/r/722580 (https://phabricator.wikimedia.org/T222826) [13:24:47] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-access for NRodriguez - https://phabricator.wikimedia.org/T291508 (10Marostegui) Thanks @Ottomata! @NRodriguez you'd need then, to provide your ssh public key on the task. Please read: https://wikitech.wikimedia.org/wiki/SRE/Production_ac... 
[13:25:24] (03CR) 10Jbond: "thanks made some updates" [software/ecs] - 10https://gerrit.wikimedia.org/r/722580 (https://phabricator.wikimedia.org/T222826) (owner: 10Jbond) [13:26:02] RECOVERY - etcd request latencies on kubemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=28 [13:26:20] RECOVERY - etcd request latencies on kubestagemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=28 [13:26:48] !log mx1001 filtered on the routers for forthcoming reimage to bullseye T286911 [13:26:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:53] T286911: Upgrade MXes to Bullseye - https://phabricator.wikimedia.org/T286911 [13:27:41] 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-access for NRodriguez - https://phabricator.wikimedia.org/T291508 (10Marostegui) [13:33:41] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v1.0.0 [software/spicerack] - 10https://gerrit.wikimedia.org/r/722865 (owner: 10Volans) [13:39:22] !log flushed mx1001 mail queue to mx2001 T286911 [13:39:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:39:27] T286911: Upgrade MXes to Bullseye - https://phabricator.wikimedia.org/T286911 [13:40:17] (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v1.0.0 [software/spicerack] - 10https://gerrit.wikimedia.org/r/722865 (owner: 10Volans) [13:48:58] (03PS1) 10Volans: Upstream release v1.0.0 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/722868 [13:53:55] (03CR) 10Brennen Bearnes: "I'm in favor of moving this stuff to Puppet, but will defer to others on the specifics."
[puppet] - 10https://gerrit.wikimedia.org/r/722370 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [13:58:46] (03CR) 10Volans: [C: 03+2] Upstream release v1.0.0 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/722868 (owner: 10Volans) [14:05:14] (03PS4) 10KartikMistry: Add support for SectionTranslationTargetLanguages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/720982 (https://phabricator.wikimedia.org/T290302) [14:06:17] (03Merged) 10jenkins-bot: Upstream release v1.0.0 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/722868 (owner: 10Volans) [14:07:41] (03PS5) 10KartikMistry: Add support for SectionTranslationTargetLanguages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/720982 (https://phabricator.wikimedia.org/T290302) [14:11:05] (03PS1) 10Alexandros Kosiaris: Experiment with a jenkins-debug user [deployment-charts] - 10https://gerrit.wikimedia.org/r/722869 (https://phabricator.wikimedia.org/T290360) [14:13:34] (03PS1) 10Muehlenhoff: Prefer codfw wiki smarthost over eqiad one for mx1001 reimage [puppet] - 10https://gerrit.wikimedia.org/r/722870 (https://phabricator.wikimedia.org/T286911) [14:14:34] !log uploaded spicerack_1.0.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia [14:14:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:43] (03CR) 10Herron: [C: 03+1] "LGTM https://puppet-compiler.wmflabs.org/compiler1003/31205/mw1311.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/722870 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [14:19:20] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/31204/idp1001.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/722870 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [14:20:12] (03CR) 10Muehlenhoff: [C: 03+2] Prefer codfw wiki smarthost over eqiad one for mx1001 reimage [puppet] - 
10https://gerrit.wikimedia.org/r/722870 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [14:29:50] (03CR) 10Arturo Borrero Gonzalez: create role to deploy staging instance for quarry (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/721585 (https://phabricator.wikimedia.org/T291204) (owner: 10Michael DiPietro) [14:29:58] (03PS1) 10Jbond: schemas - metrics: Add puppet keys to the metrics name space [software/ecs] - 10https://gerrit.wikimedia.org/r/722873 (https://phabricator.wikimedia.org/T222826) [14:30:08] (03PS1) 10Andrew Bogott: Nova cloud-init/vendordata: more special pleading to avoid puppet races [puppet] - 10https://gerrit.wikimedia.org/r/722874 [14:31:25] (03CR) 10Andrew Bogott: [C: 03+2] Nova cloud-init/vendordata: more special pleading to avoid puppet races [puppet] - 10https://gerrit.wikimedia.org/r/722874 (owner: 10Andrew Bogott) [14:35:09] (03CR) 10Jbond: schemas - metrics: Add puppet keys to the metrics name space (031 comment) [software/ecs] - 10https://gerrit.wikimedia.org/r/722873 (https://phabricator.wikimedia.org/T222826) (owner: 10Jbond) [14:43:23] (03PS1) 10Elukey: Add separate secrets for the k8s admin services [labs/private] - 10https://gerrit.wikimedia.org/r/722876 [14:46:25] (03PS2) 10Elukey: Add separate secrets for the k8s admin services [labs/private] - 10https://gerrit.wikimedia.org/r/722876 [14:47:09] (03CR) 10Elukey: [V: 03+2 C: 03+2] Add separate secrets for the k8s admin services [labs/private] - 10https://gerrit.wikimedia.org/r/722876 (owner: 10Elukey) [14:47:53] !log upgraded spicerack to 1.0.0 on cumin hosts [14:47:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:49:36] (03PS1) 10Elukey: helmfile: add secrets for the admin_ng configs [puppet] - 10https://gerrit.wikimedia.org/r/722877 [14:51:35] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31206/console" [puppet] - 
10https://gerrit.wikimedia.org/r/722877 (owner: 10Elukey) [14:53:43] (03PS2) 10Elukey: helmfile: add secrets for the admin_ng configs [puppet] - 10https://gerrit.wikimedia.org/r/722877 [14:53:46] (03CR) 10BryanDavis: "Y'all can link this change to T291530. And thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/722833 (owner: 10Giuseppe Lavagetto) [14:57:41] (03CR) 10Michael DiPietro: create role to deploy staging instance for quarry (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/721585 (https://phabricator.wikimedia.org/T291204) (owner: 10Michael DiPietro) [15:02:44] !log re-installing mx1001 with bullseye T286911 [15:02:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:50] T286911: Upgrade MXes to Bullseye - https://phabricator.wikimedia.org/T286911 [15:09:22] PROBLEM - SSH on mx1001 is CRITICAL: connect to address 208.80.154.76 and port 22: Connection refused https://wikitech.wikimedia.org/wiki/SSH/monitoring [15:09:51] (03PS1) 10Elukey: role::deployment_server: fix settings for revscoring-editquality [puppet] - 10https://gerrit.wikimedia.org/r/722881 [15:10:14] PROBLEM - Exim SMTP on mx1001 is CRITICAL: connect to address 208.80.154.76 and port 25: Connection refused https://wikitech.wikimedia.org/wiki/Mail%23Troubleshooting [15:10:43] (03CR) 10Elukey: [C: 03+2] role::deployment_server: fix settings for revscoring-editquality [puppet] - 10https://gerrit.wikimedia.org/r/722881 (owner: 10Elukey) [15:11:04] (03PS2) 10Jbond: P:sre::check_user: add support for wikitech querys [puppet] - 10https://gerrit.wikimedia.org/r/720056 [15:11:51] (03PS1) 10Volans: sre.experimental.reimage: disable progress bars [cookbooks] - 10https://gerrit.wikimedia.org/r/722882 [15:12:38] (03CR) 10jerkins-bot: [V: 04-1] P:sre::check_user: add support for wikitech querys [puppet] - 10https://gerrit.wikimedia.org/r/720056 (owner: 10Jbond) [15:15:52] !log mbsantos@deploy1002 Started deploy [kartotherian/deploy@7ed9c3b]: Revert "change tegola 
uri to test single production node" [15:15:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:16:07] !log mbsantos@deploy1002 Finished deploy [kartotherian/deploy@7ed9c3b]: Revert "change tegola uri to test single production node" (duration: 00m 15s) [15:16:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:17:03] nemo-yiannis: ^ [15:17:19] thanks mbsantos [15:18:25] (03CR) 10Elukey: [C: 03+2] Add revscoring-editquality as first ml-service to helmfile.d [deployment-charts] - 10https://gerrit.wikimedia.org/r/719128 (https://phabricator.wikimedia.org/T286791) (owner: 10Elukey) [15:18:41] (03PS21) 10Elukey: Add revscoring-editquality as first ml-service to helmfile.d [deployment-charts] - 10https://gerrit.wikimedia.org/r/719128 (https://phabricator.wikimedia.org/T286791) [15:20:10] (03PS3) 10Jelto: modules::gitlab add missing fields from ansible gitlab.rb template [puppet] - 10https://gerrit.wikimedia.org/r/722370 (https://phabricator.wikimedia.org/T283076) [15:20:50] (03CR) 10jerkins-bot: [V: 04-1] modules::gitlab add missing fields from ansible gitlab.rb template [puppet] - 10https://gerrit.wikimedia.org/r/722370 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [15:22:20] (03PS4) 10Jelto: modules::gitlab add missing fields from ansible gitlab.rb template [puppet] - 10https://gerrit.wikimedia.org/r/722370 (https://phabricator.wikimedia.org/T283076) [15:23:33] (03PS19) 10Elukey: Rakefile: change HELMFILE_GLOB to include ml-services [deployment-charts] - 10https://gerrit.wikimedia.org/r/719522 (https://phabricator.wikimedia.org/T286791) [15:24:22] (03PS1) 10Jbond: O:idp: update access permissions for sre-admins [puppet] - 10https://gerrit.wikimedia.org/r/722884 (https://phabricator.wikimedia.org/T289779) [15:25:27] (03CR) 10Volans: [C: 03+2] "Testing in real life the new remote capabilities of spicerack 1.0.0." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/722882 (owner: 10Volans) [15:26:13] (03CR) 10Jbond: "@wolfgang, jobo: please approve as a genral expansion to expand the privalges of the sre-admins (sre's with our root access)" [puppet] - 10https://gerrit.wikimedia.org/r/722884 (https://phabricator.wikimedia.org/T289779) (owner: 10Jbond) [15:26:24] RECOVERY - Exim SMTP on mx1001 is OK: OK - Certificate mx1001.wikimedia.org will expire on Sun 14 Nov 2021 01:37:16 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Mail%23Troubleshooting [15:27:22] (03CR) 10Herron: [C: 03+1] "LGTM" [homer/public] - 10https://gerrit.wikimedia.org/r/722859 (owner: 10Muehlenhoff) [15:27:32] RECOVERY - SSH on mx1001 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [15:27:43] (03CR) 10Muehlenhoff: [C: 03+2] Revert "Temporarily filter port 25 on mx1001 for reimage" [homer/public] - 10https://gerrit.wikimedia.org/r/722859 (owner: 10Muehlenhoff) [15:27:54] PROBLEM - spamassassin on mx1001 is CRITICAL: PROCS CRITICAL: 0 processes with args spamd https://wikitech.wikimedia.org/wiki/Mail%23SpamAssassin [15:28:03] (03Merged) 10jenkins-bot: sre.experimental.reimage: disable progress bars [cookbooks] - 10https://gerrit.wikimedia.org/r/722882 (owner: 10Volans) [15:28:49] (03CR) 10Elukey: [C: 03+2] Rakefile: change HELMFILE_GLOB to include ml-services [deployment-charts] - 10https://gerrit.wikimedia.org/r/719522 (https://phabricator.wikimedia.org/T286791) (owner: 10Elukey) [15:30:28] (03PS3) 10Brennen Bearnes: gitlab cas: uid instead of CN; add nickname_key [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/714382 (https://phabricator.wikimedia.org/T288392) (owner: 10Jbond) [15:31:56] RECOVERY - spamassassin on mx1001 is OK: PROCS OK: 3 processes with args spamd https://wikitech.wikimedia.org/wiki/Mail%23SpamAssassin [15:34:08] (03PS1) 10Elukey: admin_ng: set helm3 for ml-serve deployments [deployment-charts] - 
10https://gerrit.wikimedia.org/r/722906 [15:36:15] (03PS2) 10Jbond: O:idp: update access permissions for sre-admins [puppet] - 10https://gerrit.wikimedia.org/r/722884 (https://phabricator.wikimedia.org/T289779) [15:38:45] (03CR) 10Elukey: [C: 03+2] admin_ng: set helm3 for ml-serve deployments [deployment-charts] - 10https://gerrit.wikimedia.org/r/722906 (owner: 10Elukey) [15:39:34] (03CR) 10Jelto: "thanks for the review. I added some code and thoughts. Let me know what you think." [puppet] - 10https://gerrit.wikimedia.org/r/722370 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [15:42:40] (03CR) 10Jbond: [C: 03+1] "this looks good to me, @jelto think it make senses for you to merge but im happy to if you would prefer" [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/714382 (https://phabricator.wikimedia.org/T288392) (owner: 10Jbond) [15:46:41] (03PS1) 10Dduvall: Avoid $wgUser deprecation warnings [extensions/CentralAuth] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722896 (https://phabricator.wikimedia.org/T291515) [15:47:13] (03CR) 10DannyS712: "I was just coming to ask if this should be backported, I guess the answer is yes" [extensions/CentralAuth] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722896 (https://phabricator.wikimedia.org/T291515) (owner: 10Dduvall) [15:48:42] (03CR) 10Dduvall: Avoid $wgUser deprecation warnings (031 comment) [extensions/CentralAuth] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722896 (https://phabricator.wikimedia.org/T291515) (owner: 10Dduvall) [15:49:08] !log joal@deploy1002 Started deploy [analytics/refinery@b2ca54f] (thin): Bugfix analytics deploy THIN [analytics/refinery@b2ca54f] [15:49:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:15] !log joal@deploy1002 Finished deploy [analytics/refinery@b2ca54f] (thin): Bugfix analytics deploy THIN [analytics/refinery@b2ca54f] (duration: 00m 07s) [15:49:18] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:54] !log joal@deploy1002 Started deploy [analytics/refinery@b2ca54f] (hadoop-test): Bugfix analytics deploy TEST [analytics/refinery@b2ca54f] [15:49:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:52:09] !log removed filters for mx1001 on the routers due to an issue with the mx1001 reinstall T286911 [15:52:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:52:14] T286911: Upgrade MXes to Bullseye - https://phabricator.wikimedia.org/T286911 [15:53:00] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Remove ocg remnant [labs/private] - 10https://gerrit.wikimedia.org/r/715744 (owner: 10Alexandros Kosiaris) [15:54:22] (03CR) 10Volans: "Commenting just to make sure that this is not waiting for me at the moment." [software/spicerack] - 10https://gerrit.wikimedia.org/r/721563 (owner: 10Arturo Borrero Gonzalez) [15:55:07] (03PS1) 10Alexandros Kosiaris: Add mwdebug secrets to releases servers [labs/private] - 10https://gerrit.wikimedia.org/r/722910 (https://phabricator.wikimedia.org/T288629) [15:56:11] !log joal@deploy1002 Finished deploy [analytics/refinery@b2ca54f] (hadoop-test): Bugfix analytics deploy TEST [analytics/refinery@b2ca54f] (duration: 06m 17s) [15:56:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:35] !log volans@cumin1001 START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet [15:57:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:02] (03PS1) 10Volans: sre.experimental.reimage: disable progress bars(2) [cookbooks] - 10https://gerrit.wikimedia.org/r/722911 [16:00:09] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Add mwdebug secrets to releases servers [labs/private] - 10https://gerrit.wikimedia.org/r/722910 (https://phabricator.wikimedia.org/T288629) (owner: 10Alexandros Kosiaris) [16:04:31] (03PS1) 10Hnowlan: ratelimit: load
environment variables file in entrypoint [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/722914 (https://phabricator.wikimedia.org/T254917) [16:06:17] (03CR) 10Volans: [C: 03+2] "Missed some additional ones in the first pass." [cookbooks] - 10https://gerrit.wikimedia.org/r/722911 (owner: 10Volans) [16:06:27] (03PS1) 10Ladsgroup: Set jQuery migrate to false everywhere except metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722916 (https://phabricator.wikimedia.org/T280944) [16:06:34] jouncebot: nowandnext [16:06:34] No deployments scheduled for the next 1 hour(s) and 53 minute(s) [16:06:34] In 1 hour(s) and 53 minute(s): Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T1800) [16:06:34] In 1 hour(s) and 53 minute(s): Train log triage with CPT (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T1800) [16:06:54] noice, deploying 722916 now [16:08:41] (03CR) 10Ladsgroup: [C: 03+2] Set jQuery migrate to false everywhere except metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722916 (https://phabricator.wikimedia.org/T280944) (owner: 10Ladsgroup) [16:08:43] (03Merged) 10jenkins-bot: sre.experimental.reimage: disable progress bars(2) [cookbooks] - 10https://gerrit.wikimedia.org/r/722911 (owner: 10Volans) [16:08:50] !log volans@cumin1001 END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1002.eqiad.wmnet [16:08:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:10:10] (03Merged) 10jenkins-bot: Set jQuery migrate to false everywhere except metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722916 (https://phabricator.wikimedia.org/T280944) (owner: 10Ladsgroup) [16:10:27] (03PS1) 10Volans: sre.experimental.reimage: fix log messages [cookbooks] - 10https://gerrit.wikimedia.org/r/722918 [16:12:53] 10SRE, 10MW-on-K8s, 10Performance-Team, 10Traffic, and 2 others: Serve production traffic via 
Kubernetes - https://phabricator.wikimedia.org/T290536 (10thcipriani) [16:13:46] !log ladsgroup@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:722916|Set jQuery migrate to false everywhere except metawiki (T280944)]] (duration: 01m 56s) [16:13:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:13:51] T280944: Phase out jQuery Migrate v3 - https://phabricator.wikimedia.org/T280944 [16:14:04] !log joal@deploy1002 Started deploy [analytics/refinery@04aae46]: Regular analytics weekly train [analytics/refinery@04aae46] [16:14:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:05] (03CR) 10Jbond: modules::gitlab add missing fields from ansible gitlab.rb template (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/722370 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [16:17:09] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [16:17:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:59] (03CR) 10Jbond: [C: 03+1] prospector: disable pylint consider-using-f-string [software/cumin] - 10https://gerrit.wikimedia.org/r/722378 (owner: 10Volans) [16:19:07] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[16:19:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:20:02] (03PS4) 10Reedy: Set wgProhibitedFileExtensions not wgFileBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721011 (https://phabricator.wikimedia.org/T290640) (owner: 10Jforrester) [16:21:02] (03PS50) 10Btullis: Install Alluxio to the test cluster [puppet] - 10https://gerrit.wikimedia.org/r/712974 (https://phabricator.wikimedia.org/T266641) [16:22:18] (03PS1) 10Reedy: Update some more renamed $wg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722919 (https://phabricator.wikimedia.org/T290640) [16:23:01] jouncebot: now [16:23:01] No deployments scheduled for the next 1 hour(s) and 36 minute(s) [16:23:03] jouncebot: next [16:23:04] In 1 hour(s) and 36 minute(s): Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T1800) [16:23:04] In 1 hour(s) and 36 minute(s): Train log triage with CPT (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T1800) [16:24:06] (03CR) 10Btullis: "This change is ready for review." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/714753 (https://phabricator.wikimedia.org/T287864) (owner: 10Btullis) [16:25:41] 10ops-eqiad, 10DC-Ops, 10Machine-Learning-Team: (Need By: TBD) rack/setup/install ml-train100[1-4] - https://phabricator.wikimedia.org/T291579 (10RobH) [16:26:29] 10ops-eqiad, 10DC-Ops, 10Machine-Learning-Team: (Need By: TBD) rack/setup/install ml-train100[1-4] - https://phabricator.wikimedia.org/T291579 (10RobH) [16:28:25] (03PS1) 10Reedy: Add transtion code to rename $wmgFileBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722921 (https://phabricator.wikimedia.org/T290640) [16:28:27] (03PS1) 10Reedy: Rename $wmgFileBlacklist to $wmgProhibitedFileExtensions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722922 [16:28:29] (03PS1) 10Reedy: Remove $wmgFileBlacklist back compat [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722923 (https://phabricator.wikimedia.org/T290640) [16:28:35] (03CR) 10Volans: [C: 03+2] sre.experimental.reimage: fix log messages [cookbooks] - 10https://gerrit.wikimedia.org/r/722918 (owner: 10Volans) [16:28:54] (03CR) 10Volans: [C: 03+2] prospector: disable pylint consider-using-f-string [software/cumin] - 10https://gerrit.wikimedia.org/r/722378 (owner: 10Volans) [16:29:31] Hang on... 
I think I just duplicated some patches [16:29:42] Bah [16:29:44] (03CR) 10jerkins-bot: [V: 04-1] Add transtion code to rename $wmgFileBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722921 (https://phabricator.wikimedia.org/T290640) (owner: 10Reedy) [16:29:49] (03PS1) 10Ssingh: durum: add and set CSP headers for check.wikimedia-dns.org [puppet] - 10https://gerrit.wikimedia.org/r/722926 (https://phabricator.wikimedia.org/T289536) [16:29:51] (03CR) 10jerkins-bot: [V: 04-1] Rename $wmgFileBlacklist to $wmgProhibitedFileExtensions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722922 (owner: 10Reedy) [16:30:17] (03CR) 10Reedy: [C: 03+2] Set wgProhibitedFileExtensions not wgFileBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721011 (https://phabricator.wikimedia.org/T290640) (owner: 10Jforrester) [16:30:40] (03PS3) 10Reedy: Alter wgMimeTypeExclusions not wgMimeTypeBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721030 (owner: 10Jforrester) [16:30:50] !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox [16:30:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:57] (03CR) 10Reedy: [C: 03+2] Alter wgMimeTypeExclusions not wgMimeTypeBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721030 (owner: 10Jforrester) [16:31:27] (03Merged) 10jenkins-bot: Set wgProhibitedFileExtensions not wgFileBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721011 (https://phabricator.wikimedia.org/T290640) (owner: 10Jforrester) [16:31:30] PROBLEM - etcd request latencies on kubemaster1001 is CRITICAL: instance=10.64.0.117 operation=list https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=28 [16:31:50] PROBLEM - etcd request latencies on kubestagemaster1001 is CRITICAL: instance=10.64.16.203 operation={get,list,listWithCount,update} https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster 
https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=28 [16:32:07] (03PS1) 10MVernon: README.md: mention requirement for rsync [software/swift-ring] - 10https://gerrit.wikimedia.org/r/722927 [16:32:14] (03Merged) 10jenkins-bot: sre.experimental.reimage: fix log messages [cookbooks] - 10https://gerrit.wikimedia.org/r/722918 (owner: 10Volans) [16:32:24] !log joal@deploy1002 Finished deploy [analytics/refinery@04aae46]: Regular analytics weekly train [analytics/refinery@04aae46] (duration: 18m 19s) [16:32:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:48] (03CR) 10Ssingh: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31208/console" [puppet] - 10https://gerrit.wikimedia.org/r/722926 (https://phabricator.wikimedia.org/T289536) (owner: 10Ssingh) [16:32:50] (03Merged) 10jenkins-bot: Alter wgMimeTypeExclusions not wgMimeTypeBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721030 (owner: 10Jforrester) [16:32:57] (03CR) 10Dzahn: [C: 03+2] rancid: convert crons to systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/721854 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [16:33:27] (03CR) 10MVernon: "Hi," [software/swift-ring] - 10https://gerrit.wikimedia.org/r/722927 (owner: 10MVernon) [16:33:36] RECOVERY - etcd request latencies on kubemaster1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=28 [16:33:54] RECOVERY - etcd request latencies on kubestagemaster1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Etcd/Main_cluster https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=28 [16:34:16] akosiaris: you have pending changes on the puppet master (but I could just merge only my change anyways, which was nice) [16:35:20] !log reedy@deploy1002 Synchronized wmf-config/CommonSettings.php: Use wgMimeTypeExclusions and set wgProhibitedFileExtensions not wgFileBlacklist (duration: 01m 05s) [16:35:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:35:29] (03PS2) 10Reedy: Update some more renamed $wg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722919 (https://phabricator.wikimedia.org/T290640) [16:35:33] akosiaris: also.. one of them looks exactly like what I wanted to ask you about.. how to add the secrets for releases servers. cool :)) [16:36:37] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [16:36:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:36:47] !log volans@cumin1001 START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet [16:36:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:36:57] (03CR) 10Reedy: [C: 03+2] Update some more renamed $wg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722919 (https://phabricator.wikimedia.org/T290640) (owner: 10Reedy) [16:37:20] (03Abandoned) 10Reedy: Add transtion code to rename $wmgFileBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722921 (https://phabricator.wikimedia.org/T290640) (owner: 10Reedy) [16:37:22] (03Abandoned) 10Reedy: Rename $wmgFileBlacklist to $wmgProhibitedFileExtensions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722922 (owner: 10Reedy) [16:37:24] (03Abandoned) 10Reedy: Remove $wmgFileBlacklist back compat [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722923 (https://phabricator.wikimedia.org/T290640) (owner: 10Reedy) 
[16:37:32] (03Merged) 10jenkins-bot: prospector: disable pylint consider-using-f-string [software/cumin] - 10https://gerrit.wikimedia.org/r/722378 (owner: 10Volans) [16:37:47] !log joal@deploy1002 Started deploy [analytics/refinery@04aae46] (thin): Regular analytics weekly train THIN [analytics/refinery@04aae46] [16:37:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:37:54] !log joal@deploy1002 Finished deploy [analytics/refinery@04aae46] (thin): Regular analytics weekly train THIN [analytics/refinery@04aae46] (duration: 00m 07s) [16:37:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:37:58] (03Merged) 10jenkins-bot: Update some more renamed $wg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722919 (https://phabricator.wikimedia.org/T290640) (owner: 10Reedy) [16:38:00] mutante: ah sorry. Thanks! fixed [16:38:03] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [16:38:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:29] Reedy: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/721012 etc. [16:38:29] (03CR) 10Cwhite: [C: 03+1] "Looks great, thanks for the follow-up!" [software/ecs] - 10https://gerrit.wikimedia.org/r/722580 (https://phabricator.wikimedia.org/T222826) (owner: 10Jbond) [16:38:36] (03PS3) 10Jforrester: Rename wmfFileBlacklist to wmgProhibitedFileExtensions part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721012 [16:38:37] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [16:38:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:44] James_F: Yeah, hence my three abandons above :P [16:38:46] akosiaris: thanks for saving me the time to ask about those secrets.. 
I was going to write an entire mail about that just now :) [16:38:48] (03PS2) 10Ssingh: durum: add and set CSP headers for check.wikimedia-dns.org [puppet] - 10https://gerrit.wikimedia.org/r/722926 (https://phabricator.wikimedia.org/T289536) [16:38:59] Sorry, am in meetings. [16:39:04] :) [16:39:30] mutante: :) [16:39:47] I've merged the equivalent change in puppetmaster1001 btw [16:39:57] !log reedy@deploy1002 Synchronized wmf-config/CommonSettings.php: Rename wgEnableUserEmailBlacklist to wgEnableUserEmailMuteList (duration: 01m 05s) [16:40:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:40:43] !log [netmon1002:~] $ sudo systemctl start rancid-clean-logs [16:40:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:41:09] 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10User-fgiunchedi: Decom ms-be[1019-1026] - https://phabricator.wikimedia.org/T272836 (10Cmjohnson) [16:41:19] (03PS3) 10Reedy: Rename wmfFileBlacklist to wmgProhibitedFileExtensions part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721013 (owner: 10Jforrester) [16:41:20] 10SRE-swift-storage, 10User-fgiunchedi: Refresh and expand Swift hardware capacity - https://phabricator.wikimedia.org/T266016 (10Cmjohnson) [16:41:25] (03PS3) 10Reedy: Rename wmfFileBlacklist to wmgProhibitedFileExtensions part III [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721014 (owner: 10Jforrester) [16:41:26] !log reedy@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Rename wgShortPagesNamespaceBlacklist to wgShortPagesNamespaceExclusions (duration: 01m 05s) [16:41:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:41:39] 10SRE, 10SRE-swift-storage, 10ops-eqiad, 10User-fgiunchedi: Decom ms-be[1019-1026] - https://phabricator.wikimedia.org/T272836 (10Cmjohnson) 05Open→03Resolved removed from rack and updated [16:41:44] (03CR) 10Reedy: [C: 03+2] Rename wmfFileBlacklist to wmgProhibitedFileExtensions part I
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/721012 (owner: 10Jforrester) [16:41:47] !log [netmon1002:~] $ sudo systemctl start rancid-differ [16:41:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:42:09] (03PS4) 10Reedy: Rename wmgFileBlacklist to wmgProhibitedFileExtensions part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721012 (owner: 10Jforrester) [16:42:21] (03CR) 10Reedy: [C: 03+2] Rename wmgFileBlacklist to wmgProhibitedFileExtensions part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721012 (owner: 10Jforrester) [16:42:28] 10SRE, 10ops-eqiad, 10DC-Ops: Q4:(Need By: TBD) rack/setup/install cloudcephosd102[1-4].eqiad.wmnet - https://phabricator.wikimedia.org/T284471 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts: ` cloudcephosd1022.eqiad.wmnet ` The log can be found in... [16:43:15] (03CR) 10RLazarus: [C: 03+1] "Implementation LGTM pending typos below, thanks for doing this!" 
[puppet] - 10https://gerrit.wikimedia.org/r/722884 (https://phabricator.wikimedia.org/T289779) (owner: 10Jbond) [16:43:17] !log joal@deploy1002 Started deploy [analytics/refinery@04aae46] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04aae46] [16:43:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:34] (03Merged) 10jenkins-bot: Rename wmgFileBlacklist to wmgProhibitedFileExtensions part I [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721012 (owner: 10Jforrester) [16:44:49] (03PS4) 10Reedy: Rename wmgFileBlacklist to wmgProhibitedFileExtensions part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721013 (owner: 10Jforrester) [16:44:51] (03PS4) 10Reedy: Rename wmgFileBlacklist to wmgProhibitedFileExtensions part III [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721014 (owner: 10Jforrester) [16:45:15] !log reedy@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Add wmgProhibitedFileExtensions (duration: 01m 07s) [16:45:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:45:46] (03CR) 10Reedy: [C: 03+2] Rename wmgFileBlacklist to wmgProhibitedFileExtensions part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721013 (owner: 10Jforrester) [16:46:05] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[16:46:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:46:33] (03PS3) 10Jbond: O:idp: update access permissions for sre-admins [puppet] - 10https://gerrit.wikimedia.org/r/722884 (https://phabricator.wikimedia.org/T289779) [16:46:39] (03Merged) 10jenkins-bot: Rename wmgFileBlacklist to wmgProhibitedFileExtensions part II [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721013 (owner: 10Jforrester) [16:47:36] (03PS4) 10Jbond: O:idp: update access permissions for sre-admins [puppet] - 10https://gerrit.wikimedia.org/r/722884 (https://phabricator.wikimedia.org/T289779) [16:47:54] (03CR) 10Reedy: [C: 03+2] Rename wmgFileBlacklist to wmgProhibitedFileExtensions part III [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721014 (owner: 10Jforrester) [16:48:06] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [16:48:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:48:33] !log reedy@deploy1002 Synchronized wmf-config/CommonSettings.php: Use wmgProhibitedFileExtensions (duration: 01m 05s) [16:48:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:48:53] (03Merged) 10jenkins-bot: Rename wmgFileBlacklist to wmgProhibitedFileExtensions part III [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721014 (owner: 10Jforrester) [16:49:34] !log joal@deploy1002 Finished deploy [analytics/refinery@04aae46] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04aae46] (duration: 06m 17s) [16:49:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:21] !log reedy@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Remove wmgFileBlacklist (duration: 01m 06s) [16:50:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:38] (03CR) 10Jbond: "thanks updated" [puppet] - 10https://gerrit.wikimedia.org/r/722884 
(https://phabricator.wikimedia.org/T289779) (owner: 10Jbond) [16:51:56] (03CR) 10Jbond: git - schema: Add new schema for adding git information (031 comment) [software/ecs] - 10https://gerrit.wikimedia.org/r/722580 (https://phabricator.wikimedia.org/T222826) (owner: 10Jbond) [16:53:42] (03PS1) 10Dzahn: rancid: remove absented cron code [puppet] - 10https://gerrit.wikimedia.org/r/722930 (https://phabricator.wikimedia.org/T273673) [16:53:55] !log volans@cumin1001 END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1002.eqiad.wmnet [16:53:55] (03CR) 10RLazarus: [C: 03+1] O:idp: update access permissions for sre-admins (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/722884 (https://phabricator.wikimedia.org/T289779) (owner: 10Jbond) [16:53:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:54:02] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 78 probes of 619 (alerts on 65) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [16:54:44] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [16:54:54] 10SRE, 10ops-eqiad, 10DC-Ops: Q4:(Need By: TBD) rack/setup/install cloudcephosd102[1-4].eqiad.wmnet - https://phabricator.wikimedia.org/T284471 (10Cmjohnson) @Papaul @robh I moved the disks from cloudcephosd1022 to cloudcephosd1021 and attempted a re-install. That did not work, the partitioner hung up at 4... [16:55:07] (03PS5) 10Jbond: O:idp: update access permissions for sre-admins [puppet] - 10https://gerrit.wikimedia.org/r/722884 (https://phabricator.wikimedia.org/T289779) [16:55:22] (03CR) 10Dzahn: "crontab -u rancid -l is now empty on netmon1002. 
started both new services manually." [puppet] - 10https://gerrit.wikimedia.org/r/721854 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [16:55:34] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [16:55:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:55:48] (03CR) 10Jbond: O:idp: update access permissions for sre-admins (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/722884 (https://phabricator.wikimedia.org/T289779) (owner: 10Jbond) [16:56:15] (03CR) 10Dzahn: [C: 03+2] rancid: remove absented cron code [puppet] - 10https://gerrit.wikimedia.org/r/722930 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [16:56:24] !log cmjohnson@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE [16:56:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:56:46] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [16:56:52] (03CR) 10Dzahn: "Sep 22 16:49:14 netmon1002 systemd[1]: Started run rancid-run." [puppet] - 10https://gerrit.wikimedia.org/r/721854 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [16:57:36] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[16:57:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:58:25] (03CR) 10Dzahn: "I am torn between your view points, like Jelto I am also a bit worried about adding too much complexity but John has good counter argument" [puppet] - 10https://gerrit.wikimedia.org/r/722370 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [16:58:35] !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE [16:58:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:59:22] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 41 probes of 619 (alerts on 65) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [16:59:27] (03CR) 10Dzahn: [C: 04-2] "ITS replied and they actually DO still use this every once in a while, so this needs to stay and be converted to timer" [puppet] - 10https://gerrit.wikimedia.org/r/721350 (https://phabricator.wikimedia.org/T122144) (owner: 10Dzahn) [17:00:32] jouncebot: nowandnext [17:00:32] No deployments scheduled for the next 0 hour(s) and 59 minute(s) [17:00:33] In 0 hour(s) and 59 minute(s): Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T1800) [17:00:33] In 0 hour(s) and 59 minute(s): Train log triage with CPT (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T1800) [17:01:14] 4YO-candies?totes [17:01:21] grp [17:01:26] (03PS1) 10Legoktm: Revert "Drop i18n messages for removed token API" [core] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722901 [17:03:22] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 
[17:03:27] (03CR) 10Jforrester: "This week too? :-(" [core] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722901 (owner: 10Legoktm) [17:04:23] (03PS1) 10MewOphaswongse: Post-edit Panel: Set task.pageviews to null rather than undefined [extensions/GrowthExperiments] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722932 (https://phabricator.wikimedia.org/T291510) [17:04:54] (03PS1) 10Legoktm: Revert "Drop action api token methods deprecated in 1.24" [core] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722933 (https://phabricator.wikimedia.org/T291202) [17:05:30] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [17:06:45] (03CR) 10Dzahn: [C: 04-2] mail::mx: remove cron that mails aliases to OIT (ITS) (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/721350 (https://phabricator.wikimedia.org/T122144) (owner: 10Dzahn) [17:06:54] (03CR) 10Legoktm: [C: 03+2] Revert "Drop i18n messages for removed token API" (031 comment) [core] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722901 (owner: 10Legoktm) [17:07:04] (03CR) 10Legoktm: [C: 03+2] Revert "Drop action api token methods deprecated in 1.24" [core] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722933 (https://phabricator.wikimedia.org/T291202) (owner: 10Legoktm) [17:07:33] 10SRE, 10ops-eqiad, 10DC-Ops: Q4:(Need By: TBD) rack/setup/install cloudcephosd102[1-4].eqiad.wmnet - https://phabricator.wikimedia.org/T284471 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudcephosd1022.eqiad.wmnet'] ` and were **ALL** successful. 
[17:08:46] (03PS3) 10Hashar: docker: add security updates to Bullseye base image [puppet] - 10https://gerrit.wikimedia.org/r/720241 [17:09:14] (03CR) 10RLazarus: [C: 03+1] O:idp: update access permissions for sre-admins [puppet] - 10https://gerrit.wikimedia.org/r/722884 (https://phabricator.wikimedia.org/T289779) (owner: 10Jbond) [17:10:53] (03CR) 10Hashar: "Should we good now. Thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/720241 (owner: 10Hashar) [17:12:41] (03PS11) 10Ppchelko: Eventgate: Symlink _helpers and _tls_helpers [deployment-charts] - 10https://gerrit.wikimedia.org/r/722654 (https://phabricator.wikimedia.org/T291504) [17:13:51] (03PS12) 10Ppchelko: Eventgate: Symlink _helpers and _tls_helpers [deployment-charts] - 10https://gerrit.wikimedia.org/r/722654 (https://phabricator.wikimedia.org/T291504) [17:14:16] 10SRE, 10Traffic, 10MW-1.35-notes (1.35.0-wmf.40; 2020-07-07), 10Patch-For-Review, and 2 others: Harmonise the identification of requests across our stack - https://phabricator.wikimedia.org/T201409 (10Krinkle) [17:14:45] (03PS2) 10Dzahn: mail::mx: convert cron that sends alias file to ITS to timer [puppet] - 10https://gerrit.wikimedia.org/r/721350 (https://phabricator.wikimedia.org/T122144) [17:22:48] (03PS3) 10Dzahn: mail::mx: convert cron that sends alias file to ITS to timer [puppet] - 10https://gerrit.wikimedia.org/r/721350 (https://phabricator.wikimedia.org/T122144) [17:23:38] (03PS4) 10Dzahn: mail::mx: convert cron that sends alias file to ITS to timer [puppet] - 10https://gerrit.wikimedia.org/r/721350 (https://phabricator.wikimedia.org/T122144) [17:24:02] (03PS1) 10Ppchelko: Update eventgate helmfile.d for eventgate 0.5 chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/722935 (https://phabricator.wikimedia.org/T291504) [17:24:11] (03CR) 10Dzahn: [C: 03+2] mail::mx: convert cron that sends alias file to ITS to timer [puppet] - 10https://gerrit.wikimedia.org/r/721350 (https://phabricator.wikimedia.org/T122144) (owner: 
10Dzahn) [17:30:22] (03CR) 10Jforrester: Revert "Drop i18n messages for removed token API" (031 comment) [core] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722901 (owner: 10Legoktm) [17:30:30] (03Merged) 10jenkins-bot: Revert "Drop i18n messages for removed token API" [core] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722901 (owner: 10Legoktm) [17:30:37] (03Merged) 10jenkins-bot: Revert "Drop action api token methods deprecated in 1.24" [core] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722933 (https://phabricator.wikimedia.org/T291202) (owner: 10Legoktm) [17:31:14] (03CR) 10Ppchelko: Eventgate: Symlink _helpers and _tls_helpers (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/722654 (https://phabricator.wikimedia.org/T291504) (owner: 10Ppchelko) [17:33:10] I'm not going to scap for the i18n messages, it's not really worth it [17:33:57] * hauskatze read scalp, should check his eyes [17:34:11] :P [17:34:46] !log legoktm@deploy1002 Synchronized php-1.38.0-wmf.1/includes/api/ApiTokens.php: Restore deprecated API token methods (1/3) (duration: 01m 05s) [17:34:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:36:21] !log legoktm@deploy1002 Synchronized php-1.38.0-wmf.1/autoload.php: Restore deprecated API token methods (2/3) (duration: 01m 05s) [17:36:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:38:09] !log legoktm@deploy1002 Synchronized php-1.38.0-wmf.1/includes/api/: Restore deprecated API token methods (3/3) (duration: 01m 07s) [17:38:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:38:29] !log volans@cumin1001 START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet [17:38:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:39:34] Let's scalp all the servers! 
[17:40:34] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [17:40:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:43:19] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [17:43:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:49:32] (03CR) 10Ebernhardson: "Hmm, wdqs shouldn't be sending any UI requests through the backend servers. Will have to review the trafficserver configuration" [puppet] - 10https://gerrit.wikimedia.org/r/719502 (https://phabricator.wikimedia.org/T280247) (owner: 10Ebernhardson) [18:00:04] RoanKattouw, Niharika, and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T1800). [18:00:05] mewoph: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:05] dduvall and hashar: Dear deployers, time to do the Train log triage with CPT deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T1800). [18:01:06] hi mewoph, I'd deploy, but...I'm currently in a train about to arrive :-). Maybe legoktm can? 
🙂 [18:01:22] (03PS3) 10Ssingh: durum: add and set CSP headers for check.wikimedia-dns.org [puppet] - 10https://gerrit.wikimedia.org/r/722926 (https://phabricator.wikimedia.org/T289536) [18:01:39] sure [18:01:49] I have my own patch to go out as well [18:01:56] thanks [18:02:47] mewoph: let me know when you're ready to test it [18:02:54] (03CR) 10Legoktm: [C: 03+2] Post-edit Panel: Set task.pageviews to null rather than undefined [extensions/GrowthExperiments] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722932 (https://phabricator.wikimedia.org/T291510) (owner: 10MewOphaswongse) [18:03:03] legoktm: I'm ready now [18:03:30] !log volans@cumin1001 END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet [18:03:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:03:54] ok, just waiting on CI now [18:04:18] (03CR) 10Legoktm: [C: 03+2] ProductionServices: Add new Shellboxes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722737 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [18:04:20] (03CR) 10Ssingh: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31210/console" [puppet] - 10https://gerrit.wikimedia.org/r/722926 (https://phabricator.wikimedia.org/T289536) (owner: 10Ssingh) [18:05:06] (03Merged) 10jenkins-bot: ProductionServices: Add new Shellboxes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722737 (https://phabricator.wikimedia.org/T289226) (owner: 10Legoktm) [18:05:41] I will add https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/722819 , to send DuplicateParse logging to logstash [18:06:59] !log legoktm@deploy1002 Synchronized wmf-config/ProductionServices.php: Add new Shellboxes (duration: 01m 16s) [18:07:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:07:04] (03CR) 10Legoktm: durum: add and set CSP headers for check.wikimedia-dns.org (031 comment) 
[puppet] - 10https://gerrit.wikimedia.org/r/722926 (https://phabricator.wikimedia.org/T289536) (owner: 10Ssingh) [18:09:37] (03PS2) 10Legoktm: logging: send DuplicateParse bucket to Logstash [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722819 (owner: 10Hashar) [18:09:44] (03CR) 10Legoktm: [C: 03+2] logging: send DuplicateParse bucket to Logstash [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722819 (owner: 10Hashar) [18:09:49] legoktm: thx ;) [18:10:05] :) [18:10:42] (03Merged) 10jenkins-bot: logging: send DuplicateParse bucket to Logstash [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722819 (owner: 10Hashar) [18:11:49] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [18:11:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:58] !log legoktm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: logging: send DuplicateParse bucket to Logstash (duration: 01m 05s) [18:13:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:04] hashar: ^ [18:13:52] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [18:13:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:14:12] will check logstash [18:17:59] (03PS1) 10Dzahn: mx: turn mail-exim-aliases into proper shell script with config file [puppet] - 10https://gerrit.wikimedia.org/r/722942 (https://phabricator.wikimedia.org/T273673) [18:18:29] (03CR) 10jerkins-bot: [V: 04-1] mx: turn mail-exim-aliases into proper shell script with config file [puppet] - 10https://gerrit.wikimedia.org/r/722942 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [18:20:01] (03PS2) 10Dzahn: mx: turn mail-exim-aliases into proper shell script with config file [puppet] - 10https://gerrit.wikimedia.org/r/722942 (https://phabricator.wikimedia.org/T273673) [18:20:02] legoktm: it works. 
Thank you for the deployment [18:20:10] :) [18:20:18] now we're just waiting on CI for GrowthExperiments... [18:21:07] (03PS1) 10Herron: Revert "Revert "slo_dashboard: switch etcd request slo query to recording rule metrics"" [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/722904 [18:21:40] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [18:21:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:12] (03Merged) 10jenkins-bot: Post-edit Panel: Set task.pageviews to null rather than undefined [extensions/GrowthExperiments] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722932 (https://phabricator.wikimedia.org/T291510) (owner: 10MewOphaswongse) [18:23:39] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [18:23:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:59] (03PS3) 10Dzahn: mx: turn mail-exim-aliases into proper shell script with config file [puppet] - 10https://gerrit.wikimedia.org/r/722942 (https://phabricator.wikimedia.org/T273673) [18:24:01] (03PS4) 10Ssingh: durum: add and set CSP headers for check.wikimedia-dns.org [puppet] - 10https://gerrit.wikimedia.org/r/722926 (https://phabricator.wikimedia.org/T289536) [18:24:21] (03CR) 10Ssingh: [V: 03+1] durum: add and set CSP headers for check.wikimedia-dns.org (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/722926 (https://phabricator.wikimedia.org/T289536) (owner: 10Ssingh) [18:27:13] (03PS4) 10Dzahn: mx: turn mail-exim-aliases into proper shell script with config file [puppet] - 10https://gerrit.wikimedia.org/r/722942 (https://phabricator.wikimedia.org/T273673) [18:28:36] mewoph: it should be live on mwdebug1001 now [18:28:41] checking [18:30:16] legoktm: looks good to me, thanks [18:30:19] (03PS5) 10Dzahn: mx: turn mail-exim-aliases into proper shell script with config file [puppet] 
- 10https://gerrit.wikimedia.org/r/722942 (https://phabricator.wikimedia.org/T273673) [18:31:07] syncing [18:31:07] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [18:31:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:32:09] !log legoktm@deploy1002 Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/modules/help/ext.growthExperiments.PostEditPanel.js: Post-edit Panel: Set task.pageviews to null rather than undefined (T291510) (duration: 01m 05s) [18:32:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:32:14] T291510: [betalabs]Post-edit dialog: mw-ge-small-task-card-metadata-container attempts to load pageviews info - https://phabricator.wikimedia.org/T291510 [18:32:44] mewoph: should be everywhere now [18:32:57] legoktm: thanks! [18:33:03] (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1002/31215/mx2001.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/722942 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [18:33:05] (03CR) 10Krinkle: "Verified in beta by editing /srv/mediawiki/php-master/opensearch_desc.php and putting foo(); in the top line after php-open tag. I'm using" [puppet] - 10https://gerrit.wikimedia.org/r/722483 (https://phabricator.wikimedia.org/T253781) (owner: 10Krinkle) [18:33:07] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[18:33:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:33:18] anytime :) [18:33:40] (03CR) 10Dzahn: "Do you think I should move all of this into its own class that is included in a mx role and turn the $recipient, sbject, alias_file into p" [puppet] - 10https://gerrit.wikimedia.org/r/722942 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [18:39:39] (03CR) 10Ssingh: [C: 03+2] durum: add and set CSP headers for check.wikimedia-dns.org [puppet] - 10https://gerrit.wikimedia.org/r/722926 (https://phabricator.wikimedia.org/T289536) (owner: 10Ssingh) [18:40:09] (03CR) 10Ottomata: Update eventgate helmfile.d for eventgate 0.5 chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/722935 (https://phabricator.wikimedia.org/T291504) (owner: 10Ppchelko) [18:41:12] (03PS1) 10Dzahn: swift: convert swift-drive-audit cron to timer [puppet] - 10https://gerrit.wikimedia.org/r/722945 (https://phabricator.wikimedia.org/T273673) [18:41:44] (03CR) 10jerkins-bot: [V: 04-1] swift: convert swift-drive-audit cron to timer [puppet] - 10https://gerrit.wikimedia.org/r/722945 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [18:43:25] (03CR) 10Zabe: "See https://gerrit.wikimedia.org/r/c/operations/puppet/+/715597" [puppet] - 10https://gerrit.wikimedia.org/r/722945 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [18:44:53] (03PS2) 10Dzahn: swift::storage: convert swift-drive-audit cron to timer [puppet] - 10https://gerrit.wikimedia.org/r/722945 (https://phabricator.wikimedia.org/T273673) [18:45:28] (03CR) 10jerkins-bot: [V: 04-1] swift::storage: convert swift-drive-audit cron to timer [puppet] - 10https://gerrit.wikimedia.org/r/722945 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [18:47:12] (03CR) 10Ebernhardson: query_service: support multiple variants of wdqs microsite (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/719502 (https://phabricator.wikimedia.org/T280247) (owner: 
10Ebernhardson) [18:50:23] (03Abandoned) 10Dzahn: swift::storage: convert swift-drive-audit cron to timer [puppet] - 10https://gerrit.wikimedia.org/r/722945 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [18:55:52] (03PS13) 10Ottomata: Eventgate: Symlink _helpers and _tls_helpers [deployment-charts] - 10https://gerrit.wikimedia.org/r/722654 (https://phabricator.wikimedia.org/T291504) (owner: 10Ppchelko) [18:57:54] hashar: i plan to merge/deploy https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/722896 just prior to group1 promotion [18:58:03] cc DannyS712 [18:58:45] could have done that during the backport window i suppose, but... meetings [18:59:21] (03PS2) 10Herron: Revert "Revert "slo_dashboard: switch etcd request slo query to recording rule metrics"" [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/722904 [19:00:04] dduvall and hashar: Your horoscope predicts another unfortunate MediaWiki train - American+European Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T1900). [19:00:21] (03CR) 10Dduvall: [C: 03+2] Avoid $wgUser deprecation warnings [extensions/CentralAuth] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722896 (https://phabricator.wikimedia.org/T291515) (owner: 10Dduvall) [19:00:33] (03CR) 10Ottomata: "I think this will work. 
I will file some irrelevant naming objections on https://phabricator.wikimedia.org/T282148 :)" [deployment-charts] - 10https://gerrit.wikimedia.org/r/722654 (https://phabricator.wikimedia.org/T291504) (owner: 10Ppchelko) [19:00:41] (03CR) 10Ottomata: [C: 03+1] Eventgate: Symlink _helpers and _tls_helpers [deployment-charts] - 10https://gerrit.wikimedia.org/r/722654 (https://phabricator.wikimedia.org/T291504) (owner: 10Ppchelko) [19:01:55] (03CR) 10Herron: "here's a grr preview for PS2 https://grafana.wikimedia.org/dashboard/snapshot/Yofie740XJd79mapc61xBxMcA9BgyPWb" [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/722904 (owner: 10Herron) [19:02:55] (03PS1) 10Dzahn: profile::mediawiki: remove comment references to cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/722946 (https://phabricator.wikimedia.org/T273673) [19:03:25] (03CR) 10jerkins-bot: [V: 04-1] profile::mediawiki: remove comment references to cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/722946 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [19:04:36] (03Merged) 10jenkins-bot: Avoid $wgUser deprecation warnings [extensions/CentralAuth] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722896 (https://phabricator.wikimedia.org/T291515) (owner: 10Dduvall) [19:04:39] (03PS2) 10Dzahn: profile::mediawiki: remove comment references to cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/722946 (https://phabricator.wikimedia.org/T273673) [19:05:40] (03PS3) 10Dzahn: profile::mediawiki: remove comment references to cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/722946 (https://phabricator.wikimedia.org/T273673) [19:05:44] (03PS6) 10Ryan Kemper: blazegraph: LVS for WCQS step 1 [puppet] - 10https://gerrit.wikimedia.org/r/713959 (https://phabricator.wikimedia.org/T280001) (owner: 10Ebernhardson) [19:05:51] (03CR) 10Ryan Kemper: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/713959 (https://phabricator.wikimedia.org/T280001) (owner: 10Ebernhardson) 
[19:06:02] 10SRE, 10serviceops, 10Patch-For-Review: Create a mediawiki::cronjob define - https://phabricator.wikimedia.org/T211250 (10Dzahn) [19:08:15] (03PS5) 10Jelto: modules::gitlab add missing fields from ansible gitlab.rb template [puppet] - 10https://gerrit.wikimedia.org/r/722370 (https://phabricator.wikimedia.org/T283076) [19:10:48] (03CR) 10Ssingh: "For reference, test results for CSP: https://observatory.mozilla.org/analyze/check.wikimedia-dns.org" [puppet] - 10https://gerrit.wikimedia.org/r/722926 (https://phabricator.wikimedia.org/T289536) (owner: 10Ssingh) [19:10:53] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [19:10:54] !log dduvall@deploy1002 Synchronized php-1.38.0-wmf.1/extensions/CentralAuth: Backport: [[gerrit:722896|Avoid $wgUser deprecation warnings (T291515)]] (duration: 01m 06s) [19:10:56] (03CR) 10Ladsgroup: [C: 03+1] "Thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/722946 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [19:11:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:11:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:11:08] T291515: PHP Deprecated: $wgUser reassignment detected [Called from MediaWiki\Extension\CentralAuth\Special\SpecialCentralLogin::execute] - https://phabricator.wikimedia.org/T291515 [19:12:51] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[19:12:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:37] (03PS1) 10Dduvall: group1 wikis to 1.38.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722949 [19:14:39] (03CR) 10Dduvall: [C: 03+2] group1 wikis to 1.38.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722949 (owner: 10Dduvall) [19:15:04] (03PS1) 10Dzahn: etcd::backup: convert backup cron to timer job [puppet] - 10https://gerrit.wikimedia.org/r/722950 (https://phabricator.wikimedia.org/T273673) [19:15:46] (03Merged) 10jenkins-bot: group1 wikis to 1.38.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722949 (owner: 10Dduvall) [19:18:35] (03CR) 10Ladsgroup: etcd::backup: convert backup cron to timer job (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/722950 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [19:18:50] !log dduvall@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.1 [19:18:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:20:02] !log dduvall@deploy1002 Synchronized php: group1 wikis to 1.38.0-wmf.1 (duration: 01m 11s) [19:20:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:20:20] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [19:20:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:22:05] (03CR) 10Jelto: "thanks for the detailed explanation. I implemented the additional scruct in patch set 5. Could you take a look again if it's matching with" [puppet] - 10https://gerrit.wikimedia.org/r/722370 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [19:22:19] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[19:22:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:22:36] (03PS6) 10Jelto: modules::gitlab add missing fields from ansible gitlab.rb template [puppet] - 10https://gerrit.wikimedia.org/r/722370 (https://phabricator.wikimedia.org/T283076) [19:29:57] !log 1.38.0-wmf.1 promoted to group1. no new errors or rising error rates (T281165) [19:30:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:30:03] T281165: 1.38.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T281165 [19:30:09] (03CR) 10Ppchelko: Update eventgate helmfile.d for eventgate 0.5 chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/722935 (https://phabricator.wikimedia.org/T291504) (owner: 10Ppchelko) [19:33:50] (03PS2) 10Ppchelko: Update eventgate helmfile.d for eventgate 0.5 chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/722935 (https://phabricator.wikimedia.org/T291504) [19:38:07] (03CR) 10RLazarus: [C: 03+1] profile::mediawiki: remove comment references to cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/722946 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [19:42:58] (03CR) 10Dzahn: [C: 03+2] profile::mediawiki: remove comment references to cron jobs [puppet] - 10https://gerrit.wikimedia.org/r/722946 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [19:58:55] PROBLEM - SSH on sretest1002 is CRITICAL: connect to address 10.64.48.139 and port 22: Connection refused https://wikitech.wikimedia.org/wiki/SSH/monitoring [20:00:05] dduvall and hashar: #bothumor My software never has bugs. It just develops random features. Rise for MediaWiki train - American+European Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T1900). [20:00:05] chrisalbon and accraze: How many deployers does it take to do Services – Graphoid / ORES deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T2000). 
[20:00:19] sretest is stuck in d-i for an issue that will be fixed tomorrow, I'll take care of the alerts [20:05:15] (03CR) 10RLazarus: [C: 03+1] "LGTM, thanks!" [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/722904 (owner: 10Herron) [20:05:32] (03CR) 10RLazarus: [C: 03+1] Revert "Revert "slo_dashboard: switch etcd request slo query to recording rule metrics"" (031 comment) [grafana-grizzly] - 10https://gerrit.wikimedia.org/r/722904 (owner: 10Herron) [20:07:41] (03CR) 10Herron: opensearch: fork elasticsearch module into opensearch module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/721359 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite) [20:08:02] (03PS1) 10Daniel Kinzler: Generate a .env file for use by ratelimiter [deployment-charts] - 10https://gerrit.wikimedia.org/r/722956 [20:09:16] (03CR) 10jerkins-bot: [V: 04-1] Generate a .env file for use by ratelimiter [deployment-charts] - 10https://gerrit.wikimedia.org/r/722956 (owner: 10Daniel Kinzler) [20:15:44] (03PS7) 10Ryan Kemper: blazegraph: LVS for WCQS step 1 [puppet] - 10https://gerrit.wikimedia.org/r/713959 (https://phabricator.wikimedia.org/T280001) (owner: 10Ebernhardson) [20:16:08] (03CR) 10Ryan Kemper: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/713959 (https://phabricator.wikimedia.org/T280001) (owner: 10Ebernhardson) [20:17:37] (03PS11) 10Dave Pifke: webperf: connect to Kafka using TLS [puppet] - 10https://gerrit.wikimedia.org/r/721047 (https://phabricator.wikimedia.org/T290131) [20:18:06] !log `[WCQS]` `wcqs1001` is ssh unreachable (https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=wcqs1001&service=SSH), will try restarting from mgmt console [20:18:07] (03PS1) 10Ebernhardson: query_service: Adjust httpd conf to account for Alias's [puppet] - 10https://gerrit.wikimedia.org/r/722958 [20:18:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:28] !log `[WCQS]` Ran `racadm>>racadm serveraction 
powercycle` on `wcqs1001.mgmt.eqiad.wmnet` [20:20:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:21:02] (03CR) 10Ebernhardson: "Followup to I3159acd446c18c21533d0f54f664573cfc23f207 where it was noted that the patch turned /querybuilder which was a redirect to /quer" [puppet] - 10https://gerrit.wikimedia.org/r/722958 (owner: 10Ebernhardson) [20:25:23] PROBLEM - Elevated latency for icinga checks in codfw on alert1001 is CRITICAL: cluster=alerting instance=alert2001 job=icinga site=codfw https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/rsCfQfuZz/icinga [20:32:36] (03PS2) 10Ryan Kemper: query_service: account for aliases in httpd conf [puppet] - 10https://gerrit.wikimedia.org/r/722958 (https://phabricator.wikimedia.org/T280247) (owner: 10Ebernhardson) [20:32:52] (03CR) 10Ryan Kemper: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/722958 (https://phabricator.wikimedia.org/T280247) (owner: 10Ebernhardson) [20:37:04] (03PS1) 10Kosta Harlan: GrowthExperiments: Place new dewiki accounts in control group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722961 (https://phabricator.wikimedia.org/T288420) [20:37:25] RECOVERY - Elevated latency for icinga checks in codfw on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/rsCfQfuZz/icinga [20:38:40] !log `[WCQS]` `wcqs1001.eqiad.wmnet` is reachable again following the powercycle [20:38:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:39:00] (03CR) 10Ottomata: [C: 03+1] Update eventgate helmfile.d for eventgate 0.5 chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/722935 (https://phabricator.wikimedia.org/T291504) (owner: 10Ppchelko) [20:39:01] RECOVERY - Check systemd state on wcqs1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:39:52] !log [WDQS] Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/722958/ which should (hopefully) resolve an issue where https://query.wikidata.org/querybuilder gives a 404, whereas https://query.wikidata.org/querybuilder/ works (due to the trailing slash avoiding the rewrite regex) [20:39:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:40:43] (03CR) 10Ryan Kemper: [C: 03+1] "Looks great. 
Made some minor changes (commit subject line, adding bug, s/alias's/aliases, etc)" [puppet] - 10https://gerrit.wikimedia.org/r/722958 (https://phabricator.wikimedia.org/T280247) (owner: 10Ebernhardson) [20:40:46] (03CR) 10Ryan Kemper: [C: 03+2] query_service: account for aliases in httpd conf [puppet] - 10https://gerrit.wikimedia.org/r/722958 (https://phabricator.wikimedia.org/T280247) (owner: 10Ebernhardson) [20:48:05] !log [WDQS] After puppet-merging, running puppet on `miscweb*`, and doing a `ryankemper@mwmaint1002:~$ echo 'https://query.wikidata.org/querybuilder' | mwscript purgeList.php`, https://query.wikidata.org/querybuilder is working properly again [20:48:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:48:29] (03PS8) 10Michael DiPietro: create role to deploy staging instance for quarry [puppet] - 10https://gerrit.wikimedia.org/r/721585 (https://phabricator.wikimedia.org/T291204) [20:49:03] (03CR) 10jerkins-bot: [V: 04-1] create role to deploy staging instance for quarry [puppet] - 10https://gerrit.wikimedia.org/r/721585 (https://phabricator.wikimedia.org/T291204) (owner: 10Michael DiPietro) [20:50:11] (03CR) 10Cwhite: schemas - metrics: Add puppet keys to the metrics name space (031 comment) [software/ecs] - 10https://gerrit.wikimedia.org/r/722873 (https://phabricator.wikimedia.org/T222826) (owner: 10Jbond) [20:52:53] (03PS9) 10Michael DiPietro: create role to deploy staging instance for quarry [puppet] - 10https://gerrit.wikimedia.org/r/721585 (https://phabricator.wikimedia.org/T291204) [21:00:18] (03PS1) 10Cwhite: move version handling and template rendering to build step [software/ecs] - 10https://gerrit.wikimedia.org/r/722965 [21:00:20] (03PS1) 10Cwhite: add dynamic_templates template rendering [software/ecs] - 10https://gerrit.wikimedia.org/r/722966 [21:02:49] (03CR) 10Michael DiPietro: create role to deploy staging instance for quarry (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/721585 
(https://phabricator.wikimedia.org/T291204) (owner: 10Michael DiPietro) [21:13:30] (03CR) 10Cwhite: opensearch: fork elasticsearch module into opensearch module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/721359 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite) [21:18:12] (03PS1) 10Sharvaniharan: Stream config changes for android_daily_stats schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722969 [21:19:49] (03CR) 10jerkins-bot: [V: 04-1] Stream config changes for android_daily_stats schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722969 (owner: 10Sharvaniharan) [21:20:12] (03CR) 10Sharvaniharan: "Please review the stream config changes for https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/722964" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722969 (owner: 10Sharvaniharan) [21:29:04] (03PS1) 10Sharvaniharan: Stream config changes for android_daily_stats schema Bug: T286000 Change-Id: Icbc8465a97fe9713b8321314d407573f0967488f [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722970 (https://phabricator.wikimedia.org/T286000) [21:29:44] (03CR) 10Cwhite: search-platform: add flink alerts (031 comment) [alerts] - 10https://gerrit.wikimedia.org/r/720066 (https://phabricator.wikimedia.org/T276467) (owner: 10DCausse) [21:31:05] (03PS1) 10Legoktm: services_proxy: Fix timeout for new Shellbox envoy proxies [puppet] - 10https://gerrit.wikimedia.org/r/722971 [21:31:35] (03PS1) 10Legoktm: mwdebug: Add new Shellbox envoyproxies [deployment-charts] - 10https://gerrit.wikimedia.org/r/722972 [21:34:42] (03PS1) 10Jdlrobson: Hiding fallback button depends on HTML order [core] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/722988 (https://phabricator.wikimedia.org/T291272) [21:54:26] (03CR) 10Legoktm: [C: 03+2] services_proxy: Fix timeout for new Shellbox envoy proxies [puppet] - 10https://gerrit.wikimedia.org/r/722971 (owner: 10Legoktm) [21:57:18] 10SRE, 10ops-eqiad, 10DC-Ops, 10Elasticsearch, 
10Discovery-Search (Current work): Q4:(Need By: TBD) rack/setup/install elastic10[68-83].eqiad.wmnet - https://phabricator.wikimedia.org/T281989 (10Jclark-ctr) [21:57:56] 10SRE, 10ops-eqiad, 10DC-Ops, 10Elasticsearch, 10Discovery-Search (Current work): Q4:(Need By: TBD) rack/setup/install elastic10[68-83].eqiad.wmnet - https://phabricator.wikimedia.org/T281989 (10Jclark-ctr) a:05Jclark-ctr→03Cmjohnson [22:12:04] 10SRE, 10ops-eqiad, 10Analytics-Clusters, 10DC-Ops, 10Data-Engineering: Q1:(Need By: ASAP) rack/setup/install an-db100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T289632 (10Jclark-ctr) an-db1001 A6 U26 cableid1951 port 28 an-db1002 C5 U21 cableid1842 port 13 [22:12:13] 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations: Q1:(Need By: TBD) rack/setup/install puppetmaster100[45].eqiad.wmnet - https://phabricator.wikimedia.org/T289732 (10Jclark-ctr) puppetmaster1004 A6 U27 cableid1950 port27 puppetmaster1005 C5 U22 cableid3409 port 23 [22:12:16] 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations: Q1:(Need By: TBD) rack/setup/install puppetmaster100[45].eqiad.wmnet - https://phabricator.wikimedia.org/T289732 (10Jclark-ctr) [22:12:27] 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations: Q1:(Need By: TBD) rack/setup/install puppetmaster100[45].eqiad.wmnet - https://phabricator.wikimedia.org/T289732 (10Jclark-ctr) a:05Jclark-ctr→03Cmjohnson [22:12:39] 10SRE, 10ops-eqiad, 10Analytics-Clusters, 10DC-Ops, 10Data-Engineering: Q1:(Need By: ASAP) rack/setup/install an-db100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T289632 (10Jclark-ctr) [22:12:58] 10SRE, 10ops-eqiad, 10Analytics-Clusters, 10DC-Ops, 10Data-Engineering: Q1:(Need By: ASAP) rack/setup/install an-db100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T289632 (10Jclark-ctr) a:05Jclark-ctr→03Cmjohnson [22:14:23] (03CR) 10Sharvaniharan: "Please review the stream config changes for https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/722964" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/722970 (https://phabricator.wikimedia.org/T286000) (owner: 10Sharvaniharan) [22:17:35] (03Abandoned) 10Sharvaniharan: Stream config changes for android_daily_stats schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/722969 (owner: 10Sharvaniharan) [22:17:39] PROBLEM - Elevated latency for icinga checks in codfw on alert1001 is CRITICAL: cluster=alerting instance=alert2001 job=icinga site=codfw https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/rsCfQfuZz/icinga [22:29:27] (03PS2) 10Krinkle: Remove $wmgLogstashServers (step 1/2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/720385 [22:29:46] (03PS2) 10Krinkle: Remove $wmgLogstashServers (step 2/2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/720386 [22:30:37] (03CR) 10Krinkle: Remove $wmgLogstashServers (step 1/2) (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/720385 (owner: 10Krinkle) [22:35:37] RECOVERY - Elevated latency for icinga checks in codfw on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/rsCfQfuZz/icinga [22:39:05] (03CR) 10Legoktm: [C: 03+2] mwdebug: Add new Shellbox envoyproxies [deployment-charts] - 10https://gerrit.wikimedia.org/r/722972 (owner: 10Legoktm) [22:43:13] (03Merged) 10jenkins-bot: mwdebug: Add new Shellbox envoyproxies [deployment-charts] - 10https://gerrit.wikimedia.org/r/722972 (owner: 10Legoktm) [22:46:22] (03CR) 10Dzahn: etcd::backup: convert backup cron to timer job (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/722950 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [22:46:36] (03PS2) 10Dzahn: etcd::backup: convert backup cron to timer job [puppet] - 10https://gerrit.wikimedia.org/r/722950 (https://phabricator.wikimedia.org/T273673) [22:49:08] (03CR) 10Dzahn: [C: 03+2] mx: turn mail-exim-aliases into proper shell script with config file [puppet] - 10https://gerrit.wikimedia.org/r/722942 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [22:51:16] !log mx2001 - re-enabled puppet [22:51:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:55:44] (03CR) 10Cwhite: [C: 03+1] "Change looks good to me and seems sensible given the recent(ish) changes to the logging pipeline." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/720385 (owner: 10Krinkle) [23:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for Evening backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210922T2300). [23:00:04] Seddon: A patch you scheduled for Evening backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[23:02:36] (03PS1) 10Dzahn: mail::mx: move mail-alias-file config to class parameters [puppet] - 10https://gerrit.wikimedia.org/r/722981 (https://phabricator.wikimedia.org/T273673) [23:05:27] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: hdfs-cleaner-tmp.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [23:08:09] (03PS2) 10Dzahn: mail::mx: move mail-alias-file config to class parameters and hiera [puppet] - 10https://gerrit.wikimedia.org/r/722981 (https://phabricator.wikimedia.org/T273673) [23:12:29] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/31221/mx2001.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/722981 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [23:18:35] @urbanecm are you about? [23:25:23] \o/ [23:25:46] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaSearch/+/722974 [23:25:48] uh, hi [23:25:54] Hi [23:26:15] i guess you're backporting that? :o [23:27:41] OK so can someone catch me up on what's going on? 
[23:28:01] Based on Slack threads it seems that there's a double-parsing bug in MediaSearch that is somehow exposed by DiscussionTools [23:28:15] And there seems to be discussion as to whether we should revert the DT change or backport the MS change that Seddon linked [23:28:26] This just got merged: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaSearch/+/722974/ [23:29:02] so we should verify that things are working properly on beta commons in the next 10 min or so (may take a moment to show up there) [23:29:26] if so, we should be safe to backport that patch (and don't have to revert the DT one) [23:31:19] on the longer term, I have some questions about the correct way to handle messages here, and it also seems like some of the DT behavior is not really appropriate; we were parsing MediaSearch messages for an extra layer of security based on a recommendation from the security team, even though most did not contain HTML. Now that is apparently causing DT to insert comment tags which garbles them. Doesn't seem like the way it should work, but I'm not very familiar with the feature [23:31:53] (03PS1) 10Dzahn: mail::mx: add missing = in mail-exim-aliases-config.erb [puppet] - 10https://gerrit.wikimedia.org/r/722985 [23:32:53] For correct behavior, you have to either use ->text() and then treat it as text (e.g. using {{ foo }} ), or use ->parse() and then treat it as HTML (e.g. using {{{ foo }}} or v-html="foo") [23:33:05] (03CR) 10Dzahn: [C: 03+2] mail::mx: add missing = in mail-exim-aliases-config.erb [puppet] - 10https://gerrit.wikimedia.org/r/722985 (owner: 10Dzahn) [23:33:19] If you use ->text() but treat it as HTML you get security problems; if you use ->parse() but treat it as text you get double-escaping [23:33:41] I also suggested eliminating Dwimmerlaik which might fix this as a side-effect: https://phabricator.wikimedia.org/T291590#7373188 [23:33:45] Where is this recommendation from the security team? 
[23:35:15] RoanKattouw: https://phabricator.wikimedia.org/T266513#6966784 but it's wrong [23:35:49] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaSearch/+/677030/ [23:36:02] (from April of this year) [23:36:10] or maybe, i should say, uninsightful [23:36:26] I guess "So if these are being sent to the browser, as they appear to be" is doing a lot of work there [23:36:50] yeah, if you escape everything, you won't get XSS vulnerabilities. but also, you'll display content incorrectly… [23:37:59] Since most of these messages were just plain strings I think there was no noticeable difference in output; the idea was if a nefarious change made it through the translatewiki process to try and introduce live HTML or something [23:37:59] Anyway -- it seems we have 3 options here [23:38:10] (03PS1) 10BryanDavis: toolhub: Add no_proxy envvar [deployment-charts] - 10https://gerrit.wikimedia.org/r/723006 (https://phabricator.wikimedia.org/T291447) [23:38:12] (03PS1) 10BryanDavis: toolhub: text-lb egress + no_proxy [deployment-charts] - 10https://gerrit.wikimedia.org/r/723007 (https://phabricator.wikimedia.org/T291447) [23:38:29] 1. Deploy Eric's change now [23:38:38] 2. Revert the DT change (which one?) now [23:39:03] 3. 
Switch RL from Dwimmerlaik to Special:Badtitle now [23:39:41] (03CR) 10jerkins-bot: [V: 04-1] toolhub: text-lb egress + no_proxy [deployment-charts] - 10https://gerrit.wikimedia.org/r/723007 (https://phabricator.wikimedia.org/T291447) (owner: 10BryanDavis) [23:39:56] For short term, let's just do 1 - the patch has been +2ed already and should be visible on Beta soon (where the underlying issue is also present) [23:40:10] #3 seems risky to me to do as an urgent deployment, I'd rather it go through the normal process [23:40:17] so we should be able to confirm that this patch fixes the problem on beta before deploying it to prod via backport [23:40:55] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 129 probes of 624 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [23:45:59] If you're going to output messages as raw HTML, please also add them to "RawHtmlMessages" in extension.json [23:47:00] I think #1 involves doing the opposite of that [23:47:15] These look like messages that were supposed to be plain text but were being parsed out of paranoia [23:47:32] ack, and I agree with your opinion on #3 [23:47:44] What seems somewhat risky to me about #1 is that we have to verify that these messages really are being used as text [23:47:45] (03CR) 10Albertoleoncio: Temporarily disable article editing by anonymous users on fawiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/721108 (https://phabricator.wikimedia.org/T291018) (owner: 10Huji) [23:47:45] Correct; we have a single real HTML message that wasn't impacted by this issue [23:47:58] I don't want to accidentally introduce an XSS in a backport window [23:48:17] Override the message locally to have some HTML in it, refresh and make sure you don't get pwnd? 
[23:48:55] that's also why I mentioned adding it to "RawHtmlMessages", which ups the protection level, so even if it is accidentally an XSS, normal admins can't edit it [23:49:05] Well sure, but the patch I'm being asked to backport touches like 20 messages [23:49:32] They all look like they're probably fine, and I'll do some verification [23:49:49] If there are too many concerns about this, the only alternative is to revert the DT change that triggered this problem [23:52:14] (03PS1) 10Dzahn: mail::mx: add host name and fix quoting in mail-exim-aliases [puppet] - 10https://gerrit.wikimedia.org/r/723010 [23:53:58] (03CR) 10Dzahn: [C: 03+2] mail::mx: add host name and fix quoting in mail-exim-aliases [puppet] - 10https://gerrit.wikimedia.org/r/723010 (owner: 10Dzahn) [23:54:27] Here is a search page on Beta commons that exposes the mangled messages bug: https://commons.wikimedia.beta.wmflabs.org/w/index.php?search=cat&title=Special:MediaSearch&type=image [23:54:51] According to Special:Version, the merged patch is not yet live on beta [23:55:03] but that should update in ~10 minutes or so if I understand correctly [23:56:17] (03CR) 10Dzahn: "/usr/local/bin/mail-exim-aliases works fine now in both MXs" [puppet] - 10https://gerrit.wikimedia.org/r/723010 (owner: 10Dzahn) [23:56:54] OK it was actually pretty easy to review all these for safety, they only go to a few places [23:57:08] Either to Vue or to one Mustache template, and both have very limited use of v-html / triple braces [23:57:11] So #1 looks good to me [23:58:03] I will put it on a test server so we don't have to wait for beta [23:58:16] sounds good [23:58:59] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 59 probes of 624 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [23:59:02] MatmaRex: Which branches do we need this backported to? 
Just wmf.1? [23:59:24] yes