[00:00:20] SRE, Anti-Harassment, IP Info, serviceops, Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (Dzahn) further update: identified the relevant files in the private repo that hold the userId and license key for...
[00:05:19] (CR) Legoktm: [C: +2] Hiding fallback button depends on HTML order [core] (wmf/1.38.0-wmf.1) - https://gerrit.wikimedia.org/r/722988 (https://phabricator.wikimedia.org/T291272) (owner: Jdlrobson)
[00:05:52] (CR) Dzahn: "Looks like it should work to actually use the netmask and get this down to a single line: 'toolhub'@'10.64.64.0/255.255.248.0'' https://" [puppet] - https://gerrit.wikimedia.org/r/723329 (https://phabricator.wikimedia.org/T271480) (owner: BryanDavis)
[00:07:16] (CR) Dzahn: "_if_ we actually need to allow all possible pod IPs which I'm not sure about" [puppet] - https://gerrit.wikimedia.org/r/723329 (https://phabricator.wikimedia.org/T271480) (owner: BryanDavis)
[00:09:17] (CR) Dzahn: production-m5.sql.erb: Update toolhub grants (1 comment) [puppet] - https://gerrit.wikimedia.org/r/723329 (https://phabricator.wikimedia.org/T271480) (owner: BryanDavis)
[00:23:11] SRE, Anti-Harassment, IP Info, serviceops, Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (Dzahn) @phuedx Hi, I would like to compare the UserId, LicenseKey and ProductIds between what I see in production...
[00:23:51] (Merged) jenkins-bot: Hiding fallback button depends on HTML order [core] (wmf/1.38.0-wmf.1) - https://gerrit.wikimedia.org/r/722988 (https://phabricator.wikimedia.org/T291272) (owner: Jdlrobson)
[00:30:55] SRE, Anti-Harassment, IP Info, serviceops, Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (Dzahn) a:wkandek→Dzahn
[00:34:31] SRE, Anti-Harassment, IP Info, serviceops, Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (Dzahn) Open→In progress
[00:35:06] SRE, Anti-Harassment, IP Info, serviceops, Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (Dzahn) p:Medium→High
[00:36:34] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[00:36:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:05] !log legoktm@deploy1002 Synchronized php-1.38.0-wmf.1/resources/src/mediawiki.searchSuggest/searchSuggest.js: Hiding fallback button depends on HTML order (T291272) (duration: 00m 57s)
[00:39:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:10] T291272: "Search" button disappears when clicking search box in Monobook - https://phabricator.wikimedia.org/T291272
[00:39:57] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[00:40:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:46:45] SRE, Anti-Harassment, IP Info, serviceops, Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (Dzahn) There is a mechanism that first downloads the database files centrally to the puppetmaster so that appser...
[00:58:57] (CR) Krinkle: [C: +1] "This LGTM, although it still doesn't explain why setting null direclty on these vars (as done in PS2 in InitialiseSettings-labs) is a prob" [mediawiki-config] - https://gerrit.wikimedia.org/r/719619 (owner: Jdlrobson)
[01:04:57] (PS1) Dzahn: puppetmaster::geoip: temp. install a new MaxMind license in parallel [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844)
[01:05:37] (CR) jerkins-bot: [V: -1] puppetmaster::geoip: temp. install a new MaxMind license in parallel [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844) (owner: Dzahn)
[01:06:16] (CR) Dzahn: [C: -2] "WIP" [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844) (owner: Dzahn)
[01:08:18] (PS1) Krinkle: profiler: Use the ProfilerXhprof 'running' option when profiling [mediawiki-config] - https://gerrit.wikimedia.org/r/723338 (https://phabricator.wikimedia.org/T247332)
[01:08:58] (CR) Dzahn: [C: -2] "arr yea, first this would have to become a defined type instead of a class" [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844) (owner: Dzahn)
[01:09:07] legoktm: done right?
[01:09:14] yep
[01:09:21] (CR) Krinkle: [C: +2] profiler: Use the ProfilerXhprof 'running' option when profiling [mediawiki-config] - https://gerrit.wikimedia.org/r/723338 (https://phabricator.wikimedia.org/T247332) (owner: Krinkle)
[01:10:07] (Merged) jenkins-bot: profiler: Use the ProfilerXhprof 'running' option when profiling [mediawiki-config] - https://gerrit.wikimedia.org/r/723338 (https://phabricator.wikimedia.org/T247332) (owner: Krinkle)
[01:10:58] (CR) Dzahn: [C: -2] "will do another approach then and add a switch to flip just a single canary server to a new license key" [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844) (owner: Dzahn)
[01:16:38] !log krinkle@deploy1002 Synchronized wmf-config/profiler.php: I25f4b70b9d4b (duration: 00m 57s)
[01:16:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:23:48] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[01:23:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:13] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[01:27:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:04:38] PROBLEM - Check systemd state on cumin2002 is CRITICAL: CRITICAL - degraded: The following units failed: generate_os_reports.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:29:46] SRE, Analytics, Event-Platform, Wikimedia-Logstash, and 2 others: Integrate Event Platform and ECS logs - https://phabricator.wikimedia.org/T291645 (Marostegui) p:Triage→Medium
[04:39:34] (PS1) Marostegui: data.yaml: Add NaRay [puppet] - https://gerrit.wikimedia.org/r/723370 (https://phabricator.wikimedia.org/T291651)
[04:39:36] SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to ldap/wmf for NaRay - https://phabricator.wikimedia.org/T291651 (Marostegui) p:Triage→Medium
[04:43:08] (CR) Marostegui: production-m5.sql.erb: Update toolhub grants (1 comment) [puppet] - https://gerrit.wikimedia.org/r/723329 (https://phabricator.wikimedia.org/T271480) (owner: BryanDavis)
[04:43:31] (CR) Marostegui: [C: +2] production-m5.sql.erb: Update toolhub grants [puppet] - https://gerrit.wikimedia.org/r/723329 (https://phabricator.wikimedia.org/T271480) (owner: BryanDavis)
[04:48:56] PROBLEM - SSH on analytics1069.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:07:40] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1177 T291584', diff saved to https://phabricator.wikimedia.org/P17319 and previous config saved to /var/cache/conftool/dbconfig/20210924-050739-marostegui.json
[05:07:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:07:47] T291584: Schema change for adding change_object_id index on wb_changes - https://phabricator.wikimedia.org/T291584
[05:10:51] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17320 and previous config saved to /var/cache/conftool/dbconfig/20210924-051050-root.json
[05:10:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:43] (CR) Marostegui: remove s10 references (1 comment) [software/conftool] - https://gerrit.wikimedia.org/r/708632 (https://phabricator.wikimedia.org/T167973) (owner: RhinosF1)
[05:25:54] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17321 and previous config saved to /var/cache/conftool/dbconfig/20210924-052554-root.json
[05:26:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:26:03] T291584: Schema change for adding change_object_id index on wb_changes - https://phabricator.wikimedia.org/T291584
[05:40:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17322 and previous config saved to /var/cache/conftool/dbconfig/20210924-054057-root.json
[05:41:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:41:04] T291584: Schema change for adding change_object_id index on wb_changes - https://phabricator.wikimedia.org/T291584
[05:50:00] RECOVERY - SSH on analytics1069.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:56:02] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17323 and previous config saved to /var/cache/conftool/dbconfig/20210924-055601-root.json
[05:56:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:56:08] T291584: Schema change for adding change_object_id index on wb_changes - https://phabricator.wikimedia.org/T291584
[06:11:05] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17324 and previous config saved to /var/cache/conftool/dbconfig/20210924-061105-root.json
[06:11:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:11:12] T291584: Schema change for adding change_object_id index on wb_changes - https://phabricator.wikimedia.org/T291584
[06:21:54] SRE, SRE-tools, Infrastructure-Foundations, serviceops-radar: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071 (akosiaris) >>! In T270071#7371192, @Volans wrote: > Has been a while since we discussed this but the problem still stands and I think we need to get som...
[06:26:51] !log restart archiva on archiva1002 to pick up new openjdk upgrades
[06:26:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:48] !log elukey@cumin1001 START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - elukey@cumin1001
[06:30:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:33:35] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/723370 (https://phabricator.wikimedia.org/T291651) (owner: Marostegui)
[06:34:13] (CR) Marostegui: [C: +2] data.yaml: Add NaRay [puppet] - https://gerrit.wikimedia.org/r/723370 (https://phabricator.wikimedia.org/T291651) (owner: Marostegui)
[06:37:15] SRE, LDAP-Access-Requests, Patch-For-Review: Grant Access to ldap/wmf for NaRay - https://phabricator.wikimedia.org/T291651 (Marostegui) Open→Resolved a:Marostegui This is done
[06:41:14] !log elukey@cumin1001 END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - elukey@cumin1001
[06:41:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:14] !log elukey@cumin1001 START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
[06:44:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:53:23] !log elukey@cumin1001 END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
[06:53:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:55:11] !log elukey@cumin1001 START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
[06:55:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:06] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210924T0700)
[07:00:50] !log elukey@cumin1001 START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
[07:00:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:01:23] !log elukey@cumin1001 END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
[07:01:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:01:40] !log elukey@cumin1001 START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
[07:01:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:11:41] !log jmm@cumin2002 START - Cookbook sre.hosts.decommission for hosts mx1002.wikimedia.org
[07:11:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:16:25] (PS1) Muehlenhoff: Fix typos [cookbooks] - https://gerrit.wikimedia.org/r/723416
[07:17:19] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
[07:17:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:28:26] (PS1) Muehlenhoff: ganeti: Use --force in shutdown [software/spicerack] - https://gerrit.wikimedia.org/r/723417
[07:28:55] (CR) Muehlenhoff: [C: +2] Fix typos [cookbooks] - https://gerrit.wikimedia.org/r/723416 (owner: Muehlenhoff)
[07:33:35] (CR) jerkins-bot: [V: -1] ganeti: Use --force in shutdown [software/spicerack] - https://gerrit.wikimedia.org/r/723417 (owner: Muehlenhoff)
[07:34:23] !log elukey@cumin1001 START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
[07:34:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:47] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx1002.wikimedia.org
[07:42:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:55] SRE, Infrastructure-Foundations, Mail, Patch-For-Review: Upgrade MXes to Bullseye - https://phabricator.wikimedia.org/T286911 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: `mx1002.wikimedia.org` - mx1002.wikimedia.org (**WARN**) - //Host not found on...
[07:45:12] (PS2) Muehlenhoff: ganeti: Use --force in shutdown [software/spicerack] - https://gerrit.wikimedia.org/r/723417
[07:53:36] (PS1) Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - https://gerrit.wikimedia.org/r/723419
[07:58:32] (PS2) Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - https://gerrit.wikimedia.org/r/723419
[07:59:59] !log jmm@cumin2002 START - Cookbook sre.hosts.decommission for hosts mx2002.wikimedia.org
[08:00:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:07] (PS3) Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - https://gerrit.wikimedia.org/r/723419
[08:06:39] (PS4) Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - https://gerrit.wikimedia.org/r/723419
[08:08:54] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx2002.wikimedia.org
[08:08:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:03] SRE, Infrastructure-Foundations, Mail, Patch-For-Review: Upgrade MXes to Bullseye - https://phabricator.wikimedia.org/T286911 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: `mx2002.wikimedia.org` - mx2002.wikimedia.org (**PASS**) - Downtimed host on I...
[08:13:21] (PS1) Muehlenhoff: Remove mx1002/mx2002 [puppet] - https://gerrit.wikimedia.org/r/723421 (https://phabricator.wikimedia.org/T286911)
[08:16:00] (PS1) Muehlenhoff: acmechief: Remove mx2002 [puppet] - https://gerrit.wikimedia.org/r/723422 (https://phabricator.wikimedia.org/T286911)
[08:18:24] (PS5) Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - https://gerrit.wikimedia.org/r/723419
[08:18:56] (PS1) Muehlenhoff: Revert "discard traffic to mx2002 tcp/25" [homer/public] - https://gerrit.wikimedia.org/r/723423
[08:20:18] !log elukey@cumin1001 END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
[08:20:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:27] SRE-tools, Infrastructure-Foundations: Introduce Spicerack.kafka module, along with the method to transfer offset state between consumer groups and clusters - https://phabricator.wikimedia.org/T291681 (Zbyszko)
[08:21:42] SRE-tools, Infrastructure-Foundations: Introduce Spicerack.kafka module, along with the method to transfer offset state between consumer groups and clusters - https://phabricator.wikimedia.org/T291681 (Zbyszko) Configuration is added in this patch - https://gerrit.wikimedia.org/r/c/operations/puppet/+/72...
[08:22:19] !log upgrade and restart db2098 T290868
[08:22:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:26] T290868: Upgrade s8 to Buster + MariaDB 10.4 - https://phabricator.wikimedia.org/T290868
[08:23:02] SRE-tools, Infrastructure-Foundations: Introduce Spicerack.kafka module, along with the method to transfer offset state between consumer groups and clusters - https://phabricator.wikimedia.org/T291681 (Zbyszko)
[08:23:58] (PS6) Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - https://gerrit.wikimedia.org/r/723419
[08:25:02] (CR) Giuseppe Lavagetto: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31256/console" [puppet] - https://gerrit.wikimedia.org/r/723419 (owner: Giuseppe Lavagetto)
[08:26:04] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/723310 (https://phabricator.wikimedia.org/T273673) (owner: Dzahn)
[08:31:37] SRE-tools, Infrastructure-Foundations: sre.decom.host fails if the mgmt interface's DNS records have already been removed - https://phabricator.wikimedia.org/T268965 (Volans) Open→Resolved The cookbook is now falling back to the asset-tag based management record in case the hostname-based one fai...
[08:33:34] SRE, serviceops, Patch-For-Review, Release-Engineering-Team (Radar): Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (fgiunchedi) Resolved→Open I don't think this is resolved, see T275752 for jobrunner on buster slowness in upload
[08:33:44] SRE, Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (fgiunchedi)
[08:34:35] (PS1) Muehlenhoff: Remove wiki-mail-codfw [dns] - https://gerrit.wikimedia.org/r/723429
[08:35:48] !log elukey@cumin1001 END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
[08:35:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:13] (PS13) Jbond: P:base: move production specific code to there own profile [puppet] - https://gerrit.wikimedia.org/r/714975 (https://phabricator.wikimedia.org/T289661)
[08:42:47] (CR) jerkins-bot: [V: -1] P:base: move production specific code to there own profile [puppet] - https://gerrit.wikimedia.org/r/714975 (https://phabricator.wikimedia.org/T289661) (owner: Jbond)
[08:44:47] (CR) Muehlenhoff: [C: +2] Remove wiki-mail-codfw [dns] - https://gerrit.wikimedia.org/r/723429 (owner: Muehlenhoff)
[08:45:45] Puppet, Infrastructure-Foundations: Temporary failures for prometheus_puppet_agent_stats - https://phabricator.wikimedia.org/T290726 (fgiunchedi) Open→Resolved a:fgiunchedi Yes @jbond I'll resolve this! Let's followup re: removing git_sha altogether from prometheus metrics since it is in logs...
[08:46:06] (PS1) Muehlenhoff: Revert "Remove wiki-mail-codfw" [dns] - https://gerrit.wikimedia.org/r/723431
[08:48:06] (CR) Muehlenhoff: [C: +2] Revert "Remove wiki-mail-codfw" [dns] - https://gerrit.wikimedia.org/r/723431 (owner: Muehlenhoff)
[08:48:45] (PS3) MarcoAurelio: [viwikibooks] Set $wgRestrictDisplayTitle to false [mediawiki-config] - https://gerrit.wikimedia.org/r/721839 (https://phabricator.wikimedia.org/T289837)
[08:49:34] (CR) Muehlenhoff: [C: +2] Remove mx1002/mx2002 [puppet] - https://gerrit.wikimedia.org/r/723421 (https://phabricator.wikimedia.org/T286911) (owner: Muehlenhoff)
[08:58:49] (PS7) Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - https://gerrit.wikimedia.org/r/723419
[08:59:11] (PS1) Muehlenhoff: Remove wiki-mail-codfw [dns] - https://gerrit.wikimedia.org/r/723432
[08:59:22] (CR) jerkins-bot: [V: -1] profile::kuberentes_deployment_server: re-think user management [puppet] - https://gerrit.wikimedia.org/r/723419 (owner: Giuseppe Lavagetto)
[09:03:03] !log btullis@cumin1001 START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
[09:03:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:34] (PS1) Muehlenhoff: Revert "Prefer codfw wiki smarthost over eqiad one for mx1001 reimage" [puppet] - https://gerrit.wikimedia.org/r/723433 (https://phabricator.wikimedia.org/T286911)
[09:07:12] (PS2) Muehlenhoff: Revert "Prefer codfw wiki smarthost over eqiad one for mx1001 reimage" [puppet] - https://gerrit.wikimedia.org/r/723433 (https://phabricator.wikimedia.org/T286911)
[09:09:18] !log upgrade and restart db2139, db2101
[09:09:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:00] (PS8) Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - https://gerrit.wikimedia.org/r/723419
[09:11:17] (CR) Giuseppe Lavagetto: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31259/console" [puppet] - https://gerrit.wikimedia.org/r/723419 (owner: Giuseppe Lavagetto)
[09:13:27] (PS1) Muehlenhoff: Revert "Prefer mx2001 over mx1001 for internal smarthosts" [puppet] - https://gerrit.wikimedia.org/r/723434 (https://phabricator.wikimedia.org/T286911)
[09:18:48] (PS2) Muehlenhoff: Revert "Prefer mx2001 over mx1001 for internal smarthosts" [puppet] - https://gerrit.wikimedia.org/r/723434 (https://phabricator.wikimedia.org/T286911)
[09:25:03] !log elukey@cumin1001 START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
[09:25:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:14] !log Rename flaggedimages on db1096(ruwiki) and db1098(arwiki) T290340
[09:25:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:19] T290340: Drop the flaggedimages table from Wikimedia production - https://phabricator.wikimedia.org/T290340
[09:29:23] !log btullis@cumin1001 END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
[09:29:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:10] !log btullis@cumin1001 START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
[09:32:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:20] (PS1) Jbond: P:base: move git configuration to standard_packages [puppet] - https://gerrit.wikimedia.org/r/723436
[09:37:36] (PS2) Jbond: P:base: move git configuration to standard_packages [puppet] - https://gerrit.wikimedia.org/r/723436
[09:39:24] !log upgrade and restart db2099
[09:39:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:04] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/723436 (owner: Jbond)
[09:43:45] (CR) Muehlenhoff: [C: +2] Revert "discard traffic to mx2002 tcp/25" [homer/public] - https://gerrit.wikimedia.org/r/723423 (owner: Muehlenhoff)
[09:46:10] SRE, Acme-chief, Patch-For-Review: acme-chief is down: ValueError: OCSP response status is not successful so the property has no value - https://phabricator.wikimedia.org/T282490 (Marostegui) Is this ok to close this?
[09:46:32] SRE, SRE-swift-storage, Wikimedia-Incident: ms-be1062 fell off the network, causing swift timeouts - https://phabricator.wikimedia.org/T281107 (Marostegui) Can this be closed? Or at least lowered its priority?
[09:49:07] SRE, Data-Services, Infrastructure-Foundations, Traffic, and 2 others: wikireplicas last-minute infra work to discuss / resolve - https://phabricator.wikimedia.org/T273248 (Marostegui) p:High→Medium Can this be closed?
[09:50:57] (CR) Hashar: [V: +2 C: +2] Merge branch 'stable-3.3' into wmf/stable-3.3 [software/gerrit/plugins/gitiles] (wmf/stable-3.3) - https://gerrit.wikimedia.org/r/716484 (owner: Hashar)
[09:51:50] (CR) Elukey: "Left two nits but overall it is really good :) Nice DRYed hiera config!" [puppet] - https://gerrit.wikimedia.org/r/723419 (owner: Giuseppe Lavagetto)
[09:52:42] SRE, Icinga, SRE Observability, observability, serviceops: incident 20170323-wikibase did not trigger Icinga paging - https://phabricator.wikimedia.org/T161528 (Marostegui) p:High→Medium What should we do with this task? (I don't think this is high anymore)
[09:54:46] SRE: Connecting to https://api.svc.codfw.wmnet/ does not work - https://phabricator.wikimedia.org/T285517 (Marostegui) What's the status of this?
[09:55:38] SRE, serviceops: Put rdb20[09|10] into service - https://phabricator.wikimedia.org/T281225 (Marostegui) @akosiaris was this completed?
[09:56:19] (PS2) Hashar: Merge tag 'v3.3.6' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - https://gerrit.wikimedia.org/r/716485 (https://phabricator.wikimedia.org/T290236)
[09:57:36] SRE, SRE-tools, Infrastructure-Foundations, observability: HP RAID failed on ms-be1054 didn't open a task - https://phabricator.wikimedia.org/T269563 (Marostegui) Open→Resolved a:jbond I am closing this for now - reopen if it is not fixed
[09:58:42] (CR) jerkins-bot: [V: -1] Merge tag 'v3.3.6' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - https://gerrit.wikimedia.org/r/716485 (https://phabricator.wikimedia.org/T290236) (owner: Hashar)
[10:04:17] (PS14) Jbond: P:base: move production specific code to there own profile [puppet] - https://gerrit.wikimedia.org/r/714975 (https://phabricator.wikimedia.org/T289661)
[10:04:19] (PS1) Jbond: P:debdeploy: make ensureable and set absent on cloud [puppet] - https://gerrit.wikimedia.org/r/723466
[10:04:48] (CR) jerkins-bot: [V: -1] P:base: move production specific code to there own profile [puppet] - https://gerrit.wikimedia.org/r/714975 (https://phabricator.wikimedia.org/T289661) (owner: Jbond)
[10:04:54] (PS2) Jbond: P:debdeploy: make ensureable and set absent on cloud [puppet] - https://gerrit.wikimedia.org/r/723466
[10:04:56] (PS15) Jbond: P:base: move production specific code to there own profile [puppet] - https://gerrit.wikimedia.org/r/714975 (https://phabricator.wikimedia.org/T289661)
[10:05:42] (CR) jerkins-bot: [V: -1] P:base: move production specific code to there own profile [puppet] - https://gerrit.wikimedia.org/r/714975 (https://phabricator.wikimedia.org/T289661) (owner: Jbond)
[10:05:53] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31261/console" [puppet] - https://gerrit.wikimedia.org/r/723466 (owner: Jbond)
[10:07:36] SRE, Gerrit: replacement for gerrit2001 - https://phabricator.wikimedia.org/T243027 (Marostegui) @Dzahn did this ever happen? Or should we maybe ping specific people to make it happen?
[10:09:03] (PS1) Arturo Borrero Gonzalez: openstack: manila: refresh configuration file [puppet] - https://gerrit.wikimedia.org/r/723470 (https://phabricator.wikimedia.org/T291257)
[10:09:19] (PS1) Effie Mouzeli: Add missing PTR records for mwdebug and tegola-vector-tiles [dns] - https://gerrit.wikimedia.org/r/723471
[10:10:30] (PS4) Effie Mouzeli: conftool-data: add tegola-vector-tiles discovery 1 [puppet] - https://gerrit.wikimedia.org/r/704949 (https://phabricator.wikimedia.org/T283159)
[10:11:50] !log btullis@cumin1001 END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
[10:11:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:12:45] (PS3) Jbond: P:debdeploy: make ensureable and set absent on cloud [puppet] - https://gerrit.wikimedia.org/r/723466
[10:14:25] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31262/console" [puppet] - https://gerrit.wikimedia.org/r/723466 (owner: Jbond)
[10:15:10] (CR) Jbond: [V: +1 C: +2] P:debdeploy: make ensureable and set absent on cloud [puppet] - https://gerrit.wikimedia.org/r/723466 (owner: Jbond)
[10:15:57] (PS1) Muehlenhoff: Configure a few domains with equal weights for mx1001/mx2001 [dns] - https://gerrit.wikimedia.org/r/723473 (https://phabricator.wikimedia.org/T286911)
[10:16:16] !log elukey@cumin1001 END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001
[10:16:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:27] (PS16) Jbond: P:base: move production specific code to there own profile [puppet] - https://gerrit.wikimedia.org/r/714975 (https://phabricator.wikimedia.org/T289661)
[10:17:41] (CR) jerkins-bot: [V: -1] P:base: move production specific code to there own profile [puppet] - https://gerrit.wikimedia.org/r/714975 (https://phabricator.wikimedia.org/T289661) (owner: Jbond)
[10:19:02] (PS5) Effie Mouzeli: conftool-data: tegola-vector-tiles LVS 1 [puppet] - https://gerrit.wikimedia.org/r/704949 (https://phabricator.wikimedia.org/T283159)
[10:19:10] (PS17) Jbond: P:base: move production specific code to there own profile [puppet] - https://gerrit.wikimedia.org/r/714975 (https://phabricator.wikimedia.org/T289661)
[10:19:52] (CR) jerkins-bot: [V: -1] P:base: move production specific code to there own profile [puppet] - https://gerrit.wikimedia.org/r/714975 (https://phabricator.wikimedia.org/T289661) (owner: Jbond)
[10:20:05] (PS1) Effie Mouzeli: conftool-data: tegola-vector-tiles LVS 2 [puppet] - https://gerrit.wikimedia.org/r/723476 (https://phabricator.wikimedia.org/T283159)
[10:21:43] (PS1) Jbond: P:debdeploy::client: bass through ensure [puppet] - https://gerrit.wikimedia.org/r/723480
[10:21:45] (PS1) Effie Mouzeli: conftool-data: tegola-vector-tiles LVS 3 [puppet] - https://gerrit.wikimedia.org/r/723481 (https://phabricator.wikimedia.org/T283159)
[10:22:22] (CR) David Caro: [C: +1] P:debdeploy::client: bass through ensure [puppet] - https://gerrit.wikimedia.org/r/723480 (owner: Jbond)
[10:23:45] (CR) Arturo Borrero Gonzalez: [C: +2] openstack: manila: refresh configuration file [puppet] - https://gerrit.wikimedia.org/r/723470 (https://phabricator.wikimedia.org/T291257) (owner: Arturo Borrero Gonzalez)
[10:24:56] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/723480 (owner: Jbond)
[10:25:18] (PS1) Muehlenhoff: Configure remaining domains with equal weights for mx1001/mx2001 [dns] - https://gerrit.wikimedia.org/r/723482 (https://phabricator.wikimedia.org/T286911)
[10:26:20] (CR) Jbond: [C: +2] P:debdeploy::client: bass through ensure [puppet] - https://gerrit.wikimedia.org/r/723480 (owner: Jbond)
[10:26:42] (PS1) Arturo Borrero Gonzalez: openstack: manila: typo in parameter name [puppet] - https://gerrit.wikimedia.org/r/723483
[10:26:55] (PS1) Effie Mouzeli: Discovery record for tegola-vector-tile [dns] - https://gerrit.wikimedia.org/r/723484
[10:27:18] (PS1) Effie Mouzeli: conftool-data: tegola-vector-tiles LVS 4 [puppet] - https://gerrit.wikimedia.org/r/723485 (https://phabricator.wikimedia.org/T283159)
[10:28:17] (PS1) Muehlenhoff: profile::mail::mx: Remove OS checks [puppet] - https://gerrit.wikimedia.org/r/723487 (https://phabricator.wikimedia.org/T286911)
[10:30:08] (CR) jerkins-bot: [V: -1] profile::mail::mx: Remove OS checks [puppet] - https://gerrit.wikimedia.org/r/723487 (https://phabricator.wikimedia.org/T286911) (owner: Muehlenhoff)
[10:30:44] (PS2) Muehlenhoff: profile::mail::mx: Remove OS checks [puppet] - https://gerrit.wikimedia.org/r/723487 (https://phabricator.wikimedia.org/T286911)
[10:31:00] (CR) Arturo Borrero Gonzalez: [C: +2] openstack: manila: typo in parameter name [puppet] - https://gerrit.wikimedia.org/r/723483 (owner: Arturo Borrero Gonzalez)
[10:31:13] (CR) Giuseppe Lavagetto: [C: +1] conftool-data: tegola-vector-tiles LVS 1 [puppet] - https://gerrit.wikimedia.org/r/704949 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[10:31:50] (CR) Giuseppe Lavagetto: [C: +1] conftool-data: tegola-vector-tiles LVS 2 [puppet] - https://gerrit.wikimedia.org/r/723476 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[10:32:00] (CR) Giuseppe Lavagetto: [C: +1] conftool-data: tegola-vector-tiles LVS 3 [puppet] - https://gerrit.wikimedia.org/r/723481 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[10:32:13] (CR) Giuseppe Lavagetto: [C: +1] conftool-data: tegola-vector-tiles LVS 4 [puppet] - https://gerrit.wikimedia.org/r/723485 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[10:33:23] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/723487 (https://phabricator.wikimedia.org/T286911) (owner: Muehlenhoff)
[10:35:19] (CR) Effie Mouzeli: [C: +2] conftool-data: tegola-vector-tiles LVS 1 [puppet] - https://gerrit.wikimedia.org/r/704949 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[10:36:03] (PS1) David Caro: stdlib::ensure: Add package support [puppet] - https://gerrit.wikimedia.org/r/723488
[10:38:18] PROBLEM - Widespread puppet agent failures- no resources reported on alert1001 is CRITICAL: 0.1221 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[10:39:48] (CR) Hashar: "recheck after adding python3-distutils to provide distutils.spawn ( https://gerrit.wikimedia.org/r/c/integration/config/+/723477 )" [software/gerrit] (wmf/stable-3.3) - https://gerrit.wikimedia.org/r/716485 (https://phabricator.wikimedia.org/T290236) (owner: Hashar)
[10:40:18] (PS1) Jbond: debdeploy: fix ensure type [puppet] - https://gerrit.wikimedia.org/r/723489
[10:41:45] (CR) jerkins-bot: [V: -1] debdeploy: fix ensure type [puppet] - https://gerrit.wikimedia.org/r/723489 (owner: Jbond)
[10:42:10] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/723489 (owner: Jbond)
[10:43:04] (PS1) Btullis: Increase the number of service handler threads [puppet] - https://gerrit.wikimedia.org/r/723490 (https://phabricator.wikimedia.org/T275767)
[10:44:19] !log corrupting and fixing
image metadata on testwiki before running script on commons T290462 [10:44:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:44:26] T290462: Certain TIFF files get no thumbnail and misidentified dimensions ("0 × 0 pixels") and number of pages (0 pages) - https://phabricator.wikimedia.org/T290462 [10:44:57] (03PS1) 10Jbond: debdeploy: fix ensure paramerter [puppet] - 10https://gerrit.wikimedia.org/r/723491 [10:45:20] (03CR) 10Jbond: "tried to be too cheecky, this fails pcc https://puppet-compiler.wmflabs.org/compiler1002/31264/" [puppet] - 10https://gerrit.wikimedia.org/r/723489 (owner: 10Jbond) [10:45:44] (03CR) 10Jbond: [V: 03+2 C: 03+2] debdeploy: fix ensure paramerter [puppet] - 10https://gerrit.wikimedia.org/r/723491 (owner: 10Jbond) [10:46:02] (03CR) 10David Caro: [C: 03+1] debdeploy: fix ensure paramerter [puppet] - 10https://gerrit.wikimedia.org/r/723491 (owner: 10Jbond) [10:48:09] (03PS1) 10Arturo Borrero Gonzalez: openstack: manila: separate manila-share service into a different role [puppet] - 10https://gerrit.wikimedia.org/r/723492 (https://phabricator.wikimedia.org/T291257) [10:48:39] (03CR) 10jerkins-bot: [V: 04-1] openstack: manila: separate manila-share service into a different role [puppet] - 10https://gerrit.wikimedia.org/r/723492 (https://phabricator.wikimedia.org/T291257) (owner: 10Arturo Borrero Gonzalez) [10:49:32] (03PS3) 10Muehlenhoff: profile::mail::mx: Remove OS checks [puppet] - 10https://gerrit.wikimedia.org/r/723487 (https://phabricator.wikimedia.org/T286911) [10:49:39] (03PS1) 10Jbond: P:debdeploy: wmflib not stdlib [puppet] - 10https://gerrit.wikimedia.org/r/723493 [10:50:47] 10SRE, 10Infrastructure-Foundations, 10Mail, 10Patch-For-Review: Upgrade MXes to Bullseye - https://phabricator.wikimedia.org/T286911 (10MoritzMuehlenhoff) The two VMs (mx1002/mx2002) which were used to test the Bullseye setup have been taken down. 
[10:51:42] (03CR) 10Jbond: [C: 03+2] P:debdeploy: wmflib not stdlib [puppet] - 10https://gerrit.wikimedia.org/r/723493 (owner: 10Jbond) [10:53:54] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/723487 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [10:59:12] (03PS9) 10Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - 10https://gerrit.wikimedia.org/r/723419 [11:02:48] (03PS2) 10Jbond: debdeploy: fix ensure type [puppet] - 10https://gerrit.wikimedia.org/r/723489 [11:04:21] (03CR) 10jerkins-bot: [V: 04-1] debdeploy: fix ensure type [puppet] - 10https://gerrit.wikimedia.org/r/723489 (owner: 10Jbond) [11:05:22] (03PS3) 10Jbond: debdeploy: set autostarts to include debdeploy client [puppet] - 10https://gerrit.wikimedia.org/r/723489 [11:05:22] PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-heavy-queries_8888: Servers wdqs1007.eqiad.wmnet, wdqs1012.eqiad.wmnet, wdqs1004.eqiad.wmnet are marked down but pooled: wdqs-ssl_443: Servers wdqs1007.eqiad.wmnet, wdqs1012.eqiad.wmnet, wdqs1004.eqiad.wmnet are marked down but pooled: wdqs_80: Servers wdqs1007.eqiad.wmnet, wdqs1012.eqiad.wmnet, wdqs1004.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [11:05:28] (03PS10) 10Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - 10https://gerrit.wikimedia.org/r/723419 [11:05:48] PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-heavy-queries_8888: Servers wdqs1012.eqiad.wmnet, wdqs1013.eqiad.wmnet are marked down but pooled: wdqs-ssl_443: Servers wdqs1012.eqiad.wmnet, wdqs1013.eqiad.wmnet are marked down but pooled: wdqs_80: Servers wdqs1012.eqiad.wmnet, wdqs1013.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [11:06:04] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): 
https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31266/console" [puppet] - 10https://gerrit.wikimedia.org/r/723489 (owner: 10Jbond) [11:06:40] (03CR) 10Giuseppe Lavagetto: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31267/console" [puppet] - 10https://gerrit.wikimedia.org/r/723419 (owner: 10Giuseppe Lavagetto) [11:07:16] (03CR) 10jerkins-bot: [V: 04-1] debdeploy: set autostarts to include debdeploy client [puppet] - 10https://gerrit.wikimedia.org/r/723489 (owner: 10Jbond) [11:07:58] RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [11:08:55] (03CR) 10Effie Mouzeli: [C: 03+2] conftool-data: tegola-vector-tiles LVS 2 [puppet] - 10https://gerrit.wikimedia.org/r/723476 (https://phabricator.wikimedia.org/T283159) (owner: 10Effie Mouzeli) [11:09:11] (03PS2) 10Effie Mouzeli: conftool-data: tegola-vector-tiles LVS 2 [puppet] - 10https://gerrit.wikimedia.org/r/723476 (https://phabricator.wikimedia.org/T283159) [11:09:24] (03PS2) 10Effie Mouzeli: conftool-data: tegola-vector-tiles LVS 3 [puppet] - 10https://gerrit.wikimedia.org/r/723481 (https://phabricator.wikimedia.org/T283159) [11:09:35] (03PS2) 10Effie Mouzeli: conftool-data: tegola-vector-tiles LVS 4 [puppet] - 10https://gerrit.wikimedia.org/r/723485 (https://phabricator.wikimedia.org/T283159) [11:09:36] RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [11:13:52] (03PS4) 10Jbond: debdeploy: set autostarts to include debdeploy client [puppet] - 10https://gerrit.wikimedia.org/r/723489 [11:14:27] (03CR) 10Jbond: "ready for review" [puppet] - 10https://gerrit.wikimedia.org/r/723489 (owner: 10Jbond) [11:15:24] (03CR) 10jerkins-bot: [V: 04-1] debdeploy: set autostarts to include debdeploy client [puppet] - 10https://gerrit.wikimedia.org/r/723489 (owner: 10Jbond) [11:17:22] 
ACKNOWLEDGEMENT - PyBal IPVS diff check on lvs1015 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.2.60:4105]) Effie Mouzeli Setting up tegola https://wikitech.wikimedia.org/wiki/PyBal [11:17:22] ACKNOWLEDGEMENT - PyBal connections to etcd on lvs1015 is CRITICAL: CRITICAL: 72 connections established with conf1004.eqiad.wmnet:4001 (min=73) Effie Mouzeli Setting up tegola https://wikitech.wikimedia.org/wiki/PyBal [11:17:22] ACKNOWLEDGEMENT - PyBal IPVS diff check on lvs1016 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.2.60:4105]) Effie Mouzeli Setting up tegola https://wikitech.wikimedia.org/wiki/PyBal [11:17:22] ACKNOWLEDGEMENT - PyBal IPVS diff check on lvs2009 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.1.60:4105]) Effie Mouzeli Setting up tegola https://wikitech.wikimedia.org/wiki/PyBal [11:17:22] ACKNOWLEDGEMENT - PyBal connections to etcd on lvs2009 is CRITICAL: CRITICAL: 64 connections established with conf2004.codfw.wmnet:4001 (min=65) Effie Mouzeli Setting up tegola https://wikitech.wikimedia.org/wiki/PyBal [11:17:23] ACKNOWLEDGEMENT - PyBal IPVS diff check on lvs2010 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.1.60:4105]) Effie Mouzeli Setting up tegola https://wikitech.wikimedia.org/wiki/PyBal [11:17:57] !log restart pybal in low traffic load balancers [11:18:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:18:54] RECOVERY - Widespread puppet agent failures- no resources reported on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [11:21:53] (03CR) 10David Caro: P:base: move production specific code to there own profile (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/714975 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [11:25:00] (03CR) 10Effie Mouzeli: [C: 03+2] conftool-data: tegola-vector-tiles LVS 3 [puppet] - 10https://gerrit.wikimedia.org/r/723481 (https://phabricator.wikimedia.org/T283159) (owner: 10Effie Mouzeli) [11:25:30] (03PS2) 10Arturo Borrero Gonzalez: openstack: manila: separate manila-share service into a different role [puppet] - 10https://gerrit.wikimedia.org/r/723492 (https://phabricator.wikimedia.org/T291257) [11:26:04] (03CR) 10jerkins-bot: [V: 04-1] openstack: manila: separate manila-share service into a different role [puppet] - 10https://gerrit.wikimedia.org/r/723492 (https://phabricator.wikimedia.org/T291257) (owner: 10Arturo Borrero Gonzalez) [11:29:53] (03CR) 10Effie Mouzeli: [C: 03+2] conftool-data: tegola-vector-tiles LVS 4 [puppet] - 10https://gerrit.wikimedia.org/r/723485 (https://phabricator.wikimedia.org/T283159) (owner: 10Effie Mouzeli) [11:30:16] (03PS3) 10Arturo Borrero Gonzalez: openstack: manila: separate manila-share service into a different role [puppet] - 10https://gerrit.wikimedia.org/r/723492 (https://phabricator.wikimedia.org/T291257) [11:30:42] (03PS1) 10Jbond: C:base::monitoring::host: Add type definisions [puppet] - 10https://gerrit.wikimedia.org/r/723494 [11:31:16] (03CR) 10jerkins-bot: [V: 04-1] C:base::monitoring::host: Add type definisions [puppet] - 10https://gerrit.wikimedia.org/r/723494 (owner: 10Jbond) [11:32:45] !log uploading scap-4.0.0 to buster-wikimedia and stretch-wikimedia [11:32:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:35:30] (03CR) 10Effie Mouzeli: [C: 03+2] Discovery record for tegola-vector-tile [dns] - 10https://gerrit.wikimedia.org/r/723484 
(owner: 10Effie Mouzeli) [11:38:27] (03PS2) 10Jbond: C:base::monitoring::host: Add type definisions [puppet] - 10https://gerrit.wikimedia.org/r/723494 [11:39:50] !log jiji@cumin1001 conftool action : set/pooled=true; selector: dnsdisc=tegola-vector-tiles [11:39:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:55:24] jouncebot: now [11:55:25] For the next 19 hour(s) and 4 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210924T0700) [11:56:14] (03PS3) 10Jbond: C:base::monitoring::host: Add type definitions [puppet] - 10https://gerrit.wikimedia.org/r/723494 [11:56:16] (03PS1) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [11:58:35] !log upgrading scap on canaries - T291095 [11:58:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:58:42] T291095: Deploy Scap version 4.0.0 - https://phabricator.wikimedia.org/T291095 [11:58:46] (03CR) 10jerkins-bot: [V: 04-1] P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [12:00:42] PROBLEM - Host mx2002 is DOWN: PING CRITICAL - Packet loss = 100% [12:01:00] (03PS1) 10BBlack: Switch all edge unified certs to digicert-2020 [puppet] - 10https://gerrit.wikimedia.org/r/723510 [12:02:18] (03CR) 10BBlack: [C: 04-1] "The -1 here is just a preventive in case someone mis-clicks or doesn't read the commit message and related information thoroughly. Feel f" [puppet] - 10https://gerrit.wikimedia.org/r/723510 (owner: 10BBlack) [12:05:15] 10SRE, 10serviceops: Put rdb20[09|10] into service - https://phabricator.wikimedia.org/T281225 (10akosiaris) 05Open→03Resolved a:03akosiaris >>! In T281225#7376168, @Marostegui wrote: > @akosiaris was this completed? Looks like it. Resolving. 
[12:11:16] (03PS1) 10Alexandros Kosiaris: Remove rdb200[3456] cruft [puppet] - 10https://gerrit.wikimedia.org/r/723511 (https://phabricator.wikimedia.org/T273140) [12:13:07] (03PS4) 10Jbond: C:base::monitoring::host: Add type definitions [puppet] - 10https://gerrit.wikimedia.org/r/723494 [12:13:20] (03PS2) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [12:15:06] (03CR) 10Jbond: [C: 03+2] P:base: move git configuration to standard_packages [puppet] - 10https://gerrit.wikimedia.org/r/723436 (owner: 10Jbond) [12:15:48] (03CR) 10jerkins-bot: [V: 04-1] P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [12:17:38] (03CR) 10Jbond: [C: 03+1] "LGTM" [dns] - 10https://gerrit.wikimedia.org/r/723473 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [12:17:50] (03CR) 10Alexandros Kosiaris: [C: 03+2] Remove rdb200[3456] cruft [puppet] - 10https://gerrit.wikimedia.org/r/723511 (https://phabricator.wikimedia.org/T273140) (owner: 10Alexandros Kosiaris) [12:18:24] (03CR) 10Jbond: [C: 03+1] Configure remaining domains with equal weights for mx1001/mx2001 [dns] - 10https://gerrit.wikimedia.org/r/723482 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [12:18:45] (03PS1) 10Muehlenhoff: Extend access for aikochou [puppet] - 10https://gerrit.wikimedia.org/r/723512 [12:20:04] (03PS5) 10Jbond: debdeploy: set autostarts to include debdeploy client [puppet] - 10https://gerrit.wikimedia.org/r/723489 [12:20:39] (03PS5) 10Jbond: C:base::monitoring::host: Add type definitions [puppet] - 10https://gerrit.wikimedia.org/r/723494 [12:20:48] (03PS3) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [12:23:20] (03CR) 10jerkins-bot: [V: 04-1] P:base: make 
notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [12:27:36] (03PS4) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [12:28:27] (03CR) 10Jbond: [C: 03+1] Extend access for aikochou [puppet] - 10https://gerrit.wikimedia.org/r/723512 (owner: 10Muehlenhoff) [12:29:45] (03CR) 10jerkins-bot: [V: 04-1] P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [12:30:05] (03PS5) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [12:30:51] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31273/console" [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [12:32:04] (03CR) 10jerkins-bot: [V: 04-1] P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [12:32:34] (03PS2) 10Hashar: Gerrit v3.3.6 and rebuild plugins [software/gerrit] (deploy/wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/716317 (https://phabricator.wikimedia.org/T290236) [12:32:47] (03CR) 10Hashar: [C: 03+2] Merge tag 'v3.3.6' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/716485 (https://phabricator.wikimedia.org/T290236) (owner: 10Hashar) [12:35:19] (03CR) 10Muehlenhoff: [C: 03+2] Extend access for aikochou [puppet] - 10https://gerrit.wikimedia.org/r/723512 (owner: 10Muehlenhoff) [12:36:33] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/723489 (owner: 10Jbond) [12:38:36] (03PS6) 10Jbond: 
C:base::monitoring::host: Add type definitions [puppet] - 10https://gerrit.wikimedia.org/r/723494 [12:38:38] (03PS6) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [12:38:40] (03PS1) 10Jbond: apereo_cas: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 [12:39:19] 10SRE, 10MW-on-K8s, 10Performance-Team, 10Traffic, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (10akosiaris) >>! In T290536#7371552, @jijiki wrote: >>>! In T290536#7364817, @Joe wrote: >> I have some alternative ideas. Specifically, right now we have a... [12:40:06] (03Merged) 10jenkins-bot: Merge tag 'v3.3.6' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/716485 (https://phabricator.wikimedia.org/T290236) (owner: 10Hashar) [12:41:21] (03CR) 10jerkins-bot: [V: 04-1] P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [12:44:00] (03CR) 10Hashar: "I have rebuild all the plugins and pushed the .jar to Archiva. 
Gerrit boots locally and the plugin list looks correct ;)" [software/gerrit] (deploy/wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/716317 (https://phabricator.wikimedia.org/T290236) (owner: 10Hashar) [12:45:15] (03CR) 10Btullis: [C: 03+2] Enable the kerberos auto-renew service for stat nodes [puppet] - 10https://gerrit.wikimedia.org/r/722352 (https://phabricator.wikimedia.org/T268985) (owner: 10Btullis) [12:51:26] (03CR) 10Alexandros Kosiaris: [C: 03+1] "Merge already 😊" [puppet] - 10https://gerrit.wikimedia.org/r/712123 (https://phabricator.wikimedia.org/T288028) (owner: 10Muehlenhoff) [12:52:50] (03PS1) 10Kosta Harlan: GrowthExperiments: Enable AddLink for next round of wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/723517 (https://phabricator.wikimedia.org/T290011) [12:53:35] (03PS4) 10Alexandros Kosiaris: docker: add security updates to Bullseye base image [puppet] - 10https://gerrit.wikimedia.org/r/720241 (https://phabricator.wikimedia.org/T283165) (owner: 10Hashar) [12:54:03] (03PS5) 10Alexandros Kosiaris: docker: add security updates to Bullseye base image [puppet] - 10https://gerrit.wikimedia.org/r/720241 (owner: 10Hashar) [12:54:58] 10SRE, 10Infrastructure-Foundations, 10Traffic, 10Patch-For-Review: OpenSSL < 1.1.0 compatibility issues with new LE issuance chain - https://phabricator.wikimedia.org/T283165 (10akosiaris) >>! In T283165#7376396, @gerritbot wrote: > Change 720241 had a related patch set uploaded (by Alexandros Kosiaris; a... [12:57:06] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: manila: separate manila-share service into a different role [puppet] - 10https://gerrit.wikimedia.org/r/723492 (https://phabricator.wikimedia.org/T291257) (owner: 10Arturo Borrero Gonzalez) [13:00:30] (03CR) 10Alexandros Kosiaris: [C: 04-2] "stdlib is an upstream project (https://github.com/puppetlabs/puppetlabs-stdlib) and we import it as is without doing modifications to it. 
" [puppet] - 10https://gerrit.wikimedia.org/r/723488 (owner: 10David Caro) [13:00:53] (03PS2) 10Jbond: spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 [13:00:55] (03PS1) 10Jbond: apt::ping: include apt [puppet] - 10https://gerrit.wikimedia.org/r/723519 [13:01:11] (03PS2) 10Jbond: apt::pin: include apt [puppet] - 10https://gerrit.wikimedia.org/r/723519 [13:01:37] (03CR) 10jerkins-bot: [V: 04-1] spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 (owner: 10Jbond) [13:02:17] (03PS3) 10Jbond: spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 [13:02:42] (03CR) 10Alexandros Kosiaris: [C: 03+1] ganeti: Use --force in shutdown [software/spicerack] - 10https://gerrit.wikimedia.org/r/723417 (owner: 10Muehlenhoff) [13:03:04] (03CR) 10jerkins-bot: [V: 04-1] apt::pin: include apt [puppet] - 10https://gerrit.wikimedia.org/r/723519 (owner: 10Jbond) [13:03:08] (03CR) 10jerkins-bot: [V: 04-1] spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 (owner: 10Jbond) [13:03:13] (03CR) 10jerkins-bot: [V: 04-1] apt::pin: include apt [puppet] - 10https://gerrit.wikimedia.org/r/723519 (owner: 10Jbond) [13:03:34] (03Abandoned) 10Jbond: apt::pin: include apt [puppet] - 10https://gerrit.wikimedia.org/r/723519 (owner: 10Jbond) [13:05:01] (03PS4) 10Jbond: spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 [13:05:03] (03PS1) 10Jbond: apt::pin: include apt [puppet] - 10https://gerrit.wikimedia.org/r/723520 [13:06:19] (03CR) 10jerkins-bot: [V: 04-1] spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 (owner: 10Jbond) [13:07:28] (03CR) 10jerkins-bot: [V: 04-1] apt::pin: include apt [puppet] - 10https://gerrit.wikimedia.org/r/723520 (owner: 10Jbond) [13:07:32] (03PS1) 10Arturo Borrero Gonzalez: openstack: 
manila_sharecontroller: require Debian Bullseye [puppet] - 10https://gerrit.wikimedia.org/r/723521 (https://phabricator.wikimedia.org/T291257) [13:08:58] (03PS5) 10Jbond: spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 [13:09:50] (03CR) 10jerkins-bot: [V: 04-1] spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 (owner: 10Jbond) [13:10:45] moritzm: I see you decommissioned mx2002, but it still alerted? :-/ [13:11:44] (03CR) 10Elukey: "LGTM! Little nit -t maybe let's add a reference to hadoop in the first line of the commit msg (like Increase the number of the Hadoop HDFS" [puppet] - 10https://gerrit.wikimedia.org/r/723490 (https://phabricator.wikimedia.org/T275767) (owner: 10Btullis) [13:12:00] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] create role to deploy staging instance for quarry [puppet] - 10https://gerrit.wikimedia.org/r/721585 (https://phabricator.wikimedia.org/T291204) (owner: 10Michael DiPietro) [13:12:15] (03PS11) 10Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - 10https://gerrit.wikimedia.org/r/723419 [13:12:55] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: manila_sharecontroller: require Debian Bullseye [puppet] - 10https://gerrit.wikimedia.org/r/723521 (https://phabricator.wikimedia.org/T291257) (owner: 10Arturo Borrero Gonzalez) [13:15:07] ACKNOWLEDGEMENT - Host mx2002 is DOWN: PING CRITICAL - Packet loss = 100% Marostegui https://phabricator.wikimedia.org/T286911#7376246 [13:15:24] (03PS1) 10Arturo Borrero Gonzalez: openstack: manila_sharecontroller: fix typo in profile name [puppet] - 10https://gerrit.wikimedia.org/r/723522 [13:15:46] (03PS6) 10Jbond: spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 [13:16:30] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: manila_sharecontroller: fix typo in profile name [puppet] - 
10https://gerrit.wikimedia.org/r/723522 (owner: 10Arturo Borrero Gonzalez) [13:16:45] (03CR) 10jerkins-bot: [V: 04-1] spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 (owner: 10Jbond) [13:21:14] (03PS2) 10Jbond: apt::pin: include apt [puppet] - 10https://gerrit.wikimedia.org/r/723520 [13:22:53] (03PS7) 10Jbond: spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 [13:23:07] (03PS8) 10Jbond: spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 [13:23:19] (03PS7) 10Jbond: C:base::monitoring::host: Add type definitions [puppet] - 10https://gerrit.wikimedia.org/r/723494 [13:23:32] (03CR) 10jerkins-bot: [V: 04-1] apt::pin: include apt [puppet] - 10https://gerrit.wikimedia.org/r/723520 (owner: 10Jbond) [13:24:15] !log elukey@cumin1001 START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. 
- elukey@cumin1001 [13:24:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:15] (03CR) 10jerkins-bot: [V: 04-1] spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 (owner: 10Jbond) [13:25:29] (03PS7) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [13:27:30] (03PS9) 10Jbond: spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 [13:27:51] (03PS8) 10Jbond: C:base::monitoring::host: Add type definitions [puppet] - 10https://gerrit.wikimedia.org/r/723494 [13:27:57] (03PS8) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [13:29:59] (03CR) 10jerkins-bot: [V: 04-1] spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 (owner: 10Jbond) [13:31:02] !log start of rebuilding metadata of images in commons to make them use json [13:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:31:18] (03CR) 10jerkins-bot: [V: 04-1] P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [13:33:48] 10SRE, 10SRE-swift-storage, 10Wikimedia-Incident: ms-be1062 fell off the network, causing swift timeouts - https://phabricator.wikimedia.org/T281107 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi I'll resolve, we haven't seen this occurring again. 
[13:36:25] (03Abandoned) 10Jbond: stdlib::ensure: Add package support [puppet] - 10https://gerrit.wikimedia.org/r/723488 (owner: 10David Caro) [13:37:40] (03CR) 10Jbond: "need to update:" [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [13:43:49] (03CR) 10Herron: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1001/968/" [puppet] - 10https://gerrit.wikimedia.org/r/723487 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [13:45:12] (03PS10) 10Jbond: spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 [13:46:11] (03CR) 10Herron: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/723434 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [13:46:13] (03PS1) 10Arturo Borrero Gonzalez: openstack: manila_sharecontroller: fix typo in class name [puppet] - 10https://gerrit.wikimedia.org/r/723531 [13:46:29] (03CR) 10Herron: [C: 03+1] "Looks good!" 
[puppet] - 10https://gerrit.wikimedia.org/r/723433 (https://phabricator.wikimedia.org/T286911) (owner: 10Muehlenhoff) [13:52:22] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: manila_sharecontroller: fix typo in class name [puppet] - 10https://gerrit.wikimedia.org/r/723531 (owner: 10Arturo Borrero Gonzalez) [13:57:46] (03PS1) 10Giuseppe Lavagetto: Fix typo [labs/private] - 10https://gerrit.wikimedia.org/r/723533 [13:58:05] (03PS9) 10Jbond: C:base::monitoring::host: Add type definitions [puppet] - 10https://gerrit.wikimedia.org/r/723494 [13:58:12] (03PS12) 10Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - 10https://gerrit.wikimedia.org/r/723419 [13:58:14] (03PS9) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [14:01:35] (03CR) 10jerkins-bot: [V: 04-1] P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [14:03:07] !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox [14:03:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:03:19] 10SRE, 10Traffic: Let's Encrypt issuance chains update - https://phabricator.wikimedia.org/T283164 (10BBlack) This is also being covered (for the public-facing side of things) in https://wikitech.wikimedia.org/wiki/HTTPS/2021_Let%27s_Encrypt_root_expiry , which Johan has kindly copied out to the upcoming Monda... [14:04:45] 10SRE, 10Traffic: Let's Encrypt issuance chains update - https://phabricator.wikimedia.org/T283164 (10BBlack) [14:05:24] (03CR) 10David Caro: [C: 03+1] "Awesome! 
\o/" [puppet] - 10https://gerrit.wikimedia.org/r/723515 (owner: 10Jbond) [14:07:41] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [14:07:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:33] (03CR) 10Herron: "From an architecture perspective I'm thinking we should seriously consider eliminating the current local ES/OS coordinating node from the " [puppet] - 10https://gerrit.wikimedia.org/r/721395 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite) [14:14:26] (03CR) 10Elukey: [V: 03+2 C: 03+2] Fix typo [labs/private] - 10https://gerrit.wikimedia.org/r/723533 (owner: 10Giuseppe Lavagetto) [14:20:13] (03Abandoned) 10Elukey: role::deployment_server: add revscoring-editquality-deploy k8s user [puppet] - 10https://gerrit.wikimedia.org/r/723077 (https://phabricator.wikimedia.org/T286791) (owner: 10Elukey) [14:23:50] (03PS10) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [14:24:37] (03PS1) 10Ema: admin: set krb attribute to 'present' for ema [puppet] - 10https://gerrit.wikimedia.org/r/723536 [14:28:35] (03CR) 10Elukey: [C: 03+1] admin: set krb attribute to 'present' for ema [puppet] - 10https://gerrit.wikimedia.org/r/723536 (owner: 10Ema) [14:32:43] !log bd808@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . 
[14:32:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:21] (03PS10) 10Jbond: C:base::monitoring::host: Add type definitions [puppet] - 10https://gerrit.wikimedia.org/r/723494 [14:38:14] (03PS13) 10Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - 10https://gerrit.wikimedia.org/r/723419 [14:38:55] (03PS11) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [14:39:08] (03PS12) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [14:40:02] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31278/console" [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [14:43:37] (03PS14) 10Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - 10https://gerrit.wikimedia.org/r/723419 [14:45:48] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31280/console" [puppet] - 10https://gerrit.wikimedia.org/r/723419 (owner: 10Giuseppe Lavagetto) [14:56:26] (03CR) 10Elukey: [V: 03+1] "LGTM! Left two documentation suggestions (hopefully I understood the code correctly)." [puppet] - 10https://gerrit.wikimedia.org/r/723419 (owner: 10Giuseppe Lavagetto) [15:05:41] (03CR) 10David Caro: [C: 03+1] "👍 for types!" 
[puppet] - 10https://gerrit.wikimedia.org/r/723494 (owner: 10Jbond) [15:05:51] (03CR) 10David Caro: [C: 03+1] C:base::monitoring::host: Add type definitions [puppet] - 10https://gerrit.wikimedia.org/r/723494 (owner: 10Jbond) [15:09:18] !log sudo cumin -m async -b2 "c:profile::analytics::cluster::hdfs_mount" "umount /mnt/hdfs" "mount /mnt/hdfs" - T288625 [15:09:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:03] 10SRE, 10ops-eqiad, 10DC-Ops, 10Elasticsearch, 10Discovery-Search (Current work): Q4:(Need By: TBD) rack/setup/install elastic10[68-83].eqiad.wmnet - https://phabricator.wikimedia.org/T281989 (10Cmjohnson) [15:13:15] 10SRE, 10ops-eqiad, 10DC-Ops, 10Elasticsearch, 10Discovery-Search (Current work): Q4:(Need By: TBD) rack/setup/install elastic10[68-83].eqiad.wmnet - https://phabricator.wikimedia.org/T281989 (10Cmjohnson) BIOS/Idrac setup [15:13:31] 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations: Q1:(Need By: TBD) rack/setup/install puppetmaster100[45].eqiad.wmnet - https://phabricator.wikimedia.org/T289732 (10Cmjohnson) [15:13:40] 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations: Q1:(Need By: TBD) rack/setup/install puppetmaster100[45].eqiad.wmnet - https://phabricator.wikimedia.org/T289732 (10Cmjohnson) BIOS/iDrac setup [15:13:43] (03CR) 10Hashar: [C: 03+1] "Ok thanks Brooke! Lets go for it and we can always adjust if we catch something going wrong when provisioning a new instance." 
[puppet] - 10https://gerrit.wikimedia.org/r/722476 (https://phabricator.wikimedia.org/T277078) (owner: 10Krinkle) [15:13:59] 10SRE, 10ops-eqiad, 10Analytics-Clusters, 10DC-Ops, 10Data-Engineering: Q1:(Need By: ASAP) rack/setup/install an-db100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T289632 (10Cmjohnson) [15:14:10] 10SRE, 10ops-eqiad, 10Analytics-Clusters, 10DC-Ops, 10Data-Engineering: Q1:(Need By: ASAP) rack/setup/install an-db100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T289632 (10Cmjohnson) BIOS and iDrac setup [15:16:52] (03PS3) 10Jbond: Disable the "long running screen/tmux session" check by default [puppet] - 10https://gerrit.wikimedia.org/r/712123 (https://phabricator.wikimedia.org/T288028) (owner: 10Muehlenhoff) [15:17:36] !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons. - elukey@cumin1001 [15:17:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:30] (03CR) 10Jbond: [C: 03+2] Disable the "long running screen/tmux session" check by default [puppet] - 10https://gerrit.wikimedia.org/r/712123 (https://phabricator.wikimedia.org/T288028) (owner: 10Muehlenhoff) [15:21:18] (03CR) 10Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/723419 (owner: 10Giuseppe Lavagetto) [15:21:41] (03PS15) 10Giuseppe Lavagetto: profile::kuberentes_deployment_server: re-think user management [puppet] - 10https://gerrit.wikimedia.org/r/723419 [15:22:23] (03CR) 10Elukey: [C: 03+1] profile::kuberentes_deployment_server: re-think user management [puppet] - 10https://gerrit.wikimedia.org/r/723419 (owner: 10Giuseppe Lavagetto) [15:22:43] 10SRE, 10Observability-Alerting, 10Patch-For-Review: Remove the "Long running screen/tmux" Icinga check - https://phabricator.wikimedia.org/T288028 (10Marostegui) 05Open→03Resolved a:03jbond Change has 
been merged by John so closing this. [15:23:26] marostegui: isn't there still cleanup work for eventually [15:23:38] (03PS3) 10ZPapierski: Added spicerack.kafka with offset transfer function [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) [15:23:44] !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons. - elukey@cumin1001 [15:23:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:58] (03CR) 10ZPapierski: Added spicerack.kafka with offset transfer function (035 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [15:25:41] 10SRE, 10Privacy Engineering, 10Research, 10Security-Team, and 2 others: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (10Dzahn) @Vgutierrez This site was setup by Brandon. Could you maybe ask him about that last question? 
[15:29:57] (03CR) 10jerkins-bot: [V: 04-1] Added spicerack.kafka with offset transfer function [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [15:34:03] (03PS1) 10Jbond: monitoring: drop monitor_screens parameter [puppet] - 10https://gerrit.wikimedia.org/r/723543 [15:35:06] (03PS2) 10Jbond: monitoring: drop monitor_screens parameter [puppet] - 10https://gerrit.wikimedia.org/r/723543 (https://phabricator.wikimedia.org/T288028) [15:36:07] 10SRE, 10Observability-Alerting, 10Patch-For-Review: Remove the "Long running screen/tmux" Icinga check - https://phabricator.wikimedia.org/T288028 (10jbond) 05Resolved→03In progress the merged change was just to make it disabled by default there is another coming to remove it [15:36:16] (03PS11) 10Jbond: spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 [15:36:18] (03PS11) 10Jbond: C:base::monitoring::host: Add type definitions [puppet] - 10https://gerrit.wikimedia.org/r/723494 [15:36:20] (03PS13) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [15:36:22] (03PS1) 10Jbond: P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 [15:36:43] Ty jbond [15:37:39] (03CR) 10jerkins-bot: [V: 04-1] P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [15:38:14] (03CR) 10jerkins-bot: [V: 04-1] P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 (owner: 10Jbond) [15:40:03] (03PS12) 10Jbond: spec tests: drop pre_conditions as its not needed [puppet] - 10https://gerrit.wikimedia.org/r/723515 [15:40:05] (03PS12) 10Jbond: C:base::monitoring::host: Add type definitions [puppet] - 
10https://gerrit.wikimedia.org/r/723494 [15:40:07] (03PS14) 10Jbond: P:base: make notifications_enabled a boolean [puppet] - 10https://gerrit.wikimedia.org/r/723509 (https://phabricator.wikimedia.org/T289661) [15:40:09] (03PS2) 10Jbond: P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 [15:42:36] (03PS1) 10Effie Mouzeli: php7.2 images: include php7.2 patch (T291052) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/723545 [15:43:01] (03CR) 10jerkins-bot: [V: 04-1] P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 (owner: 10Jbond) [15:46:30] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [15:46:33] !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons. - elukey@cumin1001 [15:46:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:51:17] (03PS12) 10Dave Pifke: webperf: connect to Kafka using TLS [puppet] - 10https://gerrit.wikimedia.org/r/721047 (https://phabricator.wikimedia.org/T290131) [15:52:39] (03PS13) 10Dave Pifke: webperf: connect to Kafka using TLS [puppet] - 10https://gerrit.wikimedia.org/r/721047 (https://phabricator.wikimedia.org/T290131) [15:52:41] !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons. 
- elukey@cumin1001 [15:52:44] (03CR) 10Effie Mouzeli: [V: 03+2 C: 03+2] php7.2 images: include php7.2 patch (T291052) [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/723545 (owner: 10Effie Mouzeli) [15:52:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:18] !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons. - elukey@cumin1001 [15:53:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:58] (03PS14) 10Dave Pifke: webperf: connect to Kafka using TLS [puppet] - 10https://gerrit.wikimedia.org/r/721047 (https://phabricator.wikimedia.org/T290131) [15:59:26] !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons. - elukey@cumin1001 [15:59:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:39] (03PS3) 10Jbond: P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 [16:01:26] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [16:01:58] (03CR) 10jerkins-bot: [V: 04-1] P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 (owner: 10Jbond) [16:02:05] 10SRE-swift-storage, 10Arc-Lamp, 10Performance-Team, 10Patch-For-Review: Swift container for performance flame graphs (ArcLamp) - https://phabricator.wikimedia.org/T244776 (10Krinkle) p:05Triage→03High [16:04:57] (03PS4) 10Jbond: P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 [16:05:26] (03CR) 10Cwhite: profile: fork 
elasticsearch::logstash into opensearch::logstash (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/721395 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite) [16:05:41] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31283/console" [puppet] - 10https://gerrit.wikimedia.org/r/723544 (owner: 10Jbond) [16:06:04] (03CR) 10jerkins-bot: [V: 04-1] P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 (owner: 10Jbond) [16:07:46] (03PS5) 10Jbond: P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 [16:09:07] (03CR) 10jerkins-bot: [V: 04-1] P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 (owner: 10Jbond) [16:12:02] (03PS6) 10Jbond: P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 [16:13:00] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31286/console" [puppet] - 10https://gerrit.wikimedia.org/r/723544 (owner: 10Jbond) [16:13:17] (03CR) 10jerkins-bot: [V: 04-1] P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 (owner: 10Jbond) [16:15:26] (03PS7) 10Jbond: P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 [16:16:01] 10SRE, 10Gerrit: replacement for gerrit2001 - https://phabricator.wikimedia.org/T243027 (10Dzahn) @Marostegui No, it's still open. I'll follow-up on it soon. 
[16:16:11] 10SRE, 10Gerrit: replacement for gerrit2001 - https://phabricator.wikimedia.org/T243027 (10Dzahn) a:03Dzahn [16:16:17] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31287/console" [puppet] - 10https://gerrit.wikimedia.org/r/723544 (owner: 10Jbond) [16:16:20] PROBLEM - LVS linkrecommendation eqiad port 4005/tcp - Link Recommendation- linkrecommendation.svc.eqiad.wmnet IPv4 on linkrecommendation.svc.eqiad.wmnet is CRITICAL: connect to address 10.2.2.23 and port 4005: Connection refused https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [16:16:39] (03CR) 10jerkins-bot: [V: 04-1] P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 (owner: 10Jbond) [16:18:30] RECOVERY - LVS linkrecommendation eqiad port 4005/tcp - Link Recommendation- linkrecommendation.svc.eqiad.wmnet IPv4 on linkrecommendation.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 193 bytes in 1.056 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [16:19:08] (03PS8) 10Jbond: P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 [16:20:29] (03CR) 10jerkins-bot: [V: 04-1] P:base: move base::monitoring::host to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/723544 (owner: 10Jbond) [16:21:59] (03PS1) 10BBlack: Add wikiworkshop.org to HSTS regex [puppet] - 10https://gerrit.wikimedia.org/r/723590 (https://phabricator.wikimedia.org/T251732) [16:22:32] 10SRE, 10Privacy Engineering, 10Research, 10Security-Team, and 3 others: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (10BBlack) >>! In T251732#7376732, @Dzahn wrote: > @Vgutierrez This site was setup by Brandon. Could you ma... 
[16:26:29] 10SRE, 10Anti-Harassment, 10IP Info, 10serviceops, 10Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (10phuedx) >>! In T288844#7375732, @Dzahn wrote: > Could you share the new license information with me in a secure w... [16:35:46] !log elukey@cumin1001 END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 [16:35:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:40:29] 10SRE, 10Anti-Harassment, 10IP Info, 10serviceops, 10Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (10phuedx) cc'ing @dom_walden and @imaigwilo, the QTEs that work with Anti-Harassment Tools. Are the MaxMind databa... [16:47:02] (03CR) 10Dzahn: [C: 03+1] "thanks:)" [puppet] - 10https://gerrit.wikimedia.org/r/723590 (https://phabricator.wikimedia.org/T251732) (owner: 10BBlack) [16:49:37] (03PS2) 10Ryan Kemper: query_service: Add monitoring::groups for wcqs [puppet] - 10https://gerrit.wikimedia.org/r/723314 (https://phabricator.wikimedia.org/T280001) (owner: 10Ebernhardson) [16:49:46] (03CR) 10Ryan Kemper: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/723314 (https://phabricator.wikimedia.org/T280001) (owner: 10Ebernhardson) [16:52:34] (03CR) 10Ryan Kemper: [C: 03+2] query_service: Add monitoring::groups for wcqs [puppet] - 10https://gerrit.wikimedia.org/r/723314 (https://phabricator.wikimedia.org/T280001) (owner: 10Ebernhardson) [16:55:23] 10SRE, 10Anti-Harassment, 10IP Info, 10serviceops, 10Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (10Dzahn) >>! In T288844#7376870, @phuedx wrote: > If you have a GPG key, then we could exchange public keys and I c... 
[16:58:26] (03PS1) 10PipelineBot: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/723593 [17:01:06] RECOVERY - Check correctness of the icinga configuration on alert1001 is OK: Icinga configuration is correct https://wikitech.wikimedia.org/wiki/Icinga [17:02:10] (03PS1) 10PipelineBot: shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/723595 [17:02:39] !log elukey@cumin1001 START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001 [17:02:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:03:28] 10SRE, 10Anti-Harassment, 10IP Info, 10serviceops, 10Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (10Dzahn) >>! In T288844#7376887, @phuedx wrote: > Are the MaxMind databases deployed to the Beta Cluster or will th... 
[17:04:06] PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-heavy-queries_8888: Servers wdqs1006.eqiad.wmnet are marked down but pooled: wdqs-ssl_443: Servers wdqs1006.eqiad.wmnet are marked down but pooled: wdqs_80: Servers wdqs1006.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [17:05:18] RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [17:06:22] (03PS1) 10PipelineBot: shellbox-timeline: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/723597 [17:13:32] (03PS1) 10PipelineBot: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/723598 [17:17:24] (03PS1) 10PipelineBot: shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/723599 [17:20:58] !log elukey@cumin1001 END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons. 
- elukey@cumin1001 [17:21:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:21:28] (03PS1) 10PipelineBot: shellbox-timeline: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/723600 [17:25:59] (03PS1) 10Legoktm: MovePage: don't create a recent change for a redirect [core] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/723582 (https://phabricator.wikimedia.org/T291677) [17:29:21] PROBLEM - LVS zotero eqiad port 4969/tcp - Zotero- zotero.svc.eqiad.wmnet IPv4 #page on zotero.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [17:29:33] * legoktm looks [17:29:52] here, looking [17:31:19] RECOVERY - LVS zotero eqiad port 4969/tcp - Zotero- zotero.svc.eqiad.wmnet IPv4 #page on zotero.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 196 bytes in 1.067 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [17:32:24] <_joe_> it was a higher cpu spike compared to the ones we seem to have regularly [17:32:29] <_joe_> sadly zotero is a dark box [17:32:38] <_joe_> see https://grafana.wikimedia.org/d/2oPtfvXWk/zotero?viewPanel=28&orgId=1&refresh=1m&from=now-24h&to=now [17:32:54] <_joe_> it doesn't seem like the usual memleak is causing this though [17:33:04] yeah [17:33:35] memory's a little high but not crazy [17:33:46] <_joe_> I think the cure might be raising the number of replicas if this keeps happening [17:33:57] <_joe_> or maybe someone is just trying to cite a url that causes this :/ [17:34:08] sounds plausible - when's the last time we re-provisioned it? 
[17:34:16] <_joe_> I have no idea :) [17:34:38] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid [17:35:06] <_joe_> but I they seem to have all been started 30 hours ago [17:35:20] <_joe_> and I see several restarts for most pods [17:35:43] <_joe_> due to failing liveness probes [17:37:35] <_joe_> rzl: I would try to raise the number of pods allocated by say 10%, and /or a rolling restart of the pods [17:38:16] _joe_: I was interested in whether the CPU usage is uniform between the pods, and it looks like it's not: https://grafana.wikimedia.org/goto/e5npVTN7k [17:38:29] most of them are fine, a few are spiking [17:38:33] so I'm not sure more pods would help [17:38:42] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [17:39:01] <_joe_> at least to keep the service available I'd say :P [17:39:06] <_joe_> but yes, possibly not :/ [17:39:42] <_joe_> you can check the times zotero went into a cpu spike and find the urls that were requested to url-downloader in that timespan from the pod ip [17:39:46] <_joe_> in the squid logs [17:41:22] looking -- I don't know my way around the squid logs super well so I may not get there faster than the spike ends on its own at this point :P but let's see [17:42:58] man I'm not even sure if I can *find* the squid logs faster than the spike ends on its own :/ [17:43:05] more practice for me later [17:43:18] are they not on the url-downloader host itself? [17:43:44] increasing the # of replicas should at least let some more pods keep serving requests while others are tied up with the massive requests, right? 
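Editor's sketch, not part of the original conversation: the suggestion above (find the URLs requested through url-downloader from a given pod IP during the CPU spike) could be approximated with a small filter like the one below. It assumes the squid log uses squid's native access.log layout (epoch timestamp, elapsed ms, client IP, result code, bytes, method, URL, ...); the actual log format and path on the url-downloader host are not confirmed in this log, and the function name `requests_from_pods` and its parameters are hypothetical.

```python
from datetime import datetime, timezone


def requests_from_pods(lines, pod_ips, start, end):
    """Yield (timestamp, client_ip, url) for matching access-log lines.

    Assumption: each line follows squid's native access.log layout, i.e.
    whitespace-separated fields where field 0 is the epoch timestamp,
    field 2 is the client IP and field 6 is the requested URL.
    `start` and `end` are timezone-aware datetimes (inclusive bounds).
    """
    wanted = set(pod_ips)
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            # Too short to be a native-format entry; skip it.
            continue
        try:
            ts = datetime.fromtimestamp(float(fields[0]), tz=timezone.utc)
        except ValueError:
            # First field is not an epoch timestamp; skip it.
            continue
        client, url = fields[2], fields[6]
        if client in wanted and start <= ts <= end:
            yield ts, client, url
```

To use it, one would pass an open log file as `lines`, the pod IPs as reported by `kubectl describe pods`, and the start and end of the spike as UTC datetimes. If the host logs in a custom format, the field positions would need adjusting.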
[17:43:44] zotero 5xxs and cpu both look recovered [17:44:15] <_joe_> yes the spike is recovered [17:44:32] <_joe_> and yes the squid logs should be under /var/log/squid-something on the host :) [17:44:48] <_joe_> they have the "client" ip, and ua iirc [17:45:02] <_joe_> so you should be able to find the calls to specific zotero pods from their IP address [17:45:09] git blame says the # of replicas at least the same since July 2019, when it was committed to git [17:45:11] <_joe_> (it's in kubectl describe pods) [17:45:17] _joe_: ahh okay cool [17:45:29] <_joe_> legoktm: yes normally we have issues with zotero due to memleaks [17:46:06] yeah I think I just haven't happened to look at this file before, I'm going to keep digging around for this even though it looks like the incident is over [17:46:13] <_joe_> and in the past we had such cpu problems when zotero was used to cite certain wiki pages - in which case you won't find nothing in the squid logs [17:46:32] so throwing more "hardware" at this seems reasonable [17:46:42] yeah I'll buy that [17:56:50] so add 2 more replicas? 4? [17:57:34] okay out of the three pods that were spiking at 17:25, none of their IP addresses appears in the squid log as far as I can tell [17:57:50] (although I got to them post-restart, and I'm not sure if they get new IPs when that happens) [17:58:46] legoktm: there's what, 10 now? 
+4 sounds reasonable to me but I don't have a great sense of what it costs resource-wise [17:59:17] like I can look up the allocation, I just don't know if it's "a lot" on our infra :P [18:00:04] or rather, we have 10 in eqiad and 10 in codfw, we're talking about +4 in each [18:00:54] I don't have a good sense of that either, I struggled a bit with that for Shellbox too [18:01:14] (03PS1) 10Legoktm: zotero: Add 4 more replicas (total 14) [deployment-charts] - 10https://gerrit.wikimedia.org/r/723604 [18:01:35] (03PS2) 10Legoktm: zotero: Add 4 more replicas (total 14) [deployment-charts] - 10https://gerrit.wikimedia.org/r/723604 [18:02:27] something to talk about at Monday's svcops meeting but let's run with +4 in the meantime [18:02:34] (03CR) 10RLazarus: [C: 03+1] zotero: Add 4 more replicas (total 14) [deployment-charts] - 10https://gerrit.wikimedia.org/r/723604 (owner: 10Legoktm) [18:02:48] (03PS1) 10Legoktm: Revert "Remove deprecated date.js library" [extensions/PageTriage] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/723583 (https://phabricator.wikimedia.org/T291675) [18:03:01] (03CR) 10Legoktm: [C: 03+2] zotero: Add 4 more replicas (total 14) [deployment-charts] - 10https://gerrit.wikimedia.org/r/723604 (owner: 10Legoktm) [18:05:31] (03PS3) 10Legoktm: zotero: Add 4 more replicas (total 14) [deployment-charts] - 10https://gerrit.wikimedia.org/r/723604 (https://phabricator.wikimedia.org/T291707) [18:05:44] (03CR) 10Legoktm: [C: 03+2] "..." [deployment-charts] - 10https://gerrit.wikimedia.org/r/723604 (https://phabricator.wikimedia.org/T291707) (owner: 10Legoktm) [18:08:39] (03CR) 10Ottomata: [C: 03+1] "+1 from me after you figure out the sorted hash thing. Merge at will!" 
[puppet] - 10https://gerrit.wikimedia.org/r/712974 (https://phabricator.wikimedia.org/T266641) (owner: 10Btullis) [18:09:58] (03Merged) 10jenkins-bot: zotero: Add 4 more replicas (total 14) [deployment-charts] - 10https://gerrit.wikimedia.org/r/723604 (https://phabricator.wikimedia.org/T291707) (owner: 10Legoktm) [18:12:29] rzl, _joe_: I filed https://phabricator.wikimedia.org/T291707 and added some open questions [18:12:39] !log legoktm@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' . [18:12:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:56] !log legoktm@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' . [18:14:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:43] (03PS2) 10Ebernhardson: query_service: Add dummy credentials for query_service oauth [labs/private] - 10https://gerrit.wikimedia.org/r/723307 [18:19:57] (03PS3) 10Ebernhardson: query_service: Add dummy credentials for query_service oauth [labs/private] - 10https://gerrit.wikimedia.org/r/723307 [18:21:25] (03CR) 10Jforrester: "❤️" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/723046 (owner: 10Legoktm) [18:21:40] :))) [18:31:53] (03CR) 10Legoktm: [C: 03+2] MovePage: don't create a recent change for a redirect [core] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/723582 (https://phabricator.wikimedia.org/T291677) (owner: 10Legoktm) [18:31:59] (03CR) 10Legoktm: [C: 03+2] Revert "Remove deprecated date.js library" [extensions/PageTriage] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/723583 (https://phabricator.wikimedia.org/T291675) (owner: 10Legoktm) [18:34:55] legoktm: looks good, thanks for writing that up! [18:38:51] 10SRE, 10ops-eqiad, 10DC-Ops: (Need By: Q2) eqiad: Upgrades of Management Switches - https://phabricator.wikimedia.org/T259758 (10Jclark-ctr) installed ears on all switches. 
uploaded to netbox rows c,d printed and labeled switches [18:43:25] (03PS1) 10PipelineBot: citoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/723607 [18:51:56] (03Merged) 10jenkins-bot: MovePage: don't create a recent change for a redirect [core] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/723582 (https://phabricator.wikimedia.org/T291677) (owner: 10Legoktm) [18:52:01] (03Merged) 10jenkins-bot: Revert "Remove deprecated date.js library" [extensions/PageTriage] (wmf/1.38.0-wmf.1) - 10https://gerrit.wikimedia.org/r/723583 (https://phabricator.wikimedia.org/T291675) (owner: 10Legoktm) [18:53:31] !log legoktm@deploy1002 sync-file aborted: (no justification provided) (duration: 00m 00s) [18:53:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:54:00] (hit enter before I could finish typing the reason) [18:54:48] makes me think of Chad [18:54:52] !log legoktm@deploy1002 Synchronized php-1.38.0-wmf.1/extensions/PageTriage/: Revert "Remove deprecated date.js library" (T291675) (duration: 00m 59s) [18:54:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:54:58] T291675: Time on New Pages Feed is displayed in browser timezone instead of timezone set in preferences - https://phabricator.wikimedia.org/T291675 [18:57:30] !log legoktm@deploy1002 Synchronized php-1.38.0-wmf.1/includes/MovePage.php: MovePage: don't create a recent change for a redirect (T291677) (duration: 00m 57s) [18:57:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:57:36] T291677: Do not generate RecentChanges or page creation log entries for redirects from page moves - https://phabricator.wikimedia.org/T291677 [19:04:55] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[19:05:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:51] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [19:07:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:09] !log volker-e@deploy1002 Started deploy [design/style-guide@6585e79]: Deploy design/style-guide: 6585e79 “Apps”: Add Apps x Design System section (#487) [19:33:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:16] !log volker-e@deploy1002 Finished deploy [design/style-guide@6585e79]: Deploy design/style-guide: 6585e79 “Apps”: Add Apps x Design System section (#487) (duration: 00m 07s) [19:33:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:38:41] (03PS4) 10Ryan Kemper: query_service: Add dummy credentials for query_service oauth [labs/private] - 10https://gerrit.wikimedia.org/r/723307 (owner: 10Ebernhardson) [19:38:59] (03CR) 10Ryan Kemper: [C: 03+1] query_service: Add dummy credentials for query_service oauth [labs/private] - 10https://gerrit.wikimedia.org/r/723307 (owner: 10Ebernhardson) [19:39:08] (03CR) 10Ryan Kemper: [V: 03+2 C: 03+1] query_service: Add dummy credentials for query_service oauth [labs/private] - 10https://gerrit.wikimedia.org/r/723307 (owner: 10Ebernhardson) [19:39:11] (03CR) 10Ryan Kemper: [V: 03+2 C: 03+2] query_service: Add dummy credentials for query_service oauth [labs/private] - 10https://gerrit.wikimedia.org/r/723307 (owner: 10Ebernhardson) [20:00:05] !log volker-e@deploy1002 Started deploy [design/style-guide@362c6b1]: Deploy design/style-guide: 362c6b1 “Components”: Fix index link (#489) [20:00:11] !log volker-e@deploy1002 Finished deploy [design/style-guide@362c6b1]: Deploy design/style-guide: 362c6b1 “Components”: Fix index link (#489) (duration: 00m 06s) [20:00:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:19] Logged 
the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:07:12] (PS4) Cwhite: opensearch: fork elasticsearch module into opensearch module [puppet] - https://gerrit.wikimedia.org/r/721359 (https://phabricator.wikimedia.org/T288618)
[22:04:32] (PS2) BryanDavis: kubernetes::deployment_server: add general data for mcrouter + nutcracker [puppet] - https://gerrit.wikimedia.org/r/722833 (https://phabricator.wikimedia.org/T291530) (owner: Giuseppe Lavagetto)
[22:12:13] (PS1) BryanDavis: toolhub: Get mcrouter image tags from upstream config [deployment-charts] - https://gerrit.wikimedia.org/r/723618 (https://phabricator.wikimedia.org/T291530)
[22:36:29] (PS5) Cwhite: opensearch: fork elasticsearch module into opensearch module [puppet] - https://gerrit.wikimedia.org/r/721359 (https://phabricator.wikimedia.org/T288618)
[22:36:31] (PS4) Cwhite: opensearch_dashboards: fork kibana module into opensearch_dashboards module [puppet] - https://gerrit.wikimedia.org/r/721385 (https://phabricator.wikimedia.org/T288618)
[22:36:33] (PS4) Cwhite: icinga: fork icinga::monitor::elasticsearch::base_checks [puppet] - https://gerrit.wikimedia.org/r/721386 (https://phabricator.wikimedia.org/T288618)
[22:36:35] (PS3) Cwhite: profile: fork elasticsearch profile into opensearch::server [puppet] - https://gerrit.wikimedia.org/r/721388 (https://phabricator.wikimedia.org/T288618)
[22:36:37] (PS4) Cwhite: profile: fork elasticsearch base_checks for opensearch [puppet] - https://gerrit.wikimedia.org/r/721389 (https://phabricator.wikimedia.org/T288618)
[22:36:39] (PS3) Cwhite: profile: fork kibana profile into opensearch::dashboards [puppet] - https://gerrit.wikimedia.org/r/721391 (https://phabricator.wikimedia.org/T288618)
[22:36:41] (PS4) Cwhite: profile: fork elasticsearch::logstash into opensearch::logstash [puppet] - https://gerrit.wikimedia.org/r/721395 (https://phabricator.wikimedia.org/T288618)
[22:36:43] (PS1)
Cwhite: hiera: add minimal logstash-beta-next hiera configuration [puppet] - https://gerrit.wikimedia.org/r/723619 (https://phabricator.wikimedia.org/T288618)
[22:50:36] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[22:56:52] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[23:21:34] (PS2) Jforrester: TimedMediaHandler: Make videojs the only player on all group1 [mediawiki-config] - https://gerrit.wikimedia.org/r/612348 (https://phabricator.wikimedia.org/T248418)
[23:21:36] (PS2) Jforrester: TimedMediaHandler: Make videojs the only player everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/612349 (https://phabricator.wikimedia.org/T248418)
[23:21:38] (PS2) Jforrester: TimedMediaHandler: Drop Beta Feature, no longer usable [mediawiki-config] - https://gerrit.wikimedia.org/r/612350 (https://phabricator.wikimedia.org/T248418)
[23:21:40] (PS2) Jforrester: TimedMediaHandler: Don't read wmgTmhWebPlayer [mediawiki-config] - https://gerrit.wikimedia.org/r/612351 (https://phabricator.wikimedia.org/T248418)
[23:21:42] (PS2) Jforrester: TimedMediaHandler: Drop pre-switch config, no longer read [mediawiki-config] - https://gerrit.wikimedia.org/r/612352 (https://phabricator.wikimedia.org/T248418)
[23:43:05] SRE, Anti-Harassment, IP Info, serviceops, Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (Dzahn) I received the new license info from @phuedx , encrypted with GPG.
I decrypted it and added it to the pri...