[00:35:09] (03PS1) 10BryanDavis: toolhub: Crawler CronJob concurrencyPolicy back to Forbid [deployment-charts] - 10https://gerrit.wikimedia.org/r/729891 (https://phabricator.wikimedia.org/T292861) [00:57:36] (03CR) 10GeoffreyT2000: "Abandon this patch please as it is now superseded by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/729892/." [core] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/728537 (owner: 10Daniel Kinzler) [01:23:02] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [03:47:32] PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The following units failed: database-backups-snapshots.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:14:46] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid [05:18:56] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [05:49:33] (03CR) 10Ayounsi: [C: 03+1] "Great, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/727594 (https://phabricator.wikimedia.org/T292792) (owner: 10CDanis) [05:52:37] 10SRE, 10Infrastructure-Foundations, 10netops: TATA SKY Broadband (AS134674) issues with connecting to upload.wikimedia.org - https://phabricator.wikimedia.org/T275234 (10ayounsi) 05Open→03Resolved a:03ayounsi No more news from Tata Sky and nothing we can do at our network layer neither. To be reopened... 
[06:17:23] (03PS3) 10Giuseppe Lavagetto: Allow /w/static.php to serve assets for static/current [mediawiki-config] - 10https://gerrit.wikimedia.org/r/728553 (https://phabricator.wikimedia.org/T285232) [06:26:13] 10SRE, 10MediaWiki-General, 10Platform Engineering Code Jam, 10Platform Engineering Roadmap Decision Making, 10Performance-Team (Radar): Allow easier ICU transitions in MediaWiki (change how sortkey collation is managed in the categorylinks table) - https://phabricator.wikimedia.org/T263437 (10Joe) >>! I... [06:43:50] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid [06:47:58] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [06:54:08] (03Abandoned) 10Muehlenhoff: Don't include rsync::server for absented rsync modules [puppet] - 10https://gerrit.wikimedia.org/r/726851 (owner: 10Muehlenhoff) [06:56:16] (03CR) 10Muehlenhoff: "Looks good patch-wise. New access groups are being discussed/approved in the weekly SRE IF meeting (which happens later the day)" [puppet] - 10https://gerrit.wikimedia.org/r/728648 (owner: 10Dzahn) [07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211011T0700) [07:02:02] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 235, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:04:10] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 236, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:14:24] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 235, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:14:30] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:26:32] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 236, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:26:38] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:32:29] Anyone having any issue with gerrit this morning? 
[07:32:33] (03CR) 10Muehlenhoff: modules::gitlab::ssh explicitly add git user and enable login (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/728380 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [07:41:29] Seddon: works fine for me [07:48:07] 10SRE, 10observability: Loading https://graphite.wikimedia.org/ throws an envoy error - https://phabricator.wikimedia.org/T292877 (10fgiunchedi) I can't reproduce this at the moment on https://graphite.wikimedia.org, can you @Urbanecm ? [07:50:39] 10SRE, 10observability: Loading https://graphite.wikimedia.org/ throws an envoy error - https://phabricator.wikimedia.org/T292877 (10Urbanecm) >>! In T292877#7415830, @fgiunchedi wrote: > I can't reproduce this at the moment on https://graphite.wikimedia.org, can you @Urbanecm ? Yes. Loading https://graphite.... [07:53:31] (03PS2) 10Volans: dhcp: remove all physical hosts hardcoded config [puppet] - 10https://gerrit.wikimedia.org/r/727387 (https://phabricator.wikimedia.org/T269855) [07:53:33] (03PS2) 10Volans: cumin: remove wmf-auto-reimage scripts [puppet] - 10https://gerrit.wikimedia.org/r/727415 (https://phabricator.wikimedia.org/T269855) [07:54:53] volans: how will physical hosts get their addresses after the first boot, if there is no longer a dhcp block for them? or does the installer add static addresses that they will use on subsequent boots? [07:55:19] maryum: that's how it's worked up till now [07:55:44] (so i don't expect that bit is changing) [07:56:11] kormat: i think you intended to ping majavah 🙂 [07:56:26] er. yes. sorry maryum :) [07:56:50] we need to expand the alphabet, so everyone can have a unique 2-letter prefix. ;) [07:57:24] 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review: Test dhcp-option 82 - https://phabricator.wikimedia.org/T221388 (10Volans) 05Open→03Resolved The test of the option 82 has been successful and we're switching to this system for all physical hosts DHCP settings in T269855. In the final... 
[07:57:56] !log start kafka topics rebalancing for main-codfw (long running maintenance) - T288825 [07:58:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:58:04] T288825: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 [07:58:17] !log migrating physical hosts DHCP to the new reimage process - T269855 [07:58:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:58:22] T269855: Manage DHCP from Netbox - https://phabricator.wikimedia.org/T269855 [07:59:39] (03PS2) 10Volans: sre.experimental.reimage: remove legacy code [cookbooks] - 10https://gerrit.wikimedia.org/r/727411 (https://phabricator.wikimedia.org/T269855) [08:00:22] 10SRE, 10Infrastructure-Foundations: Migrate OpenLDAP to MDB backend - https://phabricator.wikimedia.org/T292942 (10MoritzMuehlenhoff) [08:01:00] (03CR) 10Volans: [C: 03+2] dhcp: remove all physical hosts hardcoded config [puppet] - 10https://gerrit.wikimedia.org/r/727387 (https://phabricator.wikimedia.org/T269855) (owner: 10Volans) [08:01:22] (03CR) 10Volans: [C: 03+2] cumin: remove wmf-auto-reimage scripts [puppet] - 10https://gerrit.wikimedia.org/r/727415 (https://phabricator.wikimedia.org/T269855) (owner: 10Volans) [08:01:39] (03PS1) 10Filippo Giunchedi: graphite: bump uwsgi buffer space [puppet] - 10https://gerrit.wikimedia.org/r/729903 (https://phabricator.wikimedia.org/T292877) [08:01:49] urbanecm: ^ [08:02:18] that should fix your issue with graphite, will deploy shortly [08:02:26] thanks godog. Unable to judge whether that will help though :-). [08:02:31] wondering why it affects me, but not you [08:02:36] any soul available for a quick +1 ? 
[08:02:57] urbanecm: I'm guessing your request is sending larger headers for some reason [08:03:08] just over the limit [08:03:25] i see [08:04:43] 10SRE, 10Infrastructure-Foundations, 10Traffic-Icebox, 10netops: externally-hosted NEL report forwarders for more timely report reception - https://phabricator.wikimedia.org/T292870 (10ayounsi) I'd wary of the complexity of the setup. As I'm not quite familiar with NEL setup, is there a downside of puttin... [08:05:45] (03CR) 10Filippo Giunchedi: [C: 03+2] graphite: bump uwsgi buffer space [puppet] - 10https://gerrit.wikimedia.org/r/729903 (https://phabricator.wikimedia.org/T292877) (owner: 10Filippo Giunchedi) [08:06:54] !log bounce uwsgi on graphite hosts to bump request size limit - T292877 [08:06:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:07:00] T292877: Loading https://graphite.wikimedia.org/ throws an envoy error - https://phabricator.wikimedia.org/T292877 [08:07:34] urbanecm: better now? [08:07:57] (03CR) 10Volans: [C: 03+2] sre.experimental.reimage: remove legacy code [cookbooks] - 10https://gerrit.wikimedia.org/r/727411 (https://phabricator.wikimedia.org/T269855) (owner: 10Volans) [08:08:09] (03PS2) 10Volans: sre.hosts.reimage: renamed from experimental [cookbooks] - 10https://gerrit.wikimedia.org/r/727412 (https://phabricator.wikimedia.org/T269855) [08:08:22] godog: yup, works now. Thanks [08:09:26] mmhh strange, I managed to botch the option name [08:09:35] I'll fix it [08:09:39] godog: blame your reviewer. ;) [08:09:45] :D [08:10:09] kormat: I will! 
I want my money back [08:10:28] * godog looks at the brown paperbag [08:10:31] (03Merged) 10jenkins-bot: sre.experimental.reimage: remove legacy code [cookbooks] - 10https://gerrit.wikimedia.org/r/727411 (https://phabricator.wikimedia.org/T269855) (owner: 10Volans) [08:10:47] (03CR) 10Volans: [C: 03+2] sre.hosts.reimage: renamed from experimental [cookbooks] - 10https://gerrit.wikimedia.org/r/727412 (https://phabricator.wikimedia.org/T269855) (owner: 10Volans) [08:11:25] (03PS1) 10Filippo Giunchedi: graphite: fix buffer-size option name [puppet] - 10https://gerrit.wikimedia.org/r/729904 (https://phabricator.wikimedia.org/T292877) [08:11:43] (03CR) 10jerkins-bot: [V: 04-1] graphite: fix buffer-size option name [puppet] - 10https://gerrit.wikimedia.org/r/729904 (https://phabricator.wikimedia.org/T292877) (owner: 10Filippo Giunchedi) [08:12:20] (03PS2) 10Filippo Giunchedi: graphite: fix buffer-size option name [puppet] - 10https://gerrit.wikimedia.org/r/729904 (https://phabricator.wikimedia.org/T292877) [08:13:10] (03CR) 10Filippo Giunchedi: [C: 03+2] graphite: fix buffer-size option name [puppet] - 10https://gerrit.wikimedia.org/r/729904 (https://phabricator.wikimedia.org/T292877) (owner: 10Filippo Giunchedi) [08:14:03] (03Merged) 10jenkins-bot: sre.hosts.reimage: renamed from experimental [cookbooks] - 10https://gerrit.wikimedia.org/r/727412 (https://phabricator.wikimedia.org/T269855) (owner: 10Volans) [08:14:46] urbanecm: mind trying one more time? [08:15:14] godog: still works! 
[08:15:29] nice, I'll resolve the task [08:15:34] thanks [08:15:51] 10SRE, 10observability, 10Patch-For-Review: Loading https://graphite.wikimedia.org/ throws an envoy error - https://phabricator.wikimedia.org/T292877 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi Confirmed working by @Urbanecm [08:16:09] sure [08:19:14] 10SRE, 10MW-on-K8s, 10serviceops, 10Patch-For-Review, and 2 others: The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (10Joe) In the above patch, I implemented the following approach: - We will rewrite `static/current` to go to `/w/stat... [08:19:45] (03CR) 10Giuseppe Lavagetto: mediawiki: Add rsyslog sidecar (035 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/725892 (https://phabricator.wikimedia.org/T288851) (owner: 10Giuseppe Lavagetto) [08:20:34] (03PS4) 10Giuseppe Lavagetto: mediawiki: Add rsyslog sidecar [deployment-charts] - 10https://gerrit.wikimedia.org/r/725892 (https://phabricator.wikimedia.org/T288851) [08:21:20] 10SRE, 10Infrastructure-Foundations: Integrate Buster 10.11 point update - https://phabricator.wikimedia.org/T292838 (10MoritzMuehlenhoff) [08:22:18] 10SRE, 10Infrastructure-Foundations: Integrate Buster 10.11 point update - https://phabricator.wikimedia.org/T292838 (10MoritzMuehlenhoff) [08:24:21] !log volans@cumin1001 START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet [08:24:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:25:08] !log updated buster d-i image for Buster 10.11 point release T292838 [08:25:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:25:14] T292838: Integrate Buster 10.11 point update - https://phabricator.wikimedia.org/T292838 [08:25:15] (03PS1) 10David Caro: cinderutils::ensure: Check falsey instead of empty string [puppet] - 10https://gerrit.wikimedia.org/r/729926 (https://phabricator.wikimedia.org/T292465) [08:25:47] volans: what OS 
did you use for sretest1001? it might fail for the point releases which happened over the weekend (they bumped the kernel ABI) [08:25:56] (03CR) 10jerkins-bot: [V: 04-1] cinderutils::ensure: Check falsey instead of empty string [puppet] - 10https://gerrit.wikimedia.org/r/729926 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [08:26:03] I'm currently updating the d-i images [08:26:20] !log swift eqiad-prod: final weight to ms-be10[64-67] - T290546 [08:26:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:26:26] T290546: Put ms-be10[64-67] in service - https://phabricator.wikimedia.org/T290546 [08:26:52] moritzm: I used buster, as that was the old one [08:27:12] and right... I had forgotten about your message last week, sorry [08:27:26] but good to test a failure scenario too ;) [08:27:34] you can retry buster shortly, currently running puppet on install servers [08:28:31] * volans attaching to the console to check the current run [08:28:32] ack [08:29:33] happy to test a bullseye reimage for graphite2003 too, please LMK when I'm good to go [08:30:26] (03PS1) 10David Caro: openstack::...vms::common: Use ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/729929 (https://phabricator.wikimedia.org/T292943) [08:31:17] godog: great, depending on how much of a guinea pig you want to be, you can do it either now or after I reimage sretest1002, which is bullseye :) [08:31:45] moritzm: ok to reimage sretest1002? or that one too needs an update of the d-i image? 
[08:32:41] volans: hehe ok, I'm fine to wait after sretest1002 [08:33:26] (03PS5) 10Giuseppe Lavagetto: mediawiki: Add rsyslog sidecar [deployment-charts] - 10https://gerrit.wikimedia.org/r/725892 (https://phabricator.wikimedia.org/T288851) [08:33:27] ack, I'll ping you back briefly [08:33:50] volans: buster is fixed now, so if you reimage sretest1002 with buster, it'll be fine, bullseye will be updated in a bit [08:34:21] sretest1001 with buster seems to have worked btw [08:34:28] not sure if it already picked up the new image [08:35:00] 10SRE, 10MediaWiki-extensions-CentralNotice, 10MediaWiki-extensions-Translate, 10Wikimedia-Fundraising, and 7 others: DBPerformance warning "Query returned XXXX rows: query: SELECT * FROM `translate_metadata`" - https://phabricator.wikimedia.org/T204026 (10Nikerabbit) Thanks for merging the CentralNotice p... [08:35:01] it would have failed with the old one (in d-i, when fetching the kernel udebs) [08:35:11] so probably my update came in time [08:35:21] 4.19.208-1 (2021-09-29) [08:35:25] that's what it got [08:35:41] yeah, that's the latest new kernel from the 10.11 point release [08:35:52] good timing :) [08:36:06] so bullseye would fail right now? [08:37:03] I'm currently running puppet on install* servers, so will be fixed shortly [08:37:29] great! 
if you lmk when ready I'll kick that one too, thanks :) [08:38:03] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [08:38:10] !log updated buster d-i image for Buster 10.11 point release T292838 [08:38:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:38:16] T292838: Integrate Buster 10.11 point update - https://phabricator.wikimedia.org/T292838 [08:38:21] !log updated buster d-i image for Bullseye 11.1 point release T292844 [08:38:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:38:27] T292844: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 [08:39:33] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [08:40:22] volans: you can do 1002/bullseye now, puppet runs complete [08:40:49] ack, thanks! [08:41:06] !log volans@cumin2002 START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet [08:41:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:41:48] 10SRE, 10Infrastructure-Foundations, 10Traffic: Anycast: Add IPv6 support to bird and anycast-healthchecker (Puppet) - https://phabricator.wikimedia.org/T292737 (10ayounsi) Thanks that's great! Could you update the doc to reflect the new config knobs? And we need to be sure we don't forget to update https:... 
[08:43:20] 10SRE, 10Infrastructure-Foundations: Integrate Buster 10.11 point update - https://phabricator.wikimedia.org/T292838 (10MoritzMuehlenhoff) [08:43:50] 10SRE, 10Infrastructure-Foundations: Integrate Buster 10.11 point update - https://phabricator.wikimedia.org/T292838 (10MoritzMuehlenhoff) [08:44:26] (03CR) 10David Caro: [C: 03+2] "Manually tested on paws-acme-chief-01.paws.eqiad.wmflabs and worked, merging:" [puppet] - 10https://gerrit.wikimedia.org/r/729929 (https://phabricator.wikimedia.org/T292943) (owner: 10David Caro) [08:47:50] (03CR) 10Filippo Giunchedi: [C: 03+2] statsite: log instance identifier [puppet] - 10https://gerrit.wikimedia.org/r/727295 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [08:47:56] (03PS2) 10Filippo Giunchedi: statsite: log instance identifier [puppet] - 10https://gerrit.wikimedia.org/r/727295 (https://phabricator.wikimedia.org/T247963) [08:48:19] RECOVERY - Check systemd state on sretest1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:48:34] !log volans@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet [08:48:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:52:28] !log bounce statsite on graphite1004 to apply unit config changes [08:52:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:53:28] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Overall good chart; My suggestions are basically:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/726933 (https://phabricator.wikimedia.org/T289224) (owner: 10Majavah) [08:58:49] 10SRE, 10ops-eqiad, 10Analytics-Clusters: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (10BTullis) Yes I'm more than happy to help out on this. 
@Jclark-ctr if you have a suggested time when you'd like to do the work, I'll sort out downtime and shut... [09:00:01] 10SRE, 10Traffic, 10Patch-For-Review, 10User-ema: Experiment with single backend CDN nodes - https://phabricator.wikimedia.org/T288106 (10ema) [09:01:47] !log bounce swift-object-replicator on ms-be2036 [09:01:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:02:41] (03CR) 10Filippo Giunchedi: [C: 03+1] logstash: move kubernetes_docker parsing to priority 15 [puppet] - 10https://gerrit.wikimedia.org/r/728683 (https://phabricator.wikimedia.org/T292099) (owner: 10Cwhite) [09:03:11] (03CR) 10Filippo Giunchedi: [C: 03+1] logstash: dot_expander: better handling of field collisions [puppet] - 10https://gerrit.wikimedia.org/r/728682 (https://phabricator.wikimedia.org/T292099) (owner: 10Cwhite) [09:03:37] RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:03:50] (03PS6) 10Kormat: mariadb: Page for read-only status issues in both DCs [puppet] - 10https://gerrit.wikimedia.org/r/719948 (https://phabricator.wikimedia.org/T290591) [09:04:17] (03CR) 10Filippo Giunchedi: [C: 03+1] warn on idle centrallog mtail instances [alerts] - 10https://gerrit.wikimedia.org/r/724827 (https://phabricator.wikimedia.org/T292051) (owner: 10Herron) [09:05:51] !log volans@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet [09:05:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:06:25] godog: you can proceed, see https://wikitech.wikimedia.org/wiki/Server_Lifecycle/Reimage or ping me for any questions! 
[09:06:34] (03CR) 10Kormat: [V: 03+1 C: 04-2] "PCC SUCCESS (NOOP 8 DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31590/console" [puppet] - 10https://gerrit.wikimedia.org/r/719948 (https://phabricator.wikimedia.org/T290591) (owner: 10Kormat) [09:07:14] volans: thanks! will do [09:08:46] (03CR) 10Filippo Giunchedi: [C: 03+1] logstash: add opensearch output config definition [puppet] - 10https://gerrit.wikimedia.org/r/727624 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite) [09:09:48] !log force kafka preferred-replica-election on kafka-main2001 after the first 50 topic partitions moves - T288825 [09:09:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:09:54] T288825: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 [09:10:19] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/728378 (owner: 10Muehlenhoff) [09:13:41] !log filippo@cumin1001 START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet [09:13:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:20:01] RECOVERY - Check systemd state on search-loader2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:20:39] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [09:26:27] PROBLEM - Check correctness of the icinga configuration on alert1001 is CRITICAL: Icinga configuration contains errors https://wikitech.wikimedia.org/wiki/Icinga [09:27:20] godog: interesting, looking [09:27:53] volans: I should note that d-i was waiting for a prompt (I've unblocked it since and it is progressing now) [09:28:07] host just rebooted fwiw [09:28:31] so is not yet after d-i [09:28:40] (03CR) 10Effie Mouzeli: [C: 03+1] tegola-vector-tiles: Use envoy for 
cronjob pods (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/726891 (https://phabricator.wikimedia.org/T283159) (owner: 10Jgiannelos) [09:28:46] not quite yet [09:29:40] (03PS1) 10Filippo Giunchedi: install_server: use standard recipe for graphite2003 [puppet] - 10https://gerrit.wikimedia.org/r/729934 (https://phabricator.wikimedia.org/T247963) [09:29:51] PROBLEM - carbon-frontend-relay metric drops on graphite1004 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [100.0] https://wikitech.wikimedia.org/wiki/Graphite%23Operations_troubleshooting https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1&panelId=21&fullscreen https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1&panelId=21&fullscreen [09:30:14] volans: I'll be running another reimage anyways for graphite2003 because I'm changing the partman recipe, in case you want to test again [09:30:51] (03CR) 10Filippo Giunchedi: [C: 03+2] install_server: use standard recipe for graphite2003 [puppet] - 10https://gerrit.wikimedia.org/r/729934 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [09:30:54] (03PS1) 10Kormat: mariadb: Set mysql_role for primary pc hosts. [puppet] - 10https://gerrit.wikimedia.org/r/729935 [09:31:16] before d-i finishes, the reimage doesn't affect icinga directly; it just removes the host from puppetdb/puppet, so the definitions should disappear [09:31:38] I'm checking the puppet run on puppetboard for alert1001 and running it manually to see what triggered it [09:31:41] not sure if related [09:31:42] yet [09:32:23] 10SRE, 10Infrastructure-Foundations, 10LDAP: Migrate OpenLDAP to MDB backend - https://phabricator.wikimedia.org/T292942 (10MoritzMuehlenhoff) [09:32:28] ack [09:32:58] (03PS2) 10Kormat: mariadb: Set mysql_role for primary pc hosts. 
[puppet] - 10https://gerrit.wikimedia.org/r/729935 [09:34:15] (03CR) 10Kormat: [V: 03+1] "PCC SUCCESS (DIFF 2 NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31592/console" [puppet] - 10https://gerrit.wikimedia.org/r/729935 (owner: 10Kormat) [09:34:32] (03PS3) 10Kormat: mariadb: Set mysql_role for primary pc hosts. [puppet] - 10https://gerrit.wikimedia.org/r/729935 (https://phabricator.wikimedia.org/T284825) [09:35:20] (03PS5) 10Amire80: Update autonyms for kea, ota, sjd in wmgExtraLanguageNames [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699692 (https://phabricator.wikimedia.org/T284870) [09:35:46] fwiw config is ok now after the puppet run triggered by the reimage [09:36:38] RECOVERY - Check correctness of the icinga configuration on alert1001 is OK: Icinga configuration is correct https://wikitech.wikimedia.org/wiki/Icinga [09:36:54] (03CR) 10Kormat: [C: 03+2] mariadb: Set mysql_role for primary pc hosts. [puppet] - 10https://gerrit.wikimedia.org/r/729935 (https://phabricator.wikimedia.org/T284825) (owner: 10Kormat) [09:37:25] !log force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - T288825 [09:37:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:37:31] T288825: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 [09:39:36] 10SRE, 10User-herron: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 (10elukey) Topics move done so far and their start timings (they coincide with the file creation on kafka-main2001): ` -rw-r--r-- 1 elukey wikidev 111 Oct 11 09:35 codfw.mediawiki.page-... 
[09:45:56] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet [09:46:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:47:02] volans: ack, thanks for taking a look [09:47:21] I'll kick off another reimage with the standard partman this time [09:47:39] ack [09:48:16] RECOVERY - carbon-frontend-relay metric drops on graphite1004 is OK: OK: Less than 80.00% above the threshold [25.0] https://wikitech.wikimedia.org/wiki/Graphite%23Operations_troubleshooting https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1&panelId=21&fullscreen https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1&panelId=21&fullscreen [09:48:26] 10SRE, 10Performance-Team, 10Traffic, 10User-ema: Package and deploy Varnish 6.0.8 - https://phabricator.wikimedia.org/T292290 (10ema) Heads up #performance-team: as with all Varnish upgrades, this may have an impact (positive or negative) on performance. You may want to keep the upgrade process on your ra... 
[09:50:33] !log filippo@cumin1001 START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet [09:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:51:26] (03PS5) 10Jelto: modules::gitlab::ssh explicitly add git user with fixed id [puppet] - 10https://gerrit.wikimedia.org/r/728380 (https://phabricator.wikimedia.org/T283076) [09:52:13] (03CR) 10Jgiannelos: [C: 03+2] tegola-vector-tiles: Use envoy for cronjob pods [deployment-charts] - 10https://gerrit.wikimedia.org/r/726891 (https://phabricator.wikimedia.org/T283159) (owner: 10Jgiannelos) [09:52:43] (03CR) 10Jgiannelos: [C: 03+2] tegola-vector-tiles: Fix config keys for envoy [deployment-charts] - 10https://gerrit.wikimedia.org/r/727134 (https://phabricator.wikimedia.org/T283159) (owner: 10Jgiannelos) [09:56:05] 10SRE, 10User-herron: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 (10elukey) Note - I added two directories to https://gitlab.wikimedia.org/Elukey/kafka_main_rebalance/-/tree/main/main-codfw/topicmappr_json: 1) `rollback` - containing the output of kaf... 
[09:56:24] (03Merged) 10jenkins-bot: tegola-vector-tiles: Use envoy for cronjob pods [deployment-charts] - 10https://gerrit.wikimedia.org/r/726891 (https://phabricator.wikimedia.org/T283159) (owner: 10Jgiannelos) [09:56:51] (03Merged) 10jenkins-bot: tegola-vector-tiles: Fix config keys for envoy [deployment-charts] - 10https://gerrit.wikimedia.org/r/727134 (https://phabricator.wikimedia.org/T283159) (owner: 10Jgiannelos) [09:59:17] (03CR) 10Jelto: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31593/console" [puppet] - 10https://gerrit.wikimedia.org/r/728380 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [09:59:30] (03PS6) 10Amire80: Update autonyms in wmgExtraLanguageNames [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699692 (https://phabricator.wikimedia.org/T284870) [09:59:44] PROBLEM - Check correctness of the icinga configuration on alert1001 is CRITICAL: Icinga configuration contains errors https://wikitech.wikimedia.org/wiki/Icinga [09:59:59] godog: AFAICT the icinga config issue was due to the fact that there is a nagios_service check generated from hieradata/common/service.yaml for the graphite service [10:00:46] volans: ah yeah that makes sense [10:00:51] that when the graphite2003 host gets removed becomes "invalid", but that check is generated from the puppet code of the alert host, not the exported resources of the graphite host [10:01:14] the 'monitoring' section of service::catalog is to be revamped this quarter as part of o11y OKRs [10:01:21] can't wait to rip all of that out [10:01:30] ack, sounds like a plan! 
[10:01:45] also because I don't have any quick solution to propose right now [10:02:11] yeah I don't think there is unfortunately, hardcoding hostnames in service::catalog is asking for pain [10:02:25] indeed [10:03:18] I'll add a section in the FAQ of the Reimage wikitech page for this just as an FYI to people [10:03:22] PROBLEM - carbon-frontend-relay metric drops on graphite1004 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [100.0] https://wikitech.wikimedia.org/wiki/Graphite%23Operations_troubleshooting https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1&panelId=21&fullscreen https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1&panelId=21&fullscreen [10:04:55] ack, thanks [10:05:04] the drops is expected, I'll ack the alert [10:06:44] (03CR) 10Jelto: [V: 03+1] modules::gitlab::ssh explicitly add git user with fixed id (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/728380 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [10:10:09] RECOVERY - Check correctness of the icinga configuration on alert1001 is OK: Icinga configuration is correct https://wikitech.wikimedia.org/wiki/Icinga [10:10:35] 10SRE, 10MediaWiki-Uploading: Unexpected upload speed to commons - https://phabricator.wikimedia.org/T288481 (10aborrero) [10:10:39] 10SRE-swift-storage, 10User-Inductiveload: Unable to upload to Commons: uploadstash-file-not-found: Key "187kyl5ozj74.xtav8j.51508.djvu" not found in stash - https://phabricator.wikimedia.org/T278104 (10aborrero) [10:10:47] 10SRE, 10Commons, 10MediaWiki-Uploading, 10Structured Data Engineering, and 3 others: Various errors when trying to upload large files (Could not acquire lock, Service Temporarily Unavailable, 503 Backend fetch failed, 502 Next Hop Connection Failed) - https://phabricator.wikimedia.org/T280926 (10aborrero) [10:10:59] 10SRE-swift-storage, 10Commons, 10Internet-Archive, 10MediaWiki-API, and 3 others: Large PDF upload issue - 
https://phabricator.wikimedia.org/T254459 (10aborrero) [10:11:54] 10SRE, 10MediaWiki-Uploading: Unexpected upload speed to commons - https://phabricator.wikimedia.org/T288481 (10aborrero) This may be very similar to {T278389} [10:11:55] (03PS4) 10Majavah: apple-search: New chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/726933 (https://phabricator.wikimedia.org/T289224) [10:14:21] (03PS5) 10Majavah: apple-search: New chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/726933 (https://phabricator.wikimedia.org/T289224) [10:14:35] (03CR) 10Majavah: apple-search: New chart (0311 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/726933 (https://phabricator.wikimedia.org/T289224) (owner: 10Majavah) [10:18:03] godog: if you want to proof-read (might be a bit confusing) https://wikitech.wikimedia.org/wiki/Server_Lifecycle/Reimage#Icinga_configuration_correctness_alert_fires [10:18:45] volans: ack, I'll take a look [10:18:50] 10SRE-Access-Requests: Requesting access to Analytic Cluster for Muniza - https://phabricator.wikimedia.org/T292955 (10diego) [10:19:20] thanks [10:19:38] 10SRE-Access-Requests: Requesting access to Analytic Cluster for Muniza - https://phabricator.wikimedia.org/T292955 (10diego) @MunizaA please update the task description with your [[ https://wikitech.wikimedia.org/wiki/SRE/Production_access#Generating_your_SSH_key | SSH key ]]. 
[10:21:45] volans: LGTM [10:23:57] !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet [10:24:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:25:51] perfect, thanks [10:27:48] (03PS1) 10Volans: sre.hosts.reimage: add OS to runtime description [cookbooks] - 10https://gerrit.wikimedia.org/r/729942 [10:28:58] 10SRE-tools, 10Infrastructure-Foundations: Spicerack: split wmf-auto-reimage-lib into Spicerack modules - https://phabricator.wikimedia.org/T205884 (10Volans) 05In progress→03Resolved With the completion of the conversion of the reimage script to the sre.hosts.reimage cookbook, all the needed bits that wer... [10:29:04] 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Goal: Expand Spicerack library and SRE Cookbooks - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205867 (10Volans) [10:29:11] volans: the reimage worked as expected the second time around too btw, very nice job [10:30:54] 10SRE-tools, 10Infrastructure-Foundations: Cookbooks: convert wmf-auto-reimage scripts to Cookbooks - https://phabricator.wikimedia.org/T205885 (10Volans) 05In progress→03Resolved a:03Volans The conversion to the sre.hosts.reimage cookbook has been completed. The new procedure is outlined in https://wiki... [10:31:00] 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Goal: Expand Spicerack library and SRE Cookbooks - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205867 (10Volans) [10:31:43] 10SRE-tools, 10Infrastructure-Foundations: Cookbooks: convert wmf-auto-reimage scripts to Cookbooks - https://phabricator.wikimedia.org/T205885 (10Volans) [10:31:49] 10SRE, 10SRE-tools, 10Infrastructure-Foundations: wmf-auto-reimage tries to remove from Debmonitor even with --new - https://phabricator.wikimedia.org/T204789 (10Volans) 05Open→03Resolved a:03Volans The current procedure outlined at https://wikitech.wikimedia.org/wiki/Server_Lifecycle/Reimage doesn't h... 
[10:31:58] 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Goal: Expand Spicerack library and SRE Cookbooks - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205867 (10Volans) [10:32:36] 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10Goal: Expand Spicerack library and SRE Cookbooks - Q2 2018-19 Goal - https://phabricator.wikimedia.org/T205867 (10Volans) 05Open→03Resolved With the conversion of the reimage scripts to the sre.hosts.reimage cookbook this has been finally completed. T... [10:32:51] (03PS33) 10ZPapierski: Added spicerack.kafka with offset transfer function [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) [10:34:12] 10SRE, 10SRE-tools, 10Infrastructure-Foundations: wmf-auto-reimage errors: failure to downtime (w/ no rename), pytho gc whine - https://phabricator.wikimedia.org/T239897 (10Volans) 05Open→03Resolved a:03Volans Resolving as the reimage scripts have been ported to the sre.hosts.reimage cookbook and this... [10:34:26] godog: thanks [10:38:01] RECOVERY - carbon-frontend-relay metric drops on graphite1004 is OK: OK: Less than 80.00% above the threshold [25.0] https://wikitech.wikimedia.org/wiki/Graphite%23Operations_troubleshooting https://grafana.wikimedia.org/dashboard/db/graphite-eqiad?orgId=1&panelId=21&fullscreen https://grafana.wikimedia.org/dashboard/db/graphite-codfw?orgId=1&panelId=21&fullscreen [10:39:00] (03CR) 10jerkins-bot: [V: 04-1] Added spicerack.kafka with offset transfer function [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [10:39:58] 10SRE-tools, 10Infrastructure-Foundations: wmf-auto-reimage-host on HP gen10 WARNING: unable to verify that BIOS boot parameters are back to normal, got: - https://phabricator.wikimedia.org/T234358 (10Volans) 05Open→03Resolved a:03Volans I'm not sure why I did comment that way back in 2019, but from the... 
[10:41:21] (03PS1) 10Muehlenhoff: Obsolete role::restbase::base [puppet] - 10https://gerrit.wikimedia.org/r/729943 [10:41:27] 10SRE, 10SRE-tools, 10Infrastructure-Foundations, 10observability: cookbook sre.hosts.downtime: add feature to remove downtimes - https://phabricator.wikimedia.org/T251519 (10Volans) 05Open→03Resolved a:03Volans Since a while there is the `sre.hosts.remove-downtime: Remove the Icinga downtime for the... [10:43:49] (03Abandoned) 10Daniel Kinzler: Revert "Introduce CommentFormatter" [core] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/728537 (owner: 10Daniel Kinzler) [10:44:38] 10SRE, 10SRE-tools, 10Infrastructure-Foundations: Exception raised while executing cookbook sre.hosts.downtime - https://phabricator.wikimedia.org/T259158 (10Volans) The reimage scripts have been converted to the sre.hosts.reimage cookbook and don't have anymore the race condition that was present here, henc... [10:44:55] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/729943 (owner: 10Muehlenhoff) [10:45:15] 10SRE, 10SRE-tools, 10Infrastructure-Foundations: Exception raised while executing cookbook sre.hosts.downtime - https://phabricator.wikimedia.org/T259158 (10Volans) 05Open→03Resolved a:03Volans [10:48:11] 10SRE, 10SRE-tools, 10Infrastructure-Foundations: wmf-auto-reimage: 'execution expired' on first puppet run - https://phabricator.wikimedia.org/T201317 (10Volans) @ema do you know if this is still happening? If so it seems more of a puppetization issue than a reimage one, should we remove the SRE-tools tag? [10:50:05] 10SRE, 10SRE-tools, 10Infrastructure-Foundations: wmf-auto-reimage should retry on ipmi failures - https://phabricator.wikimedia.org/T201669 (10Volans) 05Open→03Resolved a:03Volans The reimage scripts have been converted to the sre.hosts.reimage cookbook that checks that a working IPMI connection to th... 
[10:50:51] (03PS2) 10Muehlenhoff: Obsolete role::restbase::base [puppet] - 10https://gerrit.wikimedia.org/r/729943 [10:51:00] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/729943 (owner: 10Muehlenhoff) [10:54:16] 10SRE-tools, 10Infrastructure-Foundations: Better detection for "reboot into PXE failed" conditions in wmf-auto-reimage - https://phabricator.wikimedia.org/T261956 (10Volans) p:05Triage→03Medium The reimage scripts have been converted to the sre.hosts.reimage cookbook. While this issue could still happenin... [10:55:57] (03CR) 10Giuseppe Lavagetto: "AIUI these fonts are only needed for thumbnailing, hence they're not really needed anymore on the main application servers. I am thinking " [puppet] - 10https://gerrit.wikimedia.org/r/728568 (https://phabricator.wikimedia.org/T253600) (owner: 10AntiCompositeNumber) [10:59:51] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31595/console" [puppet] - 10https://gerrit.wikimedia.org/r/714975 (https://phabricator.wikimedia.org/T289661) (owner: 10Jbond) [11:02:10] (03PS3) 10Muehlenhoff: Obsolete role::restbase::base [puppet] - 10https://gerrit.wikimedia.org/r/729943 [11:02:35] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/729943 (owner: 10Muehlenhoff) [11:03:09] (03CR) 10Volans: "recheck" [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [11:09:35] (03PS7) 10Kormat: mariadb: Page for read-only status issues in both DCs [puppet] - 10https://gerrit.wikimedia.org/r/719948 (https://phabricator.wikimedia.org/T290591) [11:10:26] (03PS1) 10Volans: tox: temporarily limit flake8 concurrency [software/spicerack] - 10https://gerrit.wikimedia.org/r/729948 [11:12:54] 10Puppet, 10Infrastructure-Foundations: Admin module should use systemd-sysuser for syustem accounts - 
https://phabricator.wikimedia.org/T292965 (10jbond) p:05Triage→03Lowest [11:13:38] (03PS4) 10Muehlenhoff: Obsolete role::restbase::base [puppet] - 10https://gerrit.wikimedia.org/r/729943 [11:18:01] (03CR) 10Jbond: [C: 04-1] "-1 is just for the wrong uid/gid" [puppet] - 10https://gerrit.wikimedia.org/r/728380 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [11:18:33] 10Puppet, 10Infrastructure-Foundations: Admin module should use systemd-sysuser for system accounts - https://phabricator.wikimedia.org/T292965 (10Lucas_Werkmeister_WMDE) [11:20:31] (03CR) 10Volans: [C: 03+2] "Self merging to unblock CI on other CRs" [software/spicerack] - 10https://gerrit.wikimedia.org/r/729948 (owner: 10Volans) [11:21:47] (03PS8) 10Kormat: mariadb: Page for read-only status issues in both DCs [puppet] - 10https://gerrit.wikimedia.org/r/719948 (https://phabricator.wikimedia.org/T290591) [11:24:56] (03CR) 10Kormat: [V: 03+1 C: 04-2] "PCC SUCCESS (NOOP 8 DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31597/console" [puppet] - 10https://gerrit.wikimedia.org/r/719948 (https://phabricator.wikimedia.org/T290591) (owner: 10Kormat) [11:26:34] (03CR) 10Kormat: [V: 03+1 C: 03+2] "Ok, i'm happy with this now, pcc looks sane." [puppet] - 10https://gerrit.wikimedia.org/r/719948 (https://phabricator.wikimedia.org/T290591) (owner: 10Kormat) [11:26:53] (03Merged) 10jenkins-bot: tox: temporarily limit flake8 concurrency [software/spicerack] - 10https://gerrit.wikimedia.org/r/729948 (owner: 10Volans) [11:28:22] (03CR) 10Jelto: [C: 03+1] "lgtm. We are also managing the config file with puppet now, so we can continue here." 
[puppet] - 10https://gerrit.wikimedia.org/r/725012 (https://phabricator.wikimedia.org/T285363) (owner: 10Hashar) [11:29:54] (03Abandoned) 10Hashar: Enable Content-Security-Policy reporting [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/725900 (https://phabricator.wikimedia.org/T285363) (owner: 10Hashar) [11:30:35] (03Abandoned) 10Hashar: set session lifetime to 604800s (1w) [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/722682 (https://phabricator.wikimedia.org/T288757) (owner: 10Brennen Bearnes) [11:33:11] (03PS6) 10Jelto: modules::gitlab::ssh explicitly add git user with fixed id [puppet] - 10https://gerrit.wikimedia.org/r/728380 (https://phabricator.wikimedia.org/T283076) [11:33:27] (03PS1) 10Btullis: Add the ecs_170 tag to the jupyterjab log pipeline [puppet] - 10https://gerrit.wikimedia.org/r/729957 (https://phabricator.wikimedia.org/T288348) [11:36:15] (03PS1) 10Jgiannelos: tile-pregeneration: Exit envoy sidecar gracefully [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/729959 (https://phabricator.wikimedia.org/T283159) [11:38:49] (03CR) 10Btullis: [C: 03+2] Correct typo in the name of a hadoop worker [puppet] - 10https://gerrit.wikimedia.org/r/728391 (https://phabricator.wikimedia.org/T275767) (owner: 10Btullis) [11:39:29] (03CR) 10Jelto: modules::gitlab::ssh explicitly add git user with fixed id (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/728380 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [11:39:49] (03PS2) 10Jgiannelos: tile-pregeneration: Exit envoy sidecar gracefully [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/729959 (https://phabricator.wikimedia.org/T283159) [11:40:17] 10Puppet, 10Infrastructure-Foundations, 10GitLab (Infrastructure), 10Patch-For-Review, and 3 others: Puppetise gitlab-ansible playbook - https://phabricator.wikimedia.org/T283076 (10hashar) @brennen Given the configuration has been moved from Ansible to Puppet may we archive the Gerrit repo? 
( https://gerr... [11:40:46] 10SRE, 10DBA, 10Observability-Alerting, 10observability, and 2 others: Database read_only alert has a changing service description - https://phabricator.wikimedia.org/T290591 (10Kormat) 05Open→03Resolved a:03Kormat Ok, the read_only alert now happens to have #p.age for both DCs, so the initial reason... [11:41:44] (03PS3) 10Btullis: Increase the number of the Hadoop HDFS Namenode's service handler threads [puppet] - 10https://gerrit.wikimedia.org/r/723490 (https://phabricator.wikimedia.org/T275767) [11:41:57] (03CR) 10Kormat: sre.switchdc.mediawiki: Downtime read-only checks on the DB primaries (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/718936 (https://phabricator.wikimedia.org/T285803) (owner: 10RLazarus) [11:43:20] (03CR) 10Kormat: [C: 03+1] sre.switchdc.mediawiki: Downtime read-only checks on the DB primaries [cookbooks] - 10https://gerrit.wikimedia.org/r/718936 (https://phabricator.wikimedia.org/T285803) (owner: 10RLazarus) [11:43:59] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/31598/" [puppet] - 10https://gerrit.wikimedia.org/r/729943 (owner: 10Muehlenhoff) [11:44:44] (03PS5) 10Muehlenhoff: Obsolete role::restbase::base [puppet] - 10https://gerrit.wikimedia.org/r/729943 [11:48:50] (03CR) 10Btullis: [C: 03+2] Increase the number of the Hadoop HDFS Namenode's service handler threads [puppet] - 10https://gerrit.wikimedia.org/r/723490 (https://phabricator.wikimedia.org/T275767) (owner: 10Btullis) [11:56:54] (03PS5) 10Arturo Borrero Gonzalez: openstack: cinder: refactor configuration file to its own module [puppet] - 10https://gerrit.wikimedia.org/r/728390 (https://phabricator.wikimedia.org/T292546) [11:57:25] (03PS1) 10Filippo Giunchedi: graphite: disable tags support [puppet] - 10https://gerrit.wikimedia.org/r/729968 (https://phabricator.wikimedia.org/T247963) [11:59:56] (03PS1) 10Jbond: systemd::sysuser: refactor to provide some useful defaults [puppet] - 
10https://gerrit.wikimedia.org/r/729970 [12:01:33] (03PS2) 10Ema: cache: enable single backend experiment on cp4021 [puppet] - 10https://gerrit.wikimedia.org/r/710244 (https://phabricator.wikimedia.org/T288106) [12:01:52] (03CR) 10jerkins-bot: [V: 04-1] cache: enable single backend experiment on cp4021 [puppet] - 10https://gerrit.wikimedia.org/r/710244 (https://phabricator.wikimedia.org/T288106) (owner: 10Ema) [12:02:11] (03CR) 10jerkins-bot: [V: 04-1] systemd::sysuser: refactor to provide some useful defaults [puppet] - 10https://gerrit.wikimedia.org/r/729970 (owner: 10Jbond) [12:02:31] (03PS3) 10Ema: cache: enable single backend experiment on cp4021 [puppet] - 10https://gerrit.wikimedia.org/r/710244 (https://phabricator.wikimedia.org/T288106) [12:04:27] !log install apache security updates on bullseye [12:04:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:13:34] (03PS6) 10Arturo Borrero Gonzalez: openstack: cinder: refactor configuration file to its own module [puppet] - 10https://gerrit.wikimedia.org/r/728390 (https://phabricator.wikimedia.org/T292546) [12:21:53] (03PS1) 10Filippo Giunchedi: graphite: move production to /srv/carbon as storage directory [puppet] - 10https://gerrit.wikimedia.org/r/729975 (https://phabricator.wikimedia.org/T247963) [12:23:11] (03CR) 10Filippo Giunchedi: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31603/console" [puppet] - 10https://gerrit.wikimedia.org/r/729975 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [12:23:13] (03PS6) 10Hashar: gitlab: enable Content-Security-Policy reporting [puppet] - 10https://gerrit.wikimedia.org/r/725012 (https://phabricator.wikimedia.org/T285363) [12:23:25] (03CR) 10Hashar: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/725012 (https://phabricator.wikimedia.org/T285363) (owner: 10Hashar) [12:23:55] (03CR) 10Hashar: "I have amended the patch to add some hiera 
settings. CSP is disabled on gitlab.wikimedia.org and enabled in report only mode on gitlab-re" [puppet] - 10https://gerrit.wikimedia.org/r/725012 (https://phabricator.wikimedia.org/T285363) (owner: 10Hashar) [12:25:18] (03CR) 10Filippo Giunchedi: [V: 03+1] "cc Jaime for heads up re: changing backup fileset name" [puppet] - 10https://gerrit.wikimedia.org/r/729975 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [12:28:41] (03PS34) 10ZPapierski: Added spicerack.kafka with offset transfer function [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) [12:30:29] (03PS2) 10Jbond: systemd::sysuser: refactor to provide some useful defaults [puppet] - 10https://gerrit.wikimedia.org/r/729970 [12:31:19] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31604/console" [puppet] - 10https://gerrit.wikimedia.org/r/729970 (owner: 10Jbond) [12:33:02] (03PS7) 10Arturo Borrero Gonzalez: openstack: cinder: refactor configuration file to its own module [puppet] - 10https://gerrit.wikimedia.org/r/728390 (https://phabricator.wikimedia.org/T292546) [12:33:55] (03PS3) 10Jbond: systemd::sysuser: refactor to provide some useful defaults [puppet] - 10https://gerrit.wikimedia.org/r/729970 [12:34:18] (03CR) 10Jelto: [C: 03+1] "lgtm. 
Let me know when you need support for merge and deploy" [puppet] - 10https://gerrit.wikimedia.org/r/725012 (https://phabricator.wikimedia.org/T285363) (owner: 10Hashar) [12:34:44] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31605/console" [puppet] - 10https://gerrit.wikimedia.org/r/729970 (owner: 10Jbond) [12:35:58] (03CR) 10jerkins-bot: [V: 04-1] systemd::sysuser: refactor to provide some useful defaults [puppet] - 10https://gerrit.wikimedia.org/r/729970 (owner: 10Jbond) [12:36:55] (03CR) 10Arturo Borrero Gonzalez: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/31606/" [puppet] - 10https://gerrit.wikimedia.org/r/728390 (https://phabricator.wikimedia.org/T292546) (owner: 10Arturo Borrero Gonzalez) [12:41:15] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM (though deferring to Hugh)" [puppet] - 10https://gerrit.wikimedia.org/r/729943 (owner: 10Muehlenhoff) [12:45:16] !log cp4027: upgrade varnish to 6.0.8 T292290 [12:45:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:22] T292290: Package and deploy Varnish 6.0.8 - https://phabricator.wikimedia.org/T292290 [12:49:27] !log Setting up BGP peering to AS12552 (GlobalConnect Group) at AMS-IX on cr2-esams [12:49:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:43] (03CR) 10Volans: "reply to question inline" [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [12:53:47] !log install apache security updates on buster [12:53:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:54:23] (03CR) 10Volans: Added spicerack.kafka with offset transfer function (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [13:04:03] (03CR) 10David Caro: "Just a question about the PCC, the rest 
are nits, nice!" [puppet] - 10https://gerrit.wikimedia.org/r/728390 (https://phabricator.wikimedia.org/T292546) (owner: 10Arturo Borrero Gonzalez) [13:08:11] (03CR) 10Jelto: [C: 03+2] gitlab: enable Content-Security-Policy reporting [puppet] - 10https://gerrit.wikimedia.org/r/725012 (https://phabricator.wikimedia.org/T285363) (owner: 10Hashar) [13:21:57] (03CR) 10Jcrespo: [C: 03+1] "+1 for the backup changes." [puppet] - 10https://gerrit.wikimedia.org/r/729975 (https://phabricator.wikimedia.org/T247963) (owner: 10Filippo Giunchedi) [13:29:28] (03PS3) 10Effie Mouzeli: mediawiki::mcrouter_wancache: disable ssl listening on mcrouter [puppet] - 10https://gerrit.wikimedia.org/r/727370 [13:37:50] 10SRE-Access-Requests: Requesting access to RESOURCE for USER[S] - https://phabricator.wikimedia.org/T292992 (10KCVelaga_WMF) [13:38:10] 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for KCVelaga (WMF) - https://phabricator.wikimedia.org/T292992 (10KCVelaga_WMF) [13:42:12] !log force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - T288825 [13:42:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:42:17] T288825: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 [13:42:29] (03PS1) 10David Caro: puppet.PuppetHost.get_ca_server: use only the last line [software/spicerack] - 10https://gerrit.wikimedia.org/r/729990 (https://phabricator.wikimedia.org/T292465) [13:48:14] 10SRE, 10serviceops, 10wikidiff2, 10Community-Tech (CommTech-Sprint-11), 10Platform Team Workboards (Platform Engineering Reliability): Deploy wikidiff2 1.13.0 - https://phabricator.wikimedia.org/T285857 (10ldelench_wmf) [13:50:38] (03CR) 10Effie Mouzeli: [C: 03+2] mediawiki::mcrouter_wancache: disable ssl listening on mcrouter [puppet] - 10https://gerrit.wikimedia.org/r/727370 (owner: 10Effie Mouzeli) [13:50:53] (03PS4) 10Effie Mouzeli: 
mediawiki::mcrouter_wancache: disable ssl listening on mcrouter [puppet] - 10https://gerrit.wikimedia.org/r/727370 [13:51:49] (03PS1) 10David Caro: wmcs::instance: use path to allow different systemctl paths [puppet] - 10https://gerrit.wikimedia.org/r/729994 (https://phabricator.wikimedia.org/T292465) [13:52:39] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [13:53:06] (03PS1) 10Jbond: systemd::sysuser: add more error checking [puppet] - 10https://gerrit.wikimedia.org/r/729995 [13:53:56] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31607/console" [puppet] - 10https://gerrit.wikimedia.org/r/729995 (owner: 10Jbond) [13:55:48] (03CR) 10David Caro: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31608/console" [puppet] - 10https://gerrit.wikimedia.org/r/729994 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [13:57:40] (03CR) 10David Caro: [V: 03+1 C: 03+2] wmcs::instance: use path to allow different systemctl paths [puppet] - 10https://gerrit.wikimedia.org/r/729994 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [14:06:38] (03CR) 10Volans: "question inline" [software/spicerack] - 10https://gerrit.wikimedia.org/r/729990 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [14:12:08] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [14:13:35] (03PS2) 10Jbond: systemd::sysuser: add more error checking [puppet] - 10https://gerrit.wikimedia.org/r/729995 [14:13:42] (03CR) 10Volans: "Thanks" [puppet] - 10https://gerrit.wikimedia.org/r/727305 (owner: 10Jbond) [14:14:26] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31609/console" 
[puppet] - 10https://gerrit.wikimedia.org/r/729995 (owner: 10Jbond) [14:15:08] (03PS10) 10Hnowlan: Add script to send tile invalidation events [puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) (owner: 10Jgiannelos) [14:15:50] (03PS8) 10Arturo Borrero Gonzalez: openstack: cinder: refactor configuration file to its own module [puppet] - 10https://gerrit.wikimedia.org/r/728390 (https://phabricator.wikimedia.org/T292546) [14:16:12] (03CR) 10Arturo Borrero Gonzalez: openstack: cinder: refactor configuration file to its own module (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/728390 (https://phabricator.wikimedia.org/T292546) (owner: 10Arturo Borrero Gonzalez) [14:19:51] (03CR) 10Hnowlan: Add script to send tile invalidation events (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) (owner: 10Jgiannelos) [14:20:29] (03PS4) 10Jbond: systemd::sysuser: refactor to provide some useful defaults [puppet] - 10https://gerrit.wikimedia.org/r/729970 [14:20:49] (03CR) 10Arturo Borrero Gonzalez: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/31610/" [puppet] - 10https://gerrit.wikimedia.org/r/728390 (https://phabricator.wikimedia.org/T292546) (owner: 10Arturo Borrero Gonzalez) [14:21:21] (03CR) 10Volans: [WIP] Add kafka position transfer to wdqs cookbooks (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/727021 (https://phabricator.wikimedia.org/T276469) (owner: 10ZPapierski) [14:23:40] (03CR) 10Jbond: "thanks" [puppet] - 10https://gerrit.wikimedia.org/r/728380 (https://phabricator.wikimedia.org/T283076) (owner: 10Jelto) [14:24:18] (03CR) 10Jbond: [C: 03+1] sre.hosts.reimage: add OS to runtime description [cookbooks] - 10https://gerrit.wikimedia.org/r/729942 (owner: 10Volans) [14:25:15] (03CR) 10David Caro: [C: 03+1] "LGTM!" 
[puppet] - 10https://gerrit.wikimedia.org/r/728390 (https://phabricator.wikimedia.org/T292546) (owner: 10Arturo Borrero Gonzalez) [14:26:39] 10SRE, 10SRE-swift-storage, 10ops-codfw: Spontaneous reboot of ms-be2045 - https://phabricator.wikimedia.org/T290881 (10MatthewVernon) a:05Papaul→03MatthewVernon @Papaul system was stable over the weekend, so I'll take this ticket and start restoring this system to the Swift rings. Thanks! [14:26:42] (03PS1) 10MVernon: codfw-prod: start re-adding weight to ms-be2045 [software/swift-ring] - 10https://gerrit.wikimedia.org/r/730000 (https://phabricator.wikimedia.org/T290881) [14:26:54] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/729943 (owner: 10Muehlenhoff) [14:29:49] (03CR) 10MVernon: "Hi," [software/swift-ring] - 10https://gerrit.wikimedia.org/r/730000 (https://phabricator.wikimedia.org/T290881) (owner: 10MVernon) [14:30:03] (03CR) 10David Caro: puppet.PuppetHost.get_ca_server: use only the last line (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/729990 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [14:30:08] (03PS1) 10Urbanecm: UncachedMenteeOverviewDataProvider::getFilteredMenteesForMentor: Cast IDs to ints [extensions/GrowthExperiments] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/729914 (https://phabricator.wikimedia.org/T290609) [14:30:55] (03PS1) 10Urbanecm: updateMenteeData: Collect more profiling data [extensions/GrowthExperiments] (wmf/1.38.0-wmf.3) - 10https://gerrit.wikimedia.org/r/729915 (https://phabricator.wikimedia.org/T290609) [14:32:20] (03CR) 10Alexandros Kosiaris: [C: 03+1] logstash: move kubernetes_docker parsing to priority 15 [puppet] - 10https://gerrit.wikimedia.org/r/728683 (https://phabricator.wikimedia.org/T292099) (owner: 10Cwhite) [14:32:31] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM!" 
[software/swift-ring] - 10https://gerrit.wikimedia.org/r/730000 (https://phabricator.wikimedia.org/T290881) (owner: 10MVernon) [14:32:32] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [14:32:40] (03CR) 10David Caro: puppet.PuppetHost.get_ca_server: use only the last line (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/729990 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [14:33:13] (03CR) 10Filippo Giunchedi: [C: 03+1] logstash: kafka input: add manage_truststore parameter [puppet] - 10https://gerrit.wikimedia.org/r/727625 (https://phabricator.wikimedia.org/T288618) (owner: 10Cwhite) [14:34:24] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [14:35:17] (03CR) 10MVernon: [V: 03+2 C: 03+2] codfw-prod: start re-adding weight to ms-be2045 [software/swift-ring] - 10https://gerrit.wikimedia.org/r/730000 (https://phabricator.wikimedia.org/T290881) (owner: 10MVernon) [14:36:10] 10SRE, 10Infrastructure-Foundations: Integrate Bullseye 11.1 point update - https://phabricator.wikimedia.org/T292844 (10MoritzMuehlenhoff) [14:36:30] !log start restoring weight to ms-be2045 T290881 [14:36:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:36] T290881: Spontaneous reboot of ms-be2045 - https://phabricator.wikimedia.org/T290881 [14:36:36] (03PS3) 10Jbond: systemd::sysuser: add more error checking [puppet] - 10https://gerrit.wikimedia.org/r/729995 [14:41:17] (03Abandoned) 10David Caro: cinderutils::ensure: Check falsey instead of empty string [puppet] - 10https://gerrit.wikimedia.org/r/729926 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [14:42:22] (03PS1) 10David Caro: wmcs-prepare-cinder-volume.py: chown also after mounting [puppet] - 10https://gerrit.wikimedia.org/r/730001 
(https://phabricator.wikimedia.org/T292465) [14:44:09] (03CR) 10Jelto: hiera::role::common::kubernetes add helm3 deploy users (031 comment) [labs/private] - 10https://gerrit.wikimedia.org/r/726862 (https://phabricator.wikimedia.org/T251305) (owner: 10Jelto) [14:44:11] (03PS35) 10ZPapierski: Added spicerack.kafka with offset transfer function [software/spicerack] - 10https://gerrit.wikimedia.org/r/723214 (https://phabricator.wikimedia.org/T276469) [14:44:47] (03PS1) 10Giuseppe Lavagetto: scaffold: bump common templates version [deployment-charts] - 10https://gerrit.wikimedia.org/r/730002 [14:47:15] 10SRE, 10serviceops, 10MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), 10Patch-For-Review, 10Platform Engineering (Icebox): Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (10akosiaris) >>! In T242855#7415161, @Aklapper wrote: > Half a year later, is there anyone feeling kind of responsible to p... [14:51:08] (03CR) 10Giuseppe Lavagetto: [C: 04-1] Rename main cluster to services (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/725003 (owner: 10Alexandros Kosiaris) [14:52:16] (03PS4) 10Jbond: systemd::sysuser: add more error checking [puppet] - 10https://gerrit.wikimedia.org/r/729995 [14:54:32] (03CR) 10David Caro: [C: 03+2] wmcs-prepare-cinder-volume.py: chown also after mounting [puppet] - 10https://gerrit.wikimedia.org/r/730001 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [14:56:54] (03CR) 10Giuseppe Lavagetto: [C: 03+1] hiera::role::common::kubernetes add helm3 deploy users (031 comment) [labs/private] - 10https://gerrit.wikimedia.org/r/726862 (https://phabricator.wikimedia.org/T251305) (owner: 10Jelto) [14:59:08] (03CR) 10Volans: [C: 03+2] sre.hosts.reimage: add OS to runtime description [cookbooks] - 10https://gerrit.wikimedia.org/r/729942 (owner: 10Volans) [14:59:45] !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests [14:59:49] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:59:51] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests [14:59:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:11] (03Merged) 10jenkins-bot: sre.hosts.reimage: add OS to runtime description [cookbooks] - 10https://gerrit.wikimedia.org/r/729942 (owner: 10Volans) [15:04:25] (03CR) 10Giuseppe Lavagetto: [C: 03+1] toolhub: Crawler CronJob concurrencyPolicy back to Forbid [deployment-charts] - 10https://gerrit.wikimedia.org/r/729891 (https://phabricator.wikimedia.org/T292861) (owner: 10BryanDavis) [15:06:03] RECOVERY - haproxy failover on dbproxy1019 is OK: OK check_failover servers up 15 down 0 https://wikitech.wikimedia.org/wiki/HAProxy [15:10:25] PROBLEM - haproxy failover on dbproxy1019 is CRITICAL: CRITICAL check_failover servers up 15 down 1 https://wikitech.wikimedia.org/wiki/HAProxy [15:11:51] 10SRE, 10Wikimedia-Mailing-lists, 10User-Ladsgroup: Create coolest-tool-academy mailing list for Coolest Tool Award - https://phabricator.wikimedia.org/T290511 (10Aklapper) [15:12:25] (03PS1) 10Hashar: role: system::role for all mediawiki roles [puppet] - 10https://gerrit.wikimedia.org/r/730004 [15:12:47] (03CR) 10Hashar: Split canary jobrunner to their own role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/724694 (https://phabricator.wikimedia.org/T291870) (owner: 10Hashar) [15:12:55] (03CR) 10jerkins-bot: [V: 04-1] role: system::role for all mediawiki roles [puppet] - 10https://gerrit.wikimedia.org/r/730004 (owner: 10Hashar) [15:13:22] ACKNOWLEDGEMENT - haproxy failover on dbproxy1019 is CRITICAL: CRITICAL check_failover servers up 15 down 1 Kormat Acking https://wikitech.wikimedia.org/wiki/HAProxy [15:13:23] (03PS2) 10Hashar: Split canary jobrunner to their own role [puppet] - 10https://gerrit.wikimedia.org/r/724694 
(https://phabricator.wikimedia.org/T291870) [15:15:12] (PS2) Hashar: role: system::role for all mediawiki roles [puppet] - https://gerrit.wikimedia.org/r/730004 [15:15:25] (CR) MSantos: [C: +1] tile-pregeneration: Exit envoy sidecar gracefully [software/tegola] (wmf/v0.14.x) - https://gerrit.wikimedia.org/r/729959 (https://phabricator.wikimedia.org/T283159) (owner: Jgiannelos) [15:19:21] (PS1) David Caro: cinderutils::ensure: give more info when no device found [puppet] - https://gerrit.wikimedia.org/r/730005 (https://phabricator.wikimedia.org/T292465) [15:19:25] PROBLEM - Check systemd state on ms-be2043 is CRITICAL: CRITICAL - degraded: The following units failed: session-212429.scope https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:23:25] (PS1) Volans: sre.hosts.reimage: check installed OS version [cookbooks] - https://gerrit.wikimedia.org/r/730008 [15:26:39] (PS1) Jbond: systemd::sysuser: also manage a user resource [puppet] - https://gerrit.wikimedia.org/r/730012 [15:27:01] (CR) AntiCompositeNumber: mediawiki::packages::fonts: replace fonts-liberation with fonts-liberation2 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/728568 (https://phabricator.wikimedia.org/T253600) (owner: AntiCompositeNumber) [15:27:46] (CR) jerkins-bot: [V: -1] systemd::sysuser: also manage a user resource [puppet] - https://gerrit.wikimedia.org/r/730012 (owner: Jbond) [15:28:04] (CR) Jbond: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/730008 (owner: Volans) [15:28:57] (CR) Volans: [C: +1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/729990 (https://phabricator.wikimedia.org/T292465) (owner: David Caro) [15:29:14] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31611/console" [puppet] - https://gerrit.wikimedia.org/r/730012 (owner: Jbond) [15:29:25] (CR) Jbond:
systemd::sysuser: also manage a user resource [puppet] - https://gerrit.wikimedia.org/r/730012 (owner: Jbond) [15:29:31] (PS2) Jbond: systemd::sysuser: also manage a user resource [puppet] - https://gerrit.wikimedia.org/r/730012 [15:32:01] (PS2) Jbond: base::sysctl::core_dumps: move core_dumps to their own class [puppet] - https://gerrit.wikimedia.org/r/728457 (owner: David Caro) [15:32:38] RECOVERY - Check systemd state on ms-be2043 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:33:20] (PS4) Jelto: hiera::role::common::kubernetes add helm3 deploy users [labs/private] - https://gerrit.wikimedia.org/r/726862 (https://phabricator.wikimedia.org/T251305) [15:34:00] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet [15:34:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:34:15] (CR) Jelto: [C: +2] hiera::role::common::kubernetes add helm3 deploy users [labs/private] - https://gerrit.wikimedia.org/r/726862 (https://phabricator.wikimedia.org/T251305) (owner: Jelto) [15:34:33] (PS1) Vgutierrez: acme_chief: Enable file and systemd watchdogs [puppet] - https://gerrit.wikimedia.org/r/730016 (https://phabricator.wikimedia.org/T292619) [15:34:35] (CR) Jelto: [V: +2 C: +2] hiera::role::common::kubernetes add helm3 deploy users [labs/private] - https://gerrit.wikimedia.org/r/726862 (https://phabricator.wikimedia.org/T251305) (owner: Jelto) [15:35:10] (CR) jerkins-bot: [V: -1] acme_chief: Enable file and systemd watchdogs [puppet] - https://gerrit.wikimedia.org/r/730016 (https://phabricator.wikimedia.org/T292619) (owner: Vgutierrez) [15:36:46] (CR) David Caro: [C: +2] puppet.PuppetHost.get_ca_server: use only the last line [software/spicerack] - https://gerrit.wikimedia.org/r/729990 (https://phabricator.wikimedia.org/T292465) (owner: David Caro) [15:37:18]
(CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/728457 (owner: David Caro) [15:40:16] (PS2) Vgutierrez: acme_chief: Enable file and systemd watchdogs [puppet] - https://gerrit.wikimedia.org/r/730016 (https://phabricator.wikimedia.org/T292619) [15:40:44] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet [15:40:46] (CR) jerkins-bot: [V: -1] acme_chief: Enable file and systemd watchdogs [puppet] - https://gerrit.wikimedia.org/r/730016 (https://phabricator.wikimedia.org/T292619) (owner: Vgutierrez) [15:40:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:48] PROBLEM - Check systemd state on ms-be2055 is CRITICAL: CRITICAL - degraded: The following units failed: session-212341.scope https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [15:41:57] (PS3) Jbond: schemas - metrics: Add puppet keys to the metrics name space [software/ecs] - https://gerrit.wikimedia.org/r/722873 (https://phabricator.wikimedia.org/T222826) [15:42:15] (CR) Jbond: "updated thanks" [software/ecs] - https://gerrit.wikimedia.org/r/722873 (https://phabricator.wikimedia.org/T222826) (owner: Jbond) [15:43:17] (PS3) Vgutierrez: acme_chief: Enable file and systemd watchdogs [puppet] - https://gerrit.wikimedia.org/r/730016 (https://phabricator.wikimedia.org/T292619) [15:49:04] (CR) Alexandros Kosiaris: [C: +1] "+1 and thanks for this!"
[puppet] - https://gerrit.wikimedia.org/r/728648 (owner: Dzahn) [16:02:01] (PS1) Jelto: hiera::deployment_server add helm3 deploy user to deployment server [puppet] - https://gerrit.wikimedia.org/r/730022 (https://phabricator.wikimedia.org/T251305) [16:03:10] (PS1) Volans: sre.hosts.reimage: update Netbox status [cookbooks] - https://gerrit.wikimedia.org/r/730023 [16:04:40] (CR) Jelto: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31614/console" [puppet] - https://gerrit.wikimedia.org/r/730022 (https://phabricator.wikimedia.org/T251305) (owner: Jelto) [16:06:54] (PS8) Jbond: interface: update rps script to also set the number of queues via ethtool [puppet] - https://gerrit.wikimedia.org/r/662688 (https://phabricator.wikimedia.org/T236208) [16:10:02] (PS3) Jbond: P:sre::check_user: add support for wikitech querys [puppet] - https://gerrit.wikimedia.org/r/720056 [16:11:32] (CR) jerkins-bot: [V: -1] P:sre::check_user: add support for wikitech querys [puppet] - https://gerrit.wikimedia.org/r/720056 (owner: Jbond) [16:15:21] (CR) Arturo Borrero Gonzalez: [C: +2] openstack: cinder: refactor configuration file to its own module [puppet] - https://gerrit.wikimedia.org/r/728390 (https://phabricator.wikimedia.org/T292546) (owner: Arturo Borrero Gonzalez) [16:18:57] (PS5) Arturo Borrero Gonzalez: cloudbackup: deploy cinder-backup service [puppet] - https://gerrit.wikimedia.org/r/728400 (https://phabricator.wikimedia.org/T292546) [16:19:28] (CR) jerkins-bot: [V: -1] cloudbackup: deploy cinder-backup service [puppet] - https://gerrit.wikimedia.org/r/728400 (https://phabricator.wikimedia.org/T292546) (owner: Arturo Borrero Gonzalez) [16:19:40] (PS1) Volans: Revert "tox: temporarily limit flake8 concurrency" [software/spicerack] - https://gerrit.wikimedia.org/r/729918 [16:20:04] RECOVERY - Check systemd state on ms-be2055 is OK: OK -
running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [16:20:53] (CR) JMeybohm: [C: +1] hiera::deployment_server add helm3 deploy user to deployment server [puppet] - https://gerrit.wikimedia.org/r/730022 (https://phabricator.wikimedia.org/T251305) (owner: Jelto) [16:28:29] (PS1) David Caro: p:base: Move the reboot-host script there [puppet] - https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465) [16:29:12] (PS4) Jbond: P:sre::check_user: add support for wikitech query [puppet] - https://gerrit.wikimedia.org/r/720056 [16:30:42] (CR) jerkins-bot: [V: -1] P:sre::check_user: add support for wikitech query [puppet] - https://gerrit.wikimedia.org/r/720056 (owner: Jbond) [16:30:44] (CR) Jbond: [C: +1] "was quick 😊" [software/spicerack] - https://gerrit.wikimedia.org/r/729918 (owner: Volans) [16:31:00] (CR) Volans: [C: +2] Revert "tox: temporarily limit flake8 concurrency" [software/spicerack] - https://gerrit.wikimedia.org/r/729918 (owner: Volans) [16:32:27] (CR) David Caro: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31615/console" [puppet] - https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465) (owner: David Caro) [16:32:56] (PS5) Jbond: P:sre::check_user: add support for wikitech query [puppet] - https://gerrit.wikimedia.org/r/720056 [16:33:34] (PS2) David Caro: puppet.PuppetHost.get_ca_server: use only the last line [software/spicerack] - https://gerrit.wikimedia.org/r/729990 (https://phabricator.wikimedia.org/T292465) [16:34:46] (PS6) Jbond: P:sre::check_user: add support for wikitech query [puppet] - https://gerrit.wikimedia.org/r/720056 [16:35:03] (CR) David Caro: [C: +1] puppet.PuppetHost.get_ca_server: use only the last line [software/spicerack] - https://gerrit.wikimedia.org/r/729990
(https://phabricator.wikimedia.org/T292465) (owner: David Caro) [16:35:07] (CR) David Caro: [C: +2] puppet.PuppetHost.get_ca_server: use only the last line [software/spicerack] - https://gerrit.wikimedia.org/r/729990 (https://phabricator.wikimedia.org/T292465) (owner: David Caro) [16:39:00] (CR) Jbond: [C: +1] "LGTM but see comment" [puppet] - https://gerrit.wikimedia.org/r/730025 (https://phabricator.wikimedia.org/T292465) (owner: David Caro) [16:39:19] (CR) Jbond: [C: +2] P:sre::check_user: add support for wikitech query [puppet] - https://gerrit.wikimedia.org/r/720056 (owner: Jbond) [16:39:24] (Merged) jenkins-bot: Revert "tox: temporarily limit flake8 concurrency" [software/spicerack] - https://gerrit.wikimedia.org/r/729918 (owner: Volans) [16:40:39] (CR) Jbond: "volans: post-review would be welcome, one thing I'm curious about is if you think this should be a cookbook and if so should we try and ge" [puppet] - https://gerrit.wikimedia.org/r/720056 (owner: Jbond) [16:43:48] jbond: ack I'll have a look tomorrow [16:44:06] thanks <3 [16:48:05] (PS1) Jbond: check_user: fix f-string [puppet] - https://gerrit.wikimedia.org/r/730028 [16:55:14] (CR) Jbond: [C: +2] check_user: fix f-string [puppet] - https://gerrit.wikimedia.org/r/730028 (owner: Jbond) [16:58:49] (PS1) Volans: dhcp: add support for MAC address based config [software/spicerack] - https://gerrit.wikimedia.org/r/730030 (https://phabricator.wikimedia.org/T269855) [17:00:21] SRE, Anti-Harassment, IP Info, serviceops, Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (phuedx) >>! In T288844#7407062, @Dzahn wrote: > It would now be possible to test the IPInfo extension using that...
[17:08:37] !log force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - T288825 [17:08:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:08:44] T288825: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 [17:13:31] SRE, User-herron: Rebalance kafka partitions in main-{eqiad,codfw} clusters - https://phabricator.wikimedia.org/T288825 (elukey) Topics left to move for main-codfw: ` codfw.change-prop.transcludes.resource-change.json codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite.json codfw.cpjobq... [17:30:15] (PS1) Jbond: check_user: add error handeling for user that don't exist [puppet] - https://gerrit.wikimedia.org/r/730031 [17:33:22] (CR) Jbond: [C: +2] check_user: add error handeling for user that don't exist [puppet] - https://gerrit.wikimedia.org/r/730031 (owner: Jbond) [17:46:00] (PS1) Jbond: check_user: fix bug [puppet] - https://gerrit.wikimedia.org/r/730032 [17:46:36] (CR) Jbond: [C: +2] check_user: fix bug [puppet] - https://gerrit.wikimedia.org/r/730032 (owner: Jbond) [17:46:42] (CR) Jbond: [V: +2 C: +2] check_user: fix bug [puppet] - https://gerrit.wikimedia.org/r/730032 (owner: Jbond) [17:48:48] (PS1) Jbond: check_user: need an 'f' for f-strings [puppet] - https://gerrit.wikimedia.org/r/730033 [17:49:01] (CR) Jbond: [V: +2 C: +2] check_user: need an 'f' for f-strings [puppet] - https://gerrit.wikimedia.org/r/730033 (owner: Jbond) [17:57:47] (CR) Krinkle: [C: +1] "The idea was that if $_SERVER is misconfigured, from the external client POV, it is a server error."
[mediawiki-config] - https://gerrit.wikimedia.org/r/728552 (owner: Giuseppe Lavagetto) [18:27:32] (PS4) Krinkle: static.php: Add support for /static/current rewrites [mediawiki-config] - https://gerrit.wikimedia.org/r/728553 (https://phabricator.wikimedia.org/T285232) (owner: Giuseppe Lavagetto) [18:27:38] (CR) Krinkle: [C: +1] static.php: Add support for /static/current rewrites [mediawiki-config] - https://gerrit.wikimedia.org/r/728553 (https://phabricator.wikimedia.org/T285232) (owner: Giuseppe Lavagetto) [18:28:15] (PS5) Krinkle: static.php: Add support for /static/current rewrites [mediawiki-config] - https://gerrit.wikimedia.org/r/728553 (https://phabricator.wikimedia.org/T285232) (owner: Giuseppe Lavagetto) [18:32:42] SRE, MediaWiki-General, Platform Engineering Code Jam, Platform Engineering Roadmap Decision Making, Performance-Team (Radar): Allow easier ICU transitions in MediaWiki (change how sortkey collation is managed in the categorylinks table) - https://phabricator.wikimedia.org/T263437 (Krinkle) [19:01:09] (PS1) Zabe: build: Upgrade composer testing stack to latest as used Wikimedia-wide [mediawiki-config] - https://gerrit.wikimedia.org/r/730038 [19:01:36] (CR) jerkins-bot: [V: -1] build: Upgrade composer testing stack to latest as used Wikimedia-wide [mediawiki-config] - https://gerrit.wikimedia.org/r/730038 (owner: Zabe) [19:24:27] (CR) Zabe: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/730038 (owner: Zabe) [20:37:33] SRE, ops-eqiad, Analytics-Clusters: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (Jclark-ctr) @BTullis I am available tomorrow morning 2:00 PM UTC. 10AM EST [20:58:08] !log btullis@cumin1001 START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
- btullis@cumin1001 [20:58:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:04:03] (PS11) Huji: Temporarily disable article editing by anonymous users on fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/721108 (https://phabricator.wikimedia.org/T291018) [21:05:51] (CR) Huji: "I did further testing and realized that this requires a right, not a group, to work. The key right in MediaWiki that all logged-in users h" [mediawiki-config] - https://gerrit.wikimedia.org/r/721108 (https://phabricator.wikimedia.org/T291018) (owner: Huji) [21:25:14] !log btullis@cumin1001 END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001 [21:25:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:05:50] PROBLEM - Check systemd state on ms-be1043 is CRITICAL: CRITICAL - degraded: The following units failed: session-210692.scope https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [22:16:16] RECOVERY - Check systemd state on ms-be1043 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [22:22:12] SRE, Performance-Team, Traffic, User-ema: Package and deploy Varnish 6.0.8 - https://phabricator.wikimedia.org/T292290 (Krinkle) I've made some improvements to the by-host dash that may be of use: [22:22:59] Hello folks! [22:23:02] In tomorrow's deployments, is the "Puppet request window" a replacement that does backport as well? And what is the difference between the "UTC evening backport window" and the "Puppet request window"?
[22:24:55] the puppet window is for changes to puppet [22:25:12] (not related to MediaWiki) [22:25:16] (the configuration for the software we use to manage server configuration) [22:25:40] Okay [22:25:53] changes to MediaWiki configuration should go in the backport windows, they were renamed from this week [22:26:41] Oh I see, thanks for answering!