[00:00:04] RoanKattouw and Urbanecm: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for UTC late backport window . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211109T0000). [00:00:04] Juan_90264 and Seddon: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [00:04:43] Am present [00:05:04] I'm present [00:09:44] jouncebot: now [00:09:44] For the next 0 hour(s) and 50 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211109T0000) [00:19:13] legoktm: ? [00:19:18] tgr: ? [00:19:44] tgr_: ? [00:19:45] sorry, I'm working on some other stuff right now [00:20:18] No problem, legoktm [00:35:20] RECOVERY - Disk space on webperf1002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=webperf1002&var-datasource=eqiad+prometheus/ops [00:37:34] RECOVERY - Disk space on webperf2002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=webperf2002&var-datasource=codfw+prometheus/ops [01:03:44] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [01:09:52] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.70 ms [01:47:00] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [01:47:26] PROBLEM - Maps tiles generation on alert1001 is CRITICAL: CRITICAL: 90.07% of data under the critical threshold [5.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=8&fullscreen&orgId=1 [02:00:51] (03PS1) 10Legoktm: planet: Add Tyler Cipriani's 
blog to en [puppet] - 10https://gerrit.wikimedia.org/r/737533 [02:05:42] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [02:05:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:06:50] (03PS1) 10TrainBranchBot: Branch commit for wmf/1.38.0-wmf.8 [core] (wmf/1.38.0-wmf.8) - 10https://gerrit.wikimedia.org/r/737534 [02:06:52] (03CR) 10TrainBranchBot: [C: 03+2] Branch commit for wmf/1.38.0-wmf.8 [core] (wmf/1.38.0-wmf.8) - 10https://gerrit.wikimedia.org/r/737534 (owner: 10TrainBranchBot) [02:09:12] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [02:09:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:17:27] PROBLEM - k8s API server requests latencies on kubemaster2001 is CRITICAL: instance=10.192.0.56 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [02:19:30] RECOVERY - k8s API server requests latencies on kubemaster2001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [02:24:24] (03CR) 10Jforrester: Extract reused dblists code into function (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737210 (owner: 10Awight) [02:29:14] (03Merged) 10jenkins-bot: Branch commit for wmf/1.38.0-wmf.8 [core] (wmf/1.38.0-wmf.8) - 10https://gerrit.wikimedia.org/r/737534 (owner: 10TrainBranchBot) [02:34:09] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[02:34:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:36:39] (03CR) 10Legoktm: php: allow installing multiple php versions at the same time (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/736276 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [02:37:36] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [02:37:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:40:51] (03CR) 10Legoktm: profile::mediawiki::php: Allow running multiple php versions in parallel (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/736948 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [02:45:51] (03CR) 10Legoktm: mediawiki::php: support multiple php version in monitoring too (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [02:52:55] 10SRE, 10Wikimedia-Mailing-lists: Wikipedia-l list needs owners - https://phabricator.wikimedia.org/T295244 (10Legoktm) a:03Quiddity >>! In T295244#7488212, @Quiddity wrote: > Yes, I'm willing to become an owner for the list, but I request a 2nd (and ideally 3rd) owner join to avoid SPOF. {{done}} - maybe r... 
[03:10:18] PROBLEM - snapshot of s3 in codfw on alert1001 is CRITICAL: snapshot for s3 at codfw taken more than 3 days ago: Most recent backup 2021-11-06 03:01:09 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting [03:41:49] (03PS1) 10Razzi: presto: enable ui over http [puppet] - 10https://gerrit.wikimedia.org/r/737538 (https://phabricator.wikimedia.org/T292087) [03:42:41] (03PS2) 10Razzi: presto: enable ui over http [puppet] - 10https://gerrit.wikimedia.org/r/736503 (https://phabricator.wikimedia.org/T292087) [03:49:06] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [03:49:48] 10SRE, 10Wikimedia-Mailing-lists: Request to create new mailing lists for ZHAFC Project - https://phabricator.wikimedia.org/T294676 (10Legoktm) >>! In T294676#7481421, @Jonathan5566 wrote: > To be clear, what kind of on-wiki dissociation will SRE like to see? Will we need to do that on VP? Or consensus between... 
[04:23:34] PROBLEM - Disk space on stat1006 is CRITICAL: DISK CRITICAL - free space: / 1459 MB (1% inode=92%): /tmp 1459 MB (1% inode=92%): /var/tmp 1459 MB (1% inode=92%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=stat1006&var-datasource=eqiad+prometheus/ops [04:24:50] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event_sanitized_analytics_delayed.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:28:02] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 81, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [04:28:17] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 233, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [04:35:14] PROBLEM - Check systemd state on stat1006 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:37:32] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 237, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [04:38:50] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:19:04] RECOVERY - Check systemd state on stat1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:25:26] PROBLEM - Check systemd state on stat1006 is CRITICAL: CRITICAL - 
degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:32:07] (03Abandoned) 10Razzi: presto: enable ui over http [puppet] - 10https://gerrit.wikimedia.org/r/737538 (https://phabricator.wikimedia.org/T292087) (owner: 10Razzi) [05:34:54] 10SRE, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): wikitech-static down - https://phabricator.wikimedia.org/T295266 (10Andrew) I've seen that host struggle with memory issues in the past, so we may just be seeing organic growth of mediawiki resource needs. It's probably worth figuring out what... [05:38:06] RECOVERY - Check systemd state on stat1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:42:28] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:44:27] PROBLEM - Check systemd state on stat1006 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:48:36] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.73 ms [06:05:42] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [06:14:04] RECOVERY - Check systemd state on stat1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [06:20:22] PROBLEM - Check systemd state on stat1006 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [06:24:27] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 45.16 ms [06:26:38] RECOVERY - Check systemd state on stat1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [06:52:04] (03CR) 10Razzi: 
"Patch looks good! Idea for the future of readable yaml inline - splitting these long config options over multiple lines on the yaml file!" [puppet] - 10https://gerrit.wikimedia.org/r/732573 (owner: 10Elukey) [06:56:22] 10SRE, 10serviceops: Clean up old Docker images on deneb - https://phabricator.wikimedia.org/T287222 (10razzi) My home directory's down to 2.5M now too :) [07:04:52] PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 69, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:09:38] (03CR) 10Giuseppe Lavagetto: php: allow installing multiple php versions at the same time (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/736276 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [07:19:18] RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [07:23:12] !log `apt-get clean` on stat1006 to free some space (root partition full) [07:23:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:33:08] RECOVERY - Disk space on stat1006 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=stat1006&var-datasource=eqiad+prometheus/ops [07:56:54] (03PS1) 10Arturo Borrero Gonzalez: cloud: networktests: runner: use expanded_cmd [puppet] - 10https://gerrit.wikimedia.org/r/737613 (https://phabricator.wikimedia.org/T294955) [07:58:29] (03PS1) 10Arturo Borrero Gonzalez: cloud: networktests: ssh: use -q [puppet] - 10https://gerrit.wikimedia.org/r/737614 (https://phabricator.wikimedia.org/T294955) [07:58:38] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloud: networktests: runner: use expanded_cmd [puppet] - 10https://gerrit.wikimedia.org/r/737613 
(https://phabricator.wikimedia.org/T294955) (owner: 10Arturo Borrero Gonzalez) [07:59:13] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloud: networktests: ssh: use -q [puppet] - 10https://gerrit.wikimedia.org/r/737614 (https://phabricator.wikimedia.org/T294955) (owner: 10Arturo Borrero Gonzalez) [08:01:45] (03PS1) 10Arturo Borrero Gonzalez: cloud: networktests: use -q [puppet] - 10https://gerrit.wikimedia.org/r/737615 (https://phabricator.wikimedia.org/T294955) [08:02:40] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloud: networktests: use -q [puppet] - 10https://gerrit.wikimedia.org/r/737615 (https://phabricator.wikimedia.org/T294955) (owner: 10Arturo Borrero Gonzalez) [08:23:06] (03CR) 10Vgutierrez: "tests are still happy:" [puppet] - 10https://gerrit.wikimedia.org/r/737474 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [08:23:45] (03CR) 10Ema: [C: 03+1] varnish: Check remote.ip for local tls terminator detection [puppet] - 10https://gerrit.wikimedia.org/r/737474 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [08:27:53] (03CR) 10Ema: [C: 03+2] varnish: remove ensure-absent for varnishmtail [puppet] - 10https://gerrit.wikimedia.org/r/737417 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [08:30:53] (03PS1) 10Ema: Revert "varnish: remove ensure-absent for varnishmtail" [puppet] - 10https://gerrit.wikimedia.org/r/737450 [08:31:07] (03PS1) 10Arturo Borrero Gonzalez: cloud: networktests: fix some testcases [puppet] - 10https://gerrit.wikimedia.org/r/737620 (https://phabricator.wikimedia.org/T294955) [08:31:30] (03CR) 10Ema: [C: 03+2] Revert "varnish: remove ensure-absent for varnishmtail" [puppet] - 10https://gerrit.wikimedia.org/r/737450 (owner: 10Ema) [08:38:52] (03PS1) 10Ema: varnish: notify varnishmtail@default [puppet] - 10https://gerrit.wikimedia.org/r/737621 (https://phabricator.wikimedia.org/T293879) [08:40:52] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/737621 
(https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [08:43:00] !log drop istio 1.6.* and kubeflow-kfserving-build images from the docker registry [08:43:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:46:00] (03CR) 10Vgutierrez: [C: 03+2] varnish: Check remote.ip for local tls terminator detection [puppet] - 10https://gerrit.wikimedia.org/r/737474 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [08:48:20] 10SRE, 10serviceops, 10Kubernetes, 10Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (10JMeybohm) Over all the steps you listed seem pretty comprehensive already. Ideally, all this would be handled by the still-to-be written k8s maintenance cookbook (T277677). For downtime... [08:48:49] (03CR) 10Volans: [C: 03+1] "Looks sane to me although I'm not fully familiar with the snowflakes setup" [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/736987 (https://phabricator.wikimedia.org/T289241) (owner: 10Ayounsi) [08:59:59] (03PS1) 10Ema: varnish: remove ensure-absent for varnishmtail [puppet] - 10https://gerrit.wikimedia.org/r/737451 (https://phabricator.wikimedia.org/T293879) [09:01:48] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/737451 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [09:03:20] (03CR) 10Vgutierrez: profile::base::certificates: add sslcert::trusted_ca options (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/737403 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [09:03:34] !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host ganeti6002.drmrs.wmnet with OS buster [09:03:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:03:44] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started 
by mmandere@cumin1001 for host ganeti6002.drmrs.wmnet with OS buster [09:08:56] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloud: networktests: fix some testcases [puppet] - 10https://gerrit.wikimedia.org/r/737620 (https://phabricator.wikimedia.org/T294955) (owner: 10Arturo Borrero Gonzalez) [09:13:03] (03PS7) 10Ideophagous: Bug:T291737 Squashed two commits into one, previous commit comments follow: Bug:T291737 Change-Id: Ib263a5419c6ace911a597d025b28d6ef13549c10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735713 [09:13:34] (03CR) 10Vgutierrez: [C: 03+1] varnish: notify varnishmtail@default [puppet] - 10https://gerrit.wikimedia.org/r/737621 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [09:14:05] (03CR) 10Ema: [C: 03+2] varnish: notify varnishmtail@default [puppet] - 10https://gerrit.wikimedia.org/r/737621 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [09:14:07] (03Abandoned) 10Ideophagous: Bug:T291737 Change-Id: Ib263a5419c6ace911a597d025b28d6ef13549c10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735712 (owner: 10Ideophagous) [09:14:28] (03PS8) 10Ideophagous: Bug:T291737 Squashed two commits into one, previous commit comments follow: Bug:T291737 Change-Id: Ib263a5419c6ace911a597d025b28d6ef13549c10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735713 [09:16:50] (03PS2) 10Ema: varnish: remove ensure-absent for varnishmtail [puppet] - 10https://gerrit.wikimedia.org/r/737451 (https://phabricator.wikimedia.org/T293879) [09:20:59] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/737451 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [09:23:52] (03CR) 10David Caro: [C: 03+2] Add codfw cloudvirts ceph::auth::deploy profile [puppet] - 10https://gerrit.wikimedia.org/r/737345 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [09:24:34] (03CR) 10David Caro: [V: 03+1 C: 03+2] libvirt|ceph: small refactor and remove keyrings [puppet] - 
10https://gerrit.wikimedia.org/r/737410 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [09:25:36] (03CR) 10Ema: [C: 03+2] varnish: remove ensure-absent for varnishmtail [puppet] - 10https://gerrit.wikimedia.org/r/737451 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [09:25:59] dcaro: ok to puppet-merge your changes? [09:26:10] 9357e2b1f1 and f5ab39ddfb [09:28:03] ema: ack [09:28:12] I was waiting for you to finish xd, did not see the ping [09:28:19] done [09:28:25] thanks [09:28:39] ty! [09:31:11] (03PS1) 10Lucas Werkmeister (WMDE): Add termbox language codes aqg and mcn [extensions/Wikibase] (wmf/1.38.0-wmf.7) - 10https://gerrit.wikimedia.org/r/737453 (https://phabricator.wikimedia.org/T288335) [09:31:13] !log pool cp4026 - T290005 [09:31:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:31:17] T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 [09:31:28] (03CR) 10Jbond: [C: 03+1] "lgtm minor nit" [puppet] - 10https://gerrit.wikimedia.org/r/737470 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [09:31:52] (03CR) 10Jbond: "LGTM and seems i forgot to hit send yesterday 😊" [puppet] - 10https://gerrit.wikimedia.org/r/737403 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [09:31:57] (03CR) 10Jbond: [C: 03+1] profile::base::certificates: add sslcert::trusted_ca options [puppet] - 10https://gerrit.wikimedia.org/r/737403 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [09:32:23] (03Abandoned) 10Lucas Werkmeister (WMDE): Add termbox language codes aqg and mcn [extensions/Wikibase] (wmf/1.38.0-wmf.7) - 10https://gerrit.wikimedia.org/r/737453 (https://phabricator.wikimedia.org/T288335) (owner: 10Lucas Werkmeister (WMDE)) [09:33:16] (03CR) 10Arturo Borrero Gonzalez: "is this the role a 'localdisk' hypervisor would use?" 
[puppet] - 10https://gerrit.wikimedia.org/r/737437 (owner: 10David Caro) [09:34:21] (03Abandoned) 10Arturo Borrero Gonzalez: openstack: keystone: allow manila-share auth as novaadmin [puppet] - 10https://gerrit.wikimedia.org/r/724445 (https://phabricator.wikimedia.org/T291257) (owner: 10Arturo Borrero Gonzalez) [09:34:42] (03CR) 10JMeybohm: [C: 04-1] tile-pregeneration: Wait for envoy to get ready (032 comments) [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/737481 (https://phabricator.wikimedia.org/T295290) (owner: 10Jgiannelos) [09:36:16] (03CR) 10David Caro: r:wmcs::openstack::codfw1dev::virt: delete unused role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/737437 (owner: 10David Caro) [09:37:03] (03CR) 10Jbond: "i don't have strong objections here but a hash does seem like it would be the better data type here?" [puppet] - 10https://gerrit.wikimedia.org/r/737364 (owner: 10Ssingh) [09:44:57] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] r:wmcs::openstack::codfw1dev::virt: delete unused role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/737437 (owner: 10David Caro) [09:48:13] (03PS3) 10Ema: varnish: add varnish::logging::mtail [puppet] - 10https://gerrit.wikimedia.org/r/737424 (https://phabricator.wikimedia.org/T293879) [10:01:25] 10SRE, 10Cloud-VPS, 10Infrastructure-Foundations, 10netops, 10cloud-services-team (Kanban): cr-codfw: set up static route for 185.15.57.8/30 - https://phabricator.wikimedia.org/T295288 (10ayounsi) 05Open→03Resolved Good catch! Added. ` ayounsi@bast1003:~$ ping -c1 virt.cloudgw.codfw1dev.wikimediaclo... 
[10:01:48] (03PS2) 10Elukey: profile::base::certificates: add sslcert::trusted_ca options [puppet] - 10https://gerrit.wikimedia.org/r/737403 (https://phabricator.wikimedia.org/T291905) [10:01:50] (03PS4) 10Elukey: profile::kafka::broker: move to profile::base::certificates for pki [puppet] - 10https://gerrit.wikimedia.org/r/737470 (https://phabricator.wikimedia.org/T291905) [10:02:08] (03PS4) 10Ema: varnish: add varnish::logging::mtail [puppet] - 10https://gerrit.wikimedia.org/r/737424 (https://phabricator.wikimedia.org/T293879) [10:02:15] (03PS1) 10David Caro: ceph::auth: remove some unneeded data from codfw [puppet] - 10https://gerrit.wikimedia.org/r/737627 (https://phabricator.wikimedia.org/T293752) [10:02:17] (03PS1) 10David Caro: ceph::auth::deploy: enable on virt_ceph nodes on codfw [puppet] - 10https://gerrit.wikimedia.org/r/737628 (https://phabricator.wikimedia.org/T293752) [10:03:53] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32242/console" [puppet] - 10https://gerrit.wikimedia.org/r/737470 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [10:05:54] (03PS5) 10Ema: varnish: add varnish::logging::mtail [puppet] - 10https://gerrit.wikimedia.org/r/737424 (https://phabricator.wikimedia.org/T293879) [10:06:40] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/737424 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [10:11:45] (03PS6) 10Ema: varnish: add varnish::logging::mtail [puppet] - 10https://gerrit.wikimedia.org/r/737424 (https://phabricator.wikimedia.org/T293879) [10:12:49] (03PS2) 10David Caro: ceph::auth: remove some unneeded data from codfw [puppet] - 10https://gerrit.wikimedia.org/r/737627 (https://phabricator.wikimedia.org/T293752) [10:12:51] (03PS2) 10David Caro: ceph::auth::deploy: enable on virt_ceph nodes on codfw [puppet] - 10https://gerrit.wikimedia.org/r/737628 (https://phabricator.wikimedia.org/T293752) 
[10:12:53] (03PS1) 10David Caro: ceph::auth::delpoy: allow not passing the keyring path [puppet] - 10https://gerrit.wikimedia.org/r/737630 (https://phabricator.wikimedia.org/T293752) [10:14:15] (03PS5) 10Jbond: O:puppetmaster::puppetdb: rename role to puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/701931 (https://phabricator.wikimedia.org/T285666) [10:16:56] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32245/console" [puppet] - 10https://gerrit.wikimedia.org/r/701931 (https://phabricator.wikimedia.org/T285666) (owner: 10Jbond) [10:17:49] (03CR) 10Ayounsi: [V: 03+2 C: 03+2] "Tested locally and works as expected." [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/736987 (https://phabricator.wikimedia.org/T289241) (owner: 10Ayounsi) [10:18:19] (03PS1) 10Arturo Borrero Gonzalez: hiera: cloud: codfw: relocate hiera keys out of the common namespace [puppet] - 10https://gerrit.wikimedia.org/r/737631 [10:18:34] (03PS10) 10Giuseppe Lavagetto: php: allow installing multiple php versions at the same time [puppet] - 10https://gerrit.wikimedia.org/r/736276 (https://phabricator.wikimedia.org/T293450) [10:18:36] (03PS9) 10Giuseppe Lavagetto: profile::mediawiki::php: Allow running multiple php versions in parallel [puppet] - 10https://gerrit.wikimedia.org/r/736948 (https://phabricator.wikimedia.org/T293450) [10:18:38] (03PS7) 10Giuseppe Lavagetto: mediawiki::php: support multiple php version in monitoring too [puppet] - 10https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450) [10:18:40] (03PS2) 10Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - 10https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450) [10:18:42] (03PS1) 10Giuseppe Lavagetto: profile::phabricator::main: update usage of php::extension [puppet] - 10https://gerrit.wikimedia.org/r/737632 [10:18:44] (03PS1) 10Giuseppe 
Lavagetto: php::extension: remove "package_name" [puppet] - 10https://gerrit.wikimedia.org/r/737633 [10:19:19] (03CR) 10Jbond: [V: 03+1 C: 03+2] O:puppetmaster::puppetdb: rename role to puppetdb [puppet] - 10https://gerrit.wikimedia.org/r/701931 (https://phabricator.wikimedia.org/T285666) (owner: 10Jbond) [10:21:18] (03CR) 10jerkins-bot: [V: 04-1] profile::mediawiki::php: Allow running multiple php versions in parallel [puppet] - 10https://gerrit.wikimedia.org/r/736948 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [10:21:46] (03CR) 10jerkins-bot: [V: 04-1] mediawiki::php: support multiple php version in monitoring too [puppet] - 10https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [10:21:53] (03CR) 10jerkins-bot: [V: 04-1] profile::phabricator::main: update usage of php::extension [puppet] - 10https://gerrit.wikimedia.org/r/737632 (owner: 10Giuseppe Lavagetto) [10:22:17] (03CR) 10jerkins-bot: [V: 04-1] mediawiki: add support for multiple versions in the web configuration [puppet] - 10https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [10:22:41] !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6002.drmrs.wmnet with OS buster [10:22:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:22:50] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host ganeti6002.drmrs.wmnet with OS buster completed: - ganeti6002 (**... 
[10:23:18] (03CR) 10jerkins-bot: [V: 04-1] php::extension: remove "package_name" [puppet] - 10https://gerrit.wikimedia.org/r/737633 (owner: 10Giuseppe Lavagetto) [10:24:01] (03PS2) 10David Caro: ceph::auth::delpoy: allow not passing the keyring path [puppet] - 10https://gerrit.wikimedia.org/r/737630 (https://phabricator.wikimedia.org/T293752) [10:24:03] (03PS3) 10David Caro: ceph::auth: remove some unneeded data from codfw [puppet] - 10https://gerrit.wikimedia.org/r/737627 (https://phabricator.wikimedia.org/T293752) [10:24:05] (03PS3) 10David Caro: ceph::auth::deploy: enable on virt_ceph nodes on codfw [puppet] - 10https://gerrit.wikimedia.org/r/737628 (https://phabricator.wikimedia.org/T293752) [10:27:02] (03CR) 10David Caro: [V: 03+1] "PCC SUCCESS (DIFF 4 NOOP 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32247/console" [puppet] - 10https://gerrit.wikimedia.org/r/737628 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [10:29:25] (03PS3) 10David Caro: ceph::auth::delpoy: allow not passing the keyring path [puppet] - 10https://gerrit.wikimedia.org/r/737630 (https://phabricator.wikimedia.org/T293752) [10:29:27] (03PS4) 10David Caro: ceph::auth: remove some unneeded data from codfw [puppet] - 10https://gerrit.wikimedia.org/r/737627 (https://phabricator.wikimedia.org/T293752) [10:29:29] (03PS4) 10David Caro: ceph::auth::deploy: enable on virt_ceph nodes on codfw [puppet] - 10https://gerrit.wikimedia.org/r/737628 (https://phabricator.wikimedia.org/T293752) [10:31:36] (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.2.9 [software/homer] - 10https://gerrit.wikimedia.org/r/737635 [10:31:38] (03PS1) 10Ema: varnish::logging: add new parameter mtail_programs [puppet] - 10https://gerrit.wikimedia.org/r/737634 (https://phabricator.wikimedia.org/T293879) [10:33:59] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "PCC NOOP: https://puppet-compiler.wmflabs.org/compiler1003/32246/" [puppet] - 
10https://gerrit.wikimedia.org/r/737631 (owner: 10Arturo Borrero Gonzalez) [10:34:15] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/737634 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [10:37:35] (03CR) 10Ayounsi: [C: 03+1] CHANGELOG: add changelogs for release v0.2.9 [software/homer] - 10https://gerrit.wikimedia.org/r/737635 (owner: 10Volans) [10:38:11] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.2.9 [software/homer] - 10https://gerrit.wikimedia.org/r/737635 (owner: 10Volans) [10:42:12] (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v0.2.9 [software/homer] - 10https://gerrit.wikimedia.org/r/737635 (owner: 10Volans) [10:43:08] (03PS1) 10ArielGlenn: Move enterprise dump download creds to more canonical dir [labs/private] - 10https://gerrit.wikimedia.org/r/737637 (https://phabricator.wikimedia.org/T273585) [10:44:01] (03CR) 10David Caro: [V: 03+1] "PCC SUCCESS (DIFF 4 NOOP 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32249/console" [puppet] - 10https://gerrit.wikimedia.org/r/737628 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [10:45:28] (03CR) 10David Caro: [V: 03+1 C: 03+2] "PCC now looks good, only changing the parameters but none of the core resources (refactor)" [puppet] - 10https://gerrit.wikimedia.org/r/737628 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [10:45:33] (03CR) 10David Caro: [C: 03+2] ceph::auth: remove some unneeded data from codfw [puppet] - 10https://gerrit.wikimedia.org/r/737627 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [10:45:38] (03CR) 10David Caro: [C: 03+2] ceph::auth::delpoy: allow not passing the keyring path [puppet] - 10https://gerrit.wikimedia.org/r/737630 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [10:45:43] (03CR) 10ArielGlenn: [V: 03+2 C: 03+2] Move enterprise dump download creds to more canonical dir [labs/private] - 
10https://gerrit.wikimedia.org/r/737637 (https://phabricator.wikimedia.org/T273585) (owner: 10ArielGlenn) [10:46:57] apergos: can I merge your change to labs/private? [10:47:20] (03PS3) 10ArielGlenn: add credentials file for downloading enterprise html dumps [puppet] - 10https://gerrit.wikimedia.org/r/736461 (https://phabricator.wikimedia.org/T273585) [10:48:05] (03CR) 10ArielGlenn: add credentials file for downloading enterprise html dumps (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/736461 (https://phabricator.wikimedia.org/T273585) (owner: 10ArielGlenn) [10:49:22] (03PS1) 10Majavah: dynamicproxy: fix backup interval [puppet] - 10https://gerrit.wikimedia.org/r/737639 [10:50:49] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] dynamicproxy: fix backup interval [puppet] - 10https://gerrit.wikimedia.org/r/737639 (owner: 10Majavah) [10:51:04] dcaro: I merged it, and sorry I missed the ping in here [10:51:12] or at least I think I merged it, no? [10:51:27] apergos: not on puppetmaster it seems (puppet-merge) [10:51:32] ah woops [10:51:34] yes feel free [10:51:36] my bad [10:51:40] ack, np [10:52:05] arturo: now your patch got in, should I merge it? 
[10:52:14] dcaro: yes please [10:52:21] done [10:57:54] (03PS2) 10Ema: varnish::logging: add new parameter mtail_programs [puppet] - 10https://gerrit.wikimedia.org/r/737634 (https://phabricator.wikimedia.org/T293879) [10:58:11] zpnzapcsä [10:58:54] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/737634 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [10:59:17] (03PS1) 10David Caro: ceph::auth: Move codfw1dev params to the right yaml [puppet] - 10https://gerrit.wikimedia.org/r/737641 (https://phabricator.wikimedia.org/T293752) [11:01:41] (03CR) 10David Caro: [C: 03+2] ceph::auth: Move codfw1dev params to the right yaml [puppet] - 10https://gerrit.wikimedia.org/r/737641 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [11:01:52] (03CR) 10Elukey: [C: 03+2] profile::base::certificates: add sslcert::trusted_ca options [puppet] - 10https://gerrit.wikimedia.org/r/737403 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [11:02:13] elukey: can I merge your patch to puppet? [11:02:17] yep thanks! 
[11:02:25] done :) [11:02:33] :) [11:04:00] (03PS5) 10Elukey: profile::kafka::broker: move to profile::base::certificates for pki [puppet] - 10https://gerrit.wikimedia.org/r/737470 (https://phabricator.wikimedia.org/T291905) [11:06:57] (03PS1) 10Arturo Borrero Gonzalez: cloud: networktests: rework some of the raw icmp checks [puppet] - 10https://gerrit.wikimedia.org/r/737642 (https://phabricator.wikimedia.org/T294955) [11:08:26] (03CR) 10Elukey: [C: 03+2] profile::kafka::broker: move to profile::base::certificates for pki (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/737470 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [11:08:52] (03PS3) 10Ema: varnish::logging: pass list of mtail programs from profile [puppet] - 10https://gerrit.wikimedia.org/r/737634 (https://phabricator.wikimedia.org/T293879) [11:09:31] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloud: networktests: rework some of the raw icmp checks [puppet] - 10https://gerrit.wikimedia.org/r/737642 (https://phabricator.wikimedia.org/T294955) (owner: 10Arturo Borrero Gonzalez) [11:09:45] (03PS1) 10Majavah: P::novaproxy: deploy dhparam [puppet] - 10https://gerrit.wikimedia.org/r/737643 [11:10:42] (03PS1) 10Jbond: P:certificates: add defaults to cloud [puppet] - 10https://gerrit.wikimedia.org/r/737644 (https://phabricator.wikimedia.org/T291905) [11:10:44] (03PS4) 10Ema: varnish::logging: pass list of mtail programs from profile [puppet] - 10https://gerrit.wikimedia.org/r/737634 (https://phabricator.wikimedia.org/T293879) [11:11:02] (03CR) 10Jbond: [C: 03+2] P:certificates: add defaults to cloud [puppet] - 10https://gerrit.wikimedia.org/r/737644 (https://phabricator.wikimedia.org/T291905) (owner: 10Jbond) [11:11:09] (03CR) 10Jbond: [V: 03+2 C: 03+2] P:certificates: add defaults to cloud [puppet] - 10https://gerrit.wikimedia.org/r/737644 (https://phabricator.wikimedia.org/T291905) (owner: 10Jbond) [11:12:14] (03PS1) 10Elukey: sslcert::trusted_ca: fix file title [puppet] - 
10https://gerrit.wikimedia.org/r/737645 (https://phabricator.wikimedia.org/T291905) [11:12:51] (03PS1) 10Vgutierrez: varnish: Run nrpe UDS check as root [puppet] - 10https://gerrit.wikimedia.org/r/737646 (https://phabricator.wikimedia.org/T290005) [11:15:06] (03PS1) 10David Caro: ceph::auth::keyring: Add possibility to override file permissions [puppet] - 10https://gerrit.wikimedia.org/r/737647 (https://phabricator.wikimedia.org/T293752) [11:17:05] (03PS1) 10Arturo Borrero Gonzalez: wikimediacloud.org: add A records for cloudgw2001-dev/2002-dev [dns] - 10https://gerrit.wikimedia.org/r/737648 (https://phabricator.wikimedia.org/T294955) [11:18:13] (03CR) 10Jbond: [C: 03+1] sslcert::trusted_ca: fix file title [puppet] - 10https://gerrit.wikimedia.org/r/737645 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [11:18:27] (03CR) 10Elukey: [C: 03+2] sslcert::trusted_ca: fix file title [puppet] - 10https://gerrit.wikimedia.org/r/737645 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [11:18:57] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] wikimediacloud.org: add A records for cloudgw2001-dev/2002-dev [dns] - 10https://gerrit.wikimedia.org/r/737648 (https://phabricator.wikimedia.org/T294955) (owner: 10Arturo Borrero Gonzalez) [11:19:59] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/737634 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [11:20:19] (03CR) 10Vgutierrez: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32254/console" [puppet] - 10https://gerrit.wikimedia.org/r/737646 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [11:21:19] (03CR) 10David Caro: [C: 03+2] "PCC looks good: https://puppet-compiler.wmflabs.org/compiler1002/32253/" [puppet] - 10https://gerrit.wikimedia.org/r/737647 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [11:22:14] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] P::novaproxy: deploy 
dhparam [puppet] - 10https://gerrit.wikimedia.org/r/737643 (owner: 10Majavah) [11:23:59] (03CR) 10Jgiannelos: tile-pregeneration: Wait for envoy to get ready (032 comments) [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/737481 (https://phabricator.wikimedia.org/T295290) (owner: 10Jgiannelos) [11:24:18] (03PS6) 10Jgiannelos: tile-pregeneration: Wait for envoy to get ready [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/737481 (https://phabricator.wikimedia.org/T295290) [11:24:31] (03CR) 10Ema: [C: 03+2] varnish::logging: pass list of mtail programs from profile [puppet] - 10https://gerrit.wikimedia.org/r/737634 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [11:27:31] (03PS1) 10Btullis: Update the times at which refine_sanitize monitor jobs are run [puppet] - 10https://gerrit.wikimedia.org/r/737650 [11:27:51] (03CR) 10Ema: [C: 03+1] varnish: Run nrpe UDS check as root [puppet] - 10https://gerrit.wikimedia.org/r/737646 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [11:28:17] (03CR) 10Btullis: "This should stop monitor jobs running before its corresponding refine job." [puppet] - 10https://gerrit.wikimedia.org/r/737650 (owner: 10Btullis) [11:28:54] (03CR) 10Vgutierrez: [V: 03+1 C: 03+2] varnish: Run nrpe UDS check as root [puppet] - 10https://gerrit.wikimedia.org/r/737646 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [11:29:02] (03PS1) 10Volans: Upstream release v0.2.9 [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/737651 [11:29:23] (03CR) 10Btullis: "I haven't added the monitorin_interval to the jobs on the test cluster. Will update." 
[puppet] - 10https://gerrit.wikimedia.org/r/737650 (owner: 10Btullis) [11:30:36] (03PS11) 10Giuseppe Lavagetto: php: allow installing multiple php versions at the same time [puppet] - 10https://gerrit.wikimedia.org/r/736276 (https://phabricator.wikimedia.org/T293450) [11:30:38] (03PS2) 10Giuseppe Lavagetto: profile::phabricator::main: update usage of php::extension [puppet] - 10https://gerrit.wikimedia.org/r/737632 [11:30:40] (03PS10) 10Giuseppe Lavagetto: profile::mediawiki::php: Allow running multiple php versions in parallel [puppet] - 10https://gerrit.wikimedia.org/r/736948 (https://phabricator.wikimedia.org/T293450) [11:30:42] (03PS2) 10Giuseppe Lavagetto: php::extension: remove "package_name" [puppet] - 10https://gerrit.wikimedia.org/r/737633 [11:30:44] (03PS8) 10Giuseppe Lavagetto: mediawiki::php: support multiple php version in monitoring too [puppet] - 10https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450) [11:30:46] (03PS3) 10Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - 10https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450) [11:32:07] (03CR) 10Giuseppe Lavagetto: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32255/console" [puppet] - 10https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [11:32:15] !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host ganeti6003.drmrs.wmnet with OS buster [11:32:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:27] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host ganeti6003.drmrs.wmnet with OS buster [11:32:33] (03CR) 10Volans: [V: 03+2 C: 03+2] 
Upstream release v0.2.9 [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/737651 (owner: 10Volans) [11:32:46] (03PS1) 10Elukey: sslcert::trusted_ca: add explicit ordering for jks [puppet] - 10https://gerrit.wikimedia.org/r/737652 (https://phabricator.wikimedia.org/T291905) [11:34:44] (03PS1) 10Majavah: hieradata: add new project-proxies [puppet] - 10https://gerrit.wikimedia.org/r/737653 (https://phabricator.wikimedia.org/T295235) [11:34:58] (03CR) 10jerkins-bot: [V: 04-1] sslcert::trusted_ca: add explicit ordering for jks [puppet] - 10https://gerrit.wikimedia.org/r/737652 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [11:36:15] (03PS1) 10Ema: varnish: remove cachestats.py [puppet] - 10https://gerrit.wikimedia.org/r/737655 (https://phabricator.wikimedia.org/T184942) [11:37:33] (03CR) 10Jbond: "see comment have marked the ci complainet" [puppet] - 10https://gerrit.wikimedia.org/r/737652 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [11:37:41] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/737655 (https://phabricator.wikimedia.org/T184942) (owner: 10Ema) [11:38:12] (03CR) 10Arturo Borrero Gonzalez: ceph::auth::keyring: Add possibility to override file permissions (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/737647 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [11:39:05] !log volans@deploy1002 Started deploy [homer/deploy@c570af3]: Homer release v0.2.9 [11:39:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:39:36] (03CR) 10David Caro: ceph::auth::keyring: Add possibility to override file permissions (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/737647 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [11:40:34] !log volans@deploy1002 Finished deploy [homer/deploy@c570af3]: Homer release v0.2.9 (duration: 01m 29s) [11:40:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[11:40:38] (03PS2) 10Elukey: sslcert::trusted_ca: add explicit ordering for jks [puppet] - 10https://gerrit.wikimedia.org/r/737652 (https://phabricator.wikimedia.org/T291905) [11:41:03] (03CR) 10Elukey: "Thanks, should be fixed :)" [puppet] - 10https://gerrit.wikimedia.org/r/737652 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [11:41:05] (03PS2) 10Btullis: Update the times at which refine_sanitize monitor jobs are run [puppet] - 10https://gerrit.wikimedia.org/r/737650 [11:42:52] (03CR) 10JMeybohm: "The" [deployment-charts] - 10https://gerrit.wikimedia.org/r/732374 (https://phabricator.wikimedia.org/T290966) (owner: 10JMeybohm) [11:43:15] (03CR) 10Btullis: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32258/console" [puppet] - 10https://gerrit.wikimedia.org/r/737650 (owner: 10Btullis) [11:44:05] (03PS1) 10Vgutierrez: site: Reimage cp5006 as cache::upload_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/737656 (https://phabricator.wikimedia.org/T290005) [11:44:08] (03CR) 10Elukey: [C: 03+2] sslcert::trusted_ca: add explicit ordering for jks [puppet] - 10https://gerrit.wikimedia.org/r/737652 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [11:45:37] !log depool cp5006 to be reimaged as cache::upload_haproxy - T290005 [11:45:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:45:40] T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 [11:46:52] (03CR) 10Vgutierrez: [C: 03+2] site: Reimage cp5006 as cache::upload_haproxy [puppet] - 10https://gerrit.wikimedia.org/r/737656 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [11:47:30] !log volans@cumin2002 START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.2.9 - volans@cumin2002 [11:47:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:48:08] !log 
vgutierrez@cumin1001 START - Cookbook sre.hosts.reimage for host cp5006.eqsin.wmnet with OS buster [11:48:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:48:19] !log volans@cumin2002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.2.9 - volans@cumin2002 [11:48:20] 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp5006.eqsin.wmnet with OS buster [11:48:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:48:22] (03CR) 10Btullis: [V: 03+1] Update the times at which refine_sanitize monitor jobs are run (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/737650 (owner: 10Btullis) [11:48:30] (03PS3) 10Btullis: Update the times at which refine_sanitize monitor jobs are run [puppet] - 10https://gerrit.wikimedia.org/r/737650 [11:50:33] (03PS1) 10Btullis: Enable refine_sanitize_delayed jobs in test [puppet] - 10https://gerrit.wikimedia.org/r/737658 [11:51:10] 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10Vgutierrez) [11:53:03] (03CR) 10Btullis: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32259/console" [puppet] - 10https://gerrit.wikimedia.org/r/737658 (owner: 10Btullis) [11:57:45] (03PS4) 10Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - 10https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450) [11:57:47] !log elukey@cumin1001 START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for 
openjdk upgrade. - elukey@cumin1001 [11:57:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:59:04] (03CR) 10Giuseppe Lavagetto: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32260/console" [puppet] - 10https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [12:00:05] Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for UTC morning backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211109T1200). [12:00:05] Lucas_WMDE, Seddon, and awight: A patch you scheduled for UTC morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:14] o/ [12:00:21] o/ [12:00:51] I can deploy [12:00:55] unless Lucas_WMDE wants to [12:01:12] I can deploy too [12:01:33] I’ll start with my config change, shouldn’t take long [12:01:43] ack [12:01:57] Seddon: your patch doesn't apply cleanly to wmf.7 [12:02:10] and i also assume you want a wmf.8 one too? [12:02:32] There is no train this week is there? [12:02:57] right [12:02:59] so only wmf.8 [12:03:10] (03PS3) 10Lucas Werkmeister (WMDE): Add language codes agq and mcn to wmgExtraLanguageNames [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734717 (https://phabricator.wikimedia.org/T288335) (owner: 10Mbch331) [12:03:23] only wmf.7, no? [12:03:29] *wmf.7 [12:03:54] Seddon: can you make the patch apply to wmf.7? 
You can do so by opening your local clone of MediaSearch, checking out wmf/1.38.0-wmf.7 and running `git cherry-pick ` [12:03:58] it will complain about a conflict [12:04:12] you need to fix the conflict, and then run `git review -R wmf/1.38.0-wmf.7` [12:04:12] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] "Rebased (conflicted with I55f2d36aa3) and also added the codes to the commonswiki list." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734717 (https://phabricator.wikimedia.org/T288335) (owner: 10Mbch331) [12:05:03] (03Merged) 10jenkins-bot: Add language codes agq and mcn to wmgExtraLanguageNames [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734717 (https://phabricator.wikimedia.org/T288335) (owner: 10Mbch331) [12:05:07] urbanecm: I will certainly try :D [12:05:19] ping if there are any issues :)) [12:05:39] testing the first config change on mwdebug1001 [12:06:27] seems fine, syncing [12:07:43] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [12:07:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:53] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734717|Add language codes agq and mcn to wmgExtraLanguageNames (T288335, T293884)]] (duration: 00m 56s) [12:07:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:57] T293884: Include Massa(mcn) Language support on Wikidata - https://phabricator.wikimedia.org/T293884 [12:07:57] T288335: Include Aghem(agq) Language support on Wikidata - https://phabricator.wikimedia.org/T288335 [12:08:35] (03CR) 10Btullis: "Thanks for the guidance @cwhite." 
[alerts] - 10https://gerrit.wikimedia.org/r/736490 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [12:09:15] (03PS3) 10Btullis: Add the first eventgate alert to Alertmanager [alerts] - 10https://gerrit.wikimedia.org/r/736490 (https://phabricator.wikimedia.org/T293399) [12:09:15] awight: want to deploy that unused global statement removal? https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/735769 [12:09:32] otherwise I can also do it [12:09:45] If you have the time, it's trivial! [12:09:50] ok :) [12:09:54] Just a no-op clean-up [12:09:55] (03PS2) 10Lucas Werkmeister (WMDE): Remove unused `global` statement [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735769 (owner: 10Awight) [12:09:57] Thanks :-) [12:10:31] (03PS12) 10Giuseppe Lavagetto: php: allow installing multiple php versions at the same time [puppet] - 10https://gerrit.wikimedia.org/r/736276 (https://phabricator.wikimedia.org/T293450) [12:10:33] (03PS3) 10Giuseppe Lavagetto: profile::phabricator::main: update usage of php::extension [puppet] - 10https://gerrit.wikimedia.org/r/737632 [12:10:35] (03PS11) 10Giuseppe Lavagetto: profile::mediawiki::php: Allow running multiple php versions in parallel [puppet] - 10https://gerrit.wikimedia.org/r/736948 (https://phabricator.wikimedia.org/T293450) [12:10:37] (03PS3) 10Giuseppe Lavagetto: php::extension: remove "package_name" [puppet] - 10https://gerrit.wikimedia.org/r/737633 [12:10:39] (03PS9) 10Giuseppe Lavagetto: mediawiki::php: support multiple php version in monitoring too [puppet] - 10https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450) [12:10:41] (03PS5) 10Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - 10https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450) [12:10:43] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Remove unused `global` statement [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735769 (owner: 
10Awight) [12:11:20] (03CR) 10JMeybohm: [C: 03+1] tile-pregeneration: Wait for envoy to get ready [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/737481 (https://phabricator.wikimedia.org/T295290) (owner: 10Jgiannelos) [12:11:21] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [12:11:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:11:31] (03CR) 10jerkins-bot: [V: 04-1] Add the first eventgate alert to Alertmanager [alerts] - 10https://gerrit.wikimedia.org/r/736490 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [12:12:14] (03CR) 10jerkins-bot: [V: 04-1] profile::mediawiki::php: Allow running multiple php versions in parallel [puppet] - 10https://gerrit.wikimedia.org/r/736948 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [12:12:29] !log mmandere@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6003.drmrs.wmnet with OS buster [12:12:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:39] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host ganeti6003.drmrs.wmnet with OS buster executed with errors: - gan... [12:13:22] @urbanecm lets leave this for now, I'll try get things sorted for later on [12:13:27] ack [12:14:25] (03Merged) 10jenkins-bot: Remove unused `global` statement [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735769 (owner: 10Awight) [12:15:59] quickly testing ^ on mwdebug1001 [12:17:34] no warnings in logstash, looks good to sync [12:17:50] (03PS4) 10Btullis: Add the first eventgate alert to Alertmanager [alerts] - 10https://gerrit.wikimedia.org/r/736490 (https://phabricator.wikimedia.org/T293399) [12:18:17] ty! 
[12:18:24] !log elukey@cumin1001 END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 [12:18:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:18:38] PROBLEM - k8s API server requests latencies on kubemaster2001 is CRITICAL: instance=10.192.0.56 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [12:18:56] !log lucaswerkmeister-wmde@deploy1002 Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:735769|Remove unused `global` statement]] (duration: 00m 55s) [12:18:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:19:02] (03PS1) 10Elukey: profile::kafka::mirror: add support for PKI-enabled truststore [puppet] - 10https://gerrit.wikimedia.org/r/737661 (https://phabricator.wikimedia.org/T291905) [12:20:02] (03CR) 10jerkins-bot: [V: 04-1] Add the first eventgate alert to Alertmanager [alerts] - 10https://gerrit.wikimedia.org/r/736490 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [12:20:11] sounds like we’re done with the window? [12:20:24] and the backport might happen later [12:20:38] RECOVERY - k8s API server requests latencies on kubemaster2001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [12:21:02] (03PS5) 10Btullis: Add the first eventgate alert to Alertmanager [alerts] - 10https://gerrit.wikimedia.org/r/736490 (https://phabricator.wikimedia.org/T293399) [12:21:15] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[12:21:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:28] (03PS2) 10Elukey: profile::kafka::mirror: add support for PKI-enabled truststore [puppet] - 10https://gerrit.wikimedia.org/r/737661 (https://phabricator.wikimedia.org/T291905) [12:22:00] !log UTC morning backport+config window done [12:22:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:23:10] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 3 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32262/console" [puppet] - 10https://gerrit.wikimedia.org/r/737661 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [12:23:43] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:24:58] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [12:24:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:25:42] (03CR) 10Awight: siteFromDB returns database suffix (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737209 (owner: 10Awight) [12:27:08] PROBLEM - k8s API server requests latencies on kubemaster1001 is CRITICAL: instance=10.64.0.117 verb={DELETE,LIST} https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [12:28:08] RECOVERY - k8s API server requests latencies on kubemaster1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [12:29:02] (03PS1) 10Jgiannelos: tegola-vector-tiles: Configure pregeneration retries [deployment-charts] - 10https://gerrit.wikimedia.org/r/737665 (https://phabricator.wikimedia.org/T295290) [12:33:28] (03CR) 10Jgiannelos: [C: 03+2] tile-pregeneration: Wait for envoy to get ready [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/737481 (https://phabricator.wikimedia.org/T295290) (owner: 10Jgiannelos) [12:34:34] (03Merged) 10jenkins-bot: tile-pregeneration: Wait for envoy to get ready [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/737481 (https://phabricator.wikimedia.org/T295290) (owner: 10Jgiannelos) [12:36:58] (03PS6) 10Btullis: Add the first eventgate alert to Alertmanager [alerts] - 10https://gerrit.wikimedia.org/r/736490 (https://phabricator.wikimedia.org/T293399) [12:37:28] (03CR) 10Btullis: Add the first eventgate alert to Alertmanager (036 comments) [alerts] - 10https://gerrit.wikimedia.org/r/736490 (https://phabricator.wikimedia.org/T293399) (owner: 10Btullis) [12:38:17] (03CR) 10Jgiannelos: "According to docs https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/737665 (https://phabricator.wikimedia.org/T295290) (owner: 10Jgiannelos) [12:42:03] (03CR) 10Jgiannelos: tegola-vector-tiles: Configure pregeneration retries (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/737665 (https://phabricator.wikimedia.org/T295290) (owner: 10Jgiannelos) [12:46:05] (03PS1) 10Jbond: admin: extend contract for west1 [puppet] - 10https://gerrit.wikimedia.org/r/737666 [12:46:23] (03CR) 10Jbond: [C: 03+2] admin: extend contract for west1 [puppet] - 10https://gerrit.wikimedia.org/r/737666 (owner: 10Jbond) [12:57:02] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [12:59:50] (03CR) 10Ema: [C: 03+2] varnish: remove cachestats.py [puppet] - 10https://gerrit.wikimedia.org/r/737655 (https://phabricator.wikimedia.org/T184942) (owner: 10Ema) [13:02:47] !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host ganeti6003.drmrs.wmnet with OS buster [13:02:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:58] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host ganeti6003.drmrs.wmnet with OS buster [13:05:10] (03PS1) 10Arturo Borrero Gonzalez: wmcs: add openstack network tests cookbook [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/737667 (https://phabricator.wikimedia.org/T294955) [13:08:57] (03CR) 10jerkins-bot: [V: 04-1] wmcs: add openstack network tests cookbook [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/737667 (https://phabricator.wikimedia.org/T294955) (owner: 10Arturo Borrero Gonzalez) [13:09:27] !log mmandere@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti6003.drmrs.wmnet with OS buster [13:09:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:36] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host ganeti6003.drmrs.wmnet with OS buster executed with errors: - gan... 
[13:15:03] !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host ganeti6003.drmrs.wmnet with OS buster [13:15:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:13] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host ganeti6003.drmrs.wmnet with OS buster [13:20:46] (03PS1) 10David Caro: ceph::auth: Add deploy for eqiad virt_ceph hosts too [puppet] - 10https://gerrit.wikimedia.org/r/737668 (https://phabricator.wikimedia.org/T293752) [13:21:21] (03CR) 10jerkins-bot: [V: 04-1] ceph::auth: Add deploy for eqiad virt_ceph hosts too [puppet] - 10https://gerrit.wikimedia.org/r/737668 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [13:21:30] (03CR) 10Elukey: [V: 03+1 C: 03+2] profile::kafka::mirror: add support for PKI-enabled truststore [puppet] - 10https://gerrit.wikimedia.org/r/737661 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [13:21:47] (03PS1) 10Jgiannelos: tegola-vector-tiles: Bump image to latest version [deployment-charts] - 10https://gerrit.wikimedia.org/r/737669 [13:22:50] (03CR) 10David Caro: [V: 03+1] "PCC SUCCESS (NOOP 7 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32263/console" [puppet] - 10https://gerrit.wikimedia.org/r/737668 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [13:23:56] (03PS2) 10David Caro: ceph::auth: Add deploy for eqiad virt_ceph hosts too [puppet] - 10https://gerrit.wikimedia.org/r/737668 (https://phabricator.wikimedia.org/T293752) [13:26:17] (03CR) 10David Caro: [C: 03+2] "PCC looks good, no real changes" [puppet] - 10https://gerrit.wikimedia.org/r/737668 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [13:26:20] (03PS1) 10Ema: varnish::logging: remove statsd_host and mtail_progs 
[puppet] - 10https://gerrit.wikimedia.org/r/737670 (https://phabricator.wikimedia.org/T293879) [13:27:38] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/737670 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [13:28:30] (03CR) 10Jgiannelos: [C: 03+2] tegola-vector-tiles: Bump image to latest version [deployment-charts] - 10https://gerrit.wikimedia.org/r/737669 (owner: 10Jgiannelos) [13:29:11] (03PS1) 10Elukey: profile::base::certificates: add truststore password [puppet] - 10https://gerrit.wikimedia.org/r/737672 (https://phabricator.wikimedia.org/T291905) [13:29:44] (03CR) 10jerkins-bot: [V: 04-1] profile::base::certificates: add truststore password [puppet] - 10https://gerrit.wikimedia.org/r/737672 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [13:29:52] 10SRE, 10LDAP-Access-Requests: Grant Access to ldap/wmf for David Martin - https://phabricator.wikimedia.org/T295264 (10Aklapper) @RLazarus: Hmm, it looks like [adding to the Phabricator group wmf-nda](https://wikitech.wikimedia.org/wiki/SRE/Clinic_Duty/Access_requests#WMF_Group) was skipped? 
[13:30:55] (03PS2) 10Elukey: profile::base::certificates: add truststore password [puppet] - 10https://gerrit.wikimedia.org/r/737672 (https://phabricator.wikimedia.org/T291905) [13:31:28] (03CR) 10jerkins-bot: [V: 04-1] profile::base::certificates: add truststore password [puppet] - 10https://gerrit.wikimedia.org/r/737672 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [13:31:56] (03PS1) 10David Caro: ceph::auth: add eqiad counterpart [labs/private] - 10https://gerrit.wikimedia.org/r/737673 [13:32:38] (03CR) 10David Caro: [C: 03+2] ceph::auth: add eqiad counterpart [labs/private] - 10https://gerrit.wikimedia.org/r/737673 (owner: 10David Caro) [13:32:49] (03CR) 10David Caro: [V: 03+2 C: 03+2] ceph::auth: add eqiad counterpart [labs/private] - 10https://gerrit.wikimedia.org/r/737673 (owner: 10David Caro) [13:33:07] (03Merged) 10jenkins-bot: tegola-vector-tiles: Bump image to latest version [deployment-charts] - 10https://gerrit.wikimedia.org/r/737669 (owner: 10Jgiannelos) [13:33:46] (03PS2) 10Ema: varnish::logging: remove statsd_host and mtail_progs [puppet] - 10https://gerrit.wikimedia.org/r/737670 (https://phabricator.wikimedia.org/T293879) [13:33:48] (03PS3) 10Elukey: profile::base::certificates: add truststore password [puppet] - 10https://gerrit.wikimedia.org/r/737672 (https://phabricator.wikimedia.org/T291905) [13:33:59] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/737670 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [13:34:32] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 1 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32266/console" [puppet] - 10https://gerrit.wikimedia.org/r/737672 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [13:37:10] (03PS1) 10David Caro: ceph::auth: move the eqiad setting to the right yaml [puppet] - 10https://gerrit.wikimedia.org/r/737675 [13:39:04] (03CR) 10Elukey: [V: 03+1] "Missed one thing, the kafka ssl properties 
are not yet working.." [puppet] - 10https://gerrit.wikimedia.org/r/737672 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [13:44:32] 10SRE, 10serviceops, 10Kubernetes, 10Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (10Jelto) Thanks for the feedback. Then instead of using `sre.switchdc.services` I would like to depool the services using conftool directly. I queried `confctl` and checked what services... [13:46:01] (03CR) 10Elukey: [V: 03+1 C: 03+2] profile::base::certificates: add truststore password [puppet] - 10https://gerrit.wikimedia.org/r/737672 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [13:46:42] (03CR) 10David Caro: [V: 03+1] "PCC SUCCESS (NOOP 8): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32269/console" [puppet] - 10https://gerrit.wikimedia.org/r/737675 (owner: 10David Caro) [13:47:53] !log elukey@cumin1001 START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 [13:47:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:48:51] (03PS1) 10David Caro: pcc: increase timeout [puppet] - 10https://gerrit.wikimedia.org/r/737678 [13:50:00] !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5006.eqsin.wmnet with OS buster [13:50:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:05] (03PS2) 10David Caro: pcc: increase timeout [puppet] - 10https://gerrit.wikimedia.org/r/737678 [13:50:11] 10SRE, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp5006.eqsin.wmnet with OS buster c... 
[13:51:46] !log pool cp5006 (upload) running haproxy-tls - T290005 [13:51:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:51:49] T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 [13:52:19] (03CR) 10Ema: [C: 03+2] varnish::logging: remove statsd_host and mtail_progs [puppet] - 10https://gerrit.wikimedia.org/r/737670 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [13:58:02] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [14:00:43] (03PS1) 10David Caro: ceph::auth: put on the correct path [labs/private] - 10https://gerrit.wikimedia.org/r/737681 [14:01:27] (03CR) 10David Caro: [C: 03+2] ceph::auth: put on the correct path [labs/private] - 10https://gerrit.wikimedia.org/r/737681 (owner: 10David Caro) [14:01:41] (03CR) 10David Caro: [V: 03+2 C: 03+2] ceph::auth: put on the correct path [labs/private] - 10https://gerrit.wikimedia.org/r/737681 (owner: 10David Caro) [14:04:41] (03CR) 10Hnowlan: [V: 03+1] "One difference in terms of modules that stands out with this change is the removal of mod_deflate - I don't see enough history to indicate" [puppet] - 10https://gerrit.wikimedia.org/r/576913 (https://phabricator.wikimedia.org/T246389) (owner: 10Hnowlan) [14:05:56] (03CR) 10Jbond: [C: 03+1] pcc: increase timeout [puppet] - 10https://gerrit.wikimedia.org/r/737678 (owner: 10David Caro) [14:08:33] !log elukey@cumin1001 END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. 
- elukey@cumin1001 [14:08:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:34] (03PS1) 10Elukey: Reduce verbosity of the log commit message [cookbooks] - 10https://gerrit.wikimedia.org/r/737706 [14:11:13] !log elukey@cumin1001 START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001 [14:11:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:06] (03CR) 10Volans: [C: 03+1] "LGTM, just for full disclosure, this will not print the task ID. I guess it's ok in most cases." [cookbooks] - 10https://gerrit.wikimedia.org/r/737706 (owner: 10Elukey) [14:17:14] (03PS1) 10Majavah: hieradata: add cloud-cumin04 [puppet] - 10https://gerrit.wikimedia.org/r/737709 [14:19:41] !log mmandere@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6003.drmrs.wmnet with OS buster [14:19:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:19:50] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host ganeti6003.drmrs.wmnet with OS buster completed: - ganeti6003 (**... [14:20:04] (03CR) 10Ottomata: presto: enable ui over http (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/736503 (https://phabricator.wikimedia.org/T292087) (owner: 10Razzi) [14:21:23] !log elukey@cumin1001 END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001 [14:21:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:54] (03CR) 10Elukey: "Adding Ben and Razzi to double check." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/737706 (owner: 10Elukey) [14:22:12] (03CR) 10Ottomata: Update the times at which refine_sanitize monitor jobs are run (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/737650 (owner: 10Btullis) [14:22:32] (03CR) 10Ottomata: [C: 03+1] Enable refine_sanitize_delayed jobs in test [puppet] - 10https://gerrit.wikimedia.org/r/737658 (owner: 10Btullis) [14:23:25] (03PS1) 10Elukey: Enable PKI-based TLS certificate for kafka-test1006 [puppet] - 10https://gerrit.wikimedia.org/r/737711 (https://phabricator.wikimedia.org/T291905) [14:24:53] (03CR) 10Giuseppe Lavagetto: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/32261/ shows no actual change in any file and /or actual relation chain. I am going to bo" [puppet] - 10https://gerrit.wikimedia.org/r/736276 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [14:24:59] (03CR) 10Elukey: [C: 03+2] Enable PKI-based TLS certificate for kafka-test1006 [puppet] - 10https://gerrit.wikimedia.org/r/737711 (https://phabricator.wikimedia.org/T291905) (owner: 10Elukey) [14:25:30] _joe_ feel free to merge! 
[14:25:45] <_joe_> elukey: done [14:27:09] (03PS4) 10Ssingh: dnsdist: allow setting additional custom HTTP response headers [puppet] - 10https://gerrit.wikimedia.org/r/737364 [14:29:36] (03CR) 10Ssingh: "https://puppet-compiler.wmflabs.org/compiler1003/32276/doh1001.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/737364 (owner: 10Ssingh) [14:30:00] (03CR) 10Ssingh: dnsdist: allow setting additional custom HTTP response headers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/737364 (owner: 10Ssingh) [14:30:27] (03CR) 10Andrew Bogott: [C: 03+2] hieradata: add new project-proxies [puppet] - 10https://gerrit.wikimedia.org/r/737653 (https://phabricator.wikimedia.org/T295235) (owner: 10Majavah) [14:32:45] (03CR) 10Jbond: [C: 03+1] dnsdist: allow setting additional custom HTTP response headers [puppet] - 10https://gerrit.wikimedia.org/r/737364 (owner: 10Ssingh) [14:38:15] (03CR) 10Ssingh: [C: 03+2] dnsdist: allow setting additional custom HTTP response headers [puppet] - 10https://gerrit.wikimedia.org/r/737364 (owner: 10Ssingh) [14:44:58] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-test1008 is CRITICAL: 10 ge 10 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=test-eqiad&var-kafka_broker=kafka-test1008 [14:47:28] this is me sorry [14:47:33] test cluster [14:52:01] !log rebooting ganeti6003 [14:52:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:46] (03CR) 10Juan90264: "LGTM :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737082 (https://phabricator.wikimedia.org/T295267) (owner: 10Bodhisattwa) [15:03:52] (03PS1) 10Giuseppe Lavagetto: admin: upgrade my bash functions a bit [puppet] - 10https://gerrit.wikimedia.org/r/737720 [15:03:54] (03CR) 10Juan90264: [C: 03+1] create 2022 namespace for wikimaniawiki [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/737082 (https://phabricator.wikimedia.org/T295267) (owner: 10Bodhisattwa) [15:05:01] 10Puppet, 10Infrastructure-Foundations, 10User-jbond: puppetlabs: create puppet 7 environment in WMCS to test code - https://phabricator.wikimedia.org/T294841 (10jbond) [15:06:34] 10Puppet, 10Infrastructure-Foundations, 10User-jbond: puppetlabs: create puppet 7 environment in WMCS to test code - https://phabricator.wikimedia.org/T294841 (10jbond) `Facter[:networking].flush` used in `base/lib/facter/interface_primary.rb` has the following error ` Error: Facter: error while resolving cu... [15:08:54] !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host ganeti6004.drmrs.wmnet with OS buster [15:08:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:03] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host ganeti6004.drmrs.wmnet with OS buster [15:09:11] Any deployer will be available on the " [15:09:12] UTC late backport window" of today? If not, would anyone make it? [15:09:20] Any deployer will be available on the "UTC late backport window" of today? If not, would anyone make it? [15:10:03] jouncebot: next [15:10:03] In 1 hour(s) and 49 minute(s): Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211109T1700) [15:10:39] Juan_90264: you're here rather early for that window [15:10:59] In the "UTC evening backport window" I will not be available [15:11:37] majavah: [15:13:09] (03CR) 10David Caro: [C: 03+2] pcc: increase timeout [puppet] - 10https://gerrit.wikimedia.org/r/737678 (owner: 10David Caro) [15:15:12] (03CR) 10David Caro: [V: 03+1 C: 03+2] "PCC looks good, there's some instability on puppetdb side that makes some hosts fail, but different runs show a correct output." 
[puppet] - 10https://gerrit.wikimedia.org/r/737675 (owner: 10David Caro) [15:15:42] (03CR) 10Giuseppe Lavagetto: [C: 03+2] admin: upgrade my bash functions a bit [puppet] - 10https://gerrit.wikimedia.org/r/737720 (owner: 10Giuseppe Lavagetto) [15:18:11] Juan_90264: if you are not around, you should reschedule your patch [15:18:54] Why b:en [15:19:46] RhinosF1: I will be available in "UTC late", but at this time I hardly find a deployer available, so I ask to see if anyone can be available [15:19:58] Juan_90264: maybe try the evening window [15:20:12] It used to be RoanKattouw that did the late window [15:20:25] Oh you can't do evening [15:21:47] (03PS2) 10Majavah: hieradata: add cloud-cumin04 [puppet] - 10https://gerrit.wikimedia.org/r/737709 [15:24:05] (03PS5) 10Ayounsi: Advertise drmrs from esams [homer/public] - 10https://gerrit.wikimedia.org/r/737395 (https://phabricator.wikimedia.org/T283050) [15:25:30] (03CR) 10Ayounsi: [C: 03+2] Advertise drmrs from esams [homer/public] - 10https://gerrit.wikimedia.org/r/737395 (https://phabricator.wikimedia.org/T283050) (owner: 10Ayounsi) [15:33:19] (03PS4) 10Giuseppe Lavagetto: profile::phabricator::main: update usage of php::extension [puppet] - 10https://gerrit.wikimedia.org/r/737632 [15:34:51] (03PS1) 10Vgutierrez: cache:haproxy: Fix OCSP symlink [puppet] - 10https://gerrit.wikimedia.org/r/737728 (https://phabricator.wikimedia.org/T290005) [15:35:21] (03CR) 10Giuseppe Lavagetto: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32277/console" [puppet] - 10https://gerrit.wikimedia.org/r/737632 (owner: 10Giuseppe Lavagetto) [15:37:50] (03CR) 10Giuseppe Lavagetto: [V: 03+1 C: 03+2] "pcc says we're good to go." 
[puppet] - 10https://gerrit.wikimedia.org/r/737632 (owner: 10Giuseppe Lavagetto) [15:38:00] (03CR) 10Vgutierrez: [C: 03+2] cache:haproxy: Fix OCSP symlink [puppet] - 10https://gerrit.wikimedia.org/r/737728 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [15:40:11] (03PS1) 10David Caro: ceph::auth: Deploy on virt and virt_ceph_and_backy hosts too [puppet] - 10https://gerrit.wikimedia.org/r/737729 [15:42:14] (03PS1) 10Jbond: P:kafka::broker: request a server profile cert [puppet] - 10https://gerrit.wikimedia.org/r/737730 [15:43:30] (03PS12) 10Giuseppe Lavagetto: profile::mediawiki::php: Allow running multiple php versions in parallel [puppet] - 10https://gerrit.wikimedia.org/r/736948 (https://phabricator.wikimedia.org/T293450) [15:43:32] (03PS4) 10Giuseppe Lavagetto: php::extension: remove "package_name" [puppet] - 10https://gerrit.wikimedia.org/r/737633 [15:43:34] (03PS10) 10Giuseppe Lavagetto: mediawiki::php: support multiple php version in monitoring too [puppet] - 10https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450) [15:43:36] (03PS6) 10Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - 10https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450) [15:43:52] (03PS7) 10Ema: varnish: add varnish::logging::mtail [puppet] - 10https://gerrit.wikimedia.org/r/737424 (https://phabricator.wikimedia.org/T293879) [15:44:12] (03CR) 10Elukey: [C: 03+2] P:kafka::broker: request a server profile cert [puppet] - 10https://gerrit.wikimedia.org/r/737730 (owner: 10Jbond) [15:49:09] !log mmandere@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6004.drmrs.wmnet with OS buster [15:49:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:18] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 
(10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host ganeti6004.drmrs.wmnet with OS buster executed with errors: - gan... [15:49:51] (03CR) 10JMeybohm: [C: 03+1] tegola-vector-tiles: Configure pregeneration retries (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/737665 (https://phabricator.wikimedia.org/T295290) (owner: 10Jgiannelos) [15:54:04] RECOVERY - Kafka Broker Under Replicated Partitions on kafka-test1008 is OK: (C)10 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=test-eqiad&var-kafka_broker=kafka-test1008 [15:55:37] (03CR) 10Jgiannelos: [C: 03+2] tegola-vector-tiles: Configure pregeneration retries [deployment-charts] - 10https://gerrit.wikimedia.org/r/737665 (https://phabricator.wikimedia.org/T295290) (owner: 10Jgiannelos) [15:56:45] (03PS1) 10Ema: varnish: remove support for mtail_additional_args [puppet] - 10https://gerrit.wikimedia.org/r/737734 (https://phabricator.wikimedia.org/T293879) [15:56:47] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] Replace global with parent scope [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737195 (owner: 10Awight) [15:58:04] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] Avoid error suppression [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737192 (owner: 10Awight) [15:59:48] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] Don't need to keep all config in memory (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737189 (owner: 10Awight) [15:59:58] (03PS1) 10Volans: administrative: add examples to the documentation [software/spicerack] - 10https://gerrit.wikimedia.org/r/737735 [16:00:10] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] Anchor relative import [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737187 (owner: 10Awight) [16:00:37] (03Merged) 10jenkins-bot: tegola-vector-tiles: 
Configure pregeneration retries [deployment-charts] - 10https://gerrit.wikimedia.org/r/737665 (https://phabricator.wikimedia.org/T295290) (owner: 10Jgiannelos) [16:01:30] 10SRE, 10Observability-Metrics, 10Traffic, 10Patch-For-Review, 10User-ema: varnishmtail metric loss due to mtail not reading from pipe fast enough - https://phabricator.wikimedia.org/T293879 (10colewhite) [16:02:37] (03CR) 10Giuseppe Lavagetto: profile::mediawiki::php: Allow running multiple php versions in parallel (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/736948 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [16:03:12] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] ceph::auth: Deploy on virt and virt_ceph_and_backy hosts too [puppet] - 10https://gerrit.wikimedia.org/r/737729 (owner: 10David Caro) [16:03:25] (03CR) 10Giuseppe Lavagetto: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/32280/ shows this is a complete noop." [puppet] - 10https://gerrit.wikimedia.org/r/736948 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [16:03:27] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/737734 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [16:03:34] (03CR) 10Thiemo Kreuz (WMDE): Only load static configs once (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737188 (owner: 10Awight) [16:06:04] (03CR) 10Giuseppe Lavagetto: [V: 03+1 C: 03+2] "PCC SUCCESS (NOOP 1 DIFF 10): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32281/console" [puppet] - 10https://gerrit.wikimedia.org/r/736948 (https://phabricator.wikimedia.org/T293450) (owner: 10Giuseppe Lavagetto) [16:07:25] !log jgiannelos@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . 
[16:07:26] !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host ganeti6004.drmrs.wmnet with OS buster [16:07:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:37] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host ganeti6004.drmrs.wmnet with OS buster [16:07:49] (03PS2) 10Arturo Borrero Gonzalez: wmcs: add openstack network tests cookbook [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/737667 (https://phabricator.wikimedia.org/T294955) [16:09:02] (03CR) 10Ema: [C: 03+2] varnish: remove support for mtail_additional_args [puppet] - 10https://gerrit.wikimedia.org/r/737734 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [16:10:20] (03CR) 10Thiemo Kreuz (WMDE): "I have not been able to validate that all code paths still do the same as before. But there is indeed some duplication going on." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737212 (owner: 10Awight) [16:10:29] (03CR) 10jerkins-bot: [V: 04-1] wmcs: add openstack network tests cookbook [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/737667 (https://phabricator.wikimedia.org/T294955) (owner: 10Arturo Borrero Gonzalez) [16:10:39] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] add credentials file for downloading enterprise html dumps [puppet] - 10https://gerrit.wikimedia.org/r/736461 (https://phabricator.wikimedia.org/T273585) (owner: 10ArielGlenn) [16:11:14] (03CR) 10Bearloga: [C: 04-1] "Need to follow up with some teams to figure out what precisely is needed to get email alerts for failed executions." 
[puppet] - 10https://gerrit.wikimedia.org/r/736916 (https://phabricator.wikimedia.org/T291957) (owner: 10Bearloga) [16:11:56] (03PS2) 10Volans: administrative: add examples to the documentation [software/spicerack] - 10https://gerrit.wikimedia.org/r/737735 [16:12:08] !log jgiannelos@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . [16:12:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:28] (03PS1) 10Jbond: fix netbase [puppet] - 10https://gerrit.wikimedia.org/r/737736 [16:13:23] (03CR) 10jerkins-bot: [V: 04-1] fix netbase [puppet] - 10https://gerrit.wikimedia.org/r/737736 (owner: 10Jbond) [16:16:21] !log jgiannelos@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . [16:16:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:18:18] (03PS8) 10Ema: varnish: add varnish::logging::mtail [puppet] - 10https://gerrit.wikimedia.org/r/737424 (https://phabricator.wikimedia.org/T293879) [16:24:09] (03PS3) 10Arturo Borrero Gonzalez: wmcs: add openstack network tests cookbook [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/737667 (https://phabricator.wikimedia.org/T294955) [16:27:15] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/737424 (https://phabricator.wikimedia.org/T293879) (owner: 10Ema) [16:28:33] (03PS1) 10Razzi: superset: Make sqllab timeout 3 minutes [puppet] - 10https://gerrit.wikimedia.org/r/737738 (https://phabricator.wikimedia.org/T294771) [16:29:13] 10Puppet, 10Infrastructure-Foundations, 10User-jbond: puppetlabs: create puppet 7 environment in WMCS to test code - https://phabricator.wikimedia.org/T294841 (10jbond) We also get the following error from `ipresolve` Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Internal Server... 
[16:31:12] (03PS1) 10Vgutierrez: cache:haproxy: Monitor HTTPS port [puppet] - 10https://gerrit.wikimedia.org/r/737739 (https://phabricator.wikimedia.org/T290005) [16:31:58] (03CR) 10Btullis: "Looks fine to me." [puppet] - 10https://gerrit.wikimedia.org/r/737738 (https://phabricator.wikimedia.org/T294771) (owner: 10Razzi) [16:32:49] (03CR) 10Razzi: presto: enable ui over http (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/736503 (https://phabricator.wikimedia.org/T292087) (owner: 10Razzi) [16:33:30] (03CR) 10Btullis: [C: 03+1] superset: Make sqllab timeout 3 minutes [puppet] - 10https://gerrit.wikimedia.org/r/737738 (https://phabricator.wikimedia.org/T294771) (owner: 10Razzi) [16:36:49] (03CR) 10David Caro: [C: 03+2] "PCC still flaky, but shows enough: https://puppet-compiler.wmflabs.org/compiler1002/32279/" [puppet] - 10https://gerrit.wikimedia.org/r/737729 (owner: 10David Caro) [16:37:00] (03PS1) 10Majavah: dynamicproxy: Add a switch to make the api read-only [puppet] - 10https://gerrit.wikimedia.org/r/737740 [16:38:06] !log mmandere@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti6004.drmrs.wmnet with OS buster [16:38:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:17] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host ganeti6004.drmrs.wmnet with OS buster executed with errors: - gan... 
[16:39:57] (03PS2) 10Vgutierrez: cache:haproxy: Monitor HTTPS port [puppet] - 10https://gerrit.wikimedia.org/r/737739 (https://phabricator.wikimedia.org/T290005) [16:40:03] (03PS4) 10David Caro: Show the directory where fact files are searched on error [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/736779 [16:41:20] (03PS1) 10Arturo Borrero Gonzalez: openstack: monitor: cmd-checklist-runner: exit with a different return code [puppet] - 10https://gerrit.wikimedia.org/r/737741 (https://phabricator.wikimedia.org/T294955) [16:41:23] (03CR) 10Razzi: [C: 03+2] superset: Make sqllab timeout 3 minutes [puppet] - 10https://gerrit.wikimedia.org/r/737738 (https://phabricator.wikimedia.org/T294771) (owner: 10Razzi) [16:41:29] !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host ganeti6004.drmrs.wmnet with OS buster [16:41:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:41:42] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host ganeti6004.drmrs.wmnet with OS buster [16:42:00] (03PS3) 10Vgutierrez: cache:haproxy: Monitor HTTPS port [puppet] - 10https://gerrit.wikimedia.org/r/737739 (https://phabricator.wikimedia.org/T290005) [16:42:28] (03PS1) 10David Caro: ceph::auth: enable loading the keys on all clusters/dcs [puppet] - 10https://gerrit.wikimedia.org/r/737742 (https://phabricator.wikimedia.org/T293752) [16:42:34] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: monitor: cmd-checklist-runner: exit with a different return code [puppet] - 10https://gerrit.wikimedia.org/r/737741 (https://phabricator.wikimedia.org/T294955) (owner: 10Arturo Borrero Gonzalez) [16:43:30] (03CR) 10Mepps: [C: 04-1] "Just need to delete the description key." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/737503 (https://phabricator.wikimedia.org/T293798) (owner: 10EllenR) [16:44:36] (03CR) 10Mepps: [C: 04-1] "Ignore the accidental double comment :)." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737503 (https://phabricator.wikimedia.org/T293798) (owner: 10EllenR) [16:44:43] (03PS5) 10Giuseppe Lavagetto: php::extension: remove "package_name" [puppet] - 10https://gerrit.wikimedia.org/r/737633 [16:44:45] (03PS11) 10Giuseppe Lavagetto: mediawiki::php: support multiple php version in monitoring too [puppet] - 10https://gerrit.wikimedia.org/r/736949 (https://phabricator.wikimedia.org/T293450) [16:44:47] (03PS7) 10Giuseppe Lavagetto: mediawiki: add support for multiple versions in the web configuration [puppet] - 10https://gerrit.wikimedia.org/r/737330 (https://phabricator.wikimedia.org/T293450) [16:44:52] (03PS3) 10Razzi: presto: enable ui over http [puppet] - 10https://gerrit.wikimedia.org/r/736503 (https://phabricator.wikimedia.org/T292087) [16:44:54] (03CR) 10David Caro: [V: 03+1] "PCC SUCCESS (NOOP 5 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32288/console" [puppet] - 10https://gerrit.wikimedia.org/r/737742 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [16:46:39] (03CR) 10Dzahn: [V: 03+1] "https://puppet-compiler.wmflabs.org/compiler1001/32287/" [puppet] - 10https://gerrit.wikimedia.org/r/736599 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [16:47:37] (03PS4) 10Vgutierrez: cache:haproxy: Monitor HTTPS port [puppet] - 10https://gerrit.wikimedia.org/r/737739 (https://phabricator.wikimedia.org/T290005) [16:48:51] (03CR) 10Btullis: [C: 03+1] "Looks good." 
[puppet] - 10https://gerrit.wikimedia.org/r/736503 (https://phabricator.wikimedia.org/T292087) (owner: 10Razzi) [16:49:11] (03CR) 10Razzi: [C: 03+2] presto: enable ui over http [puppet] - 10https://gerrit.wikimedia.org/r/736503 (https://phabricator.wikimedia.org/T292087) (owner: 10Razzi) [16:49:14] (03CR) 10Giuseppe Lavagetto: [V: 03+1] "PCC SUCCESS (DIFF 11): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32290/console" [puppet] - 10https://gerrit.wikimedia.org/r/737633 (owner: 10Giuseppe Lavagetto) [16:49:16] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-test1009 is CRITICAL: 12 ge 10 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=test-eqiad&var-kafka_broker=kafka-test1009 [16:49:21] (03CR) 10Vgutierrez: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32291/console" [puppet] - 10https://gerrit.wikimedia.org/r/737739 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [16:50:54] !log snapshot* - disabling puppet - converting some crons [16:50:54] (03CR) 10Vgutierrez: [V: 03+1 C: 03+2] cache:haproxy: Monitor HTTPS port [puppet] - 10https://gerrit.wikimedia.org/r/737739 (https://phabricator.wikimedia.org/T290005) (owner: 10Vgutierrez) [16:50:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:51:18] (03CR) 10Dzahn: [V: 03+1 C: 03+2] snapshot: convert 2 crons for full and partial dumps into systemd timers [puppet] - 10https://gerrit.wikimedia.org/r/736599 (https://phabricator.wikimedia.org/T273673) (owner: 10Dzahn) [16:51:42] vgutierrez: collision, it's ok to merge multiple from my side [16:51:46] ok [16:51:49] :) [16:51:50] merging 88dd65af39 along my change then [16:51:57] ack, thx [16:52:10] done [16:52:25] I had puppet disabled either way. 
cool [16:52:48] (03CR) 10David Caro: [V: 03+1 C: 03+2] "PCC looks good: https://puppet-compiler.wmflabs.org/compiler1002/32288/" [puppet] - 10https://gerrit.wikimedia.org/r/737742 (https://phabricator.wikimedia.org/T293752) (owner: 10David Caro) [16:54:52] (03PS1) 10Jbond: P:pki::multiroot: update kafka CA default profile [puppet] - 10https://gerrit.wikimedia.org/r/737745 [16:54:54] (03PS1) 10Jbond: P:kafaka::broker: switch back to default profile [puppet] - 10https://gerrit.wikimedia.org/r/737746 [16:55:14] (03CR) 10Jbond: [C: 03+2] P:pki::multiroot: update kafka CA default profile [puppet] - 10https://gerrit.wikimedia.org/r/737745 (owner: 10Jbond) [16:55:33] !log mmandere@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti6004.drmrs.wmnet with OS buster [16:55:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:55:42] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host ganeti6004.drmrs.wmnet with OS buster executed with errors: - gan... [16:58:07] (03CR) 10Giuseppe Lavagetto: [V: 03+1 C: 03+2] php::extension: remove "package_name" [puppet] - 10https://gerrit.wikimedia.org/r/737633 (owner: 10Giuseppe Lavagetto) [16:58:54] (03CR) 10Elukey: [C: 03+1] "one nit" [puppet] - 10https://gerrit.wikimedia.org/r/737746 (owner: 10Jbond) [16:59:21] (03PS2) 10Jbond: P:kafka::broker: switch back to default profile [puppet] - 10https://gerrit.wikimedia.org/r/737746 [16:59:26] (03CR) 10Jbond: [V: 03+2 C: 03+2] P:kafka::broker: switch back to default profile [puppet] - 10https://gerrit.wikimedia.org/r/737746 (owner: 10Jbond) [17:00:04] jbond and rzl: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Puppet request window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211109T1700). 
[17:00:05] dbrant: A patch you scheduled for Puppet request window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [17:01:18] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [17:04:37] (03PS3) 10Bearloga: statistics::product_analytics: Update contact group for monitoring [puppet] - 10https://gerrit.wikimedia.org/r/736916 (https://phabricator.wikimedia.org/T295381) [17:06:03] present [17:06:31] (03CR) 10Jbond: "have added denial back to the CR as i would prefer someone from service ops to give the official +1, however happy to merge once done (no " [puppet] - 10https://gerrit.wikimedia.org/r/736595 (https://phabricator.wikimedia.org/T294776) (owner: 10Dbrant) [17:06:45] (03PS1) 10Jgreen: replace jgreen's ssh RSA pubkey with an ECDSA one [puppet] - 10https://gerrit.wikimedia.org/r/737749 [17:08:03] !log mmandere@cumin1001 START - Cookbook sre.hosts.reimage for host dns6001.wikimedia.org with OS buster [17:08:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:08:12] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host dns6001.wikimedia.org with OS buster [17:09:09] dbrant, jbond: I'm heading afk in a minute but I can merge that one real quick first :) [17:10:21] (03CR) 10Jhernandez: Set up beta test environment for QuickSurvey (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737503 (https://phabricator.wikimedia.org/T293798) (owner: 10EllenR) [17:10:35] I'll merge on mwdebug1001 first and test via httpbb -- dbrant will you want a minute to test with the mwdebug extension? 
[17:11:04] rzl: sure, let's do that [17:11:12] cool, stand by [17:11:35] (03CR) 10Jgreen: "Can you cross-check this so it is known that the commit is legit and not malfeasance? Kthx!" [puppet] - 10https://gerrit.wikimedia.org/r/737749 (owner: 10Jgreen) [17:12:35] dbrant: ah, one sec -- you added the httpbb test for en.wikipedia.org, was that intended? [17:12:57] if I understand the commit message right you wanted that behavior on wikipedia.org directly [17:13:09] yea, that should be in the docroot of the "naked domain" [17:13:19] ah yes, that is true [17:14:40] I copied the logic for the Apple site association file, which technically should also be tested under the base domain. [17:16:08] that's right, we can fix that test separately [17:16:25] Shall I update the httpbb file with a new section for the base domain? (I'm not seeing an existing section) [17:16:26] (03PS1) 10Elukey: Revert "Enable PKI-based TLS certificate for kafka-test1006" [puppet] - 10https://gerrit.wikimedia.org/r/737689 [17:16:50] (03CR) 10Andrew Bogott: "lgtm. Could/should this be three patches, one that lints, one that puts things behind nginx, and one that adds the r/o switch?" [puppet] - 10https://gerrit.wikimedia.org/r/737740 (owner: 10Majavah) [17:16:52] (03CR) 10Ottomata: [C: 03+1] presto: enable ui over http [puppet] - 10https://gerrit.wikimedia.org/r/736503 (https://phabricator.wikimedia.org/T292087) (owner: 10Razzi) [17:17:04] dbrant: yes, maybe a the very top before all others? [17:17:31] I can move the Apple part if you want [17:17:42] yeah, order isn't significant but I agree that's clearest -- and I agree let's fix the apple test in a separate patch [17:18:01] yeah Apple can be done separately, I'll confirm later with the iOS team about that [17:18:34] thanks rzl [17:18:45] mutante / jbond: I'm taking a sick day and need to get back out of here and go lie down :) can I ask you to finish this up? 
[17:19:14] rzl: ack no problem take care [17:19:15] it should just be puppet-merge, then run-puppet-agent on mwdebug1001 and cumin1001 to update the tests, then run httpbb from cumin1001 against mwdebug1001 and test manually [17:19:23] yes, just one question, how did you deploy only to mwdebug [17:19:29] (03PS3) 10Dbrant: Create alias for Android site association file. [puppet] - 10https://gerrit.wikimedia.org/r/736595 (https://phabricator.wikimedia.org/T294776) [17:19:30] ok [17:19:41] you could stop puppet on the other appservers first but I wasn't going to bother in this case [17:19:48] (03CR) 10Majavah: dynamicproxy: Add a switch to make the api read-only (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/737740 (owner: 10Majavah) [17:20:08] (03CR) 10Razzi: Reduce verbosity of the log commit message (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/737706 (owner: 10Elukey) [17:20:43] (03CR) 10Andrew Bogott: [C: 03+1] "ok :)" [puppet] - 10https://gerrit.wikimedia.org/r/737740 (owner: 10Majavah) [17:21:01] I trust it enough even though it's a RewriteRule because it doesn't have any wildcards :P [17:21:24] * mutante remembers the day we merged a redirect loop and it was cached [17:21:39] :D [17:21:59] (03CR) 10Elukey: Reduce verbosity of the log commit message (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/737706 (owner: 10Elukey) [17:22:00] alright, i've updated the test [17:22:16] cant just revert that easily, but yea :) [17:22:27] (03CR) 10Elukey: [C: 03+2] Revert "Enable PKI-based TLS certificate for kafka-test1006" [puppet] - 10https://gerrit.wikimedia.org/r/737689 (owner: 10Elukey) [17:23:15] i'm gonna be paranoid and disable puppet on mw*, sec [17:25:27] (03CR) 10Dzahn: [C: 03+2] Create alias for Android site association file. 
[puppet] - 10https://gerrit.wikimedia.org/r/736595 (https://phabricator.wikimedia.org/T294776) (owner: 10Dbrant) [17:25:52] elukey: merging both since it says "test" in yours :) [17:26:08] right [17:26:11] mutante: thanks! [17:26:25] done [17:27:22] (03CR) 10Andrew Bogott: [C: 03+2] dynamicproxy: Add a switch to make the api read-only [puppet] - 10https://gerrit.wikimedia.org/r/737740 (owner: 10Majavah) [17:27:35] dbrant: test updated on cumin, deploying config change on mwdebug1001 [17:28:19] dbrant: deployed ^ [17:28:40] mutante: confirming... [17:29:26] mutante: and... works! [17:29:28] RECOVERY - Kafka Broker Under Replicated Partitions on kafka-test1009 is OK: (C)10 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/Kafka/Administration https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=test-eqiad&var-kafka_broker=kafka-test1009 [17:29:34] (03CR) 10Jelto: [C: 03+1] "beside the nits about the image name lgtm" [deployment-charts] - 10https://gerrit.wikimedia.org/r/737169 (https://phabricator.wikimedia.org/T294560) (owner: 10JMeybohm) [17:30:14] figuring out test syntax [17:30:15] [cumin1001:~] $ httpbb /srv/deployment/httpbb-tests/appserver/test_main* --hosts mwdebug1001.eqiad.wmnet [17:30:20] PASS: 44 requests sent to mwdebug1001.eqiad.wmnet. All assertions passed. [17:30:26] lgtm [17:30:29] (03PS1) 10Volans: sre.hosts.dhcp: add new cookbook to setup DHCP [cookbooks] - 10https://gerrit.wikimedia.org/r/737753 [17:30:50] (03CR) 10Dzahn: "< mutante> [cumin1001:~] $ httpbb /srv/deployment/httpbb-tests/appserver/test_main* --hosts mwdebug1001.eqiad.wmnet" [puppet] - 10https://gerrit.wikimedia.org/r/736595 (https://phabricator.wikimedia.org/T294776) (owner: 10Dbrant) [17:31:57] dbrant: and some other random wiki page is also still working, right?:) with the rewrites best to also test if "other stuff isn't broken" [17:32:26] mutante: yep! 
checked a few other urls, too [17:32:55] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [17:32:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:33:05] (03CR) 10Razzi: [C: 03+1] Reduce verbosity of the log commit message (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/737706 (owner: 10Elukey) [17:33:37] :) thanks, sorry, extra paranoid with these because once we had exactly this "yes, it redirects" just that it redirected everything else too, hehe [17:33:44] re-enabling inactive DC [17:35:44] not at all; I would be extra careful, too [17:36:11] rewrite merge is how I earned the t-shirt [17:36:21] well, the fix part was traffic afair [17:36:25] nice! [17:36:27] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [17:36:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:37:06] !log mmandere@cumin1001 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns6001.wikimedia.org with OS buster [17:37:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:37:15] 10SRE, 10Traffic, 10Patch-For-Review: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host dns6001.wikimedia.org with OS buster executed with errors: - dns6... [17:38:26] tested on mw2251, codfw prod, from the other cumin host which can reach it, also passes all 44 assertions after test was updated. 
all good [17:43:18] well, now I am suddenly seeing an issue with a timeout to mw2251 [17:44:26] and now it works again [17:44:27] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [17:44:48] (03CR) 10Jhernandez: [C: 04-1] Set up beta test environment for QuickSurvey (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737503 (https://phabricator.wikimedia.org/T293798) (owner: 10EllenR) [17:46:30] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [17:47:45] (03CR) 10Jhernandez: [C: 04-1] Set up beta test environment for QuickSurvey (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737503 (https://phabricator.wikimedia.org/T293798) (owner: 10EllenR) [17:51:28] re-enabled puppet on all canary mw hosts [17:52:15] (03PS1) 10Vgutierrez: varnish: Mimick XFF behaviour with UDS + PROXY protocol [puppet] - 10https://gerrit.wikimedia.org/r/737755 (https://phabricator.wikimedia.org/T290005) [17:54:10] PROBLEM - Disk space on webperf1002 is CRITICAL: DISK CRITICAL - free space: /srv 11455 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=webperf1002&var-datasource=eqiad+prometheus/ops [17:55:50] !log re-enabled puppet on mw* after deploying and testing gerrit:736595 on canary [17:55:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:57:09] jbond: was paranoid, disabled puppet on *, tested on random canary and from cumin2001, codfw, finally re-enabled on mw* now.. 
letting puppet roll out the refresh [17:57:10] PROBLEM - Disk space on webperf2002 is CRITICAL: DISK CRITICAL - free space: /srv 11240 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=webperf2002&var-datasource=codfw+prometheus/ops [17:57:23] but still keeping an eye because it does refresh apache on everything [17:57:53] mutante: great thanks [17:58:19] ACK, it did pass all 44 assertions (before 43) [17:58:32] cool [17:58:48] not forcing puppet run on all.. just reenabling and be back in a while [17:59:02] yes i think that makes the most senses [17:59:13] *nod* [18:00:05] chrisalbon and accraze: My dear minions, it's time we take the moon! Just kidding. Time for Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211109T1800). [18:00:14] (03PS1) 10Arturo Borrero Gonzalez: Revert "openstack: monitor: cmd-checklist-runner: exit with a different return code" [puppet] - 10https://gerrit.wikimedia.org/r/737690 [18:00:53] (03PS1) 10Jbond: P:base::certificates: refactor jks trustore [puppet] - 10https://gerrit.wikimedia.org/r/737756 [18:02:03] (03CR) 10Volans: "This should help debugging the drmrs issues and in general be useful for any specific manual debugging. Once we get it working I'll factor" [cookbooks] - 10https://gerrit.wikimedia.org/r/737753 (owner: 10Volans) [18:02:08] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [18:03:05] (03PS2) 10Jbond: P:base::certificates: refactor jks trustore [puppet] - 10https://gerrit.wikimedia.org/r/737756 [18:03:59] (03CR) 10jerkins-bot: [V: 04-1] P:base::certificates: refactor jks trustore [puppet] - 10https://gerrit.wikimedia.org/r/737756 (owner: 10Jbond) [18:05:12] (03CR) 10Dzahn: "disabled puppet on mw*, re-enabled on canaries, then codfw, then finally eqiad. 
tested from both cumin1001 and cumin2001 with updated test" [puppet] - 10https://gerrit.wikimedia.org/r/736595 (https://phabricator.wikimedia.org/T294776) (owner: 10Dbrant) [18:06:42] (03CR) 10Dwisehaupt: [C: 03+1] "Verified with jgreen on a secure channel and verified the fingerprint matches. Approved by me." [puppet] - 10https://gerrit.wikimedia.org/r/737749 (owner: 10Jgreen) [18:08:56] (03PS3) 10Jbond: P:base::certificates: refactor jks trustore [puppet] - 10https://gerrit.wikimedia.org/r/737756 [18:09:37] (03CR) 10jerkins-bot: [V: 04-1] P:base::certificates: refactor jks trustore [puppet] - 10https://gerrit.wikimedia.org/r/737756 (owner: 10Jbond) [18:09:41] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32294/console" [puppet] - 10https://gerrit.wikimedia.org/r/737756 (owner: 10Jbond) [18:12:26] (03PS4) 10Arturo Borrero Gonzalez: wmcs: add openstack network tests cookbook [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/737667 (https://phabricator.wikimedia.org/T294955) [18:12:53] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] Revert "openstack: monitor: cmd-checklist-runner: exit with a different return code" [puppet] - 10https://gerrit.wikimedia.org/r/737690 (owner: 10Arturo Borrero Gonzalez) [18:13:20] (03PS4) 10Jbond: P:base::certificates: refactor jks trustore [puppet] - 10https://gerrit.wikimedia.org/r/737756 [18:14:00] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32295/console" [puppet] - 10https://gerrit.wikimedia.org/r/737756 (owner: 10Jbond) [18:15:32] (03CR) 10Jgreen: [C: 03+2] replace jgreen's ssh RSA pubkey with an ECDSA one [puppet] - 10https://gerrit.wikimedia.org/r/737749 (owner: 10Jgreen) [18:15:41] (03CR) 10jerkins-bot: [V: 04-1] wmcs: add openstack network tests cookbook [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/737667 (https://phabricator.wikimedia.org/T294955) (owner: 
10Arturo Borrero Gonzalez) [18:15:52] (03PS5) 10Jbond: P:base::certificates: refactor jks trustore [puppet] - 10https://gerrit.wikimedia.org/r/737756 [18:17:16] (03PS6) 10Jbond: P:base::certificates: refactor jks trustore [puppet] - 10https://gerrit.wikimedia.org/r/737756 [18:20:16] Can somebody do a puppet-merge for me? I seem to be stuck at 2FA for the bastions. [18:21:22] (03PS7) 10Jbond: P:base::certificates: refactor jks trust-store [puppet] - 10https://gerrit.wikimedia.org/r/737756 [18:22:02] mutante, jbond: ^ [18:22:27] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32296/console" [puppet] - 10https://gerrit.wikimedia.org/r/737756 (owner: 10Jbond) [18:23:52] RhinosF1: not right now, in a meeting, in a couple minutes [18:26:51] Jeff_Green: want to do a quick hangout? [18:26:55] just because it's a new ssh key... [18:26:56] sure [18:27:02] sounds good. logging in [18:27:40] pming you a link [18:30:38] (03PS1) 10MSantos: mobileapps: bump to 2021-11-01-203631-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/737759 [18:31:32] PROBLEM - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [18:34:21] (03CR) 10Jbond: [C: 03+1] administrative: add examples to the documentation [software/spicerack] - 10https://gerrit.wikimedia.org/r/737735 (owner: 10Volans) [18:36:12] PROBLEM - Disk space on webperf1002 is CRITICAL: DISK CRITICAL - free space: /srv 11206 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=webperf1002&var-datasource=eqiad+prometheus/ops [18:36:35] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[18:36:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:03] (03CR) 10Jbond: [C: 03+1] sre.hosts.dhcp: add new cookbook to setup DHCP [cookbooks] - 10https://gerrit.wikimedia.org/r/737753 (owner: 10Volans) [18:38:59] (03CR) 10MSantos: [C: 03+2] mobileapps: bump to 2021-11-01-203631-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/737759 (owner: 10MSantos) [18:39:10] PROBLEM - Disk space on webperf2002 is CRITICAL: DISK CRITICAL - free space: /srv 10926 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=webperf2002&var-datasource=codfw+prometheus/ops [18:40:01] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [18:40:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:41:27] RECOVERY - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 23669 bytes in 0.248 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [18:44:56] (03Merged) 10jenkins-bot: mobileapps: bump to 2021-11-01-203631-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/737759 (owner: 10MSantos) [18:45:19] !log mbsantos@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' . 
[18:45:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:46:57] (03PS3) 10JMeybohm: Add cfssl-issuer and cfssl-issuer-crds chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/737169 (https://phabricator.wikimedia.org/T294560) [18:47:42] (03CR) 10JMeybohm: Add cfssl-issuer and cfssl-issuer-crds chart (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/737169 (https://phabricator.wikimedia.org/T294560) (owner: 10JMeybohm) [18:48:34] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [18:50:22] So, I've deployed mobileapps to the staging environment but it has no pods. Is that expected? [18:50:34] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [18:50:49] No status failure has been raised during the deployment [18:52:26] (03PS1) 10Ottomata: Add gitlab support for scap_source [puppet] - 10https://gerrit.wikimedia.org/r/737764 (https://phabricator.wikimedia.org/T295380) [18:52:51] (03PS1) 10BBlack: Update bblack ssh key [homer/public] - 10https://gerrit.wikimedia.org/r/737765 [18:57:20] dbrant: https://wikipedia.org/.well-known/assetlinks.json [18:57:53] (03CR) 10Dzahn: "deployed: https://wikipedia.org/.well-known/assetlinks.json" [puppet] - 10https://gerrit.wikimedia.org/r/736595 (https://phabricator.wikimedia.org/T294776) (owner: 10Dbrant) [18:59:57] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[18:59:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:02] mutante: i'm still seeing a 301 to www.wikipedia.org [19:00:05] RoanKattouw and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy UTC evening backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211109T1900). [19:00:05] No Gerrit patches in the queue for this window AFAICS. [19:00:44] some cache need purging? [19:01:12] RECOVERY - Check systemd state on cumin2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:01:55] dbrant: just had to go into a 1:1 meeting, yes, maybe, can ping traffic or you can if you want, be back soon [19:03:26] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [19:03:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:31] thx, will ping traffic [19:04:35] (03CR) 10Ottomata: "Not sure who to add as reviewer, but you all are in the git blame for these files sooooo :)" [puppet] - 10https://gerrit.wikimedia.org/r/737764 (https://phabricator.wikimedia.org/T295380) (owner: 10Ottomata) [19:11:40] !log echo "https://wikipedia.org/.well-known/assetlinks.json" | mwscript purgeList.php enwiki [19:11:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:13:45] mutante: all done, thanks again! (and thx Reedy) [19:26:23] (03PS1) 10Ottomata: [WIP] declare airflow/data_eng scap source and target for airflow analytics instance. [puppet] - 10https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) [19:27:18] (03CR) 10jerkins-bot: [V: 04-1] [WIP] declare airflow/data_eng scap source and target for airflow analytics instance. 
[puppet] - 10https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) (owner: 10Ottomata) [19:29:36] (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32297/console" [puppet] - 10https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) (owner: 10Ottomata) [19:37:06] (03PS1) 10Jdrewniak: Add mobile logo and wordmark for metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737771 (https://phabricator.wikimedia.org/T295303) [19:46:31] (03PS2) 10Ladsgroup: Disable DPL on Wikibooks where not in use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734421 (https://phabricator.wikimedia.org/T287916) (owner: 10Legoktm) [19:47:12] jouncebot: nowandnext [19:47:12] For the next 0 hour(s) and 12 minute(s): UTC evening backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211109T1900) [19:47:12] In 4 hour(s) and 12 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211110T0000) [19:47:26] coool [19:47:29] (03CR) 10Ladsgroup: [C: 03+2] Disable DPL on Wikibooks where not in use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734421 (https://phabricator.wikimedia.org/T287916) (owner: 10Legoktm) [19:48:16] (03Merged) 10jenkins-bot: Disable DPL on Wikibooks where not in use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734421 (https://phabricator.wikimedia.org/T287916) (owner: 10Legoktm) [19:50:01] !log ladsgroup@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734421|Disable DPL on Wikibooks where not in use (T287916)]] (duration: 00m 56s) [19:50:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:50:04] T287916: Disable DPL on wikis that aren't using it - https://phabricator.wikimedia.org/T287916 [19:51:46] 10SRE, 10DBA, 10cloud-services-team (Kanban): db1112 (s3 contribs/rc replica) is down - 
https://phabricator.wikimedia.org/T294295 (10Cmjohnson) @Marostegui The DIMM arrived, let me know if I can take this server down anytime or if it needs to be scheduled [19:53:01] (03PS2) 10Ladsgroup: Disable DPL on Wikinews where not in use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734422 (https://phabricator.wikimedia.org/T287916) (owner: 10Legoktm) [19:53:12] (03CR) 10Ladsgroup: [C: 03+2] Disable DPL on Wikinews where not in use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734422 (https://phabricator.wikimedia.org/T287916) (owner: 10Legoktm) [19:53:34] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [19:53:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:54:21] (03Merged) 10jenkins-bot: Disable DPL on Wikinews where not in use [mediawiki-config] - 10https://gerrit.wikimedia.org/r/734422 (https://phabricator.wikimedia.org/T287916) (owner: 10Legoktm) [19:55:50] !log ladsgroup@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734422|Disable DPL on Wikinews where not in use (T287916)]] (duration: 00m 57s) [19:55:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:55:54] T287916: Disable DPL on wikis that aren't using it - https://phabricator.wikimedia.org/T287916 [19:57:11] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[19:57:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:03:05] (03PS1) 10Jdrewniak: Add mobile wordmark for foundation-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737773 (https://phabricator.wikimedia.org/T295303) [20:04:00] (03CR) 10jerkins-bot: [V: 04-1] Add mobile wordmark for foundation-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737773 (https://phabricator.wikimedia.org/T295303) (owner: 10Jdrewniak) [20:06:03] dbrant: I think it will just solve itself if we wait. so if it's not urgent right now we could just do that [20:06:43] (03PS1) 10Jbond: P:openstack::base::cloudgw: drop uneeded profiles [puppet] - 10https://gerrit.wikimedia.org/r/737774 [20:06:43] mutante: oh it's already done [20:06:59] Reedy did magic [20:07:08] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [20:07:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:07:26] (03PS2) 10Jdrewniak: Add mobile wordmark for foundation-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737773 (https://phabricator.wikimedia.org/T295303) [20:08:04] (03PS2) 10Ottomata: [WIP] declare airflow/data_eng scap source and target for airflow analytics instance. [puppet] - 10https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) [20:08:08] oh:) very nice, I was gone for an hour and "Reedy fixed it" is a great solution [20:08:53] (03PS1) 10Ladsgroup: Increase logging level of DBPerformance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/737775 [20:08:57] if you leave anything broken long enough, Reedy will fix it eventually >.> [20:09:07] Very true [20:09:10] (03CR) 10jerkins-bot: [V: 04-1] [WIP] declare airflow/data_eng scap source and target for airflow analytics instance. 
[puppet] - 10https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) (owner: 10Ottomata) [20:10:50] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [20:10:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:37] (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32299/console" [puppet] - 10https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) (owner: 10Ottomata) [20:17:08] (03PS2) 10Jbond: P:openstack::base::cloudgw: drop uneeded profiles [puppet] - 10https://gerrit.wikimedia.org/r/737774 [20:17:55] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32300/console" [puppet] - 10https://gerrit.wikimedia.org/r/737774 (owner: 10Jbond) [20:20:06] (03PS3) 10Ottomata: [WIP] declare airflow/data_eng scap source and target for airflow analytics instance. [puppet] - 10https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) [20:21:01] (03CR) 10jerkins-bot: [V: 04-1] [WIP] declare airflow/data_eng scap source and target for airflow analytics instance. 
[puppet] - https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) (owner: Ottomata)
[20:42:48] (PS1) Ottomata: Add dummy ssh keypair for deploy_airflow keyholder agent [labs/private] - https://gerrit.wikimedia.org/r/737783 (https://phabricator.wikimedia.org/T295380)
[20:44:23] (PS2) Ottomata: Add dummy ssh keypair for deploy_airflow keyholder agent [labs/private] - https://gerrit.wikimedia.org/r/737783 (https://phabricator.wikimedia.org/T295380)
[20:44:35] (CR) Ottomata: [V: +2 C: +2] Add dummy ssh keypair for deploy_airflow keyholder agent [labs/private] - https://gerrit.wikimedia.org/r/737783 (https://phabricator.wikimedia.org/T295380) (owner: Ottomata)
[20:48:16] (PS3) Jbond: P:openstack::base::cloudgw: drop uneeded profiles [puppet] - https://gerrit.wikimedia.org/r/737774
[20:49:01] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32302/console" [puppet] - https://gerrit.wikimedia.org/r/737774 (owner: Jbond)
[20:51:29] (PS9) Ideophagous: Bug:T291737 Squashed two commits into one, previous commit comments follow: Bug:T291737 Change-Id: Ib263a5419c6ace911a597d025b28d6ef13549c10 [mediawiki-config] - https://gerrit.wikimedia.org/r/735713
[20:52:19] (PS4) Ottomata: [WIP] declare airflow-dags/analytics scap source/target for airflow analytics instance [puppet] - https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380)
[20:54:09] (CR) jerkins-bot: [V: -1] [WIP] declare airflow-dags/analytics scap source/target for airflow analytics instance [puppet] - https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) (owner: Ottomata)
[20:56:05] (PS2) Dzahn: snapshop: remove absented cron code [puppet] - https://gerrit.wikimedia.org/r/736600 (https://phabricator.wikimedia.org/T273673)
[20:56:26] (CR) Dzahn: [C: +2] "it has taken effect now" [puppet] - https://gerrit.wikimedia.org/r/736600 (https://phabricator.wikimedia.org/T273673) (owner: Dzahn)
[20:59:56] (CR) Dzahn: "wut..we did not have this? thanks!" [puppet] - https://gerrit.wikimedia.org/r/737533 (owner: Legoktm)
[21:00:11] (CR) Dzahn: [C: +2] planet: Add Tyler Cipriani's blog to en [puppet] - https://gerrit.wikimedia.org/r/737533 (owner: Legoktm)
[21:11:09] (PS5) Ottomata: [WIP] declare airflow-dags/analytics scap source/target for airflow analytics instance [puppet] - https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380)
[21:12:12] (CR) jerkins-bot: [V: -1] [WIP] declare airflow-dags/analytics scap source/target for airflow analytics instance [puppet] - https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) (owner: Ottomata)
[21:12:17] (PS4) Jbond: P:openstack::base::cloudgw: drop uneeded profiles [puppet] - https://gerrit.wikimedia.org/r/737774
[21:14:22] (CR) Ottomata: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32305/console" [puppet] - https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) (owner: Ottomata)
[21:14:29] (PS5) Jbond: P:openstack::base::cloudgw: drop uneeded profiles [puppet] - https://gerrit.wikimedia.org/r/737774
[21:15:11] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32306/console" [puppet] - https://gerrit.wikimedia.org/r/737774 (owner: Jbond)
[21:17:07] (PS1) Ottomata: Fix snakeoil name in deploy_airflow.pub keyholder [labs/private] - https://gerrit.wikimedia.org/r/737789
[21:17:15] (CR) Ottomata: [V: +2 C: +2] Fix snakeoil name in deploy_airflow.pub keyholder [labs/private] - https://gerrit.wikimedia.org/r/737789 (owner: Ottomata)
[21:18:02] (CR) Jbond: [V: +1 C: -1] "this is a wip/poc as such self -1 to not merge" [puppet] - https://gerrit.wikimedia.org/r/737774 (owner: Jbond)
[21:26:29] SRE, ops-eqiad, Analytics-Clusters, DC-Ops: (Need By: TBD) rack/setup/install an-test-coord1002 - https://phabricator.wikimedia.org/T293938 (Jclark-ctr) Host was still in rack d6 u7 verifed location and relocated to B1 U29 Port20 Cableid#1935
[21:27:32] SRE, ops-eqiad, Analytics-Clusters, DC-Ops: (Need By: TBD) rack/setup/install an-test-coord1002 - https://phabricator.wikimedia.org/T293938 (Jclark-ctr) a:Jclark-ctr→Cmjohnson
[21:37:24] (CR) Volans: [C: +2] administrative: add examples to the documentation [software/spicerack] - https://gerrit.wikimedia.org/r/737735 (owner: Volans)
[21:43:21] (Merged) jenkins-bot: administrative: add examples to the documentation [software/spicerack] - https://gerrit.wikimedia.org/r/737735 (owner: Volans)
[21:45:48] (PS1) Jdlrobson: Set sampling rate for mobile click tracking to 100% on labs [mediawiki-config] - https://gerrit.wikimedia.org/r/737794 (https://phabricator.wikimedia.org/T294738)
[21:47:31] (PS3) Krinkle: multiversion: Factor dblist matching into separate method [mediawiki-config] - https://gerrit.wikimedia.org/r/737210 (owner: Awight)
[22:13:42] (PS6) Ottomata: [WIP] declare airflow-dags/analytics scap source/target for airflow analytics instance [puppet] - https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380)
[22:14:31] (CR) jerkins-bot: [V: -1] [WIP] declare airflow-dags/analytics scap source/target for airflow analytics instance [puppet] - https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) (owner: Ottomata)
[22:24:25] (PS7) Ottomata: [WIP] declare airflow-dags/analytics scap source/target for airflow analytics instance [puppet] - https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380)
[22:25:09] (CR) jerkins-bot: [V: -1] [WIP] declare airflow-dags/analytics scap source/target for airflow analytics instance [puppet] - https://gerrit.wikimedia.org/r/737770 (https://phabricator.wikimedia.org/T295380) (owner: Ottomata)
[22:25:47] (CR) Awight: [C: +1] "PS 4 is an improvement, much appreciated!" [mediawiki-config] - https://gerrit.wikimedia.org/r/737210 (owner: Awight)
[22:27:28] (PS1) Dzahn: mediawiki: remove mw font packages from mwmaint servers [puppet] - https://gerrit.wikimedia.org/r/737798
[22:30:09] (Abandoned) Dzahn: mediawiki: remove mw font packages from mwmaint servers [puppet] - https://gerrit.wikimedia.org/r/737798 (owner: Dzahn)
[22:36:39] (PS1) Dzahn: parsoid: remove mediawiki font packages from all parsoid servers [puppet] - https://gerrit.wikimedia.org/r/737800 (https://phabricator.wikimedia.org/T294378)
[22:38:19] (CR) Dzahn: "list of servers having them at f.e.: https://debmonitor.wikimedia.org/packages/fonts-vlgothic" [puppet] - https://gerrit.wikimedia.org/r/737800 (https://phabricator.wikimedia.org/T294378) (owner: Dzahn)
[22:38:55] (CR) Dzahn: [V: +1] "https://puppet-compiler.wmflabs.org/compiler1001/32311/" [puppet] - https://gerrit.wikimedia.org/r/737800 (https://phabricator.wikimedia.org/T294378) (owner: Dzahn)
[22:43:13] (CR) Awight: "Thanks for the review!" [mediawiki-config] - https://gerrit.wikimedia.org/r/737188 (owner: Awight)
[22:46:47] SRE, Infrastructure-Foundations: upgrade/replace VRTS (formerly ORTS) buster to bullseye - https://phabricator.wikimedia.org/T295416 (Dzahn)
[22:47:07] SRE, Infrastructure-Foundations: upgrade/replace VRTS (formerly ORTS) buster to bullseye - https://phabricator.wikimedia.org/T295416 (Dzahn)
[22:48:42] SRE, Infrastructure-Foundations: upgrade/replace VRTS (formerly ORTS) buster to bullseye - https://phabricator.wikimedia.org/T295416 (Dzahn) This way it can take its time and won't touch existing production and we can also use it a little bit as an example project.
[22:50:12] SRE, Infrastructure-Foundations, serviceops: upgrade/replace VRTS (formerly ORTS) buster to bullseye - https://phabricator.wikimedia.org/T295416 (Dzahn)
[22:50:44] SRE, Znuny, serviceops: rename OTRS role/module/cumin aliases - https://phabricator.wikimedia.org/T293942 (Dzahn)
[22:50:53] SRE, Infrastructure-Foundations, Znuny, serviceops: upgrade/replace VRTS (formerly ORTS) buster to bullseye - https://phabricator.wikimedia.org/T295416 (Dzahn)
[23:13:44] PROBLEM - MariaDB Replica Lag: s1 on db2141 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1253.92 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:19:08] PROBLEM - MariaDB Replica IO: s6 on db2141 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2026, Errmsg: error reconnecting to master repl@db2129.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: SSL connection error00000000:lib(0):func(0):reason(0) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:19:15] (PS1) Brennen Bearnes: gitlab runners: define an allowlist for images [puppet] - https://gerrit.wikimedia.org/r/737801 (https://phabricator.wikimedia.org/T291978)
[23:21:10] RECOVERY - MariaDB Replica IO: s6 on db2141 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:35:17] PROBLEM - MariaDB Replica IO: s6 on db2141 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2013, Errmsg: error reconnecting to master repl@db2129.codfw.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: Lost connection to MySQL server at reading authorization packet, system error: 71 Protocol error https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[23:41:22] RECOVERY - MariaDB Replica IO: s6 on db2141 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica