[00:01:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2150 (T322618)', diff saved to https://phabricator.wikimedia.org/P39144 and previous config saved to /var/cache/conftool/dbconfig/20221111-000118-ladsgroup.json
[00:01:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
[00:01:23] <stashbot>	 T322618: Fix renamed indexes of flaggedrevs_tracking table in production - https://phabricator.wikimedia.org/T322618
[00:01:34] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
[00:01:35] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
[00:02:00] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
[00:02:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2159 (T322618)', diff saved to https://phabricator.wikimedia.org/P39145 and previous config saved to /var/cache/conftool/dbconfig/20221111-000206-ladsgroup.json
[00:04:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T322618)', diff saved to https://phabricator.wikimedia.org/P39146 and previous config saved to /var/cache/conftool/dbconfig/20221111-000425-ladsgroup.json
[00:10:56] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321123)', diff saved to https://phabricator.wikimedia.org/P39147 and previous config saved to /var/cache/conftool/dbconfig/20221111-001056-marostegui.json
[00:10:58] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
[00:11:01] <stashbot>	 T321123: Drop old index cuc_user_time on cu_changes table for wmf wikis - https://phabricator.wikimedia.org/T321123
[00:11:22] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
[00:11:26] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
[00:11:50] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
[00:11:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2145 (T321123)', diff saved to https://phabricator.wikimedia.org/P39148 and previous config saved to /var/cache/conftool/dbconfig/20221111-001156-marostegui.json
[00:14:06] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321123)', diff saved to https://phabricator.wikimedia.org/P39149 and previous config saved to /var/cache/conftool/dbconfig/20221111-001406-marostegui.json
[00:15:43] <wikibugs>	 SRE, ops-eqiad, decommission-hardware, serviceops-radar: Decommission wtp10[25-48].eqiad.wmnet - https://phabricator.wikimedia.org/T317025 (Jclark-ctr)
[00:16:03] <wikibugs>	 SRE, ops-eqiad, decommission-hardware, serviceops-radar: Decommission wtp10[25-48].eqiad.wmnet - https://phabricator.wikimedia.org/T317025 (Jclark-ctr) In progress→Resolved
[00:19:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P39150 and previous config saved to /var/cache/conftool/dbconfig/20221111-001932-ladsgroup.json
[00:29:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P39151 and previous config saved to /var/cache/conftool/dbconfig/20221111-002913-marostegui.json
[00:31:22] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[00:31:36] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
[00:31:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1146:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P39152 and previous config saved to /var/cache/conftool/dbconfig/20221111-003141-ladsgroup.json
[00:31:46] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[00:34:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P39153 and previous config saved to /var/cache/conftool/dbconfig/20221111-003438-ladsgroup.json
[00:38:36] <logmsgbot>	 !log jclark@cumin1001 START - Cookbook sre.hosts.provision for host dbprov1004.mgmt.eqiad.wmnet with reboot policy FORCED
[00:38:37] <wikibugs>	 (PS8) Andrew Bogott: wmcs: add socks proxy support to wmcs cookbooks [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/852960 (https://phabricator.wikimedia.org/T319426) (owner: David Caro)
[00:38:39] <wikibugs>	 (PS8) Andrew Bogott: Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751
[00:41:47] <wikibugs>	 (CR) Andrew Bogott: Add cookbook to restart openstack services (2 comments) [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751 (owner: Andrew Bogott)
[00:41:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[00:42:18] <logmsgbot>	 !log jclark@cumin1001 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbprov1004.mgmt.eqiad.wmnet with reboot policy FORCED
[00:43:19] <logmsgbot>	 !log jclark@cumin1001 START - Cookbook sre.hosts.provision for host dbprov1004.mgmt.eqiad.wmnet with reboot policy FORCED
[00:43:51] <logmsgbot>	 !log jclark@cumin1001 END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbprov1004.mgmt.eqiad.wmnet with reboot policy FORCED
[00:44:20] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P39154 and previous config saved to /var/cache/conftool/dbconfig/20221111-004419-marostegui.json
[00:44:30] <wikibugs>	 (PS9) Andrew Bogott: Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751
[00:45:09] <logmsgbot>	 !log jclark@cumin1001 START - Cookbook sre.dns.netbox
[00:46:45] <icinga-wm>	 PROBLEM - Check systemd state on logstash2026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:46:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[00:47:01] <wikibugs>	 (CR) CI reject: [V: -1] Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751 (owner: Andrew Bogott)
[00:47:07] <logmsgbot>	 !log jclark@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[00:47:25] <icinga-wm>	 PROBLEM - Check systemd state on logstash1026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:47:48] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Data-Persistence-Backup: Q2:rack/setup/install dbprov1004 - https://phabricator.wikimedia.org/T321122 (Jclark-ctr)
[00:48:03] <wikibugs>	 (CR) CI reject: [V: -1] Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751 (owner: Andrew Bogott)
[00:49:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T322618)', diff saved to https://phabricator.wikimedia.org/P39155 and previous config saved to /var/cache/conftool/dbconfig/20221111-004945-ladsgroup.json
[00:49:47] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
[00:49:49] <stashbot>	 T322618: Fix renamed indexes of flaggedrevs_tracking table in production - https://phabricator.wikimedia.org/T322618
[00:50:11] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
[00:50:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2168:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P39156 and previous config saved to /var/cache/conftool/dbconfig/20221111-005017-ladsgroup.json
[00:50:44] <logmsgbot>	 !log jclark@cumin1001 START - Cookbook sre.hosts.provision for host dbprov1004.mgmt.eqiad.wmnet with reboot policy FORCED
[00:52:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P39157 and previous config saved to /var/cache/conftool/dbconfig/20221111-005237-ladsgroup.json
[00:59:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321123)', diff saved to https://phabricator.wikimedia.org/P39158 and previous config saved to /var/cache/conftool/dbconfig/20221111-005925-marostegui.json
[00:59:28] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
[00:59:31] <stashbot>	 T321123: Drop old index cuc_user_time on cu_changes table for wmf wikis - https://phabricator.wikimedia.org/T321123
[00:59:41] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
[00:59:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2146 (T321123)', diff saved to https://phabricator.wikimedia.org/P39159 and previous config saved to /var/cache/conftool/dbconfig/20221111-005947-marostegui.json
[01:01:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321123)', diff saved to https://phabricator.wikimedia.org/P39160 and previous config saved to /var/cache/conftool/dbconfig/20221111-010156-marostegui.json
[01:07:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P39161 and previous config saved to /var/cache/conftool/dbconfig/20221111-010743-ladsgroup.json
[01:17:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P39162 and previous config saved to /var/cache/conftool/dbconfig/20221111-011703-marostegui.json
[01:22:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P39163 and previous config saved to /var/cache/conftool/dbconfig/20221111-012250-ladsgroup.json
[01:31:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P39164 and previous config saved to /var/cache/conftool/dbconfig/20221111-013157-ladsgroup.json
[01:32:03] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[01:32:10] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P39165 and previous config saved to /var/cache/conftool/dbconfig/20221111-013209-marostegui.json
[01:37:57] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P39166 and previous config saved to /var/cache/conftool/dbconfig/20221111-013756-ladsgroup.json
[01:37:58] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
[01:38:01] <stashbot>	 T322618: Fix renamed indexes of flaggedrevs_tracking table in production - https://phabricator.wikimedia.org/T322618
[01:38:12] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
[01:38:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2169:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P39167 and previous config saved to /var/cache/conftool/dbconfig/20221111-013818-ladsgroup.json
[01:38:52] <jinxer-wm>	 (JobUnavailable) firing: (8) Reduced availability for job nginx in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:40:38] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P39168 and previous config saved to /var/cache/conftool/dbconfig/20221111-014037-ladsgroup.json
[01:47:05] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P39169 and previous config saved to /var/cache/conftool/dbconfig/20221111-014704-ladsgroup.json
[01:47:12] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2136 (T318605)', diff saved to https://phabricator.wikimedia.org/P39170 and previous config saved to /var/cache/conftool/dbconfig/20221111-014712-ladsgroup.json
[01:47:16] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[01:47:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321123)', diff saved to https://phabricator.wikimedia.org/P39171 and previous config saved to /var/cache/conftool/dbconfig/20221111-014722-marostegui.json
[01:47:25] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
[01:47:27] <stashbot>	 T321123: Drop old index cuc_user_time on cu_changes table for wmf wikis - https://phabricator.wikimedia.org/T321123
[01:47:38] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
[01:47:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2153 (T321123)', diff saved to https://phabricator.wikimedia.org/P39172 and previous config saved to /var/cache/conftool/dbconfig/20221111-014744-marostegui.json
[01:48:52] <jinxer-wm>	 (JobUnavailable) firing: (9) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:49:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321123)', diff saved to https://phabricator.wikimedia.org/P39173 and previous config saved to /var/cache/conftool/dbconfig/20221111-014953-marostegui.json
[01:53:52] <jinxer-wm>	 (JobUnavailable) firing: (10) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:55:45] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P39174 and previous config saved to /var/cache/conftool/dbconfig/20221111-015544-ladsgroup.json
[02:02:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P39175 and previous config saved to /var/cache/conftool/dbconfig/20221111-020211-ladsgroup.json
[02:02:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P39176 and previous config saved to /var/cache/conftool/dbconfig/20221111-020218-ladsgroup.json
[02:05:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P39177 and previous config saved to /var/cache/conftool/dbconfig/20221111-020500-marostegui.json
[02:08:52] <jinxer-wm>	 (JobUnavailable) firing: (10) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:10:51] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P39178 and previous config saved to /var/cache/conftool/dbconfig/20221111-021051-ladsgroup.json
[02:14:20] <wikibugs>	 (PS10) Andrew Bogott: Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751
[02:17:18] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P39179 and previous config saved to /var/cache/conftool/dbconfig/20221111-021717-ladsgroup.json
[02:17:19] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[02:17:22] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[02:17:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P39180 and previous config saved to /var/cache/conftool/dbconfig/20221111-021725-ladsgroup.json
[02:17:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
[02:17:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1147 (T318605)', diff saved to https://phabricator.wikimedia.org/P39181 and previous config saved to /var/cache/conftool/dbconfig/20221111-021738-ladsgroup.json
[02:18:52] <jinxer-wm>	 (JobUnavailable) resolved: (10) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:20:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P39182 and previous config saved to /var/cache/conftool/dbconfig/20221111-022006-marostegui.json
[02:20:58] <wikibugs>	 (CR) CI reject: [V: -1] Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751 (owner: Andrew Bogott)
[02:25:58] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T322618)', diff saved to https://phabricator.wikimedia.org/P39183 and previous config saved to /var/cache/conftool/dbconfig/20221111-022557-ladsgroup.json
[02:25:59] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
[02:26:02] <stashbot>	 T322618: Fix renamed indexes of flaggedrevs_tracking table in production - https://phabricator.wikimedia.org/T322618
[02:26:13] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
[02:26:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2182 (T322618)', diff saved to https://phabricator.wikimedia.org/P39184 and previous config saved to /var/cache/conftool/dbconfig/20221111-022619-ladsgroup.json
[02:28:39] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2182 (T322618)', diff saved to https://phabricator.wikimedia.org/P39185 and previous config saved to /var/cache/conftool/dbconfig/20221111-022838-ladsgroup.json
[02:32:32] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2136 (T318605)', diff saved to https://phabricator.wikimedia.org/P39186 and previous config saved to /var/cache/conftool/dbconfig/20221111-023231-ladsgroup.json
[02:32:33] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
[02:32:36] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[02:32:47] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
[02:32:53] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2137:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P39187 and previous config saved to /var/cache/conftool/dbconfig/20221111-023252-ladsgroup.json
[02:35:08] <wikibugs>	 (PS11) Andrew Bogott: Add cookbook to restart openstack services [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/837751
[02:35:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321123)', diff saved to https://phabricator.wikimedia.org/P39188 and previous config saved to /var/cache/conftool/dbconfig/20221111-023513-marostegui.json
[02:35:15] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
[02:35:18] <stashbot>	 T321123: Drop old index cuc_user_time on cu_changes table for wmf wikis - https://phabricator.wikimedia.org/T321123
[02:35:28] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
[02:35:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2167:3311 (T321123)', diff saved to https://phabricator.wikimedia.org/P39189 and previous config saved to /var/cache/conftool/dbconfig/20221111-023534-marostegui.json
[02:36:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321123)', diff saved to https://phabricator.wikimedia.org/P39190 and previous config saved to /var/cache/conftool/dbconfig/20221111-023643-marostegui.json
[02:43:46] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P39191 and previous config saved to /var/cache/conftool/dbconfig/20221111-024345-ladsgroup.json
[02:51:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P39192 and previous config saved to /var/cache/conftool/dbconfig/20221111-025150-marostegui.json
[02:58:52] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P39193 and previous config saved to /var/cache/conftool/dbconfig/20221111-025851-ladsgroup.json
[03:06:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P39194 and previous config saved to /var/cache/conftool/dbconfig/20221111-030656-marostegui.json
[03:13:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2182 (T322618)', diff saved to https://phabricator.wikimedia.org/P39195 and previous config saved to /var/cache/conftool/dbconfig/20221111-031358-ladsgroup.json
[03:14:03] <stashbot>	 T322618: Fix renamed indexes of flaggedrevs_tracking table in production - https://phabricator.wikimedia.org/T322618
[03:14:49] <icinga-wm>	 PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:15:29] <icinga-wm>	 PROBLEM - mailman archives on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:19:19] <icinga-wm>	 RECOVERY - mailman archives on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 48974 bytes in 0.063 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:22:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321123)', diff saved to https://phabricator.wikimedia.org/P39196 and previous config saved to /var/cache/conftool/dbconfig/20221111-032203-marostegui.json
[03:22:05] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
[03:22:09] <stashbot>	 T321123: Drop old index cuc_user_time on cu_changes table for wmf wikis - https://phabricator.wikimedia.org/T321123
[03:22:18] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
[03:22:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2170:3311 (T321123)', diff saved to https://phabricator.wikimedia.org/P39197 and previous config saved to /var/cache/conftool/dbconfig/20221111-032224-marostegui.json
[03:22:41] <icinga-wm>	 RECOVERY - mailman list info on lists1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8571 bytes in 0.252 second response time https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[03:24:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321123)', diff saved to https://phabricator.wikimedia.org/P39198 and previous config saved to /var/cache/conftool/dbconfig/20221111-032434-marostegui.json
[03:24:44] <wikibugs>	 Puppet, Beta-Cluster-Infrastructure, Infrastructure-Foundations: Beta mwmaint puppet runs fail with "Resource type not found: Profile::Lvs::Classes" - https://phabricator.wikimedia.org/T322901 (Tgr)
[03:39:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P39199 and previous config saved to /var/cache/conftool/dbconfig/20221111-033940-marostegui.json
[03:44:55] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 112 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:46:53] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: (C)100 gt (W)50 gt 1 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:51:12] <wikibugs>	 Puppet, Infrastructure-Foundations, Beta-Cluster-reproducible: Beta mwmaint puppet runs fail with "Resource type not found: Profile::Lvs::Classes" - https://phabricator.wikimedia.org/T322901 (Tgr) `modules/profile/types/lvs/classes.pp` is physically not present on deployment-puppetmaster04. Which wou...
[03:52:47] <tgr_>	 ^ seems to be a production puppet bug, could someone more familiar with the codebase look into it?
[03:54:47] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P39200 and previous config saved to /var/cache/conftool/dbconfig/20221111-035447-marostegui.json
[04:03:03] <jinxer-wm>	 (ProbeDown) firing: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[04:08:03] <jinxer-wm>	 (ProbeDown) resolved: (2) Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip4) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[04:09:54] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321123)', diff saved to https://phabricator.wikimedia.org/P39201 and previous config saved to /var/cache/conftool/dbconfig/20221111-040953-marostegui.json
[04:09:56] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
[04:09:59] <stashbot>	 T321123: Drop old index cuc_user_time on cu_changes table for wmf wikis - https://phabricator.wikimedia.org/T321123
[04:10:09] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
[04:10:11] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
[04:10:24] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
[04:10:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2173 (T321123)', diff saved to https://phabricator.wikimedia.org/P39202 and previous config saved to /var/cache/conftool/dbconfig/20221111-041030-marostegui.json
[04:11:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321123)', diff saved to https://phabricator.wikimedia.org/P39203 and previous config saved to /var/cache/conftool/dbconfig/20221111-041139-marostegui.json
[04:26:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P39204 and previous config saved to /var/cache/conftool/dbconfig/20221111-042646-marostegui.json
[04:31:05] <wikibugs>	 (PS4) Andrea Denisse: netmon: Open LibreNMS port for netmon2002. [puppet] - https://gerrit.wikimedia.org/r/854951 (https://phabricator.wikimedia.org/T315523)
[04:31:45] <wikibugs>	 (CR) CI reject: [V: -1] netmon: Open LibreNMS port for netmon2002. [puppet] - https://gerrit.wikimedia.org/r/854951 (https://phabricator.wikimedia.org/T315523) (owner: Andrea Denisse)
[04:41:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P39205 and previous config saved to /var/cache/conftool/dbconfig/20221111-044152-marostegui.json
[04:44:26] <wikibugs>	 (03PS5) 10Andrea Denisse: netmon: Open LibreNMS port for netmon2002. [puppet] - 10https://gerrit.wikimedia.org/r/854951 (https://phabricator.wikimedia.org/T315523)
[04:46:54] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] netmon: Open LibreNMS port for netmon2002. [puppet] - 10https://gerrit.wikimedia.org/r/854951 (https://phabricator.wikimedia.org/T315523) (owner: 10Andrea Denisse)
[04:56:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321123)', diff saved to https://phabricator.wikimedia.org/P39206 and previous config saved to /var/cache/conftool/dbconfig/20221111-045659-marostegui.json
[04:57:01] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
[04:57:04] <stashbot>	 T321123: Drop old index cuc_user_time on cu_changes table for wmf wikis - https://phabricator.wikimedia.org/T321123
[04:57:14] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
[04:57:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2174 (T321123)', diff saved to https://phabricator.wikimedia.org/P39207 and previous config saved to /var/cache/conftool/dbconfig/20221111-045720-marostegui.json
[04:59:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321123)', diff saved to https://phabricator.wikimedia.org/P39208 and previous config saved to /var/cache/conftool/dbconfig/20221111-045930-marostegui.json
[05:14:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P39209 and previous config saved to /var/cache/conftool/dbconfig/20221111-051436-marostegui.json
[05:29:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P39210 and previous config saved to /var/cache/conftool/dbconfig/20221111-052943-marostegui.json
[05:44:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321123)', diff saved to https://phabricator.wikimedia.org/P39211 and previous config saved to /var/cache/conftool/dbconfig/20221111-054449-marostegui.json
[05:44:51] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
[05:44:55] <stashbot>	 T321123: Drop old index cuc_user_time on cu_changes table for wmf wikis - https://phabricator.wikimedia.org/T321123
[05:45:05] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
[05:45:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2176 (T321123)', diff saved to https://phabricator.wikimedia.org/P39212 and previous config saved to /var/cache/conftool/dbconfig/20221111-054511-marostegui.json
[05:47:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321123)', diff saved to https://phabricator.wikimedia.org/P39213 and previous config saved to /var/cache/conftool/dbconfig/20221111-054720-marostegui.json
[05:56:08] <wikibugs>	 (03CR) 10Vgutierrez: Varnish analytics: support differential privacy (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/824769 (https://phabricator.wikimedia.org/T315676) (owner: 10Isaac Johnson)
[06:02:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P39214 and previous config saved to /var/cache/conftool/dbconfig/20221111-060227-marostegui.json
[06:02:39] <icinga-wm>	 PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:02:57] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 6/7 UP : OSPFv3: 6/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:17:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P39215 and previous config saved to /var/cache/conftool/dbconfig/20221111-061733-marostegui.json
[06:22:02] <vgutierrez>	 !log restart varnish on cp4047 to clear VarnishChildRestarted alert - T322903
[06:22:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:07] <stashbot>	 T322903: oom killed varnish on cp4047 - https://phabricator.wikimedia.org/T322903
[06:32:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321123)', diff saved to https://phabricator.wikimedia.org/P39216 and previous config saved to /var/cache/conftool/dbconfig/20221111-063240-marostegui.json
[06:32:45] <stashbot>	 T321123: Drop old index cuc_user_time on cu_changes table for wmf wikis - https://phabricator.wikimedia.org/T321123
[06:57:44] <_joe_>	 uhm why is it still saying XioNoX even if it's me?
[06:58:13] <_joe_>	 because sirenbot crashed I guess
[06:59:44] <_joe_>	 ah no, it reconnected and wasn't automatically made operator like it should, sigh
[07:11:33] <wikibugs>	 10SRE, 10Traffic: oom killed varnish on cp4047 - https://phabricator.wikimedia.org/T322903 (10Vgutierrez) Free memory on NUMA Node 0 got below the min threshold (1028416 < 1041448): `Node 0 Normal free:1028416kB min:1041448kB low:1303560kB high:1565672kB reserved_highatomic:2048KB active_anon:1800292kB inactiv...
[07:12:25] <wikibugs>	 10SRE, 10Traffic: oom killed varnish on cp4047 - https://phabricator.wikimedia.org/T322903 (10Vgutierrez)
[07:21:49] <icinga-wm>	 RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:22:07] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 7/7 UP : OSPFv3: 7/7 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:50:30] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T318605)', diff saved to https://phabricator.wikimedia.org/P39217 and previous config saved to /var/cache/conftool/dbconfig/20221111-075028-ladsgroup.json
[07:50:34] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[08:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20221111T0800)
[08:05:37] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P39218 and previous config saved to /var/cache/conftool/dbconfig/20221111-080536-ladsgroup.json
[08:09:27] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1020.eqiad.wmnet with reason: Remove from cluster for eventual reimage
[08:09:43] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1020.eqiad.wmnet with reason: Remove from cluster for eventual reimage
[08:14:14] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reimage for host ganeti1020.eqiad.wmnet with OS bullseye
[08:14:19] <wikibugs>	 10SRE, 10Ganeti, 10Infrastructure-Foundations: Upgrade ganeti/eqiad to Bullseye - https://phabricator.wikimedia.org/T311687 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti1020.eqiad.wmnet with OS bullseye
[08:20:43] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P39219 and previous config saved to /var/cache/conftool/dbconfig/20221111-082042-ladsgroup.json
[08:28:25] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1020.eqiad.wmnet with reason: host reimage
[08:32:47] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1020.eqiad.wmnet with reason: host reimage
[08:35:50] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1147 (T318605)', diff saved to https://phabricator.wikimedia.org/P39220 and previous config saved to /var/cache/conftool/dbconfig/20221111-083549-ladsgroup.json
[08:35:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
[08:35:54] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[08:36:04] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
[08:36:11] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1148 (T318605)', diff saved to https://phabricator.wikimedia.org/P39221 and previous config saved to /var/cache/conftool/dbconfig/20221111-083611-ladsgroup.json
[08:39:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P39222 and previous config saved to /var/cache/conftool/dbconfig/20221111-083922-ladsgroup.json
[08:49:14] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1020.eqiad.wmnet with OS bullseye
[08:49:19] <wikibugs>	 10SRE, 10Ganeti, 10Infrastructure-Foundations: Upgrade ganeti/eqiad to Bullseye - https://phabricator.wikimedia.org/T311687 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti1020.eqiad.wmnet with OS bullseye completed: - ganeti1020 (**PASS**)   - Downtimed on...
[08:54:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P39223 and previous config saved to /var/cache/conftool/dbconfig/20221111-085428-ladsgroup.json
[08:55:51] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
[09:01:37] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
[09:02:01] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
[09:02:12] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[09:02:25] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
[09:03:45] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
[09:03:46] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2113.codfw.wmnet with reason: Maintenance
[09:04:00] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2113.codfw.wmnet with reason: Maintenance
[09:06:02] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.ganeti.addnode for new host ganeti1020.eqiad.wmnet to cluster eqiad and group D
[09:06:28] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[09:06:52] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1130.eqiad.wmnet with reason: Maintenance
[09:07:04] <logmsgbot>	 !log jmm@cumin2002 END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1020.eqiad.wmnet to cluster eqiad and group D
[09:08:26] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[09:08:40] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1098.eqiad.wmnet with reason: Maintenance
[09:08:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1098:3317 (T321130)', diff saved to https://phabricator.wikimedia.org/P39224 and previous config saved to /var/cache/conftool/dbconfig/20221111-090846-marostegui.json
[09:08:50] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[09:09:35] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P39225 and previous config saved to /var/cache/conftool/dbconfig/20221111-090935-ladsgroup.json
[09:10:17] <wikibugs>	 10Puppet, 10SRE, 10Infrastructure-Foundations, 10Traffic-Icebox, and 2 others: Fix rule violation in the lvs balancer role - https://phabricator.wikimedia.org/T264132 (10jbond) it would be useful to understand why theses changes where reverted to avoid issues in the future.
[09:15:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T321130)', diff saved to https://phabricator.wikimedia.org/P39226 and previous config saved to /var/cache/conftool/dbconfig/20221111-091514-marostegui.json
[09:15:19] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[09:16:50] <wikibugs>	 (03PS1) 10Marostegui: add_cul_actor_T321126.py: New schema change [software/schema-changes] - 10https://gerrit.wikimedia.org/r/855959 (https://phabricator.wikimedia.org/T321126)
[09:24:42] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P39227 and previous config saved to /var/cache/conftool/dbconfig/20221111-092441-ladsgroup.json
[09:24:43] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
[09:24:46] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[09:24:57] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
[09:25:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2138:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P39228 and previous config saved to /var/cache/conftool/dbconfig/20221111-092503-ladsgroup.json
[09:30:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P39229 and previous config saved to /var/cache/conftool/dbconfig/20221111-093020-marostegui.json
[09:32:04] <wikibugs>	 (03PS1) 10JMeybohm: Update to v1.23.14 [debs/kubernetes] (v1.23) - 10https://gerrit.wikimedia.org/r/855961 (https://phabricator.wikimedia.org/T307943)
[09:34:01] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Update to v1.23.14 [debs/kubernetes] (v1.23) - 10https://gerrit.wikimedia.org/r/855961 (https://phabricator.wikimedia.org/T307943) (owner: 10JMeybohm)
[09:35:15] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
[09:39:47] <wikibugs>	 (03PS1) 10Vgutierrez: hieradata: unify ulsfo definitions [puppet] - 10https://gerrit.wikimedia.org/r/855962 (https://phabricator.wikimedia.org/T317244)
[09:40:10] <wikibugs>	 (03PS2) 10Vgutierrez: hieradata: unify cp@ulsfo definitions [puppet] - 10https://gerrit.wikimedia.org/r/855962 (https://phabricator.wikimedia.org/T317244)
[09:40:53] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] hieradata: unify cp@ulsfo definitions [puppet] - 10https://gerrit.wikimedia.org/r/855962 (https://phabricator.wikimedia.org/T317244) (owner: 10Vgutierrez)
[09:41:40] <wikibugs>	 (03Abandoned) 10Phuedx: wgWMESchemaEditAttemptStepSamplingRate to 1 everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/854006 (https://phabricator.wikimedia.org/T312016) (owner: 10Phuedx)
[09:43:06] <wikibugs>	 (03PS3) 10Vgutierrez: hieradata: unify cp@ulsfo definitions [puppet] - 10https://gerrit.wikimedia.org/r/855962 (https://phabricator.wikimedia.org/T317244)
[09:45:25] <wikibugs>	 10SRE, 10ops-codfw, 10Discovery-Search (Current work): Degraded RAID on elastic2052 - https://phabricator.wikimedia.org/T320482 (10Gehel) >>! In T320482#8385142, @RKemper wrote: > @Papaul Yup per jbond's comment above we're still seeing the RAID issue. Could we try either rebuilding raid with the current dis...
[09:45:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P39230 and previous config saved to /var/cache/conftool/dbconfig/20221111-094526-marostegui.json
[09:45:56] <logmsgbot>	 !log aborrero@cumin2002 START - Cookbook sre.hosts.reimage for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
[09:46:07] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review, 10cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudvirt2002-dev.codfw.wmnet wi...
[09:47:24] <wikibugs>	 (03PS4) 10Vgutierrez: hieradata: unify cp@ulsfo definitions [puppet] - 10https://gerrit.wikimedia.org/r/855962 (https://phabricator.wikimedia.org/T317244)
[09:53:31] <wikibugs>	 (03CR) 10Vgutierrez: [V: 03+1] "PCC SUCCESS (NOOP 15): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38103/console" [puppet] - 10https://gerrit.wikimedia.org/r/855962 (https://phabricator.wikimedia.org/T317244) (owner: 10Vgutierrez)
[09:54:22] <wikibugs>	 (03CR) 10Vgutierrez: [V: 03+1 C: 03+2] hieradata: unify cp@ulsfo definitions [puppet] - 10https://gerrit.wikimedia.org/r/855962 (https://phabricator.wikimedia.org/T317244) (owner: 10Vgutierrez)
[09:54:44] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
[09:55:13] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
[09:57:20] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] wmcs: cleanup SAL log messages (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/855650 (owner: 10Arturo Borrero Gonzalez)
[10:00:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T321130)', diff saved to https://phabricator.wikimedia.org/P39231 and previous config saved to /var/cache/conftool/dbconfig/20221111-100033-marostegui.json
[10:00:35] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[10:00:38] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[10:00:44] <wikibugs>	 (03Merged) 10jenkins-bot: wmcs: cleanup SAL log messages [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/855650 (owner: 10Arturo Borrero Gonzalez)
[10:00:48] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1101.eqiad.wmnet with reason: Maintenance
[10:00:55] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1101:3317 (T321130)', diff saved to https://phabricator.wikimedia.org/P39232 and previous config saved to /var/cache/conftool/dbconfig/20221111-100054-marostegui.json
[10:00:59] <wikibugs>	 (03PS1) 10Vgutierrez: hieradata: clean up unused esams role cache::(text|upload) definitions [puppet] - 10https://gerrit.wikimedia.org/r/855964
[10:01:23] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[10:01:52] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] hieradata: clean up unused esams role cache::(text|upload) definitions [puppet] - 10https://gerrit.wikimedia.org/r/855964 (owner: 10Vgutierrez)
[10:07:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T321130)', diff saved to https://phabricator.wikimedia.org/P39233 and previous config saved to /var/cache/conftool/dbconfig/20221111-100725-marostegui.json
[10:07:29] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[10:09:00] <wikibugs>	 (03PS1) 10Vgutierrez: varnish: Increase reserved memory to 120G in upload@ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/855965 (https://phabricator.wikimedia.org/T322903)
[10:12:01] <wikibugs>	 (03CR) 10Vgutierrez: "Please see https://phabricator.wikimedia.org/T322903" [puppet] - 10https://gerrit.wikimedia.org/r/849633 (owner: 10BBlack)
[10:13:25] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] wmcs: cleanup SAL log messages (031 comment) [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/855650 (owner: 10Arturo Borrero Gonzalez)
[10:14:50] <wikibugs>	 (03CR) 10Vgutierrez: [V: 03+1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38104/console" [puppet] - 10https://gerrit.wikimedia.org/r/855965 (https://phabricator.wikimedia.org/T322903) (owner: 10Vgutierrez)
[10:15:34] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
[10:18:57] <logmsgbot>	 !log aborrero@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
[10:22:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P39234 and previous config saved to /var/cache/conftool/dbconfig/20221111-102231-marostegui.json
[10:22:47] <logmsgbot>	 !log aborrero@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
[10:26:18] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: Add rake task to perform basic conversions [deployment-charts] - 10https://gerrit.wikimedia.org/r/855668
[10:29:22] <icinga-wm>	 RECOVERY - Host lvs1014.mgmt is UP: PING OK - Packet loss = 0%, RTA = 2.87 ms
[10:35:16] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: cloudvirt200[123]-dev: use standard partman recipes for raid1 on 2 devices [puppet] - 10https://gerrit.wikimedia.org/r/855966 (https://phabricator.wikimedia.org/T322911)
[10:36:31] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: "@andrew this is for you to consider. It is 100% untested." [puppet] - 10https://gerrit.wikimedia.org/r/855966 (https://phabricator.wikimedia.org/T322911) (owner: 10Arturo Borrero Gonzalez)
[10:37:01] <wikibugs>	 (03PS1) 10Elukey: istio: change configs to adapt for 1.15.3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/855967 (https://phabricator.wikimedia.org/T322193)
[10:37:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P39235 and previous config saved to /var/cache/conftool/dbconfig/20221111-103738-marostegui.json
[10:37:54] <icinga-wm>	 PROBLEM - IPMI Sensor Status on restbase1018 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures
[10:40:16] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: cloudvirt200[123]-dev: use standard partman recipes for raid1 on 2 devices [puppet] - 10https://gerrit.wikimedia.org/r/855966 (https://phabricator.wikimedia.org/T322911)
[10:40:43] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: oom killed varnish on cp4047 - https://phabricator.wikimedia.org/T322903 (10Vgutierrez) After further inspection I don't think that ATS memory increase is enough to explain what we are seeing here, text nodes in ulsfo are using around 326G of RAM but upload ones are usin...
[10:42:03] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] istio: change configs to adapt for 1.15.3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/855967 (https://phabricator.wikimedia.org/T322193) (owner: 10Elukey)
[10:42:54] <wikibugs>	 (03PS3) 10Arturo Borrero Gonzalez: cloudvirt200[123]-dev: use standard partman recipes for raid1 on 2 devices [puppet] - 10https://gerrit.wikimedia.org/r/855966 (https://phabricator.wikimedia.org/T322911)
[10:45:13] <wikibugs>	 (03CR) 10JMeybohm: "I would suggest to add a comment to "deleting" spec.strategy (maybe even linking to https://istio.io/latest/docs/reference/config/istio.op" [deployment-charts] - 10https://gerrit.wikimedia.org/r/855967 (https://phabricator.wikimedia.org/T322193) (owner: 10Elukey)
[10:48:54] <wikibugs>	 (03PS1) 10JMeybohm: k8s: Stop docker/runc spam from being written to syslog [puppet] - 10https://gerrit.wikimedia.org/r/855969 (https://phabricator.wikimedia.org/T307943)
[10:49:20] <wikibugs>	 (03PS2) 10JMeybohm: k8s: Stop docker/runc spam from being written to syslog [puppet] - 10https://gerrit.wikimedia.org/r/855969 (https://phabricator.wikimedia.org/T307943)
[10:52:31] <logmsgbot>	 !log aborrero@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
[10:52:40] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review, 10cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudvirt2002-dev.codfw.wmnet with O...
[10:52:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T321130)', diff saved to https://phabricator.wikimedia.org/P39236 and previous config saved to /var/cache/conftool/dbconfig/20221111-105244-marostegui.json
[10:52:46] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1127.eqiad.wmnet with reason: Maintenance
[10:52:50] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[10:52:59] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1127.eqiad.wmnet with reason: Maintenance
[10:53:06] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1127 (T321130)', diff saved to https://phabricator.wikimedia.org/P39237 and previous config saved to /var/cache/conftool/dbconfig/20221111-105305-marostegui.json
[10:53:45] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1] "PCC SUCCESS (DIFF 11): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38105/console" [puppet] - 10https://gerrit.wikimedia.org/r/855969 (https://phabricator.wikimedia.org/T307943) (owner: 10JMeybohm)
[10:54:38] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: prometheus: drop cloudvirt ceph metrics generator [puppet] - 10https://gerrit.wikimedia.org/r/855970 (https://phabricator.wikimedia.org/T271096)
[10:56:47] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] prometheus: drop cloudvirt ceph metrics generator [puppet] - 10https://gerrit.wikimedia.org/r/855970 (https://phabricator.wikimedia.org/T271096) (owner: 10Arturo Borrero Gonzalez)
[10:56:49] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: oom killed varnish on cp4047 - https://phabricator.wikimedia.org/T322903 (10Vgutierrez) In fact it seems like varnish is the one eating the extra memory... in cp4045 (upload) with the following malloc specific config: `-s malloc,283G -s Transient=malloc,10G` varnish is c...
[10:59:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127 (T321130)', diff saved to https://phabricator.wikimedia.org/P39238 and previous config saved to /var/cache/conftool/dbconfig/20221111-105918-marostegui.json
[10:59:23] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[10:59:39] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: prometheus: drop cloudvirt ceph metrics generator [puppet] - 10https://gerrit.wikimedia.org/r/855970 (https://phabricator.wikimedia.org/T271096)
[11:02:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[11:03:07] <moritzm>	 !log installing wireshark security updates
[11:03:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-codfw&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DLogstashKafkaConsumerLag
[11:08:18] <wikibugs>	 (03PS2) 10Elukey: istio: change configs to adapt for 1.15.3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/855967 (https://phabricator.wikimedia.org/T322193)
[11:14:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P39239 and previous config saved to /var/cache/conftool/dbconfig/20221111-111424-marostegui.json
[11:18:24] <wikibugs>	 (03PS1) 10Muehlenhoff: Add ganeti1033 [puppet] - 10https://gerrit.wikimedia.org/r/855973 (https://phabricator.wikimedia.org/T314303)
[11:20:37] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Add ganeti1033 [puppet] - 10https://gerrit.wikimedia.org/r/855973 (https://phabricator.wikimedia.org/T314303) (owner: 10Muehlenhoff)
[11:25:58] <wikibugs>	 (03CR) 10Hnowlan: Decode poolcounter messages, fix 429 error (033 comments) [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/855033 (https://phabricator.wikimedia.org/T312104) (owner: 10Hnowlan)
[11:28:43] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [V: 03+1] "PCC as expected: https://puppet-compiler.wmflabs.org/pcc-worker1003/38108/" [puppet] - 10https://gerrit.wikimedia.org/r/855970 (https://phabricator.wikimedia.org/T271096) (owner: 10Arturo Borrero Gonzalez)
[11:29:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P39240 and previous config saved to /var/cache/conftool/dbconfig/20221111-112931-marostegui.json
[11:40:09] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/855966 (https://phabricator.wikimedia.org/T322911) (owner: 10Arturo Borrero Gonzalez)
[11:40:26] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloudvirt200[123]-dev: use standard partman recipes for raid1 on 2 devices [puppet] - 10https://gerrit.wikimedia.org/r/855966 (https://phabricator.wikimedia.org/T322911) (owner: 10Arturo Borrero Gonzalez)
[11:41:53] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloudvirt2002-dev: move to a single NIC setup [puppet] - 10https://gerrit.wikimedia.org/r/855042 (https://phabricator.wikimedia.org/T319184) (owner: 10Arturo Borrero Gonzalez)
[11:42:36] <logmsgbot>	 !log aborrero@cumin2002 START - Cookbook sre.hosts.reimage for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
[11:42:45] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudvirt2002-dev.codfw.wmnet wi...
[11:44:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1127 (T321130)', diff saved to https://phabricator.wikimedia.org/P39241 and previous config saved to /var/cache/conftool/dbconfig/20221111-114437-marostegui.json
[11:44:39] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[11:44:43] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[11:44:53] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1136.eqiad.wmnet with reason: Maintenance
[11:44:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1136 (T321130)', diff saved to https://phabricator.wikimedia.org/P39242 and previous config saved to /var/cache/conftool/dbconfig/20221111-114458-marostegui.json
[11:45:01] <wikibugs>	 (PS3) Hnowlan: Decode poolcounter messages, fix 429 error [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/855033 (https://phabricator.wikimedia.org/T312104)
[11:45:14] <wikibugs>	 (PS1) Muehlenhoff: Retire raid1-lvm-xfs-nova.cfg [puppet] - https://gerrit.wikimedia.org/r/855975 (https://phabricator.wikimedia.org/T156955)
[11:45:40] <wikibugs>	 (CR) Btullis: istio: change configs to adapt for 1.15.3 (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/855967 (https://phabricator.wikimedia.org/T322193) (owner: Elukey)
[11:47:13] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136 (T321130)', diff saved to https://phabricator.wikimedia.org/P39243 and previous config saved to /var/cache/conftool/dbconfig/20221111-114712-marostegui.json
[11:51:27] <logmsgbot>	 !log aborrero@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
[11:51:30] <wikibugs>	 (CR) Gmodena: Varnish analytics: support differential privacy (1 comment) [puppet] - https://gerrit.wikimedia.org/r/824769 (https://phabricator.wikimedia.org/T315676) (owner: Isaac Johnson)
[11:51:36] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudvirt2002-dev.codfw.wmnet with O...
[11:52:53] <wikibugs>	 (CR) Vlad.shapik: [C: +1] "Looks good to me." [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/855033 (https://phabricator.wikimedia.org/T312104) (owner: Hnowlan)
[11:53:58] <logmsgbot>	 !log aborrero@cumin2002 START - Cookbook sre.hosts.reimage for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
[11:54:08] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudvirt2002-dev.codfw.wmnet wi...
[11:58:55] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM. Andrew might know or have opinions on this." [puppet] - https://gerrit.wikimedia.org/r/855975 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[11:59:20] <wikibugs>	 (CR) Hnowlan: [C: +2] Decode poolcounter messages, fix 429 error [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/855033 (https://phabricator.wikimedia.org/T312104) (owner: Hnowlan)
[12:00:35] <wikibugs>	 (CR) Vgutierrez: Varnish analytics: support differential privacy (1 comment) [puppet] - https://gerrit.wikimedia.org/r/824769 (https://phabricator.wikimedia.org/T315676) (owner: Isaac Johnson)
[12:02:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P39244 and previous config saved to /var/cache/conftool/dbconfig/20221111-120219-marostegui.json
[12:04:15] <wikibugs>	 (Merged) jenkins-bot: Decode poolcounter messages, fix 429 error [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/855033 (https://phabricator.wikimedia.org/T312104) (owner: Hnowlan)
[12:10:26] <logmsgbot>	 !log aborrero@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
[12:13:37] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
[12:14:06] <logmsgbot>	 !log aborrero@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
[12:17:26] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P39245 and previous config saved to /var/cache/conftool/dbconfig/20221111-121725-marostegui.json
[12:19:02] <wikibugs>	 (PS1) Hnowlan: thumbor: bump version number [deployment-charts] - https://gerrit.wikimedia.org/r/855977 (https://phabricator.wikimedia.org/T233196)
[12:27:47] <wikibugs>	 (CR) Hnowlan: [C: +2] thumbor: bump version number [deployment-charts] - https://gerrit.wikimedia.org/r/855977 (https://phabricator.wikimedia.org/T233196) (owner: Hnowlan)
[12:30:52] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Buster 10.13 point update - https://phabricator.wikimedia.org/T317413 (MoritzMuehlenhoff)
[12:32:20] <wikibugs>	 (Merged) jenkins-bot: thumbor: bump version number [deployment-charts] - https://gerrit.wikimedia.org/r/855977 (https://phabricator.wikimedia.org/T233196) (owner: Hnowlan)
[12:32:21] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Buster 10.13 point update - https://phabricator.wikimedia.org/T317413 (MoritzMuehlenhoff)
[12:32:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1136 (T321130)', diff saved to https://phabricator.wikimedia.org/P39246 and previous config saved to /var/cache/conftool/dbconfig/20221111-123232-marostegui.json
[12:32:34] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[12:32:39] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[12:32:48] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Maintenance
[12:32:49] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[12:33:05] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
[12:33:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1158 (T321130)', diff saved to https://phabricator.wikimedia.org/P39247 and previous config saved to /var/cache/conftool/dbconfig/20221111-123310-marostegui.json
[12:34:02] <logmsgbot>	 !log jmm@cumin2002 END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ganeti1033.eqiad.wmnet
[12:35:06] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [staging] START helmfile.d/services/thumbor: sync
[12:35:25] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158 (T321130)', diff saved to https://phabricator.wikimedia.org/P39248 and previous config saved to /var/cache/conftool/dbconfig/20221111-123524-marostegui.json
[12:35:55] <logmsgbot>	 !log hnowlan@deploy1002 helmfile [staging] DONE helmfile.d/services/thumbor: sync
[12:37:40] <logmsgbot>	 !log aborrero@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
[12:37:51] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudvirt2002-dev.codfw.wmnet with O...
[12:42:23] <moritzm>	 !log installing debootstrap bugfix updates from buster point release
[12:42:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:50:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P39249 and previous config saved to /var/cache/conftool/dbconfig/20221111-125030-marostegui.json
[12:55:19] <logmsgbot>	 !log jnuche@deploy1002 Started scap: (no justification provided)
[12:58:24] <wikibugs>	 (PS1) QChris: Add .gitreview [debs/varnish-modules] - https://gerrit.wikimedia.org/r/855979
[12:58:26] <wikibugs>	 (CR) QChris: [V: +2 C: +2] Add .gitreview [debs/varnish-modules] - https://gerrit.wikimedia.org/r/855979 (owner: QChris)
[13:01:30] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-debug: apply
[13:01:30] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] START helmfile.d/services/mw-debug: apply
[13:03:18] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Buster 10.13 point update - https://phabricator.wikimedia.org/T317413 (MoritzMuehlenhoff)
[13:05:22] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
[13:05:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P39251 and previous config saved to /var/cache/conftool/dbconfig/20221111-130537-marostegui.json
[13:05:41] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
[13:06:00] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
[13:06:00] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
[13:06:00] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
[13:06:00] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] START helmfile.d/services/mw-web: apply
[13:06:00] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
[13:06:00] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
[13:06:00] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
[13:06:01] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
[13:07:56] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
[13:08:01] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
[13:08:01] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
[13:08:04] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
[13:10:07] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
[13:10:35] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
[13:12:58] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] START helmfile.d/services/mw-web: apply
[13:12:58] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
[13:12:58] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
[13:12:58] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] START helmfile.d/services/mw-api-int: apply
[13:12:58] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
[13:12:58] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
[13:12:58] <logmsgbot>	 !log jnuche@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
[13:12:59] <logmsgbot>	 !log jnuche@deploy1002 helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
[13:13:08] <logmsgbot>	 !log jnuche@deploy1002 sync-world aborted: (no justification provided) (duration: 17m 49s)
[13:17:02] <jnuche>	 ^ please disregard, that was some testing related to scap-based K8s deployments
[13:18:11] <wikibugs>	 SRE, Infrastructure-Foundations, netops, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero)
[13:20:20] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
[13:20:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1158 (T321130)', diff saved to https://phabricator.wikimedia.org/P39252 and previous config saved to /var/cache/conftool/dbconfig/20221111-132043-marostegui.json
[13:20:46] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[13:20:48] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[13:20:59] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
[13:21:05] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1170:3317 (T321130)', diff saved to https://phabricator.wikimedia.org/P39253 and previous config saved to /var/cache/conftool/dbconfig/20221111-132105-marostegui.json
[13:21:48] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Buster 10.13 point update - https://phabricator.wikimedia.org/T317413 (MoritzMuehlenhoff)
[13:27:15] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T321130)', diff saved to https://phabricator.wikimedia.org/P39254 and previous config saved to /var/cache/conftool/dbconfig/20221111-132714-marostegui.json
[13:27:19] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[13:27:45] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] cloudvirt2003-dev: move to a single NIC setup [puppet] - https://gerrit.wikimedia.org/r/855043 (https://phabricator.wikimedia.org/T319184) (owner: Arturo Borrero Gonzalez)
[13:30:29] <logmsgbot>	 !log aborrero@cumin2002 START - Cookbook sre.hosts.reimage for host cloudvirt2003-dev.codfw.wmnet with OS bullseye
[13:30:39] <wikibugs>	 SRE, Infrastructure-Foundations, netops, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudvirt2003-dev.codfw.wmnet with OS bullseye
[13:30:54] <moritzm>	 !log installing procmail security updates
[13:30:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:57] <wikibugs>	 (CR) Hokwelum: [C: +1] "Thanks for the update, Dan. WANSecurity is not currently an active mirror, which is why the ipv4 entry still has "wikimedia.wansec.com."" [puppet] - https://gerrit.wikimedia.org/r/855096 (owner: Dzahn)
[13:37:14] <wikibugs>	 (PS1) Marostegui: site.pp: Fix db1206's owner [puppet] - https://gerrit.wikimedia.org/r/855982
[13:37:37] <wikibugs>	 (CR) Muehlenhoff: [C: +1] site.pp: Fix db1206's owner [puppet] - https://gerrit.wikimedia.org/r/855982 (owner: Marostegui)
[13:37:53] <wikibugs>	 (CR) Marostegui: [C: +2] site.pp: Fix db1206's owner [puppet] - https://gerrit.wikimedia.org/r/855982 (owner: Marostegui)
[13:42:00] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
[13:42:09] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] START helmfile.d/services/mw-web: apply
[13:42:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P39255 and previous config saved to /var/cache/conftool/dbconfig/20221111-134221-marostegui.json
[13:45:33] <logmsgbot>	 !log oblivian@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
[13:47:03] <logmsgbot>	 !log aborrero@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: host reimage
[13:49:52] <logmsgbot>	 !log aborrero@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: host reimage
[13:50:02] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] START helmfile.d/services/mw-web: apply
[13:51:33] <logmsgbot>	 !log oblivian@deploy1002 helmfile [codfw] DONE helmfile.d/services/mw-web: apply
[13:53:54] <icinga-wm>	 PROBLEM - DPKG on netmon1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[13:55:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148 (T318605)', diff saved to https://phabricator.wikimedia.org/P39256 and previous config saved to /var/cache/conftool/dbconfig/20221111-135506-ladsgroup.json
[13:55:13] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[13:57:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P39257 and previous config saved to /var/cache/conftool/dbconfig/20221111-135727-marostegui.json
[14:01:38] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[14:10:13] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P39258 and previous config saved to /var/cache/conftool/dbconfig/20221111-141012-ladsgroup.json
[14:10:40] <wikibugs>	 (PS1) Marostegui: control-mariadb-client-10.4-bullseye: Version change [software] - https://gerrit.wikimedia.org/r/855985 (https://phabricator.wikimedia.org/T322620)
[14:11:19] <wikibugs>	 (CR) Marostegui: [C: +2] control-mariadb-client-10.4-bullseye: Version change [software] - https://gerrit.wikimedia.org/r/855985 (https://phabricator.wikimedia.org/T322620) (owner: Marostegui)
[14:11:56] <wikibugs>	 (Merged) jenkins-bot: control-mariadb-client-10.4-bullseye: Version change [software] - https://gerrit.wikimedia.org/r/855985 (https://phabricator.wikimedia.org/T322620) (owner: Marostegui)
[14:12:30] <logmsgbot>	 !log aborrero@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2003-dev.codfw.wmnet with OS bullseye
[14:12:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T321130)', diff saved to https://phabricator.wikimedia.org/P39259 and previous config saved to /var/cache/conftool/dbconfig/20221111-141233-marostegui.json
[14:12:36] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[14:12:37] <wikibugs>	 SRE, Infrastructure-Foundations, netops, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudvirt2003-dev.codfw.wmnet with OS bullseye completed:...
[14:12:38] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[14:13:00] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
[14:17:02] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[14:17:16] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Maintenance
[14:17:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1174 (T321130)', diff saved to https://phabricator.wikimedia.org/P39260 and previous config saved to /var/cache/conftool/dbconfig/20221111-141721-marostegui.json
[14:19:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T321130)', diff saved to https://phabricator.wikimedia.org/P39261 and previous config saved to /var/cache/conftool/dbconfig/20221111-141935-marostegui.json
[14:19:40] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[14:24:31] <wikibugs>	 (CR) Elukey: istio: change configs to adapt for 1.15.3 (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/855967 (https://phabricator.wikimedia.org/T322193) (owner: Elukey)
[14:25:19] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P39262 and previous config saved to /var/cache/conftool/dbconfig/20221111-142519-ladsgroup.json
[14:31:08] <wikibugs>	 (CR) Elukey: [C: +1] "I am very ignorant about the new _helpers templates but afaics it looks good :)" [deployment-charts] - https://gerrit.wikimedia.org/r/855667 (owner: Giuseppe Lavagetto)
[14:32:27] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/855720 (https://phabricator.wikimedia.org/T135991) (owner: Dzahn)
[14:34:42] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P39263 and previous config saved to /var/cache/conftool/dbconfig/20221111-143441-marostegui.json
[14:35:48] <wikibugs>	 (PS1) Ssingh: Depool ulsfo for resolving varnish issues [dns] - https://gerrit.wikimedia.org/r/855987 (https://phabricator.wikimedia.org/T322903)
[14:39:09] <wikibugs>	 (CR) Ssingh: "Emergency patch, do not merge." [dns] - https://gerrit.wikimedia.org/r/855987 (https://phabricator.wikimedia.org/T322903) (owner: Ssingh)
[14:40:26] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1148 (T318605)', diff saved to https://phabricator.wikimedia.org/P39264 and previous config saved to /var/cache/conftool/dbconfig/20221111-144025-ladsgroup.json
[14:40:27] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
[14:40:31] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[14:40:41] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
[14:40:47] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1149 (T318605)', diff saved to https://phabricator.wikimedia.org/P39265 and previous config saved to /var/cache/conftool/dbconfig/20221111-144047-ladsgroup.json
[14:47:04] <wikibugs>	 SRE-tools, DBA, Data-Persistence-Backup, Infrastructure-Foundations, User-Kormat: Revert workaround for cumin output verbosity on RemoteExecution (CuminExecution) abstraction - https://phabricator.wikimedia.org/T282775 (Marostegui) What's the status of this task?
[14:49:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P39266 and previous config saved to /var/cache/conftool/dbconfig/20221111-144948-marostegui.json
[14:55:17] <wikibugs>	 SRE-tools, DBA, Data-Persistence-Backup, Infrastructure-Foundations, User-Kormat: Revert workaround for cumin output verbosity on RemoteExecution (CuminExecution) abstraction - https://phabricator.wikimedia.org/T282775 (jcrespo) I coded RemoteExecution initially for the backup library. But I...
[14:57:10] <wikibugs>	 SRE-tools, DBA, Data-Persistence-Backup, Infrastructure-Foundations, User-Kormat: Revert workaround for cumin output verbosity on RemoteExecution (CuminExecution) abstraction - https://phabricator.wikimedia.org/T282775 (Marostegui) Thanks for the update, so it is still a valid task :)
[15:04:55] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1174 (T321130)', diff saved to https://phabricator.wikimedia.org/P39267 and previous config saved to /var/cache/conftool/dbconfig/20221111-150454-marostegui.json
[15:04:56] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1191.eqiad.wmnet with reason: Maintenance
[15:05:00] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[15:05:10] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Maintenance
[15:05:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1191 (T321130)', diff saved to https://phabricator.wikimedia.org/P39268 and previous config saved to /var/cache/conftool/dbconfig/20221111-150516-marostegui.json
[15:08:27] <wikibugs>	 SRE, Infrastructure-Foundations, netops, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero)
[15:13:00] <wikibugs>	 SRE-swift-storage, Beta-Cluster-Infrastructure, MediaWiki-extensions-Phonos, Community-Tech (CommTech-Sprint-36), and 2 others: Phonos links to an unauthorized URL - https://phabricator.wikimedia.org/T317417 (TheresNoTime)
[15:18:00] <wikibugs>	 SRE-swift-storage, Beta-Cluster-Infrastructure, MediaWiki-extensions-Phonos, Community-Tech (CommTech-Sprint-36), and 2 others: Phonos links to an unauthorized URL - https://phabricator.wikimedia.org/T317417 (TheresNoTime) Open→Stalled [[ https://gerrit.wikimedia.org/r/c/operations/puppet...
[15:20:31] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Buster 10.13 point update - https://phabricator.wikimedia.org/T317413 (MoritzMuehlenhoff)
[15:21:09] <moritzm>	 !log installing node-end-of-stream security updates
[15:21:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:30:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P39269 and previous config saved to /var/cache/conftool/dbconfig/20221111-153009-ladsgroup.json
[15:30:15] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[15:33:47] <wikibugs>	 (PS1) Vgutierrez: varnish: Disable THP for varnish on upload@ulsfo [puppet] - https://gerrit.wikimedia.org/r/855992 (https://phabricator.wikimedia.org/T322903)
[15:39:57] <wikibugs>	 (PS1) Ssingh: site: update role for cp4052 [puppet] - https://gerrit.wikimedia.org/r/855993
[15:41:04] <wikibugs>	 (PS2) Vgutierrez: varnish: Disable THP for varnish on cp404[5-8] [puppet] - https://gerrit.wikimedia.org/r/855992 (https://phabricator.wikimedia.org/T322903)
[15:42:22] <wikibugs>	 (CR) Ssingh: [C: +2] site: update role for cp4052 [puppet] - https://gerrit.wikimedia.org/r/855993 (owner: Ssingh)
[15:43:39] <wikibugs>	 (CR) Vgutierrez: [V: +1] "PCC SUCCESS (DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38109/console" [puppet] - https://gerrit.wikimedia.org/r/855992 (https://phabricator.wikimedia.org/T322903) (owner: Vgutierrez)
[15:43:52] <logmsgbot>	 !log sukhe@cumin2002 START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
[15:44:41] <wikibugs>	 (PS1) AikoChou: ml-services: update outlink's model binary and docker image [deployment-charts] - https://gerrit.wikimedia.org/r/855995 (https://phabricator.wikimedia.org/T322881)
[15:45:16] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P39270 and previous config saved to /var/cache/conftool/dbconfig/20221111-154515-ladsgroup.json
[15:49:21] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www-7.4.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method
[15:49:52] <wikibugs>	 (CR) Elukey: [C: +2] ml-services: update outlink's model binary and docker image [deployment-charts] - https://gerrit.wikimedia.org/r/855995 (https://phabricator.wikimedia.org/T322881) (owner: AikoChou)
[15:50:21] <wikibugs>	 SRE, Traffic, Patch-For-Review: oom killed varnish on cp4047 - https://phabricator.wikimedia.org/T322903 (Vgutierrez)
[15:51:21] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=POST
[15:51:52] <wikibugs>	 (PS1) Ssingh: Release 0.15.0-2 [debs/varnish-modules] - https://gerrit.wikimedia.org/r/855996 (https://phabricator.wikimedia.org/T321309)
[15:53:35] <wikibugs>	 (CR) Ssingh: "No debian-glue yet but patch submitted for that: Idca43d2bc23c38bd664cdab298dda6541b674c45" [debs/varnish-modules] - https://gerrit.wikimedia.org/r/855996 (https://phabricator.wikimedia.org/T321309) (owner: Ssingh)
[15:54:16] <wikibugs>	 (PS1) JMeybohm: k8s: make profile::kubernetes::cluster_cidr mandatory [puppet] - https://gerrit.wikimedia.org/r/855997 (https://phabricator.wikimedia.org/T307943)
[15:55:46] <wikibugs>	 (CR) JMeybohm: "Please double check your clusters CIDRs!" [puppet] - https://gerrit.wikimedia.org/r/855997 (https://phabricator.wikimedia.org/T307943) (owner: JMeybohm)
[15:56:40] <wikibugs>	 (CR) Vgutierrez: [V: +1 C: +2] varnish: Disable THP for varnish on cp404[5-8] [puppet] - https://gerrit.wikimedia.org/r/855992 (https://phabricator.wikimedia.org/T322903) (owner: Vgutierrez)
[15:56:44] <logmsgbot>	 !log sukhe@cumin2002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
[15:57:13] <logmsgbot>	 !log aikochou@deploy1002 helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
[15:58:24] <vgutierrez>	 !log rolling restart of varnish in cp4045 - cp4050 - T322903
[15:58:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:29] <stashbot>	 T322903: oom killed varnish on cp4047 - https://phabricator.wikimedia.org/T322903
[16:00:22] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P39271 and previous config saved to /var/cache/conftool/dbconfig/20221111-160022-ladsgroup.json
[16:00:28] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1] "PCC SUCCESS (DIFF 11): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38110/console" [puppet] - 10https://gerrit.wikimedia.org/r/855997 (https://phabricator.wikimedia.org/T307943) (owner: 10JMeybohm)
[16:03:07] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] "Thanks. Have double-checked the dse-k8s CIDRs and the two manifest files look good to me." [puppet] - 10https://gerrit.wikimedia.org/r/855997 (https://phabricator.wikimedia.org/T307943) (owner: 10JMeybohm)
[16:05:19] <vgutierrez>	 !log restart varnish in cp2042
[16:05:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1191 (T321130)', diff saved to https://phabricator.wikimedia.org/P39272 and previous config saved to /var/cache/conftool/dbconfig/20221111-160532-marostegui.json
[16:05:37] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[16:13:29] <wikibugs>	 (03PS1) 10Muehlenhoff: buster tracking updates [puppet] - 10https://gerrit.wikimedia.org/r/855998
[16:15:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P39273 and previous config saved to /var/cache/conftool/dbconfig/20221111-161528-ladsgroup.json
[16:15:31] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
[16:15:32] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: codfw1dev: hiera: cleanup per-host network overrides [puppet] - 10https://gerrit.wikimedia.org/r/855044
[16:15:35] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[16:15:44] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
[16:16:40] <wikibugs>	 10SRE, 10cloud-services-team (Kanban): rack/setup/install cloudcontrol2001-dev & cloudvirt200[123]-dev - https://phabricator.wikimedia.org/T214448 (10aborrero)
[16:17:01] <jinxer-wm>	 (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[16:18:42] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] buster tracking updates [puppet] - 10https://gerrit.wikimedia.org/r/855998 (owner: 10Muehlenhoff)
[16:20:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P39274 and previous config saved to /var/cache/conftool/dbconfig/20221111-162038-marostegui.json
[16:21:19] <wikibugs>	 (03CR) 10Ssingh: "Since reviewing this might be a bit hard given there is no history in this repository:" [debs/varnish-modules] - 10https://gerrit.wikimedia.org/r/855996 (https://phabricator.wikimedia.org/T321309) (owner: 10Ssingh)
[16:22:01] <jinxer-wm>	 (BlazegraphFreeAllocatorsDecreasingRapidly) resolved: Blazegraph instance wdqs1015:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DBlazegraphFreeAllocatorsDecreasingRapidly
[16:26:38] <wikibugs>	 (03PS2) 10Ssingh: Release 0.15.0-2 [debs/varnish-modules] - 10https://gerrit.wikimedia.org/r/855996 (https://phabricator.wikimedia.org/T321309)
[16:28:59] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users & Kerberos identity for Ilooremeta - https://phabricator.wikimedia.org/T322147 (10ILooremeta-WMF) @fgiunchedi what would the email read like, please? I think I might have lost it in the many updates
[16:35:45] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P39275 and previous config saved to /var/cache/conftool/dbconfig/20221111-163545-marostegui.json
[16:39:39] <icinga-wm>	 PROBLEM - Host lvs1014.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:39:53] <sukhe>	 er
[16:39:58] <sukhe>	 will file a task
[16:42:25] <wikibugs>	 10ops-eqiad, 10Traffic: Host lvs1014.mgmt is down - https://phabricator.wikimedia.org/T322933 (10ssingh)
[16:42:35] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: oom killed varnish on cp4047 - https://phabricator.wikimedia.org/T322903 (10Vgutierrez) p:05High→03Medium Lowing the priority after deploying several experiments in upload@ulsfo that could mitigate the issue, see the task description for more details
[16:42:39] <wikibugs>	 10ops-eqiad, 10Traffic: Host lvs1014.mgmt is down - https://phabricator.wikimedia.org/T322933 (10ssingh) p:05Triage→03Medium
[16:44:47] <wikibugs>	 (03PS3) 10Arturo Borrero Gonzalez: codfw1dev: hiera: cleanup per-host network overrides [puppet] - 10https://gerrit.wikimedia.org/r/855044
[16:49:19] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [V: 03+1] "PCC NOOP https://puppet-compiler.wmflabs.org/pcc-worker1001/38112/" [puppet] - 10https://gerrit.wikimedia.org/r/855044 (owner: 10Arturo Borrero Gonzalez)
[16:50:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1191 (T321130)', diff saved to https://phabricator.wikimedia.org/P39277 and previous config saved to /var/cache/conftool/dbconfig/20221111-165051-marostegui.json
[16:50:53] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1194.eqiad.wmnet with reason: Maintenance
[16:50:57] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[16:51:07] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Maintenance
[16:51:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1194 (T321130)', diff saved to https://phabricator.wikimedia.org/P39278 and previous config saved to /var/cache/conftool/dbconfig/20221111-165113-marostegui.json
[16:53:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1194 (T321130)', diff saved to https://phabricator.wikimedia.org/P39279 and previous config saved to /var/cache/conftool/dbconfig/20221111-165326-marostegui.json
[16:53:46] <wikibugs>	 (03PS1) 10JMeybohm: k8s: Refactor profile::kubernetes::master::service_cluster_ip_range [puppet] - 10https://gerrit.wikimedia.org/r/855999 (https://phabricator.wikimedia.org/T307943)
[16:54:49] <wikibugs>	 (03PS1) 10Vgutierrez: varnish: Remove deprecated jemalloc options [puppet] - 10https://gerrit.wikimedia.org/r/856000
[16:55:08] <wikibugs>	 (03CR) 10JMeybohm: "Please double check your service clusters CIDRs!" [puppet] - 10https://gerrit.wikimedia.org/r/855999 (https://phabricator.wikimedia.org/T307943) (owner: 10JMeybohm)
[16:58:23] <wikibugs>	 (03CR) 10FNegri: [C: 03+1] "PCC looks good, nice cleanup!" [puppet] - 10https://gerrit.wikimedia.org/r/855044 (owner: 10Arturo Borrero Gonzalez)
[16:58:59] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [V: 03+1 C: 03+2] codfw1dev: hiera: cleanup per-host network overrides [puppet] - 10https://gerrit.wikimedia.org/r/855044 (owner: 10Arturo Borrero Gonzalez)
[17:03:01] <wikibugs>	 (03CR) 10Btullis: [C: 03+1] "Double checked our cluster's CIDRs. Looks good, thanks." [puppet] - 10https://gerrit.wikimedia.org/r/855999 (https://phabricator.wikimedia.org/T307943) (owner: 10JMeybohm)
[17:03:13] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1] "PCC SUCCESS (DIFF 5): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38113/console" [puppet] - 10https://gerrit.wikimedia.org/r/855999 (https://phabricator.wikimedia.org/T307943) (owner: 10JMeybohm)
[17:07:00] <wikibugs>	 (03CR) 10Vgutierrez: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/38114/console" [puppet] - 10https://gerrit.wikimedia.org/r/856000 (owner: 10Vgutierrez)
[17:08:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P39280 and previous config saved to /var/cache/conftool/dbconfig/20221111-170833-marostegui.json
[17:23:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P39281 and previous config saved to /var/cache/conftool/dbconfig/20221111-172339-marostegui.json
[17:24:47] <wikibugs>	 10Puppet, 10Infrastructure-Foundations: Consider alternative configuration managment tooling - https://phabricator.wikimedia.org/T321874 (10jhathaway) > Let me elaborate a little more on my experience in deployment-prep: > * I created a cloud server with cloud-init and my cloud public key, but was permanently...
[17:24:58] <jinxer-wm>	 (KubernetesAPILatency) firing: High Kubernetes API latency (PUT customresourcedefinitions) on k8s-mlserve@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[17:29:58] <jinxer-wm>	 (KubernetesAPILatency) resolved: High Kubernetes API latency (PUT customresourcedefinitions) on k8s-mlserve@codfw - https://wikitech.wikimedia.org/wiki/Kubernetes - https://grafana.wikimedia.org/d/000000435?var-site=codfw&var-cluster=k8s-mlserve - https://alerts.wikimedia.org/?q=alertname%3DKubernetesAPILatency
[17:34:38] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet,service=ats-tls
[17:34:38] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet,service=ats-be
[17:34:39] <logmsgbot>	 !log sukhe@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet,service=varnish-fe
[17:38:03] <jinxer-wm>	 (ProbeDown) firing: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:38:31] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, 10netops: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (10cmooney) @Jclark-ctr bit of a heads up I'm hoping to get the migration kicked off for those Juniper Spine devices now that we've got the lic...
[17:38:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1194 (T321130)', diff saved to https://phabricator.wikimedia.org/P39282 and previous config saved to /var/cache/conftool/dbconfig/20221111-173846-marostegui.json
[17:38:48] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db1202.eqiad.wmnet with reason: Maintenance
[17:38:51] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[17:39:01] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Maintenance
[17:39:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db1202 (T321130)', diff saved to https://phabricator.wikimedia.org/P39283 and previous config saved to /var/cache/conftool/dbconfig/20221111-173907-marostegui.json
[17:39:21] <wikibugs>	 (03CR) 10JHathaway: [C: 03+1] "looks good" [deployment-charts] - 10https://gerrit.wikimedia.org/r/855967 (https://phabricator.wikimedia.org/T322193) (owner: 10Elukey)
[17:41:22] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1202 (T321130)', diff saved to https://phabricator.wikimedia.org/P39284 and previous config saved to /var/cache/conftool/dbconfig/20221111-174121-marostegui.json
[17:42:35] <wikibugs>	 (03CR) 10JHathaway: [C: 03+1] "ranges look correct for aux, thanks" [puppet] - 10https://gerrit.wikimedia.org/r/855999 (https://phabricator.wikimedia.org/T307943) (owner: 10JMeybohm)
[17:43:03] <jinxer-wm>	 (ProbeDown) resolved: Service centrallog2002:6514 has failed probes (tcp_rsyslog_receiver_ip6) - https://wikitech.wikimedia.org/wiki/TLS/Runbook#centrallog2002:6514 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:43:34] <wikibugs>	 (03CR) 10JHathaway: [C: 03+1] "ranges look correct for aux, thanks" [puppet] - 10https://gerrit.wikimedia.org/r/855997 (https://phabricator.wikimedia.org/T307943) (owner: 10JMeybohm)
[17:56:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P39285 and previous config saved to /var/cache/conftool/dbconfig/20221111-175627-marostegui.json
[18:01:38] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[18:11:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P39286 and previous config saved to /var/cache/conftool/dbconfig/20221111-181134-marostegui.json
[18:26:41] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1202 (T321130)', diff saved to https://phabricator.wikimedia.org/P39287 and previous config saved to /var/cache/conftool/dbconfig/20221111-182640-marostegui.json
[18:26:42] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[18:26:47] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[18:26:56] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
[18:30:53] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2098.codfw.wmnet with reason: Maintenance
[18:31:06] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2098.codfw.wmnet with reason: Maintenance
[18:35:23] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2100.codfw.wmnet with reason: Maintenance
[18:35:36] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2100.codfw.wmnet with reason: Maintenance
[18:39:58] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2108.codfw.wmnet with reason: Maintenance
[18:40:11] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2108.codfw.wmnet with reason: Maintenance
[18:40:18] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2108 (T321130)', diff saved to https://phabricator.wikimedia.org/P39288 and previous config saved to /var/cache/conftool/dbconfig/20221111-184017-marostegui.json
[18:40:22] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[18:46:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2108 (T321130)', diff saved to https://phabricator.wikimedia.org/P39289 and previous config saved to /var/cache/conftool/dbconfig/20221111-184633-marostegui.json
[18:46:38] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[18:55:16] <wikibugs>	 (03PS2) 10Eevans: Add component/gocql to bullseye [puppet] - 10https://gerrit.wikimedia.org/r/855102 (https://phabricator.wikimedia.org/T283838)
[18:56:16] <wikibugs>	 (03CR) 10Eevans: [C: 03+2] Add component/gocql to bullseye [puppet] - 10https://gerrit.wikimedia.org/r/855102 (https://phabricator.wikimedia.org/T283838) (owner: 10Eevans)
[19:01:40] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P39290 and previous config saved to /var/cache/conftool/dbconfig/20221111-190139-marostegui.json
[19:07:03] <wikibugs>	 (03PS1) 10Sergio Gimeno: [Growth] Make Vue mentor dashboard default by removing GEMentorDashboardUseVue [mediawiki-config] - 10https://gerrit.wikimedia.org/r/856008
[19:08:51] <wikibugs>	 (03PS2) 10Sergio Gimeno: GrowthExperiments: Make Vue mentor dashboard default by removing GEMentorDashboardUseVue [mediawiki-config] - 10https://gerrit.wikimedia.org/r/856008
[19:16:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P39291 and previous config saved to /var/cache/conftool/dbconfig/20221111-191646-marostegui.json
[19:31:53] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2108 (T321130)', diff saved to https://phabricator.wikimedia.org/P39292 and previous config saved to /var/cache/conftool/dbconfig/20221111-193152-marostegui.json
[19:31:54] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2120.codfw.wmnet with reason: Maintenance
[19:31:57] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[19:32:08] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2120.codfw.wmnet with reason: Maintenance
[19:32:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2120 (T321130)', diff saved to https://phabricator.wikimedia.org/P39293 and previous config saved to /var/cache/conftool/dbconfig/20221111-193214-marostegui.json
[19:35:24] <wikibugs>	 (03CR) 10Htriedman: Varnish analytics: support differential privacy (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/824769 (https://phabricator.wikimedia.org/T315676) (owner: 10Isaac Johnson)
[19:38:32] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2120 (T321130)', diff saved to https://phabricator.wikimedia.org/P39294 and previous config saved to /var/cache/conftool/dbconfig/20221111-193832-marostegui.json
[19:38:38] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[19:41:00] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] phabricator/aphlict: pass through ensure parameter [puppet] - 10https://gerrit.wikimedia.org/r/855720 (https://phabricator.wikimedia.org/T135991) (owner: 10Dzahn)
[19:46:19] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 104 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:48:19] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: (C)100 gt (W)50 gt 1 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:51:10] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "noop confirmed. thanks. yea, so this is an improvement and a noop everywhere but that still doesn't remove the restart code and alert from" [puppet] - 10https://gerrit.wikimedia.org/r/855720 (https://phabricator.wikimedia.org/T135991) (owner: 10Dzahn)
[19:53:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P39295 and previous config saved to /var/cache/conftool/dbconfig/20221111-195338-marostegui.json
[20:08:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P39296 and previous config saved to /var/cache/conftool/dbconfig/20221111-200845-marostegui.json
[20:14:00] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149 (T318605)', diff saved to https://phabricator.wikimedia.org/P39297 and previous config saved to /var/cache/conftool/dbconfig/20221111-201400-ladsgroup.json
[20:14:05] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[20:20:39] <icinga-wm>	 RECOVERY - Check systemd state on phab1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:20:57] <icinga-wm>	 RECOVERY - Check systemd state on phab1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:21:03] <icinga-wm>	 RECOVERY - Check systemd state on phab2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:21:05] <mutante>	 !log phab1001,phab1004,phab2002 - systemctl reset-failed
[20:21:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:23:32] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "it just needed a 'systemctl reset-failed' on the 3 phab hosts. icinga recovered. units don't exist anymore and puppet is not adding them b" [puppet] - 10https://gerrit.wikimedia.org/r/855720 (https://phabricator.wikimedia.org/T135991) (owner: 10Dzahn)
[20:23:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2120 (T321130)', diff saved to https://phabricator.wikimedia.org/P39298 and previous config saved to /var/cache/conftool/dbconfig/20221111-202351-marostegui.json
[20:23:54] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2121.codfw.wmnet with reason: Maintenance
[20:23:56] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[20:24:07] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2121.codfw.wmnet with reason: Maintenance
[20:24:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2121 (T321130)', diff saved to https://phabricator.wikimedia.org/P39299 and previous config saved to /var/cache/conftool/dbconfig/20221111-202413-marostegui.json
[20:29:07] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P39300 and previous config saved to /var/cache/conftool/dbconfig/20221111-202906-ladsgroup.json
[20:30:30] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2121 (T321130)', diff saved to https://phabricator.wikimedia.org/P39301 and previous config saved to /var/cache/conftool/dbconfig/20221111-203030-marostegui.json
[20:30:35] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[20:44:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P39302 and previous config saved to /var/cache/conftool/dbconfig/20221111-204413-ladsgroup.json
[20:45:37] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P39303 and previous config saved to /var/cache/conftool/dbconfig/20221111-204536-marostegui.json
[20:59:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1149 (T318605)', diff saved to https://phabricator.wikimedia.org/P39304 and previous config saved to /var/cache/conftool/dbconfig/20221111-205919-ladsgroup.json
[20:59:21] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[20:59:26] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[20:59:35] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
[21:00:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P39305 and previous config saved to /var/cache/conftool/dbconfig/20221111-210043-marostegui.json
[21:10:15] <wikibugs>	 (03PS1) 10Dzahn: phabricator: add parameter for mysql port, set it to 3323 if using slave [puppet] - 10https://gerrit.wikimedia.org/r/856013 (https://phabricator.wikimedia.org/T280597)
[21:15:50] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2121 (T321130)', diff saved to https://phabricator.wikimedia.org/P39306 and previous config saved to /var/cache/conftool/dbconfig/20221111-211550-marostegui.json
[21:15:52] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2122.codfw.wmnet with reason: Maintenance
[21:15:56] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[21:16:05] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2122.codfw.wmnet with reason: Maintenance
[21:16:12] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2122 (T321130)', diff saved to https://phabricator.wikimedia.org/P39307 and previous config saved to /var/cache/conftool/dbconfig/20221111-211611-marostegui.json
[21:22:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2122 (T321130)', diff saved to https://phabricator.wikimedia.org/P39308 and previous config saved to /var/cache/conftool/dbconfig/20221111-212239-marostegui.json
[21:22:44] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[21:37:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P39309 and previous config saved to /var/cache/conftool/dbconfig/20221111-213745-marostegui.json
[21:52:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P39310 and previous config saved to /var/cache/conftool/dbconfig/20221111-215252-marostegui.json
[22:01:38] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[22:07:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2122 (T321130)', diff saved to https://phabricator.wikimedia.org/P39311 and previous config saved to /var/cache/conftool/dbconfig/20221111-220758-marostegui.json
[22:08:00] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2150.codfw.wmnet with reason: Maintenance
[22:08:04] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[22:08:14] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Maintenance
[22:08:21] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2150 (T321130)', diff saved to https://phabricator.wikimedia.org/P39312 and previous config saved to /var/cache/conftool/dbconfig/20221111-220820-marostegui.json
[22:09:20] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
[22:09:33] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
[22:09:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2147 (T318605)', diff saved to https://phabricator.wikimedia.org/P39313 and previous config saved to /var/cache/conftool/dbconfig/20221111-220939-ladsgroup.json
[22:09:44] <stashbot>	 T318605: Deploy new externallinks fields to production - https://phabricator.wikimedia.org/T318605
[22:14:42] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2150 (T321130)', diff saved to https://phabricator.wikimedia.org/P39314 and previous config saved to /var/cache/conftool/dbconfig/20221111-221441-marostegui.json
[22:14:47] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[22:29:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P39315 and previous config saved to /var/cache/conftool/dbconfig/20221111-222948-marostegui.json
[22:44:55] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P39316 and previous config saved to /var/cache/conftool/dbconfig/20221111-224454-marostegui.json
[23:00:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2150 (T321130)', diff saved to https://phabricator.wikimedia.org/P39317 and previous config saved to /var/cache/conftool/dbconfig/20221111-230000-marostegui.json
[23:00:03] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2159.codfw.wmnet with reason: Maintenance
[23:00:06] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[23:00:17] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Maintenance
[23:00:18] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 10:00:00 on db2095.codfw.wmnet with reason: Maintenance
[23:00:32] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2095.codfw.wmnet with reason: Maintenance
[23:00:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2159 (T321130)', diff saved to https://phabricator.wikimedia.org/P39318 and previous config saved to /var/cache/conftool/dbconfig/20221111-230037-marostegui.json
[23:06:55] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T321130)', diff saved to https://phabricator.wikimedia.org/P39319 and previous config saved to /var/cache/conftool/dbconfig/20221111-230654-marostegui.json
[23:07:00] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[23:16:23] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) firing: (2) Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[23:22:01] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P39320 and previous config saved to /var/cache/conftool/dbconfig/20221111-232201-marostegui.json
[23:36:23] <jinxer-wm>	 (Wikidata Reliability Metrics - wbeditentity API: executeTiming alert) resolved: Wikidata Reliability Metrics - wbeditentity API: executeTiming alert   - https://alerts.wikimedia.org/?q=alertname%3DWikidata+Reliability+Metrics+-+wbeditentity+API%3A+executeTiming+alert
[23:37:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P39321 and previous config saved to /var/cache/conftool/dbconfig/20221111-233707-marostegui.json
[23:52:14] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2159 (T321130)', diff saved to https://phabricator.wikimedia.org/P39322 and previous config saved to /var/cache/conftool/dbconfig/20221111-235214-marostegui.json
[23:52:16] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 5:00:00 on db2168.codfw.wmnet with reason: Maintenance
[23:52:19] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130
[23:52:30] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Maintenance
[23:52:36] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depooling db2168:3317 (T321130)', diff saved to https://phabricator.wikimedia.org/P39323 and previous config saved to /var/cache/conftool/dbconfig/20221111-235235-marostegui.json
[23:59:02] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T321130)', diff saved to https://phabricator.wikimedia.org/P39324 and previous config saved to /var/cache/conftool/dbconfig/20221111-235902-marostegui.json
[23:59:07] <stashbot>	 T321130: Add column cuc_private to cu_changes on wmf wikis - https://phabricator.wikimedia.org/T321130