[00:19:36] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P61057 and previous config saved to /var/cache/conftool/dbconfig/20240422-001933-ladsgroup.json
[00:19:42] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[00:34:43] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P61058 and previous config saved to /var/cache/conftool/dbconfig/20240422-003442-ladsgroup.json
[00:49:50] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P61059 and previous config saved to /var/cache/conftool/dbconfig/20240422-004950-ladsgroup.json
[01:04:58] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P61060 and previous config saved to /var/cache/conftool/dbconfig/20240422-010457-ladsgroup.json
[01:05:00] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
[01:05:03] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[01:05:13] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
[01:05:21] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P61061 and previous config saved to /var/cache/conftool/dbconfig/20240422-010520-ladsgroup.json
[02:38:51] (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:40:33] (KubernetesCalicoDown) firing: parse1002.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s&var-instance=parse1002.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[03:01:26] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[03:03:51] (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[04:44:45] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P61062 and previous config saved to /var/cache/conftool/dbconfig/20240422-044444-ladsgroup.json
[04:44:53] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[04:59:52] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P61063 and previous config saved to /var/cache/conftool/dbconfig/20240422-045952-ladsgroup.json
[05:15:00] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P61064 and previous config saved to /var/cache/conftool/dbconfig/20240422-051459-ladsgroup.json
[05:30:07] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P61065 and previous config saved to /var/cache/conftool/dbconfig/20240422-053006-ladsgroup.json
[05:30:09] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
[05:30:17] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[05:30:23] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
[06:40:33] (KubernetesCalicoDown) firing: parse1002.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s&var-instance=parse1002.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240422T0700)
[07:01:26] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[07:03:51] (JobUnavailable) firing: Reduced availability for job thanos-sidecar in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:30:25] (SystemdUnitFailed) firing: (2) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:29:11] (PS1) Majavah: Add cawiki 750k logo [mediawiki-config] - https://gerrit.wikimedia.org/r/1023046
[08:29:31] (PS2) Majavah: Add cawiki 750k logo [mediawiki-config] - https://gerrit.wikimedia.org/r/1023046 (https://phabricator.wikimedia.org/T363057)
[10:14:15] (PS1) Fabfur: benthos: add envvar for buffer limit [puppet] - https://gerrit.wikimedia.org/r/1023054 (https://phabricator.wikimedia.org/T358109)
[10:19:59] SRE, LDAP-Access-Requests: Grant Access to Superset for ifeatu_nnaobi_wmde - https://phabricator.wikimedia.org/T358091#9732824 (Ifeatu_Nnaobi_WMDE) Thanks :)
[10:25:13] (PS2) Fabfur: benthos: add envvar for buffer limit [puppet] - https://gerrit.wikimedia.org/r/1023054 (https://phabricator.wikimedia.org/T358109)
[10:30:22] (CR) Fabfur: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/2068/co" [puppet] - https://gerrit.wikimedia.org/r/1023054 (https://phabricator.wikimedia.org/T358109) (owner: Fabfur)
[10:31:01] (CR) Fabfur: benthos: add envvar for buffer limit [puppet] - https://gerrit.wikimedia.org/r/1023054 (https://phabricator.wikimedia.org/T358109) (owner: Fabfur)
[10:35:24] ops-eqiad: ManagementSSHDown - https://phabricator.wikimedia.org/T363086 (phaultfinder) NEW
[10:35:56] (CR) Vgutierrez: [C:-1] "split this CR in two. one for the parametrization of the value and another one for the increase on cp4037 (easier and cleaner to revert th" [puppet] - https://gerrit.wikimedia.org/r/1023054 (https://phabricator.wikimedia.org/T358109) (owner: Fabfur)
[10:40:33] (KubernetesCalicoDown) firing: parse1002.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s&var-instance=parse1002.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[10:41:03] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P61066 and previous config saved to /var/cache/conftool/dbconfig/20240422-104102-ladsgroup.json
[10:41:14] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[10:44:15] (PS3) Fabfur: benthos: add envvar for buffer limit [puppet] - https://gerrit.wikimedia.org/r/1023054 (https://phabricator.wikimedia.org/T358109)
[10:44:16] (PS1) Fabfur: hiera: buffer memory limit override for cp4037 [puppet] - https://gerrit.wikimedia.org/r/1023060 (https://phabricator.wikimedia.org/T358109)
[10:45:06] (CR) Fabfur: "ack, thanks!" [puppet] - https://gerrit.wikimedia.org/r/1023054 (https://phabricator.wikimedia.org/T358109) (owner: Fabfur)
[10:48:20] (CR) Fabfur: [V:+1] "PCC SUCCESS (CORE_DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet5-compiler-node/2069/co" [puppet] - https://gerrit.wikimedia.org/r/1023060 (https://phabricator.wikimedia.org/T358109) (owner: Fabfur)
[10:56:11] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P61067 and previous config saved to /var/cache/conftool/dbconfig/20240422-105610-ladsgroup.json
[11:01:26] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[11:03:51] (JobUnavailable) firing: Reduced availability for job thanos-sidecar in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:11:18] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P61068 and previous config saved to /var/cache/conftool/dbconfig/20240422-111117-ladsgroup.json
[11:26:25] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Repooling after maintenance db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P61069 and previous config saved to /var/cache/conftool/dbconfig/20240422-112625-ladsgroup.json
[11:26:27] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
[11:26:32] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[11:26:40] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
[11:30:25] (SystemdUnitFailed) firing: (2) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:36:40] (CR) Urbanecm: [C:-1] "issue: also needs to be defined in `wmf-config/ext-EventLogging.php`" [mediawiki-config] - https://gerrit.wikimedia.org/r/989216 (owner: Cyndywikime)
[12:02:38] (PS5) Cyndywikime: Add account_conversion event streams. [mediawiki-config] - https://gerrit.wikimedia.org/r/989216
[12:09:08] (CR) Cyndywikime: "Done" [mediawiki-config] - https://gerrit.wikimedia.org/r/989216 (owner: Cyndywikime)
[12:11:50] (CR) Urbanecm: [C:-1] Add account_conversion event streams. (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/989216 (owner: Cyndywikime)
[12:29:43] (PS6) Cyndywikime: Add account_conversion event streams. [mediawiki-config] - https://gerrit.wikimedia.org/r/989216
[12:30:59] (CR) Cyndywikime: Add account_conversion event streams. (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/989216 (owner: Cyndywikime)
[12:51:04] I'm trying to add a prometheus target which doesn't quite fit into prometheus::class_config (the job is run manually), but is long-running and I want prometheus in order to collect runtime metrics so it also doesn't fit under the push adapter.
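For a manually started but long-running job like the one described at 12:51, one general Prometheus mechanism that fits neither a class-based config nor a push gateway is file-based service discovery: Prometheus watches the files listed under a `file_sd_configs` entry and reloads the target groups they contain. The sketch below only illustrates that file format under stated assumptions; it is not the change that was uploaded as 1023085, and the output path, the `example-host.eqiad.wmnet:9999` target and the `purpose` label are all hypothetical. It writes the JSON list-of-target-groups layout and renames a temporary file into place so Prometheus never reads a half-written file.

```python
import json
import os
import tempfile


def write_file_sd(path, targets, labels):
    """Write one Prometheus file_sd target group as JSON and return it.

    file_sd reads a list of groups; each group has a "targets" list of
    host:port strings and an optional "labels" mapping applied to them all.
    """
    groups = [{"targets": list(targets), "labels": dict(labels)}]
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump(groups, fh, indent=2)
        fh.write("\n")
    # os.replace is atomic on POSIX, so a watcher sees the old or new file, never a partial one.
    os.replace(tmp_path, path)
    return groups


def demo():
    # Hypothetical target and label, written to a throwaway directory.
    out_dir = tempfile.mkdtemp()
    path = os.path.join(out_dir, "long_running_job.json")
    write_file_sd(path, ["example-host.eqiad.wmnet:9999"], {"purpose": "manual-job"})
    with open(path) as fh:
        return json.load(fh)


if __name__ == "__main__":
    print(demo())
```

On the Prometheus side, the scrape job would list the file (or a glob such as `*.json`) under `file_sd_configs`; the temporary name ends in `.tmp`, so it does not match that glob. If the file is left in place after the job exits, the target will simply be scraped as down (`up == 0`).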
[12:58:06] I guess I'll just add a raw `file` resource
[13:07:24] (PS1) Awight: Temporary monitoring for long-running analytics client job [puppet] - https://gerrit.wikimedia.org/r/1023085 (https://phabricator.wikimedia.org/T362904)
[13:40:16] (CR) Gergő Tisza: [C:+1] logging: always register udp2log handlers (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/1019253 (https://phabricator.wikimedia.org/T228838) (owner: Hashar)
[13:47:47] !log mforns@deploy1002 Started deploy [analytics/refinery@a7af5a6]: Deploying Commons Impact Metrics dumps queries [analytics/refinery@a7af5a6b]
[13:53:58] (CR) Urbanecm: [C:+1] "generally, new additions go to the bottom, but fair enough, this should work as well" [mediawiki-config] - https://gerrit.wikimedia.org/r/989216 (owner: Cyndywikime)
[14:00:50] !log mforns@deploy1002 Finished deploy [analytics/refinery@a7af5a6]: Deploying Commons Impact Metrics dumps queries [analytics/refinery@a7af5a6b] (duration: 13m 02s)
[14:01:51] !log mforns@deploy1002 Started deploy [analytics/refinery@a7af5a6] (thin): Deploy Commons Impact Metrics dumps queries THIN [analytics/refinery@a7af5a6b]
[14:05:26] !log mforns@deploy1002 Finished deploy [analytics/refinery@a7af5a6] (thin): Deploy Commons Impact Metrics dumps queries THIN [analytics/refinery@a7af5a6b] (duration: 03m 34s)
[14:08:14] !log mforns@deploy1002 Started deploy [analytics/refinery@a7af5a6] (hadoop-test): Deploying Commons Impact Metrics dump queries TEST [analytics/refinery@a7af5a6b]
[14:10:51] !log mforns@deploy1002 Finished deploy [analytics/refinery@a7af5a6] (hadoop-test): Deploying Commons Impact Metrics dump queries TEST [analytics/refinery@a7af5a6b] (duration: 02m 36s)
[14:38:51] (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:40:33] (KubernetesCalicoDown) firing: parse1002.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s&var-instance=parse1002.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[14:41:46] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1234.eqiad.wmnet with reason: Down
[14:41:48] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1234.eqiad.wmnet with reason: Down
[14:57:48] !log mforns@deploy1002 Started deploy [airflow-dags/analytics@70946de]: (no justification provided)
[14:58:15] !log mforns@deploy1002 Finished deploy [airflow-dags/analytics@70946de]: (no justification provided) (duration: 00m 27s)
[15:00:24] (JobUnavailable) firing: (2) Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:01:26] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[15:30:25] (SystemdUnitFailed) firing: (2) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:32:10] (PS1) C. Scott Ananian: ParserOutput: don't complain if TOCHTML is unset from ParserCache [core] (wmf/1.43.0-wmf.1) - https://gerrit.wikimedia.org/r/1023100 (https://phabricator.wikimedia.org/T363107)
[15:36:35] (PS1) Urbanecm: Growth: Enable Levelling up features on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/1023101 (https://phabricator.wikimedia.org/T348086)
[15:44:55] (CR) Thiemo Kreuz (WMDE): [C:+1] ParserOutput: don't complain if TOCHTML is unset from ParserCache [core] (wmf/1.43.0-wmf.1) - https://gerrit.wikimedia.org/r/1023100 (https://phabricator.wikimedia.org/T363107) (owner: C. Scott Ananian)
[16:06:55] ops-magru, DC-Ops, Traffic: Q4:rack/setup/install cp70[01-16] - https://phabricator.wikimedia.org/T362729#9733382 (Jhancock.wm)
[16:07:51] ops-magru, DC-Ops, Infrastructure-Foundations, netops, Traffic: Q4:rack/setup/install magru misc servers - https://phabricator.wikimedia.org/T362730#9733383 (Jhancock.wm)
[16:21:58] (CR) Subramanya Sastry: "I assume the plan to backport this to wmf.1 and not merge this on master?" [core] (wmf/1.43.0-wmf.1) - https://gerrit.wikimedia.org/r/1023100 (https://phabricator.wikimedia.org/T363107) (owner: C. Scott Ananian)
[16:23:20] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
[16:23:33] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
[16:23:41] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P61071 and previous config saved to /var/cache/conftool/dbconfig/20240422-162340-ladsgroup.json
[16:23:45] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[16:38:11] (ProbeDown) firing: Service phab1004:443 has failed probes (http_phabricator_wikimedia_org_collab_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#phab1004:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[16:43:11] (ProbeDown) resolved: Service phab1004:443 has failed probes (http_phabricator_wikimedia_org_collab_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#phab1004:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[17:22:39] ops-magru, DC-Ops, Infrastructure-Foundations, netops, Traffic: Q4:rack/setup/install magru misc servers - https://phabricator.wikimedia.org/T362730#9733503 (Jhancock.wm)
[17:23:43] ops-magru, DC-Ops, Traffic: Q4:rack/setup/install cp70[01-16] - https://phabricator.wikimedia.org/T362729#9733511 (Jhancock.wm)
[18:30:48] (PuppetDisabled) firing: Puppet disabled on ganeti2033:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=ganeti&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[18:40:33] (KubernetesCalicoDown) firing: parse1002.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s&var-instance=parse1002.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[19:00:24] (JobUnavailable) firing: Reduced availability for job thanos-sidecar in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[19:01:26] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[19:30:25] (SystemdUnitFailed) firing: (2) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:01:48] SRE-swift-storage, Commons, MediaWiki-File-management, Wikimedia-production-error: Due to PHP fatal, a new version upload overwrote a file (the original is gone) - https://phabricator.wikimedia.org/T198177#9733765 (Krinkle)
[21:07:32] ops-eqiad: eqiad: magru transport down - https://phabricator.wikimedia.org/T363117 (ayounsi) NEW p:Triage→High
[21:25:44] (CR) C. Scott Ananian: "Yes. This patch is already targeted to wmf.1; it doesn't actually apply to master because the relevant code has been deleted there. I77a" [core] (wmf/1.43.0-wmf.1) - https://gerrit.wikimedia.org/r/1023100 (https://phabricator.wikimedia.org/T363107) (owner: C. Scott Ananian)
[22:25:25] (SystemdUnitFailed) firing: (3) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:28:11] !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
[22:28:24] !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
[22:28:31] !log ladsgroup@cumin1002 dbctl commit (dc=all): 'Depooling db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P61072 and previous config saved to /var/cache/conftool/dbconfig/20240422-222830-ladsgroup.json
[22:28:35] T352010: Gradually drop old pagelinks columns - https://phabricator.wikimedia.org/T352010
[22:30:48] (PuppetDisabled) firing: Puppet disabled on ganeti2033:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=ganeti&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[22:40:33] (KubernetesCalicoDown) firing: parse1002.eqiad.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=eqiad%20prometheus%2Fk8s&var-instance=parse1002.eqiad.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[22:49:42] ops-eqiad, SRE, Observability-Metrics: Memory upgrade request for prometheus100[56] - https://phabricator.wikimedia.org/T360687#9733839 (VRiley-WMF) hey @herron is there a specific time you'd like use to arrange for this activity? Let us know, thanks!
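The refinery and airflow-dags deploys between 13:47 and 14:58 each log a Started line and a Finished line, the latter ending in a `(duration: MMm SSs)` suffix. The sketch below recomputes the elapsed time from the IRC timestamps and prints it in the same layout; `log_elapsed` and `format_duration` are hypothetical helpers, not part of the deploy tooling. IRC stamps are whole seconds recorded when each message reached the channel, so they can differ from the reported duration by about a second: 13:47:47 to 14:00:50 gives 13m 03s against the logged 13m 02s.

```python
from datetime import datetime, timedelta


def log_elapsed(start, end):
    """Elapsed time between two [HH:MM:SS] IRC stamps, allowing one midnight rollover."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        # The end stamp belongs to the next day.
        delta += timedelta(days=1)
    return delta


def format_duration(delta):
    """Render a timedelta as zero-padded minutes and seconds, e.g. '03m 34s'."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes:02d}m {seconds:02d}s"


if __name__ == "__main__":
    # Started/Finished stamps copied from the log.
    deploys = [
        ("refinery", "13:47:47", "14:00:50"),
        ("refinery thin", "14:01:51", "14:05:26"),
        ("refinery hadoop-test", "14:08:14", "14:10:51"),
        ("airflow-dags", "14:57:48", "14:58:15"),
    ]
    for name, start, end in deploys:
        print(name, format_duration(log_elapsed(start, end)))
```

Relatedly, the downtime lines' `1 day, 0:00:00` and `3 days, 0:00:00` have exactly the layout Python produces for `str(timedelta(days=1))` and `str(timedelta(days=3))`, which suggests (though the log does not confirm it) that the cookbook prints a `timedelta`.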
[23:01:26] (RoutinatorRsyncErrors) firing: (2) Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[23:03:51] (JobUnavailable) firing: Reduced availability for job thanos-sidecar in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[23:12:10] jouncebot: nowandnext
[23:12:10] For the next 7 hour(s) and 47 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240422T0700)
[23:12:10] In 2 hour(s) and 47 minute(s): Automatic branching of MediaWiki, extensions, skins, and vendor – see Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240423T0200)
[23:13:37] (PS1) Zabe: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1022532
[23:13:48] ah, its earth day
[23:14:18] (PS2) Zabe: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/1022532 (https://phabricator.wikimedia.org/T363093)
[23:20:25] (SystemdUnitFailed) firing: (3) docker-reporter-base-images.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:38:12] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1022533
[23:38:12] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1022533 (owner: TrainBranchBot)
[23:57:39] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1022533 (owner: TrainBranchBot)
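The 23:12:10 nowandnext reply can be checked by hand. Assuming the "No deploys all day" window that opened at 2024-04-22 07:00 runs a full day to 2024-04-23 07:00, and taking the branching slot from its deploycal anchor (2024-04-23 02:00, with both times assumed to be in the same timezone as the log), the remaining gaps are 7h 47m 50s and 2h 47m 50s. The reply's "47 minute(s)" in both cases is consistent with leftover seconds being dropped rather than rounded. The `until` helper below is a hypothetical reimplementation of that arithmetic, not jouncebot's actual code.

```python
from datetime import datetime


def until(now, target):
    """Describe the gap as whole hours and minutes, discarding leftover seconds."""
    total_minutes = int((target - now).total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hour(s) and {minutes} minute(s)"


NOW = datetime(2024, 4, 22, 23, 12, 10)       # time of the nowandnext query
NO_DEPLOYS_END = datetime(2024, 4, 23, 7, 0)  # assumed end of the all-day window
BRANCH_CUT = datetime(2024, 4, 23, 2, 0)      # deploycal-item-20240423T0200

if __name__ == "__main__":
    print(until(NOW, NO_DEPLOYS_END))  # 7 hour(s) and 47 minute(s)
    print(until(NOW, BRANCH_CUT))      # 2 hour(s) and 47 minute(s)
```

Rounding to the nearest minute instead would have produced "48 minute(s)" for both entries, which is not what the bot reported.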