[00:03:20] <icinga-wm>	 PROBLEM - Maps tiles generation on alert1001 is CRITICAL: CRITICAL: 100.00% of data under the critical threshold [5.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=8&fullscreen&orgId=1
[00:11:27] <wikibugs>	 (PS1) Ebernhardson: rdf-query-service: consistently suffix env vars [puppet] - https://gerrit.wikimedia.org/r/757996
[00:14:20] <ebernhardson>	 !log restart elasticsearch_6@production-search-psi-eqiad on elastic1049 to address CirrusSearchJVMGCOldPoolFlatlined alert
[00:14:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:16:01] <jinxer-wm>	 (CirrusSearchJVMGCOldPoolFlatlined) resolved: Elasticsearch instance elastic1049-production-search-psi-eqiad is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell  - https://alerts.wikimedia.org
[00:28:59] <wikibugs>	 SRE, Infrastructure-Foundations, Traffic-Icebox, netops, Patch-For-Review: Create Generalised blocking strategy - https://phabricator.wikimedia.org/T270618 (Aklapper)
[00:44:28] <wikibugs>	 (PS1) Eevans: Fold-in (minor) upstream configuration changes [puppet] - https://gerrit.wikimedia.org/r/757998 (https://phabricator.wikimedia.org/T298516)
[00:50:02] <wikibugs>	 (CR) Eevans: [C: +1] "This will be a no-op everywhere but aqs1010.eqiad.wmnet (which has already been upgraded to 'dev'), where it'll be a no-op because it intr" [puppet] - https://gerrit.wikimedia.org/r/757998 (https://phabricator.wikimedia.org/T298516) (owner: Eevans)
[00:59:25] <wikibugs>	 (CR) Cwhite: "The context I was able to find around gc logging enabled by default seems fairly out of date.  Open for discussion!" [puppet] - https://gerrit.wikimedia.org/r/757955 (https://phabricator.wikimedia.org/T297239) (owner: Cwhite)
[01:01:21] <wikibugs>	 (PS1) Eevans: Upgrade remaining aqs_next nodes to 'dev' (Cassandra 3.11.11) [puppet] - https://gerrit.wikimedia.org/r/757999 (https://phabricator.wikimedia.org/T298516)
[01:05:46] <wikibugs>	 (CR) Eevans: "PCC: https://puppet-compiler.wmflabs.org/pcc-worker1001/33497/" [puppet] - https://gerrit.wikimedia.org/r/757999 (https://phabricator.wikimedia.org/T298516) (owner: Eevans)
[01:06:04] <wikibugs>	 (CR) Eevans: [C: -1] "We should hold off on merging until we are ready to proceed with the upgrade." [puppet] - https://gerrit.wikimedia.org/r/757999 (https://phabricator.wikimedia.org/T298516) (owner: Eevans)
[01:20:04] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack: keystone: set bind port for uwsgi process [puppet] - https://gerrit.wikimedia.org/r/757982 (https://phabricator.wikimedia.org/T300254) (owner: Majavah)
[01:21:54] <icinga-wm>	 RECOVERY - Disk space on centrallog1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=centrallog1001&var-datasource=eqiad+prometheus/ops
[01:47:39] <wikibugs>	 (PS1) Andrew Bogott: Revert "openstack: keystone: set bind port for uwsgi process" [puppet] - https://gerrit.wikimedia.org/r/758003
[01:50:21] <wikibugs>	 (Abandoned) Andrew Bogott: Revert "openstack: keystone: set bind port for uwsgi process" [puppet] - https://gerrit.wikimedia.org/r/758003 (owner: Andrew Bogott)
[02:14:45] <wikibugs>	 (PS1) Andrew Bogott: Keystone/victoria/bullseye: brute-force replace init scripts [puppet] - https://gerrit.wikimedia.org/r/758004 (https://phabricator.wikimedia.org/T300254)
[02:17:51] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Keystone/victoria/bullseye: brute-force replace init scripts [puppet] - https://gerrit.wikimedia.org/r/758004 (https://phabricator.wikimedia.org/T300254) (owner: Andrew Bogott)
[04:04:26] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 84 probes of 653 (alerts on 65) - https://atlas.ripe.net/measurements/32390541/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[04:10:44] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 59 probes of 653 (alerts on 65) - https://atlas.ripe.net/measurements/32390541/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[04:30:29] <wikibugs>	 (PS1) Scardenasmolinar: Lower The Wikipedia Library extension edit count [mediawiki-config] - https://gerrit.wikimedia.org/r/758033 (https://phabricator.wikimedia.org/T288070)
[04:45:59] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudcontrol2003-dev.wikimedia.org with OS bullseye
[04:46:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:25:50] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2003-dev.wikimedia.org with OS bullseye
[05:25:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:36:56] <wikibugs>	 (PS1) Andrew Bogott: keystone::wsgi_server: 'keystone' on cloudcontrol2003-dev and 2004-dev [puppet] - https://gerrit.wikimedia.org/r/758034
[05:37:55] <wikibugs>	 (CR) Andrew Bogott: [C: +2] keystone::wsgi_server: 'keystone' on cloudcontrol2003-dev and 2004-dev [puppet] - https://gerrit.wikimedia.org/r/758034 (owner: Andrew Bogott)
[05:55:13] <icinga-wm>	 PROBLEM - SSH on bast3005 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring
[05:58:31] <icinga-wm>	 RECOVERY - SSH on bast3005 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:51:12] <icinga-wm>	 PROBLEM - SSH on bast3005 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:53:28] <icinga-wm>	 RECOVERY - SSH on bast3005 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[08:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220129T0800)
[08:05:10] <icinga-wm>	 PROBLEM - Check systemd state on mirror1001 is CRITICAL: CRITICAL - degraded: The following units failed: update-openstack-mirror.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:31:32] <icinga-wm>	 PROBLEM - SSH on mw2257.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:56:38] <wikibugs>	 (PS1) Majavah: openstack: force start barbican and trove services [puppet] - https://gerrit.wikimedia.org/r/758036 (https://phabricator.wikimedia.org/T300254)
[09:32:48] <icinga-wm>	 RECOVERY - SSH on mw2257.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:43:12] <icinga-wm>	 PROBLEM - SSH on mw2258.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:59:58] <icinga-wm>	 PROBLEM - BFD status on cr1-eqiad is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[10:00:36] <icinga-wm>	 PROBLEM - Wikidough DoH Check on doh6002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidough
[10:01:34] <icinga-wm>	 PROBLEM - Wikidough DoH Check on doh6001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidough
[10:01:38] <icinga-wm>	 PROBLEM - Wikidough DoT Check on doh6001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidough
[10:01:40] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[10:02:24] <icinga-wm>	 PROBLEM - Wikidough DoT Check on doh6002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidough
[11:45:50] <icinga-wm>	 RECOVERY - SSH on mw2258.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:51:40] <icinga-wm>	 PROBLEM - High average POST latency for mw requests on appserver in eqiad on alert1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST
[12:54:02] <icinga-wm>	 RECOVERY - High average POST latency for mw requests on appserver in eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=POST
[13:19:05] <wikibugs>	 (PS2) Hashar: zuul: stop keeping reflog on the mergers [puppet] - https://gerrit.wikimedia.org/r/757943
[13:19:07] <wikibugs>	 (PS2) Hashar: zuul: prune heads and tags on each fetches [puppet] - https://gerrit.wikimedia.org/r/757944 (https://phabricator.wikimedia.org/T220606)
[13:31:25] <wikibugs>	 (PS1) Majavah: beta: READ_NEW for CentralAuth hidden level migration [mediawiki-config] - https://gerrit.wikimedia.org/r/758038 (https://phabricator.wikimedia.org/T289068)
[13:53:12] <hashar>	 !log contint1001 and contint2001 : pruning old reflog from Zuul merger git repositories: `sudo -u zuul find /srv/zuul/git -maxdepth 4 -type d -name .git -print -execdir git reflog expire --all --expire=now \;`
[13:53:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:19] <wikibugs>	 (CR) Hashar: "They can be cleaned manually on contint1001 and contint2001 using:" [puppet] - https://gerrit.wikimedia.org/r/757943 (owner: Hashar)
[14:34:14] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, netops: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (cmooney) Thanks @wiki_willy for sharing the link.  It is missing the IDs for the fiber links between switches though, it just has the consol...
[15:18:53] <wikibugs>	 (CR) Ladsgroup: [C: +1] beta: READ_NEW for CentralAuth hidden level migration [mediawiki-config] - https://gerrit.wikimedia.org/r/758038 (https://phabricator.wikimedia.org/T289068) (owner: Majavah)
[16:17:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) firing: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org
[16:20:48] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for jobrunner on alert1001 is CRITICAL: 156 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:22:55] <jinxer-wm>	 (LogstashKafkaConsumerLag) resolved: Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?var-cluster=logging-eqiad - https://alerts.wikimedia.org
[16:27:56] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for jobrunner on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:29:00] <RhinosF1>	 That's gonna be wikidata
[16:29:06] <RhinosF1>	 I assume RO due to lag
[16:30:06] <RhinosF1>	 No they're codfw
[16:30:08] <RhinosF1>	 Ignore me
[16:30:42] <RhinosF1>	 Also I meant commons
[16:56:36] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.wikimedia.org with OS bullseye
[16:56:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:58:31] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack: force start barbican and trove services [puppet] - https://gerrit.wikimedia.org/r/758036 (https://phabricator.wikimedia.org/T300254) (owner: Majavah)
[17:08:08] <wikibugs>	 (PS1) Majavah: P:openstack: fix novaenv path [puppet] - https://gerrit.wikimedia.org/r/758049 (https://phabricator.wikimedia.org/T300254)
[17:22:05] <wikibugs>	 (PS1) Majavah: aptrepo: import hp hwraid packages for bullseye [puppet] - https://gerrit.wikimedia.org/r/758050 (https://phabricator.wikimedia.org/T300438)
[17:29:35] <wikibugs>	 (CR) Inductiveload: Wikisource: Increase PDF rendering resolution to 300 dpi (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/757892 (https://phabricator.wikimedia.org/T256959) (owner: Inductiveload)
[17:57:43] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2004-dev.wikimedia.org with OS bullseye
[17:57:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:04:46] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.reimage for host cloudservices2003-dev.wikimedia.org with OS bullseye
[18:04:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:05:44] <wikibugs>	 (PS1) Majavah: P:openstack::galera: fix monitoring process name on bullseye [puppet] - https://gerrit.wikimedia.org/r/758052 (https://phabricator.wikimedia.org/T300254)
[18:06:49] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (DIFF 1 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33499/console" [puppet] - https://gerrit.wikimedia.org/r/758052 (https://phabricator.wikimedia.org/T300254) (owner: Majavah)
[18:12:07] <wikibugs>	 (PS1) Inductiveload: Make DPI configurable for Ghostscript [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/758053 (https://phabricator.wikimedia.org/T256959)
[18:13:12] <wikibugs>	 (Abandoned) Inductiveload: Wikisource: Increase PDF rendering resolution to 300 dpi [mediawiki-config] - https://gerrit.wikimedia.org/r/757892 (https://phabricator.wikimedia.org/T256959) (owner: Inductiveload)
[18:25:53] <wikibugs>	 (PS2) Inductiveload: Make DPI configurable for Ghostscript [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/758053 (https://phabricator.wikimedia.org/T256959)
[18:27:04] <wikibugs>	 (PS3) Inductiveload: Make DPI configurable for Ghostscript [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/758053 (https://phabricator.wikimedia.org/T256959)
[18:50:38] <wikibugs>	 SRE, Traffic, serviceops: Upgrade envoyproxy to 1.16.2 - https://phabricator.wikimedia.org/T271407 (Aloist) I suggest to activate configuration option accept_http_10 (bool) Handle incoming HTTP/1.0 and HTTP 0.9 requests. This is off by default, and not fully standards compliant. There is support for...
[19:03:29] <wikibugs>	 (CR) AntiCompositeNumber: [C: -1] "Do not directly make changes in this repository, use operations/software/thumbor-plugins" [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/758053 (https://phabricator.wikimedia.org/T256959) (owner: Inductiveload)
[19:48:52] <wikibugs>	 SRE, Traffic: Problem loading thumbnail images due to Envoy (426 Upgrade Required) - https://phabricator.wikimedia.org/T300366 (Aklapper) In progress→Open
[20:25:57] <wikibugs>	 (PS1) Andrew Bogott: pdns::auth::db: support Bullseye [puppet] - https://gerrit.wikimedia.org/r/758057
[20:29:28] <wikibugs>	 (CR) Andrew Bogott: [C: +2] pdns::auth::db: support Bullseye [puppet] - https://gerrit.wikimedia.org/r/758057 (owner: Andrew Bogott)
[21:08:29] <logmsgbot>	 !log andrew@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices2003-dev.wikimedia.org with OS bullseye
[21:08:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:35:21] <wikibugs>	 (PS1) Majavah: pdns: support bullseye [puppet] - https://gerrit.wikimedia.org/r/758063 (https://phabricator.wikimedia.org/T300254)
[21:36:54] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33500/console" [puppet] - https://gerrit.wikimedia.org/r/758063 (https://phabricator.wikimedia.org/T300254) (owner: Majavah)
[21:37:47] <wikibugs>	 (PS2) Majavah: pdns: support bullseye [puppet] - https://gerrit.wikimedia.org/r/758063 (https://phabricator.wikimedia.org/T300254)
[21:38:58] <wikibugs>	 (CR) Majavah: [V: +1] "PCC SUCCESS (DIFF 1 NOOP 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/33501/console" [puppet] - https://gerrit.wikimedia.org/r/758063 (https://phabricator.wikimedia.org/T300254) (owner: Majavah)
[21:42:46] <wikibugs>	 (PS1) Andrew Bogott: pdns-recursor: add another edge case for the socket dir [puppet] - https://gerrit.wikimedia.org/r/758064
[21:42:48] <wikibugs>	 (PS1) Andrew Bogott: pdns: on Bullseye, pdns runs as the 'pdns' user; fix conf ownership [puppet] - https://gerrit.wikimedia.org/r/758065
[21:43:25] <wikibugs>	 (CR) jerkins-bot: [V: -1] pdns-recursor: add another edge case for the socket dir [puppet] - https://gerrit.wikimedia.org/r/758064 (owner: Andrew Bogott)
[21:48:01] <wikibugs>	 (PS2) Andrew Bogott: pdns-recursor: add another edge case for the socket dir [puppet] - https://gerrit.wikimedia.org/r/758064
[21:48:03] <wikibugs>	 (PS2) Andrew Bogott: pdns: on Bullseye, pdns runs as the 'pdns' user; fix conf ownership [puppet] - https://gerrit.wikimedia.org/r/758065
[21:54:01] <wikibugs>	 (CR) Majavah: [C: -1] "see also https://gerrit.wikimedia.org/r/c/operations/puppet/+/758063" [puppet] - https://gerrit.wikimedia.org/r/758064 (owner: Andrew Bogott)
[21:54:30] <wikibugs>	 (PS3) Andrew Bogott: pdns: on Bullseye, pdns runs as the 'pdns' user; fix conf ownership [puppet] - https://gerrit.wikimedia.org/r/758065
[21:55:30] <wikibugs>	 (PS1) Majavah: Bare minimun port to Python 3 to support Debian Bullseye [debs/prometheus-pdns-rec-exporter] - https://gerrit.wikimedia.org/r/758068 (https://phabricator.wikimedia.org/T300254)
[21:56:11] <wikibugs>	 (CR) Andrew Bogott: "Looks right but let's get at least one other pdns users to sign off." [puppet] - https://gerrit.wikimedia.org/r/758063 (https://phabricator.wikimedia.org/T300254) (owner: Majavah)
[21:57:11] <wikibugs>	 (CR) Andrew Bogott: "Dropping in favor of https://gerrit.wikimedia.org/r/c/operations/puppet/+/758063" [puppet] - https://gerrit.wikimedia.org/r/758065 (owner: Andrew Bogott)
[21:57:29] <wikibugs>	 (Abandoned) Andrew Bogott: pdns: on Bullseye, pdns runs as the 'pdns' user; fix conf ownership [puppet] - https://gerrit.wikimedia.org/r/758065 (owner: Andrew Bogott)
[21:58:07] <wikibugs>	 (Abandoned) Andrew Bogott: pdns-recursor: add another edge case for the socket dir [puppet] - https://gerrit.wikimedia.org/r/758064 (owner: Andrew Bogott)
[21:58:44] <wikibugs>	 (PS2) Majavah: Bare minimun port to Python 3 to support Debian Bullseye [debs/prometheus-pdns-rec-exporter] - https://gerrit.wikimedia.org/r/758068 (https://phabricator.wikimedia.org/T300254)
[22:05:24] <wikibugs>	 (CR) Hashar: "Lintian has some issues :)" [debs/prometheus-pdns-rec-exporter] - https://gerrit.wikimedia.org/r/758068 (https://phabricator.wikimedia.org/T300254) (owner: Majavah)
[22:05:30] <wikibugs>	 (PS3) Majavah: Bare minimun port to Python 3 to support Debian Bullseye [debs/prometheus-pdns-rec-exporter] - https://gerrit.wikimedia.org/r/758068 (https://phabricator.wikimedia.org/T300254)
[22:05:48] <hashar>	 taavi: the debian-glue CI job does give some hints here and there :)
[22:05:54] <hashar>	 looks like it almost works for that repo
[22:06:14] <taavi>	 hashar: yeah I know, trying to fix them is hard when you get a different output locally
[22:06:24] <wikibugs>	 (PS4) Majavah: Bare minimun port to Python 3 to support Debian Bullseye [debs/prometheus-pdns-rec-exporter] - https://gerrit.wikimedia.org/r/758068 (https://phabricator.wikimedia.org/T300254)
[22:07:43] <taavi>	 finally
[22:08:17] <hashar>	 taavi: I guess we can make that job voting as a result :]
[22:46:44] <wikibugs>	 SRE-OnFire (FY2021/2022-Q3): incidents occurring during Q2 have been scored with the scorecard - https://phabricator.wikimedia.org/T292254 (lmata)
[22:59:06] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubestagemaster2001 is CRITICAL: instance=10.192.48.10 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[23:05:54] <icinga-wm>	 PROBLEM - k8s API server requests latencies on kubestagemaster2001 is CRITICAL: instance=10.192.48.10 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[23:14:58] <icinga-wm>	 RECOVERY - k8s API server requests latencies on kubestagemaster2001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27
[23:26:34] <icinga-wm>	 PROBLEM - Disk space on centrallog1001 is CRITICAL: DISK CRITICAL - free space: /srv 34394 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=centrallog1001&var-datasource=eqiad+prometheus/ops
[23:36:46] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase2009 is CRITICAL: /en.wikipedia.org/v1/page/talk/{title} (Get structured talk page for enwiki Salt article) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[23:43:28] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase2009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase