[00:22:54] PROBLEM - WDQS high update lag on wdqs1004 is CRITICAL: 6.105e+07 ge 4.32e+07 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[00:40:20] Puppet, Beta-Cluster-Infrastructure, Infrastructure-Foundations, Release-Engineering-Team, Scap: Fatal error: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T296125 (AlexisJazz) >>! In T296125#7518351, @LucasWerkmeister wrote: > Web reques...
[00:47:38] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:51:50] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=sidekiq site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:54:00] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:12:40] RECOVERY - WDQS high update lag on wdqs1004 is OK: (C)4.32e+07 ge (W)2.16e+07 ge 2.092e+07 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[01:42:18] RECOVERY - Maps tiles generation on alert1001 is OK: OK: Less than 90.00% under the threshold [10.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=8&fullscreen&orgId=1
[02:29:26] PROBLEM - Check systemd state on dumpsdata1003 is CRITICAL: CRITICAL - degraded: The following units failed: cleanup_tmpdumps.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:17:16] (CirrusSearchJVMGCOldPoolFlatlined) firing: (2) Elasticsearch instance elastic2044-production-search-codfw is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://alerts.wikimedia.org
[04:38:08] RECOVERY - ElasticSearch shard size check - 9200 on logstash1035 is OK: OK - All good! https://wikitech.wikimedia.org/wiki/Search%23If_it_has_been_indexed
[04:52:02] SRE, Platform Engineering, Traffic, Patch-For-Review, Wikimedia-production-error: Wikimedia\Assert\PostconditionException: Postcondition failed: makeTitleSafe() should always return a Title for the text returned by getRootText(). - https://phabricator.wikimedia.org/T290194 (MdsShakil) @Umheri...
[05:13:20] !log end of djvu metadata maint script run (T275268)
[05:13:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:24] T275268: Address "image" table capacity problems by storing pdf/djvu text outside file metadata - https://phabricator.wikimedia.org/T275268
[05:22:35] !log running clean up of djvu files in all wikis (T275268)
[05:22:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:22:39] T275268: Address "image" table capacity problems by storing pdf/djvu text outside file metadata - https://phabricator.wikimedia.org/T275268
[06:02:13] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[06:08:30] PROBLEM - mailman3_queue_size on lists1001 is CRITICAL: CRITICAL: 1 mailman3 queues above limits: bounces is 504 (limit: 25) https://wikitech.wikimedia.org/wiki/Mailman/Monitoring https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3
[06:12:38] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[06:23:02] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[06:25:02] PROBLEM - Host ores1003 is DOWN: PING CRITICAL - Packet loss = 100%
[06:25:38] RECOVERY - Host ores1003 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[06:33:54] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[06:44:44] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[06:47:02] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:51:14] RECOVERY - mailman3_queue_size on lists1001 is OK: OK: mailman3 queues are below the limits https://wikitech.wikimedia.org/wiki/Mailman/Monitoring https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3
[06:53:30] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[07:04:24] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[07:15:18] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[07:17:16] (CirrusSearchJVMGCOldPoolFlatlined) firing: (2) Elasticsearch instance elastic2044-production-search-codfw is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://alerts.wikimedia.org
[07:26:04] !log cr1-eqiad# deactivate protocols bgp group Confed_eqord
[07:26:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:26:10] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[07:37:06] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[07:41:06] (CR) David Caro: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/740306 (https://phabricator.wikimedia.org/T295234) (owner: Majavah)
[07:45:50] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[07:46:23] (PS1) David Caro: Revert "dynamicproxy: add keystone token verification" [puppet] - https://gerrit.wikimedia.org/r/740319 (https://phabricator.wikimedia.org/T296144)
[07:47:00] (CR) Majavah: [C: +1] Revert "dynamicproxy: add keystone token verification" [puppet] - https://gerrit.wikimedia.org/r/740319 (https://phabricator.wikimedia.org/T296144) (owner: David Caro)
[07:49:32] (CR) David Caro: [C: +2] Revert "dynamicproxy: add keystone token verification" [puppet] - https://gerrit.wikimedia.org/r/740319 (https://phabricator.wikimedia.org/T296144) (owner: David Caro)
[07:56:42] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[08:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211121T0800)
[08:01:04] RECOVERY - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is OK: 1 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[09:50:36] (PS1) Majavah: kubeadm: Update kube-state-metrics to 2.2.4 [puppet] - https://gerrit.wikimedia.org/r/740323 (https://phabricator.wikimedia.org/T295190)
[11:17:16] (CirrusSearchJVMGCOldPoolFlatlined) firing: (2) Elasticsearch instance elastic2044-production-search-codfw is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://alerts.wikimedia.org
[12:46:09] (CR) Tacsipacsi: wikireplicas: add Translate extension tables (1 comment) [puppet] - https://gerrit.wikimedia.org/r/735088 (https://phabricator.wikimedia.org/T289952) (owner: AntiCompositeNumber)
[13:17:07] !log restarting blazegraph on wdqs1007 (jvm stuck for 10h)
[13:17:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:18] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (bad URL) is CRITICAL: Test bad URL returned the unexpected status 503 (expecting: 404): /api (Zotero and citoid alive) is CRITICAL: Test Zotero and citoid alive returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Citoid
[13:37:28] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[14:19:22] SRE, LDAP-Access-Requests: Grant Access to wmf for Daimona - https://phabricator.wikimedia.org/T295993 (Dzahn) @Daimona ACK, understood! We will just move you from nda to wmf then with the existing account. Could you change the email address on your Wikitech account (https://wikitech.wikimedia.org/wiki/S...
[14:45:40] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /_info (retrieve service info) is CRITICAL: Test retrieve service info returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Citoid
[14:47:44] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[14:55:14] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[15:09:58] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[15:17:16] (CirrusSearchJVMGCOldPoolFlatlined) firing: (2) Elasticsearch instance elastic2044-production-search-codfw is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://alerts.wikimedia.org
[15:20:54] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[15:38:26] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[15:47:10] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[15:58:10] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[16:00:26] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[16:04:46] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[16:06:54] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[16:23:36] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[16:31:48] PROBLEM - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is CRITICAL: 10 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[17:21:20] RECOVERY - Check for VMs leaked by the nova-fullstack test on cloudcontrol1003 is OK: 0 instances in the admin-monitoring project https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Check_for_VMs_leaked_by_the_nova-fullstack_test
[18:21:09] SRE, LDAP-Access-Requests: Grant Access to wmf for Daimona - https://phabricator.wikimedia.org/T295993 (Daimona) >>! In T295993#7518844, @Dzahn wrote: > @Daimona ACK, understood! We will just move you from nda to wmf then with the existing account. Could you change the email address on your Wikitech acco...
[19:17:16] (CirrusSearchJVMGCOldPoolFlatlined) firing: (2) Elasticsearch instance elastic2044-production-search-codfw is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://alerts.wikimedia.org
[19:18:42] PROBLEM - BGP status on cr2-eqord is CRITICAL: BGP CRITICAL - No response from remote host 208.80.154.198 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:21:38] PROBLEM - MariaDB Replica Lag: s8 on db1171 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1278.58 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[19:22:42] PROBLEM - Check systemd state on webperf1002 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:22:58] PROBLEM - MariaDB Replica Lag: s1 on db2141 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1369.23 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[19:23:32] RECOVERY - BGP status on cr2-eqord is OK: Use of uninitialized value duration in numeric gt () at /usr/lib/nagios/plugins/check_bgp line 323. https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:27:47] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: No response from remote host 208.80.154.198 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[19:27:49] (Primary outbound port utilisation over 80% #page) firing: Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org
[19:27:49] (Primary outbound port utilisation over 80% #page) firing: Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org
[19:29:34] PROBLEM - BGP status on cr2-eqord is CRITICAL: BGP CRITICAL - No response from remote host 208.80.154.198 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:31:28] RECOVERY - Check systemd state on webperf1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:34:46] (Primary inbound port utilisation over 80% #page) firing: Primary inbound port utilisation over 80% #page - https://alerts.wikimedia.org
[19:34:46] (Primary inbound port utilisation over 80% #page) firing: Primary inbound port utilisation over 80% #page - https://alerts.wikimedia.org
[19:38:46] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: No response from remote host 208.80.154.198 for 1.3.6.1.2.1.2.2.1.7 with snmp version 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[19:41:02] RECOVERY - BGP status on cr2-eqord is OK: Use of uninitialized value duration in numeric gt () at /usr/lib/nagios/plugins/check_bgp line 323. https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:43:02] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 45, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[19:43:40] (PS1) CDanis: add `^Wget/` to bad UAs [puppet] - https://gerrit.wikimedia.org/r/740345
[19:44:27] (CR) Giuseppe Lavagetto: [C: +1] add `^Wget/` to bad UAs [puppet] - https://gerrit.wikimedia.org/r/740345 (owner: CDanis)
[19:44:32] (CR) Ayounsi: [C: +1] add `^Wget/` to bad UAs [puppet] - https://gerrit.wikimedia.org/r/740345 (owner: CDanis)
[19:44:37] (CR) CDanis: [C: +2] add `^Wget/` to bad UAs [puppet] - https://gerrit.wikimedia.org/r/740345 (owner: CDanis)
[19:47:49] (Primary outbound port utilisation over 80% #page) firing: (2) Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org
[19:47:49] (Primary outbound port utilisation over 80% #page) firing: (2) Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org
[19:52:49] (Primary outbound port utilisation over 80% #page) resolved: (2) Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org
[19:52:49] (Primary outbound port utilisation over 80% #page) resolved: (2) Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org
[19:54:46] (Primary inbound port utilisation over 80% #page) resolved: Primary inbound port utilisation over 80% #page - https://alerts.wikimedia.org
[19:54:46] (Primary inbound port utilisation over 80% #page) resolved: Primary inbound port utilisation over 80% #page - https://alerts.wikimedia.org
[20:08:26] PROBLEM - Check whether ferm is active by checking the default input chain on labstore1006 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[20:39:32] RECOVERY - Check whether ferm is active by checking the default input chain on labstore1006 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[21:04:50] RECOVERY - MariaDB Replica Lag: s1 on db2141 is OK: OK slave_sql_lag Replication lag: 0.14 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[21:10:12] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:28:16] RECOVERY - MariaDB Replica Lag: s8 on db1171 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[22:11:16] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:22:15] (PS1) Urbanecm: [DNM] snapshot: Dump information about Growth mentorship [puppet] - https://gerrit.wikimedia.org/r/740371 (https://phabricator.wikimedia.org/T291966)
[22:23:03] (CR) Urbanecm: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/740371 (https://phabricator.wikimedia.org/T291966) (owner: Urbanecm)
[22:23:37] (PS2) Urbanecm: [DNM] snapshot: Dump information about Growth mentorship [puppet] - https://gerrit.wikimedia.org/r/740371 (https://phabricator.wikimedia.org/T291966)
[22:26:01] (PS3) Urbanecm: [DNM] snapshot: Dump information about Growth mentorship [puppet] - https://gerrit.wikimedia.org/r/740371 (https://phabricator.wikimedia.org/T291966)
[22:53:44] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[23:12:42] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.70 ms
[23:17:16] (CirrusSearchJVMGCOldPoolFlatlined) firing: (2) Elasticsearch instance elastic2044-production-search-codfw is showing memory pressure in the old pool - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://alerts.wikimedia.org
[23:32:02] (PS1) Samtar: planet: Add TheresNoTime's blog to en [puppet] - https://gerrit.wikimedia.org/r/740376
[23:32:43] \o/
[23:59:38] (PS2) MacFan4000: ExtensionDistributor: 1.37.0 is out now, so there's no beta [mediawiki-config] - https://gerrit.wikimedia.org/r/739861 (https://phabricator.wikimedia.org/T289585) (owner: Jforrester)