[00:05:20] PROBLEM - Check systemd state on puppetmaster2001 is CRITICAL: CRITICAL - degraded: The following units failed: logrotate.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:23:28] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_eventlogging_analytics.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:40:42] RECOVERY - SSH on analytics1069.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [00:41:10] PROBLEM - MegaRAID on db1126 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [00:41:11] ACKNOWLEDGEMENT - MegaRAID on db1126 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T292325 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [00:41:14] 10SRE, 10ops-eqiad: Degraded RAID on db1126 - https://phabricator.wikimedia.org/T292325 (10ops-monitoring-bot) [01:38:07] 10SRE, 10ops-eqiad, 10DC-Ops, 10fundraising-tech-ops: Q1:(Need By: TBD) rack/setup (4) fundraising hosts - https://phabricator.wikimedia.org/T289812 (10Jclark-ctr) [01:39:03] 10SRE, 10ops-eqiad, 10DC-Ops, 10fundraising-tech-ops: Q1:(Need By: TBD) rack/setup (4) fundraising hosts - https://phabricator.wikimedia.org/T289812 (10Jclark-ctr) host racked in fundraising will finish cabling Monday [01:39:46] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops: Q2: (Need By: TBD) rack/setup/install kubestage100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T290894 (10Jclark-ctr) [01:40:38] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops: Q1:(Need By: TBD) rack/setup/install kubernetes10[18-21] - https://phabricator.wikimedia.org/T290202 (10Jclark-ctr) [01:41:36] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): Q1:(Need By: TBD) rack/setup/install 
cloudmetrics100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T289888 (10Jclark-ctr) [03:08:16] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: (2) Blazegraph instance wdqs2001:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [03:11:48] PROBLEM - SSH on ms-fe2006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [04:08:52] PROBLEM - SSH on bast5001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [04:12:46] RECOVERY - SSH on ms-fe2006.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [04:44:58] PROBLEM - SSH on analytics1069.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [05:45:56] RECOVERY - SSH on analytics1069.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [06:10:58] RECOVERY - SSH on bast5001.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [06:25:16] PROBLEM - Elevated latency for icinga checks in codfw on alert1001 is CRITICAL: cluster=alerting instance=alert2001 job=icinga site=codfw https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/rsCfQfuZz/icinga [06:37:22] RECOVERY - Elevated latency for icinga checks in codfw on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/rsCfQfuZz/icinga [07:08:16] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: (2) Blazegraph instance wdqs2001:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [07:25:36] PROBLEM - MegaRAID on backup1002 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [07:25:37] ACKNOWLEDGEMENT - MegaRAID on backup1002 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T292329 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring [07:25:41] 10SRE, 10ops-eqiad: Degraded RAID on backup1002 - https://phabricator.wikimedia.org/T292329 (10ops-monitoring-bot) [08:18:42] PROBLEM - Elevated latency for icinga checks in codfw on alert1001 is CRITICAL: cluster=alerting instance=alert2001 job=icinga site=codfw https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/rsCfQfuZz/icinga [08:36:58] RECOVERY - Elevated latency for icinga checks in codfw on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/rsCfQfuZz/icinga [09:22:07] 10SRE, 10Acme-chief, 10Traffic-Icebox, 10Patch-For-Review: Let's Encrypt transitioning to ISRG's Root - https://phabricator.wikimedia.org/T263006 (10Tgr) [09:40:18] 10SRE, 10ops-eqiad, 10Data-Persistence-Backup: Degraded RAID on backup1002 - https://phabricator.wikimedia.org/T292329 (10Peachey88) [11:08:16] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: (2) Blazegraph instance wdqs2001:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [12:30:18] PROBLEM - Wikitech and wt-static content in sync on labweb1002 is CRITICAL: wikitech-static CRIT - wikitech and wikitech-static out of sync (214018s 200000s) https://wikitech.wikimedia.org/wiki/Wikitech-static [13:15:12] That's not good [13:50:56] PROBLEM - Wikitech and wt-static content in sync on cloudweb2001-dev is CRITICAL: wikitech-static CRIT - wikitech and wikitech-static out of sync (219461s 200000s) https://wikitech.wikimedia.org/wiki/Wikitech-static [13:50:56] PROBLEM - Wikitech and wt-static content in sync on labweb1001 is CRITICAL: wikitech-static CRIT - wikitech and wikitech-static out of sync (219461s 200000s) https://wikitech.wikimedia.org/wiki/Wikitech-static [14:10:12] 10SRE, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): Wikitech and wikitech-static out of sync - https://phabricator.wikimedia.org/T292342 (10RhinosF1) [14:10:22] 10SRE, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): Wikitech and wikitech-static out of sync - https://phabricator.wikimedia.org/T292342 (10RhinosF1) p:05Triage→03High [14:55:28] 10SRE, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): Wikitech and wikitech-static out of sync - https://phabricator.wikimedia.org/T292342 (10Reedy) >[14:50:56] PROBLEM - Wikitech and wt-static content in sync 
on cloudweb2001-dev is CRITICAL: wikitech-static CRIT - wikitech and wikitec... [15:08:16] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: (2) Blazegraph instance wdqs2001:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [15:22:52] (03CR) 10Iflaq: [C: 03+1] Add $wgSitename and $wgMetaNamespace for kswiki and kswiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/720320 (https://phabricator.wikimedia.org/T289752) (owner: 10Rishabhbhat) [15:26:12] PROBLEM - Elevated latency for icinga checks in codfw on alert1001 is CRITICAL: cluster=alerting instance=alert2001 job=icinga site=codfw https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/rsCfQfuZz/icinga [15:38:24] RECOVERY - Elevated latency for icinga checks in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/rsCfQfuZz/icinga [15:40:32] (03PS1) 10BryanDavis: toolhub: Set CronJob's backoffLimit back to 1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/725428 (https://phabricator.wikimedia.org/T292027) [15:52:37] (03CR) 10BryanDavis: [C: 03+2] toolhub: Set CronJob's backoffLimit back to 1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/725428 (https://phabricator.wikimedia.org/T292027) (owner: 10BryanDavis) [15:56:54] (03Merged) 10jenkins-bot: toolhub: Set CronJob's backoffLimit back to 1 [deployment-charts] - 10https://gerrit.wikimedia.org/r/725428 (https://phabricator.wikimedia.org/T292027) (owner: 10BryanDavis) [16:05:01] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs2008:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [16:10:55] !log 
bd808@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . [16:11:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:11:37] (03CR) 10Ahmon Dancy: [C: 03+2] train-dev: Add missing service configuration [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/725391 (owner: 10Dduvall) [16:12:22] (03Merged) 10jenkins-bot: train-dev: Add missing service configuration [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/725391 (owner: 10Dduvall) [16:19:49] (03CR) 10Ahmon Dancy: train-dev: Remove hardcoding of datacenters in redis configuration (031 comment) [mediawiki-config] (train-dev) - 10https://gerrit.wikimedia.org/r/725392 (owner: 10Dduvall) [17:18:08] (03PS1) 10BryanDavis: toolhub: Set concurrencyPolicy=Replace for CronJob [deployment-charts] - 10https://gerrit.wikimedia.org/r/725430 (https://phabricator.wikimedia.org/T292027) [17:23:22] (03CR) 10BryanDavis: [C: 03+2] toolhub: Set concurrencyPolicy=Replace for CronJob [deployment-charts] - 10https://gerrit.wikimedia.org/r/725430 (https://phabricator.wikimedia.org/T292027) (owner: 10BryanDavis) [17:26:01] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1009:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [17:27:37] (03Merged) 10jenkins-bot: toolhub: Set concurrencyPolicy=Replace for CronJob [deployment-charts] - 10https://gerrit.wikimedia.org/r/725430 (https://phabricator.wikimedia.org/T292027) (owner: 10BryanDavis) [17:28:51] !log bd808@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . 
[17:28:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:45:39] 10SRE, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): Wikitech and wikitech-static out of sync - https://phabricator.wikimedia.org/T292342 (10Reedy) It seems to run and update manually just fine [17:46:48] Reedy: should icinga go green then? [17:46:56] Presumably [17:46:59] I dunno how often it checks [17:47:32] I guess we wait [17:47:46] Would be good if we knew why it broke though [17:48:22] * Reedy force checks some stuff [17:48:30] Oh, I can't [17:48:38] it just pretends I can, until I get to the step that I can't [17:49:05] Next Scheduled Active Check: 2021-10-02 17:51:58 [17:49:12] Hmm, should fix itself in a couple of mins [17:50:18] Scripts that do that are fun [17:51:55] Looks like it's SSL fallout [17:52:04] Or, was [17:52:05] Oct 2 04:00:02 wikitech-static import-wikitech.sh[9981]: ERROR: The certificate of ‘wikitech.wikimedia.org’ is not trusted. [17:52:05] Oct 2 04:00:02 wikitech-static import-wikitech.sh[9981]: ERROR: The certificate of ‘wikitech.wikimedia.org’ has expired. [17:52:11] Hmm [17:52:29] I did the package upgrades earlier [17:52:35] Ah cool [17:53:08] RECOVERY - Wikitech and wt-static content in sync on labweb1001 is OK: wikitech-static OK - wikitech and wikitech-static in sync (66540 200000s) https://wikitech.wikimedia.org/wiki/Wikitech-static [17:53:08] RECOVERY - Wikitech and wt-static content in sync on cloudweb2001-dev is OK: wikitech-static OK - wikitech and wikitech-static in sync (66540 200000s) https://wikitech.wikimedia.org/wiki/Wikitech-static [17:53:21] There we go [17:53:33] Go enjoy that thing called a weekend now :) [17:54:03] Have you looked out of a window today? 
:P [17:54:29] It's like we flicked a switch into autumn [17:54:35] It's been horrible [17:56:11] 10SRE, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): Wikitech and wikitech-static out of sync - https://phabricator.wikimedia.org/T292342 (10Reedy) a:03Reedy From cron/syslog: ` Oct 2 04:00:02 wikitech-static import-wikitech.sh[9981]: ERROR: The certificate of ‘wikitech.wikimedia.org’ is not... [17:57:39] 10SRE, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): Wikitech and wikitech-static out of sync - https://phabricator.wikimedia.org/T292342 (10Reedy) [17:57:42] 10SRE, 10Infrastructure-Foundations, 10Traffic, 10Patch-For-Review: OpenSSL < 1.1.0 compatibility issues with new LE issuance chain - https://phabricator.wikimedia.org/T283165 (10Reedy) [18:04:05] 10SRE, 10Infrastructure-Foundations, 10Traffic, 10Patch-For-Review: OpenSSL < 1.1.0 compatibility issues with new LE issuance chain - https://phabricator.wikimedia.org/T283165 (10Reedy) [18:33:40] RECOVERY - Wikitech and wt-static content in sync on labweb1002 is OK: wikitech-static OK - wikitech and wikitech-static in sync (68393 200000s) https://wikitech.wikimedia.org/wiki/Wikitech-static [18:46:14] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [18:54:42] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:00:03] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir4001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86397 seconds left https://wikitech.wikimedia.org/wiki/Ncredir [19:00:12] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir2002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86387 seconds left https://wikitech.wikimedia.org/wiki/Ncredir [19:00:46] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir5001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86355
seconds left https://wikitech.wikimedia.org/wiki/Ncredir [19:00:58] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir2001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86342 seconds left https://wikitech.wikimedia.org/wiki/Ncredir [19:01:00] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir5002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86340 seconds left https://wikitech.wikimedia.org/wiki/Ncredir [19:01:18] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir1001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86323 seconds left https://wikitech.wikimedia.org/wiki/Ncredir [19:01:36] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir3001 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86305 seconds left https://wikitech.wikimedia.org/wiki/Ncredir [19:01:36] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir3002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86304 seconds left https://wikitech.wikimedia.org/wiki/Ncredir [19:01:54] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir1002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86286 seconds left https://wikitech.wikimedia.org/wiki/Ncredir [19:01:54] PROBLEM - HTTPS non-canonical-redirect-3 on ncredir4002 is CRITICAL: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has 86286 seconds left https://wikitech.wikimedia.org/wiki/Ncredir [19:03:12] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:05:20] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:08:16] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: (2) Blazegraph instance wdqs2001:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly 
- https://alerts.wikimedia.org [19:18:08] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:20:16] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:26:40] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:29:56] (03PS1) 10Ladsgroup: mailman3: Drop profile::mailman3 [puppet] - 10https://gerrit.wikimedia.org/r/725435 (https://phabricator.wikimedia.org/T282303) [19:35:13] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:43:46] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:43:59] (03PS1) 10Ladsgroup: mailman: Drop mailman module and move them to profile::lists [puppet] - 10https://gerrit.wikimedia.org/r/725436 (https://phabricator.wikimedia.org/T282303) [19:46:08] (03CR) 10Ladsgroup: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/725436 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [19:50:08] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:51:03] (03CR) 10Ladsgroup: "PCC is basically noop: https://puppet-compiler.wmflabs.org/compiler1001/1001/lists1001.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/725436 (https://phabricator.wikimedia.org/T282303) (owner: 10Ladsgroup) [19:58:36] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [20:05:16] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs2008:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org 
[20:13:32] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [20:21:44] PROBLEM - Elevated latency for icinga checks in codfw on alert1001 is CRITICAL: cluster=alerting instance=alert2001 job=icinga site=codfw https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/rsCfQfuZz/icinga [20:23:30] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [20:25:30] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [20:35:30] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [20:39:43] RECOVERY - Elevated latency for icinga checks in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/rsCfQfuZz/icinga [20:43:32] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [20:49:33] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [20:53:33] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [20:59:33] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [21:03:36] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [21:09:38] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [21:13:52] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [21:20:14] PROBLEM - SSH on 
cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [21:26:16] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1009:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [21:28:42] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [21:35:08] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [21:43:46] (03PS17) 10Juan90264: Adding and use square wordmark for trwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704170 (https://phabricator.wikimedia.org/T286133) [21:44:53] (03CR) 10jerkins-bot: [V: 04-1] Adding and use square wordmark for trwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704170 (https://phabricator.wikimedia.org/T286133) (owner: 10Juan90264) [21:55:01] (BlazegraphFreeAllocatorsDecreasingRapidly) resolved: Blazegraph instance wdqs2008:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [21:56:01] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs2008:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [21:57:07] (03PS18) 10Juan90264: Adding and use square wordmark for trwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704170 (https://phabricator.wikimedia.org/T286133) [21:57:30] (03PS19) 10Juan90264: Adding and use square wordmark for trwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704170 (https://phabricator.wikimedia.org/T286133) [21:58:40] (03CR) 10jerkins-bot: [V: 04-1] 
Adding and use square wordmark for trwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704170 (https://phabricator.wikimedia.org/T286133) (owner: 10Juan90264) [22:05:16] (BlazegraphFreeAllocatorsDecreasingRapidly) resolved: Blazegraph instance wdqs2008:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [22:06:01] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs2008:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [22:07:06] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [22:13:30] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [22:19:54] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [22:26:01] (BlazegraphFreeAllocatorsDecreasingRapidly) resolved: Blazegraph instance wdqs1009:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [22:26:18] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [22:27:01] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1009:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [22:35:16] (BlazegraphFreeAllocatorsDecreasingRapidly) resolved: Blazegraph instance wdqs2008:9193 is burning free allocators at a very high rate - 
https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [22:37:02] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs2008:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [22:42:01] (BlazegraphFreeAllocatorsDecreasingRapidly) resolved: Blazegraph instance wdqs2008:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [22:46:01] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs2008:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [22:51:52] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [22:58:18] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [23:00:26] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [23:04:48] PROBLEM - SSH on analytics1069.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [23:06:38] PROBLEM - SSH on cp5006 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring [23:08:16] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: (2) Blazegraph instance wdqs2001:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [23:12:01] 
(BlazegraphFreeAllocatorsDecreasingRapidly) resolved: Blazegraph instance wdqs1009:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [23:14:01] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs1009:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [23:16:01] (BlazegraphFreeAllocatorsDecreasingRapidly) resolved: Blazegraph instance wdqs2008:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [23:27:16] (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wdqs2008:9193 is burning free allocators at a very high rate - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Free_allocators_decrease_rapidly - https://alerts.wikimedia.org [23:47:11] (03PS20) 10Juan90264: Adding and use square wordmark for trwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704170 (https://phabricator.wikimedia.org/T286133) [23:48:21] (03CR) 10jerkins-bot: [V: 04-1] Adding and use square wordmark for trwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704170 (https://phabricator.wikimedia.org/T286133) (owner: 10Juan90264) [23:52:18] (03PS21) 10Juan90264: Adding and use square wordmark for trwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704170 (https://phabricator.wikimedia.org/T286133) [23:53:18] (03CR) 10jerkins-bot: [V: 04-1] Adding and use square wordmark for trwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/704170 (https://phabricator.wikimedia.org/T286133) (owner: 10Juan90264) [23:53:58] RECOVERY - SSH on cp5006 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 
2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [23:57:20] PROBLEM - Query Service HTTP Port on wdqs1006 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 298 bytes in 0.003 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service