[00:08:57] (PS2) Dzahn: puppetmaster::geoip: install additional maxmind databases for IP Info [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844)
[00:09:53] (CR) jenkins-bot: [V: -1] puppetmaster::geoip: install additional maxmind databases for IP Info [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844) (owner: Dzahn)
[00:12:18] (PS3) Dzahn: puppetmaster::geoip: install additional maxmind databases for IP Info [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844)
[00:13:13] (CR) jenkins-bot: [V: -1] puppetmaster::geoip: install additional maxmind databases for IP Info [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844) (owner: Dzahn)
[00:17:54] SRE, MW-on-K8s, serviceops, Patch-For-Review, and 2 others: The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (Krinkle)
[00:18:05] SRE, MW-on-K8s, serviceops, Patch-For-Review, and 2 others: The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (Krinkle) >>! In T285232#7355412, @Joe wrote: > As it stands, my new configuration would mean that we're going to iss...
[00:21:00] SRE, Traffic, Performance-Team (Radar): Strip new X-Request-Id header from non-debug responses - https://phabricator.wikimedia.org/T283291 (Krinkle) Open→Resolved a:Krinkle
[00:24:08] (PS4) Dzahn: puppetmaster::geoip: install additional maxmind databases for IP Info [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844)
[00:24:58] (CR) jenkins-bot: [V: -1] puppetmaster::geoip: install additional maxmind databases for IP Info [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844) (owner: Dzahn)
[00:31:04] (PS5) Dzahn: puppetmaster::geoip: install additional maxmind databases for IP Info [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844)
[00:31:25] (CR) Dzahn: "the lint check is behaving odd, it says there is a "delta" even though before and after nothing was found" [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844) (owner: Dzahn)
[00:31:45] (CR) jenkins-bot: [V: -1] puppetmaster::geoip: install additional maxmind databases for IP Info [puppet] - https://gerrit.wikimedia.org/r/723337 (https://phabricator.wikimedia.org/T288844) (owner: Dzahn)
[00:53:14] (PS2) Legoktm: Have PdfHandler use Shellbox service on group0 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/723050 (https://phabricator.wikimedia.org/T289228)
[00:53:16] (PS2) Legoktm: Only set tiff settings when $wmgUsePagedTiffHandler = true [mediawiki-config] - https://gerrit.wikimedia.org/r/723051
[00:53:18] (PS2) Legoktm: Have PagedTiffHandler use Shellbox service on group0 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/723052 (https://phabricator.wikimedia.org/T289228)
[00:53:20] (PS1) Legoktm: Configure Timeline like most other extensions (1/3) [mediawiki-config] - https://gerrit.wikimedia.org/r/723648
[00:53:22] (PS1) Legoktm: Configure Timeline like most other extensions (2/3) [mediawiki-config] - https://gerrit.wikimedia.org/r/723649
[00:53:24] (PS1) Legoktm: Configure Timeline like most other extensions (3/3) [mediawiki-config] - https://gerrit.wikimedia.org/r/723650
[00:53:26] (PS1) Legoktm: Set $wgTimelineFonts and send all Timeline generation to Shellbox [mediawiki-config] - https://gerrit.wikimedia.org/r/723651 (https://phabricator.wikimedia.org/T289226)
[00:53:28] (PS1) Legoktm: Remove obsolete Timeline configuration and fonts submodule [mediawiki-config] - https://gerrit.wikimedia.org/r/723652
[00:58:53] (CR) Jforrester: Remove obsolete Timeline configuration and fonts submodule (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/723652 (owner: Legoktm)
[01:00:04] (PS1) PipelineBot: shellbox-constraints: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/723653
[01:01:16] (Abandoned) Legoktm: shellbox-constraints: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/715985 (owner: PipelineBot)
[01:01:18] (Abandoned) Legoktm: shellbox: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/715986 (owner: PipelineBot)
[01:01:20] (Abandoned) Legoktm: shellbox-constraints: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/716470 (owner: PipelineBot)
[01:01:23] (Abandoned) Legoktm: shellbox: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/716471 (owner: PipelineBot)
[01:01:25] (Abandoned) Legoktm: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/716472 (owner: PipelineBot)
[01:01:28] (Abandoned) Legoktm: shellbox-constraints: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/717644 (owner: PipelineBot)
[01:01:30] (Abandoned) Legoktm: shellbox: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/717648 (owner: PipelineBot)
[01:01:33] (Abandoned) Legoktm: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/717650 (owner: PipelineBot)
[01:01:37] (Abandoned) Legoktm: shellbox-constraints: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/720976 (owner: PipelineBot)
[01:01:39] (Abandoned) Legoktm: shellbox: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/720980 (owner: PipelineBot)
[01:01:41] (Abandoned) Legoktm: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/720981 (owner: PipelineBot)
[01:01:44] (Abandoned) Legoktm: shellbox-constraints: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/721624 (owner: PipelineBot)
[01:01:46] (Abandoned) Legoktm: shellbox: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/721625 (owner: PipelineBot)
[01:01:48] (Abandoned) Legoktm: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/721631 (owner: PipelineBot)
[01:01:51] (Abandoned) Legoktm: shellbox-constraints: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/723593 (owner: PipelineBot)
[01:01:54] (Abandoned) Legoktm: shellbox: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/723595 (owner: PipelineBot)
[01:01:57] (Abandoned) Legoktm: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/723597 (owner: PipelineBot)
[01:02:00] (Abandoned) Legoktm: shellbox-constraints: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/723598 (owner: PipelineBot)
[01:02:03] (Abandoned) Legoktm: shellbox: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/723599 (owner: PipelineBot)
[01:02:08] (Abandoned) Legoktm: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/723600 (owner: PipelineBot)
[01:02:53] legoktm: Should PipelineBot push to an existing ChangeId if there is one for a given chart somehow?
[01:03:21] that would be nice
[01:03:52] (PS1) PipelineBot: shellbox: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/723654
[01:04:10] I'm somewhat tempted to just turn it off though, most shellbox changes don't require a redeploy
[01:05:02] I'll file a Phab task, which is clearly the hard work. ;-)
[01:07:29] (PS1) PipelineBot: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/723655
[01:12:41] (CR) Legoktm: [C: +2] shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/723655 (owner: PipelineBot)
[01:14:40] (CR) Legoktm: shellbox: pipeline bot promote (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/720980 (owner: PipelineBot)
[01:19:07] (Merged) jenkins-bot: shellbox-timeline: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/723655 (owner: PipelineBot)
[01:24:13] !log legoktm@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
[01:24:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:53] !log legoktm@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
[01:27:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:29:47] (Abandoned) Legoktm: Disable logging [extensions/GuidedTour] (wmf/1.37.0-wmf.23) - https://gerrit.wikimedia.org/r/722395 (https://phabricator.wikimedia.org/T288416) (owner: Jforrester)
[01:39:41] (PS1) PipelineBot: citoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/723658
[01:58:00] (PS8) Huji: Temporarily disable article editing by anonymous users on fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/721108 (https://phabricator.wikimedia.org/T291018)
[01:58:05] (PS9) Huji: Temporarily disable article editing by anonymous users on fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/721108 (https://phabricator.wikimedia.org/T291018)
[01:58:30] (CR) Huji: Temporarily disable article editing by anonymous users on fawiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/721108 (https://phabricator.wikimedia.org/T291018) (owner: Huji)
[02:00:03] !log legoktm@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
[02:00:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:33:01] PROBLEM - Persistent high iowait on labstore1004 is CRITICAL: 13.43 ge 10 https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[02:40:42] RECOVERY - Persistent high iowait on labstore1004 is OK: (C)10 ge (W)5 ge 0.4211 https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[02:44:52] (PS1) RLazarus: Minimal version of the image catalog [docker-images/imagecatalog] - https://gerrit.wikimedia.org/r/723663 (https://phabricator.wikimedia.org/T287130)
[05:10:50] PROBLEM - Check systemd state on cumin2001 is CRITICAL: CRITICAL - degraded: The following units failed: database-backups-snapshots.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:13:20] PROBLEM - SSH on analytics1069.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:14:22] RECOVERY - SSH on analytics1069.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:02:59] ops-eqiad, Analytics: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (elukey)
[11:17:47] (CR) MarcoAurelio: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/723689 (https://phabricator.wikimedia.org/T291736) (owner: MarcoAurelio)
[11:31:20] PROBLEM - MD RAID on sessionstore1003 is CRITICAL: CRITICAL: State: degraded, Active: 4, Working: 4, Failed: 2, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[11:31:21] ACKNOWLEDGEMENT - MD RAID on sessionstore1003 is CRITICAL: CRITICAL: State: degraded, Active: 4, Working: 4, Failed: 2, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T291738 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[11:31:24] SRE, ops-eqiad: Degraded RAID on sessionstore1003 - https://phabricator.wikimedia.org/T291738 (ops-monitoring-bot)
[11:54:04] (PS4) Lucas Werkmeister: Perform rolling restarts on kubernetes [software/tools-webservice] - https://gerrit.wikimedia.org/r/721989 (https://phabricator.wikimedia.org/T290833)
[12:10:42] PROBLEM - Device not healthy -SMART- on sessionstore1003 is CRITICAL: cluster=sessionstore device=sdb instance=sessionstore1003 job=node site=eqiad https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=sessionstore1003&var-datasource=eqiad+prometheus/ops
[13:42:57] (PS1) Ladsgroup: admin: Deprecate mailman-admins group [puppet] - https://gerrit.wikimedia.org/r/723673 (https://phabricator.wikimedia.org/T282303)
[13:42:59] (PS1) Ladsgroup: mailman: More mailman2 clean ups [puppet] - https://gerrit.wikimedia.org/r/723674 (https://phabricator.wikimedia.org/T282303)
[14:47:26] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:59:50] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.68 ms
[15:53:47] SRE, MassMessage, Wikimedia-JobQueue, Platform Team Workboards (Clinic Duty Team): Same MassMessage is being sent more than once - https://phabricator.wikimedia.org/T93049 (Snaevar) Another example at 7 september 2021, 2 minutes apart: https://commons.wikimedia.org/w/index.php?title=Commons%3AVi...
[18:34:16] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[18:36:22] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 3 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[23:41:22] PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[23:43:18] RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 6 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[23:49:10] SRE, DBA, Traffic, User-Ladsgroup, Wikimedia-Incident: 2021-09-04 enwiki was down at 10:44 (UTC) - https://phabricator.wikimedia.org/T290379 (Reedy)