[00:33:02] SRE, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (Papaul)
[00:44:38] SRE, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (Papaul)
[00:51:17] SRE, ops-codfw, DC-Ops, Discovery-Search: hw troubleshooting: failure to power up for elastic2043.codfw.wmnet - https://phabricator.wikimedia.org/T281327 (Papaul) Email from Dell below ` After having others look at the logs we have a backplane cable that is having issues. Can you disconnect and...
[01:09:58] SRE, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (Papaul)
[01:55:27] (PS1) Dzahn: load mod_rewrite and mod_headers, add content-encoding config [container/miscweb] - https://gerrit.wikimedia.org/r/699319 (https://phabricator.wikimedia.org/T281538)
[01:59:41] (PS6) Dzahn: static-bugzilla: add config to server gzipped HTML and a test file [container/miscweb] - https://gerrit.wikimedia.org/r/698079 (https://phabricator.wikimedia.org/T281538)
[01:59:43] (PS2) Dzahn: load mod_rewrite and mod_headers, add content-encoding config [container/miscweb] - https://gerrit.wikimedia.org/r/699319 (https://phabricator.wikimedia.org/T281538)
[02:01:43] (PS3) Dzahn: load mod_rewrite and mod_headers, add content-encoding config [container/miscweb] - https://gerrit.wikimedia.org/r/699319 (https://phabricator.wikimedia.org/T281538)
[02:02:42] (PS4) BBlack: wikimedia.com: facilitate NS changes [dns] - https://gerrit.wikimedia.org/r/698525 (https://phabricator.wikimedia.org/T281428)
[02:03:57] (PS1) Dzahn: rename test file to correct name, add w3m browser for tests [container/miscweb] - https://gerrit.wikimedia.org/r/699320 (https://phabricator.wikimedia.org/T281538)
[02:04:29] (CR) Dzahn: [C: +2] load mod_rewrite and mod_headers, add content-encoding config [container/miscweb] - https://gerrit.wikimedia.org/r/699319 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn)
[02:04:40] (CR) BBlack: [C: +2] wikimedia.com: facilitate NS changes [dns] - https://gerrit.wikimedia.org/r/698525 (https://phabricator.wikimedia.org/T281428) (owner: BBlack)
[02:08:05] PROBLEM - SSH on wdqs2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:09:59] (PS4) Dzahn: load mod_rewrite and mod_headers, add content-encoding config [container/miscweb] - https://gerrit.wikimedia.org/r/699319 (https://phabricator.wikimedia.org/T281538)
[02:14:13] (PS7) Dzahn: static-bugzilla: add config to serve gzipped HTML and a test file [container/miscweb] - https://gerrit.wikimedia.org/r/698079 (https://phabricator.wikimedia.org/T281538)
[02:14:28] (CR) Dzahn: [C: +2] static-bugzilla: add config to serve gzipped HTML and a test file [container/miscweb] - https://gerrit.wikimedia.org/r/698079 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn)
[02:15:27] (Merged) jenkins-bot: static-bugzilla: add config to serve gzipped HTML and a test file [container/miscweb] - https://gerrit.wikimedia.org/r/698079 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn)
[02:17:19] (PS5) Dzahn: load mod_rewrite and mod_headers, add content-encoding config [container/miscweb] - https://gerrit.wikimedia.org/r/699319 (https://phabricator.wikimedia.org/T281538)
[02:18:43] (CR) Dzahn: [C: +2] load mod_rewrite and mod_headers, add content-encoding config [container/miscweb] - https://gerrit.wikimedia.org/r/699319 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn)
[02:20:13] (Merged) jenkins-bot: load mod_rewrite and mod_headers, add content-encoding config [container/miscweb] - https://gerrit.wikimedia.org/r/699319 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn)
[02:20:45] (PS2) Dzahn: rename test file to correct name, add w3m browser for tests [container/miscweb] - https://gerrit.wikimedia.org/r/699320 (https://phabricator.wikimedia.org/T281538)
[02:20:56] (CR) Dzahn: [C: +2] rename test file to correct name, add w3m browser for tests [container/miscweb] - https://gerrit.wikimedia.org/r/699320 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn)
[02:21:45] (Merged) jenkins-bot: rename test file to correct name, add w3m browser for tests [container/miscweb] - https://gerrit.wikimedia.org/r/699320 (https://phabricator.wikimedia.org/T281538) (owner: Dzahn)
[02:29:21] (PS1) Dzahn: DHCP: add doh1001 and doh1002 MAC addresses [puppet] - https://gerrit.wikimedia.org/r/699321 (https://phabricator.wikimedia.org/T284348)
[02:29:23] (PS1) Dzahn: httpbb: reduce git pull frequence from minute to hour [puppet] - https://gerrit.wikimedia.org/r/699322 (https://phabricator.wikimedia.org/T260936)
[02:29:25] (PS1) Dzahn: installserver/tftp: install tftp client on tftp servers for debugging [puppet] - https://gerrit.wikimedia.org/r/699323
[02:31:53] (Abandoned) Dzahn: DHCP: add doh1001 and doh1002 MAC addresses [puppet] - https://gerrit.wikimedia.org/r/699321 (https://phabricator.wikimedia.org/T284348) (owner: Dzahn)
[04:09:35] RECOVERY - SSH on wdqs2001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:43:57] (PS1) Marostegui: clouddb1021: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/699327
[04:44:51] (CR) Marostegui: [C: +2] clouddb1021: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/699327 (owner: Marostegui)
[04:58:24] SRE, DC-Ops, SRE-tools, netops: Allow idrac tftp fetching of firmware updates (either to existing tftp or new solution) - https://phabricator.wikimedia.org/T283771 (Papaul) One other option is to use a bash script and a text file with the IP addresses of the nodes (see below) Note: This was test...
[05:24:14] (PS1) Marostegui: regex.yaml: Remove labsdb references [puppet] - https://gerrit.wikimedia.org/r/699329 (https://phabricator.wikimedia.org/T282662)
[05:47:17] !log run systemctl reset-failed ifup@en5.service on doh1001 - T273026
[05:47:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:47:22] T273026: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026
[05:47:52] RECOVERY - Check systemd state on doh1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:50:50] (CR) DannyS712: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/699290 (https://phabricator.wikimedia.org/T284793) (owner: DannyS712)
[05:56:35] !log rm -rf empty dir /etc/apache2/sites-enabled/.links2 on webperf1001 to avoid puppet changes at every run
[05:56:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:57:35] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/699329 (https://phabricator.wikimedia.org/T282662) (owner: Marostegui)
[06:09:00] (CR) Marostegui: [C: +2] regex.yaml: Remove labsdb references [puppet] - https://gerrit.wikimedia.org/r/699329 (https://phabricator.wikimedia.org/T282662) (owner: Marostegui)
[06:13:08] (CR) Giuseppe Lavagetto: [C: +2] mwdebug: add service proxy listeners [deployment-charts] - https://gerrit.wikimedia.org/r/699243 (owner: Giuseppe Lavagetto)
[06:15:44] (Merged) jenkins-bot: mwdebug: add service proxy listeners [deployment-charts] - https://gerrit.wikimedia.org/r/699243 (owner: Giuseppe Lavagetto)
[06:15:51] <_joe_> marostegui: oh removing stuff from regex.yaml <3
[06:56:49] !log oblivian@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[06:56:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210611T0700)
[07:03:00] !log jmm@cumin2002 START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
[07:03:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:08:16] PROBLEM - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is CRITICAL: 301 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:09:46] !log jmm@cumin2002 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
[07:09:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:26] RECOVERY - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is OK: (C)100 gt (W)50 gt 11 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:16:52] PROBLEM - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is CRITICAL: 249 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:29:28] !log restarting archiva to pick up OpenJDK security updates
[07:29:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:45] (PS1) Ema: varnish: remove ats-be migration leftover from varnishttfb [puppet] - https://gerrit.wikimedia.org/r/699377 (https://phabricator.wikimedia.org/T241239)
[07:32:10] (CR) jerkins-bot: [V: -1] varnish: remove ats-be migration leftover from varnishttfb [puppet] - https://gerrit.wikimedia.org/r/699377 (https://phabricator.wikimedia.org/T241239) (owner: Ema)
[07:34:57] (PS1) Muehlenhoff: archiva: Switch to profile::nginx [puppet] - https://gerrit.wikimedia.org/r/699378 (https://phabricator.wikimedia.org/T164456)
[07:35:19] SRE, ops-codfw, User-fgiunchedi: Degraded RAID on ms-be2038 - https://phabricator.wikimedia.org/T283401 (fgiunchedi) Open→Resolved Thank you @Papaul I think we're good with this BBU!
[07:37:08] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/699378 (https://phabricator.wikimedia.org/T164456) (owner: Muehlenhoff)
[07:40:44] (PS1) Muehlenhoff: archiva: Switch to nginx-light [puppet] - https://gerrit.wikimedia.org/r/699379 (https://phabricator.wikimedia.org/T164456)
[07:42:18] (PS2) Ema: varnish: remove ats-be migration leftover from varnishttfb [puppet] - https://gerrit.wikimedia.org/r/699377 (https://phabricator.wikimedia.org/T241239)
[07:43:44] (CR) jerkins-bot: [V: -1] varnish: remove ats-be migration leftover from varnishttfb [puppet] - https://gerrit.wikimedia.org/r/699377 (https://phabricator.wikimedia.org/T241239) (owner: Ema)
[07:44:32] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/699379 (https://phabricator.wikimedia.org/T164456) (owner: Muehlenhoff)
[07:49:28] RECOVERY - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is OK: (C)100 gt (W)50 gt 10 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:51:44] (PS1) Elukey: Add support for knative serving [deployment-charts] - https://gerrit.wikimedia.org/r/699380 (https://phabricator.wikimedia.org/T278194)
[07:53:18] (Abandoned) Abijeet Patro: Rename wgTranslateBlacklist to wgTranslateExclusionList [mediawiki-config] - https://gerrit.wikimedia.org/r/676909 (https://phabricator.wikimedia.org/T277965) (owner: Abijeet Patro)
[07:53:21] (CR) jerkins-bot: [V: -1] Add support for knative serving [deployment-charts] - https://gerrit.wikimedia.org/r/699380 (https://phabricator.wikimedia.org/T278194) (owner: Elukey)
[07:56:05] SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[16-20].eqiad.wmnet - https://phabricator.wikimedia.org/T274945 (ayounsi) p:Medium→High There is some kind of missconfig here: asw2-b-eqiad has those pending changes: ` [edit interfaces...
[07:57:56] (PS6) Elukey: Add the custom_deploy.d directory with basic Istio config [deployment-charts] - https://gerrit.wikimedia.org/r/697938 (https://phabricator.wikimedia.org/T278192)
[07:57:58] (PS2) Elukey: Add support for knative serving [deployment-charts] - https://gerrit.wikimedia.org/r/699380 (https://phabricator.wikimedia.org/T278194)
[08:06:52] (CR) Effie Mouzeli: "PCC https://puppet-compiler.wmflabs.org/compiler1002/29870/" [puppet] - https://gerrit.wikimedia.org/r/699150 (https://phabricator.wikimedia.org/T284420) (owner: Effie Mouzeli)
[08:15:56] SRE, DC-Ops, SRE-tools, netops: Allow idrac tftp fetching of firmware updates (either to existing tftp or new solution) - https://phabricator.wikimedia.org/T283771 (ayounsi) @RobH what are the conclusions of yesterday's experiment? Is it ok to rollback the network change? @Papaul that looks usef...
[08:21:26] (PS4) David Caro: tools: try to alleviate sudo crashing when triggering oom [puppet] - https://gerrit.wikimedia.org/r/699216 (https://phabricator.wikimedia.org/T284130)
[08:24:33] SRE, ops-eqiad, DC-Ops: Audit down ports - https://phabricator.wikimedia.org/T218751 (ayounsi) Open→Resolved Looking at the current list they all look like servers being provisioned. So it's fine to close it. Thanks
[08:24:37] (CR) David Caro: "> Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/699216 (https://phabricator.wikimedia.org/T284130) (owner: David Caro)
[08:44:38] SRE, DC-Ops, SRE-tools, netops: Allow idrac tftp fetching of firmware updates (either to existing tftp or new solution) - https://phabricator.wikimedia.org/T283771 (cmooney) @papaul that looks like a nice approach. One thing we need to consider @ayounsi is that this makes the connection to the i...
[08:53:58] (CR) Filippo Giunchedi: logstash: add ecs migration config for sampled webrequest logs (3 comments) [puppet] - https://gerrit.wikimedia.org/r/699254 (https://phabricator.wikimedia.org/T234565) (owner: Cwhite)
[09:03:49] (CR) Effie Mouzeli: [C: +1] [Beta Cluster] mc-labs.php: Enable onHostRoutingPrefix for WAN cache [mediawiki-config] - https://gerrit.wikimedia.org/r/692729 (https://phabricator.wikimedia.org/T264604) (owner: Krinkle)
[09:11:22] PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 56 probes of 704 (alerts on 35) - https://atlas.ripe.net/measurements/1790945/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[09:12:05] (PS1) Marostegui: labsdb: Remove some labsdb entries [puppet] - https://gerrit.wikimedia.org/r/699386 (https://phabricator.wikimedia.org/T282662)
[09:13:58] (CR) Marostegui: [C: +2] labsdb: Remove some labsdb entries [puppet] - https://gerrit.wikimedia.org/r/699386 (https://phabricator.wikimedia.org/T282662) (owner: Marostegui)
[09:17:10] RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 4 probes of 704 (alerts on 35) - https://atlas.ripe.net/measurements/1790945/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[09:24:22] (PS1) Marostegui: maintain-dbusers.py: Remove labsdb references [puppet] - https://gerrit.wikimedia.org/r/699389 (https://phabricator.wikimedia.org/T282662)
[09:25:53] (CR) Marostegui: [C: +2] maintain-dbusers.py: Remove labsdb references [puppet] - https://gerrit.wikimedia.org/r/699389 (https://phabricator.wikimedia.org/T282662) (owner: Marostegui)
[09:28:10] (CR) Hnowlan: Initial image-suggestion-api helm chart (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/688358 (https://phabricator.wikimedia.org/T281257) (owner: Nikki Nikkhoui)
[09:31:00] SRE, observability, Patch-For-Review: Move Prometheus off eqsin/ulsfo/esams bastions - https://phabricator.wikimedia.org/T243057 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff Boldly closing this task since reimages of bastions happened with the buster update, please reopen if anyo...
[09:31:43] SRE, observability, Patch-For-Review: Move Prometheus off eqsin/ulsfo/esams bastions - https://phabricator.wikimedia.org/T243057 (MoritzMuehlenhoff) >>! In T243057#7140031, @herron wrote: > The 150G secondary disk has been removed from the prometheus3001 VM. > > Strangely after gnt-instance shutdown...
[09:36:19] (CR) Zabe: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/699295 (https://phabricator.wikimedia.org/T284384) (owner: Acamicamacaraca)
[09:46:03] <_joe_> uhm
[09:46:11] <_joe_> no patches since 10 minutes
[09:46:16] (PS1) Giuseppe Lavagetto: Bump the score image version [deployment-charts] - https://gerrit.wikimedia.org/r/699391
[09:46:25] <_joe_> broke the drought :D
[09:53:58] (CR) Giuseppe Lavagetto: [C: +2] Bump the score image version [deployment-charts] - https://gerrit.wikimedia.org/r/699391 (owner: Giuseppe Lavagetto)
[09:56:21] (Merged) jenkins-bot: Bump the score image version [deployment-charts] - https://gerrit.wikimedia.org/r/699391 (owner: Giuseppe Lavagetto)
[09:58:23] SRE: Upgrade eqiad/codfw Ganeti clusters to Buster - https://phabricator.wikimedia.org/T284811 (MoritzMuehlenhoff)
[10:06:39] (PS1) Hashar: puppet-compiler: update instances fqdn [puppet] - https://gerrit.wikimedia.org/r/699393
[10:07:21] (CR) Hashar: "The script could not copy the facts to the compilerXXXX instances due to well... obsolete domain name :]" [puppet] - https://gerrit.wikimedia.org/r/699393 (owner: Hashar)
[10:08:18] (PS1) Giuseppe Lavagetto: shellbox: fix the lookup value for the php image [deployment-charts] - https://gerrit.wikimedia.org/r/699394
[10:09:10] (CR) Arturo Borrero Gonzalez: [C: +1] puppet-compiler: update instances fqdn [puppet] - https://gerrit.wikimedia.org/r/699393 (owner: Hashar)
[10:13:47] (CR) Giuseppe Lavagetto: [C: +2] shellbox: fix the lookup value for the php image [deployment-charts] - https://gerrit.wikimedia.org/r/699394 (owner: Giuseppe Lavagetto)
[10:16:43] (Merged) jenkins-bot: shellbox: fix the lookup value for the php image [deployment-charts] - https://gerrit.wikimedia.org/r/699394 (owner: Giuseppe Lavagetto)
[10:24:54] (CR) Giuseppe Lavagetto: [C: +1] nutcracker::yaml_defs: add nutcracker pools for kubernetes [puppet] - https://gerrit.wikimedia.org/r/699150 (https://phabricator.wikimedia.org/T284420) (owner: Effie Mouzeli)
[10:25:18] (CR) Effie Mouzeli: [C: +2] nutcracker::yaml_defs: add nutcracker pools for kubernetes [puppet] - https://gerrit.wikimedia.org/r/699150 (https://phabricator.wikimedia.org/T284420) (owner: Effie Mouzeli)
[10:25:26] (CR) Hashar: "Compiled it against all deployment-prep instances: https://puppet-compiler.wmflabs.org/compiler1003/29872/" [puppet] - https://gerrit.wikimedia.org/r/699207 (https://phabricator.wikimedia.org/T100837) (owner: Hashar)
[10:36:58] (CR) Hashar: [C: +1] "I have cherry picked the patch on deployment-puppetmaster04 , ran puppet on deployment-memc08.deployment-prep.eqiad1.wikimedia.cloud" [puppet] - https://gerrit.wikimedia.org/r/699207 (https://phabricator.wikimedia.org/T100837) (owner: Hashar)
[10:39:48] PROBLEM - SSH on contint2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:44:08] ACKNOWLEDGEMENT - Postgres Replication Lag on maps2007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 236928412784 and 295933 seconds Hnowlan Awaiting resyncing https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[10:46:59] !log oblivian@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
[10:47:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:46] (PS3) Kormat: mariadb: Automatically manage pt-heartbeat. [puppet] - https://gerrit.wikimedia.org/r/699213
[10:53:24] PROBLEM - puppet last run on mx1001 is CRITICAL: CRITICAL: Puppet has been disabled for 604982 seconds, message: debug email accounts - jbond, last run 7 days ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[10:59:15] (CR) MSantos: [C: -1] "One minor nit: The patchset should be comparing with wmf/v0.14.x branch" [software/tegola] (v0.14.x) - https://gerrit.wikimedia.org/r/699251 (owner: Jgiannelos)
[11:02:25] SRE, DC-Ops, SRE-tools, netops: Allow idrac tftp fetching of firmware updates (either to existing tftp or new solution) - https://phabricator.wikimedia.org/T283771 (jbond) From the very original post regarding a tftp server in general i think option B is the better choice. i'm also thinking we m...
[11:13:56] (CR) Jbond: [C: +2] puppet-compiler: update instances fqdn [puppet] - https://gerrit.wikimedia.org/r/699393 (owner: Hashar)
[11:14:16] hasharLunch: FYI merged ^^^
[11:54:20] SRE, DC-Ops, SRE-tools, netops: Allow idrac tftp fetching of firmware updates (either to existing tftp or new solution) - https://phabricator.wikimedia.org/T283771 (MoritzMuehlenhoff) >>! In T283771#7151163, @jbond wrote: > From the very original post regarding a tftp server in general i think op...
[12:09:06] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29873/console" [puppet] - https://gerrit.wikimedia.org/r/699196 (https://phabricator.wikimedia.org/T278187) (owner: Jbond)
[12:11:04] (PS3) Ema: varnish: remove ats-be migration leftover from varnishttfb [puppet] - https://gerrit.wikimedia.org/r/699377 (https://phabricator.wikimedia.org/T241239)
[12:11:42] (PS3) Kormat: mariadb: Promote db1157 as s3 primary [puppet] - https://gerrit.wikimedia.org/r/698981 (https://phabricator.wikimedia.org/T284648)
[12:19:25] (CR) Jbond: "LGTM Question inline" (1 comment) [debs/wmf-certificates] - https://gerrit.wikimedia.org/r/699155 (https://phabricator.wikimedia.org/T284417) (owner: Giuseppe Lavagetto)
[12:24:40] (PS1) Kormat: Prepare for 0.7.1 release. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/699407
[12:26:05] jbond: merci beaucoup! [thank you very much!]
[12:27:30] (CR) Kormat: [C: +2] Prepare for 0.7.1 release. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/699407 (owner: Kormat)
[12:29:49] (Merged) jenkins-bot: Prepare for 0.7.1 release. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/699407 (owner: Kormat)
[12:35:18] (PS3) Elukey: Add support for knative serving [deployment-charts] - https://gerrit.wikimedia.org/r/699380 (https://phabricator.wikimedia.org/T278194)
[12:37:33] (CR) Effie Mouzeli: [C: +2] Rename maps-vector-server to tegola-vector-tiles [deployment-charts] - https://gerrit.wikimedia.org/r/693917 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[12:38:22] (PS3) Effie Mouzeli: Add tokens and users for tegola-vector-tiles [labs/private] - https://gerrit.wikimedia.org/r/693924 (https://phabricator.wikimedia.org/T283159)
[12:39:47] (Merged) jenkins-bot: Rename maps-vector-server to tegola-vector-tiles [deployment-charts] - https://gerrit.wikimedia.org/r/693917 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[12:41:54] SRE, MW-on-K8s, serviceops: Add all redis and memcached backends to mw on k8s automatically - https://phabricator.wikimedia.org/T284420 (hashar) > [operations/puppet@production] mcrouter: add mcrouter pools to deployment servers > https://gerrit.wikimedia.org/r/698829 @jijiki , looks like that broke...
[12:42:44] (PS4) Effie Mouzeli: Add tokens and users for tegola-vector-tiles [labs/private] - https://gerrit.wikimedia.org/r/693924 (https://phabricator.wikimedia.org/T283159)
[12:44:56] (CR) Effie Mouzeli: [C: +2] Add tokens and users for tegola-vector-tiles [labs/private] - https://gerrit.wikimedia.org/r/693924 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[12:45:06] (CR) Effie Mouzeli: [V: +2 C: +2] Add tokens and users for tegola-vector-tiles [labs/private] - https://gerrit.wikimedia.org/r/693924 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[13:35:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1153 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16426 and previous config saved to /var/cache/conftool/dbconfig/20210611-133527-marostegui.json
[13:35:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:06] (CR) JMeybohm: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/699196 (https://phabricator.wikimedia.org/T278187) (owner: Jbond)
[13:42:00] RECOVERY - SSH on contint2001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:45:46] (PS1) JMeybohm: Replace all consumers of docker-registry credentials with alias [labs/private] - https://gerrit.wikimedia.org/r/699414
[13:47:08] (PS1) Ottomata: Make kafka cumin aliases consistent and complete [puppet] - https://gerrit.wikimedia.org/r/699415
[13:47:58] (PS1) Phuedx: WIP: vector: Disable highlighting query in search autocomplete [mediawiki-config] - https://gerrit.wikimedia.org/r/699416 (https://phabricator.wikimedia.org/T281797)
[13:50:37] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1153', diff saved to https://phabricator.wikimedia.org/P16427 and previous config saved to /var/cache/conftool/dbconfig/20210611-135036-marostegui.json
[13:50:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:44] (PS1) Ottomata: sre/kafka/* update kafka cluster choices [cookbooks] - https://gerrit.wikimedia.org/r/699418 (https://phabricator.wikimedia.org/T279342)
[13:51:29] (PS2) Ottomata: sre/kafka/* update kafka cluster choices [cookbooks] - https://gerrit.wikimedia.org/r/699418 (https://phabricator.wikimedia.org/T279342)
[13:52:49] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1143 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16428 and previous config saved to /var/cache/conftool/dbconfig/20210611-135248-marostegui.json
[13:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:06] !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
[13:53:07] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
[13:53:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:11] SRE, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (Papaul)
[13:53:32] !log hnowlan@puppetmaster1001 conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
[13:53:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:19] (CR) Bstorm: [C: +1] "> Patch Set 4:" [puppet] - https://gerrit.wikimedia.org/r/699216 (https://phabricator.wikimedia.org/T284130) (owner: David Caro)
[13:54:46] (CR) jerkins-bot: [V: -1] sre/kafka/* update kafka cluster choices [cookbooks] - https://gerrit.wikimedia.org/r/699418 (https://phabricator.wikimedia.org/T279342) (owner: Ottomata)
[13:59:17] (CR) Effie Mouzeli: [C: +2] Add tokens and users for tegola-vector-tiles [puppet] - https://gerrit.wikimedia.org/r/692669 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[13:59:19] (PS3) Ottomata: sre/kafka/* update kafka cluster choices [cookbooks] - https://gerrit.wikimedia.org/r/699418 (https://phabricator.wikimedia.org/T279342)
[14:00:51] SRE, Technical-blog-posts, Wikimedia-Mailing-lists: Story idea for Blog: Discovering and fixing CVE-2021-33038 in Mailman3 - https://phabricator.wikimedia.org/T284486 (srodlund) I haven't heard back from @Legoktm, but @Ladsgroup has had a chance to look it over. Since they are also CCd on the task, I...
[14:01:19] (CR) David Caro: [C: +2] "> Patch Set 4: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/699216 (https://phabricator.wikimedia.org/T284130) (owner: David Caro)
[14:03:43] (PS2) Effie Mouzeli: Add a namespace for tegola-vector-tiles service [deployment-charts] - https://gerrit.wikimedia.org/r/693927 (https://phabricator.wikimedia.org/T283159)
[14:06:41] (CR) JMeybohm: [C: +1] Add a namespace for tegola-vector-tiles service [deployment-charts] - https://gerrit.wikimedia.org/r/693927 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[14:10:12] (CR) Bstorm: [C: +1] "> Patch Set 4: Code-Review+2" [puppet] - https://gerrit.wikimedia.org/r/699216 (https://phabricator.wikimedia.org/T284130) (owner: David Caro)
[14:11:58] SRE, DC-Ops, SRE-tools, netops: Allow idrac tftp fetching of firmware updates (either to existing tftp or new solution) - https://phabricator.wikimedia.org/T283771 (RobH) Ok so in testing yesterday we got the idrac firmware to load over TFTP, but it seems they don't support TFTP for DUP files lik...
[14:13:48] (CR) Effie Mouzeli: [C: +2] Add a namespace for tegola-vector-tiles service [deployment-charts] - https://gerrit.wikimedia.org/r/693927 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[14:15:07] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16429 and previous config saved to /var/cache/conftool/dbconfig/20210611-141506-root.json
[14:15:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:51] (Merged) jenkins-bot: Add a namespace for tegola-vector-tiles service [deployment-charts] - https://gerrit.wikimedia.org/r/693927 (https://phabricator.wikimedia.org/T283159) (owner: Effie Mouzeli)
[14:17:41] !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
[14:17:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:09] !log jiji@deploy1002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[14:20:10] !log jiji@deploy1002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[14:20:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:02] !log jiji@deploy1002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[14:22:02] !log jiji@deploy1002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[14:22:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:07] SRE, ops-eqiad, DC-Ops: (Need By: TBD) rack/setup/install pc1011-pc1014 - https://phabricator.wikimedia.org/T282484 (Marostegui)
[14:29:11] SRE, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (Marostegui)
[14:29:12] (PS5) Hnowlan: osm: create missing imposm directories, add mirror support to import [puppet] - https://gerrit.wikimedia.org/r/699044 (https://phabricator.wikimedia.org/T269582)
[14:30:11] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16430 and previous config saved to /var/cache/conftool/dbconfig/20210611-143010-root.json
[14:30:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:29] (PS1) Reedy: Make MediaSearch default search experience for all users [extensions/MediaSearch] (wmf/1.37.0-wmf.9) - https://gerrit.wikimedia.org/r/699297
[14:31:07] !log jiji@deploy1002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[14:31:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:06] !log jiji@deploy1002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[14:32:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:12] (03CR) 10Reedy: [C: 03+2] Make MediaSearch default search experience for all users [extensions/MediaSearch] (wmf/1.37.0-wmf.9) - 10https://gerrit.wikimedia.org/r/699297 (owner: 10Reedy) [14:32:36] (03CR) 10RLazarus: [C: 03+1] httpbb: reduce git pull frequence from minute to hour [puppet] - 10https://gerrit.wikimedia.org/r/699322 (https://phabricator.wikimedia.org/T260936) (owner: 10Dzahn) [14:33:04] !log jiji@deploy1002 helmfile [staging-eqiad] START helmfile.d/admin 'apply'. [14:33:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:27] !log jiji@deploy1002 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. [14:33:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:34:00] !log hnowlan@puppetmaster1001 conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet [14:34:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:34:36] !log jiji@deploy1002 helmfile [eqiad] START helmfile.d/admin 'apply'. [14:34:39] (03PS1) 10Marostegui: site.pp: Add new parsercache hosts as insetup [puppet] - 10https://gerrit.wikimedia.org/r/699424 (https://phabricator.wikimedia.org/T284825) [14:34:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:34:57] (03PS1) 10Arlolra: Bump envoy timeout for parsoid-php [puppet] - 10https://gerrit.wikimedia.org/r/699425 (https://phabricator.wikimedia.org/T244609) [14:34:58] !log jiji@deploy1002 helmfile [eqiad] DONE helmfile.d/admin 'apply'. [14:35:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:06] !log jiji@deploy1002 helmfile [codfw] START helmfile.d/admin 'apply'. 
[14:35:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:44] (03CR) 10Marostegui: [C: 03+2] site.pp: Add new parsercache hosts as insetup [puppet] - 10https://gerrit.wikimedia.org/r/699424 (https://phabricator.wikimedia.org/T284825) (owner: 10Marostegui) [14:35:52] !log jiji@deploy1002 helmfile [codfw] DONE helmfile.d/admin 'apply'. [14:35:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:42] !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009 [14:36:42] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009 [14:36:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:36:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:37:09] (03CR) 10Giuseppe Lavagetto: "When you say "VE talks to parsoid directly", can you please explain how that is done? Is restbase just acting as a passthrough? In that ca" [puppet] - 10https://gerrit.wikimedia.org/r/699425 (https://phabricator.wikimedia.org/T244609) (owner: 10Arlolra) [14:37:16] (03CR) 10Anne Tomasevich: [C: 03+1] "Thanks, Reedy!" 
[extensions/MediaSearch] (wmf/1.37.0-wmf.9) - 10https://gerrit.wikimedia.org/r/699297 (owner: 10Reedy) [14:42:48] (03CR) 10Arlolra: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/699425 (https://phabricator.wikimedia.org/T244609) (owner: 10Arlolra) [14:42:59] !log mbsantos@deploy1002 Started deploy [kartotherian/deploy@5d7c993]: (no justification provided) [14:43:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:43:05] !log mbsantos@deploy1002 Finished deploy [kartotherian/deploy@5d7c993]: (no justification provided) (duration: 00m 05s) [14:43:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:44:42] !log mbsantos@deploy1002 Started deploy [tilerator/deploy@6bfdab5]: (no justification provided) [14:44:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:44:47] !log mbsantos@deploy1002 Finished deploy [tilerator/deploy@6bfdab5]: (no justification provided) (duration: 00m 05s) [14:44:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:01] hnowlan: ^ [14:45:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16431 and previous config saved to /var/cache/conftool/dbconfig/20210611-144514-root.json [14:45:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:31] maps2009 is deployed with latest imposm3 work [14:46:19] (03PS1) 10Hashar: mwdeploy user is provided by LDAP on WMCS [puppet] - 10https://gerrit.wikimedia.org/r/699427 (https://phabricator.wikimedia.org/T73480) [14:56:35] (03Merged) 10jenkins-bot: Make MediaSearch default search experience for all users [extensions/MediaSearch] (wmf/1.37.0-wmf.9) - 10https://gerrit.wikimedia.org/r/699297 (owner: 10Reedy) [14:58:31] (03PS1) 10Cwhite: Add metric group. 
[software/ecs] - 10https://gerrit.wikimedia.org/r/699428 [15:00:09] (03CR) 10Hashar: "I really don't know what I am doing. I am just hoping that "mediawiki::users::mwdeploy_user_provider" ends up being passed to the "mediaw" [puppet] - 10https://gerrit.wikimedia.org/r/699427 (https://phabricator.wikimedia.org/T73480) (owner: 10Hashar) [15:00:18] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16432 and previous config saved to /var/cache/conftool/dbconfig/20210611-150018-root.json [15:00:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:01:52] !log reedy@deploy1002 Synchronized php-1.37.0-wmf.9/extensions/MediaSearch/extension.json: Make MediaSearch default search experience for all users (duration: 00m 57s) [15:01:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:32] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/699425 (https://phabricator.wikimedia.org/T244609) (owner: 10Arlolra) [15:07:40] 10SRE, 10Technical-blog-posts, 10Wikimedia-Mailing-lists: Story idea for Blog: Discovering and fixing CVE-2021-33038 in Mailman3 - https://phabricator.wikimedia.org/T284486 (10Ladsgroup) Kunal is on vacation this week. I'm in no position to approve on his behalf but I'll try to reach out to him. 
[15:11:04] (03PS1) 10Hnowlan: postgres: fix sync bugs in resync_replica script [puppet] - 10https://gerrit.wikimedia.org/r/699430 [15:14:15] (03PS1) 10Effie Mouzeli: mwdebug: include nutcracker and mcrouter pools in values [deployment-charts] - 10https://gerrit.wikimedia.org/r/699432 (https://phabricator.wikimedia.org/T284420) [15:15:15] (03PS3) 10Cwhite: logstash: add ecs migration config for sampled webrequest logs [puppet] - 10https://gerrit.wikimedia.org/r/699254 (https://phabricator.wikimedia.org/T234565) [15:15:21] (03PS1) 10Arlolra: Use restbase-for-services for VE's VirtualRestClient calls [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699434 (https://phabricator.wikimedia.org/T279825) [15:16:23] (03CR) 10Cwhite: logstash: add ecs migration config for sampled webrequest logs (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/699254 (https://phabricator.wikimedia.org/T234565) (owner: 10Cwhite) [15:17:05] (03CR) 10Arlolra: "> Patch Set 1: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/699425 (https://phabricator.wikimedia.org/T244609) (owner: 10Arlolra) [15:17:24] (03CR) 10jerkins-bot: [V: 04-1] logstash: add ecs migration config for sampled webrequest logs [puppet] - 10https://gerrit.wikimedia.org/r/699254 (https://phabricator.wikimedia.org/T234565) (owner: 10Cwhite) [15:17:51] (03CR) 10jerkins-bot: [V: 04-1] Use restbase-for-services for VE's VirtualRestClient calls [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699434 (https://phabricator.wikimedia.org/T279825) (owner: 10Arlolra) [15:17:53] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: Remove non-helm ingress-nginx files (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/698588 (https://phabricator.wikimedia.org/T264221) (owner: 10Majavah) [15:18:18] (03CR) 10Cwhite: "CI is going to warn about logstash-filter-verifier due to the missing metric field. 
The fix is to rebase on top of ECS 1.7.0-4 (I489cc5b6" [puppet] - 10https://gerrit.wikimedia.org/r/699254 (https://phabricator.wikimedia.org/T234565) (owner: 10Cwhite) [15:21:27] (03PS1) 10Majavah: toolforge: Remove ingress-jobs [puppet] - 10https://gerrit.wikimedia.org/r/699436 [15:21:39] (03CR) 10Subramanya Sastry: Use restbase-for-services for VE's VirtualRestClient calls (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699434 (https://phabricator.wikimedia.org/T279825) (owner: 10Arlolra) [15:24:06] (03CR) 10Subramanya Sastry: Use restbase-for-services for VE's VirtualRestClient calls (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699434 (https://phabricator.wikimedia.org/T279825) (owner: 10Arlolra) [15:34:20] (03CR) 10Arturo Borrero Gonzalez: "Hold this while I confirm this is actually desirable." [puppet] - 10https://gerrit.wikimedia.org/r/699436 (owner: 10Majavah) [15:41:56] (03CR) 10Mforns: [C: 03+1] Finalize backend EP migration of 4 EL schemas [puppet] - 10https://gerrit.wikimedia.org/r/699002 (https://phabricator.wikimedia.org/T282855) (owner: 10Ottomata) [15:47:45] (03CR) 10Herron: [C: 03+1] Make kafka cumin aliases consistent and complete [puppet] - 10https://gerrit.wikimedia.org/r/699415 (owner: 10Ottomata) [15:50:31] (03CR) 10Herron: [C: 03+1] "super helpful -- thanks!" [cookbooks] - 10https://gerrit.wikimedia.org/r/699418 (https://phabricator.wikimedia.org/T279342) (owner: 10Ottomata) [15:55:35] 10SRE, 10netops: Cloud IPv6 subnets - https://phabricator.wikimedia.org/T187929 (10cmooney) I agree on option 2 above that it makes sense to assign a /48 for cloud services at each site. Some people these days are assigning a /64 per-VM so we should provide space to cater for potential future cases such as th... [15:59:29] 10SRE, 10Technical-blog-posts, 10Wikimedia-Mailing-lists: Story idea for Blog: Discovering and fixing CVE-2021-33038 in Mailman3 - https://phabricator.wikimedia.org/T284486 (10srodlund) @Ladsgroup. 
Thanks! I'm going to go ahead and leave it published for now, as there aren't any major changes from the origin... [16:40:40] !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1004 [16:40:41] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1004 [16:40:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:40:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:41:43] (03PS1) 10Papaul: DHCP: Add MAC Address for pc201[1-4] [puppet] - 10https://gerrit.wikimedia.org/r/699447 (https://phabricator.wikimedia.org/T282482) [17:12:51] (03PS4) 10Elukey: Add support for knative serving [deployment-charts] - 10https://gerrit.wikimedia.org/r/699380 (https://phabricator.wikimedia.org/T278194) [17:20:33] (03CR) 10Arlolra: "> Patch Set 1: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/699425 (https://phabricator.wikimedia.org/T244609) (owner: 10Arlolra) [17:36:22] 10SRE, 10Release-Engineering-Team, 10serviceops-radar, 10Release Pipeline (Blubber): build and import blubber package for buster and bullseye (which supports v4) - https://phabricator.wikimedia.org/T283891 (10thcipriani) [17:48:32] RECOVERY - Postgres Replication Lag on maps1008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [18:00:02] 10SRE, 10LDAP-Access-Requests: Grant Access to for - https://phabricator.wikimedia.org/T284832 (10RhinosF1) [18:01:07] 10SRE, 10LDAP-Access-Requests: Grant Access to ldap/wmf for WQuarshie - https://phabricator.wikimedia.org/T284832 (10RhinosF1) [18:36:49] (03CR) 10Papaul: [C: 03+2] DHCP: Add MAC Address for pc201[1-4] [puppet] - 10https://gerrit.wikimedia.org/r/699447 (https://phabricator.wikimedia.org/T282482) (owner: 10Papaul) 
[18:55:45] 10SRE, 10observability, 10Patch-For-Review: Move Prometheus off eqsin/ulsfo/esams bastions - https://phabricator.wikimedia.org/T243057 (10Dzahn) > secondary disk has been removed > Strangely .. network interface was renamed. Interface was ens14 before shutdown, and after rebooting it is ens13. @herron @Mo... [18:58:10] (03PS4) 10Krinkle: wikimedia.org: TXT entry for GitHub domain verified profile [dns] - 10https://gerrit.wikimedia.org/r/661180 (https://phabricator.wikimedia.org/T207364) [18:59:48] 10SRE, 10observability, 10Patch-For-Review: Move Prometheus off eqsin/ulsfo/esams bastions - https://phabricator.wikimedia.org/T243057 (10Dzahn) > The 150G secondary disk has been removed from the prometheus3001 VM. Thank you @herron :) When doing a `gnt-node list` on ganeti3001 I can see more free resourc... [19:00:40] RECOVERY - Check systemd state on cumin2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:00:42] (03PS5) 10Krinkle: wikimedia.org: TXT entry for GitHub domain verified profile [dns] - 10https://gerrit.wikimedia.org/r/661180 (https://phabricator.wikimedia.org/T207364) [19:01:04] (03CR) 10Krinkle: "Updated has since the previous one expired. I got this from https://github.com/orgs/wikimedia/domain/133306/verification_steps." [dns] - 10https://gerrit.wikimedia.org/r/661180 (https://phabricator.wikimedia.org/T207364) (owner: 10Krinkle) [19:10:42] 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2002.codfw.wmnet for hosts: ` pc2011.codfw.wmnet ` The log can be found in `/var/log/wm... 
[19:11:24] 10SRE, 10observability, 10Patch-For-Review: Move Prometheus off eqsin/ulsfo/esams bastions - https://phabricator.wikimedia.org/T243057 (10herron) That's interesting about the same behavior happening in the opposite direction with a disk add. I guess that makes some sense in a bug-ish kind of way -- network... [19:12:38] 10SRE, 10observability, 10Patch-For-Review: Move Prometheus off eqsin/ulsfo/esams bastions - https://phabricator.wikimedia.org/T243057 (10Dzahn) I agree, pretty sure this only happens when we add/remove disks, never happened randomly on just a reboot to me. [19:25:37] !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on pc2011.codfw.wmnet with reason: REIMAGE [19:25:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:27:40] !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2011.codfw.wmnet with reason: REIMAGE [19:27:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:28:16] (03CR) 10BBlack: [C: 03+2] wikimedia.org: TXT entry for GitHub domain verified profile [dns] - 10https://gerrit.wikimedia.org/r/661180 (https://phabricator.wikimedia.org/T207364) (owner: 10Krinkle) [19:30:09] (03PS1) 10Brennen Bearnes: install-gitlab-server.sh: NOCOWS=1 [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/699463 [19:31:36] (03CR) 10Brennen Bearnes: [V: 03+2 C: 03+2] "Self-merging as reflects currently-used state." 
[gitlab-ansible] - 10https://gerrit.wikimedia.org/r/699463 (owner: 10Brennen Bearnes) [19:33:59] (03PS1) 10Brennen Bearnes: gitlab_backup_keep_time to 3 days [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/699464 (https://phabricator.wikimedia.org/T274463) [19:36:27] 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['pc2011.codfw.wmnet'] ` and were **ALL** successful. [19:40:52] (03Abandoned) 10Brennen Bearnes: gitlab_sshd_macs: Fix type [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/684434 (owner: 10Jbond) [19:41:49] (03Abandoned) 10Brennen Bearnes: sshd review: do not merge [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/684443 (https://phabricator.wikimedia.org/T276148) (owner: 10Jbond) [19:47:00] PROBLEM - SSH on contint2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [19:47:21] 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2002.codfw.wmnet for hosts: ` pc2012.codfw.wmnet ` The log can be found in `/var/log/wm... 
[19:56:16] 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (10Papaul) [20:02:19] !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on pc2012.codfw.wmnet with reason: REIMAGE [20:02:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:23] !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2012.codfw.wmnet with reason: REIMAGE [20:04:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:13:06] 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['pc2012.codfw.wmnet'] ` and were **ALL** successful. [20:14:50] PROBLEM - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [20:44:27] 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2002.codfw.wmnet for hosts: ` pc2013.codfw.wmnet ` The log can be found in `/var/log/wm... [20:46:00] (03CR) 10Brennen Bearnes: [V: 03+2 C: 03+2] "Merging after discussion." 
[gitlab-ansible] - 10https://gerrit.wikimedia.org/r/699464 (https://phabricator.wikimedia.org/T274463) (owner: 10Brennen Bearnes) [20:47:39] RECOVERY - SSH on contint2001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [20:51:51] RECOVERY - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3056 is OK: HTTP OK: HTTP/1.0 200 OK - 23664 bytes in 0.362 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [20:59:25] !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on pc2013.codfw.wmnet with reason: REIMAGE [20:59:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:00:33] 10SRE, 10ops-codfw: Degraded RAID on pc2012 - https://phabricator.wikimedia.org/T284845 (10ops-monitoring-bot) [21:01:26] !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2013.codfw.wmnet with reason: REIMAGE [21:01:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:10:38] 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['pc2013.codfw.wmnet'] ` and were **ALL** successful. [21:10:59] 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2002.codfw.wmnet for hosts: ` pc2014.codfw.wmnet ` The log can be found in `/var/log/wm... 
[21:12:13] 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (10Papaul) [21:25:28] (03PS1) 10Bstorm: paws: add a very simple backup server [puppet] - 10https://gerrit.wikimedia.org/r/699471 (https://phabricator.wikimedia.org/T267683) [21:25:55] !log pt1979@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on pc2014.codfw.wmnet with reason: REIMAGE [21:25:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:27:54] !log pt1979@cumin2002 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2014.codfw.wmnet with reason: REIMAGE [21:27:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:29:21] (03CR) 10Bstorm: "The one thing that I'm considering I may want to change here: I might want to actually make this a general convenience class for anyone to" [puppet] - 10https://gerrit.wikimedia.org/r/699471 (https://phabricator.wikimedia.org/T267683) (owner: 10Bstorm) [21:37:56] 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['pc2014.codfw.wmnet'] ` and were **ALL** successful. 
[21:39:45] 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (10Papaul) [21:51:38] 10SRE, 10ops-codfw, 10DC-Ops, 10Patch-For-Review: (Need By: TBD) rack/setup/install pc2011-pc2014 - https://phabricator.wikimedia.org/T282482 (10Papaul) 05Open→03Resolved @Marostegui this is complete [21:57:23] (03CR) 10Dzahn: [V: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/29874/install4001.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/699323 (owner: 10Dzahn) [21:58:30] (03PS2) 10Dzahn: installserver/tftp: install tftp client on tftp servers for debugging [puppet] - 10https://gerrit.wikimedia.org/r/699323 (https://phabricator.wikimedia.org/T283771) [21:59:29] (03CR) 10Dzahn: [C: 03+1] gitlab_backup_keep_time to 3 days [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/699464 (https://phabricator.wikimedia.org/T274463) (owner: 10Brennen Bearnes) [22:08:14] 10SRE, 10LDAP-Access-Requests: Grant Access to ldap/wmf for WQuarshie - https://phabricator.wikimedia.org/T284832 (10Volans) 05Open→03Resolved p:05Triage→03Medium a:03Volans `wmf` LDAP group membership added to `uid=wquarshie` after verification of being staff. Resolving. @codebug please re-open in... [22:08:52] !log gitlab.wikimedia.org currently up with recommended config applied; test data deleted; users can register but not create projects. brennen, dancy, and thcipriani currently marked as admins. may need to reset data again, but hopefully not. 
[22:08:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:13:11] (03PS2) 10Dzahn: httpbb: reduce git pull frequency from minute to hour [puppet] - 10https://gerrit.wikimedia.org/r/699322 (https://phabricator.wikimedia.org/T260936) [22:14:35] (03CR) 10Dzahn: [C: 03+2] httpbb: reduce git pull frequency from minute to hour [puppet] - 10https://gerrit.wikimedia.org/r/699322 (https://phabricator.wikimedia.org/T260936) (owner: 10Dzahn) [22:14:52] (03CR) 10AGueyte: [C: 03+1] Enable $wgSecurePollSingleTransferableVoteEnabled on beta sites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699218 (https://phabricator.wikimedia.org/T283711) (owner: 10Wikitrent) [22:15:02] (03PS3) 10Dzahn: httpbb: reduce git pull frequency from minute to hour [puppet] - 10https://gerrit.wikimedia.org/r/699322 (https://phabricator.wikimedia.org/T260936) [22:18:17] 10SRE, 10LDAP-Access-Requests: Grant Access to ldap/wmf for WQuarshie - https://phabricator.wikimedia.org/T284832 (10Volans) @codebug last note, please link your wikitech account to your Phabricator account in your profile here on Phabricator. [22:20:00] (03CR) 10Dzahn: "This should be a 98.3333% decrease in the number of pulls from gerrit." [puppet] - 10https://gerrit.wikimedia.org/r/699322 (https://phabricator.wikimedia.org/T260936) (owner: 10Dzahn) [22:23:37] 10SRE, 10SRE-Access-Requests: Requesting access to mwmaint1002 for mepps - https://phabricator.wikimedia.org/T284773 (10Volans) p:05Triage→03Medium @mepps: - Please sign L3 (Wikimedia Server Access Responsibilities document) - Access to production hosts is based on unix groups mapped to clusters and not ma... [22:24:07] 10SRE, 10SRE-Access-Requests: Requesting access to mwmaint1002 for mepps - https://phabricator.wikimedia.org/T284773 (10Volans) [22:28:11] (03CR) 10Dzahn: [C: 03+2] "I had manually installed it on one install server and now puppetizing it because manual installs are bad." 
[puppet] - 10https://gerrit.wikimedia.org/r/699323 (https://phabricator.wikimedia.org/T283771) (owner: 10Dzahn) [22:28:34] (03PS3) 10Dzahn: installserver/tftp: install tftp client on tftp servers for debugging [puppet] - 10https://gerrit.wikimedia.org/r/699323 (https://phabricator.wikimedia.org/T283771) [22:30:29] 10SRE, 10ops-codfw: Degraded RAID on pc2014 - https://phabricator.wikimedia.org/T284849 (10ops-monitoring-bot) [22:33:11] (03CR) 10Dzahn: "will now be installed on install*" [puppet] - 10https://gerrit.wikimedia.org/r/699323 (https://phabricator.wikimedia.org/T283771) (owner: 10Dzahn) [22:34:22] (03CR) 10Dzahn: "@Jelto I see the gitlab config was changed now to 3 days from 7." [puppet] - 10https://gerrit.wikimedia.org/r/697850 (https://phabricator.wikimedia.org/T274463) (owner: 10Dzahn) [22:36:57] (03Abandoned) 10Dzahn: httpd: add a resursive chmod to ensure log files are group writable [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/691293 (owner: 10Dzahn) [22:38:19] (03Abandoned) 10Dzahn: scap/dsh: add doc1002/doc2001 to ci-docroot hosts [puppet] - 10https://gerrit.wikimedia.org/r/650306 (https://phabricator.wikimedia.org/T247653) (owner: 10Dzahn) [22:40:39] (03Abandoned) 10Dzahn: add discovery-geo name and resources for doc [dns] - 10https://gerrit.wikimedia.org/r/650626 (owner: 10Dzahn) [23:09:50] 10SRE, 10Technical-blog-posts, 10Wikimedia-Mailing-lists: Story idea for Blog: Discovering and fixing CVE-2021-33038 in Mailman3 - https://phabricator.wikimedia.org/T284486 (10Legoktm) Yay, thanks! @srodlund one small thing, could `T281402` in the "Discovery" section link to the T281402 Phabricator task? Oth... [23:26:12] (03PS1) 10Dzahn: Revert "Allow mgmt range to connect to tftp servers." [puppet] - 10https://gerrit.wikimedia.org/r/699301 [23:26:38] (03PS2) 10Dzahn: Revert "Allow mgmt range to connect to tftp servers." 
[puppet] - 10https://gerrit.wikimedia.org/r/699301 (https://phabricator.wikimedia.org/T283771) [23:27:24] (03PS3) 10Dzahn: Revert "Allow mgmt range to connect to tftp servers." [puppet] - 10https://gerrit.wikimedia.org/r/699301 (https://phabricator.wikimedia.org/T283771) [23:35:08] (03CR) 10Dzahn: [C: 03+2] Revert "Allow mgmt range to connect to tftp servers." [puppet] - 10https://gerrit.wikimedia.org/r/699301 (https://phabricator.wikimedia.org/T283771) (owner: 10Dzahn) [23:37:25] !log removing firewall hole for mgmt networks to install* because it turned out it cant be used for firmware upgrades [23:37:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:39:39] 10SRE, 10DC-Ops, 10SRE-tools, 10netops, 10Patch-For-Review: Allow idrac tftp fetching of firmware updates (either to existing tftp or new solution) - https://phabricator.wikimedia.org/T283771 (10Dzahn) I reverted the firewall (ferm) change that allowed mgmt to connect to install since as comments above s... [23:49:57] PROBLEM - SSH on contint2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [23:57:51] 10SRE, 10LDAP-Access-Requests: Add Dat Nguyen to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T284285 (10KFrancis) @colewhite Hello, I am confirming the NDA has been fully executed. Please proceed with the access request. Thanks! [23:58:27] 10SRE, 10LDAP-Access-Requests: Add Kara Payne to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T284308 (10KFrancis) @colewhite Hello, I am confirming the NDA has been fully executed. Please proceed with the access request. Thanks!