[09:27:46] brett: happy to do that on monday, public holiday here
[11:00:21] SRE-tools, Ganeti, Infrastructure-Foundations: sre.ganeti.makevm cook book only allows specifying RAM size in full gigabytes - https://phabricator.wikimedia.org/T230712 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff sre.ganeti.makevm now supports fractions of gigabytes.
[11:18:51] (ProbeDown) firing: (2) Service idm2001:443 has failed probes (http_idm_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#idm2001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:48:51] (ProbeDown) resolved: (2) Service idm2001:443 has failed probes (http_idm_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#idm2001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[11:52:00] Hey I/F team, could you please take a look at the tab https://office.wikimedia.org/wiki/Technology/SRE/Infrastructure_Foundations/Team and let me know what you think?
[12:12:17] looks good. I hadn't filled out the spreadsheet yet; does that mean I should still update the spreadsheet for now, or edit the office wiki page directly?
[12:13:05] Neat, I filled out the spreadsheet, but it didn't get included :-(
[12:13:19] Do you need us to just update the Wiki-page?
[12:13:51] (ProbeDown) firing: (2) Service idm2001:443 has failed probes (http_idm_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#idm2001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:23:19] Uhm, Puppet felt like stopping uwsgi on idm2001... I may have configured something wrong.
[12:25:50] moritz you can do it directly in the office wiki page
[12:26:24] slyngs I'll do it myself, no worries
[12:26:34] Thanks
[12:39:36] ack
[12:39:59] (PuppetDisabled) firing: Puppet disabled on puppetmaster2004:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=puppet&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[12:56:16] netops, Infrastructure-Foundations: Users management on SONiC - https://phabricator.wikimedia.org/T338028 (ayounsi)
[13:01:33] netops, Infrastructure-Foundations: Users management on SONiC - https://phabricator.wikimedia.org/T338028 (ayounsi)
[13:03:03] netops, Infrastructure-Foundations: Users management on SONiC - https://phabricator.wikimedia.org/T338028 (ayounsi)
[13:43:51] netops, Infrastructure-Foundations, SRE, ops-codfw: Codfw:row A/B: rack/cable new switches - https://phabricator.wikimedia.org/T332180 (Papaul)
[13:45:46] I think Kerberos and LDAP should also be listed in the team page (given that they're two things we manage which are heavily used by other teams)
[14:23:16] LDAP at least was in the spreadsheet
[14:28:43] jobo / moritzm I've added LDAP to the Wiki page
[15:19:10] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: cloudservices[2004/2005]-dev & cloudweb2002-dev: connect them to cloudsw so they can have cloud-private vlan - https://phabricator.wikimedia.org/T336587 (aborrero) >>! In T336587#8897657, @Andrew wrote: > I haven't dug much, but de...
[16:07:15] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (Aklapper)
[16:16:13] Tanks
[16:16:18] *Thanks
[16:27:22] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero)
[16:27:57] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (aborrero)
[16:39:59] (PuppetDisabled) firing: Puppet disabled on puppetmaster2004:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=puppet&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[16:49:32] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (jhathaway) @RoySmith definitely concerning, I asked Bishonen to send me a test email, so I can look through our email logs.
[16:49:46] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (jhathaway) a:jhathaway
[16:50:27] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (jrbs) Just had a look and this did indeed go to spam. I have pulled the email out and can allowlist Bish.
[16:52:25] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (jrbs)
[17:08:56] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (jhathaway) @jrbs can you forward me a copy of the original headers, please remove the body of the message, I would like to investig...
[17:40:08] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (jrbs) >>! In T338032#8899252, @jhathaway wrote: > @jrbs can you forward me a copy of the original headers, please remove the body o...
[18:06:42] SRE-tools, netops, Infrastructure-Foundations, SRE: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (Papaul) @Volans I tested the cookbook today on ssw1-a1; the switch did receive DHCP but was not able to fetch the config file from the tftp serv...
[18:12:18] hi folks, posting for awareness
[18:12:28] seems like one of the eqiad<->codfw links is down
[18:12:31] 208.80.154.214 Down xe-1/1/1:1.0 0.000 2.000 3
[18:21:10] 14:19:45 <+icinga-wm> RECOVERY - BFD status on cr2-codfw is OK: UP: 16 AdminDown: 0 Down: 0
[18:21:15] recovered, nothing to see here :)
[18:40:55] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (jhathaway) @jrbs there are no spam scores in the headers, which seems strange, did gmail indicate why it was marked as spam?
[18:41:31] sukhe: ty :) just one of the links going down is generally fine, btw
[18:41:57] yeah! so I was told but I thought just in case!
[18:42:01] totally, thanks for noting
[18:42:11] actually, looking at it, for the eqiad<>codfw case I think we can lose any three links without impact
[18:43:35] sukhe: was it the Lumen link? we got email about that at the time
[18:44:51] oh ok great then!
[18:44:55] let me see
[18:45:47] yep
[18:51:50] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (jhathaway) @jrbs do you have any gmail specific spam settings for the emergency email address?
[19:06:30] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (jrbs) >>! In T338032#8899554, @jhathaway wrote: > @jrbs there are no spam scores in the headers, which seems strange, did gmail ind...
[19:29:31] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (Dzahn) > I think the best team to ask would be ITS. T&S operates the emergency mailbox but the backend is the domain of ITS. Yes,...
[19:32:18] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (Barkeep49) Just to add background that I know Joe knows but others may not, there was a period of time around Dec/Jan where I was a...
[19:35:51] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (Dzahn) One other option I see would be to ask for a redirect from emergency@ to emergency@lists, request a mailman list and handle...
[20:03:23] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (jhathaway) Thanks @Dzahn, I'll open a ticket with ITS to investigate.
[20:39:59] (PuppetDisabled) firing: Puppet disabled on puppetmaster2004:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=puppet&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[21:00:23] Mail, Infrastructure-Foundations, Trust-and-Safety: Mail from Bishzilla to emergency@wikimedia.org is possibly getting lost - https://phabricator.wikimedia.org/T338032 (jrbs) Just to give some additional insight: Emails sent to emergency@ are routed into our Zendesk system and, from there, to our Pa...