[08:09:37] moritzm: Good day, Daniel and you removed `subversion` from the Phabricator hosts, but it's still used there, unfortunately :\ Got filed as https://phabricator.wikimedia.org/T307889 and I have proposed the revert at https://gerrit.wikimedia.org/r/c/operations/puppet/+/789958
[08:25:59] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (jbond) Also see {F35119049}
[08:27:10] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (jbond) @jhathaway wonder if anything may have changed recently
[08:47:05] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: decommission atlas-esams - https://phabricator.wikimedia.org/T307026 (ayounsi) a:ayounsi
[09:05:31] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (jbond) I have looked in our logs and the following is an example of what we see on our side ` 2022-05-06 01:57:31 H=mail-lf1-x12b.google.com [2...
[09:19:55] topranks, XioNoX: could either of you please redirect ICMP traffic away from ping3002? I need to restart the VM to apply a Ganeti config change
[09:20:13] moritzm: sure
[09:20:14] moritzm: yep no problem let me have a look
[09:20:17] heh
[09:20:31] XioNoX: I can take care of it
[09:20:37] topranks: ok! thanks
[09:23:50] moritzm: ok you should be good to go
[09:24:16] thanks, restarting it now
[09:33:44] topranks: completed, you can revert the config change
[09:35:46] moritzm: ok thanks
[13:46:17] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (jhathaway) @jbond I can't think of any recent changes that would have introduced this behavior. The boxes were rebooted on Friday to catch the l...
[13:47:03] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (herron) Looking at the count of log lines matching "BDAT command used when CHUNKING not advertised" on mx1001, this appears to have begun on the 5th,...
[15:03:40] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (herron) Hi @bcampbell, while SRE is investigating could ITS please open a case with the Google postmasters about this issue as well? We have no...
[15:03:44] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (jbond) demonstrating that we support chunking ` $ telnet -4 mx1001.wikimedia.org 25 Tryin...
[15:09:08] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (bcampbell) Thank you @herron and @jbond I'll open a ticket with Google now and keep you updated.
[15:11:36] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (jbond) My reading of https://seclists.org/oss-sec/2017/q4/324 suggests that if a BDAT command is issued after the mail or RCPT command then exim...
[15:16:23] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (jbond) >>! In T307873#7914164, @jbond wrote: > My reading of https://seclists.org/oss-sec/2017/q4/324 suggests that if a BDAT command is issued...
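In the truncated telnet session above, jbond checks by hand whether mx1001 advertises CHUNKING in its EHLO reply. RFC 3030 lets a client use BDAT only after the server has advertised that extension. Below is a minimal Python sketch of the same check. The function names `ehlo_keywords`, `advertises_chunking` and `live_check` are illustrative only and are not existing Wikimedia tooling. The sample EHLO reply in the test is made up and was not captured from mx1001.

```python
import smtplib


def ehlo_keywords(reply: str) -> set:
    """Return the upper-cased extension keywords from a raw EHLO reply.

    Reply lines look like "250-KEYWORD params" (more lines follow) or
    "250 KEYWORD params" (last line). The first line carries the server's
    greeting rather than an extension, so it is skipped.
    """
    lines = [ln for ln in reply.splitlines() if ln.strip()]
    keywords = set()
    for line in lines[1:]:
        if not line.startswith("250"):
            continue  # ignore anything that is not a 250 reply line
        params = line[4:].split()  # drop the "250-" / "250 " prefix
        if params:
            keywords.add(params[0].upper())
    return keywords


def advertises_chunking(reply: str) -> bool:
    """True if the EHLO reply lists the CHUNKING extension (RFC 3030)."""
    return "CHUNKING" in ehlo_keywords(reply)


def live_check(host: str, port: int = 25) -> bool:
    """Ask a live server via smtplib; needs outbound access to the port."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.ehlo()
        return smtp.has_extn("chunking")
```

Exim's `chunking_advertise_hosts` option controls which clients are offered CHUNKING. After the empty setting herron applied, a check like this against the MXes would be expected to report that CHUNKING is not advertised.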
[15:17:47] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (jhathaway) First messages in the logs appeared on May 4th: ` $ zgrep "BDAT command used when CHUNKING not advertised" /var/log/exim4/mainlog.5....
[15:46:41] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (grin) My suggestions: - Please first **remove the google servers from the callout cache**, and you may also consider examining what caused call...
[15:57:31] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (herron) p:High→Medium `'chunking_advertise_hosts ='` (disabling chunking) has been applied to both MXes and we have not seen this error...
[16:01:38] SRE-tools, Infrastructure-Foundations, Spicerack: sre.ganeti.reboot-vm cookbook should re-enable Puppet if it was disabled - https://phabricator.wikimedia.org/T307792 (ssingh) Thanks Moritz for the patch: https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/790266. I tried this with doh6001 for T3...
[16:40:58] Mail, Infrastructure-Foundations, SRE: [Urgent] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (bcampbell) I heard back from SADA, our Google vendor. "Hope you're doing well! We are not currently aware of any changes to how Google would be...
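The zgrep in jhathaway's comment and herron's earlier count both come down to counting Exim mainlog lines that contain the BDAT error, grouped by day. Below is a minimal Python sketch of that count. It assumes each mainlog line starts with Exim's `YYYY-MM-DD HH:MM:SS` timestamp, as in the 01:57:31 line jbond quoted. It also assumes that rotated logs may be gzip-compressed and end in `.gz`. The caller supplies the file paths, because the path quoted in the log above is truncated. The function names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator

# Error string quoted in the T307873 comments.
NEEDLE = "BDAT command used when CHUNKING not advertised"


def iter_log_lines(paths: Iterable[str]) -> Iterator[str]:
    """Yield lines from plain or gzip-compressed (".gz") log files."""
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as fh:
            yield from fh


def count_by_day(lines: Iterable[str], needle: str = NEEDLE) -> Counter:
    """Count lines containing `needle`, keyed by their YYYY-MM-DD prefix.

    Assumes every line starts with Exim's "YYYY-MM-DD HH:MM:SS" timestamp,
    so the first 10 characters are the date.
    """
    counts: Counter = Counter()
    for line in lines:
        if needle in line:
            counts[line[:10]] += 1
    return counts
```

For example, `sorted(count_by_day(iter_log_lines(paths)).items())` lists the earliest day first. That makes it easy to check when the error first appeared: herron read the 5th from the count, and jhathaway found the 4th with zgrep.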
[17:14:09] Mail, Infrastructure-Foundations, SRE: [mitigated] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (herron)
[17:22:00] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (JMeybohm)
[18:32:36] lmata: Arzhel and I were going to swap on-call shifts.
[18:32:43] Are you able to update the sheet?
[18:33:04] I'll do the week starting Monday May 30th, and Arzhel is going to cover the week starting Monday June 27th
[18:33:14] +1
[18:33:27] I will take care of this, thanks 😊
[20:54:15] Mail, Infrastructure-Foundations, SRE, Patch-For-Review: [mitigated] Google returning 503 error when delivering to mx1001 and mx2001 - https://phabricator.wikimedia.org/T307873 (jhathaway) >>! In T307873#7914363, @grin wrote: > - Please first **remove the google servers from the callout cache**,...