[00:14:58] serviceops, Performance-Team, Code-Health-Objective, Platform Team Initiatives (Session Management Service (CDP2)), and 2 others: Determine multi-dc strategy for CentralAuth - https://phabricator.wikimedia.org/T267270 (tstarling) Open→Resolved
[01:06:26] serviceops, SRE, Patch-For-Review, User-Joe: Set up A/B testing mechanism for PHP7 - https://phabricator.wikimedia.org/T216676 (Krinkle)
[01:07:44] serviceops, MediaWiki-General, MediaWiki-libs-ObjectCache, Performance-Team, User-jijiki: Use php-hrtime monotonic clock instead of microtime for perf measure in MW - https://phabricator.wikimedia.org/T245464 (Krinkle) Open→Declined a:dpifke→None In favour of {T271736}
[01:09:24] serviceops, MediaWiki-General, MediaWiki-libs-ObjectCache, Performance-Team, User-jijiki: Use php-hrtime monotonic clock instead of microtime for perf measure in MW - https://phabricator.wikimedia.org/T245464 (Krinkle) Declined→Open Re-opening for original purpose. The php-hrtime bloc...
[05:29:41] <_joe_> jelto: ping me when you're around and we can depool codfw
[07:36:08] _joe_: I'm here in ~30m. I'll ping you
[07:36:19] <_joe_> sure, no rush
[08:01:53] _joe_ I'm here now and looking at your dc-maint.sh script
[08:06:29] <_joe_> jelto: I have a small cosmetic update
[08:09:50] mw2289 timed out for kart's ongoing deploy. Not sure if known/expected.
[08:12:40] _joe_ Is there a diff/change for the update? :)
[08:13:08] <_joe_> RhinosF1: it should be set to pooled=inactive already, let me check
[08:15:10] Thanks joe!
[08:46:33] _joe_: yep
[08:47:01] <_joe_> hnowlan: if you take another look at the etherpad, there's some stuff there
[08:51:44] _joe_: will do
[08:52:08] I am actually going to be on and off a few times in the afternoon, but I can set things up in advance, it should be fine
[09:00:44] _joe_: what's the plan with depooling codfw? I saw you prepared all of our hosts. Do we wait until restbase, maps and sessionstore are DOWN too?
[09:00:58] <_joe_> no
[09:01:01] <_joe_> let's depool early
[09:01:42] <_joe_> jelto: wait a sec, though, I'll copy a new version of the script with a bit of comments and some cosmetic improvements to the output
[09:01:53] ack :)
[09:02:06] <_joe_> actually, no, let me check one thing, and let's go with the current version
[09:02:17] works for me too
[09:02:27] <_joe_> before we launch it, we need to check if there are services currently just pooled in codfw
[09:02:45] <_joe_> my way of doing that is
[09:05:22] <_joe_> for dc in eqiad codfw; do confctl --object-type discovery select "name=$dc" get | grep -F '"pooled": true' | jq .tags | sort | uniq > $dc.pooled; done
[09:06:45] <_joe_> diffing the files, the only thing that's pooled only in codfw is kartotherian
[09:06:51] <_joe_> which we have in the exclude list
[09:08:35] I can confirm that. But what about docker-registry? That service is also pooled in codfw but not in eqiad
[09:08:57] ah, docker-registry is also in the exclude list
[09:09:44] <_joe_> yes
[09:13:07] <_joe_> so, I think we're good to go
[09:13:12] <_joe_> proceed at your convenience
[09:13:20] <_joe_> and !log to #operations when you start
[09:13:34] <_joe_> I decided against flooding the chat with stuff
[09:15:02] _joe_: okay. Just to make sure I don't miss anything, it's just /home/oblivian/dc-maint.sh depool codfw ?
[09:15:15] <_joe_> jelto: yes
[09:15:52] <_joe_> sorry, gods of automation. We'll finish the kubernetes rolling restart cookbook to placate you
[09:18:10] _joe_ is the cookbook running currently? I did not find anything in SAL
[09:18:41] <_joe_> no, I mean
[09:18:51] <_joe_> we'll finish making it work as it should :P
[09:20:11] ah, you mean the logic of dc-maint.sh should move to a kubernetes cookbook at some point?
[09:30:15] _joe_: I'm ready.
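The "pooled only in one DC" check _joe_ describes boils down to comparing two sorted lists of service names. A minimal self-contained sketch with made-up service names and illustrative filenames (the real lists come from the confctl one-liner above; `comm -13` prints entries found only in the second file):

```shell
# Sample data standing in for the confctl-generated lists; the service
# names and files here are illustrative, not real confctl output.
printf 'appservers-ro\nkartotherian\nswift-ro\n' > codfw.pooled
printf 'appservers-ro\nswift-ro\n' > eqiad.pooled

# comm -13 suppresses lines unique to the first file and lines common
# to both, leaving services pooled in codfw but not in eqiad.
comm -13 eqiad.pooled codfw.pooled
# → kartotherian
```

A plain `diff eqiad.pooled codfw.pooled` works too, as in the chat; `comm` just isolates the one-sided entries without diff markers. Both require the inputs to be sorted, which the `sort | uniq` in the one-liner guarantees.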
If you are ok I'm going to start the script and log in -operations
[09:30:27] <_joe_> jelto: +1
[09:38:51] _joe_: done, script finished
[09:39:21] <_joe_> ack, I'll check results in a bit
[09:39:51] great, thanks a lot :) I updated the status in the pad
[09:40:30] <_joe_> cool, thanks
[09:51:00] hi, it looks like mediawiki_access_log.mtail could be impacted by https://phabricator.wikimedia.org/T314922
[09:52:46] <_joe_> it would, in theory, but it won't in practice
[09:52:59] <_joe_> apache timings are in nanoseconds so we never get a proper 0 value
[09:53:14] <_joe_> but yeah, we should probably still add that bucket
[09:57:56] <_joe_> jelto: lgtm
[09:59:20] thanks!
[10:34:53] maps2010 and restbase stuff is down; sessionstore is still up because I want to leave that until closer to the work
[10:35:54] aiui we're still 100% safe to lose a sessionstore host, but given that there's one host per rack (hence ownership of 100%) I would like to avoid the exposure to risk for now
[10:43:55] <_joe_> +1
[13:37:06] _joe_, jelto: A heads up that Reuven may be late / unavailable today. If that's the case I'll reach out to Daniel.
[14:56:40] I'm around after all :) appreciate it though
[14:56:45] catching up now
[16:32:54] serviceops, Gerrit, SRE, serviceops-collab, and 2 others: replacement for gerrit2001, decom gerrit2001 - https://phabricator.wikimedia.org/T243027 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin2002 for hosts: `gerrit2001.wikimedia.org` - gerrit2001.wikimedia.org (**...
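The bucket discussion above can be sketched generically (this is not the actual mediawiki_access_log.mtail config, and the thresholds and input durations are made up): with a dedicated bucket for exact-zero values, a 0 sample is counted separately from small-but-nonzero timings, which is the case _joe_ notes never occurs in practice because apache reports nanoseconds.

```shell
# Illustrative only: classify sample durations (in seconds) into
# buckets, including an explicit bucket for exact-zero values.
printf '0\n0.003\n0.05\n' | awk '
  $1 == 0    { zero++;  next }   # the bucket under discussion
  $1 <= 0.01 { small++; next }   # small but nonzero timings
             { rest++ }
  END { printf "zero=%d le0.01=%d rest=%d\n", zero, small, rest }'
# → zero=1 le0.01=1 rest=1
```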
[17:46:21] serviceops, Gerrit, SRE, serviceops-collab, and 2 others: replacement for gerrit2001, decom gerrit2001 - https://phabricator.wikimedia.org/T243027 (Dzahn)
[17:46:29] serviceops, Gerrit, SRE, serviceops-collab, and 2 others: replacement for gerrit2001, decom gerrit2001 - https://phabricator.wikimedia.org/T243027 (Dzahn) In progress→Resolved gerrit2002 is production https://gerrit-replica.wikimedia.org gerrit2001 is shut down and fully decom'ed.
[18:31:50] weird, "Host parse[2019,2020] is not in mediawiki-installation dsh group" are both still crit despite the hosts being repooled
[18:36:34] rzl: the same thing happened previously with other appservers.. then I told Icinga to reschedule and waited longer and eventually it resolved
[18:36:48] I think it's just that the check doesn't run very often
[18:37:07] or it needs a puppet run on both conf* and alert*
[18:37:58] yeah, rescheduling didn't work but I figured it was something like that puppet run -- I'll check back in a little bit
[18:39:55] yea, I had the exact same pattern.. "why is it still crit".. "why does reschedule still not do it".. then.. it did
[18:59:21] serviceops, Community-Tech, Data-Persistence (Consultation), MediaWiki-extensions-Phonos, SRE: SRE/Data Persistence consultation — use of FSFileBackend for caching audio files - https://phabricator.wikimedia.org/T314789 (MusikAnimal) >>! In T314789#8139432, @Legoktm wrote: > I would recommend...
[20:57:41] serviceops, Community-Tech, Data-Persistence (Consultation), MediaWiki-extensions-Phonos, SRE: SRE/Data Persistence consultation — use of FSFileBackend for caching audio files - https://phabricator.wikimedia.org/T314789 (Legoktm) >>! In T314789#8140256, @TheDJ wrote: > You can use lame and/or...