[07:36:29] <marostegui>	 In 25 minutes I will failover m2 master, if you see issues on debmonitor, otrs, recommendationsapi, mwaddlink, xhgui, let me know
[07:58:12] <marostegui>	 Going to start in 2 minutes
[08:01:45] <Emperor>	 godog: I'm trying to tell VO about my mobile, via the "Add Contact Method" UI in my profile. I put "mobile" in the name and type in my cell number, and get "Oh Snap! something went wrong". Is this a known thing, or should I mail victorops-support and ask them?
[08:04:49] <godog>	 Emperor: not known no, seems like a VO problem though, yeah support is generally responsive
[08:05:02] <marostegui>	 m2 failover is done, if you see issues on debmonitor, otrs, recommendationsapi, mwaddlink, xhgui, let me know
[08:05:08] <godog>	 Emperor: let us know how that goes, we can poke support too if needed
[08:07:31] <Emperor>	 ta, have mailed them
[08:48:42] <RhinosF1>	 volans: hi
[08:51:49] <volans>	 hello
[08:54:12] <RhinosF1>	 volans: I replied on patch
[08:54:40] <RhinosF1>	 I assume you'd deploy after MediaWiki lost s10 from existence but I don't really understand how conftool works
[09:02:07] <volans>	 RhinosF1: AFAIK we'll need to revert the process outlined at https://wikitech.wikimedia.org/wiki/Dbctl#Add_new_core_section so that patch will be the last bit, hence it seemed a bit premature. Depending on what the DBs currently serving s10 will do we might need (or not) to remove them too.
[09:03:22] <RhinosF1>	 Manuel will know about instances.yaml I guess and whether dbs need to stay
[09:03:34] <RhinosF1>	 It's very early. I don't think it's been done until after switchback
[09:03:55] <marostegui>	 yeah it won't be done till the switch back, s10 will have no databases
[09:04:54] <RhinosF1>	 marostegui: what will happen to those dbs though?
[09:05:12] <marostegui>	 RhinosF1: s10 only serves labswiki, and labswiki will go to s6
[09:05:15] <marostegui>	 hence s10 will be empty
[09:05:30] <marostegui>	 The servers, will keep running with other databases (as that hardware also serves m5)
[09:05:52] <RhinosF1>	 So we don't need to remove the dbxxxx entries from conftool then ?
[09:06:11] <RhinosF1>	 Or is conftool just MediaWiki
[09:07:01] <marostegui>	 It is just mediawiki
[09:07:04] <marostegui>	 m5 isn't handled by conftool
[09:07:17] <RhinosF1>	 Right so we can remove them from conftool
[09:07:28] <RhinosF1>	 I assume you know which dbs are in s10
[09:07:44] <marostegui>	 RhinosF1: Yes, as I said, labswiki is the only MW database living on s10
[09:07:59] <RhinosF1>	 No which dbxxxx servers
[09:13:10] <marostegui>	 yes
[09:33:08] <RhinosF1>	 We can update instances.yaml then closer to time
[09:33:19] <RhinosF1>	 (I don't have laptop working so can't now)
[09:42:53] <jynus>	 him I filed T288212 about a strand md thing I run into, if someone is more familiar than me with software raid could comment to know if this is something we should worry about or a normal behaviour?
[09:42:53] <stashbot>	 T288212: A few hosts on production with software raid (md) have partitions in resync=PENDING status - https://phabricator.wikimedia.org/T288212
[09:43:06] <jynus>	 *strange
[09:57:30] <kormat>	 jynus: replied on task
[10:00:16] <jynus>	 kormat, thank's that's very useful!
[10:01:54] <jynus>	 now the worry is- is that the case in all hosts (e.g. swap or unused fs) or there could be some recently replaced disks that were not correctly synced?
[10:02:48] <XioNoX>	 We have one of the eqiad-codfw links down, please refrain from doing any out of the ordinary transfer between the two sites until it's fixed. cf. https://phabricator.wikimedia.org/T288218
[10:04:45] <kormat>	 jynus: i just did a survey - every machine that has resync=PENDING also has auto-read-only set
[10:04:59] <jynus>	 cool, then I think I will close it as invalid
[10:05:07] <kormat>	 (this isn't _100%_ conclusive. my check doesn't know if the resync and auto-read-only are for the same array or not, but it's probably pretty close)
[10:05:08] <jynus>	 thanks for the input!
[10:05:12] <kormat>	 np!
[10:05:35] <jynus>	 yeah, but I agree, if it is auto-read-only I agree it is not an issue
[10:06:57] <Emperor>	 +1 to this
[10:08:35] <jynus>	 thanks for the comments, I got a bit worried at first with the scary "no sync" messages
[10:08:56] <jynus>	 specially on backups hosts!
[10:09:04] <jynus>	 thanks to both!
[13:47:41] <btullis>	 Hello. I'd be grateful for any SRE input on this ticket please? Not urgent by any means. It's about whether it's possible to enable lingering for user accounts. https://phabricator.wikimedia.org/T268985#7263287
[13:47:44] <btullis>	 Thanks.
[13:52:22] <dcaro>	 hashar: maybe you know who to ping, I have puppet failing on some VMs under the VPS project 'integration', I can't ssh to them to debug myself, do you know who handles those? I think you created them a long time ago xd
[13:52:48] <volans>	 btullis: you most likely want at least Moritz's input (currently not here, his client quittes 2h ago AFAICT)
[13:56:15] <btullis>	 volans: Thank you. Moritz is already an observer on the ticket, so maybe that's fine. There's no hurry, it's just an open question and I thought I'd just ask anyone who might have an opinion.
[13:58:15] <majavah>	 hashar: side note to d.caros notice, we have a very experimental shared prometheus instance in wmcs which monitors both deployment-prep and integration projects, if you want I can send alerts (mostly instance down and puppet failures at this point) for those projects to #-releng/related mailing lists instead of cloud channels
[13:59:32] <hashar>	 majavah: don't we already got those from WMCS?
[13:59:56] <hashar>	 I do get puppet failure email for the project I am admin for
[14:20:13] <dcaro>	 yep, that alert is a bit special, might get deprecated if we move to alertmanager+prometheus
[14:23:12] <dcaro>	 hashar: should I pass the puppet failing tasks to you then?
[14:23:30] <hashar>	 dcaro: for the integration instances failing puppet:  #continuous-integration-infrastructure
[14:23:34] <hashar>	 will check
[14:23:48] <dcaro>	 yep, thanks!
[14:30:11] <majavah>	 hashar: yes, but my fancy alerts don't rely on the VM having enough resources to send those themselves (which isn't the case surprisingly often), don't take up to 24h for the timed job to notice, plus as a bonus you'll get notifications when there are local patches in the puppet or private repos that have merge conflicts
[14:38:30] <hashar>	 majavah: nice :]
[17:22:38] <Krinkle>	 rzl: effie: drafted a new sync doc with some updates, perhaps we our teams can meet again soon? https://docs.google.com/document/d/1EVBFol4pEE6A_YBw5RW3dH4i8QdpPndQDIHYyshQbmc/edit
[18:39:02] <robh>	   /whois chrisalbon
[18:39:11] <robh>	 bah, meant to pull for idle info, 
[18:55:41] <rzl>	 Krinkle: sure -- e.ffie is on vacation until next week FYI
[18:58:40] <rzl>	 and I'm happy to join that meeting but not sure you need me :) I was working on the on-host tier initially but it's pretty much all hers lately, and legoktm is your better go-to for switchdc/multi-dc
[18:59:07] <rzl>	 so, feel free to include me or not, as you like
[19:05:45] <Krinkle>	 ack, done.
[19:07:18] <Krinkle>	 I've marked you as optional. It's at the edge of your workday (for all of us actually, differnet edges)
[19:13:42] <Krinkle>	 kormat: marostegui: Can you confirm https://phabricator.wikimedia.org/T280599#7264655 from your perspectives?
[19:17:29] <rzl>	 👍
[19:18:01] <rzl>	 (re scheduling - no input from me on parsercache)
[20:00:47] <marostegui>	 Krinkle: we are waiting for them to warm up yes, I think we are going to pool them around 12th August 
[21:13:54] <brennen>	 thinking about gitlab groups.  should the "operations" namespace carry over from gerrit to gitlab?
[21:14:24] <brennen>	 (inclined to assume yes, but yell at me if there are reasons to do something different)
[21:18:11] <legoktm>	 brennen: I think so, assuming that we'll be able to rename/move stuff around once we get settled in
[21:18:52] <legoktm>	 it probably should be renamed to "SRE" and some things like operations/mediawiki-config would no longer fit under that name
[21:19:36] <James_F>	 +1. Maybe call that wikimedia/production-config or something.
[21:20:00] <James_F>	 But top-level namespaces of mediawiki, SRE, analytics, WMCS make sense.
[21:20:55] <brennen>	 ::nod::
[21:21:27] <legoktm>	 it sure would be nice if everything could be imported using the same name as Gerrit first and then moved around to new naming conventions so it would be trivial to map between the two services and rely on gitlab's redirects
[21:22:42] <brennen>	 that's sort of an interesting idea.  i'll have to see if there are any obvious downsides.
[21:22:55] <brennen>	 we'll want some kind of mapping, at any rate.
[21:32:40] <Reedy>	 does the gitlab renaming/redirect work like the github one does?
[21:35:44] <legoktm>	 yeah, that's been my experience on Salsa and gitlab.com
[21:36:36] <legoktm>	 e.g. https://salsa.debian.org/legoktm-guest/zimwriterfs
[21:36:42] <legoktm>	 "Project 'legoktm-guest/zimwriterfs' was moved to 'legoktm/zimwriterfs'. Please update any links and bookmarks that may still have the old path."
[21:37:20] <James_F>	 Neat.
[21:37:48] <legoktm>	 IIRC at the time you couldn't directly transfer a project between users and had to launder it through an organization instead, but that was ~2 years ago
[21:38:04] <James_F>	 There are so many mis-named repos I'm really looking forward to fixing.
[21:38:47] <legoktm>	 poor GeSHi
[21:39:12] <James_F>	 Oh yes, that too, but I was thinkings of the labs/ hierarchy and all the top-level repos which should live under mediawiki/libs/.
[21:39:24] <James_F>	 Or wikimedia/design/wvui.
[21:39:26] <James_F>	 Etc.
[21:39:40] <bd808>	 monorepo. all problems solved
[21:39:52] <James_F>	 bd808: Now you have E_TOOMANYREFERENCES problems.
[21:40:59] <James_F>	 Some of our repos like mediawiki/extensions.git presumably won't migrate, as GitLab doesn't have the auto-sub-module-updater concept?
[21:41:53] <bd808>	 that particular monorepo is so gross anyway I hope nobody will miss it
[21:42:26] <bd808>	 Cnad tried to convince me to love it years ago, but that never took
[21:42:30] <bd808>	 *Chad
[21:42:31] <James_F>	 I quite like it, but I'm odd (in making sets of small changes to dozens of repos, but also generally).
[21:43:06] <bd808>	 useful odd, not scary odd :)
[21:43:28] * James_F grins.
[21:44:40] <legoktm>	 probably we'll need a bot for auto-submodule bumps
[21:44:48] <legoktm>	 our core backport process still relies on it
[21:46:06] <James_F>	 Well, we've now got the pinkunicorn magic.
[21:46:12] <James_F>	 Maybe that'll be sufficient?
[21:46:38] <James_F>	 We just have to switch production to m8s before we switch MW development to GitLab, which seems likely.
[21:47:43] <legoktm>	 That still just includes what's in the wmf.XX branches
[21:47:59] <James_F>	 Oh, you mean for REL1_XX?
[21:48:14] <James_F>	 Yeah, I suppose, unless Reedy really like manual bumps.
[21:53:17] <Reedy>	 It's not a major issue...
[21:53:24] <Reedy>	 git submodule foreach git pull
[21:53:26] <Reedy>	 that sort of thing
[21:53:34] <Reedy>	 (we could create some utility scripts)
[21:55:14] <James_F>	 In some ways it'd be nice if there was a manual part of the update.
[21:55:44] <James_F>	 If you're back-porting something into the tarball you should add a line to the RELEASE-NOTES explaining what changed, so sysadmins aren't totally surprised, etc.
[22:35:53] <legoktm>	 what tools/software are people using to make all the pretty infographics/diagrams on https://wikitech.wikimedia.org/wiki/Infographics ?
[22:48:58] <brennen>	 fwiw it's on my ever-expanding list to figure out the submodule bumping thing: https://phabricator.wikimedia.org/T268283
[23:49:56] <hashar>	 legoktm: I go with Google Drawings which is good enough for small one off graphs
[23:54:24] <hashar>	 at job - 1 I used Microsoft Visio which was quite great
[23:55:00] <hashar>	 I have done a bit of Dia (open source) https://en.wikipedia.org/wiki/Dia_(software)   , but I found it a bit boring to get a nice rendering
[23:57:00] <hashar>	 the graphs created by Timo on that page, I believe he handcrafts them