[09:13:30] fyi, I'm rebooting cr1-magru for software upgrade
[09:15:17] ack!
[09:24:06] and back up
[09:39:16] fabfur: time for cr2's upgrade, ~10min downtime
[09:39:48] ack! I don't think anyone is working on magru on our team
[09:39:53] (pretty sure)
[09:40:22] and it shouldn't impact any work as we have redundancy
[09:40:30] just some alerting noise from adjacent devices
[09:51:00] fabfur: all done :)
[09:51:46] tnx!
[10:33:38] dear SREs with K8s interests. Please do not deploy in the admin namespace until further notice
[10:34:06] effie: Ack, thanks for the heads-up.
[10:34:19] cheers !
[12:13:12] Who owns the golang production image? I am doing a rebuild of the production images and ERROR: image docker-registry.discovery.wmnet/golang failed to build, see logs for details
[12:51:34] Logs suggest the problem is a missing "golang-1.14" package (which would fit, as I don't think that's in Debian proper)
[13:40:55] !incidents
[13:40:56] 4647 (UNACKED) db1189 (paged)/MariaDB Replica SQL: s3 (paged)
[13:40:56] 4648 (UNACKED) db1175 (paged)/MariaDB Replica SQL: s3 (paged)
[13:41:00] !ack 4647
[13:41:00] 4647 (ACKED) db1189 (paged)/MariaDB Replica SQL: s3 (paged)
[13:41:02] !ack 4648
[13:41:02] 4648 (ACKED) db1175 (paged)/MariaDB Replica SQL: s3 (paged)
[13:41:12] o/, known issue, it looks like?
[13:41:13] marostegui: do we need to be worried?
[13:41:20] I will take care of that
[13:41:28] <3
[13:42:58] Fixed both hosts
[13:47:40] This is the task for those two hosts: https://phabricator.wikimedia.org/T364004
[13:59:57] thanks Manuel
[14:27:16] hello on-callers, if everybody is ok I would like to move Cassandra instances on session store nodes to PKI
[14:28:02] kask shouldn't care (more details in https://phabricator.wikimedia.org/T361964) and this is the last cluster that we move
[14:28:09] so the procedure is battle tested etc..
[14:28:16] I'll do one node first, then codfw, then eqiad
[14:28:31] any concerns?
[14:29:37] (also urandom is aware, I am not doing any rogue upgrade :D)
[14:30:10] Lies!! Someone stop him!
[14:30:16] * urandom is joking
[14:30:52] No one stop him, that was only a joke :)
[14:32:05] :D
[14:46:21] ok ready to move the first instance, I'll work on sessionstore2004
[14:48:31] read.
[14:48:33] ready.
[14:49:10] elukey: I'm around in the hopefully-very-unlikely event that you have cfssl difficulties
[14:49:28] <3
[14:50:14] restarting cassandra on 2004 with the new certs
[14:55:52] elukey: lgtm, you can see the handful of expected log entries associated with the restart (client connection errors and reconnects), and nothing more
[14:55:59] and people are still editing :)
[14:56:00] urandom: I am checking the kask logs in https://logstash.wikimedia.org/goto/60bbdda50d64699e14a7b5e7435638bb, I see some handshake errors but nothing weird
[14:56:07] ah nice, same time :D
[14:56:17] urandom: green light for 2005 and 2006?
[14:56:18] TLS handshake errors are normal (or some are)
[14:56:47] 'http: TLS handshake error from...' <-- that
[14:56:51] exactly yes
[14:56:55] okok proceeding
[14:56:59] proceed!
[14:59:58] Dear k8s caring SREs, you can make admin changes again, I am done for the day
[15:02:47] restarts in progress :)
[15:14:18] urandom: codfw done, green light for eqiad?
[15:14:26] elukey: the board is green
[15:14:46] super, running puppet and then kicking off the cookbook
[15:15:13] marostegui: should I use a cookbook or something else in your opinion?
[15:15:36] elukey: whatever you use, run it from cumin
[15:15:52] right right
[15:16:20] elukey: you can check the cumin IP on netbox if you need to
[15:17:25] marostegui: and then verify the router config via homer, right right
[15:19:05] Exactly, with that you are all set. You might be able to automate all that via spicerack too
[15:19:15] elukey: give it a thought ^
[15:20:13] marostegui: amazing
[15:33:11] urandom: completed!
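[Editor's note: as discussed above, 'http: TLS handshake error from ...' entries are expected noise while clients reconnect during a cert rollout, and anything else TLS-related deserves a closer look. A minimal triage sketch of that check; the log line format here is an assumption for illustration, not kask's actual schema:]

```python
# Hedged sketch: split kask-style log lines during a cert rollout into
# expected reconnect noise vs. TLS-related lines worth investigating.
# The sample log strings below are made up, not real kask output.

EXPECTED = "http: TLS handshake error from"

def triage(lines):
    expected, suspicious = [], []
    for line in lines:
        if EXPECTED in line:
            expected.append(line)          # normal during client reconnects
        elif "tls" in line.lower():
            suspicious.append(line)        # TLS-related but not the known pattern
    return expected, suspicious

logs = [
    "http: TLS handshake error from 10.0.0.1:5432: EOF",
    "request served in 12ms",
    "tls: bad certificate from 10.0.0.9:1234",
]
exp, sus = triage(logs)
print(len(exp), len(sus))  # prints: 1 1
```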
\o/
[15:33:16] all cassandra clusters on PKI
[15:33:23] \o/
[15:34:17] elukey: I owe you many ${beverage}, that was a bunch of work
[15:34:40] urandom: I accept only if we drink the beverages together :)
[15:34:48] of course, yes :)
[15:34:55] while biking :D
[15:35:28] urandom: last one if you have time https://gerrit.wikimedia.org/r/c/operations/puppet/+/1025811
[15:35:29] all the Maurten you can drink!
[15:36:46] elukey: +1('d)
[15:40:22] thanksss
[15:46:21] we are going to pool the new DNS hosts in magru for authdns-updates. if you see anything weird when running DNS cookbooks or stuff, please call it out here (and blame me for it)
[16:17:01] (going afk for today, sessionstore and kask look good afaics, ping me in case anything weird pops up, I'll check later o/)
[16:17:56] thanks again
[17:38:53] looks like the apertium LB pool is shown at https://config-master.wikimedia.org/pybal/eqiad/apertium but not in https://config-master.wikimedia.org/pools.json . Is this on purpose? It's a kubernetes-based service FWIW
[17:52:15] actually I'm finding quite a few LB pools that exist in the config-master subdirectory like "pybal/eqiad/$service" but don't exist in pools.json . Still unsure if it's an actual problem
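[Editor's note: the pools.json discrepancy reported above can be checked mechanically. A sketch, assuming the per-service pool names under pybal/<dc>/ and the top-level keys of pools.json have already been fetched from config-master; the service names below are stand-ins for illustration, except apertium, which comes from the log:]

```python
# Hedged sketch: diff the set of pool names published under pybal/<dc>/
# against the keys present in pools.json. In practice the two inputs would
# be fetched from config-master.wikimedia.org; here we use stand-in data.

def diff_pools(pybal_pools, pools_json_keys):
    pybal = set(pybal_pools)
    joint = set(pools_json_keys)
    return {
        "missing_from_pools_json": sorted(pybal - joint),
        "missing_from_pybal": sorted(joint - pybal),
    }

# Example with stand-in names (only apertium is from the log above):
result = diff_pools(
    ["apertium", "servicefoo", "servicebar"],
    ["servicefoo", "servicebar"],
)
print(result["missing_from_pools_json"])  # prints: ['apertium']
```

Running this against each datacenter's listing would show whether the mismatch is isolated to a few kubernetes-based services or systematic.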