[07:59:39] 06serviceops, 10MW-on-K8s, 10Observability-Metrics, 13Patch-For-Review, 10SRE Observability (FY2023/2024-Q4): Create a per-release deployment of statsd-exporter for mw-on-k8s - https://phabricator.wikimedia.org/T365265#9862563 (10fgiunchedi) >>! In T365265#9860976, @colewhite wrote: >>>! In T365265#98580... [08:21:01] 06serviceops, 06DC-Ops, 06Infrastructure-Foundations, 10ops-eqiad, 06SRE: hw troubleshooting: firmware upgrade for mw1358.eqiad.wmnet - https://phabricator.wikimedia.org/T366583#9862589 (10Clement_Goubert) Thanks! [08:34:51] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862620 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by cgoubert@cumin1002 from mw1358 to wikikube-worker1001 completed: - mw1358 (**PASS*... [09:02:32] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862714 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker1001.eqiad.wmnet with OS bullseye [09:29:51] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862785 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw1400 to wikikube-worker1008.eqiad.wmnet completed: - mw14... [09:32:00] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862794 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw1410 to wikikube-worker1010.eqiad.wmnet completed: - mw14... [09:40:25] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862837 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw1400 to wikikube-worker1008 completed: - mw1400 (**PASS**... [09:43:36] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862842 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw1401 to wikikube-worker1009 completed: - mw1401 (**PASS**... [09:43:46] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862843 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker1001.eqiad.wmnet with OS bullseye comp... [09:45:05] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862846 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw1410 to wikikube-worker1010 completed: - mw1410 (**PASS**... [09:45:23] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862847 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw1456 to wikikube-worker1012 completed: - mw1456 (**WARN**... [09:45:34] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862848 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw1456 to wikikube-worker1012 completed: - mw1456 (**WARN**... [09:46:13] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862849 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw1428 to wikikube-worker1011 completed: - mw1428 (**WARN**... [09:52:41] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862888 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw1428 to wikikube-worker1011 completed: - mw1428 (**PASS**... [09:55:39] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862897 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wikikube-worker1008.eqiad.wmnet with OS bullseye [09:55:53] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862899 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wikikube-worker1009.eqiad.wmnet with OS bullseye [09:56:07] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862900 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wikikube-worker1010.eqiad.wmnet with OS bullseye [09:57:26] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862901 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by hnowlan@cumin1002 from mw1456 to wikikube-worker1012 completed: - mw1456 (**PASS**... [10:00:20] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862939 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wikikube-worker1011.eqiad.wmnet with OS bullseye [10:00:27] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9862941 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wikikube-worker1012.eqiad.wmnet with OS bullseye [10:30:57] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9863063 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikikube-worker1008.eqiad.wmnet with OS bullseye compl... [10:34:16] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9863105 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikikube-worker1010.eqiad.wmnet with OS bullseye compl... [10:37:27] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9863121 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikikube-worker1012.eqiad.wmnet with OS bullseye compl... [10:52:29] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9863174 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikikube-worker1009.eqiad.wmnet with OS bullseye execu... [10:53:18] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9863183 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wikikube-worker1009.eqiad.wmnet with OS bullseye [10:53:56] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9863186 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikikube-worker1011.eqiad.wmnet with OS bullseye execu... [10:54:08] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9863187 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by hnowlan@cumin1002 for host wikikube-worker1011.eqiad.wmnet with OS bullseye [10:56:29] 06serviceops, 10Maps, 06SRE, 13Patch-For-Review: Move maps/karthoterian to PKI/cfssl - https://phabricator.wikimedia.org/T360778#9863179 (10MoritzMuehlenhoff) 05Open→03Resolved a:05jijiki→03MoritzMuehlenhoff maps is now using cfssl. [11:27:22] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9863274 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikikube-worker1009.eqiad.wmnet with OS bullseye compl... [11:32:02] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9863290 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by hnowlan@cumin1002 for host wikikube-worker1011.eqiad.wmnet with OS bullseye compl... [11:36:41] 06serviceops, 10Sustainability (Incident Followup): 2024-04-17 mw-on-k8s eqiad outage - https://phabricator.wikimedia.org/T362766#9863312 (10jijiki) [11:36:42] 06serviceops, 10MW-on-K8s, 10MediaWiki-Platform-Team (Radar): mcrouter daemonset on mw-on-k8s - https://phabricator.wikimedia.org/T346690#9863313 (10jijiki) [12:36:52] hello folks! [12:37:01] I have https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1038782 lined up for the sre k8s reboot nodes cookbook [12:37:46] do you prefer to review or should I proceed? [12:38:49] 06serviceops, 07Wikimedia-production-error: registry2004 sometimes reporting: too many open files problems - https://phabricator.wikimedia.org/T366481#9863503 (10TheDJ) I suspect this is the docker registry ? [12:40:17] !oncall [12:41:29] cc: claime (for the patch above, since you are using the cookbook atm) [12:42:48] Yeah gimme a minute [12:43:05] (I've been testing your patch and to me no regression, so I'll +1 it, just need to finish something first) [12:45:42] oh okok no rush! Anytime, I wanted to ping you before pulling the trigger since you were working on it [12:46:26] thanksss [12:46:27] +1´d, merge at your leisure, I won't restart the cookbook until after the backport window [12:46:45] (and I'm using my test version anyways since I need the exclusion :P) [13:25:58] 06serviceops, 07Wikimedia-production-error: registry2004 sometimes reporting: too many open files problems - https://phabricator.wikimedia.org/T366481#9863726 (10Clement_Goubert) Thanks for the report. I think this is from `rsyslog-kafka`, may be related to the file descriptor leak we saw in {T357616} >>! In... [14:22:42] 06serviceops, 07Wikimedia-production-error: registry2004 sometimes reporting: too many open files problems - https://phabricator.wikimedia.org/T366481#9863929 (10JMeybohm) p:05Triage→03High Unfortunately this is actually nginx complaining: ` 2024/06/03 07:25:35 [crit] 4554#4554: accept4() failed (24: Too... [14:29:08] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9863961 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wikikube-ctrl1001.eq... [14:33:34] 06serviceops, 10Cloud-Services, 06SRE, 13Patch-For-Review: Modernise memcached systemd unit / sync, and make it presentable - https://phabricator.wikimedia.org/T273950#9863991 (10MoritzMuehlenhoff) [14:57:15] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864036 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host wikikube-ctrl1001.eqiad.... [14:57:17] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864037 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wikikube-ctrl1001.eq... [15:05:59] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864091 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host wikikube-ctrl1001.eqiad.... [15:34:55] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864232 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wikikube-ctrl1001.eq... [15:46:10] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864265 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host wikikube-ctrl1001.eqiad.... [15:52:55] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864316 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wikikube-ctrl1001.eq... [16:01:26] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864355 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host wikikube-ctrl1001.eqiad.... [16:10:42] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864398 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wikikube-ctrl1001.eq... [16:35:00] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864582 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host wikikube-ctrl1001.eqiad.... [16:38:56] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864607 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wikikube-ctrl1001.eq... [16:46:05] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864674 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host wikikube-ctrl1001.eqiad.... [17:04:35] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864820 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wikikube-ctrl1001.eq... [17:12:55] 06serviceops, 06DC-Ops, 10ops-eqiad, 06SRE-OnFire, 10Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9864877 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host wikikube-ctrl1001.eqiad.... [22:49:04] 06serviceops, 10Cassandra, 06Data Products, 06SRE, and 2 others: Commons Impact Metrics: Data Gateway endpoints - https://phabricator.wikimedia.org/T364921#9866001 (10Scott_French) [23:09:29] 06serviceops, 10Cassandra, 06Data Products, 06SRE, and 2 others: Commons Impact Metrics: Data Gateway endpoints - https://phabricator.wikimedia.org/T364921#9866023 (10Scott_French) The service is turned up in staging and was verified against the commons impact metrics dataset present in cassandra staging a...