[00:08:22] PROBLEM - Check unit status of eventlogging_to_druid_network_internal_flows-sanitization_daily on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_network_internal_flows-sanitization_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:08:28] PROBLEM - Check unit status of eventlogging_to_druid_network_internal_flows_daily on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_network_internal_flows_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:36:52] (PS10) AGueyte: WIP: Basic ipinfo instrument setup [schemas/event/secondary] - https://gerrit.wikimedia.org/r/753548 (https://phabricator.wikimedia.org/T296415)
[06:37:15] (CR) AGueyte: WIP: Basic ipinfo instrument setup (4 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/753548 (https://phabricator.wikimedia.org/T296415) (owner: AGueyte)
[06:37:36] (CR) jerkins-bot: [V: -1] WIP: Basic ipinfo instrument setup [schemas/event/secondary] - https://gerrit.wikimedia.org/r/753548 (https://phabricator.wikimedia.org/T296415) (owner: AGueyte)
[08:43:11] FYI, I'll be rebooting the VMs running Turnilo/Hue/Yarn in the next ~ 15 minutes for a maintenance task of our virtualisation cluster, each individual downtime should be brief (1-2 mins per server)
[08:44:33] +1 from my side
[08:50:31] ack, starting with those now
[09:05:20] all done
[09:06:40] I'm also rebooting the VM parts of an-test* in a bit: an-test-client1001.eqiad.wmnet an-test-druid1001.eqiad.wmnet an-test-presto1001.eqiad.wmnet an-test-ui1001.eqiad.wmnet
[09:07:22] +1
[09:31:07] PROBLEM - Check unit status of eventlogging_to_druid_navigationtiming_hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[09:46:22] (CR) ZPapierski: [C: +1] rdf-streaming-updater: add a "reconcile" operation [schemas/event/secondary] - https://gerrit.wikimedia.org/r/737429 (https://phabricator.wikimedia.org/T279541) (owner: DCausse)
[10:02:08] I'm also restarting matomo1002 (piwik.wikimedia.org) in a bit
[10:02:48] RECOVERY - Check unit status of eventlogging_to_druid_navigationtiming_hourly on an-launcher1002 is OK: OK: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:52:37] I'm also restarting archiva1002 (archiva.wikimedia.org) in a bit
[11:04:42] super
[11:05:02] matomo1002 may have needed a cleaner shutdown (since it hosts a mysql db) but generally it is ok
[11:05:26] yeah mariadb is fine on it
[12:53:46] (CR) Phuedx: WIP: Basic ipinfo instrument setup (3 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/753548 (https://phabricator.wikimedia.org/T296415) (owner: AGueyte)
[12:59:48] (CR) Phuedx: WIP: Basic ipinfo instrument setup (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/753548 (https://phabricator.wikimedia.org/T296415) (owner: AGueyte)
[14:14:33] 26*0.75
[14:14:36] oops :)
[14:14:46] 19.5 :)
[14:30:48] (CR) Phuedx: "Sorry for the multiple sets of comments 😅 I was trying to get the user_groups property working locally and uncovered a flaw in the task th" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/753548 (https://phabricator.wikimedia.org/T296415) (owner: AGueyte)
[15:02:50] Data-Engineering, Generated Data Platform, Platform Engineering, SRE, Patch-For-Review: Import Debian package of Cassandra 3.11.11 as 'dev' version - https://phabricator.wikimedia.org/T298805 (MoritzMuehlenhoff) I added component/cassandradev for buster and stretch. For the import we can eith...
[15:11:02] Analytics-Radar, WMDE-Technical-Wishes-Maintenance, WMDE-Templates-FocusArea, Patch-For-Review, WMDE-TechWish (Sprint-2021-02-03): Add missing normalization to CodeMirror Grafana board - https://phabricator.wikimedia.org/T273748 (thiemowmde)
[15:25:08] I keep forgetting that people explicitly ask for stuff that I say "in theory" somebody wants: https://phabricator.wikimedia.org/T221397
[15:25:14] (link history in this case)
[16:41:14] Data-Engineering, Generated Data Platform, Platform Engineering, SRE: Import Debian package of Cassandra 3.11.11 as 'dev' version - https://phabricator.wikimedia.org/T298805 (Eevans) >>! In T298805#7622471, @MoritzMuehlenhoff wrote: > I added component/cassandradev for buster and stretch. For the...
[17:32:06] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Epic: Run Atlas on cloud services cluster - https://phabricator.wikimedia.org/T299166 (Ottomata) Nice
[19:37:56] ottomata: can I just start sending events, or should I wait for https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/745914/ to be merged first?
[20:26:16] (PS11) AGueyte: WIP: Basic ipinfo instrument setup [schemas/event/secondary] - https://gerrit.wikimedia.org/r/753548 (https://phabricator.wikimedia.org/T296415)
[20:27:02] (CR) jerkins-bot: [V: -1] WIP: Basic ipinfo instrument setup [schemas/event/secondary] - https://gerrit.wikimedia.org/r/753548 (https://phabricator.wikimedia.org/T296415) (owner: AGueyte)
[20:44:04] (PS12) AGueyte: WIP: Basic ipinfo instrument setup [schemas/event/secondary] - https://gerrit.wikimedia.org/r/753548 (https://phabricator.wikimedia.org/T296415)
[20:44:24] (CR) AGueyte: WIP: Basic ipinfo instrument setup (5 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/753548 (https://phabricator.wikimedia.org/T296415) (owner: AGueyte)
[20:44:37] (CR) jerkins-bot: [V: -1] WIP: Basic ipinfo instrument setup [schemas/event/secondary] - https://gerrit.wikimedia.org/r/753548 (https://phabricator.wikimedia.org/T296415) (owner: AGueyte)
[23:24:01] PROBLEM - Hadoop NodeManager on an-worker1138 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[23:43:17] RECOVERY - Hadoop NodeManager on an-worker1138 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process