[00:19:12] PROBLEM - Check systemd state on an-web1001 is CRITICAL: CRITICAL - degraded: The following units failed: hardsync-published.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:00:16] RECOVERY - Check systemd state on an-web1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:41:58] PROBLEM - Check unit status of analytics-dumps-fetch-unique_devices on clouddumps1002 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-unique_devices https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:52:02] PROBLEM - Check unit status of analytics-dumps-fetch-mediacounts on clouddumps1001 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-mediacounts https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:59:38] PROBLEM - Check unit status of analytics-dumps-fetch-pageview on clouddumps1002 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-pageview https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:01:48] PROBLEM - Check unit status of analytics-dumps-fetch-pageview on clouddumps1001 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-pageview https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:06:40] PROBLEM - Check unit status of analytics-dumps-fetch-clickstream on clouddumps1001 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-clickstream https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:08:00] PROBLEM - Check unit status of analytics-dumps-fetch-clickstream on clouddumps1002 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-clickstream https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:25:36] PROBLEM - SSH on analytics1076.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:38:38] RECOVERY - Check unit status of analytics-dumps-fetch-unique_devices on clouddumps1002 is OK: OK: Status of the systemd unit analytics-dumps-fetch-unique_devices https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:48:46] RECOVERY - Check unit status of analytics-dumps-fetch-mediacounts on clouddumps1001 is OK: OK: Status of the systemd unit analytics-dumps-fetch-mediacounts https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:56:22] RECOVERY - Check unit status of analytics-dumps-fetch-pageview on clouddumps1002 is OK: OK: Status of the systemd unit analytics-dumps-fetch-pageview https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:58:34] RECOVERY - Check unit status of analytics-dumps-fetch-pageview on clouddumps1001 is OK: OK: Status of the systemd unit analytics-dumps-fetch-pageview https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:04:32] RECOVERY - Check unit status of analytics-dumps-fetch-clickstream on clouddumps1002 is OK: OK: Status of the systemd unit analytics-dumps-fetch-clickstream https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:14:32] RECOVERY - Check unit status of analytics-dumps-fetch-clickstream on clouddumps1001 is OK: OK: Status of the systemd unit analytics-dumps-fetch-clickstream https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:26:52] RECOVERY - SSH on analytics1076.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:41:46] I'm seeing some stale records in debmonitor for an-presto1007/1009/1011, they are kept in Netbox as planning, so these are probably from an aborted installation? I'll go remove them, then. when the servers get reinstalled later, that'll recreate fresh debmonitor records
[08:53:24] Data-Engineering, DC-Ops, SRE, ops-eqiad: Q4:(Need By: TBD) rack/setup/install an-presto10[06-15].eqiad.wmnet - https://phabricator.wikimedia.org/T306835 (BTullis)
[09:05:04] moritzm: Thanks, I think you're right. I remember that these new an-presto hosts were affected by that PERC H750 issue early on, which is why two are insetup and the others aren't fully installed. I'll pick this up and get it finished.
[09:08:18] ack, we're currently in the process of finishing up the RAID monitoring for Perc H750, which is last missing bit for their support
[09:14:59] (CR) Nik Gkountas: [C: +2] content_translation_event: Add new event_source for new section entrypoints [schemas/event/secondary] - https://gerrit.wikimedia.org/r/824762 (https://phabricator.wikimedia.org/T287403) (owner: MNeisler)
[09:16:09] (Merged) jenkins-bot: content_translation_event: Add new event_source for new section entrypoints [schemas/event/secondary] - https://gerrit.wikimedia.org/r/824762 (https://phabricator.wikimedia.org/T287403) (owner: MNeisler)
[10:18:07] Data-Engineering, Event-Platform Value Stream (Sprint 00), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (gmodena) >>! In T308017#8175110, @Ottomata wrote: > Option A: `content_slots` field that only represen...
[12:20:03] Data-Engineering, Patch-For-Review: Migrate Kafka prometheus alerts from Icinga to Alertmanager - https://phabricator.wikimedia.org/T309010 (fgiunchedi) Alerts are in place, leaving the task open to see how it goes and I'll upgrade the necessary alerts to paging after a test period
[12:52:14] (CR) Joal: "I added comments on first request, they apply to all queries :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/812095 (https://phabricator.wikimedia.org/T311507) (owner: NOkafor)
[13:00:10] Data-Engineering, Data Pipelines: airflow instances should use specific artifact cache directories - https://phabricator.wikimedia.org/T315374 (Ottomata) CC @fkaelin
[13:04:04] Analytics-Radar, Machine-Learning-Team, SRE: Running docker containers in a non-production environment - https://phabricator.wikimedia.org/T275551 (Ottomata) > will it be possible to consume e.g. events from kafka infra, or read/write to swift? Nopers :/ > Is this the recommended way for running co...
[13:09:12] (CR) Joal: [V: +2 C: +2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/820707 (https://phabricator.wikimedia.org/T314646) (owner: Gerrit maintenance bot)
[13:10:42] (CR) Joal: [V: +2 C: +2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/820705 (https://phabricator.wikimedia.org/T314640) (owner: Gerrit maintenance bot)
[13:12:52] (PS2) Joal: Add ig.wikiquote to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/820705 (https://phabricator.wikimedia.org/T314640) (owner: Gerrit maintenance bot)
[13:13:16] (CR) Joal: [V: +2 C: +2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/820705 (https://phabricator.wikimedia.org/T314640) (owner: Gerrit maintenance bot)
[13:16:51] (CR) Joal: [C: +1] "I assume the files have been changes automatically and without breaking stuff :) I don't have any context on potential readers of those js" [schemas/event/primary] - https://gerrit.wikimedia.org/r/823700 (https://phabricator.wikimedia.org/T308450) (owner: Ottomata)
[13:30:28] Analytics-Radar, Machine-Learning-Team, SRE: Using docker in WMF production network outside of kubernetes - https://phabricator.wikimedia.org/T275551 (fkaelin)
[13:31:51] Analytics-Radar, Machine-Learning-Team, SRE: Using docker in WMF production network outside of kubernetes - https://phabricator.wikimedia.org/T275551 (fkaelin) > I wonder an even more useful title would be "Using docker in WMF production network outside of kubernetes", as this is the real issue. Goo...
[13:45:44] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Fix turnilo after upgrade - https://phabricator.wikimedia.org/T308778 (JAllemandou) >>! In T308778#8119297, @ayounsi wrote: > @BTullis could it be possible to fix https://turnilo.wikimedia.org/#network_flows_internal/ until the overall issu...
[13:54:26] Data-Engineering, Event-Platform Value Stream (Sprint 00), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Protsack.stephan) >>! In T308017#8174867, @Ottomata wrote: > Thanks @Protsack.stephan. IIUC then, your...
[13:54:28] (CR) Joal: Cleanup dependencies of core/pom.xml file (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/780886 (https://phabricator.wikimedia.org/T306193) (owner: Aqu)
[14:02:39] Data-Engineering, Event-Platform Value Stream (Sprint 00), Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (Ottomata) Okay, thank you! `rendered_content_slots` organizing and referencing the schema fragments a...
[14:15:44] (CR) Ottomata: "Thanks sorry, we might abandon this in favor of https://phabricator.wikimedia.org/T315674 TBD." [schemas/event/primary] - https://gerrit.wikimedia.org/r/823700 (https://phabricator.wikimedia.org/T308450) (owner: Ottomata)
[14:16:53] Data-Engineering, Event-Platform Value Stream, SRE, serviceops: eventstreams chart should use latest common_templates - https://phabricator.wikimedia.org/T310721 (akosiaris) Hi @Ottomata, @JArguello-WMF /me is back. Any updates on this one (even if just a rough timeline) ? Anything we can help...
[14:29:46] (CR) Gehel: "Note that dependency:analyze still shows a few issues:" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/780886 (https://phabricator.wikimedia.org/T306193) (owner: Aqu)
[14:34:08] Data-Engineering, CheckUser, MW-1.38-notes (1.38.0-wmf.26; 2022-03-14), MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), and 3 others: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004 (Marostegui)
[16:20:18] mforns: wanna stay in the meeting?
[16:20:27] yes joal !
[16:23:24] Data-Engineering, WMF-Communications: Add Sound Logo site to Matomo dashboard and provide Communications department account with access - https://phabricator.wikimedia.org/T315613 (mpopov) Hi @Varnent our team doesn't maintain Matomo. I believe that would be @EChetty's team. Emil: here's a past ticket f...
[16:53:43] Data-Engineering-Kanban, Data-Engineering-Radar, Event-Platform Value Stream (Sprint 00), Patch-For-Review: Flink output support for Event Platform events - https://phabricator.wikimedia.org/T310218 (Ottomata) Oo, I +2ed, and it merged. @dcausse hope that's okay. Shall I make a release?
[16:54:14] Data-Engineering-Kanban, Data-Engineering-Radar, Event-Platform Value Stream (Sprint 00), Patch-For-Review: Flink output support for Event Platform events - https://phabricator.wikimedia.org/T310218 (dcausse) Moving to done as I believe that wikimedia-event-utilities has now the basic functionali...
[16:54:33] Data-Engineering-Kanban, Data-Engineering-Radar, Event-Platform Value Stream (Sprint 00), Patch-For-Review: Flink output support for Event Platform events - https://phabricator.wikimedia.org/T310218 (dcausse) @Ottomata yes please :)
[16:58:23] Starting build #14 for job wikimedia-event-utilities-maven-release-docker
[17:01:23] Project wikimedia-event-utilities-maven-release-docker build #14: SUCCESS in 3 min 1 sec: https://integration.wikimedia.org/ci/job/wikimedia-event-utilities-maven-release-docker/14/
[17:04:36] (PS10) NOkafor: This commit adds the cassandra configuration to the usage sample and adjusts minor changes Bug: T311507 [analytics/refinery] - https://gerrit.wikimedia.org/r/812095 (https://phabricator.wikimedia.org/T311507)
[17:06:54] Data-Engineering-Kanban, Data-Engineering-Radar, Event-Platform Value Stream (Sprint 00), Patch-For-Review: Flink output support for Event Platform events - https://phabricator.wikimedia.org/T310218 (Ottomata) Done: https://archiva.wikimedia.org/#artifact/org.wikimedia/eventutilities-flink/1.2.0
[17:07:52] (CR) NOkafor: "Latest reviews have been resolved." [analytics/refinery] - https://gerrit.wikimedia.org/r/812095 (https://phabricator.wikimedia.org/T311507) (owner: NOkafor)
[17:08:26] mforns: I think I have a lead
[17:08:33] oph!
[17:08:39] mforns: batcave?
[17:09:10] joal: omw
[17:18:41] Data-Engineering-Kanban, Data-Catalog: Custom Metadata ingestion - https://phabricator.wikimedia.org/T307714 (Milimetric)
[17:19:08] Data-Engineering-Kanban, Data-Catalog: Custom Metadata ingestion - https://phabricator.wikimedia.org/T307714 (Milimetric)
[17:19:30] Data-Engineering-Kanban, Data-Catalog: Custom Metadata ingestion - https://phabricator.wikimedia.org/T307714 (Milimetric)
[17:21:51] Analytics-Wikistats, Data-Engineering: Feature requests for Active Editors by Country - https://phabricator.wikimedia.org/T304720 (Milimetric) a:Milimetric→None
[17:24:50] Data-Engineering-Kanban, Data Pipelines: Projectviews by country Airflow job - https://phabricator.wikimedia.org/T303193 (Milimetric) Dan already learned airflow from other jobs. The stale branch is in gitlab whenever someone picks this up: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dag...
[17:24:58] Data-Engineering-Kanban, Data Pipelines: Projectviews by country Airflow job - https://phabricator.wikimedia.org/T303193 (Milimetric) a:Milimetric→None
[17:33:38] Analytics-Wikistats, Data-Engineering: Easter Egg: wikistats classic style on wikistats 2.0 - https://phabricator.wikimedia.org/T177408 (Milimetric) Open→Stalled
[17:35:33] Analytics-Wikistats, Data-Engineering, Epic: Add ability to compare wikis - https://phabricator.wikimedia.org/T283251 (Milimetric)
[17:35:51] Analytics-Wikistats, Data-Engineering: Wikistats should allow more than one project - https://phabricator.wikimedia.org/T283254 (Milimetric) Open→Stalled Just a reminder that a lot of this codereview is still in gerrit, waiting for priority. Wikistats can't be ignored forever.
[17:36:21] Data-Engineering, Data-Catalog, Epic: Data Catalog POC - https://phabricator.wikimedia.org/T293647 (JArguello-WMF) Open→Resolved a:JArguello-WMF
[17:37:06] Data-Engineering, Product-Analytics, wmfdata-python: `spark.memory.driver` option does not get applied with "client" deployment mode. - https://phabricator.wikimedia.org/T284630 (Milimetric) a:Milimetric→None With the new value stream work, it's really unclear what we're doing with wmfdata an...
[17:37:20] Analytics-Wikistats, Data-Engineering: Wikistats should allow more than one project - https://phabricator.wikimedia.org/T283254 (Milimetric)
[17:37:22] Analytics-Wikistats, Data-Engineering, Patch-For-Review: Expand Wikiselector to allow more than one wiki - https://phabricator.wikimedia.org/T285050 (Milimetric) Open→Stalled
[17:37:32] Analytics-Wikistats, Data-Engineering, Patch-For-Review: Expand Wikiselector to allow more than one wiki - https://phabricator.wikimedia.org/T285050 (Milimetric) a:Milimetric→None
[17:38:28] Analytics-Wikistats, Data-Engineering: Confusing filtering on "Active editors by country" topic - https://phabricator.wikimedia.org/T300365 (Milimetric) a:Milimetric→None
[17:38:54] Data-Engineering-Kanban, Product-Analytics, wmfdata-python, GitLab (Project Migration): Move Wmfdata-Python from Github to Gitlab - https://phabricator.wikimedia.org/T304544 (Milimetric) a:Milimetric→None
[17:39:00] Data-Engineering, Data-Catalog: Data Catalog Deployment Plan [Mile Stone 2] - https://phabricator.wikimedia.org/T299888 (JArguello-WMF) Open→Resolved a:JArguello-WMF
[17:39:12] Data-Engineering, Data-Catalog: Data Catalog Feature Matrix [Mile Stone 1] - https://phabricator.wikimedia.org/T299887 (JArguello-WMF) Open→Resolved
[17:39:48] Data-Engineering, Anti-Harassment, CheckUser, Privacy Engineering, and 2 others: SPIKE: consider all problems that might happen when we handle Google's privacy changes - https://phabricator.wikimedia.org/T265057 (Milimetric) a:Milimetric→None @emil is this on your radar? It seems relativ...
[17:39:55] Data-Engineering, Data-Catalog: Data Catalog Initial Deployment. [Mile Stone 3] - https://phabricator.wikimedia.org/T299893 (JArguello-WMF) Open→Resolved a:JArguello-WMF
[17:40:21] Analytics-Radar, Pageviews-Anomaly, Product-Analytics: Analyse possible bot traffic for ptwiki article Ambev - https://phabricator.wikimedia.org/T282502 (Milimetric) a:Milimetric→None
[17:41:35] Data-Engineering, Data-Engineering-Kanban: Pageview Data loss due to wrong version of package installed on some varnishkafka instances - https://phabricator.wikimedia.org/T300164 (Milimetric)
[17:42:04] Data-Engineering, Data-Engineering-Kanban, SRE, Traffic: Spike: Investigate creating robust alerts to notify that caching nodes are not sending traffic data - https://phabricator.wikimedia.org/T304651 (Milimetric) Open→Declined I'm declining this in favor of other work Ben is doing to imp...
[18:32:22] (PS1) Bearloga: content_interaction_event: Fix typo and add new val to enum [schemas/event/secondary] - https://gerrit.wikimedia.org/r/825871 (https://phabricator.wikimedia.org/T299055)
[18:42:08] (CR) MNeisler: [C: +2] content_interaction_event: Fix typo and add new val to enum [schemas/event/secondary] - https://gerrit.wikimedia.org/r/825871 (https://phabricator.wikimedia.org/T299055) (owner: Bearloga)
[18:42:45] (Merged) jenkins-bot: content_interaction_event: Fix typo and add new val to enum [schemas/event/secondary] - https://gerrit.wikimedia.org/r/825871 (https://phabricator.wikimedia.org/T299055) (owner: Bearloga)
[19:26:38] Data-Engineering, Event-Platform Value Stream: Remove materialized .json files from event schema repositories - https://phabricator.wikimedia.org/T315674 (Ottomata) > I'm not sure forcing the server and all clients to parse YAML is a good idea. Hm. Getting rid of the yaml files would be more difficult t...
[19:53:33] Data-Engineering: Add the requestctl element of the x-analytics map to turnlio's webrequest_sampled_128 - https://phabricator.wikimedia.org/T314578 (CDanis) ping @JAllemandou -- did I put this on the right phab tag? It'd be really awesome to have and I suspect is a pretty easy change