[01:41:43] Hi people, I want to install Wikistats on my Wikimedia project, but I need help with this
[06:15:21] Data-Engineering, AQS2.0, API Platform (AQS 2.0 Roadmap), Epic, and 2 others: AQS 2.0: Device Analytics service - https://phabricator.wikimedia.org/T288298 (SGupta-WMF)
[06:16:02] Data-Engineering, AQS2.0, API Platform (AQS 2.0 Roadmap), Epic, and 2 others: AQS 2.0: Device Analytics service - https://phabricator.wikimedia.org/T288298 (SGupta-WMF) a:BPirkle→SGupta-WMF
[07:26:05] Data-Engineering: stat1008's /srv partition is getting full due to home dirs - https://phabricator.wikimedia.org/T337246 (kevinbazira) Reduced 'kevinbazira' from 167G to ~80G.
[08:21:12] Data-Engineering, Observability-Alerting, Patch-For-Review: Migrate eventgate check_prometheus checks to alertmanager - https://phabricator.wikimedia.org/T309009 (fgiunchedi) Open→Resolved All done, resolving
[08:22:16] !log installing conda-analytics-0.0.17.dev_amd64.deb to an-test-worker1001 for T332765
[08:22:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:22:19] T332765: Upgrade the spark YARN shuffler service on Hadoop workers from version 2 to 3 - https://phabricator.wikimedia.org/T332765
[08:44:03] Data-Engineering, AQS2.0, API Platform (AQS 2.0 Roadmap), Epic, and 2 others: AQS 2.0: Device Analytics service - https://phabricator.wikimedia.org/T288298 (SGupta-WMF)
[09:05:58] Data-Engineering, Data-Platform-SRE, Shared-Data-Infrastructure (Q4 Wrap up): Bring stat1009 into service - https://phabricator.wikimedia.org/T336036 (Stevemunene)
[09:23:42] LucasCouto: I can certainly try to help you get wikistats up and running, but there are lots of moving parts to it. It's not going to be a walk in the park, I'm afraid.
[09:26:45] LucasCouto: The main wikistats2 codebase is here: https://gerrit.wikimedia.org/r/admin/repos/analytics/wikistats2,general and there is some documentation here: https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Wikistats_2#Architecture
[09:28:21] The main thing that you'll need is some way to replicate the functionality of AQS, which is the back-end part of the system: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
[09:32:20] For us, this back-end involves many different bits of software, such as varnish, varnishkafka, kafka, gobblin, the HDFS file system, refinery, druid, cassandra, and aqs.
[09:33:56] !log reboot an-test-coord1001.eqiad.wmnet December 2022 Buster reboots T325132
[09:33:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:34:58] Your mediawiki project might have quite different needs, but you'd effectively need to find a way of getting these back-end data sources (druid, cassandra) updated with regular webrequest, pageview, and mediawiki_history datasets.
[09:35:10] I hope that helps a little.
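To make the AQS part more concrete: the front end talks to AQS over a REST API, so whatever replaces the back end would need to answer requests of roughly this shape. What follows is a minimal sketch, not the Wikistats code itself. It assumes the publicly documented Wikimedia pageviews "aggregate" route and its `items`/`views` response fields. The helper names `pageviews_aggregate_url` and `total_views` are hypothetical, the sample payload is invented for illustration, and a self-hosted setup would substitute its own base URL.

```python
from urllib.parse import quote

# Public Wikimedia AQS base URL. A self-hosted back end would use its own
# address here (assumption: it exposes the same route layout).
AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics"


def pageviews_aggregate_url(project, start, end, access="all-access",
                            agent="user", granularity="daily", base=AQS_BASE):
    """Build the URL for the per-project aggregate pageviews route.

    start and end are dates in YYYYMMDD form.
    """
    parts = ["pageviews", "aggregate", project, access, agent,
             granularity, start, end]
    # Percent-encode each path segment so unusual project names stay valid.
    return base.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


def total_views(payload):
    """Sum the 'views' field over all items of a decoded JSON response."""
    return sum(item["views"] for item in payload.get("items", []))


# Hypothetical response body, trimmed to the fields used above.
sample = {"items": [{"timestamp": "2023052200", "views": 120},
                    {"timestamp": "2023052300", "views": 80}]}

url = pageviews_aggregate_url("en.wikipedia.org", "20230522", "20230523")
```

Fetching `url` with any HTTP client and passing the decoded JSON to `total_views` would give the total for the date range. Keeping the data behind that URL fresh is the job of the druid and cassandra loading described above.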
[09:58:45] Data-Engineering, Equity-Landscape: Programs input metric (not until 2022 data update) - https://phabricator.wikimedia.org/T309277 (KCVelaga_WMF) Stalled→Resolved
[09:58:48] Data-Engineering, Equity-Landscape: Extract + Transformation Raw Data into Input Metrics - https://phabricator.wikimedia.org/T306625 (KCVelaga_WMF)
[09:59:58] Data-Engineering, Equity-Landscape: Transformations Flowchart - https://phabricator.wikimedia.org/T306614 (KCVelaga_WMF) In progress→Stalled
[10:00:00] Data-Engineering, Equity-Landscape: Milestone: Transformation Definitions Complete: - https://phabricator.wikimedia.org/T305474 (KCVelaga_WMF)
[10:01:20] !log reboot an-test-master1001.eqiad.wmnet December 2022 Buster reboots T325132
[10:01:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:45:12] Data-Engineering-Planning, Data-Platform-SRE, Patch-For-Review, Shared-Data-Infrastructure (Q4 Wrap up): Upgrade the spark YARN shuffler service on Hadoop workers from version 2 to 3 - https://phabricator.wikimedia.org/T332765 (BTullis) Success! We got a clean puppet run on an-test-worker1001. Th...
[10:45:31] Data-Engineering-Planning, Data-Platform-SRE, Patch-For-Review, Shared-Data-Infrastructure (Q4 Wrap up): Upgrade the spark YARN shuffler service on Hadoop workers from version 2 to 3 - https://phabricator.wikimedia.org/T332765 (CodeReviewBot) btullis merged https://gitlab.wikimedia.org/repos/data...
[11:06:23] Data-Engineering, Data-Platform-SRE, Shared-Data-Infrastructure (Q4 Wrap up): Bring stat1009 into service - https://phabricator.wikimedia.org/T336036 (Stevemunene) We are getting an error `CRITICAL - degraded: The following units failed: rsync-published.service` Details here; ` May 23 10:34:23 stat1...
[12:05:14] Data-Engineering, Data-Platform-SRE, Shared-Data-Infrastructure (Q4 Wrap up): Bring stat1009 into service - https://phabricator.wikimedia.org/T336036 (BTullis) > This probably caused by the fact that stat1009 is not part of the group of stat hosts rsync hosts_allow list mentioned [[https://github.com...
[12:11:35] Data-Engineering-Planning, Data-Platform-SRE, Shared-Data-Infrastructure (Q4 Wrap up): Upgrade the spark YARN shuffler service on Hadoop workers from version 2 to 3 - https://phabricator.wikimedia.org/T332765 (BTullis) I have released version 0.0.17 of conda-analytics: https://gitlab.wikimedia.org/re...
[12:24:22] Data-Engineering: stat1008's /srv partition is getting full due to home dirs - https://phabricator.wikimedia.org/T337246 (Isaac) `isaacj` down from 146G -> 39G. Thanks for the nudge!
[12:52:13] Data-Engineering, Data-Platform-SRE, Event-Platform Value Stream, Platform Team Workboards (Clinic Duty Team): Avoid accepting Kafka messages with whacky timestamps - https://phabricator.wikimedia.org/T282887 (jbond)
[13:14:05] Data-Engineering: stat1008's /srv partition is getting full due to home dirs - https://phabricator.wikimedia.org/T337246 (fkaelin) removed ~1TB for `fab`
[14:20:41] Data-Engineering: stat1008's /srv partition is getting full due to home dirs - https://phabricator.wikimedia.org/T337246 (Aroraakhil) reduced 'aarora' from 822G to 161G. There is a directory not owned by me in my home account: `/home/aarora/alberto_code_data_recsys`, and it would be great if an admin can nuk...
[14:36:18] Data-Engineering: stat1008's /srv partition is getting full due to home dirs - https://phabricator.wikimedia.org/T337246 (BTullis) >>! In T337246#8873734, @Aroraakhil wrote: > reduced 'aarora' from 822G to 161G. There is a directory not owned by me in my home account: `/home/aarora/alberto_code_data_recsys`,...
[14:39:43] Data-Engineering-Planning: Data Engineering Pairing system - https://phabricator.wikimedia.org/T327790 (JArguello-WMF)
[14:39:56] Data-Engineering, JsonConfig, Product-Infrastructure-Team-Backlog-Deprecated, Wikimedia-production-error: PHP Warning: The locally stored wiki page '[page]' has unsupported content model (from Dashiki) - https://phabricator.wikimedia.org/T293295 (Milimetric) This is not a high priority for #data-...
[14:40:48] Data-Engineering-Planning: Data Engineering Pairing system - https://phabricator.wikimedia.org/T327790 (JArguello-WMF)
[15:07:45] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (Milimetric) @Liz: how do the queries you run help you? I'm always thinking about ways to organize public data and use cases like yours are really interesting.
[15:09:48] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (Wbm1058) I'll echo the "why not both" sentiment of Danilo above. Quarry doesn't get a lot of support. Right. Nothing on the developers platform gets a lot of support, s...
[15:15:21] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (Milimetric) @Wbm1058: it's not like Quarry, for sure. Quarry was optimized to be friendly to a very specific set of use cases. But it does cause a maintenance burden...
[15:21:14] Data-Engineering: Automating pulling schemas from eventschema to datahub - https://phabricator.wikimedia.org/T337321 (Htriedman)
[15:28:33] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (Wbm1058) Won't you need to keep the Superset machines running and patched with the latest OS / security updates, etc. too?
[15:29:43] Data-Engineering, serviceops-radar, Event-Platform Value Stream (Sprint 14 A), Patch-For-Review: Store Flink HA metadata in Zookeeper - https://phabricator.wikimedia.org/T331283 (Ottomata) Welp, we finally got all the configs and egress rules right, only to discover: https://curator.apache.org/z...
[15:32:21] Data-Engineering-Planning, DC-Ops, SRE, Shared-Data-Infrastructure, ops-eqiad: Q3:rack/setup/install an-worker11[49-56] - https://phabricator.wikimedia.org/T327295 (Jclark-ctr)
[15:44:03] Data-Engineering, Data-Platform-SRE: Trash cleanup cron spams on an-test hosts - https://phabricator.wikimedia.org/T286442 (jbond)
[16:00:21] Data-Engineering-Planning, Data Pipelines (Sprint 13), Patch-For-Review: Setup config to allow lineage instrumentation - https://phabricator.wikimedia.org/T333004 (Antoine_Quhen) Thanks all for the reviews. Even if the dag is working, it could be great to decide the single source of truth for our dat...
[16:24:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[16:30:45] (PS1) Jbond: udplog: /etc/udp2log shold be a folder not a file [analytics/udplog] - https://gerrit.wikimedia.org/r/922573 (https://phabricator.wikimedia.org/T276622)
[16:32:48] (CR) CI reject: [V: -1] udplog: /etc/udp2log shold be a folder not a file [analytics/udplog] - https://gerrit.wikimedia.org/r/922573 (https://phabricator.wikimedia.org/T276622) (owner: Jbond)
[16:33:07] (PS2) Jbond: udplog: /etc/udp2log should be a folder not a file [analytics/udplog] - https://gerrit.wikimedia.org/r/922573 (https://phabricator.wikimedia.org/T276622)
[16:35:15] (CR) CI reject: [V: -1] udplog: /etc/udp2log should be a folder not a file [analytics/udplog] - https://gerrit.wikimedia.org/r/922573 (https://phabricator.wikimedia.org/T276622) (owner: Jbond)
[16:36:24] Data-Engineering-Planning, Data Pipelines, Event-Platform Value Stream: Event partitions missing since 2023-02-21T10:00 for stream without events (canary events not produced?) - https://phabricator.wikimedia.org/T330236 (Antoine_Quhen)
[17:05:06] (CR) Krinkle: [C: +2] Add first input delay schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/907871 (https://phabricator.wikimedia.org/T332012) (owner: Lgaulia)
[17:11:52] Data-Engineering, Event-Platform Value Stream (Sprint 14 A): eventutilities-python manager should set up python logging with ECS format - https://phabricator.wikimedia.org/T335802 (Ottomata) @tchin it works! https://logstash.wikimedia.org/app/discover#/doc/0fade920-6712-11eb-8327-370b46f9e7a5/ecs-defaul...
[17:12:05] Data-Engineering, Event-Platform Value Stream (Sprint 14 A): eventutilities-python manager should set up python logging with ECS format - https://phabricator.wikimedia.org/T335802 (Ottomata)
[17:13:46] Data-Engineering, Event-Platform Value Stream (Sprint 14 A): eventutilities-python manager should set up python logging with ECS format - https://phabricator.wikimedia.org/T335802 (Ottomata) Strangely though, it looks like Flink captures Python stdout logging and logs itself too! https://logstash.wikime...
[17:17:54] Data-Engineering, Event-Platform Value Stream (Sprint 14 A): mediawiki-page-content-change-enrichment checkpoints should be stored in Swift - https://phabricator.wikimedia.org/T336656 (Ottomata) @gmodena I think(?) I've deployed in dse-k8s-eqiad staging. HA has been disabled, but swift checkpointing sh...
[17:19:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[17:22:44] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 11): Q4 eventutilities-python should bundle java deps. - https://phabricator.wikimedia.org/T327251 (Ottomata) @tchin @gmodena I just noticed setuptools complaining about a misconfiguration when building the wheel with the lib dir: https://gitl...
[17:24:32] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 14 A), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (CodeReviewBot) otto opened https://gitlab.wikimedia.org/repos/data-...
[17:28:17] Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 14 A), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (CodeReviewBot) otto merged https://gitlab.wikimedia.org/repos/data-...
[17:46:45] Data-Engineering, Event-Platform Value Stream (Sprint 14 A): mediawiki-page-content-change-enrichment checkpoints should be stored in Swift - https://phabricator.wikimedia.org/T336656 (gmodena) >>! In T336656#8874300, @Ottomata wrote: > @gmodena I think(?) I've deployed in dse-k8s-eqiad staging. HA has...
[18:04:42] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:06:42] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:15:30] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:21:42] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:34:41] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (rook) > Won't you need to keep the Superset machines running and patched with the latest OS / security updates, etc. too? Yes, though it is much less effort than Quarr...
[18:58:36] (Abandoned) Joal: Add guw.wikinews to pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/907826 (https://phabricator.wikimedia.org/T334459) (owner: Gerrit maintenance bot)
[19:14:25] Data-Engineering-Planning, DC-Ops, SRE, Shared-Data-Infrastructure, ops-eqiad: Q3:rack/setup/install an-worker11[49-56] - https://phabricator.wikimedia.org/T327295 (Jclark-ctr) Reset idrac. Still unable to log in to an-worker1150. Fixed psu1 on 49
[19:24:10] Data-Engineering, Event-Platform Value Stream, Discovery-Search (Current work), Patch-For-Review: Add support for redirects in CirrusSearch - https://phabricator.wikimedia.org/T325315 (daniel) >>! In T325315#8871319, @pfischer wrote: > @daniel, thank you for your feedback! I reduced the [[ https:...
[20:06:03] Data-Engineering-Planning, SRE-swift-storage, Event-Platform Value Stream (Sprint 14 A): Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing - https://phabricator.wikimedia.org/T330693 (Eevans) >>! In T330693#8841909, @Eevans wrote: > Per a discussion with @gmo...
[20:08:52] Quarry: Improve Superset documentation - https://phabricator.wikimedia.org/T337342 (Snaevar)
[20:09:10] Quarry, cloud-services-team (FY2022/2023-Q4): Move Quarry to be an installation of Superset - https://phabricator.wikimedia.org/T169452 (Snaevar) I think a lot of the issues can be fixed with better documentation and UI changes. Created a specific task for documentation, so only covering UI here. 1. Pro...
[20:24:04] Quarry, Documentation: Improve Superset documentation - https://phabricator.wikimedia.org/T337342 (Peachey88)
[21:42:22] Analytics, AQS2.0, API Platform (AQS 2.0 Roadmap), Documentation, and 3 others: AQS 2.0 documentation - https://phabricator.wikimedia.org/T288664 (apaskulin)
[22:00:53] Analytics, AQS2.0, API Platform (AQS 2.0 Roadmap), Documentation, and 3 others: AQS 2.0 documentation - https://phabricator.wikimedia.org/T288664 (apaskulin)
[22:25:03] Data-Engineering, Product-Analytics, Research: Investigate relation of UA deprecation to increase in automated traffic and reduction in unique devices - https://phabricator.wikimedia.org/T336715 (Mayakp.wiki)
[22:26:41] Data-Engineering-Icebox: Improve Bot Detection Heuristics - https://phabricator.wikimedia.org/T310846 (Mayakp.wiki) **Update: ** Automated traffic spikes we saw in March continued into April 2023. We observed that automated traffic which usually shows up as None or Direct referer traffic is dropping and subs...
[22:50:03] Data-Engineering, Product-Analytics (Kanban): Model impact of User-Agent deprecation on top line metrics - https://phabricator.wikimedia.org/T336084 (Mayakp.wiki) For next steps, we should re-start the discussion to request User-Agent Client-Hints and get High Entropy Hints into webrequest logs T295073 t...