[00:37:04] PROBLEM - Check unit status of monitor_refine_eventlogging_legacy on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_legacy https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:29:18] Data-Engineering: Deprecate GeoIP Legacy Download - https://phabricator.wikimedia.org/T303464 (odimitrijevic)
[05:32:31] Data-Engineering: Migrate to MaxMind GeoIP2 - https://phabricator.wikimedia.org/T302989 (odimitrijevic) Open→Declined Data engineering already uses GeoIP2 datasets.
[05:38:29] Data-Engineering: Disable GeoIP Legacy Download - https://phabricator.wikimedia.org/T303464 (odimitrijevic)
[05:39:51] Data-Engineering, SRE, Traffic, Trust-and-Safety, serviceops: Disable GeoIP Legacy Download - https://phabricator.wikimedia.org/T303464 (odimitrijevic)
[10:12:13] Data-Engineering, Data-Engineering-Kanban, Airflow, Data-Catalog: Complete monitoring setup of datahubsearch nodes - https://phabricator.wikimedia.org/T302818 (BTullis)
[10:16:01] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Complete monitoring setup of datahubsearch nodes - https://phabricator.wikimedia.org/T302818 (BTullis) All checks are green, now that the prometheus exporter has been fixed. Marking this ticket as done.
[10:20:20] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Patch-For-Review: Define LVS load-balancing for OpenSearch cluster - https://phabricator.wikimedia.org/T301458 (BTullis) I have moved this to the `monitoring_setup` state, so the cluster will be monitored by Icinga, but it will not page. I...
[11:46:59] RECOVERY - Check unit status of monitor_refine_eventlogging_legacy on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_legacy https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[11:51:58] Data-Engineering, MediaWiki-extensions-EventLogging: Generate $wgEventLoggingSchemas from $wgEventStreams - https://phabricator.wikimedia.org/T303602 (Ottomata) Okay so I do have thoughts! We made `$wgEventLoggingStreamNames` before we talked about and decided to do https://wikitech.wikimedia.org/wiki/E...
[11:52:35] Data-Engineering, MediaWiki-extensions-EventLogging: Generate $wgEventLoggingSchemas from $wgEventStreams - https://phabricator.wikimedia.org/T303602 (Ottomata) OH WAIT, this is exactly what you are proposing! But without bothering to make the EventStreamConfig API do it. Okay great!
[11:55:06] Data-Engineering-Kanban, LDAP-Access-Requests: Grant Access to LDAP wmf group for NOkafor - https://phabricator.wikimedia.org/T303512 (Ottomata) Approved!
[11:58:11] Data-Engineering-Kanban, SRE, SRE-Access-Requests: Requesting access to DataEngineering Team Resources for NOkafor - https://phabricator.wikimedia.org/T303516 (BTullis)
[11:58:44] Data-Engineering-Kanban, LDAP-Access-Requests: Grant Access to LDAP wmf group for NOkafor - https://phabricator.wikimedia.org/T303512 (BTullis)
[12:04:44] Data-Engineering-Kanban, LDAP-Access-Requests: Grant Access to LDAP wmf group for NOkafor - https://phabricator.wikimedia.org/T303512 (BTullis) I have added Njideka to the wmf group in LDAP. ` btullis@mwmaint1002:~$ ldapsearch -x cn=wmf|grep nokafor btullis@mwmaint1002:~$ sudo modify-ldap-group wmf Sear...
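For context on the LDAP change quoted above, a minimal sketch of the verify-then-modify pattern it follows, assuming a maintenance host such as mwmaint1002 and the `wmf` group from the ticket; the username grep is purely illustrative:

    # confirm the user is not yet in the group (no output expected)
    ldapsearch -x cn=wmf | grep nokafor
    # add the member using the modify-ldap-group helper shown in the ticket
    sudo modify-ldap-group wmf
    # re-run the search; the new uid should now appear in the group entry
    ldapsearch -x cn=wmf | grep nokafor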
[12:11:56] Data-Engineering-Kanban, SRE, SRE-Access-Requests: Requesting access to DataEngineering Team Resources for NOkafor - https://phabricator.wikimedia.org/T303516 (BTullis) LDAP membership of the `wmf` group has been added in T303512. I have created the kerberos principal. ` btullis@krb1001:~$ sudo ma...
[12:15:15] Data-Engineering, Data-Engineering-Kanban, Data-Catalog: Complete monitoring setup of datahubsearch nodes - https://phabricator.wikimedia.org/T302818 (BTullis) a:razzi→BTullis
[12:17:14] Data-Engineering, Data-Engineering-Kanban, Data-Catalog, Patch-For-Review: Define LVS load-balancing for OpenSearch cluster - https://phabricator.wikimedia.org/T301458 (BTullis) The monitoring check in Icinga for this service is now fixed.
[12:51:56] Data-Engineering-Radar, Growth-Team, MediaWiki-extensions-GuidedTour: Finish decommissioning the legacy GuidedTour schemas - https://phabricator.wikimedia.org/T303712 (phuedx)
[13:00:30] (PS9) Ottomata: [WIP] Add prometheus metrics reporter [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/767178 (owner: Joal)
[13:01:44] (CR) jerkins-bot: [V: -1] [WIP] Add prometheus metrics reporter [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/767178 (owner: Joal)
[13:18:28] gehel: o/
[13:18:35] i'm trying to enable CheckStyle-IDEA
[13:18:42] to work with the discovery-parent-pom stuff
[13:18:50] i've got the plugin installed and i can see where to enable it
[13:19:02] but, afaict we don't have a checkstyle.xml config file?
[13:19:59] OH WAIT i found docs
[13:20:06] https://github.com/wikimedia/wikimedia-discovery-discovery-parent-pom#maven-checkstyle-plugin
[13:20:08] should have looked first, sorry!
[13:21:02] btw, the maven central link is broken
[13:30:36] (PS10) Ottomata: [WIP] Add prometheus metrics reporter [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/767178 (owner: Joal)
[13:32:48] (CR) jerkins-bot: [V: -1] [WIP] Add prometheus metrics reporter [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/767178 (owner: Joal)
[13:34:31] Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Send some existing Gobblin metrics to prometheus - https://phabricator.wikimedia.org/T294420 (Ottomata) Update: It won't be possible (at least not without a lot more work) to get anything but metrics from the Gobblin...
[13:37:28] (PS11) Ottomata: Add PrometheusEventReporter [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/767178 (owner: Joal)
[13:37:36] joal: https://gerrit.wikimedia.org/r/c/analytics/gobblin-wmf/+/767178 is ready for review!
[13:38:05] jenkins is failing because of some javadoc issues in the copy/ module (won't fix), and because of some spotbugs thing i don't quite understand
[13:38:58] (CR) jerkins-bot: [V: -1] Add PrometheusEventReporter [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/767178 (owner: Joal)
[13:47:38] (CR) Ottomata: "Ready for review!" [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/767178 (owner: Joal)
[14:14:05] Hey ottomata - will review :)
[14:23:18] ty!
[14:31:42] joal: max.incremental.fetch.session.cache.slots=2000 ready to go
[14:32:03] i can deploy now, perhaps it will be good to deploy and let it sit for a day or so before you try your thing?
[14:32:10] to see if we don't go above 2000 in regular operations?
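For context, max.incremental.fetch.session.cache.slots is a per-broker Kafka setting introduced by KIP-227 (linked a little further down); it bounds how many incremental fetch sessions a broker will keep cached. A minimal sketch of what the change amounts to, assuming it is applied as a static broker property with a rolling restart; the JMX metric names come from KIP-227 and may vary slightly between Kafka versions:

    # server.properties on each jumbo broker (static config, rolling restart to apply)
    max.incremental.fetch.session.cache.slots=2000

    # afterwards, watch the FetchSessionCache metrics over JMX to see whether the
    # cache fills up or evicts sessions under normal load, e.g.:
    #   kafka.server:type=FetchSessionCache,name=NumIncrementalFetchSessions
    #   kafka.server:type=FetchSessionCache,name=IncrementalFetchSessionEvictionsPerSec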
[14:37:21] works for me ottomata :)
[14:37:27] okay
[14:37:50] elukey: FYI and any objections to https://gerrit.wikimedia.org/r/c/operations/puppet/+/770505 ?
[14:41:20] ottomata: also, after talking with dcausse I have tested with a different config for reading data (read from consumer-groups) -> at the second run the job fails almost instantly, meaning my problem seems related to data more than anything else
[14:42:12] ottomata: np +1, I am wondering though what a cache slot represents, is it a client->partition consumer? (Super ignorant about it)
[14:45:23] i was too!
[14:45:24] elukey: https://cwiki.apache.org/confluence/display/KAFKA/KIP-227%3A+Introduce+Incremental+FetchRequests+to+Increase+Partition+Scalability
[14:46:02] apparently it is a cache kept on brokers that maps FetchSessionIds (e.g. a consumer client process, or a replica fetcher process) to metadata about the partitions they are interested in
[14:46:24] so that the amount of data transferred on new connections can be reduced
[14:47:55] okay meetings about to start, i will probably merge and apply tomorrow my morn
[14:48:30] sure sure, seems very complicated to judge the effects on kafka, but probably good for the jumbo use case if we are hitting limits
[14:49:27] Data-Engineering, Data-Engineering-Kanban, Airflow: Investigate using a HiveToGraphite connector job instead of individual jobs - https://phabricator.wikimedia.org/T303308 (Snwachukwu) a:Snwachukwu
[14:55:45] elukey: indeed. i think the only consequence will be slightly more memory used for this cache
[14:55:56] so very slightly less memory for messages in page cache
[14:56:09] but i don't think it will be much, it is just partition metadata
[14:58:55] yep seems something good to try
[15:03:45] a-team standup
[15:34:10] Data-Engineering, Data-Engineering-Kanban, Airflow: Unifying HDFS Sensor and FSSPEC Sensor - https://phabricator.wikimedia.org/T302392 (EChetty)
[15:43:51] Data-Engineering, Airflow, Platform Engineering: Replace Airflow's HDFS client (snakebite) with pyarrow - https://phabricator.wikimedia.org/T284566 (EChetty) Open→Declined
[15:43:53] Data-Engineering, Airflow, Epic, Platform Team Workboards (Image Suggestion API): Airflow collaborations - https://phabricator.wikimedia.org/T282033 (EChetty)
[16:00:25] Data-Engineering, Data-Catalog: Set up karapace instance for datahub - https://phabricator.wikimedia.org/T301562 (EChetty) a:BTullis→razzi
[16:06:59] .7
[16:07:01] uff
[16:07:05] :)
[16:40:33] razzi o/
[16:40:48] hi ottomata
[16:41:04] sooOoOO what's up how can I help!/
[16:41:04] ?
[16:41:42] I'm thinking about how to get the python dependencies for karapace into a superset_deploy style repository
[16:42:08] oh right cuz you need more than just the dependencies
[16:42:15] HMMM razzi want to try the new conda_dist stuff?
[16:42:16] ???
[16:42:26] instead of putting all deps in git?
[16:42:32] yeah show me the way
[16:42:45] https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils#building-project-conda-environments-for-distribution
[16:43:02] but, we might be able to do that a little better with gitlab CI
[16:43:28] but, ultimately, if you have workflow_utils with the conda-dist CLI installed on your build box (local? docker?)
[16:43:33] in your python project (karapace)
[16:43:37] hopefully you can just run
[16:43:38] conda-dist
[16:43:41] and it will do all the right stuff
[16:44:12] Where does the dist environment get stored?
[16:44:54] conda dist will just make a .tgz file of it
[16:45:00] then we'll have to put it somewhere
[16:45:07] the intention is to use conda-dist in your project's CI
[16:45:20] to generate the conda .tgz, and then upload it somewhere, probably to gitlab
[16:45:28] but...then we have to get it from gitlab to your server
[16:45:45] perhaps...for karapace, since we will want to remove it anyway, just copying it there manually will be okay for now?
[16:46:30] or, we could use scap and the artifact syncing stuff like we set up for airflow
[16:56:34] hmm, razzi we might need to add a conda-environment.yml file with the python dep specified
[16:58:05] can you screenshare me ottomata ?
[17:01:24] (we sharin)
[18:41:51] (CR) Vivian Rook: [C: +2] view.js: Show full run date in UTC [analytics/quarry/web] - https://gerrit.wikimedia.org/r/517145 (https://phabricator.wikimedia.org/T215831) (owner: Framawiki)
[18:46:10] (Merged) jenkins-bot: view.js: Show full run date in UTC [analytics/quarry/web] - https://gerrit.wikimedia.org/r/517145 (https://phabricator.wikimedia.org/T215831) (owner: Framawiki)
[18:58:26] Quarry, Patch-For-Review: Show query run date above outputs section - https://phabricator.wikimedia.org/T215831 (rook) Open→Resolved
[21:03:14] a-team: sorry, I tried to run a big hive query on stat1004 and it went very sour and I can't even kill it (pid 15674)
[21:03:33] bearloga: would you like me to try to kill it?
[21:03:40] razzi: yes please
[21:04:28] ok it is done bearloga
[21:04:39] razzi: thank you!!!
[21:04:49] Didn't respond to the usual kill signal so I gave it the -9
[21:05:23] oooh I saw that somewhere but didn't know how to use it or that I should
[21:05:39] !log `sudo kill -9 15674` to stop unresponsive hive query
[21:05:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:07:24] bearloga: it'd probably be fine if it's your own process, but kill -9 doesn't give the process a chance to clean up, so it can leave things in a messy state
[21:08:20] all looks well to me, carry on querying :)
[21:16:53] thanks! :D
[21:36:29] Data-Engineering, Data-Catalog, Patch-For-Review: Create debian package of karapace - https://phabricator.wikimedia.org/T301565 (razzi) a:razzi I have been working on this and there is a deb at `deneb.codfw.wmnet:/home/razzi/karapace-temp/karapace_2.1.3-py3.7-0_amd64.deb`. To build this deb, I us...
[21:36:58] Data-Engineering, Data-Catalog, Patch-For-Review: Create debian package of karapace - https://phabricator.wikimedia.org/T301565 (razzi) Still todo: upload the .deb to apt.wikimedia.org and iterate on https://gerrit.wikimedia.org/r/c/operations/puppet/+/770605 to install the package and set up a syste...
[21:46:56] Data-Engineering-Radar, Product-Analytics: Support on understanding traffic and behaviors for users on legacy browsers (somewhat timely) - https://phabricator.wikimedia.org/T303301 (mpopov) Not sure how it bypassed the triage column and appeared straight in the backlog
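Tying together the conda-dist conversation from earlier in the afternoon, a minimal sketch of what that packaging flow could look like for karapace. This assumes workflow_utils and its conda-dist CLI are already installed on the build box, per the README linked above; the environment file contents, the output location, and the destination host are illustrative only (the Python and karapace versions are taken from the .deb filename mentioned in T301565):

    # in the karapace project root, declare the environment to build
    cat > conda-environment.yml <<'EOF'
    name: karapace
    dependencies:
      - python=3.7
      - pip
      - pip:
          - karapace==2.1.3
    EOF

    # build the distributable environment; per the workflow_utils README this
    # resolves the env and packs it into a .tgz (exact output location may differ)
    conda-dist

    # until CI or scap artifact syncing is set up, copy the archive to the target
    # host by hand (hostname here is hypothetical)
    scp karapace-*.tgz an-test-host1001.eqiad.wmnet:/srv/karapace/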