[00:09:57] PROBLEM - Check unit status of drop_event on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit drop_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:32:55] ACKNOWLEDGEMENT - Check unit status of drop_event on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit drop_event Marostegui T283126 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:17:40] Analytics, Analytics-Kanban, Packaging, Patch-For-Review: Create a debian package for Apache Airflow - https://phabricator.wikimedia.org/T277012 (Volans) @Ottomata FYI APT is currently broken on `an-test-coord1001`, for any operation it gives: ` E: The package airflow needs to be reinstalled, bu...
[10:28:12] (PS6) Martaannaj: Create wd_propertysuggester/client_ab_testing and wd_propertysuggester/server_ab_testing [schemas/event/secondary] - https://gerrit.wikimedia.org/r/689152
[10:48:39] Analytics-Radar, LDAP-Access-Requests, SRE, SRE-Access-Requests: Account setup issues for jmixter-ctr - https://phabricator.wikimedia.org/T283250 (Marostegui) p:Triage→Medium
[10:54:52] elukey: hello, it looks like you're currently the only person with +o here. Can you update the topic to point logs to https://wm-bot.wmflabs.org/libera_logs/%23wikimedia-analytics/ instead? Thank you very much.
[10:54:52] Hey urbanecm, you are welcome!
[10:55:06] hehe
[12:15:05] Analytics, Analytics-Kanban, Better Use Of Data, Event-Platform, and 3 others: VirtualPageView Event Platform Migration - https://phabricator.wikimedia.org/T238138 (phuedx)
[12:15:16] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (phuedx) Open→Resolved Being **bold** and signing this off. There's been no ob...
[13:18:01] urbanecm: hello! I am trying to op myself but ChanServ doesn't like me anymore
[13:18:13] elukey: what does it do? :-)
[13:19:42] urbanecm: it tells me that I am not on the chan anymore, also if I try /mode etc.. it says that I am not op
[13:19:53] that is weird
[13:19:58] that's a weird bug
[13:19:59] I may have messed up something
[13:20:06] the access list looks good to me
[13:20:10] what if you part and rejoin?
[13:20:29] that's literally the only thing that comes to my mind (besides getting a libera staffer)
[13:21:11] turn off and on again, yes makes sense
[13:22:56] nope
[13:23:10] elukey: one last thing. what if you do /msg ChanServ OP elukey explicitly?
[13:23:17] \o/
[13:23:25] yeah just did it, now it works :P
[13:23:32] great
[13:23:37] now let's hope topic changes work too :D
[13:23:41] yeah one sec
[13:24:00] does it look good?
[13:24:31] perfect, thanks elukey
[13:44:08] urbanecm: (if you have time) - do you know how to add a chan to wm-bot (to add logging)?
[13:44:15] (need it for the -ml chan)
[13:44:29] elukey: join #wm-bot, and run @add
[13:44:41] super easy
[13:44:55] yup
[13:45:03] then you need to execute @logon
[13:45:05] (in the channel)
[13:46:36] urbanecm: in the #ml chan you mean?
[13:46:41] yup
[14:40:24] Analytics, Analytics-Kanban, Packaging, Patch-For-Review: Create a debian package for Apache Airflow - https://phabricator.wikimedia.org/T277012 (Ottomata) Removed airflow package for now! sorry about that.
[15:03:44] a-team standup
[15:12:26] Analytics-Clusters, Analytics-Kanban: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (hnowlan) a:hnowlan→None
[15:13:27] Analytics-Clusters: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (Ottomata) a:hnowlan
[15:19:10] !log rm -rf /tmp/analytics/* on an-launcher1002 - T283126
[15:19:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:19:13] T283126: Failures registered by drop_event on an-launcher1002 - https://phabricator.wikimedia.org/T283126
[15:37:09] razzi,ottomata - ops sync?
[15:37:29] or are we skipping it?
[15:37:43] hi elukey, we're in the common grooming, with plans to ops sync at the end
[15:38:13] but now I can see this wasn't a perfect plan as we didn't communicate with you!
[15:38:28] razzi: yep not a great plan :)
[15:39:56] ah sorry luca yeah you aren't in standup anymore so you don't get the memo!
[15:40:03] elukey: got time in 20?
[15:40:42] elukey: olja wanted to do some goals grooming stuff sorry about that
[15:41:22] ottomata: sure but we can also skip, np for me :)
[15:42:18] i think would be good to sync re hadoop master os upgrade annnnd other stuff too maybe
[15:43:33] sure
[15:59:12] Analytics-Clusters, Analytics-Kanban, DBA, Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (Marostegui) Thanks @elukey - I am on clinic duty this week, so we'll see if I have time for this :(
[16:00:54] elukey: ok we're in ops sync meet
[16:03:01] ottomata: joining
[16:03:43] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade the Hadoop masters to Debian Buster - https://phabricator.wikimedia.org/T278423 (elukey) >>! In T278423#7094641, @razzi wrote: > - Failover to 1002 by running the following on 1001: > - `systemctl stop hadoop-hdfs-namenode` > - `syst...
[16:26:10] Analytics-Radar, Product-Analytics: /srv/published should be structured similarly, have identical README across stat hosts describing said structure - https://phabricator.wikimedia.org/T254189 (mpopov) Open→Declined In the time since filing this task we have not encountered any problems that this...
[16:31:03] Analytics-Radar, Product-Analytics, Product-Infrastructure-Team-Backlog, Epic: Re-define what constitutes a mobile pageview - https://phabricator.wikimedia.org/T257277 (SNowick_WMF) Open→Resolved See https://phabricator.wikimedia.org/T257860 > PageviewDefinition to only include /api/re...
[17:46:32] Analytics, Analytics-Kanban, Product-Analytics, wmfdata-python: wmfdata-python's Hive query output includes logspam - https://phabricator.wikimedia.org/T275233 (nshahquinn-wmf) >>! In T275233#7101291, @Ottomata wrote: > @nshahquinn-wmf Hive CLI logs are certainly annoying/weird. Q though: any re...
[17:57:17] Analytics, Analytics-Kanban, Product-Analytics, wmfdata-python: wmfdata-python's Hive query output includes logspam - https://phabricator.wikimedia.org/T275233 (Ottomata) Ahhh, thanks for the context I (kind of!) remember now :) I just tried pyhive with kerberos (for other reasons) and was able...
[18:04:26] !log suspend failing cassandra 3 oozie loading jobs: cassandra-daily-coord-local_group_default_T_top_percountry (0011318-210426062240701-oozie-oozi-C), cassandra-daily-coord-local_group_default_T_unique_devices (0011324-210426062240701-oozie-oozi-C)
[18:04:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:04:45] oh mforns just saw your email
[18:05:00] mforns: are those jobs loading to both cassandra 2 and cassandra 3 right now?
[18:05:10] ottomata: yes, I think so
[18:05:14] oh ok
[18:05:16] will un-suspend
[18:05:17] hm
[18:05:19] It was part of the migration to Cassandra3
[18:05:49] !log resume failing cassandra 3 oozie loading jobs, they are also loading to cassandra 2: cassandra-daily-coord-local_group_default_T_top_percountry (0011318-210426062240701-oozie-oozi-C), cassandra-daily-coord-local_group_default_T_unique_devices (0011324-210426062240701-oozie-oozi-C)
[18:05:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:06:03] for some reason only those 2 jobs are failing...
[18:06:18] indeed
[18:07:08] not sure what are the next steps there, we can ask tomorrow in standup
[18:15:39] ottomata: can you review this please? https://gerrit.wikimedia.org/r/c/operations/puppet/+/693931
[18:16:09] great stuff mforns ! ready for merge?
[18:17:02] ottomata: I think so, VirtualPageView is fully produced through EventGate since last week no???
[18:17:19] yup should be
[18:17:47] mforns: i'll run puppet and bounce eventlogging processor
[18:17:58] let me check the amount of old-style events
[18:18:03] ok
[18:24:55] ottomata: about 25 old-style events per minute (for all wikis), which means about 25/60000*100=0.04% of all events
[18:25:02] ok cool
[18:25:11] very few
[18:25:15] yea
[18:25:16] i think thats good to go mforns
[18:25:18] ok cool
[18:25:20] bouncing el processor
[18:25:21] k!
[18:32:41] Analytics, I18n, RTL: Support right-to-left languages in Wikistats - https://phabricator.wikimedia.org/T251376 (razzi) I worked on this at the wikimedia hackathon, and got a prototype working. Some screenshots: {F34466265} {F34466269} {F34466268} {F34466275} Here are the steps: - New (rtl) languag...
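The old-style event share quoted at 18:24:55 can be checked with a minimal Python sketch. The two figures (about 25 old-style events per minute, out of 60000 events per minute) are taken from that message; the function name `old_style_share_percent` is hypothetical and only used for illustration.

```python
def old_style_share_percent(old_per_minute: float, total_per_minute: float) -> float:
    """Return old-style events as a percentage of all events."""
    if total_per_minute <= 0:
        raise ValueError("total_per_minute must be positive")
    return old_per_minute / total_per_minute * 100


# Figures from the chat: about 25 old-style events per minute out of 60000.
share = old_style_share_percent(25, 60000)
print(f"{share:.2f}%")  # prints "0.04%"
```

The guard on `total_per_minute` is an added assumption so that an empty measurement window raises an error instead of dividing by zero.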
[18:37:01] Analytics, Analytics-Kanban, Better Use Of Data, Event-Platform, and 4 others: VirtualPageView Event Platform Migration - https://phabricator.wikimedia.org/T238138 (mforns)
[18:58:02] Analytics, Analytics-Kanban, Event-Platform: WMDEBanner* Event Platform Migration - https://phabricator.wikimedia.org/T282562 (mforns) Hi WMDE folks! I have 1 question regarding the migration of WMDEBanner* schemas: With the old EventLogging system, the fields `client_ip` and `geocoded_data` were co...
[19:00:47] razzi: hey, after elukey's comments I rearranged a bit the puppet code for rsync'ing the reportupdater logs to HDFS, and now Jenkins is not complaining! I think it's ready for review, could you please look at it when you have time? :)
[19:01:01] Sounds good, I'll take a look mforns
[19:01:21] oh, razzi, forgot the link: https://gerrit.wikimedia.org/r/c/operations/puppet/+/692909
[19:01:29] thanks :]
[19:07:04] BTW razzi, I'd like to learn a bit about Airflow system setup, given that I'll be probably working on it soon. Could I pair/rubberduck/shadow with you sometime when you're working on it?
[19:09:25] Sounds good, I have a lot to learn for airflow myself, maybe we can hang later today
[19:09:51] mforns: quick question for "Rsync logs to HDFS": why put logs in /tmp/reportupdater/logs?
[19:09:55] (hdfs /tmp/reportupdater/logs)
[19:10:50] razzi: the idea is to mirros logs somewhere that people without access to an-launcher1002 can look at them
[19:11:07] we considered Logstash, but it would be a lot more work
[19:11:12] *mirrir
[19:11:15] *mirror
[19:11:17] hehe
[19:12:07] this way, users of reportupdater can troubleshoot their job's logs without being blocked on us doing so
[19:12:39] we also gave them +2 rights on the reportupdater-queries repo, so they can merge their changes
[19:13:00] and we made sure they can test their changes and new jobs in stat100* machines
[19:13:24] so, this is part of this task: https://phabricator.wikimedia.org/T274880
[19:14:12] the only thing they will be blocked on will be activation/deactivation of reportupdater job sets, but that happens only once per team, so I think that'll be OK
[20:06:37] Analytics, Product-Analytics: Request to delete test_gsc_* datasets from Druid (& Superset/Turnilo) - https://phabricator.wikimedia.org/T283536 (mpopov)
[21:08:31] Analytics, Analytics-Kanban, Product-Analytics, wmfdata-python: wmfdata-python's Hive query output includes logspam - https://phabricator.wikimedia.org/T275233 (nshahquinn-wmf) >>! In T275233#7108441, @Ottomata wrote: > I just tried pyhive with kerberos (for other reasons) and was able to get it...
[22:20:22] I'm going to failover hadoop to an-master with a manual failover to make sure everything is working before starting maintenance tomorrow