[00:01:38] <milimetric>	 Reedy: yes for startDate, it's set correctly for example in "Unique Wiki Count", but sadly no endDate (nobody asked for that, but it should be easy)
[00:03:12] <milimetric>	 Reedy: and yeah, since each version has their own set of graphs, I guess you'll have to manually update those to what you are keeping track of.  But the data's all there.
[00:04:45] <Reedy>	 So I can just move the dataRange: startDate: date from the level with title, into the individual graph object?
[00:04:55] <Reedy>	 Is there a.. spec somewhere? :)
[00:05:11] <wikibugs>	 10Analytics, 10MediaWiki-General: Pingback dashboard data normalisation - https://phabricator.wikimedia.org/T298928 (10Reedy)
[00:06:17] <milimetric>	 Reedy: sadly no spec, funding was pulled like 4 years ago :P
[00:06:27] * Reedy pretends to be shocked
[00:06:29] <milimetric>	 and no, sorry, I misunderstood, you can only set the startDate for the tab
[00:06:47] <milimetric>	 I could add a per-graph startDate if you need
[00:07:07] <Reedy>	 aha
[00:07:27] <Reedy>	 That'd be cool (but obviously not something I need right now). Want me to file a task?
[00:07:48] <milimetric>	 um, honestly if you file a task it would get declined :)
[00:07:53] <Reedy>	 haha
[00:07:58] <milimetric>	 just ping me whenever you want it and I'll do it for fun in my free time
[00:07:59] * Reedy fedex's beer
[00:08:04] <Reedy>	 Can I (currently) reverse sort the data (by date)?
[00:08:51] <milimetric>	 oh in the tabular view?  No, funny enough I wanted to make it that way and someone I forget who was *adamant* that it should start with the earliest date...
[00:09:04] <Reedy>	 heh
[00:09:07] <milimetric>	 (which makes no sense for most graphs)
[00:09:15] <milimetric>	 but I'm happy to reverse it now, since it's like you and me that care anymore
[00:09:21] <Reedy>	 <3
[00:09:41] <milimetric>	 here, if I were you I'd just write an email with your wishlist and I'll do them all that way
[00:10:07] <Reedy>	 If you want me to
[00:10:34] <milimetric>	 well, in case you come up with more, but if it's just start date per graph and reverse sort the table, I can do that
[00:11:11] <Reedy>	 Those two would be great. end date would be nice too (at some point)...
[00:11:42] <Reedy>	 I don't know where the request should lie... But what about normalisation of data? So "other" and "Other" are not different? :)
[00:15:33] <milimetric>	 oh, the normalization would be in the query itself, so the reportupdater job: https://github.com/wikimedia/analytics-reportupdater-queries/tree/master/pingback
[00:16:00] <milimetric>	 I'll write a task then and cc you, keeping it volunteer time.  I'll try and get it done this week.
[00:17:15] <Reedy>	 aha, I couldn't find where that code lived :)
[00:17:46] <Reedy>	 That'd be great. Feel free to ping/CC me for CR as it's "just" SQL :)
[00:17:55] <Reedy>	 (though I guess I could also do the patch)
[00:18:38] <milimetric>	 Reedy: you can feel free to do that patch, people maintain all their own queries there
[00:19:29] <wikibugs>	 10Analytics, 10MediaWiki-General: Pingback dashboard data normalisation - https://phabricator.wikimedia.org/T298928 (10Reedy) SQL queries are in https://github.com/wikimedia/analytics-reportupdater-queries/tree/master/pingback
[00:21:23] <wikibugs>	 10Analytics, 10Analytics-Dashiki: Dashiki fixes needed for Pinback dashboard - https://phabricator.wikimedia.org/T298929 (10Milimetric)
[00:24:11] <wikibugs>	 10Analytics, 10Analytics-Dashiki: Dashiki fixes needed for Pingback dashboard - https://phabricator.wikimedia.org/T298929 (10Reedy)
[00:26:09] <wikibugs>	 10Analytics, 10Analytics-Dashiki: Dashiki fixes needed for Pingback dashboard - https://phabricator.wikimedia.org/T298929 (10Reedy)
[00:39:24] <wikibugs>	 10Analytics, 10MediaWiki-General: Pingback dashboard data normalisation - https://phabricator.wikimedia.org/T298928 (10Reedy) Affected:  https://pingback.wmflabs.org/#unique-wiki-count
[01:19:53] <icinga-wm>	 RECOVERY - Check unit status of monitor_refine_event on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:01:05] <wikibugs>	 (03PS1) 10Neil P. Quinn-WMF: Make Wikipedia Preview job use new mobile device tag [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/752808 (https://phabricator.wikimedia.org/T297173)
[03:02:23] <wikibugs>	 (03CR) 10Neil P. Quinn-WMF: [V: 03+2 C: 03+2] "This job has no other maintainers and I've already deployed it, so I will self-merge." [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/752808 (https://phabricator.wikimedia.org/T297173) (owner: 10Neil P. Quinn-WMF)
[08:20:29] <wikibugs>	 10Analytics-Radar, 10fundraising-tech-ops: puppetize CA changes for kafkatee on fundraising banner loggers - https://phabricator.wikimedia.org/T296765 (10elukey) @Jgreen thanks a lot!
[08:21:21] <wikibugs>	 10Analytics-Radar, 10Event-Platform, 10SRE, 10Patch-For-Review: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (10elukey)
[08:21:54] <wikibugs>	 10Analytics-Radar, 10Data-Engineering-Radar, 10Event-Platform, 10Patch-For-Review: Move Kafka Jumbo's TLS clients to the new bundle - https://phabricator.wikimedia.org/T296064 (10elukey) 05Stalled→03In progress Back to in-progress, the FR kafkatee instances moved to the new bundle!
[08:26:52] <elukey>	 hello folks!
[08:27:07] <elukey>	 We are ready to move varnishkafka to the new CA bundle - https://gerrit.wikimedia.org/r/c/operations/puppet/+/742747
[08:27:27] <elukey>	 (after that, if eventgate follows, we'll be able to migrate Jumbo to the new Kafka CA intermediate)
[08:27:36] <elukey>	 let me know if there are concerns
[08:27:55] <elukey>	 (the change has been running on cp3050 up to now)
[08:36:25] <joal>	 Good morning elukey - no problem for me :)
[08:42:43] <wikibugs>	 10Analytics-Radar, 10Data-Engineering-Radar, 10Event-Platform, 10Patch-For-Review: Move Kafka Jumbo's TLS clients to the new bundle - https://phabricator.wikimedia.org/T296064 (10elukey) Next steps:  - Move eventgate-analytics to the new cert bundle - https://gerrit.wikimedia.org/r/c/eventgate-wikimedia/+/...
[08:42:47] <elukey>	 bonjour :)
[09:49:43] <wikibugs>	 10Analytics, 10MediaWiki-General: Pingback dashboard data normalisation - https://phabricator.wikimedia.org/T298928 (10Reedy) Is there some easy way for me to run/test the queries?  Is just altering the run queries enough? Or is there data on disk that needs updating?
[10:01:12] <wikibugs>	 10Analytics, 10MediaWiki-General: Pingback dashboard data normalisation - https://phabricator.wikimedia.org/T298928 (10JAllemandou) You can test queries on any stat-machine @Reedy . For instance:  ` stat1008> reportupdater-queries/pingback/serverSoftware 2021-12-09 2022-01-08 `  Altering the queries will make...
[10:08:02] <Reedy>	 joal: turtles :)
[10:08:17] <Reedy>	 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Reportupdater#How_to_test%3F
[10:08:26] <Reedy>	 >To install dependencies,
[10:08:34] <Reedy>	 pip isn't on stat1008 :D
[10:10:34] <joal>	 Reedy: You can create venvs on stat1008 if needed, but I wouldn't go for testing using reportupdater
[10:11:08] <joal>	 Reedy: You can test queries directly running them if you wish
[10:13:04] <Reedy>	 Directly to hive?
[10:13:12] <Reedy>	 I think I'll need to request a krb token
[10:13:24] <joal>	 correct Reedy - you'll need a ticket
[10:13:39] * Reedy files a task
[10:13:40] <Reedy>	 thanks :)
[10:14:30] <joal>	 You're welcome Reedy :)
[10:17:22] <wikibugs>	 10Data-Engineering: Kerberos password for reedy - https://phabricator.wikimedia.org/T298951 (10Reedy)
[10:17:33] <wikibugs>	 10Data-Engineering: Kerberos identity for reedy - https://phabricator.wikimedia.org/T298951 (10Reedy)
[10:21:09] * btullis picks up the task
[10:21:22] <joal>	 Hi btullis :) Thanks
[10:21:43] <wikibugs>	 10Data-Engineering: Kerberos identity for reedy - https://phabricator.wikimedia.org/T298951 (10BTullis) p:05Triage→03Medium a:03BTullis
[10:24:11] <wikibugs>	 10Data-Engineering: Kerberos identity for reedy - https://phabricator.wikimedia.org/T298951 (10BTullis) I have created the kerberos principal and the welcome email has been automatically sent. ` btullis@krb1001:~$ sudo manage_principals.py get reedy get_principal: Principal does not exist while retrieving "reedy...
[10:27:32] <taavi>	 that reminds me.. elukey: hi! let me know when you have some time to look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/751100
[10:27:41] <Reedy>	 btullis: thanks!
[10:30:11] <btullis>	 Reedy: A pleasure.
[10:30:12] <wikibugs>	 10Data-Engineering, 10Patch-For-Review: Kerberos identity for reedy - https://phabricator.wikimedia.org/T298951 (10BTullis) 05Open→03Resolved
[10:30:41] <elukey>	 taavi: hi! yep yep I'll try to review it :)
[10:52:17] <joal>	 Heya elukey - I have received errors from an-test-coord about webrequest jobs - heap space issues
[10:52:25] <joal>	 investigating
[10:52:37] <elukey>	 :(
[10:52:38] <elukey>	 thanks
[10:52:46] <elukey>	 hive heap space issues?
[10:53:29] <joal>	 mapreduce task heap space I think - triple checking
[10:58:25] <mforns>	 hey teamm :]
[10:58:40] <joal>	 Hi mforns 
[11:10:23] <joal>	 elukey: error in hive-server2 AFAICS - No map-reduce job, exec.TaskRunner: Error in executeTask java.lang.OutOfMemoryError: Java heap space in hive-server2 logs :(
[11:12:37] <elukey>	 query moritzm 
[11:12:41] <elukey>	 uff :)
[11:17:28] <elukey>	 joal: found the issue, it was an upgrade problem (likely mine)
[11:17:55] <elukey>	 the server2 was running with -Xmx256m (the default), meanwhile we add other things
[11:18:00] <elukey>	 I restarted it and now I see "-Xmx256m -Xms6g -Xmx6g ..."
[11:18:10] <elukey>	 so the heap errors were due to a misconfig, all good
[11:18:16] <joal>	 awesome elukey - thank you :)
[11:18:16] <elukey>	 does it make sense?
[11:18:42] <joal>	 elukey: misconfig due to upgrade process having messed up some of it - Am I right?
[11:18:56] <elukey>	 yes, you can summarize with "Luca's fault"
[11:19:08] <joal>	 Meh - upgrade fault :)
[11:19:21] <elukey>	 puppet needs to run after we install the packages to update the hive-env.sh etc..
[11:19:39] <elukey>	 if hive metatore/server is started before, then it picks up the default config
[11:19:52] <elukey>	 I had to rollback and rollforward many times
[11:20:00] <elukey>	 I missed the last one :D
[11:20:10] <elukey>	 anyway, we can keep monitoring some jobs, but it should be ok
[11:20:15] <joal>	 thanks a lot for the quick fix :)
[11:23:47] <btullis>	 Thanks Luca.
[11:23:51] <elukey>	 <3
[11:35:08] <Reedy>	 Which host(s) have /srv/reportupdater mounted?
[11:36:04] <elukey>	 an-launcher1002, only available to the data eng team
[11:36:14] <elukey>	 (it is deployed via scap)
[11:36:17] <joal>	 ah you beat on it elukey :)
[11:36:18] <Reedy>	 heh
[11:36:35] <Reedy>	 Can someone tar me up a copy of the pingback files from there and put it somewhere I can get access?
[11:36:37] <joal>	 Reedy: you can clone the reportupdater-query repo on any other host
[11:36:50] <Reedy>	 I've got a feeling to fix what I want,  the existing outputs probably need changing
[11:37:13] <joal>	 Reedy: a task will be a good start :)
[11:37:23] <Reedy>	 To get that data? :P
[11:37:27] <Reedy>	 unfortunately stuff like this isn't helpful
[11:37:28] <Reedy>	 reedy@stat1008:~$ reportupdater-queries/pingback/serverSoftware 2017-04-01 2022-01-08
[11:37:28] <Reedy>	 reedy@stat1008:~$
[11:37:37] <Reedy>	 (presumably erroring and error suppression)
[11:37:48] <Reedy>	 smaller date ranges work fine
[11:37:49] <Reedy>	 reedy@stat1008:~$ reportupdater-queries/pingback/serverSoftware 2021-12-09 2022-01-08
[11:37:49] <Reedy>	 date	apache	microsoft-iis	nginx	lighttpd	litespeed	other
[11:37:49] <Reedy>	 2021-12-09	25421	101	2485	60	1928	90
[11:44:13] <wikibugs>	 10Analytics, 10MediaWiki-General: Pingback dashboard data normalisation - https://phabricator.wikimedia.org/T298928 (10Reedy)
[12:25:35] <joal>	 Taking a break!
[12:47:22] <elukey>	 I have worked with Moritz today to fix an issue with reprepro, it seems that the bigtop repositories (hosted on AWS S3) changed recently and our settings didn't work anymore (the checkupdate). It has been temporarily fixed with https://gerrit.wikimedia.org/r/c/operations/puppet/+/753019
[12:47:54] <elukey>	 while doing it, Moritz realized that there was a weird setting, namely our bigtop component in apt pointed to the bigtop's stretch repos, not the buster ones
[12:48:07] <elukey>	 the packages are identical, so the config was fixed
[12:48:18] <elukey>	 (simply replacing the names)
[12:48:49] <elukey>	 everything now works as before, checkupdates etc.. in reprepro is good, but keep it in mind in case we'll see some weirdnesses in the future
[12:58:54] <milimetric>	 Reedy, the output is automatically rsynced here: https://analytics.wikimedia.org/published/datasets/periodic/reports/metrics/pingback/, that's where dashiki gets it from.  You'll see the relative url in the dadhiki config, we wanted to avoid people loading random stuff
[12:59:19] <milimetric>	 (I'll mention this in the task)
[12:59:50] <Reedy>	 thanks
[13:00:28] <Reedy>	 writing something to reprocess those discrepencies isn't difficult
[13:01:05] <Reedy>	 and AFAIS is what needs to happen to fix this
[13:01:13] <Reedy>	 (or re-running all the reports for all the date periods?)
[13:03:59] <wikibugs>	 10Analytics, 10MediaWiki-General: Pingback dashboard data normalisation - https://phabricator.wikimedia.org/T298928 (10Milimetric) The output is at https://analytics.wikimedia.org/published/datasets/periodic/reports/metrics/pingback/  If you want, you can download that stuff and reply here with a transform scr...
[13:44:26] <milimetric>	 Reedy: rerunning these particular reports I think takes a pretty long time, so it's probably easier to fix the output.  If we take a backup before we do it, it's pretty safe
[13:44:55] <Reedy>	 I was probably going to throw them in a git repo, commit, run script, commit and have a diff
[14:09:37] <wikibugs>	 10Data-Engineering, 10Airflow: [Airflow] Add deletion job for old anomaly detection data - https://phabricator.wikimedia.org/T298972 (10mforns)
[14:10:14] <Reedy>	 milimetric: https://github.com/reedy/pingback-reformat/commit/a0732ee0a9bb3e286bb0b2954b6a86f76057e1d7
[14:10:20] <Reedy>	 Not verified any of it (yet) :)
[14:12:28] * Reedy needs to try a side by side diff locally
[14:16:32] <elukey>	 I opened https://issues.apache.org/jira/browse/BIGTOP-3630 for the repo issue
[14:22:24] <elukey>	 ottomata: o/
[14:22:27] <elukey>	 morninggg
[14:22:35] <elukey>	 should I just merge https://gerrit.wikimedia.org/r/c/eventgate-wikimedia/+/752992 right?
[14:22:43] <elukey>	 and it will trigger a new image build
[14:22:55] <elukey>	 never done it for eventgate
[14:23:03] <ottomata>	 elukey:  that's rigth!
[14:23:16] <ottomata>	 then if you want to deploy... :) should just be the usual helmfile apply stuff in each service dir
[14:23:27] <ottomata>	 oh well
[14:23:31] <elukey>	 ah snap I don't have +2
[14:23:36] <ottomata>	 with a commit to deployment charts to bump the image version
[14:23:38] <ottomata>	 oh will merge
[14:23:41] <elukey>	 <3
[14:24:06] <elukey>	 we can probably couple it with the change in the CA cert to use
[14:24:13] <ottomata>	 oh ya
[14:24:13] <ottomata>	 okay
[14:24:14] <elukey>	 if it is easy enough
[14:24:27] <elukey>	 no rush, we can do it even next week
[14:24:38] <ottomata>	 k lemme know what/when i can help
[14:24:41] <ottomata>	 ty!
[14:26:16] <elukey>	 I will try submit a change and you can -1/-2 indefinitely until I get it right :D
[14:33:20] <milimetric>	 Reedy: cool, let me know when you think it's safe and I can back up and apply it, then you can verify the result in the dashboards
[14:33:33] <milimetric>	 (you could do this locally in theory, but I don't remember if there's anything broken in the dev setup)
[15:24:53] <wikibugs>	 10Data-Engineering-Kanban, 10SRE, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to the data engineering team resources for Antoine Qu'hen - https://phabricator.wikimedia.org/T298657 (10cmooney) a:05BTullis→03cmooney On the back of Olja's explicit approval I've added the username to the '...
[16:09:30] <wikibugs>	 10Data-Engineering, 10Data-Engineering-Kanban, 10Infrastructure-Foundations, 10SRE, and 3 others: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (10JAllemandou) a:03JAllemandou
[16:13:28] <wikibugs>	 10Data-Engineering, 10LDAP-Access-Requests, 10SRE-Access-Requests: Create Kerberos login for Brian King (bking) - https://phabricator.wikimedia.org/T298981 (10bking)
[16:13:53] <wikibugs>	 10Data-Engineering, 10LDAP-Access-Requests, 10SRE-Access-Requests: Create Kerberos login for Brian King (bking) - https://phabricator.wikimedia.org/T298981 (10bking)
[16:14:37] <joal>	 ottomata: could you please quickly review https://gerrit.wikimedia.org/r/c/operations/puppet/+/738874?
[16:14:46] <wikibugs>	 10Data-Engineering, 10LDAP-Access-Requests, 10SRE-Access-Requests: Create Kerberos login for Brian King (bking) - https://phabricator.wikimedia.org/T298981 (10bking)
[16:15:58] <inflatador>	 Greetings! New hire here requesting Kerberos access, no rush: https://phabricator.wikimedia.org/T298981
[16:19:22] <elukey>	 inflatador: o/ if you want to have fun with kerberos - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos#Create_a_principal_for_a_real_user :)
[16:19:40] <wikibugs>	 10Data-Engineering, 10LDAP-Access-Requests, 10SRE-Access-Requests: Create Kerberos login for Brian King (bking) - https://phabricator.wikimedia.org/T298981 (10Ottomata) Approved.
[16:19:40] <elukey>	 (otherwise somebody will create the principal soon)
[16:20:02] <ottomata>	 inflatador: just for good measure can you get gehel to add approval on that ticket too!
[16:20:08] <wikibugs>	 10Data-Engineering, 10LDAP-Access-Requests, 10SRE-Access-Requests: Create Kerberos login for Brian King (bking) - https://phabricator.wikimedia.org/T298981 (10Gehel) Approved
[16:20:09] <ottomata>	 ?
[16:20:15] <urbanecm>	 that was quick :)
[16:20:16] <ottomata>	 Ohp there he goes :0
[16:20:28] <ottomata>	 ya inflatador  you can probably do it yourself now if you like
[16:21:01] <inflatador>	 ottomata if I wanna do it myself, I should follow the page that elukey just linked?
[16:24:13] <ottomata>	 yup!
[16:26:21] <inflatador>	 ACK, on it
[16:31:29] <wikibugs>	 10Analytics-Clusters, 10Data-Engineering, 10Data-Engineering-Kanban, 10Cassandra, and 2 others: Investigate high levels of garbage collection on new AQS nodes - https://phabricator.wikimedia.org/T298516 (10BTullis) The trouble is that this pattern has bene stable for 7 days now. aqs1014-b is still fairly b...
[16:33:52] <elukey>	 ottomata: I added some thoughts to https://phabricator.wikimedia.org/T298087#7613232
[16:43:05] <inflatador>	 I was able to 'kinit' successfully! Thanks ottomata and elukey !
[16:43:47] <wikibugs>	 (03CR) 10Mforns: Sanitize additional event streams (035 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/747065 (https://phabricator.wikimedia.org/T297679) (owner: 10Awight)
[16:45:31] <wikibugs>	 10Data-Engineering, 10LDAP-Access-Requests, 10SRE-Access-Requests: Create Kerberos login for Brian King (bking) - https://phabricator.wikimedia.org/T298981 (10bking) This is confirmed working, feel free to close this ticket.
[16:55:33] <ottomata>	 great inflatador !
[16:55:38] <ottomata>	 elukey:  saw that, makes total sense to me!
[16:55:47] <mforns>	 joal: what's a snak?
[16:55:52] <mforns>	 :]
[16:56:03] <wikibugs>	 10Analytics-Clusters, 10DC-Ops, 10SRE, 10ops-eqiad: (Need By: TBD) rack/setup/install an-test-coord1002 - https://phabricator.wikimedia.org/T293938 (10Cmjohnson) I've tried a different partman recipe. I do not know what is wrong or why the raid fails.
[16:57:02] <joal>	 mforns: it's a wikidata term (https://www.wikidata.org/wiki/Wikidata:Glossary#:~:text=Snak%20is%20a%20technical%20term,%22%20and%20%22unknown%20value%22.)
[16:57:43] <joal>	 mforns: I think it's an acronym for "Some Notation About Knoweldge" (see https://www.wikidata.org/wiki/Q86719099)
[16:57:57] <mforns>	 joal: I see, thanks!
[17:00:19] <wikibugs>	 (03CR) 10Mforns: [C: 03+1] "LGTM!" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/747508 (https://phabricator.wikimedia.org/T258834) (owner: 10Joal)
[17:36:27] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org
[17:41:27] <jinxer-wm>	 (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org
[17:49:04] <btullis>	 I have merged this change to the classpath on the hadoop coordinators: https://gerrit.wikimedia.org/r/c/operations/puppet/+/752673
[17:49:45] <btullis>	 Tomorrow I will upgrade the hive and oozie servers, failing over from an-coord1001 to an-coord1002 and back with DNS.
[17:55:39] <ottomata>	 btullis:  already done test cluster?
[18:08:54] <wikibugs>	 10Analytics-Clusters, 10DC-Ops, 10SRE, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-test-coord1002 - https://phabricator.wikimedia.org/T293938 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host an-test-coord1002.eqiad.wmnet with O...
[18:09:26] <btullis>	 Yes. Upgraded hive and oozie on an-test-coord1001, oozie on an-test-client1001 , hive on an-test-worker100[1-3] - I think it's all fine.
[18:09:49] <joal>	 btullis: would the alert be timely sync with ou merging?
[18:09:50] <ottomata>	 great
[18:12:22] <btullis>	 joal: I don't think so. I merged after I saw the alert, at 5:48. I don't think that my merge or the deploy will cause any automatic restart to hive. It's just a change to `/etc/hadoop/conf/hadoop-env.sh`
[18:13:19] <joal>	 ack btullis -  those alerts are regular, we should fix them (possibly bump heap for hiveserver on test?)
[18:13:20] <btullis>	 ...but the alert reminded me that I'm going to be restarting hive anyway tomorrow.
[18:13:45] <elukey>	 joal: +1, the 80% mark is a little low for the current hive heap usage
[18:14:37] <btullis>	 joal: Yes. Agreed, I think we should exclude test from alerts, which is what it used to be. I accidentally re-enabled it. But we also got alerts from the non-test hive, so yes I can bump that value.
[18:16:01] <joal>	 yeah let's review that tomorrow - thanks btullis :)
[18:16:16] <btullis>	 Will do.
[18:20:16] <wikibugs>	 10Data-Engineering, 10Wikipedia-iOS-App-Backlog, 10iOS-app-feature-Analytics, 10Product-Analytics (Kanban): MobileWikiAppiOSUserHistory sending incompatible data - https://phabricator.wikimedia.org/T298721 (10ldelench_wmf) p:05Triage→03High
[18:30:07] <joal>	 btullis: your meeting for checking is a bit early on my calendar!
[18:31:58] <joal>	 btullis: 7am is very early, I think it'll be too early for all of us :)
[18:34:33] <wikibugs>	 10Analytics-Clusters, 10DC-Ops, 10SRE, 10ops-eqiad: (Need By: TBD) rack/setup/install an-test-coord1002 - https://phabricator.wikimedia.org/T293938 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host an-test-coord1002.eqiad.wmnet with OS buster completed: - an-t...
[18:34:37] <wikibugs>	 10Analytics-Legal, 10SRE: Options for creating internal (NDA-requiring) dashboards based on data from Google and Big search consoles - https://phabricator.wikimedia.org/T298991 (10AndyRussG)
[18:39:39] <wikibugs>	 10Analytics-Clusters, 10DC-Ops, 10SRE, 10ops-eqiad: (Need By: TBD) rack/setup/install an-test-coord1002 - https://phabricator.wikimedia.org/T293938 (10elukey) @Cmjohnson an-test-coord1002 done, there was an issue with your partman patch (it was targeting an-test-worker1002 instead of an-test-coord1002), bu...
[18:59:45] <wikibugs>	 10Analytics-Legal, 10SRE: Options for creating internal (NDA-requiring) dashboards based on data from Google and Big search consoles - https://phabricator.wikimedia.org/T298991 (10RhinosF1) #Analytics-Legal says "Public project for the Analytics and Techops team for reviewing incoming requests from WMF-Legal....
[19:01:14] <RhinosF1>	 AndyRussG: do I make sense ^
[19:02:28] <AndyRussG>	 RhinosF1: oh oops yes, thanks for that!
[19:02:59] <RhinosF1>	 AndyRussG: no problem, it's not the easiest workflow to make sense of
[19:03:13] <RhinosF1>	 Legal@wikimedia.org will be best
[19:06:37] <AndyRussG>	 RhinosF1: cool, thanks!!
[19:07:00] <RhinosF1>	 Np
[19:07:26] <RhinosF1>	 Knowing random obscure stuff about phab workflows is one of my many abilities
[19:10:46] <RhinosF1>	 I can't make most of them any nicer
[19:10:49] <RhinosF1>	 But I can be aware
[19:34:47] <wikibugs>	 10Data-Engineering, 10LDAP-Access-Requests, 10SRE, 10SRE-Access-Requests: Create Kerberos login for Brian King (bking) - https://phabricator.wikimedia.org/T298981 (10cmooney) 05Open→03Resolved a:03cmooney Ok no problem if there is anything not working just drop me a line on irc :)
[19:34:50] <btullis>	 joal: sorry. Calendar mistake.
[20:17:04] <btullis>	 Fixed the timezone in the invitation.
[20:38:24] <wikibugs>	 10Data-Engineering, 10Wikipedia-iOS-App-Backlog, 10iOS-app-feature-Analytics, 10Product-Analytics (Kanban): MobileWikiAppiOSUserHistory sending incompatible data - https://phabricator.wikimedia.org/T298721 (10SNowick_WMF) Thanks @Ottomata - noting here for posterity that I am able to extract data from prio...
[20:39:26] <wikibugs>	 10Data-Engineering, 10Wikipedia-iOS-App-Backlog, 10iOS-app-feature-Analytics, 10Product-Analytics (Kanban): MobileWikiAppiOSUserHistory sending incompatible data - https://phabricator.wikimedia.org/T298721 (10Ottomata) Ok, thank you!
[21:10:42] <wikibugs>	 10Data-Engineering-Kanban, 10Airflow: Tooling for Deploying Conda Environments - https://phabricator.wikimedia.org/T296543 (10Ottomata) Okay, got some tests in and created a merge request:  https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils/-/merge_requests/4  It is still a little bit WIP, but...
[21:20:21] <wikibugs>	 10Analytics-Clusters, 10DC-Ops, 10SRE, 10ops-eqiad: (Need By: TBD) rack/setup/install an-test-coord1002 - https://phabricator.wikimedia.org/T293938 (10Cmjohnson) 05Open→03Resolved Thanks @elukey resolving the task
[23:20:13] <wikibugs>	 (03PS1) 10Jenniferwang: Bug: T299007 Add the mediawiki_reading_depth event platform stream to the allowlist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/753178 (https://phabricator.wikimedia.org/T299007)
[23:22:08] <wikibugs>	 10Data-Engineering, 10Platform Engineering Roadmap, 10User-Eevans: Obtain a security review of AQS 2.0 - https://phabricator.wikimedia.org/T288663 (10odimitrijevic)
[23:22:10] <wikibugs>	 10Data-Engineering: Enforce authentication and authorization for webrequest_* topics in Kafka jumbo-eqiad cluster - https://phabricator.wikimedia.org/T294264 (10odimitrijevic)
[23:22:12] <wikibugs>	 10Data-Engineering, 10Data-Services, 10Privacy Engineering, 10cloud-services-team (Kanban): Raw IPs of logged-out users disclosed in wiki-replicas - https://phabricator.wikimedia.org/T284948 (10odimitrijevic)
[23:22:16] <wikibugs>	 10Data-Engineering: Define priorities for HDFS data to be backed up - https://phabricator.wikimedia.org/T283261 (10odimitrijevic)
[23:22:20] <wikibugs>	 10Data-Engineering, 10Product-Analytics, 10Epic: Reconstruct Hive & Hadoop permissions for shared database - https://phabricator.wikimedia.org/T288983 (10odimitrijevic)
[23:22:28] <wikibugs>	 10Data-Engineering: Crunch and delete many old dumps logs - https://phabricator.wikimedia.org/T280678 (10odimitrijevic)
[23:22:32] <wikibugs>	 10Data-Engineering: Address jackson version security vulnerabilities in refinery-source - https://phabricator.wikimedia.org/T272058 (10odimitrijevic)
[23:22:36] <wikibugs>	 10Data-Engineering: Add Authentication/Encryption to Kafka Jumbo's clients - https://phabricator.wikimedia.org/T250146 (10odimitrijevic)
[23:22:40] <wikibugs>	 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, 10Patch-For-Review: Add logic to purging scripts that requires admin action if it's about to delete a lot of data - https://phabricator.wikimedia.org/T270433 (10odimitrijevic)
[23:22:44] <wikibugs>	 10Data-Engineering, 10Data-release, 10Privacy Engineering, 10Research, 10Privacy: Apache Beam go prototype code for DP evaluation - https://phabricator.wikimedia.org/T280385 (10odimitrijevic)
[23:22:48] <wikibugs>	 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Switch off skipTrash for some data purging - https://phabricator.wikimedia.org/T270431 (10odimitrijevic)
[23:22:52] <wikibugs>	 10Data-Engineering: Build a process to check permissions when changing datasets from non-PII to PII - https://phabricator.wikimedia.org/T273818 (10odimitrijevic)
[23:22:56] <wikibugs>	 10Data-Engineering, 10FR-Tech-Analytics, 10Privacy Engineering: event.WikipediaPortal referer modification - https://phabricator.wikimedia.org/T279952 (10odimitrijevic)
[23:23:00] <wikibugs>	 10Data-Engineering: Update geocode UDF to NOT lookup some addresses - https://phabricator.wikimedia.org/T271340 (10odimitrijevic)
[23:23:04] <wikibugs>	 10Data-Engineering: Add authentication and encryption to Druid Analytics clients - https://phabricator.wikimedia.org/T250484 (10odimitrijevic)
[23:23:08] <wikibugs>	 10Data-Engineering: Defining a better authentication scheme for Druid and Presto - https://phabricator.wikimedia.org/T241189 (10odimitrijevic)
[23:23:12] <wikibugs>	 10Data-Engineering, 10Research-Backlog, 10SRE, 10WMF-Legal, 10User-Elukey: Enable layered data-access and sharing for a new form of collaboration - https://phabricator.wikimedia.org/T245833 (10odimitrijevic)
[23:23:22] <wikibugs>	 10Data-Engineering: Sesssion reconstruction - evaluate  privacy threat - https://phabricator.wikimedia.org/T194058 (10odimitrijevic)
[23:23:26] <wikibugs>	 10Data-Engineering: Idea: Add 'top X bigger than Y' sanitization method to EL-to-Druid - https://phabricator.wikimedia.org/T251145 (10odimitrijevic)
[23:23:31] <wikibugs>	 10Data-Engineering, 10User-Elukey: Only hdfs (or authenticated user) should be able to run Druid indexing jobs - https://phabricator.wikimedia.org/T192959 (10odimitrijevic)
[23:23:39] <wikibugs>	 10Analytics, 10Data-Engineering, 10Event-Platform, 10Product-Analytics: Develop comprehensive process, guidelines, and roles for Event Platform stream sanitization - https://phabricator.wikimedia.org/T276955 (10odimitrijevic)
[23:23:51] <wikibugs>	 10Analytics, 10Data-Engineering, 10Pageviews-API, 10User-Elukey: Improve user management for AQS Cassandra - https://phabricator.wikimedia.org/T142073 (10odimitrijevic)
[23:47:00] <wikibugs>	 10Data-Engineering, 10Data-Engineering-Kanban, 10Product-Analytics (Kanban): Test log file and error notification - https://phabricator.wikimedia.org/T295733 (10Mayakp.wiki) @BTullis : I checked the log file again, it doesn't have the error message encountered during the Jan 9 job run.