[05:00:29] RECOVERY - MegaRAID on an-worker1079 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[05:34:55] PROBLEM - MegaRAID on an-worker1079 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[08:25:37] (03CR) 10Joal: "I have not looked at the logic of the code - some comments about parameter settings and minor nits." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/829862 (https://phabricator.wikimedia.org/T305841) (owner: 10Mforns)
[08:45:14] (03PS1) 10Btullis: fix(standalone-consumers) Removes Solr from spring boot application config [analytics/datahub] (wmf) - 10https://gerrit.wikimedia.org/r/830098
[09:09:53] (03CR) 10Btullis: [C: 03+2] fix(standalone-consumers) Removes Solr from spring boot application config [analytics/datahub] (wmf) - 10https://gerrit.wikimedia.org/r/830098 (owner: 10Btullis)
[09:13:37] 10Quarry, 10Documentation-Review-Board, 10Key docs update 2021-22: Quarry docs - https://phabricator.wikimedia.org/T307011 (10KBach) 05Resolved→03In progress
[09:13:43] 10Quarry, 10Documentation-Review-Board, 10Key docs update 2021-22: Quarry docs - https://phabricator.wikimedia.org/T307011 (10KBach) 05In progress→03Resolved To me, this task is complete. @apaskulin, @Aklapper - please let me know if you have any comments. If not, I'll resolve this one in the coming weeks.
[09:26:25] (03CR) 10Joal: [V: 03+2] Update cassandra hql loading file [analytics/refinery] - 10https://gerrit.wikimedia.org/r/828518 (https://phabricator.wikimedia.org/T311507) (owner: 10Joal)
[09:35:02] (03Merged) 10jenkins-bot: fix(standalone-consumers) Removes Solr from spring boot application config [analytics/datahub] (wmf) - 10https://gerrit.wikimedia.org/r/830098 (owner: 10Btullis)
[09:55:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4022 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4022%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[09:55:50] !log merged and deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/821695
[09:55:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:00:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4022 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4022%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[10:00:57] 10Data-Engineering-Kanban, 10Data Engineering Planning, 10Data Pipelines (Sprint 01): Create cassandra loading HQL files from their oozie definition - https://phabricator.wikimedia.org/T311507 (10EChetty)
[10:01:02] 10Data-Engineering-Kanban, 10Data Engineering Planning, 10Data Pipelines (Sprint 01), 10Patch-For-Review: Build and install spark3 assembly - https://phabricator.wikimedia.org/T310578 (10EChetty)
[10:01:12] 10Data-Engineering-Kanban, 10Data Engineering Planning, 10Data Pipelines (Sprint 01): Create conda-base-env with last pyspark - https://phabricator.wikimedia.org/T309227 (10EChetty)
[10:33:48] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 01), 10Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (10gmodena) > Adding idea discussed with @Ottomata earlier on. It's probably interesting to separate strea...
[10:40:52] 10Analytics, 10Analytics-Wikistats, 10Data Engineering Planning: Get visibility which pages are being heavily edited, plundered, which need patrolling - https://phabricator.wikimedia.org/T315196 (10EChetty)
[10:41:04] 10Analytics, 10Analytics-Wikistats, 10Data Engineering Planning: Merge Ks-Arab and Ks-Deva to ks - https://phabricator.wikimedia.org/T314476 (10EChetty)
[10:42:25] 10Analytics-Radar, 10Data Engineering Planning, 10Event-Platform Value Stream, 10Platform Team Workboards (MW Expedition): Decouple EventBus and EventFactory - https://phabricator.wikimedia.org/T292121 (10EChetty)
[10:42:29] 10Analytics-Radar, 10Data Engineering Planning, 10Metrics-Platform, 10CSS: Schema code samples popup appears under the JSON table - https://phabricator.wikimedia.org/T272857 (10EChetty)
[10:42:49] 10Data-Engineering-Kanban, 10Data Engineering Planning: Investigate Gobblin dataloss during namenode failure - https://phabricator.wikimedia.org/T311263 (10EChetty)
[10:42:53] 10Analytics-Kanban, 10Data Engineering Planning, 10Pageviews-Anomaly: Article on Carles Puigdemont has inflated pageviews in many projects - https://phabricator.wikimedia.org/T263908 (10EChetty)
[10:42:59] 10Quarry: test tox on PR - https://phabricator.wikimedia.org/T317092 (10rook)
[10:43:03] 10Analytics-Radar, 10Data Engineering Planning, 10Event-Platform Value Stream, 10Internet-Archive, 10The-Wikipedia-Library: Store page-links-change data in a database table and make available through a Special page - https://phabricator.wikimedia.org/T221397 (10EChetty)
[10:43:07] 10Analytics, 10Data Engineering Planning, 10Event-Platform Value Stream, 10Platform Engineering: EventStreams sending same data over and over (page links change) - https://phabricator.wikimedia.org/T290211 (10EChetty)
[10:43:11] 10Analytics-Radar, 10Data Engineering Planning, 10MediaWiki-extensions-EventLogging: SearchSatisfaction has validation errors for event.query - https://phabricator.wikimedia.org/T257331 (10EChetty)
[10:43:29] 10Analytics, 10Data Engineering Planning, 10Event-Platform Value Stream: mediawiki/page/properties-change schema should use map type for added and removed page properties - https://phabricator.wikimedia.org/T281483 (10EChetty)
[10:43:33] 10Analytics, 10Data Engineering Planning, 10Event-Platform Value Stream, 10Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (10EChetty)
[10:43:41] 10Analytics, 10Data Engineering Planning, 10Event-Platform Value Stream, 10Patch-For-Review: Enable canary events for all streams - https://phabricator.wikimedia.org/T266798 (10EChetty)
[10:43:49] 10Analytics, 10Data Engineering Planning, 10Event-Platform Value Stream: Refine event pipeline at this time refines data in hourly partitions without knowing if the partition is complete - https://phabricator.wikimedia.org/T252585 (10EChetty)
[10:43:53] 10Analytics, 10Data Engineering Planning, 10Metrics-Platform, 10Product-Infrastructure-Team-Backlog, 10Epic: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (10EChetty)
[10:44:05] 10Analytics-Radar, 10Data Engineering Planning, 10Pageviews-API, 10Tool-Pageviews: 429 Too Many Requests hit despite throttling to 100 req/sec - https://phabricator.wikimedia.org/T219857 (10EChetty)
[10:44:11] 10Analytics, 10Data Engineering Planning, 10Event-Platform Value Stream, 10Goal: Event Platform: Stream Connectors - https://phabricator.wikimedia.org/T214430 (10EChetty)
[10:44:33] 10Analytics-Kanban, 10Data Engineering Planning, 10Event-Platform Value Stream, 10MediaWiki-extensions-EventLogging, and 3 others: Modern Event Platform - https://phabricator.wikimedia.org/T185233 (10EChetty)
[10:44:39] 10Analytics-Wikistats, 10Data Engineering Planning: Non-mobile UAs on mobile (2g/gprs, etc) IP-blocks - https://phabricator.wikimedia.org/T58628 (10EChetty)
[10:44:59] 10Analytics, 10Data Engineering Planning, 10Event-Platform Value Stream, 10Metrics-Platform: Client-side error logging should use Elastic Common Schema (ECS) fields when possible - https://phabricator.wikimedia.org/T267602 (10EChetty)
[10:46:42] 10Analytics, 10Analytics-Wikistats, 10Data Engineering Planning, 10Data Pipelines: Merge Ks-Arab and Ks-Deva to ks - https://phabricator.wikimedia.org/T314476 (10EChetty)
[11:05:12] (VarnishkafkaNoMessages) firing: varnishkafka on cp4034 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4034%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[11:10:12] (VarnishkafkaNoMessages) resolved: varnishkafka on cp4034 is not sending enough cache_upload requests - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka - https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=ulsfo%20prometheus/ops&var-cp_cluster=cache_upload&var-instance=cp4034%3A9132&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DVarnishkafkaNoMessages
[12:29:30] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 01), 10Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (10JAllemandou) > Do we already have a set of use cases for this layout?
[12:36:42] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 01), 10Patch-For-Review: Design Schema for page state and page state with content (enriched) streams - https://phabricator.wikimedia.org/T308017 (10Ottomata) > This sounds like a separate thread though. Maybe we can spike some work on it? +1, just wan...
[12:36:56] o/
[12:37:06] good morning ottomata
[12:50:06] 10Data-Engineering, 10Equity-Landscape: Editorship Input Metrics - https://phabricator.wikimedia.org/T309274 (10ntsako) Added tables: ` SELECT country_code, metric_value growth_rate_unique_devices_column_ab, year FROM ntsako.georeadership_input_metrics WHERE year = 2021 AND metric...
[12:50:48] 10Data-Engineering, 10Equity-Landscape: Editorship Input Metrics - https://phabricator.wikimedia.org/T309274 (10ntsako) Hi @JAnstee_WMF, Please can you review this.
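On the MegaRAID flap at 05:00/05:34 above: an-worker1079 reports WriteThrough on all 13 logical drives, which on these controllers often means write caching was dropped because the battery-backed cache (BBU) is degraded or in a relearn cycle. A minimal sketch of the kind of check behind that alert, assuming the MegaCli binary is on the PATH (it may be installed as megacli or MegaCli64) and that it prints one cache-policy line per logical drive:

    import subprocess

    # Sketch only: -LDGetProp -Cache -LAll -aALL asks the controller for the
    # current cache policy of every logical drive on every adapter.
    out = subprocess.run(
        ["megacli", "-LDGetProp", "-Cache", "-LAll", "-aALL"],
        capture_output=True, text=True, check=True,
    ).stdout

    # Lines look roughly like "...: Cache Policy:WriteBack, ReadAdaptive, ..."
    write_through = [line for line in out.splitlines() if "WriteThrough" in line]
    if write_through:
        print(f"CRITICAL: {len(write_through)} LD(s) not using WriteBack")
    else:
        print("OK: all logical drives using WriteBack")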
[12:50:57] 10Data-Engineering, 10Equity-Landscape: Editorship Input Metrics - https://phabricator.wikimedia.org/T309274 (10ntsako) a:05ntsako→03JAnstee_WMF
[12:52:57] 10Data-Engineering, 10Equity-Landscape: Readership input metrics - https://phabricator.wikimedia.org/T309273 (10ntsako) Added tables: ` SELECT country_code, metric_value growth_rate_unique_devices_column_ab, year FROM ntsako.georeadership_input_metrics WHERE year = 2021 AND metric...
[13:05:19] btullis: meeting?
[13:07:23] 10Data-Engineering, 10Equity-Landscape: Editorship Input Metrics - https://phabricator.wikimedia.org/T309274 (10ntsako) Added tables: ` SELECT country_code, commons commons_column_ja, mediawiki mediawiki_column_jb, wikidata wikidata_column_jc, wikipedia...
[13:07:45] 10Data-Engineering, 10Equity-Landscape: Readership input metrics - https://phabricator.wikimedia.org/T309273 (10ntsako) a:05ntsako→03JAnstee_WMF
[13:20:45] 10Data-Engineering-Kanban, 10Event-Platform Value Stream (Sprint 01), 10Patch-For-Review: [BUG] jsonschema-tools materializes fields in yaml in a different order than in json files - https://phabricator.wikimedia.org/T308450 (10gmodena) >>! In T308450#8161644, @Ottomata wrote: > @JAllemandou @Milimetric @phu...
[13:25:57] 10Analytics, 10Data-Engineering, 10Event-Platform Value Stream, 10Patch-For-Review, 10User-Elukey: Port architecture of irc-recentchanges to Kafka - https://phabricator.wikimedia.org/T234234 (10Ottomata)
[13:28:52] milimetric: good morning! I think https://phabricator.wikimedia.org/T314578 is on track to get deployed today but please do let me know if anything else is needed from me
[13:32:16] (03PS1) 10Gerrit maintenance bot: Add tl.wikiquote to pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/830165 (https://phabricator.wikimedia.org/T317113)
[13:32:48] (03PS1) 10Gerrit maintenance bot: Add az.wikimedia to pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/830167 (https://phabricator.wikimedia.org/T317119)
[13:42:24] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for today's deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/830165 (https://phabricator.wikimedia.org/T317113) (owner: 10Gerrit maintenance bot)
[13:43:06] (03PS2) 10Joal: Add az.wikimedia to pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/830167 (https://phabricator.wikimedia.org/T317119) (owner: 10Gerrit maintenance bot)
[13:43:22] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for today's deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/830167 (https://phabricator.wikimedia.org/T317119) (owner: 10Gerrit maintenance bot)
[14:27:28] 10Data-Engineering-Kanban, 10Data Engineering Planning, 10Data Pipelines (Sprint 01): Create conda-base-env with last pyspark - https://phabricator.wikimedia.org/T309227 (10EChetty) 05Open→03Resolved
[14:32:40] 10Data-Engineering-Kanban, 10Data Engineering Planning, 10Data Pipelines (Sprint 01), 10Patch-For-Review: Build and install spark3 assembly - https://phabricator.wikimedia.org/T310578 (10Antoine_Quhen) Last resolution about this ticket: * forget about complete automation (puppet or ci) * add doc + a sh...
[14:44:33] 10Data-Engineering-Operations, 10Data Engineering Planning, 10Mail, 10SRE: Add xcollazo@wikimedia.org to the analytics-alerts mailing list - https://phabricator.wikimedia.org/T315486 (10jbond) p:05Triage→03Medium
[14:57:16] 10Data-Engineering-Kanban, 10Data Engineering Planning, 10Data Pipelines (Sprint 01): Investigate why airflow sensor tasks fail without sending errors - https://phabricator.wikimedia.org/T311976 (10EChetty)
[16:08:56] (03CR) 10Bearloga: [C: 03+2] movement_metrics: Update global market active editors query [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/826911 (https://phabricator.wikimedia.org/T316398) (owner: 10Mayakpwiki)
[16:36:14] joal: joining us?
[16:36:23] 10Quarry: test tox on PR - https://phabricator.wikimedia.org/T317092 (10rook) https://github.com/toolforge/quarry/pull/3
[16:36:36] 10Quarry: test irc integration - https://phabricator.wikimedia.org/T316961 (10rook) https://github.com/toolforge/quarry/pull/2
[16:36:36] andrewbogott: Heya - I'm in meeting with my team now - can I follow on IRC? :S
[16:36:44] 10Quarry: build container on PR - https://phabricator.wikimedia.org/T316958 (10rook) https://github.com/toolforge/quarry/pull/1
[16:37:03] sure, or we can delay 30 minutes
[16:37:29] I'll be in interview in 30 - so better now for me :)
[16:40:11] ok :)
[16:42:57] I'm going to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/828102 as soon as CI is ready
[16:44:05] over here, apergos :)
[16:45:18] I merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/828102. Now forcing puppet runs on some toolforge nodes as spot checks...
[16:45:38] ack
[16:46:35] hello to the fourth member of the cabal!
[16:46:49] yaay!
[16:47:02] heya :)
[16:47:52] joal: can you pick one of your nfs clients and do a puppet refresh to make sure things are still OK?
[16:48:10] andrewbogott: I don't have root :(
[16:48:18] then just tell me the name of one :)
[16:48:24] btullis: could you please help with that --^ ?
[16:48:39] an-launcher1002.eqiad.wmnet would be one andrewbogott
[16:48:51] thx!
[16:49:35] Running puppet on an-launcher1002 now.
[16:49:46] oops I beat you to it :)
[16:49:54] thanks btullis :)
[16:50:25] andrewbogott: I can read nfs stuff, no issue
[16:50:27] my minimal cloud-vps/toolforge spot checks seem fine
[16:50:38] btullis: it's still mid-move so I'll want you to check again in 5
[16:50:55] ack
[16:51:46] ok, it's done. can you check again, and tell me which path you're checking?
[16:52:21] andrewbogott: I cd into /mnt/data/xmldatadumps/public/other/pageviews
[16:52:42] andrewbogott: I can cd into subfolders, view files etc
[16:52:53] nice. that's using the new mount so I think we're good.
[16:53:02] \o/
[16:53:27] so joal I think all that's left is for you to keep an eye on whether the dumps remain up-to-date in the next few days. And also make sure y'all aren't relying on the old mount points /mnt/nfs/dumps-labstore1007.wikimedia.org and /mnt/nfs/dumps-labstore1006.wikimedia.org
[16:53:29] similarly andrewbogott, I can also access the analytics dumps data from the internet, so that seems to be working as well
[16:53:39] I'll surely do that andrewbogott
[16:53:46] apergos: I'm feeling lucky, ok if I apply that dns change too?
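The spot checks above (16:47 to 16:53) can be scripted. A minimal sketch, assuming the mount paths quoted in the conversation; this is an illustration, not an existing tool:

    import os

    # Paths taken from the conversation above.
    new_mount = "/mnt/data/xmldatadumps/public/other/pageviews"
    old_mounts = [
        "/mnt/nfs/dumps-labstore1006.wikimedia.org",
        "/mnt/nfs/dumps-labstore1007.wikimedia.org",
    ]

    # The migrated mount should exist and be listable, which is what the
    # manual "cd in and look around" check verifies.
    entries = os.listdir(new_mount)
    print(f"{new_mount}: {len(entries)} entries, readable")

    # Clients should no longer depend on the legacy labstore mount points.
    for old in old_mounts:
        if os.path.ismount(old):
            print(f"warning: legacy mount still active: {old}")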
[16:54:10] yes I think so
[16:54:41] noting two places that still have labstore related names in puppet:
[16:54:43] hieradata/codfw/profile/openstack/codfw1dev/networktests.yaml: DUMPSFILE: /mnt/nfs/dumps-labstore1006.wikimedia.org/index.html
[16:54:50] hieradata/eqiad/profile/openstack/eqiad1/networktests.yaml: DUMPSFILE: /mnt/nfs/dumps-labstore1006.wikimedia.org/index.html
[16:55:04] I assume neither of these are big deals but they will need to be fixed up I guess
[16:55:34] apergos: oh thanks! I'm pretty much the only one who runs that script lately but I will update it.
[16:55:38] dns change is rolling out now
[16:55:44] crossing fingers
[16:56:45] 1 hour
[16:56:51] that's going to be a long wait, ugh
[16:57:05] hm... how do I know if my browser is using the new cname? I guess maybe by waiting an hour :/
[16:57:23] wait an hour, try nslookup or dig from the command line, see what your isp does, heh
[16:57:34] dig on my laptop already shows the new one
[16:57:55] https://www.irccloud.com/pastebin/SrTMWbpQ/
[16:57:57] just tried it on an internal host
[16:58:01] looks good
[16:58:03] But I don't necessarily trust the browser to be using that.
[16:58:11] oh, nice.
[16:58:19] same on laptop
[16:58:21] Someone did a very nice, thorough job of puppetizing those hosts.
[16:58:28] so we know that the syntax is right and the name is being served
[16:58:56] a few different people over time worked on those manifests
[16:59:01] most recent was probably Brooke
[16:59:37] I mean, reproducibility is the goal with puppet but the fact that I was able to rebuild all the same functionality on Bullseye is very pleasing.
[17:00:06] OK folks, I think we're done for now apart from waiting for the other shoe to drop. Please ping me here immediately if you find any surprises in the next 24h or so.
[17:00:20] thank you all!
[17:00:29] it's 8 pm for me so I won't be great about notifications for the next
[17:00:33] Hey andrewbogott, apergos and JustHannah - I'm going into interview mode - I'll be back in 1 hour :) Thanks a lot to all of you for the migration :)
[17:00:34] well 12-13 hours probably
[17:00:57] feel free to ping me (and Hannah) here or maybe better in the usual sre or security channel
[17:01:03] if something comes up.
[17:01:16] thanks for all the work!
[17:02:28] So far I'm just coasting on work that other folks did :) The hard bit was the hdfs port which I suspect caused btullis to cry tears of blood.
[17:02:57] I watched those package builds move along day by painful day, but it got done in the end, kudos!
[17:04:31] andrewbogott: switching the active nfs host broke toolforge and paws k8s containers, because they don't have the volume defined
[17:04:49] aww crap
[17:04:59] taavi: tell me more? I left all the old pieces in place so I'd expect them to just coast along on that
[17:05:15] I guess they get the new name from someplace?
[17:06:47] andrewbogott: there are symlinks in /public/dumps which are defined by puppet to point to "/mnt/nfs/dumps-${dumps_active_server}/${stuff}"
[17:06:57] you switched dumps_active_server to something that's not mounted on the containers yet
[17:07:21] those are the ones you use, yeah
[17:07:33] Ah, and those mounts are internal to the container rather than on the host VM
[17:08:02] the containers only mount the host paths they're told to mount, and the new dumps hosts are not included
[17:08:09] hmmmm
[17:08:27] That's not fixable at runtime, right? only by building fresh containers and restarting everything?
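taavi's explanation at 17:06-17:08 is the crux: a pod only gets the hostPath volumes that were declared in its spec at creation time, so an NFS mount added on the node afterwards stays invisible until the pod is recreated with an updated spec. A sketch of how one could list which /mnt/nfs paths running pods actually mount, using the official Kubernetes Python client; the tool-example namespace is hypothetical:

    from kubernetes import client, config

    # Requires a valid kubeconfig; "tool-example" is a made-up namespace
    # (on Toolforge each tool runs in its own namespace).
    config.load_kube_config()
    v1 = client.CoreV1Api()
    for pod in v1.list_namespaced_pod("tool-example").items:
        for vol in pod.spec.volumes or []:
            # hostPath volumes are fixed at pod creation, which is why the
            # dumps switch broke containers that predate the new mount.
            if vol.host_path and vol.host_path.path.startswith("/mnt/nfs"):
                print(pod.metadata.name, vol.host_path.path)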
[17:09:14] you don't need to build new containers, but yeah, adjusting some config (which is very painful on toolforge) and restarting stuff
[17:09:51] ok. so time to revert, yes?
[17:10:19] for toolforge the painful part comes from the fact that we hardcode those paths in the PodSecurityPolicy objects, and there's one (or two, don't remember) of those for each tool
[17:10:26] dumps_dist_active_vps needs to be reverted, everything else can stay
[17:11:07] https://gerrit.wikimedia.org/r/c/operations/puppet/+/830218
[17:11:24] Oh, hm... ok, let's just do that then
[17:11:28] * andrewbogott updates that revert
[17:11:38] yeah because otherwise we also have the dns patch
[17:14:09] https://gerrit.wikimedia.org/r/c/operations/puppet/+/830218
[17:14:14] let's wait for jenkins given its whine a minute ago
[17:14:34] heh, and it turns out the dns patch was 'wrong' inasmuch as dns was already pointing to 1006 even though 1007 was marked as the web server in puppet :/
[17:14:44] So my revert is trying to sort that out as well.
[17:15:14] 1007 was active for web? really? huh
[17:16:30] so that might mean that web logs copied over to analytics all this time from the active web server... weren't very useful
[17:16:32] huh
[17:16:36] ummm no I think I'm confused
[17:16:48] I think it was 1006, I just made a mistake in my original patch
[17:16:55] ah ok!
[17:16:55] anyway after that 'revert' patch things should be consistent
[17:17:07] good good
[17:17:28] jenkins likes it, shall I +1?
[17:17:41] sure
[17:17:51] done
[17:18:51] 10Data-Engineering, 10Product-Analytics, 10wmfdata-python: Support importing a Parquet file into HDFS using wmfdata-python - https://phabricator.wikimedia.org/T273196 (10nshahquinn-wmf) p:05Medium→03Low
[17:20:42] taavi: can you check https://phabricator.wikimedia.org/T317144 for accuracy? And then, when you have time, add details about how to actually change all that :/
[17:21:46] I went to subscribe to it and found I already was :-)
[17:22:25] I suppose the old mounts need to go away before the old hosts can be taken out of service too
[17:22:50] yeah, although maybe in k8s we can just replace rather than add
[17:23:00] oh, no we can't. Dang
[17:23:22] oh, correction, in the PSPs we permit all of /mnt/nfs instead of the individual hosts. this makes it much less painful!
[17:24:08] so you'd need to modify volume-admission-controller config files and something in PAWS too
[17:24:19] thank you for the catch, btw taavi. Are things back to working OK?
[17:26:34] not yet, but I think that's just puppet not running everywhere yet
[17:27:47] at least some of this ought to be !logged I guess
[17:27:50] running puppet manually on a single host looks good
[17:43:31] joal: sorry for the delay, I will deploy but later, do you still have time today to sync?
[17:45:15] Yes milimetric, in 15 minutes
[18:07:16] milimetric: heya - wanna chat now?
[18:07:36] joal: yes give me one min
[18:07:40] sure
[18:07:47] somehow didn't see your ping
[18:10:29] ok joal batcave?
[18:10:37] OMW!
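On the 16:57 question of whether the browser had picked up the new CNAME: asking the resolver directly sidesteps any browser-level DNS cache. A minimal sketch; dumps.wikimedia.org is an assumption here, since the log never names the record being switched:

    import socket

    # getaddrinfo follows the CNAME and returns the addresses the name
    # currently resolves to, via the system resolver rather than the browser.
    name = "dumps.wikimedia.org"  # assumed record; not named in the log
    addrs = {info[4][0] for info in socket.getaddrinfo(name, 443)}
    print(name, "->", sorted(addrs))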
[18:28:47] !log weekly deployment train starting
[18:28:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:32:13] (03CR) 10Milimetric: Fix Array UDFs (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/828566 (owner: 10Nmaphophe)
[18:32:36] Starting build #111 for job analytics-refinery-maven-release-docker
[18:45:46] Project analytics-refinery-maven-release-docker build #111: 09SUCCESS in 13 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/111/
[18:45:53] 10Data-Engineering, 10API Platform, 10Platform Engineering Roadmap, 10User-Eevans: Pageviews integration testing - https://phabricator.wikimedia.org/T299735 (10codebug)
[18:47:20] Starting build #70 for job analytics-refinery-update-jars-docker
[18:47:54] Project analytics-refinery-update-jars-docker build #70: 09SUCCESS in 33 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/70/
[18:47:55] (03PS1) 10Maven-release-user: Add refinery-source jars for v0.2.6 to artifacts [analytics/refinery] - 10https://gerrit.wikimedia.org/r/830237
[18:48:27] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Add refinery-source jars for v0.2.6 to artifacts [analytics/refinery] - 10https://gerrit.wikimedia.org/r/830237 (owner: 10Maven-release-user)
[18:49:12] !log finished refinery-source 0.2.6 deploy, waiting 5 minutes and starting refinery deploy
[18:49:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:53:57] (03CR) 10Ottomata: Fix Array UDFs (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/828566 (owner: 10Nmaphophe)
[19:48:49] ottomata: some more errors scap deploying refinery, stat1006 this time:
[19:48:52] https://www.irccloud.com/pastebin/VbwhBXEE/
[19:49:34] IOError: [Errno 28] No space left on device\nerror: external filter 'git-fat filter-clean' failed
[19:49:37] yeah, no space
[20:08:19] PROBLEM - Disk space on an-launcher1002 is CRITICAL: DISK CRITICAL - free space: /srv 0 MB (0% inode=98%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-launcher1002&var-datasource=eqiad+prometheus/ops
[20:28:15] (03PS1) 10Milimetric: Delete unused jars [analytics/refinery] - 10https://gerrit.wikimedia.org/r/830260
[20:29:10] I think I need SRE access to make space, ottomata, help?
[20:29:31] an-launcher is in trouble
[20:29:46] (btw, I made https://gerrit.wikimedia.org/r/c/analytics/refinery/+/830260 to just delete these old unused jars)
[20:35:15] did someone clear the space? I'm confused
[20:35:21] I guess I'll try deploying again...
[20:36:06] maybe the rollback deleted them... but there should be space then...
[20:46:19] RECOVERY - Disk space on an-launcher1002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=an-launcher1002&var-datasource=eqiad+prometheus/ops
[20:48:55] milimetric: hi sorry!
[20:48:56] here!
[20:49:16] still problems? how can I help asap?
[20:49:21] sok, I think something auto-recovered on an-launcher
[20:49:34] or someone was fixing it behind the scenes or something
[20:49:43] it was out of space and now it's not
[20:49:53] the deploy was broken but I deployed -f again and it seems to work now
[20:50:21] thanks ottomata, I'll ping if anything breaks again... and thanks to whoever freed up space on an-launcher
[20:50:40] hm.
[20:50:41] okay
[20:50:49] and stat1006?
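The git-fat ENOSPC on stat1006 and the full /srv on an-launcher1002 are the same class of failure: the deploy started without checking free space. A minimal preflight sketch; the 5 GiB threshold is an arbitrary example, not a scap feature:

    import shutil

    GIB = 1024 ** 3

    def check_free(path: str, need_gib: float = 5.0) -> None:
        # shutil.disk_usage reports the filesystem backing `path`, the thing
        # git-fat exhausted above ("IOError: [Errno 28] No space left on device").
        usage = shutil.disk_usage(path)
        free = usage.free / GIB
        print(f"{path}: {free:.1f} GiB free of {usage.total / GIB:.1f} GiB")
        if free < need_gib:
            raise SystemExit(f"refusing to deploy: {path} below {need_gib} GiB free")

    check_free("/srv")  # the mount that hit 0 MB on an-launcher1002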
[20:51:07] seemed fine after deploy -f too
[20:51:13] oh okay, yeah there is lots of space free there
[20:51:14] but that one never threw a disk space alarm, just failed
[20:51:21] oh okay
[20:51:56] I deleted like 90% of the jars in the artifacts directory, not just old jars, like all versions we don't use
[20:52:01] (not merged yet)
[20:52:18] that would be nice to merge sometime
[20:52:53] milimetric: let's do it tomorrow
[20:53:17] maybe thursday, tomorrow's kinda nuts for me
[21:06:56] k
[21:07:04] 10Quarry, 10GitLab (Project Migration): Move quarry to gitlab or github - https://phabricator.wikimedia.org/T308978 (10rook)
[21:07:36] (03PS1) 10Milimetric: Fix groupby (hive is unfortunately like mysql here) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/830263
[21:07:53] argh, I gotta redeploy, there's a bug in that query ^
[21:07:54] 10Quarry, 10GitLab (Project Migration): Move quarry to gitlab or github - https://phabricator.wikimedia.org/T308978 (10rook)
[21:08:00] 10Quarry: test tox on PR - https://phabricator.wikimedia.org/T317092 (10rook)
[21:08:06] 10Quarry: test irc integration - https://phabricator.wikimedia.org/T316961 (10rook)
[21:08:12] 10Quarry: build container on PR - https://phabricator.wikimedia.org/T316958 (10rook)
[21:08:15] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Fix groupby (hive is unfortunately like mysql here) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/830263 (owner: 10Milimetric)
[21:32:42] 10Data-Engineering, 10Product-Analytics: [REQUEST] Add new Fundraising dimensions to druid.pageviews_daily & druid.pageviews_hourly - https://phabricator.wikimedia.org/T304571 (10Mayakp.wiki) Per discussion in today's Board Refinement meeting moving this task to Tracking for Product Analytics as the scope of t...
[21:37:47] 10Data-Engineering: Support for moving data from HDFS to public http file server - https://phabricator.wikimedia.org/T317167 (10fkaelin)
[21:45:13] !log cleared logs earlier than September 1st from an-launcher1002:/srv/airflow-analytics/logs/scheduler
[21:45:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:48:56] 10Data-Engineering: Support for moving data from HDFS to public http file server - https://phabricator.wikimedia.org/T317167 (10Ottomata) Context: - https://wikitech.wikimedia.org/wiki/Analytics/Web_publication - https://github.com/wikimedia/puppet/blob/production/modules/statistics/manifests/rsync/published.pp...
[21:57:29] !log finished cleaning up bad state and re-deploying refinery
[21:57:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:57:58] it took 3.5 HOURS!!! :( :(
[22:18:16] !log restarted webrequest load bundle
[22:18:17] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:18:25] !log restarted referrer daily coordinator
[22:18:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:18:35] !log restarted webrequest druid daily and hourly jobs
[22:18:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
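On the 21:07 patch "Fix groupby (hive is unfortunately like mysql here)": the query itself is not shown in the log, so the exact bug is unknown; one Hive behaviour that mirrors MySQL and commonly needs this kind of fix is positional GROUP BY, where grouping keys bind to SELECT-list positions instead of column names, so editing the SELECT list silently changes the grouping. A sketch in PySpark, which accepts the same syntax (spark.sql.groupByOrdinal is on by default); the data is made up:

    from pyspark.sql import SparkSession

    # Illustrative only: the actual refinery query is not in the log.
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    spark.createDataFrame(
        [("en.wikipedia", "desktop", 10), ("en.wikipedia", "mobile", 5)],
        ["project", "access", "views"],
    ).createOrReplaceTempView("pageviews")

    # GROUP BY 1, 2 binds to whatever happens to be first and second in the
    # SELECT list, so reordering or inserting a column changes the grouping
    # keys without any error and only shows up as wrong aggregates downstream.
    spark.sql("""
        SELECT project, access, SUM(views) AS views
        FROM pageviews
        GROUP BY 1, 2
    """).show()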