[03:18:29] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[03:18:29] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[06:46:26] (CR) Elukey: [C: +1] "Aiko, do double check - did you follow all the procedures in https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas#Modifying_schemas " [schemas/event/primary] - https://gerrit.wikimedia.org/r/944183 (https://phabricator.wikimedia.org/T343002) (owner: AikoChou)
[07:18:29] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[07:18:29] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[09:16:08] (CR) AikoChou: [C: +1] Update mediawiki/page/prediction_classification_change to 1.1.0 (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/944183 (https://phabricator.wikimedia.org/T343002) (owner: AikoChou)
[09:18:57] (CR) Elukey: [C: +2] Update mediawiki/page/prediction_classification_change to 1.1.0 [schemas/event/primary] - https://gerrit.wikimedia.org/r/944183 (https://phabricator.wikimedia.org/T343002) (owner: AikoChou)
[09:19:27] (Merged) jenkins-bot: Update mediawiki/page/prediction_classification_change to 1.1.0 [schemas/event/primary] - https://gerrit.wikimedia.org/r/944183 (https://phabricator.wikimedia.org/T343002) (owner: AikoChou)
[09:32:42] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:35:10] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:45:38] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:47:42] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:58:40] hi, puppet runs on idp-test1002 are failing, this is caused by the client_secret that was pushed to the private repo, it needs a corresponding entry in hieradata/role/common/idp_test.yaml
[11:18:29] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[11:18:29] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[12:25:22] Data-Platform-SRE, sre-alert-triage: Alert triage: overdue alert [warning] - https://phabricator.wikimedia.org/T343318 (LSobanski)
[12:26:24] Data-Platform-SRE, sre-alert-triage: Alert triage: overdue alert [warning] - https://phabricator.wikimedia.org/T343319 (LSobanski)
[12:42:43] !log Redeploy of analytics_product Airflow instance to see if it clears a Spark issue
[12:42:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:32:52] (CR) DCausse: Provide internal schema for CirrusSearch update-pipeline updates. (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/856507 (https://phabricator.wikimedia.org/T317202) (owner: Peter Fischer)
[13:41:05] (CR) DCausse: Provide internal schema for CirrusSearch update-pipeline updates. (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/856507 (https://phabricator.wikimedia.org/T317202) (owner: Peter Fischer)
[13:41:35] (PS11) DCausse: Provide internal schema for CirrusSearch update-pipeline updates. [schemas/event/primary] - https://gerrit.wikimedia.org/r/856507 (https://phabricator.wikimedia.org/T317202) (owner: Peter Fischer)
[13:41:37] (PS3) DCausse: Add mediawiki/cirrussearch/page_rerender [schemas/event/primary] - https://gerrit.wikimedia.org/r/935697 (https://phabricator.wikimedia.org/T325565)
[14:08:33] (CR) DCausse: [WIP] cirrussearch: add fetch_failure schema (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/854572 (https://phabricator.wikimedia.org/T317609) (owner: DCausse)
[14:51:45] (CR) DCausse: "updated the corresponding java code in this commit: https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_re" [schemas/event/primary] - https://gerrit.wikimedia.org/r/856507 (https://phabricator.wikimedia.org/T317202) (owner: Peter Fischer)
[15:00:28] Data-Engineering, Dumps 2.0, Data Products (Sprint 0): Develop Dumps Triage Runbook - https://phabricator.wikimedia.org/T343325 (JEbe-WMF)
[15:00:32] Data-Engineering, Dumps 2.0, Data Products (Sprint 0): Develop Dumps Triage Runbook - https://phabricator.wikimedia.org/T343325 (JEbe-WMF)
[15:16:04] Data-Engineering, Observability-Logging, Event-Platform: eventgate logs field explosion - https://phabricator.wikimedia.org/T343342 (colewhite)
[15:18:29] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[15:18:29] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[15:35:40] (PS1) Tsevener: Change diff open action for iOS watchlists schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/944942 (https://phabricator.wikimedia.org/T341896)
[15:36:20] (CR) CI reject: [V: -1] Change diff open action for iOS watchlists schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/944942 (https://phabricator.wikimedia.org/T341896) (owner: Tsevener)
[16:58:58] (PS2) Tsevener: Change diff open action for iOS watchlists schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/944942 (https://phabricator.wikimedia.org/T341896)
[16:59:25] (CR) CI reject: [V: -1] Change diff open action for iOS watchlists schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/944942 (https://phabricator.wikimedia.org/T341896) (owner: Tsevener)
[17:03:49] (Abandoned) Tsevener: Change diff open action for iOS watchlists schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/944942 (https://phabricator.wikimedia.org/T341896) (owner: Tsevener)
[17:05:42] Data-Platform-SRE, DC-Ops, SRE, ops-codfw: Q1:rack/setup/install wdqs20[23-25].codfw.wmnet - https://phabricator.wikimedia.org/T342659 (Jhancock.wm) a:Jhancock.wm
[17:09:03] (PS1) Tsevener: Change diff open action for iOS watchlists schema (2nd attempt) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/944965 (https://phabricator.wikimedia.org/T341896)
[18:07:44] (PS1) Milimetric: Adapt to nulls in rev_actor and rev_comment [analytics/refinery/source] - https://gerrit.wikimedia.org/r/944974
[18:17:34] (CR) Milimetric: [C: +1] Adapt to nulls in rev_actor and rev_comment [analytics/refinery/source] - https://gerrit.wikimedia.org/r/944974 (owner: Milimetric)
[18:19:42] (CR) Mazevedo: [C: +2] Change diff open action for iOS watchlists schema (2nd attempt) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/944965 (https://phabricator.wikimedia.org/T341896) (owner: Tsevener)
[18:20:13] (Merged) jenkins-bot: Change diff open action for iOS watchlists schema (2nd attempt) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/944965 (https://phabricator.wikimedia.org/T341896) (owner: Tsevener)
[18:21:27] (CR) Xcollazo: [C: +2] "Dan, Marcel and I live coded this fix. A function that attempts to help on the join skew was failing because it doesn't support nulls. We " [analytics/refinery/source] - https://gerrit.wikimedia.org/r/944974 (owner: Milimetric)
[18:30:11] (Merged) jenkins-bot: Adapt to nulls in rev_actor and rev_comment [analytics/refinery/source] - https://gerrit.wikimedia.org/r/944974 (owner: Milimetric)
[18:38:17] (PS1) Xcollazo: Update changelog for v0.2.21 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/944979
[18:39:20] (CR) Xcollazo: [V: +2 C: +2] "Just a changelog change. Self merging." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/944979 (owner: Xcollazo)
[18:50:57] Starting build #126 for job analytics-refinery-maven-release-docker
[19:03:19] Project analytics-refinery-maven-release-docker build #126: SUCCESS in 12 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/126/
[19:09:42] Starting build #85 for job analytics-refinery-update-jars-docker
[19:10:05] Project analytics-refinery-update-jars-docker build #85: SUCCESS in 23 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/85/
[19:10:05] (PS1) Maven-release-user: Add refinery-source jars for v0.2.21 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/944345
[19:13:46] (CR) Xcollazo: [V: +2 C: +2] "LGTM." [analytics/refinery] - https://gerrit.wikimedia.org/r/944345 (owner: Maven-release-user)
[19:18:29] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[19:18:29] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[19:30:57] !log deploying refinery to try and fix https://lists.wikimedia.org/hyperkitty/list/data-engineering-alerts@lists.wikimedia.org/thread/QKXYMYKMWXGRNYZ77CENA5F2EGA66QQ2/
[19:30:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:42:05] !log deployed latest for Airflow analytics instance.
[20:42:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[23:18:29] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[23:18:29] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability