[03:18:29] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[03:18:29] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[07:18:29] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[07:18:29] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[07:19:15] (EventgateValidationErrors) firing: ...
[07:19:16] eventgate-analytics-external stream eventlogging_MobileWebUIActionsTracking validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationError
[07:34:15] (EventgateValidationErrors) resolved: ...
[07:34:16] eventgate-analytics-external stream eventlogging_MobileWebUIActionsTracking validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationError
[08:02:42] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:04:18] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:16:12] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:17:42] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:36:06] (PS3) DCausse: cirrussearch: add fetch_failure schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/854572 (https://phabricator.wikimedia.org/T317609)
[09:01:15] Data-Engineering, Data-Platform-SRE: Get datahub-staging.wikimedia.org working with the staging deployment of datahub - https://phabricator.wikimedia.org/T343236 (Stevemunene) >>! In T343236#9067427, @MoritzMuehlenhoff wrote: > JFTR, Puppet runs on idp-test1002 are currently failing, this is caused by th...
[09:12:39] Data-Engineering, Data-Platform-SRE, Event-Platform: Upgrade eventgate Docker image to Bullseye and nodejs 12 - https://phabricator.wikimedia.org/T343510 (elukey)
[09:39:39] Data-Platform-SRE: Decommission wdqs200[4-6] - https://phabricator.wikimedia.org/T342035 (Gehel) Open→Resolved
[09:40:12] Data-Platform-SRE, Patch-For-Review: Migrate analytics_test airflow instance to bullseye an-test-client1002 - https://phabricator.wikimedia.org/T341700 (Gehel) Open→Resolved
[09:40:15] Data-Platform-SRE: Upgrade Airflow instances to Bullseye - https://phabricator.wikimedia.org/T335261 (Gehel)
[09:40:22] Data-Platform-SRE, Patch-For-Review: Upgrade Hadoop test cluster to Bullseye - https://phabricator.wikimedia.org/T329363 (Gehel)
[09:41:31] Data-Platform-SRE, sre-alert-triage: 404 from nginx on wcqs2001 - https://phabricator.wikimedia.org/T342762 (Gehel) Open→Resolved a:Gehel
[09:41:42] Data-Platform-SRE, Wikidata: Create WDQS Lag SLO dashboard with Grizzly && documentation - https://phabricator.wikimedia.org/T324811 (Gehel) Open→Resolved
[09:41:50] Data-Platform-SRE, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Create WDQS uptime SLO - https://phabricator.wikimedia.org/T313751 (Gehel)
[10:35:50] Data-Platform-SRE, API Platform, Anti-Harassment, Content-Transform-Team, and 19 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (phuedx)
[10:50:22] Data-Engineering, Product-Analytics, Wmfdata-Python: Improve df_to_remarkup formatting for wmfdata-python - https://phabricator.wikimedia.org/T341589 (AndrewTavis_WMDE) @nshahquinn-wmf, just FYI I do have this on my radar. Sorry it's taking so long... I'm in the process of waiting for a new computer...
[11:02:44] (SystemdUnitFailed) firing: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:04:30] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:16:24] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:16:35] I have a question about the monthly pageview_complete dumps described here https://dumps.wikimedia.org/other/pageview_complete/readme.html
[11:17:44] (SystemdUnitFailed) resolved: produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:18:27] How up-to-date is the documentation?
[11:18:29] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[11:18:29] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[11:29:26] How are redirects and stuff measured?
[11:29:26] for example these entries:
[11:29:27] nl.wikipedia !Xoo 2338869 desktop 1 P1
[11:29:27] nl.wikipedia !Xoo 171933 desktop 1 [1
[11:29:28] nl.wikipedia !Xoop 233341 desktop 2 P1Q1
[11:29:28] nl.wikipedia !Xoop 171933 mobile-web 1 Y1
[11:29:29] nl.wikipedia !Xóõ 171933 desktop 35 A3B1D1E7F1I2J1K2P3R2S4X1[1]4^1_1
[11:29:29] nl.wikipedia !Xóõ null mobile-app 4 B1M1R1[1
[11:29:30] nl.wikipedia !Xóõ 171933 mobile-web 32 A3D1E2H1I1K1L2N1P2Q1R1T1V1W1X2Z1[3\1]2^1_3
[11:29:30] nl.wikipedia !Xóõ_(taal) 171933 mobile-web 2 Y2
[11:29:31] nl.wikipedia !Xóõ_(taal) 1375634 desktop 2 N1P1
[11:29:31] nl.wikipedia !Xõó 2500099 desktop 1 P1
[11:29:32] nl.wikipedia !Xõó 171933 mobile-web 1 Y1
[11:29:47] Is there any documentation on how to interpret this?
[11:30:01] some are statistics for redirects.. i guess
[11:30:35] It seems like https://nl.wikipedia.org/w/index.php?curid=171933 is the real page
[11:31:10] it is listed 6 times
[11:33:04] Does anyone read this channel btw?
[11:53:10] Data-Platform-SRE, decommission-hardware: decommission an-test-client1001.eqiad.wmnet - https://phabricator.wikimedia.org/T343520 (Stevemunene)
[12:06:36] a-team
[12:18:17] 'a-team'
[13:07:25] a-team?
[13:20:56] VeniVidiVicipedi I do, but I'm not much help with this issue I'm afraid
[13:21:31] I recommend filing a ticket via https://phabricator.wikimedia.org/ . If you have any questions about that process, let me know
[13:21:53] VeniVidiVicipedi: I'm not on the a-team, but I used this data recently. The redirects are not summed; I had to write code to sum them. I think you are using the monthly pageviews: they are not explained in the docs, but they are very similar to the daily pageviews
[13:21:53] in your example "2338869 desktop 1 P1" means: page_id = 2338869, desktop = accessed from desktop, 1 = total pageviews in the month, P1 = 1 pageview on day 16 (A=1, B=2,...,P=16,...,Z=26, [=27, \=28, ]=29, ^=30, _=31)
[13:32:25] Thanks for the info guys!
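The day-letter encoding described in the [13:21:53] messages, and the "sum the redirects yourself" step, can be sketched in Python. This is an unofficial, minimal sketch under stated assumptions: it assumes monthly rows are space-separated with the six columns seen in the sample (wiki, title, page_id, access method, monthly total, daily counts), and it assumes, as suggested in the discussion rather than confirmed by the readme, that rows sharing a page_id are the same article reached through different titles, so summing per page_id folds redirect traffic together. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Day letters as described in the log: A=1, B=2, ..., Z=26, then the ASCII
# characters that follow Z ([ \ ] ^ _) stand for days 27-31.
DAY_COUNT = re.compile(r"([A-Z\[\\\]^_])(\d+)")


def decode_daily_counts(encoded: str) -> dict[int, int]:
    """Decode e.g. 'P1Q1' into {16: 1, 17: 1} (day of month -> pageviews)."""
    return {
        ord(letter) - ord("A") + 1: int(count)
        for letter, count in DAY_COUNT.findall(encoded)
    }


def sum_views_by_page_id(lines) -> dict[tuple[str, str], int]:
    """Sum monthly totals per (wiki, page_id) across titles and access methods.

    Hypothetical helper. Assumes space-separated rows shaped like the sample:
    wiki title page_id access_method monthly_total daily_counts
    """
    totals = defaultdict(int)
    for line in lines:
        wiki, _title, page_id, _access, monthly_total, _daily = line.split()
        if page_id == "null":
            continue  # no page_id, so these views cannot be attributed to a page
        totals[(wiki, page_id)] += int(monthly_total)
    return dict(totals)
```

Fed the eleven sample rows above, `sum_views_by_page_id` would attribute 72 views to page_id 171933 and skip the mobile-app row whose page_id is null. Summing the values from `decode_daily_counts` is also a quick consistency check against the monthly total column (for example, 35 for the `!Xóõ` desktop row).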
[13:39:34] (PS1) Gerrit maintenance bot: Add blk.wiktionary to pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/945023 (https://phabricator.wikimedia.org/T343542)
[13:41:20] (PS1) Gerrit maintenance bot: Add su.wikisource to pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/945025 (https://phabricator.wikimedia.org/T343548)
[13:53:33] I created an issue on Phabricator
[15:00:04] Data-Platform-SRE, Epic: [Epic] Migrate all Search Platform servers to Debian Bullseye - https://phabricator.wikimedia.org/T323921 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host wcqs2001.codfw.wmnet with OS bullseye
[15:18:30] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[15:18:30] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[17:24:34] Data-Platform-SRE, Discovery-Search (Current work): Document SRE steps for deploying a new WDQS (and WCQS) host - https://phabricator.wikimedia.org/T330714 (bking) I updated [[ https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service#I've_got_a_new_WDQS_host,_how_do_I_get_it_ready_for_production? | the...
[17:24:49] Data-Platform-SRE, Epic: [Epic] Migrate all Search Platform servers to Debian Bullseye - https://phabricator.wikimedia.org/T323921 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host wcqs2001.codfw.wmnet with OS bullseye executed with errors: - wcqs2001 (**FAIL**...
[18:15:18] Data-Platform-SRE, Epic: [Epic] Migrate all Search Platform servers to Debian Bullseye - https://phabricator.wikimedia.org/T323921 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin1001 for host wcqs2002.codfw.wmnet with OS bullseye
[18:19:35] (CR) Dbrant: [C: -1] Android: New schema for image recommendations feature (2 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/940266 (owner: Sharvaniharan)
[18:30:40] (DruidSegmentsUnavailable) firing: More than 10 segments have been unavailable for mediawiki_history_reduced_2023_07 on the druid_public Druid cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid/Alerts#Druid_Segments_Unavailable - https://grafana.wikimedia.org/d/000000538/druid?refresh=1m&var-cluster=druid_public&panelId=49&fullscreen&orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DDruidSegmentsUnavailable
[18:41:38] (CR) Sharvaniharan: "Hi Shay," [schemas/event/secondary] - https://gerrit.wikimedia.org/r/940266 (owner: Sharvaniharan)
[18:50:40] (DruidSegmentsUnavailable) resolved: More than 10 segments have been unavailable for mediawiki_history_reduced_2023_07 on the druid_public Druid cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid/Alerts#Druid_Segments_Unavailable - https://grafana.wikimedia.org/d/000000538/druid?refresh=1m&var-cluster=druid_public&panelId=49&fullscreen&orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DDruidSegmentsUnavailable
[18:57:55] (PS6) Sharvaniharan: Android: New schema for image recommendations feature [schemas/event/secondary] - https://gerrit.wikimedia.org/r/940266
[19:00:40] (CR) Sharvaniharan: "@Snowick. Please let me know your thoughts on the review comments." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/940266 (owner: Sharvaniharan)
[19:11:57] Data-Platform-SRE, Epic: [Epic] Migrate all Search Platform servers to Debian Bullseye - https://phabricator.wikimedia.org/T323921 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by bking@cumin1001 for host wcqs2002.codfw.wmnet with OS bullseye executed with errors: - wcqs2002 (**FAIL**...
[19:12:13] Data-Engineering, Product-Analytics, Wmfdata-Python: Improve df_to_remarkup formatting for wmfdata-python - https://phabricator.wikimedia.org/T341589 (nshahquinn-wmf) @AndrewTavis_WMDE no worries at all!
[19:18:30] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[19:18:30] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[19:32:03] (PS1) Milimetric: pageview: Add pk.wikimedia.org to the allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/945860
[19:32:15] (CR) Milimetric: [V: +2 C: +2] pageview: Add pk.wikimedia.org to the allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/945860 (owner: Milimetric)
[23:18:30] (MediawikiPageContentChangeEnrichAvailability) firing: ...
[23:18:30] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability