analytics

Commit Graph

Author	SHA1	Message	Date
Adam Rutkowski	38381195f8	Reapply+bugfix: goals with custom props (#5936) (#5944) * Reapply "Goals with custom props (Stats API queries, funnels) (#5936)" (#5943) This reverts commit `45116bda7b`. * Gracefully handle `nil` for existing goal.custom_props * Revert "Gracefully handle `nil` for existing goal.custom_props" This reverts commit `8e38748775`. * Migration: make `goals.custom_props` non-null * Adjust test	2025-12-11 13:09:34 +00:00
Adam Rutkowski	45116bda7b	Revert "Goals with custom props (Stats API queries, funnels) (#5936)" (#5943) This reverts commit `b6b9c2c0bf`.	2025-12-11 09:17:33 +00:00
Adam Rutkowski	b6b9c2c0bf	Goals with custom props (Stats API queries, funnels) (#5936 ) * Migration: add custom propos to goals + revisit unique constraints * Update constraints in goal schema (and move module) * Add a comment, not really related but useful? * Implement querying for goals with custom props * Optimize goal_join_data (down to one iteration) + include goal custom props * Test goal custom propos addition + new constraints * Test querying for goals with custom propos attached * Test funnels made of goals with custom props * Format * Fixup test name * Fixup migration * Unified goal join macro * Remove dupe test * Clean up user_id usage * Fixup test to match the description * Revert "Temporary: make room for pre/post migration constraint names (#5942)" This reverts commit `e4bc6b8715`. --------- Co-authored-by: Uku Taht <uku.taht@gmail.com>	2025-12-11 08:39:46 +00:00
Adrian Gruntkowski	a2ba1256d2	Show revenue data in all breakdowns (#5767) * Include revenue data for all detailed API responses except entry/exit pages * Expose revenue data in all breakdown modals except entry/exit pages * Add revenue metrics to breakdown response only on EE * Change query builder to enable querying event metrics \w session dimension * Add revenue metrics to entry and exit pages breakdowns * Expose revenue data in entry and exit pages breakdowns * Use `argMax` for `exit_page` and `exit_page_hostname` dimensions (h/t @ukutaht) * Don't handle event-only dimensions with session-only metrics for now * Add tests for all breakdowns * Add clarifying comments in code * Mark revenue tests as EE-only	2025-11-18 11:24:54 +00:00
Karl-Aksel Puulmann	1531386b76	FULL join for time:hour as well as time:minute (#5715) * FULL join for time:hour as well as time:minute Follow-up to https://github.com/plausible/analytics/pull/5694/files#r2321567271 A session might be active over multiple hours, but not (currently) reported as such when requesting only specific metrics per hour. This fixes that problem. * Handle full join logic correctly --------- Co-authored-by: Uku Taht <Uku.taht@gmail.com>	2025-09-15 14:14:59 +00:00
Karl-Aksel Puulmann	db448d7404	Stats: Rebuild session smearing for timeseries (#5694 ) * Refactor table_decider#partition_metrics * Refactor query pipeline to return a list of subqueries after splitting * Move order_by out of join logic * Refactor joining logic in query_builder 1. JOIN type is now set in QueryOptimizer 2. JOIN logic is now table and list-size agnostic * Comment an edge case * Rebuild session/visit smearing Previously, whenever graphing any visit metric hourly/realtime, visit_duration and other visit metrics would be way higher than expected, due to long sessions dragging each bucket up and up. Now visits/visitors metrics are still smeared and other visit metrics are counted under last bucket user was active in. visits metric was also overcounted (see new tests). * Remove unneeded case * Unit test for smearing in tabledecider	2025-09-08 06:21:12 +00:00
RobertJoonas	c7bf54325e	Fix visitors.csv vs main graph discrepancy (#5230) * fix typo to stop logging error in test output * add time:minute interval to internal api schema * always get visitors and visits from sessions table when time:minute dimension used * query-api generate types * changelog update	2025-03-25 16:18:48 +00:00
Karl-Aksel Puulmann	e9b30e0ba5	time-on-page: query (#5159 ) * Default to time_on_page * Add new columns to schema * Read from new column in legacy query * Read/write new imported_pages columns * Remove time_on_page column from imported_pages * Simple, stupid new_time_on_page metric * Update csv_importer schema * Refactor: consistent __internal helpers, this will help with joining the query * Refactor select_joined_metrics * Refactor: pass `query` to event_metric * Refactor: remove needless site argument from various calls * Legacy joining query attempt * Move test around * Add more tests for both legacy and new time_on_page metrics in query API * time_on_page reported in seconds * timeseries test for metric * WIP * Wrap main query in subquery - without this run into trouble performing the join * Calculate time_on_page in main query, no more new_time_on_page * Add some TODOs * Return NULL over 0 when no visits with time-on-page data * Update moduledoc * Update some tests that were not expecting integers * Add a TODO * Update tests * Make graphing time series with combined metrics work. * Slightly more consistent approach to flag updating in APIv2 * Seeds with engagement data * Make graphing time series when cutoff is in the middle work Bakes less assumptions into everything as well. * Rename to legacy_time_on_page_cutoff * Fixup lib/plausible_web/controllers/api/external_query_api_controller.ex * Remove a todo and dead/misleading code * Remove a resolved todo * Remove needless rounding * gen types * Update pages test * Remove needless columns from select * Update tests: timestamps and remove comment * Flip branches	2025-03-11 11:19:58 +00:00
Karl-Aksel Puulmann	8aad9b1adc	APIv2: Fix `percentage` metric 500s (#5099) * Refactor: remove metrics argument from merge_imported() * Support querying percentage without visitors metric * Fix ordering by special metrics with imports causing a 500 We don't calculate all metrics directly on imports, hence cannot order the import by them either. * Changelog	2025-02-26 07:07:17 +00:00
RobertJoonas	4d05245036	Page Scroll Goals (#5029 ) * migration: add scroll_threshold to goals * update goal schema * setup simple UI for creating scroll goals * add ability to filter and breakdown scroll goals * fix goals form tests * add valiation for page path exists * move todo comments to expression.ex * move tests * make it clear that scroll_threshold is optional * avoid calling Plausible.Goal.type() too many times * do not consider 255 scroll depth a conversion * migration: add scroll_threshold to goals * do not drop the old index yet * More efficient goals join again * Refactor: move goals stats code explicitly under Stats.Goals module * Move code under Plausible.Stats.Goals * 254 -> 100 * add scroll_threshold field to goal schema + new unique constraint * adjust test to test what it claims to * mix format * add migration * consider imported query unsupported when page scroll goal filter * add missing tests * pattern match imported argument * silence credo * Update lib/plausible/stats/sql/expression.ex Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com> * use site_imports populated in test setup --------- Co-authored-by: Karl-Aksel Puulmann <oxymaccy@gmail.com> Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>	2025-02-04 10:37:04 +00:00
Karl-Aksel Puulmann	9a4969e105	APIv2: Behavioral filtering (#4980 ) * Expose site_id and site_native_stats_start_at via query This allows to do more query-building without exposing and passing `site` directly. * Very basic has_done/has_done_not operator support No event:goal support yet, no validations * Add validations that only event: dimensions can be used within has_done/has_done_not * Allow event:goal filters nested within has_done/has_done_not behavioral filters * Minor fix for do_decide_custom_prop_table * has_done support for goals Minor changes along the way: - preloaded_goals structure changes - event:goal restrictions were loosened within has_done - we don't allow nesting has_done anymore * Dont query imports when behavioral filters are present * Update callsites of filtering_on_dimension? to work with new behavioral filters * has_done_not -> has_not_done * Changelog entry * Typegen * credo cleanup * Fix changelog * Remove changelog * Mark has_done as internal-only * combine two validations into a single loop * has_done is now session-based not user-based * Update a test * Update transform_tree	2025-01-21 12:03:26 +00:00
Karl-Aksel Puulmann	0c8fed9570	Fix: Add missing SAMPLE BY to query (#4880) * Fix: Add missing SAMPLE BY to query * Update lib/plausible/stats/sql/query_builder.ex	2024-12-09 08:54:39 +00:00
Karl-Aksel Puulmann	bec14ee77c	Improve report performance with high-cardinality import joins (#4848 ) * Improve report performance in cases where site has a lot of unique pathnames Ref: https://3.basecamp.com/5308029/buckets/39750953/card_tables/cards/8052057081 JOINs in ClickHouse are slow. In one degenerate case I found a user had over 20 million unique paths in an import, which resulted in extremely slow JOINs. This introduces a sort-of hacky solution to it by limiting the amount of data analyzed. Query timing without this change: ``` 9 rows in set. Elapsed: 11.383 sec. Processed 49.16 million rows, 5.75 GB (4.32 million rows/s., 505.29 MB/s.) Peak memory usage: 14.75 GiB. ``` After: ``` 9 rows in set. Elapsed: 0.572 sec. Processed 49.18 million rows, 5.75 GB (86.03 million rows/s., 10.06 GB/s.) Peak memory usage: 9.01 GiB. ``` * Splitting should no longer remove pagination. Handle special cases in special_metrics.ex * select_merge_as in imports This sets up selected_as aliases which will be used in a subsequent commit * Add explicit ORDER BY to import * Rewrite comment * quoting * merge conflict * Split test * Merge conflict fail fix	2024-12-05 10:05:57 +00:00
RobertJoonas	6822b29016	Average Scroll Depth Metric: put scroll depth on the dashboard under a feature flag (#4832 ) * migration: add scroll_depth to events_v2 * (cherry-pick) ingest scroll depth * replace convoluted test with more concise ones * QueryParser: parse internal scroll_depth metric + validation * turn QueryComparisonsTest into QueryInternalTest * rename file * (cherry pick) query scroll depth `15b14d3` ...and move the tests into `internal_query_test.exs` * review feedback * Get rid of unnecessary separation between aggregate and group scroll depth * Drop irrelevant other metrics in tests * add test ensuring scroll depth unavailable in Stats API v1 * Put scroll depth on the dashboard * Top Stats * Main Graph * Top Pages > Details * feature flag for dashboard scroll depth access * ignore credo warning * enable scroll_depth flag in tests * remove duplication * write timestamps explicitly in a test * revert moving tests around * Add query_comparisons_test back * Move scroll_depth tests into query_test * Delete query_internal_test * rename setup util (got updated on master) * use pageleave_factory where applicable * Use the correct generated query-api.d.ts * npm format	2024-11-20 13:13:04 +00:00
Karl-Aksel Puulmann	ef57502854	APIv2: Implement pagination and `include.total_rows` (#4575) Offset-based pagination is used to make sure Looker integration is able to work as efficiently as possible. To know how many requests users need to do `include.total_rows` option was added.	2024-09-12 15:51:18 +03:00
Karl-Aksel Puulmann	8fa3a83129	APIv2: and/or/not support (#4480 ) * First approximation of AND/OR/NOT support Broken by this: - Goal filtering - Table deciding - Imports * TableDecider handle nesting * Query.remove_top_level_filters * Plausible.Stats.Imported.SQL.Expression * Handle AND/OR/NOT with imported data, create Plausible.Stats.Imported.SQL.WhereBuilder * Add parser validations for event:goal, event:hostname and event:props:x filters top level constraints * Move module around * Query.get_filter -> Filters.filtering_on_dimension? in some callsites * Filters.get_toplevel_filter * TableDecider.sessions_join_events?, remove old method * Transforming filters in query_optimizer * Query API tests for and/or/not * Reorder parser steps * Post-merge test fixups * Solve merge issue * Simplify filtering_on_dimension? * Update transformer code * dimensions_used_in_filters min_depth option, simplify parser validations * rename_dimensions_used_in_filter * fix rename_dimensions_used_in_filter * Rename a test	2024-09-04 15:44:03 +03:00
ruslandoga	48dfce7a8b	update ecto_ch and use ecto hints for array joins (#4456)	2024-08-21 06:28:25 +02:00
Karl-Aksel Puulmann	3ab47e6401	Remove SAMPLE BY from an events query that joins sessions (#4341 ) Some users are reporting wrong results for stats querying from events table but with session filters. This seems to be caused by sampling: Namely both tables end up sampled and the incorrect sample rate seems to get used. This is sadly impossible to test for in our test suite, but reveals itself in production with the following query: ```sql SELECT toUInt64 (round(uniq (se0.user_id) * any(_sample_factor))) AS visitors FROM events_v2 AS se0 SAMPLE 20000000 INNER JOIN ( SELECT sss0.session_id AS session_id, any(_sample_factor) AS _sample_factor FROM sessions_v2 AS sss0 SAMPLE 20000000 WHERE (sss0.site_id = _CAST (8195000, 'Int64')) AND (sss0.timestamp >= _CAST (1720497600, 'DateTime')) AND (sss0.start < _CAST (1720584000, 'DateTime')) AND (sss0.entry_page IN ('/')) AND (sss0.sign = 1) GROUP BY sss0.session_id ) AS ss1 ON se0.session_id = ss1.session_id WHERE (se0.site_id = _CAST (8195000, 'Int64')) AND (se0.timestamp >= _CAST (1720497600, 'DateTime')) AND (se0.timestamp < _CAST (1720584000, 'DateTime')) ``` By removing the inner query SAMPLE clause the result changes >50% Related old PR: https://github.com/plausible/analytics/pull/2962 This will likely require more work in the future so looking forward to that - after my vacation.	2024-07-11 15:08:20 +03:00
Karl-Aksel Puulmann	0114689779	fix(TimeSeries): Work around weird time zones (#4337) * Refactor Expression.dimension to accept q * Handle quarter- and half-hour timezones Previously APIv2 output didn't start at a full hour for these time zones and main graph was blank The core reasoning is that ClickHouse `timeSlots` is not time-zone aware and works off of unix epoch - meaning that in time zones which have an offset of 5:45 the "hours" reported would start at minute :45. The fix is kind of silly - we now divide each hour into 4 and handle things that way. Related basecamp issue: https://3.basecamp.com/5308029/buckets/36789884/card_tables/cards/7590936581 * Fix test typo	2024-07-11 11:48:16 +03:00
Karl-Aksel Puulmann	a9676546dc	APIv2: Cleanup (#4308) * Remove a dead method * Move select_event/session_metrics to within QueryBuilder * Make a method private * Move page_regex util around * Move utc_boundaries helper around * Fixups for utc_boundaries * Add legacy notices * Move Stats.query method around * include_sentry_replay_info consistently * Move filters out of select_group_fields * Collapse conditions under select_group_fields * Shorten some imported methods * Use dimension over dim in group_by * Separate SQL query building from imported.ex * props.ex -> legacy_dimensions.ex * Move some query building out of Query.ex * Remove unneeded method * put_filter -> add_filter * Remove some query setters * Moduledoc * Split out validations and import tests from query_test * Move tests around * Split event:goal tests from query_test * Remove redundant filters * Remove dead code * Split special metrics tests from query_test * Legacy module	2024-07-09 14:31:45 +03:00
Karl-Aksel Puulmann	a181f3eab3	APIv2: TimeSeries using QueryBuilder, release `experimental_session_count` (#4305 ) * Move fragments module under Plausible.Stats.SQL * Introduce select_merge_as macro This simplifies some select_merge calls * Simplify select_join_fields * Remove a needless dynamic * wrap_select_columns macro * Move metrics from base.ex to expression.ex * Move WhereBuilder under Plausible.Stats.SQL * Moduledoc * Improved macros * Wrap more code * select_merge_as more * Move defp to the end * include.time_labels parsing * include.time_labels in result Note that the previous implementation of the labels from TimeSeries.ex was broken * Apply consistent function in imports and timeseries.ex * Remove boilerplate * WIP: Limited support for timeseries-with-querybuilder * time:week dimension * cleanup: property -> dimension * Make querying with time series work * Refactor: Move special metrics (percentage, conversion rate) to own module * Explicitly format datetimes * Consistent include_imported in special metrics * Solve week-related crash * conversion_rate hacking * Keep include_imported consistent after splitting the query * Simplify do_decide_tables * Handle time dimensions in imports cleaner * Allow time dimensions in custom property queries * time:week handling continued * cast_revenue_metrics_to_money * fix `full_intervals` support * Handle minute/realtime graphs * experimental_session_count? with timeseries This becomes required as we try to include visits from sessions by default * Support hourly data in imports * Update bounce_rate in more csv tests * Update some time-series query tests * Fix for meta.warning being included incorrectly * Simplify imported.ex * experimental_session_count flag removal * moduledoc * Split interval and time modules	2024-07-09 14:25:02 +03:00
Karl-Aksel Puulmann	05ac840078	APIv2: macros, SQL cleanup (#4286) * Move fragments module under Plausible.Stats.SQL * Introduce select_merge_as macro This simplifies some select_merge calls * Simplify select_join_fields * Remove a needless dynamic * wrap_select_columns macro * Move metrics from base.ex to expression.ex * Move WhereBuilder under Plausible.Stats.SQL * Moduledoc * Improved macros * Wrap more code * select_merge_as more * Move defp to the end * wrap_alias	2024-07-03 16:32:25 +03:00
Karl-Aksel Puulmann	0594478add	APIv2: Replace breakdown module with QueryBuilder (#4293) * Revert "Revert "APIv2: Replace breakdown module with QueryBuilder (#4283)" (#4292)" This reverts commit `ef5e0e0382`. * Allow querying events and pageviews from sessions table This is not strictly accurate, especially with shorter time frames, but is useful for a fallback mechanism. I'll figure out something around shorter time frames in the future. See also: https://github.com/plausible/analytics/pull/4292 * Only query events and pageviews in legacy breakdowns	2024-07-01 12:50:01 +03:00
Karl-Aksel Puulmann	ef5e0e0382	Revert "APIv2: Replace breakdown module with QueryBuilder (#4283)" (#4292) This reverts commit `7dd12d1dd6`.	2024-07-01 10:50:44 +03:00
Karl-Aksel Puulmann	7dd12d1dd6	APIv2: Replace breakdown module with QueryBuilder (#4283) * WIP: Breakdown using QueryBuilder * Revert "Remove problematic test" This reverts commit `b442bb5d1f`. * Get more breakdown tests passing * Preload goals, sort when dealing with time_on_page * Handle conversion_rate in breakdowns * Simplify ordering by using selected_as consistently for dimensions * Get breakdown tests passing * Strings to atoms in keys for StatsController.transform_keys calls to work * Handle revenue metrics removal * Add test for nil-removal case * Include percentage metric * Fix and test with imported locations * Fixup time-on-page * Fix country/region automatic filters * Handle multiple imports (os/browser version) in importsv2 * Filter goals * Default to ordering by page as well * Calculate conversion rate on sessions if needed * Order by event dimensions - handles event:page special case * Update tests * Update more tests, handle goal=0 case in imports * Handle event:goal breakdowns correctly with filters * Revenue to money * Improved table deciding * Also update event:page filters on event:page breakdown * bounce_rate to 0 Previous behavior relied on two queries being made - new query leads to 0 naturally * Update pagination test * dont count non-pageviews as path goal completions * Make revenue logic breakdown-specific Its hard to fit into the new schema and likely needs a rethink for apiv2 * Retain previous behavior for TimeSeries module * Get GA4 test passing Most failures are related to ordering, pageviews shouldnt be read off of sessions * Clean up old methods * Simplify imported.ex * Dont crash on garbage filters * Reflect ordering-related change in test * Fix test data * Update table_decider * Re-simplify get_revenue_tracking_currency * Revert revenue changes * Use Query.set * Remove a TODO * csv importer: no pageviews Pageviews were incorrectly fetched from sessions table before, causing issues * csv importer tweaking * Remove use Plausible * to_existing_atom	2024-07-01 09:03:33 +03:00
Karl-Aksel Puulmann	2eeaf7a152	APIv2: Aggregates, timeseries, conversion_rate, hostname (#4251 ) * Add some aggregates tests * Port aggregates tests to do with filtering * Session metrics can be queried with event: filters * Solve a typo * Update a validation message * Add validations for views_per_visit * Port an aggregation/imports test * Optimize time dimension, add tests * Add first timeseries test, update parsing tests * Docs for SQL.Expression * Test timeseries more * Allow time explicitly in order_by * Add multiple breakdowns test * Refactor QueryOptimizer not to care about time dimension placement in dimensions array * Add test breaking down by event:hostname * Add hostname filtering logic to QueryOptimizer, unblock some tests * WIP: Breakdown by goal * conversion rate logic for query api * Update more tests * Set default order_by * dimension_label * preloaded_goals in tests * inline load_goals * Use Date functions over Timex * Comments * is_binary * Remove special form used in tests * Fix defmodule * WIP: Fix memory leak, event:page breakdown logic * Enable more tests, fix for group_conversion_rate without explicit visitors metric * Re-enable a partially commented test * Re-enable a partially commented test * Get last test passing * No imports order_by in apiv2 * Add a TODO * Remove redundant Util call * Update aggregate.ex * Remove problematic test	2024-06-28 08:59:54 +03:00
Karl-Aksel Puulmann	58a66a952c	APIv2 - initial PR (#4216 ) * WIP new querying * WIP: Move some aggregate code under new command * WIP: Add joins, handling less metrics * join events table to sessions if needed * Merge imported results with built query * Remove dead code * WIP: /api/v2/query * Allow grouping by time * Use JOIN for main query * Build query result * update parse_time * Make joinless order by work * First test * more breakdown tests * Serialize event:goal filters in an json-encodable way/reflection * Handle inner vs outer ORDER BY clauses properly * Handle single conversion_rate metric * Update more tests * Get parsing tests passing again * Validate filtered goal filter is configured * Enable more validation tests * Enable more event:name breakdown tests * Enable more breakdown tests * Validate site has access to custom props * Validate conversion_rate metric which is only allowed in some situations * Validate that empty event:props: is not valid * handle query.dimensions properly in table_decider * test more validations on metrics/dimensions * Validate session metrics in combination with event dimension(s) * Tests cleanup * Parse include.imports * Get imports working with new querying * Make more imports tests work * Make event:props:path imports-adjacent test work * Get query imports warning-related tests running * Remove dead pagination tests * Solve dead import * Solve some warnings * Update aggregate metrics tests * credo * Improve test naming * Lazy goal loading * Use datetime methods * Ecto -> SQL module name * Remove Expression.dimension mode option	2024-06-25 09:27:19 +03:00

27 Commits