analytics

Commit Graph

Author	SHA1	Message	Date
Adam Rutkowski	b64a2355a0	Platform upgrade: elixir 1.19.4 and otp 27.3.4.6 (#5920) * Platform upgrade: elixir 1.19.4 and otp 27.3.4.6 * !fixup * credo * credo * Bump cache * fix docker image tag * hum * hum * Match docker images * Define ALPINE_VERSION once * fixup	2025-12-01 12:50:49 +00:00
Karl-Aksel Puulmann	1531386b76	FULL join for time:hour as well as time:minute (#5715) * FULL join for time:hour as well as time:minute Follow-up to https://github.com/plausible/analytics/pull/5694/files#r2321567271 A session might be active over multiple hours, but not (currently) reported as such when requesting only specific metrics per hour. This fixes that problem. * Handle full join logic correctly --------- Co-authored-by: Uku Taht <Uku.taht@gmail.com>	2025-09-15 14:14:59 +00:00
Adrian Gruntkowski	8d6d828d1d	Reduce reliance on `Timex` and use native time API where feasible (#5712) * Replace usages of `Timex.to_unix` with native API * Wrap call to `Timex.is_valid_timezone?` * Wrap calls to `Timex.today(tz)` * Replace `Timex.today()` with `Date.utc_today()` * Replace `Timex.now()` with `DateTime.utc_now()` * Replace `Timex.compare` with `Date.compare` * Wrap `Timex.diff` calls * Replace `Timex.Timezone.convert` with `DateTime.shift_zone!` * Wrap `Timex.parse!` * Replace `Timex.to_date` with native API calls * Replace `Timex.beginning|end_of...` with native API calls for Date * Wrap `Timex.beginning|end_of...` for DateTimes and Dates for years * Replace `Timex.format(!)` with native API calls * Replace `Timex.to_naive_datetime` with native API calls * Wrap time humanizing routines using Timex * Remove unnecessary `use Timex` instances * Replace `Timex.shift` with native API calls * Make `QueryParser.parse_date` handle gaps and ambiguities gracefully * Replace `Timex.now(tz)` with `DateTime.now!(tz)` * Use a more suitable Date function for comparison (h/t @aerosol)	2025-09-10 10:21:36 +00:00
Karl-Aksel Puulmann	db448d7404	Stats: Rebuild session smearing for timeseries (#5694) * Refactor table_decider#partition_metrics * Refactor query pipeline to return a list of subqueries after splitting * Move order_by out of join logic * Refactor joining logic in query_builder 1. JOIN type is now set in QueryOptimizer 2. JOIN logic is now table and list-size agnostic * Comment an edge case * Rebuild session/visit smearing Previously, whenever graphing any visit metric hourly/realtime, visit_duration and other visit metrics would be way higher than expected, due to long sessions dragging each bucket up and up. Now visits/visitors metrics are still smeared and other visit metrics are counted under last bucket user was active in. visits metric was also overcounted (see new tests). * Remove unneeded case * Unit test for smearing in tabledecider	2025-09-08 06:21:12 +00:00
Karl-Aksel Puulmann	6216ade4ee	Trim comparisons for year/month (#5702)	2025-09-04 10:28:04 +00:00
Karl-Aksel Puulmann	9af40a278d	Trim month, year, day periods to local now on main graph (#5698) * Revert "Revert "Trim `month`, `year`, `day` periods to local now on main graph (#5668)" (#5684)" This reverts commit `2d11681f25`. * Does not trim for comparisons * Include the current hour in the trimmed time range	2025-09-04 09:13:17 +00:00
Adrian Gruntkowski	2d11681f25	Revert "Trim `month`, `year`, `day` periods to local now on main graph (#5668)" (#5684) This reverts commit `563c3d22ba`.	2025-08-28 14:58:57 +00:00
Adam Rutkowski	563c3d22ba	Trim `month`, `year`, `day` periods to local now on main graph (#5668) * Trim `month`, `year`, `day` periods to now on main graph * Revert "Trim `month`, `year`, `day` periods to now on main graph" This reverts commit 4f3930111d3a2737a51686e067d9b64f0d85ad58. * Re-implement trimming in query optimizer instead * Update JS types * This is getting confusing * Trim in stats_controller * Set `query.now` based on query_praser and date results * query.period -> query.input_date_range * Changelog * Test for response query.date_range --------- Co-authored-by: Karl-Aksel Puulmann <oxymaccy@gmail.com>	2025-08-28 09:51:56 +00:00
Karl-Aksel Puulmann	0fcbcc3a8c	time-on-page: csv import/export support, preparation for release (#5274) * CSV import/export support for time-on-page Note only the new time-on-page metric is exported this way * visibility check for graphing of time_on_page * FE no longer receives/sends legacy_time_on_page_cutoff * Remove current_user from exports * Remove legacy_time_on_page_cutoff from query.include, make behavior work off of site.legacy_time_on_page_cutoff explicitly * Remove dead function * More current_user_id removals	2025-04-08 06:25:11 +00:00
Karl-Aksel Puulmann	e9b30e0ba5	time-on-page: query (#5159) * Default to time_on_page * Add new columns to schema * Read from new column in legacy query * Read/write new imported_pages columns * Remove time_on_page column from imported_pages * Simple, stupid new_time_on_page metric * Update csv_importer schema * Refactor: consistent __internal helpers, this will help with joining the query * Refactor select_joined_metrics * Refactor: pass `query` to event_metric * Refactor: remove needless site argument from various calls * Legacy joining query attempt * Move test around * Add more tests for both legacy and new time_on_page metrics in query API * time_on_page reported in seconds * timeseries test for metric * WIP * Wrap main query in subquery - without this run into trouble performing the join * Calculate time_on_page in main query, no more new_time_on_page * Add some TODOs * Return NULL over 0 when no visits with time-on-page data * Update moduledoc * Update some tests that were not expecting integers * Add a TODO * Update tests * Make graphing time series with combined metrics work. * Slightly more consistent approach to flag updating in APIv2 * Seeds with engagement data * Make graphing time series when cutoff is in the middle work Bakes less assumptions into everything as well. * Rename to legacy_time_on_page_cutoff * Fixup lib/plausible_web/controllers/api/external_query_api_controller.ex * Remove a todo and dead/misleading code * Remove a resolved todo * Remove needless rounding * gen types * Update pages test * Remove needless columns from select * Update tests: timestamps and remove comment * Flip branches	2025-03-11 11:19:58 +00:00
Karl-Aksel Puulmann	714f7f4603	Refactor: Remove `filter_key` terminology from backend (#4994) * Remove filter_key terminology from the backend This resurfaced in a recent review, `dimension` or `filter_dimension` is the correct terminology in the backend * Update table_decider * Solve new issues	2025-01-21 15:36:34 +00:00
Karl-Aksel Puulmann	bec14ee77c	Improve report performance with high-cardinality import joins (#4848) * Improve report performance in cases where site has a lot of unique pathnames Ref: https://3.basecamp.com/5308029/buckets/39750953/card_tables/cards/8052057081 JOINs in ClickHouse are slow. In one degenerate case I found a user had over 20 million unique paths in an import, which resulted in extremely slow JOINs. This introduces a sort-of hacky solution to it by limiting the amount of data analyzed. Query timing without this change: ``` 9 rows in set. Elapsed: 11.383 sec. Processed 49.16 million rows, 5.75 GB (4.32 million rows/s., 505.29 MB/s.) Peak memory usage: 14.75 GiB. ``` After: ``` 9 rows in set. Elapsed: 0.572 sec. Processed 49.18 million rows, 5.75 GB (86.03 million rows/s., 10.06 GB/s.) Peak memory usage: 9.01 GiB. ``` * Splitting should no longer remove pagination. Handle special cases in special_metrics.ex * select_merge_as in imports This sets up selected_as aliases which will be used in a subsequent commit * Add explicit ORDER BY to import * Rewrite comment * quoting * merge conflict * Split test * Merge conflict fail fix	2024-12-05 10:05:57 +00:00
ruslandoga	40f28ed151	rm Timex.diff/3 (#4695)	2024-11-04 09:18:04 +00:00
Karl-Aksel Puulmann	472f4f181c	Comparisons pagination fixes (#4697) * Add filter clauses for each main result filter This handles the case where main query has a limit and results change. Doesnt handle metrics like percentage. * Fix percentage calculations by ignoring breakdown-related filters in totals queries * Refactor comparisons test suite * Move comparisons logic to comparisons module * New route for internal query tests Only to be used in testing * Support comparison queries with imports/breakdowns * time dimension predicate extraction * Clean up a test * Update docstring * Update route test * fix a typo	2024-10-22 10:23:40 +00:00
Karl-Aksel Puulmann	141eea88ff	APIv2: Revenue metrics (#4659) * WIP: Start refactoring revenue metrics * Hacks to make things work * Remove old revenue code, remove revenue metrics if needed * Update query_optimizer docs * Minor fixes * Add tests around average/total revenue when non-revenue goal filtering going on * Optimize, calculate filters as expected (OR-ing clauses) * Revenue: Handle cases where revenue metrics should not be returned or nil * Expose revenue metrics in internal schema, add tests * Docstring * Remove TODO * Typegen * Solve warnings * Remove nesting * ce_test fix * Tag tests as ee_only * Fix: When filtering by revenue goal and no conversions, return 0.0 instead of nil * More straight-forward preloading logic	2024-10-09 10:18:48 +00:00
Karl-Aksel Puulmann	5ad743c8d3	APIv2: Comparisons for breakdowns, timeseries, time_on_page (#4647) * Refactor comparisons to a new options format Prerequisite for APIv2 comparison work * Experiment with default include deduplication * WIP Oops, breaks `include.total_rows` * WIP * Refactor breakdown.ex * Pagination fix: dont paginate split subqueries * Timeseries tests pass * Aggregate tests use QueryExecutor * Simplify QueryExecutor * Handle legacy time-on-page metric in query_executor.ex No behavioral changes * Remove keep_requested_metrics * Clean up imports * Refactor aggregate.ex to be more straight-forward in output format building * top stats: compute comparison via apiv2 * Minor cleanups * WIP: Pipelines * WIP: refactor for code cleanliness * QueryExecutor to QueryRunner * Make compilable * Comparisons for timeseries works Except for comparisons where comparison window is bigger than source query window * Add special case for timeseries * JSON schema tests for comparisons * Test comparisons with the new API * comparison date range parsing improvement * Make comparisons api internal-only * typegen * credo * Different schemata * get_comparison_query * Add comment on timeseries result format * comparisons typegen * Percent change for revenue metrics fix * Use defstruct for query_runner over map * Remove preloading atoms	2024-10-08 10:13:04 +00:00
Karl-Aksel Puulmann	bd11b4cf67	APIv2: Standard iso8601 timestamps, operate on UTC (#4563) * query.date_range is now in UTC instead of user timezone This simplifies things down the line and fixes several bugs where query.date_range is cast to naivedatetime for ecto purposes Many places still remain broken: - comparison queries - `to_date_range` calls * Make default_for_date_range not care about time zones * Make timezone parameter mandatory for to_date_range * Simplify utc_date_range, update legacy query builder * Fix more cases where query date range is needed * query.date_range -> query.utc_time_range * Query.date_range/1 function * ensure_include_imported update * Clean up send_email_report	2024-09-11 09:21:59 +03:00
Karl-Aksel Puulmann	8fa3a83129	APIv2: and/or/not support (#4480) * First approximation of AND/OR/NOT support Broken by this: - Goal filtering - Table deciding - Imports * TableDecider handle nesting * Query.remove_top_level_filters * Plausible.Stats.Imported.SQL.Expression * Handle AND/OR/NOT with imported data, create Plausible.Stats.Imported.SQL.WhereBuilder * Add parser validations for event:goal, event:hostname and event:props:x filters top level constraints * Move module around * Query.get_filter -> Filters.filtering_on_dimension? in some callsites * Filters.get_toplevel_filter * TableDecider.sessions_join_events?, remove old method * Transforming filters in query_optimizer * Query API tests for and/or/not * Reorder parser steps * Post-merge test fixups * Solve merge issue * Simplify filtering_on_dimension? * Update transformer code * dimensions_used_in_filters min_depth option, simplify parser validations * rename_dimensions_used_in_filter * fix rename_dimensions_used_in_filter * Rename a test	2024-09-04 15:44:03 +03:00
RobertJoonas	f04c47f881	Support realtime periods in API v2 (#4469) * add realtime date_ranges into the private API schema This commit starts parsing date ranges into a new NaiveDateTimeRange struct, rather than a simple Date.Range. * transform realtime labels into negative integers + test * move schema type argument to last position in helper functions * allow passing a date param + tests * Update test/plausible/stats/query_parser_test.exs Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com> * Update test/plausible/stats/query_parser_test.exs Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com> * Update test/plausible/stats/query_parser_test.exs Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com> * Update test/plausible/stats/query_parser_test.exs Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com> * keep test file structure consistent * Turn NaiveDateTimeRange into DateTimeRange * change 'now' field from NaiveDateTime to DateTime in v2 query * fix minute interval labels + add missing tests * return query_result.date_range as iso8601 timestamps with timezone * allow timestamps with tz as date_range arguments in API v2 * delete Plausible.Timezones.to_utc_datetime * simplify returning comparison periods * add comment about realtime not supported in comparisons * pass only now instead of test_opts * drop redundant else branch * separate tests * stick to a single check_date_range function in tests * fix credo error --------- Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>	2024-09-02 12:56:58 +03:00
Karl-Aksel Puulmann	a181f3eab3	APIv2: TimeSeries using QueryBuilder, release `experimental_session_count` (#4305) * Move fragments module under Plausible.Stats.SQL * Introduce select_merge_as macro This simplifies some select_merge calls * Simplify select_join_fields * Remove a needless dynamic * wrap_select_columns macro * Move metrics from base.ex to expression.ex * Move WhereBuilder under Plausible.Stats.SQL * Moduledoc * Improved macros * Wrap more code * select_merge_as more * Move defp to the end * include.time_labels parsing * include.time_labels in result Note that the previous implementation of the labels from TimeSeries.ex was broken * Apply consistent function in imports and timeseries.ex * Remove boilerplate * WIP: Limited support for timeseries-with-querybuilder * time:week dimension * cleanup: property -> dimension * Make querying with time series work * Refactor: Move special metrics (percentage, conversion rate) to own module * Explicitly format datetimes * Consistent include_imported in special metrics * Solve week-related crash * conversion_rate hacking * Keep include_imported consistent after splitting the query * Simplify do_decide_tables * Handle time dimensions in imports cleaner * Allow time dimensions in custom property queries * time:week handling continued * cast_revenue_metrics_to_money * fix `full_intervals` support * Handle minute/realtime graphs * experimental_session_count? with timeseries This becomes required as we try to include visits from sessions by default * Support hourly data in imports * Update bounce_rate in more csv tests * Update some time-series query tests * Fix for meta.warning being included incorrectly * Simplify imported.ex * experimental_session_count flag removal * moduledoc * Split interval and time modules	2024-07-09 14:25:02 +03:00
Karl-Aksel Puulmann	0594478add	APIv2: Replace breakdown module with QueryBuilder (#4293) * Revert "Revert "APIv2: Replace breakdown module with QueryBuilder (#4283)" (#4292)" This reverts commit `ef5e0e0382`. * Allow querying events and pageviews from sessions table This is not strictly accurate, especially with shorter time frames, but is useful for a fallback mechanism. I'll figure out something around shorter time frames in the future. See also: https://github.com/plausible/analytics/pull/4292 * Only query events and pageviews in legacy breakdowns	2024-07-01 12:50:01 +03:00
Karl-Aksel Puulmann	ef5e0e0382	Revert "APIv2: Replace breakdown module with QueryBuilder (#4283)" (#4292) This reverts commit `7dd12d1dd6`.	2024-07-01 10:50:44 +03:00
Karl-Aksel Puulmann	7dd12d1dd6	APIv2: Replace breakdown module with QueryBuilder (#4283) * WIP: Breakdown using QueryBuilder * Revert "Remove problematic test" This reverts commit `b442bb5d1f`. * Get more breakdown tests passing * Preload goals, sort when dealing with time_on_page * Handle conversion_rate in breakdowns * Simplify ordering by using selected_as consistently for dimensions * Get breakdown tests passing * Strings to atoms in keys for StatsController.transform_keys calls to work * Handle revenue metrics removal * Add test for nil-removal case * Include percentage metric * Fix and test with imported locations * Fixup time-on-page * Fix country/region automatic filters * Handle multiple imports (os/browser version) in importsv2 * Filter goals * Default to ordering by page as well * Calculate conversion rate on sessions if needed * Order by event dimensions - handles event:page special case * Update tests * Update more tests, handle goal=0 case in imports * Handle event:goal breakdowns correctly with filters * Revenue to money * Improved table deciding * Also update event:page filters on event:page breakdown * bounce_rate to 0 Previous behavior relied on two queries being made - new query leads to 0 naturally * Update pagination test * dont count non-pageviews as path goal completions * Make revenue logic breakdown-specific Its hard to fit into the new schema and likely needs a rethink for apiv2 * Retain previous behavior for TimeSeries module * Get GA4 test passing Most failures are related to ordering, pageviews shouldnt be read off of sessions * Clean up old methods * Simplify imported.ex * Dont crash on garbage filters * Reflect ordering-related change in test * Fix test data * Update table_decider * Re-simplify get_revenue_tracking_currency * Revert revenue changes * Use Query.set * Remove a TODO * csv importer: no pageviews Pageviews were incorrectly fetched from sessions table before, causing issues * csv importer tweaking * Remove use Plausible * to_existing_atom	2024-07-01 09:03:33 +03:00
Karl-Aksel Puulmann	2eeaf7a152	APIv2: Aggregates, timeseries, conversion_rate, hostname (#4251) * Add some aggregates tests * Port aggregates tests to do with filtering * Session metrics can be queried with event: filters * Solve a typo * Update a validation message * Add validations for views_per_visit * Port an aggregation/imports test * Optimize time dimension, add tests * Add first timeseries test, update parsing tests * Docs for SQL.Expression * Test timeseries more * Allow time explicitly in order_by * Add multiple breakdowns test * Refactor QueryOptimizer not to care about time dimension placement in dimensions array * Add test breaking down by event:hostname * Add hostname filtering logic to QueryOptimizer, unblock some tests * WIP: Breakdown by goal * conversion rate logic for query api * Update more tests * Set default order_by * dimension_label * preloaded_goals in tests * inline load_goals * Use Date functions over Timex * Comments * is_binary * Remove special form used in tests * Fix defmodule * WIP: Fix memory leak, event:page breakdown logic * Enable more tests, fix for group_conversion_rate without explicit visitors metric * Re-enable a partially commented test * Re-enable a partially commented test * Get last test passing * No imports order_by in apiv2 * Add a TODO * Remove redundant Util call * Update aggregate.ex * Remove problematic test	2024-06-28 08:59:54 +03:00
Karl-Aksel Puulmann	58a66a952c	APIv2 - initial PR (#4216) * WIP new querying * WIP: Move some aggregate code under new command * WIP: Add joins, handling less metrics * join events table to sessions if needed * Merge imported results with built query * Remove dead code * WIP: /api/v2/query * Allow grouping by time * Use JOIN for main query * Build query result * update parse_time * Make joinless order by work * First test * more breakdown tests * Serialize event:goal filters in an json-encodable way/reflection * Handle inner vs outer ORDER BY clauses properly * Handle single conversion_rate metric * Update more tests * Get parsing tests passing again * Validate filtered goal filter is configured * Enable more validation tests * Enable more event:name breakdown tests * Enable more breakdown tests * Validate site has access to custom props * Validate conversion_rate metric which is only allowed in some situations * Validate that empty event:props: is not valid * handle query.dimensions properly in table_decider * test more validations on metrics/dimensions * Validate session metrics in combination with event dimension(s) * Tests cleanup * Parse include.imports * Get imports working with new querying * Make more imports tests work * Make event:props:path imports-adjacent test work * Get query imports warning-related tests running * Remove dead pagination tests * Solve dead import * Solve some warnings * Update aggregate metrics tests * credo * Improve test naming * Lazy goal loading * Use datetime methods * Ecto -> SQL module name * Remove Expression.dimension mode option	2024-06-25 09:27:19 +03:00

25 Commits
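Commit db448d7404 (#5694) describes the rebuilt smearing rule for hourly graphs. The visits metric is still smeared across every bucket a session was active in. Other visit metrics, such as visit_duration, are counted only under the last bucket the user was active in.

Below is a minimal Python sketch of that rule. It is an illustration under stated assumptions, not Plausible's implementation, which builds ClickHouse queries from Elixir. The names `Session`, `hour_bucket` and `hourly_visit_metrics` are hypothetical. The sketch assumes each session is reduced to its first and last event timestamps, and it models only visits, not visitors.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Session:
    # Hypothetical session record: only first/last event timestamps are modeled.
    session_id: str
    start: datetime  # first event in the session
    end: datetime    # last event in the session


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_visit_metrics(sessions):
    """Return {hour: {"visits": int, "visit_duration": float or None}}."""
    visits = defaultdict(int)
    durations = defaultdict(list)
    for s in sessions:
        first, last = hour_bucket(s.start), hour_bucket(s.end)
        # Smeared: the visit counts in every hour the session was active.
        bucket = first
        while bucket <= last:
            visits[bucket] += 1
            bucket += timedelta(hours=1)
        # Not smeared: the duration is attributed only to the last active hour.
        durations[last].append((s.end - s.start).total_seconds())
    result = {}
    for bucket in sorted(visits):
        ds = durations.get(bucket)  # .get does not trigger the defaultdict factory
        result[bucket] = {
            "visits": visits[bucket],
            "visit_duration": sum(ds) / len(ds) if ds else None,
        }
    return result
```

Consider a session running 10:30 to 12:10 alongside a short 11:05 to 11:20 session. The sketch yields visits of 1, 2 and 1 for the 10:00, 11:00 and 12:00 buckets. The long session's 6000-second duration appears only at 12:00, so it no longer inflates the earlier buckets. Summing the smeared visits across buckets gives 4 for 2 sessions, which is why smeared per-bucket counts should not be added up into a period total.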