Commit Graph

25 Commits

Author SHA1 Message Date
Adam Rutkowski b64a2355a0
Platform upgrade: elixir 1.19.4 and otp 27.3.4.6 (#5920)
* Platform upgrade: elixir 1.19.4 and otp 27.3.4.6

* !fixup

* credo

* credo

* Bump cache

* fix docker image tag

* hum

* hum

* Match docker images

* Define ALPINE_VERSION once

* fixup
2025-12-01 12:50:49 +00:00
Karl-Aksel Puulmann 1531386b76
FULL join for time:hour as well as time:minute (#5715)
* FULL join for time:hour as well as time:minute

Follow-up to https://github.com/plausible/analytics/pull/5694/files#r2321567271

A session might be active over multiple hours, but not (currently)
reported as such when requesting only specific metrics per hour.
This fixes that problem.

* Handle full join logic correctly

---------

Co-authored-by: Uku Taht <Uku.taht@gmail.com>
2025-09-15 14:14:59 +00:00
Adrian Gruntkowski 8d6d828d1d
Reduce reliance on `Timex` and use native time API where feasible (#5712)
* Replace usages of `Timex.to_unix` with native API

* Wrap call to `Timex.is_valid_timezone?`

* Wrap calls to `Timex.today(tz)`

* Replace `Timex.today()` with `Date.utc_today()`

* Replace `Timex.now()` with `DateTime.utc_now()`

* Replace `Timex.compare` with `Date.compare`

* Wrap `Timex.diff` calls

* Replace `Timex.Timezone.convert` with `DateTime.shift_zone!`

* Wrap `Timex.parse!`

* Replace `Timex.to_date` with native API calls

* Replace `Timex.beginning|end_of...` with native API calls for Date

* Wrap `Timex.beginning|end_of...` for DateTimes and Dates for years

* Replace `Timex.format(!)` with native API calls

* Replace `Timex.to_naive_datetime` with native API calls

* Wrap time humanizing routines using Timex

* Remove unnecessary `use Timex` instances

* Replace `Timex.shift` with native API calls

* Make `QueryParser.parse_date` handle gaps and ambiguities gracefully

* Replace `Timex.now(tz)` with `DateTime.now!(tz)`

* Use a more suitable Date function for comparison (h/t @aerosol)
2025-09-10 10:21:36 +00:00
Karl-Aksel Puulmann db448d7404
Stats: Rebuild session smearing for timeseries (#5694)
* Refactor table_decider#partition_metrics

* Refactor query pipeline to return a list of subqueries after splitting

* Move order_by out of join logic

* Refactor joining logic in query_builder

1. JOIN type is now set in QueryOptimizer
2. JOIN logic is now table and list-size agnostic

* Comment an edge case

* Rebuild session/visit smearing

Previously, whenever graphing any visit metric hourly/realtime, visit_duration and other
visit metrics would be way higher than expected, due to long sessions
dragging each bucket up and up. Now visits/visitors metrics are still
smeared and other visit metrics are counted under last bucket user was
active in.

visits metric was also overcounted (see new tests).

* Remove unneeded case

* Unit test for smearing in tabledecider
2025-09-08 06:21:12 +00:00
Karl-Aksel Puulmann 6216ade4ee
Trim comparisons for year/month (#5702) 2025-09-04 10:28:04 +00:00
Karl-Aksel Puulmann 9af40a278d
Trim month, year, day periods to local now on main graph (#5698)
* Revert "Revert "Trim `month`, `year`, `day` periods to local now on main graph (#5668)" (#5684)"

This reverts commit 2d11681f25.

* Does not trim for comparisons

* Include the current hour in the trimmed time range
2025-09-04 09:13:17 +00:00
Adrian Gruntkowski 2d11681f25
Revert "Trim `month`, `year`, `day` periods to local now on main graph (#5668)" (#5684)
This reverts commit 563c3d22ba.
2025-08-28 14:58:57 +00:00
Adam Rutkowski 563c3d22ba
Trim `month`, `year`, `day` periods to local now on main graph (#5668)
* Trim `month`, `year`, `day` periods to now on main graph

* Revert "Trim `month`, `year`, `day` periods to now on main graph"

This reverts commit 4f3930111d3a2737a51686e067d9b64f0d85ad58.

* Re-implement trimming in query optimizer instead

* Update JS types

* This is getting confusing

* Trim in stats_controller

* Set `query.now` based on query_praser and date results

* query.period -> query.input_date_range

* Changelog

* Test for response query.date_range

---------

Co-authored-by: Karl-Aksel Puulmann <oxymaccy@gmail.com>
2025-08-28 09:51:56 +00:00
Karl-Aksel Puulmann 0fcbcc3a8c
time-on-page: csv import/export support, preparation for release (#5274)
* CSV import/export support for time-on-page

Note only the new time-on-page metric is exported this way

* visibility check for graphing of time_on_page

* FE no longer receives/sends legacy_time_on_page_cutoff

* Remove current_user from exports

* Remove legacy_time_on_page_cutoff from query.include, make behavior work off of site.legacy_time_on_page_cutoff explicitly

* Remove dead function

* More current_user_id removals
2025-04-08 06:25:11 +00:00
Karl-Aksel Puulmann e9b30e0ba5
time-on-page: query (#5159)
* Default to time_on_page

* Add new columns to schema

* Read from new column in legacy query

* Read/write new imported_pages columns

* Remove time_on_page column from imported_pages

* Simple, stupid new_time_on_page metric

* Update csv_importer schema

* Refactor: consistent __internal helpers, this will help with joining the query

* Refactor select_joined_metrics

* Refactor: pass `query` to event_metric

* Refactor: remove needless site argument from various calls

* Legacy joining query attempt

* Move test around

* Add more tests for both legacy and new time_on_page metrics in query API

* time_on_page reported in seconds

* timeseries test for metric

* WIP

* Wrap main query in subquery - without this run into trouble performing the join

* Calculate time_on_page in main query, no more new_time_on_page

* Add some TODOs

* Return NULL over 0 when no visits with time-on-page data

* Update moduledoc

* Update some tests that were not expecting integers

* Add a TODO

* Update tests

* Make graphing time series with combined metrics work.

* Slightly more consistent approach to flag updating in APIv2

* Seeds with engagement data

* Make graphing time series when cutoff is in the middle work

Bakes less assumptions into everything as well.

* Rename to legacy_time_on_page_cutoff

* Fixup lib/plausible_web/controllers/api/external_query_api_controller.ex

* Remove a todo and dead/misleading code

* Remove a resolved todo

* Remove needless rounding

* gen types

* Update pages test

* Remove needless columns from select

* Update tests: timestamps and remove comment

* Flip branches
2025-03-11 11:19:58 +00:00
Karl-Aksel Puulmann 714f7f4603
Refactor: Remove `filter_key` terminology from backend (#4994)
* Remove filter_key terminology from the backend

This resurfaced in a recent review, `dimension` or `filter_dimension` is the correct terminology in the backend

* Update table_decider

* Solve new issues
2025-01-21 15:36:34 +00:00
Karl-Aksel Puulmann bec14ee77c
Improve report performance with high-cardinality import joins (#4848)
* Improve report performance in cases where site has a lot of unique pathnames

Ref: https://3.basecamp.com/5308029/buckets/39750953/card_tables/cards/8052057081

JOINs in ClickHouse are slow. In one degenerate case I found a user had
over 20 million unique paths in an import, which resulted in extremely slow
JOINs. This introduces a sort-of hacky solution to it by limiting the
amount of data analyzed.

Query timing without this change:
```
9 rows in set. Elapsed: 11.383 sec. Processed 49.16 million rows, 5.75 GB (4.32 million rows/s., 505.29 MB/s.)
Peak memory usage: 14.75 GiB.
```

After:
```
9 rows in set. Elapsed: 0.572 sec. Processed 49.18 million rows, 5.75 GB (86.03 million rows/s., 10.06 GB/s.)
Peak memory usage: 9.01 GiB.
```

* Splitting should no longer remove pagination. Handle special cases in special_metrics.ex

* select_merge_as in imports

This sets up selected_as aliases which will be used in a subsequent commit

* Add explicit ORDER BY to import

* Rewrite comment

* quoting

* merge conflict

* Split test

* Merge conflict fail fix
2024-12-05 10:05:57 +00:00
ruslandoga 40f28ed151
rm Timex.diff/3 (#4695) 2024-11-04 09:18:04 +00:00
Karl-Aksel Puulmann 472f4f181c
Comparisons pagination fixes (#4697)
* Add filter clauses for each main result filter

This handles the case where main query has a limit and results change. Doesnt handle metrics like percentage.

* Fix percentage calculations by ignoring breakdown-related filters in totals queries

* Refactor comparisons test suite

* Move comparisons logic to comparisons module

* New route for internal query tests

Only to be used in testing

* Support comparison queries with imports/breakdowns

* time dimension predicate extraction

* Clean up a test

* Update docstring

* Update route test

* fix a typo
2024-10-22 10:23:40 +00:00
Karl-Aksel Puulmann 141eea88ff
APIv2: Revenue metrics (#4659)
* WIP: Start refactoring revenue metrics

* Hacks to make things work

* Remove old revenue code, remove revenue metrics if needed

* Update query_optimizer docs

* Minor fixes

* Add tests around average/total revenue when non-revenue goal filtering going on

* Optimize, calculate filters as expected (OR-ing clauses)

* Revenue: Handle cases where revenue metrics should not be returned or nil

* Expose revenue metrics in internal schema, add tests

* Docstring

* Remove TODO

* Typegen

* Solve warnings

* Remove nesting

* ce_test fix

* Tag tests as ee_only

* Fix: When filtering by revenue goal and no conversions, return 0.0 instead of nil

* More straight-forward preloading logic
2024-10-09 10:18:48 +00:00
Karl-Aksel Puulmann 5ad743c8d3
APIv2: Comparisons for breakdowns, timeseries, time_on_page (#4647)
* Refactor comparisons to a new options format

Prerequisite for APIv2 comparison work

* Experiment with default include deduplication

* WIP

Oops, breaks `include.total_rows`

* WIP

* Refactor breakdown.ex

* Pagination fix: dont paginate split subqueries

* Timeseries tests pass

* Aggregate tests use QueryExecutor

* Simplify QueryExecutor

* Handle legacy time-on-page metric in query_executor.ex

No behavioral changes

* Remove keep_requested_metrics

* Clean up imports

* Refactor aggregate.ex to be more straight-forward in output format building

* top stats: compute comparison via apiv2

* Minor cleanups

* WIP: Pipelines

* WIP: refactor for code cleanliness

* QueryExecutor to QueryRunner

* Make compilable

* Comparisons for timeseries works

Except for comparisons where comparison window is bigger than source query window

* Add special case for timeseries

* JSON schema tests for comparisons

* Test comparisons with the new API

* comparison date range parsing improvement

* Make comparisons api internal-only

* typegen

* credo

* Different schemata

* get_comparison_query

* Add comment on timeseries result format

* comparisons typegen

* Percent change for revenue metrics fix

* Use defstruct for query_runner over map

* Remove preloading atoms
2024-10-08 10:13:04 +00:00
Karl-Aksel Puulmann bd11b4cf67
APIv2: Standard iso8601 timestamps, operate on UTC (#4563)
* query.date_range is now in UTC instead of user timezone

This simplifies things down the line and fixes several bugs where
query.date_range is cast to naivedatetime for ecto purposes

Many places still remain broken:
- comparison queries
- `to_date_range` calls

* Make default_for_date_range not care about time zones

* Make timezone parameter mandatory for to_date_range

* Simplify utc_date_range, update legacy query builder

* Fix more cases where query date range is needed

* query.date_range -> query.utc_time_range

* Query.date_range/1 function

* ensure_include_imported update

* Clean up send_email_report
2024-09-11 09:21:59 +03:00
Karl-Aksel Puulmann 8fa3a83129
APIv2: and/or/not support (#4480)
* First approximation of AND/OR/NOT support

Broken by this:
- Goal filtering
- Table deciding
- Imports

* TableDecider handle nesting

* Query.remove_top_level_filters

* Plausible.Stats.Imported.SQL.Expression

* Handle AND/OR/NOT with imported data, create Plausible.Stats.Imported.SQL.WhereBuilder

* Add parser validations for event:goal, event:hostname and event:props:x filters top level constraints

* Move module around

* Query.get_filter -> Filters.filtering_on_dimension? in some callsites

* Filters.get_toplevel_filter

* TableDecider.sessions_join_events?, remove old method

* Transforming filters in query_optimizer

* Query API tests for and/or/not

* Reorder parser steps

* Post-merge test fixups

* Solve merge issue

* Simplify filtering_on_dimension?

* Update transformer code

* dimensions_used_in_filters min_depth option, simplify parser validations

* rename_dimensions_used_in_filter

* fix rename_dimensions_used_in_filter

* Rename a test
2024-09-04 15:44:03 +03:00
RobertJoonas f04c47f881
Support realtime periods in API v2 (#4469)
* add realtime date_ranges into the private API schema

This commit starts parsing date ranges into a new NaiveDateTimeRange
struct, rather than a simple Date.Range.

* transform realtime labels into negative integers + test

* move schema type argument to last position in helper functions

* allow passing a date param + tests

* Update test/plausible/stats/query_parser_test.exs

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>

* Update test/plausible/stats/query_parser_test.exs

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>

* Update test/plausible/stats/query_parser_test.exs

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>

* Update test/plausible/stats/query_parser_test.exs

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>

* keep test file structure consistent

* Turn NaiveDateTimeRange into DateTimeRange

* change 'now' field from NaiveDateTime to DateTime in v2 query

* fix minute interval labels + add missing tests

* return query_result.date_range as iso8601 timestamps with timezone

* allow timestamps with tz as date_range arguments in API v2

* delete Plausible.Timezones.to_utc_datetime

* simplify returning comparison periods

* add comment about realtime not supported in comparisons

* pass only now instead of test_opts

* drop redundant else branch

* separate tests

* stick to a single check_date_range function in tests

* fix credo error

---------

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
2024-09-02 12:56:58 +03:00
Karl-Aksel Puulmann a181f3eab3
APIv2: TimeSeries using QueryBuilder, release `experimental_session_count` (#4305)
* Move fragments module under Plausible.Stats.SQL

* Introduce select_merge_as macro

This simplifies some select_merge calls

* Simplify select_join_fields

* Remove a needless dynamic

* wrap_select_columns macro

* Move metrics from base.ex to expression.ex

* Move WhereBuilder under Plausible.Stats.SQL

* Moduledoc

* Improved macros

* Wrap more code

* select_merge_as more

* Move defp to the end

* include.time_labels parsing

* include.time_labels in result

Note that the previous implementation of the labels from TimeSeries.ex was broken

* Apply consistent function in imports and timeseries.ex

* Remove boilerplate

* WIP: Limited support for timeseries-with-querybuilder

* time:week dimension

* cleanup: property -> dimension

* Make querying with time series work

* Refactor: Move special metrics (percentage, conversion rate) to own module

* Explicitly format datetimes

* Consistent include_imported in special metrics

* Solve week-related crash

* conversion_rate hacking

* Keep include_imported consistent after splitting the query

* Simplify do_decide_tables

* Handle time dimensions in imports cleaner

* Allow time dimensions in custom property queries

* time:week handling continued

* cast_revenue_metrics_to_money

* fix `full_intervals` support

* Handle minute/realtime graphs

* experimental_session_count? with timeseries

This becomes required as we try to include visits from sessions by default

* Support hourly data in imports

* Update bounce_rate in more csv tests

* Update some time-series query tests

* Fix for meta.warning being included incorrectly

* Simplify imported.ex

* experimental_session_count flag removal

* moduledoc

* Split interval and time modules
2024-07-09 14:25:02 +03:00
Karl-Aksel Puulmann 0594478add
APIv2: Replace breakdown module with QueryBuilder (#4293)
* Revert "Revert "APIv2: Replace breakdown module with QueryBuilder (#4283)" (#4292)"

This reverts commit ef5e0e0382.

* Allow querying events and pageviews from sessions table

This is not strictly accurate, especially with shorter time frames, but
is useful for a fallback mechanism. I'll figure out something around
shorter time frames in the future.

See also: https://github.com/plausible/analytics/pull/4292

* Only query events and pageviews in legacy breakdowns
2024-07-01 12:50:01 +03:00
Karl-Aksel Puulmann ef5e0e0382
Revert "APIv2: Replace breakdown module with QueryBuilder (#4283)" (#4292)
This reverts commit 7dd12d1dd6.
2024-07-01 10:50:44 +03:00
Karl-Aksel Puulmann 7dd12d1dd6
APIv2: Replace breakdown module with QueryBuilder (#4283)
* WIP: Breakdown using QueryBuilder

* Revert "Remove problematic test"

This reverts commit b442bb5d1f.

* Get more breakdown tests passing

* Preload goals, sort when dealing with time_on_page

* Handle conversion_rate in breakdowns

* Simplify ordering by using selected_as consistently for dimensions

* Get breakdown tests passing

* Strings to atoms in keys for StatsController.transform_keys calls to work

* Handle revenue metrics removal

* Add test for nil-removal case

* Include percentage metric

* Fix and test with imported locations

* Fixup time-on-page

* Fix country/region automatic filters

* Handle multiple imports (os/browser version) in importsv2

* Filter goals

* Default to ordering by page as well

* Calculate conversion rate on sessions if needed

* Order by event dimensions - handles event:page special case

* Update tests

* Update more tests, handle goal=0 case in imports

* Handle event:goal breakdowns correctly with filters

* Revenue to money

* Improved table deciding

* Also update event:page filters on event:page breakdown

* bounce_rate to 0

Previous behavior relied on two queries being made - new query leads to 0 naturally

* Update pagination test

* dont count non-pageviews as path goal completions

* Make revenue logic breakdown-specific

Its hard to fit into the new schema and likely needs a rethink for apiv2

* Retain previous behavior for TimeSeries module

* Get GA4 test passing

Most failures are related to ordering, pageviews shouldnt be read off of sessions

* Clean up old methods

* Simplify imported.ex

* Dont crash on garbage filters

* Reflect ordering-related change in test

* Fix test data

* Update table_decider

* Re-simplify get_revenue_tracking_currency

* Revert revenue changes

* Use Query.set

* Remove a TODO

* csv importer: no pageviews

Pageviews were incorrectly fetched from sessions table before, causing issues

* csv importer tweaking

* Remove use Plausible

* to_existing_atom
2024-07-01 09:03:33 +03:00
Karl-Aksel Puulmann 2eeaf7a152
APIv2: Aggregates, timeseries, conversion_rate, hostname (#4251)
* Add some aggregates tests

* Port aggregates tests to do with filtering

* Session metrics can be queried with event: filters

* Solve a typo

* Update a validation message

* Add validations for views_per_visit

* Port an aggregation/imports test

* Optimize time dimension, add tests

* Add first timeseries test, update parsing tests

* Docs for SQL.Expression

* Test timeseries more

* Allow time explicitly in order_by

* Add multiple breakdowns test

* Refactor QueryOptimizer not to care about time dimension placement in dimensions array

* Add test breaking down by event:hostname

* Add hostname filtering logic to QueryOptimizer, unblock some tests

* WIP: Breakdown by goal

* conversion rate logic for query api

* Update more tests

* Set default order_by

* dimension_label

* preloaded_goals in tests

* inline load_goals

* Use Date functions over Timex

* Comments

* is_binary

* Remove special form used in tests

* Fix defmodule

* WIP: Fix memory leak, event:page breakdown logic

* Enable more tests, fix for group_conversion_rate without explicit visitors metric

* Re-enable a partially commented test

* Re-enable a partially commented test

* Get last test passing

* No imports order_by in apiv2

* Add a TODO

* Remove redundant Util call

* Update aggregate.ex

* Remove problematic test
2024-06-28 08:59:54 +03:00
Karl-Aksel Puulmann 58a66a952c
APIv2 - initial PR (#4216)
* WIP new querying

* WIP: Move some aggregate code under new command

* WIP: Add joins, handling less metrics

* join events table to sessions if needed

* Merge imported results with built query

* Remove dead code

* WIP: /api/v2/query

* Allow grouping by time

* Use JOIN for main query

* Build query result

* update parse_time

* Make joinless order by work

* First test

* more breakdown tests

* Serialize event:goal filters in an json-encodable way/reflection

* Handle inner vs outer ORDER BY clauses properly

* Handle single conversion_rate metric

* Update more tests

* Get parsing tests passing again

* Validate filtered goal filter is configured

* Enable more validation tests

* Enable more event:name breakdown tests

* Enable more breakdown tests

* Validate site has access to custom props

* Validate conversion_rate metric which is only allowed in some situations

* Validate that empty event:props: is not valid

* handle query.dimensions properly in table_decider

* test more validations on metrics/dimensions

* Validate session metrics in combination with event dimension(s)

* Tests cleanup

* Parse include.imports

* Get imports working with new querying

* Make more imports tests work

* Make event:props:path imports-adjacent test work

* Get query imports warning-related tests running

* Remove dead pagination tests

* Solve dead import

* Solve some warnings

* Update aggregate metrics tests

* credo

* Improve test naming

* Lazy goal loading

* Use datetime methods

* Ecto -> SQL module name

* Remove Expression.dimension mode option
2024-06-25 09:27:19 +03:00