Commit Graph

27 Commits

Author SHA1 Message Date
Adam Rutkowski 38381195f8
Reapply+bugfix: goals with custom props (#5936) (#5944)
* Reapply "Goals with custom props (Stats API queries, funnels) (#5936)" (#5943)

This reverts commit 45116bda7b.

* Gracefully handle `nil` for existing goal.custom_props

* Revert "Gracefully handle `nil` for existing goal.custom_props"

This reverts commit 8e38748775.

* Migration: make `goals.custom_props` non-null

* Adjust test
2025-12-11 13:09:34 +00:00
Adam Rutkowski 45116bda7b
Revert "Goals with custom props (Stats API queries, funnels) (#5936)" (#5943)
This reverts commit b6b9c2c0bf.
2025-12-11 09:17:33 +00:00
Adam Rutkowski b6b9c2c0bf
Goals with custom props (Stats API queries, funnels) (#5936)
* Migration: add custom propos to goals + revisit unique constraints

* Update constraints in goal schema (and move module)

* Add a comment, not really related but useful?

* Implement querying for goals with custom props

* Optimize goal_join_data (down to one iteration) + include goal custom props

* Test goal custom propos addition + new constraints

* Test querying for goals with custom propos attached

* Test funnels made of goals with custom props

* Format

* Fixup test name

* Fixup migration

* Unified goal join macro

* Remove dupe test

* Clean up user_id usage

* Fixup test to match the description

* Revert "Temporary: make room for pre/post migration constraint names (#5942)"

This reverts commit e4bc6b8715.

---------

Co-authored-by: Uku Taht <uku.taht@gmail.com>
2025-12-11 08:39:46 +00:00
Adrian Gruntkowski a2ba1256d2
Show revenue data in all breakdowns (#5767)
* Include revenue data for all detailed API responses except entry/exit pages

* Expose revenue data in all breakdown modals except entry/exit pages

* Add revenue metrics to breakdown response only on EE

* Change query builder to enable querying event metrics \w session dimension

* Add revenue metrics to entry and exit pages breakdowns

* Expose revenue data in entry and exit pages breakdowns

* Use `argMax` for `exit_page` and `exit_page_hostname` dimensions (h/t @ukutaht)

* Don't handle event-only dimensions with session-only metrics for now

* Add tests for all breakdowns

* Add clarifying comments in code

* Mark revenue tests as EE-only
2025-11-18 11:24:54 +00:00
Karl-Aksel Puulmann 1531386b76
FULL join for time:hour as well as time:minute (#5715)
* FULL join for time:hour as well as time:minute

Follow-up to https://github.com/plausible/analytics/pull/5694/files#r2321567271

A session might be active over multiple hours, but not (currently)
reported as such when requesting only specific metrics per hour.
This fixes that problem.

* Handle full join logic correctly

---------

Co-authored-by: Uku Taht <Uku.taht@gmail.com>
2025-09-15 14:14:59 +00:00
Karl-Aksel Puulmann db448d7404
Stats: Rebuild session smearing for timeseries (#5694)
* Refactor table_decider#partition_metrics

* Refactor query pipeline to return a list of subqueries after splitting

* Move order_by out of join logic

* Refactor joining logic in query_builder

1. JOIN type is now set in QueryOptimizer
2. JOIN logic is now table and list-size agnostic

* Comment an edge case

* Rebuild session/visit smearing

Previously, whenever graphing any visit metric hourly/realtime, visit_duration and other
visit metrics would be way higher than expected, due to long sessions
dragging each bucket up and up. Now visits/visitors metrics are still
smeared and other visit metrics are counted under last bucket user was
active in.

visits metric was also overcounted (see new tests).

* Remove unneeded case

* Unit test for smearing in tabledecider
2025-09-08 06:21:12 +00:00
RobertJoonas c7bf54325e
Fix visitors.csv vs main graph discrepancy (#5230)
* fix typo to stop logging error in test output

* add time:minute interval to internal api schema

* always get visitors and visits from sessions table when time:minute dimension used

* query-api generate types

* changelog update
2025-03-25 16:18:48 +00:00
Karl-Aksel Puulmann e9b30e0ba5
time-on-page: query (#5159)
* Default to time_on_page

* Add new columns to schema

* Read from new column in legacy query

* Read/write new imported_pages columns

* Remove time_on_page column from imported_pages

* Simple, stupid new_time_on_page metric

* Update csv_importer schema

* Refactor: consistent __internal helpers, this will help with joining the query

* Refactor select_joined_metrics

* Refactor: pass `query` to event_metric

* Refactor: remove needless site argument from various calls

* Legacy joining query attempt

* Move test around

* Add more tests for both legacy and new time_on_page metrics in query API

* time_on_page reported in seconds

* timeseries test for metric

* WIP

* Wrap main query in subquery - without this run into trouble performing the join

* Calculate time_on_page in main query, no more new_time_on_page

* Add some TODOs

* Return NULL over 0 when no visits with time-on-page data

* Update moduledoc

* Update some tests that were not expecting integers

* Add a TODO

* Update tests

* Make graphing time series with combined metrics work.

* Slightly more consistent approach to flag updating in APIv2

* Seeds with engagement data

* Make graphing time series when cutoff is in the middle work

Bakes less assumptions into everything as well.

* Rename to legacy_time_on_page_cutoff

* Fixup lib/plausible_web/controllers/api/external_query_api_controller.ex

* Remove a todo and dead/misleading code

* Remove a resolved todo

* Remove needless rounding

* gen types

* Update pages test

* Remove needless columns from select

* Update tests: timestamps and remove comment

* Flip branches
2025-03-11 11:19:58 +00:00
Karl-Aksel Puulmann 8aad9b1adc
APIv2: Fix `percentage` metric 500s (#5099)
* Refactor: remove metrics argument from merge_imported()

* Support querying percentage without visitors metric

* Fix ordering by special metrics with imports causing a 500

We don't calculate all metrics directly on imports, hence cannot order
the import by them either.

* Changelog
2025-02-26 07:07:17 +00:00
RobertJoonas 4d05245036
Page Scroll Goals (#5029)
* migration: add scroll_threshold to goals

* update goal schema

* setup simple UI for creating scroll goals

* add ability to filter and breakdown scroll goals

* fix goals form tests

* add valiation for page path exists

* move todo comments to expression.ex

* move tests

* make it clear that scroll_threshold is optional

* avoid calling Plausible.Goal.type() too many times

* do not consider 255 scroll depth a conversion

* migration: add scroll_threshold to goals

* do not drop the old index yet

* More efficient goals join again

* Refactor: move goals stats code explicitly under Stats.Goals module

* Move code under Plausible.Stats.Goals

* 254 -> 100

* add scroll_threshold field to goal schema + new unique constraint

* adjust test to test what it claims to

* mix format

* add migration

* consider imported query unsupported when page scroll goal filter

* add missing tests

* pattern match imported argument

* silence credo

* Update lib/plausible/stats/sql/expression.ex

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>

* use site_imports populated in test setup

---------

Co-authored-by: Karl-Aksel Puulmann <oxymaccy@gmail.com>
Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
2025-02-04 10:37:04 +00:00
Karl-Aksel Puulmann 9a4969e105
APIv2: Behavioral filtering (#4980)
* Expose site_id and site_native_stats_start_at via query

This allows to do more query-building without exposing and passing `site` directly.

* Very basic has_done/has_done_not operator support

No event:goal support yet, no validations

* Add validations that only event: dimensions can be used within has_done/has_done_not

* Allow event:goal filters nested within has_done/has_done_not behavioral filters

* Minor fix for do_decide_custom_prop_table

* has_done support for goals

Minor changes along the way:
- preloaded_goals structure changes
- event:goal restrictions were loosened within has_done
- we don't allow nesting has_done anymore

* Dont query imports when behavioral filters are present

* Update callsites of filtering_on_dimension? to work with new behavioral filters

* has_done_not -> has_not_done

* Changelog entry

* Typegen

* credo cleanup

* Fix changelog

* Remove changelog

* Mark has_done as internal-only

* combine two validations into a single loop

* has_done is now session-based not user-based

* Update a test

* Update transform_tree
2025-01-21 12:03:26 +00:00
Karl-Aksel Puulmann 0c8fed9570
Fix: Add missing SAMPLE BY to query (#4880)
* Fix: Add missing SAMPLE BY to query

* Update lib/plausible/stats/sql/query_builder.ex
2024-12-09 08:54:39 +00:00
Karl-Aksel Puulmann bec14ee77c
Improve report performance with high-cardinality import joins (#4848)
* Improve report performance in cases where site has a lot of unique pathnames

Ref: https://3.basecamp.com/5308029/buckets/39750953/card_tables/cards/8052057081

JOINs in ClickHouse are slow. In one degenerate case I found a user had
over 20 million unique paths in an import, which resulted in extremely slow
JOINs. This introduces a sort-of hacky solution to it by limiting the
amount of data analyzed.

Query timing without this change:
```
9 rows in set. Elapsed: 11.383 sec. Processed 49.16 million rows, 5.75 GB (4.32 million rows/s., 505.29 MB/s.)
Peak memory usage: 14.75 GiB.
```

After:
```
9 rows in set. Elapsed: 0.572 sec. Processed 49.18 million rows, 5.75 GB (86.03 million rows/s., 10.06 GB/s.)
Peak memory usage: 9.01 GiB.
```

* Splitting should no longer remove pagination. Handle special cases in special_metrics.ex

* select_merge_as in imports

This sets up selected_as aliases which will be used in a subsequent commit

* Add explicit ORDER BY to import

* Rewrite comment

* quoting

* merge conflict

* Split test

* Merge conflict fail fix
2024-12-05 10:05:57 +00:00
RobertJoonas 6822b29016
Average Scroll Depth Metric: put scroll depth on the dashboard under a feature flag (#4832)
* migration: add scroll_depth to events_v2

* (cherry-pick) ingest scroll depth

* replace convoluted test with more concise ones

* QueryParser: parse internal scroll_depth metric + validation

* turn QueryComparisonsTest into QueryInternalTest

* rename file

* (cherry pick) query scroll depth 15b14d3

...and move the tests into `internal_query_test.exs`

* review feedback

* Get rid of unnecessary separation between aggregate and group scroll depth
* Drop irrelevant other metrics in tests

* add test ensuring scroll depth unavailable in Stats API v1

* Put scroll depth on the dashboard

* Top Stats
* Main Graph
* Top Pages > Details

* feature flag for dashboard scroll depth access

* ignore credo warning

* enable scroll_depth flag in tests

* remove duplication

* write timestamps explicitly in a test

* revert moving tests around

* Add query_comparisons_test back
* Move scroll_depth tests into query_test
* Delete query_internal_test

* rename setup util (got updated on master)

* use pageleave_factory where applicable

* Use the correct generated query-api.d.ts

* npm format
2024-11-20 13:13:04 +00:00
Karl-Aksel Puulmann ef57502854
APIv2: Implement pagination and `include.total_rows` (#4575)
Offset-based pagination is used to make sure Looker integration
is able to work as efficiently as possible. To know how many
requests users need to do `include.total_rows` option was added.
2024-09-12 15:51:18 +03:00
Karl-Aksel Puulmann 8fa3a83129
APIv2: and/or/not support (#4480)
* First approximation of AND/OR/NOT support

Broken by this:
- Goal filtering
- Table deciding
- Imports

* TableDecider handle nesting

* Query.remove_top_level_filters

* Plausible.Stats.Imported.SQL.Expression

* Handle AND/OR/NOT with imported data, create Plausible.Stats.Imported.SQL.WhereBuilder

* Add parser validations for event:goal, event:hostname and event:props:x filters top level constraints

* Move module around

* Query.get_filter -> Filters.filtering_on_dimension? in some callsites

* Filters.get_toplevel_filter

* TableDecider.sessions_join_events?, remove old method

* Transforming filters in query_optimizer

* Query API tests for and/or/not

* Reorder parser steps

* Post-merge test fixups

* Solve merge issue

* Simplify filtering_on_dimension?

* Update transformer code

* dimensions_used_in_filters min_depth option, simplify parser validations

* rename_dimensions_used_in_filter

* fix rename_dimensions_used_in_filter

* Rename a test
2024-09-04 15:44:03 +03:00
ruslandoga 48dfce7a8b
update ecto_ch and use ecto hints for array joins (#4456) 2024-08-21 06:28:25 +02:00
Karl-Aksel Puulmann 3ab47e6401
Remove SAMPLE BY from an events query that joins sessions (#4341)
Some users are reporting wrong results for stats querying from events table
but with session filters.

This seems to be caused by sampling: Namely both tables end up sampled and the
incorrect sample rate seems to get used.

This is sadly impossible to test for in our test suite, but reveals itself in
production with the following query:

```sql
SELECT
    toUInt64 (round(uniq (se0.user_id) * any(_sample_factor))) AS visitors
FROM
    events_v2 AS se0 SAMPLE 20000000
    INNER JOIN (
        SELECT
            sss0.session_id AS session_id,
            any(_sample_factor) AS _sample_factor
        FROM
            sessions_v2 AS sss0 SAMPLE 20000000
        WHERE
            (sss0.site_id = _CAST (8195000, 'Int64'))
            AND (sss0.timestamp >= _CAST (1720497600, 'DateTime'))
            AND (sss0.start < _CAST (1720584000, 'DateTime'))
            AND (sss0.entry_page IN ('/'))
            AND (sss0.sign = 1)
        GROUP BY
            sss0.session_id
    ) AS ss1 ON se0.session_id = ss1.session_id
WHERE
    (se0.site_id = _CAST (8195000, 'Int64'))
    AND (se0.timestamp >= _CAST (1720497600, 'DateTime'))
    AND (se0.timestamp < _CAST (1720584000, 'DateTime'))
```

By removing the inner query SAMPLE clause the result changes >50%

Related old PR: https://github.com/plausible/analytics/pull/2962

This will likely require more work in the future so looking forward to that -
after my vacation.
2024-07-11 15:08:20 +03:00
Karl-Aksel Puulmann 0114689779
fix(TimeSeries): Work around weird time zones (#4337)
* Refactor Expression.dimension to accept q

* Handle quarter- and half-hour timezones

Previously APIv2 output didn't start at a full hour for these time zones
and main graph was blank

The core reasoning is that ClickHouse `timeSlots` is not time-zone
aware and works off of unix epoch - meaning that in time zones which
have an offset of 5:45 the "hours" reported would start at minute :45.

The fix is kind of silly - we now divide each hour into 4 and handle
things that way.

Related basecamp issue: https://3.basecamp.com/5308029/buckets/36789884/card_tables/cards/7590936581

* Fix test typo
2024-07-11 11:48:16 +03:00
Karl-Aksel Puulmann a9676546dc
APIv2: Cleanup (#4308)
* Remove a dead method

* Move select_event/session_metrics to within QueryBuilder

* Make a method private

* Move page_regex util around

* Move utc_boundaries helper around

* Fixups for utc_boundaries

* Add legacy notices

* Move Stats.query method around

* include_sentry_replay_info consistently

* Move filters out of select_group_fields

* Collapse conditions under select_group_fields

* Shorten some imported methods

* Use dimension over dim in group_by

* Separate SQL query building from imported.ex

* props.ex -> legacy_dimensions.ex

* Move some query building out of Query.ex

* Remove unneeded method

* put_filter -> add_filter

* Remove some query setters

* Moduledoc

* Split out validations and import tests from query_test

* Move tests around

* Split event:goal tests from query_test

* Remove redundant filters

* Remove dead code

* Split special metrics tests from query_test

* Legacy module
2024-07-09 14:31:45 +03:00
Karl-Aksel Puulmann a181f3eab3
APIv2: TimeSeries using QueryBuilder, release `experimental_session_count` (#4305)
* Move fragments module under Plausible.Stats.SQL

* Introduce select_merge_as macro

This simplifies some select_merge calls

* Simplify select_join_fields

* Remove a needless dynamic

* wrap_select_columns macro

* Move metrics from base.ex to expression.ex

* Move WhereBuilder under Plausible.Stats.SQL

* Moduledoc

* Improved macros

* Wrap more code

* select_merge_as more

* Move defp to the end

* include.time_labels parsing

* include.time_labels in result

Note that the previous implementation of the labels from TimeSeries.ex was broken

* Apply consistent function in imports and timeseries.ex

* Remove boilerplate

* WIP: Limited support for timeseries-with-querybuilder

* time:week dimension

* cleanup: property -> dimension

* Make querying with time series work

* Refactor: Move special metrics (percentage, conversion rate) to own module

* Explicitly format datetimes

* Consistent include_imported in special metrics

* Solve week-related crash

* conversion_rate hacking

* Keep include_imported consistent after splitting the query

* Simplify do_decide_tables

* Handle time dimensions in imports cleaner

* Allow time dimensions in custom property queries

* time:week handling continued

* cast_revenue_metrics_to_money

* fix `full_intervals` support

* Handle minute/realtime graphs

* experimental_session_count? with timeseries

This becomes required as we try to include visits from sessions by default

* Support hourly data in imports

* Update bounce_rate in more csv tests

* Update some time-series query tests

* Fix for meta.warning being included incorrectly

* Simplify imported.ex

* experimental_session_count flag removal

* moduledoc

* Split interval and time modules
2024-07-09 14:25:02 +03:00
Karl-Aksel Puulmann 05ac840078
APIv2: macros, SQL cleanup (#4286)
* Move fragments module under Plausible.Stats.SQL

* Introduce select_merge_as macro

This simplifies some select_merge calls

* Simplify select_join_fields

* Remove a needless dynamic

* wrap_select_columns macro

* Move metrics from base.ex to expression.ex

* Move WhereBuilder under Plausible.Stats.SQL

* Moduledoc

* Improved macros

* Wrap more code

* select_merge_as more

* Move defp to the end

* wrap_alias
2024-07-03 16:32:25 +03:00
Karl-Aksel Puulmann 0594478add
APIv2: Replace breakdown module with QueryBuilder (#4293)
* Revert "Revert "APIv2: Replace breakdown module with QueryBuilder (#4283)" (#4292)"

This reverts commit ef5e0e0382.

* Allow querying events and pageviews from sessions table

This is not strictly accurate, especially with shorter time frames, but
is useful for a fallback mechanism. I'll figure out something around
shorter time frames in the future.

See also: https://github.com/plausible/analytics/pull/4292

* Only query events and pageviews in legacy breakdowns
2024-07-01 12:50:01 +03:00
Karl-Aksel Puulmann ef5e0e0382
Revert "APIv2: Replace breakdown module with QueryBuilder (#4283)" (#4292)
This reverts commit 7dd12d1dd6.
2024-07-01 10:50:44 +03:00
Karl-Aksel Puulmann 7dd12d1dd6
APIv2: Replace breakdown module with QueryBuilder (#4283)
* WIP: Breakdown using QueryBuilder

* Revert "Remove problematic test"

This reverts commit b442bb5d1f.

* Get more breakdown tests passing

* Preload goals, sort when dealing with time_on_page

* Handle conversion_rate in breakdowns

* Simplify ordering by using selected_as consistently for dimensions

* Get breakdown tests passing

* Strings to atoms in keys for StatsController.transform_keys calls to work

* Handle revenue metrics removal

* Add test for nil-removal case

* Include percentage metric

* Fix and test with imported locations

* Fixup time-on-page

* Fix country/region automatic filters

* Handle multiple imports (os/browser version) in importsv2

* Filter goals

* Default to ordering by page as well

* Calculate conversion rate on sessions if needed

* Order by event dimensions - handles event:page special case

* Update tests

* Update more tests, handle goal=0 case in imports

* Handle event:goal breakdowns correctly with filters

* Revenue to money

* Improved table deciding

* Also update event:page filters on event:page breakdown

* bounce_rate to 0

Previous behavior relied on two queries being made - new query leads to 0 naturally

* Update pagination test

* dont count non-pageviews as path goal completions

* Make revenue logic breakdown-specific

Its hard to fit into the new schema and likely needs a rethink for apiv2

* Retain previous behavior for TimeSeries module

* Get GA4 test passing

Most failures are related to ordering, pageviews shouldnt be read off of sessions

* Clean up old methods

* Simplify imported.ex

* Dont crash on garbage filters

* Reflect ordering-related change in test

* Fix test data

* Update table_decider

* Re-simplify get_revenue_tracking_currency

* Revert revenue changes

* Use Query.set

* Remove a TODO

* csv importer: no pageviews

Pageviews were incorrectly fetched from sessions table before, causing issues

* csv importer tweaking

* Remove use Plausible

* to_existing_atom
2024-07-01 09:03:33 +03:00
Karl-Aksel Puulmann 2eeaf7a152
APIv2: Aggregates, timeseries, conversion_rate, hostname (#4251)
* Add some aggregates tests

* Port aggregates tests to do with filtering

* Session metrics can be queried with event: filters

* Solve a typo

* Update a validation message

* Add validations for views_per_visit

* Port an aggregation/imports test

* Optimize time dimension, add tests

* Add first timeseries test, update parsing tests

* Docs for SQL.Expression

* Test timeseries more

* Allow time explicitly in order_by

* Add multiple breakdowns test

* Refactor QueryOptimizer not to care about time dimension placement in dimensions array

* Add test breaking down by event:hostname

* Add hostname filtering logic to QueryOptimizer, unblock some tests

* WIP: Breakdown by goal

* conversion rate logic for query api

* Update more tests

* Set default order_by

* dimension_label

* preloaded_goals in tests

* inline load_goals

* Use Date functions over Timex

* Comments

* is_binary

* Remove special form used in tests

* Fix defmodule

* WIP: Fix memory leak, event:page breakdown logic

* Enable more tests, fix for group_conversion_rate without explicit visitors metric

* Re-enable a partially commented test

* Re-enable a partially commented test

* Get last test passing

* No imports order_by in apiv2

* Add a TODO

* Remove redundant Util call

* Update aggregate.ex

* Remove problematic test
2024-06-28 08:59:54 +03:00
Karl-Aksel Puulmann 58a66a952c
APIv2 - initial PR (#4216)
* WIP new querying

* WIP: Move some aggregate code under new command

* WIP: Add joins, handling less metrics

* join events table to sessions if needed

* Merge imported results with built query

* Remove dead code

* WIP: /api/v2/query

* Allow grouping by time

* Use JOIN for main query

* Build query result

* update parse_time

* Make joinless order by work

* First test

* more breakdown tests

* Serialize event:goal filters in an json-encodable way/reflection

* Handle inner vs outer ORDER BY clauses properly

* Handle single conversion_rate metric

* Update more tests

* Get parsing tests passing again

* Validate filtered goal filter is configured

* Enable more validation tests

* Enable more event:name breakdown tests

* Enable more breakdown tests

* Validate site has access to custom props

* Validate conversion_rate metric which is only allowed in some situations

* Validate that empty event:props: is not valid

* handle query.dimensions properly in table_decider

* test more validations on metrics/dimensions

* Validate session metrics in combination with event dimension(s)

* Tests cleanup

* Parse include.imports

* Get imports working with new querying

* Make more imports tests work

* Make event:props:path imports-adjacent test work

* Get query imports warning-related tests running

* Remove dead pagination tests

* Solve dead import

* Solve some warnings

* Update aggregate metrics tests

* credo

* Improve test naming

* Lazy goal loading

* Use datetime methods

* Ecto -> SQL module name

* Remove Expression.dimension mode option
2024-06-25 09:27:19 +03:00