* FULL join for time:hour as well as time:minute
Follow-up to https://github.com/plausible/analytics/pull/5694/files#r2321567271
A session might be active over multiple hours, but not (currently)
reported as such when requesting only specific metrics per hour.
This fixes that problem.
* Handle full join logic correctly
---------
Co-authored-by: Uku Taht <Uku.taht@gmail.com>
* Replace usages of `Timex.to_unix` with native API
* Wrap call to `Timex.is_valid_timezone?`
* Wrap calls to `Timex.today(tz)`
* Replace `Timex.today()` with `Date.utc_today()`
* Replace `Timex.now()` with `DateTime.utc_now()`
* Replace `Timex.compare` with `Date.compare`
* Wrap `Timex.diff` calls
* Replace `Timex.Timezone.convert` with `DateTime.shift_zone!`
* Wrap `Timex.parse!`
* Replace `Timex.to_date` with native API calls
* Replace `Timex.beginning|end_of...` with native API calls for Date
* Wrap `Timex.beginning|end_of...` for DateTimes and Dates for years
* Replace `Timex.format(!)` with native API calls
* Replace `Timex.to_naive_datetime` with native API calls
* Wrap time humanizing routines using Timex
* Remove unnecessary `use Timex` instances
* Replace `Timex.shift` with native API calls
* Make `QueryParser.parse_date` handle gaps and ambiguities gracefully
* Replace `Timex.now(tz)` with `DateTime.now!(tz)`
* Use a more suitable Date function for comparison (h/t @aerosol)
* Refactor table_decider#partition_metrics
* Refactor query pipeline to return a list of subqueries after splitting
* Move order_by out of join logic
* Refactor joining logic in query_builder
1. JOIN type is now set in QueryOptimizer
2. JOIN logic is now table and list-size agnostic
* Comment an edge case
* Rebuild session/visit smearing
Previously, whenever graphing any visit metric hourly/realtime, visit_duration and other
visit metrics would be way higher than expected, due to long sessions
dragging each bucket up and up. Now visits/visitors metrics are still
smeared and other visit metrics are counted under last bucket user was
active in.
visits metric was also overcounted (see new tests).
* Remove unneeded case
* Unit test for smearing in tabledecider
* Revert "Revert "Trim `month`, `year`, `day` periods to local now on main graph (#5668)" (#5684)"
This reverts commit 2d11681f25.
* Does not trim for comparisons
* Include the current hour in the trimmed time range
* Trim `month`, `year`, `day` periods to now on main graph
* Revert "Trim `month`, `year`, `day` periods to now on main graph"
This reverts commit 4f3930111d3a2737a51686e067d9b64f0d85ad58.
* Re-implement trimming in query optimizer instead
* Update JS types
* This is getting confusing
* Trim in stats_controller
* Set `query.now` based on query_praser and date results
* query.period -> query.input_date_range
* Changelog
* Test for response query.date_range
---------
Co-authored-by: Karl-Aksel Puulmann <oxymaccy@gmail.com>
* CSV import/export support for time-on-page
Note only the new time-on-page metric is exported this way
* visibility check for graphing of time_on_page
* FE no longer receives/sends legacy_time_on_page_cutoff
* Remove current_user from exports
* Remove legacy_time_on_page_cutoff from query.include, make behavior work off of site.legacy_time_on_page_cutoff explicitly
* Remove dead function
* More current_user_id removals
* Default to time_on_page
* Add new columns to schema
* Read from new column in legacy query
* Read/write new imported_pages columns
* Remove time_on_page column from imported_pages
* Simple, stupid new_time_on_page metric
* Update csv_importer schema
* Refactor: consistent __internal helpers, this will help with joining the query
* Refactor select_joined_metrics
* Refactor: pass `query` to event_metric
* Refactor: remove needless site argument from various calls
* Legacy joining query attempt
* Move test around
* Add more tests for both legacy and new time_on_page metrics in query API
* time_on_page reported in seconds
* timeseries test for metric
* WIP
* Wrap main query in subquery - without this run into trouble performing the join
* Calculate time_on_page in main query, no more new_time_on_page
* Add some TODOs
* Return NULL over 0 when no visits with time-on-page data
* Update moduledoc
* Update some tests that were not expecting integers
* Add a TODO
* Update tests
* Make graphing time series with combined metrics work.
* Slightly more consistent approach to flag updating in APIv2
* Seeds with engagement data
* Make graphing time series when cutoff is in the middle work
Bakes less assumptions into everything as well.
* Rename to legacy_time_on_page_cutoff
* Fixup lib/plausible_web/controllers/api/external_query_api_controller.ex
* Remove a todo and dead/misleading code
* Remove a resolved todo
* Remove needless rounding
* gen types
* Update pages test
* Remove needless columns from select
* Update tests: timestamps and remove comment
* Flip branches
* Remove filter_key terminology from the backend
This resurfaced in a recent review, `dimension` or `filter_dimension` is the correct terminology in the backend
* Update table_decider
* Solve new issues
* Improve report performance in cases where site has a lot of unique pathnames
Ref: https://3.basecamp.com/5308029/buckets/39750953/card_tables/cards/8052057081
JOINs in ClickHouse are slow. In one degenerate case I found a user had
over 20 million unique paths in an import, which resulted in extremely slow
JOINs. This introduces a sort-of hacky solution to it by limiting the
amount of data analyzed.
Query timing without this change:
```
9 rows in set. Elapsed: 11.383 sec. Processed 49.16 million rows, 5.75 GB (4.32 million rows/s., 505.29 MB/s.)
Peak memory usage: 14.75 GiB.
```
After:
```
9 rows in set. Elapsed: 0.572 sec. Processed 49.18 million rows, 5.75 GB (86.03 million rows/s., 10.06 GB/s.)
Peak memory usage: 9.01 GiB.
```
* Splitting should no longer remove pagination. Handle special cases in special_metrics.ex
* select_merge_as in imports
This sets up selected_as aliases which will be used in a subsequent commit
* Add explicit ORDER BY to import
* Rewrite comment
* quoting
* merge conflict
* Split test
* Merge conflict fail fix
* Add filter clauses for each main result filter
This handles the case where main query has a limit and results change. Doesnt handle metrics like percentage.
* Fix percentage calculations by ignoring breakdown-related filters in totals queries
* Refactor comparisons test suite
* Move comparisons logic to comparisons module
* New route for internal query tests
Only to be used in testing
* Support comparison queries with imports/breakdowns
* time dimension predicate extraction
* Clean up a test
* Update docstring
* Update route test
* fix a typo
* WIP: Start refactoring revenue metrics
* Hacks to make things work
* Remove old revenue code, remove revenue metrics if needed
* Update query_optimizer docs
* Minor fixes
* Add tests around average/total revenue when non-revenue goal filtering going on
* Optimize, calculate filters as expected (OR-ing clauses)
* Revenue: Handle cases where revenue metrics should not be returned or nil
* Expose revenue metrics in internal schema, add tests
* Docstring
* Remove TODO
* Typegen
* Solve warnings
* Remove nesting
* ce_test fix
* Tag tests as ee_only
* Fix: When filtering by revenue goal and no conversions, return 0.0 instead of nil
* More straight-forward preloading logic
* Refactor comparisons to a new options format
Prerequisite for APIv2 comparison work
* Experiment with default include deduplication
* WIP
Oops, breaks `include.total_rows`
* WIP
* Refactor breakdown.ex
* Pagination fix: dont paginate split subqueries
* Timeseries tests pass
* Aggregate tests use QueryExecutor
* Simplify QueryExecutor
* Handle legacy time-on-page metric in query_executor.ex
No behavioral changes
* Remove keep_requested_metrics
* Clean up imports
* Refactor aggregate.ex to be more straight-forward in output format building
* top stats: compute comparison via apiv2
* Minor cleanups
* WIP: Pipelines
* WIP: refactor for code cleanliness
* QueryExecutor to QueryRunner
* Make compilable
* Comparisons for timeseries works
Except for comparisons where comparison window is bigger than source query window
* Add special case for timeseries
* JSON schema tests for comparisons
* Test comparisons with the new API
* comparison date range parsing improvement
* Make comparisons api internal-only
* typegen
* credo
* Different schemata
* get_comparison_query
* Add comment on timeseries result format
* comparisons typegen
* Percent change for revenue metrics fix
* Use defstruct for query_runner over map
* Remove preloading atoms
* query.date_range is now in UTC instead of user timezone
This simplifies things down the line and fixes several bugs where
query.date_range is cast to naivedatetime for ecto purposes
Many places still remain broken:
- comparison queries
- `to_date_range` calls
* Make default_for_date_range not care about time zones
* Make timezone parameter mandatory for to_date_range
* Simplify utc_date_range, update legacy query builder
* Fix more cases where query date range is needed
* query.date_range -> query.utc_time_range
* Query.date_range/1 function
* ensure_include_imported update
* Clean up send_email_report
* add realtime date_ranges into the private API schema
This commit starts parsing date ranges into a new NaiveDateTimeRange
struct, rather than a simple Date.Range.
* transform realtime labels into negative integers + test
* move schema type argument to last position in helper functions
* allow passing a date param + tests
* Update test/plausible/stats/query_parser_test.exs
Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
* Update test/plausible/stats/query_parser_test.exs
Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
* Update test/plausible/stats/query_parser_test.exs
Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
* Update test/plausible/stats/query_parser_test.exs
Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
* keep test file structure consistent
* Turn NaiveDateTimeRange into DateTimeRange
* change 'now' field from NaiveDateTime to DateTime in v2 query
* fix minute interval labels + add missing tests
* return query_result.date_range as iso8601 timestamps with timezone
* allow timestamps with tz as date_range arguments in API v2
* delete Plausible.Timezones.to_utc_datetime
* simplify returning comparison periods
* add comment about realtime not supported in comparisons
* pass only now instead of test_opts
* drop redundant else branch
* separate tests
* stick to a single check_date_range function in tests
* fix credo error
---------
Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
* Move fragments module under Plausible.Stats.SQL
* Introduce select_merge_as macro
This simplifies some select_merge calls
* Simplify select_join_fields
* Remove a needless dynamic
* wrap_select_columns macro
* Move metrics from base.ex to expression.ex
* Move WhereBuilder under Plausible.Stats.SQL
* Moduledoc
* Improved macros
* Wrap more code
* select_merge_as more
* Move defp to the end
* include.time_labels parsing
* include.time_labels in result
Note that the previous implementation of the labels from TimeSeries.ex was broken
* Apply consistent function in imports and timeseries.ex
* Remove boilerplate
* WIP: Limited support for timeseries-with-querybuilder
* time:week dimension
* cleanup: property -> dimension
* Make querying with time series work
* Refactor: Move special metrics (percentage, conversion rate) to own module
* Explicitly format datetimes
* Consistent include_imported in special metrics
* Solve week-related crash
* conversion_rate hacking
* Keep include_imported consistent after splitting the query
* Simplify do_decide_tables
* Handle time dimensions in imports cleaner
* Allow time dimensions in custom property queries
* time:week handling continued
* cast_revenue_metrics_to_money
* fix `full_intervals` support
* Handle minute/realtime graphs
* experimental_session_count? with timeseries
This becomes required as we try to include visits from sessions by default
* Support hourly data in imports
* Update bounce_rate in more csv tests
* Update some time-series query tests
* Fix for meta.warning being included incorrectly
* Simplify imported.ex
* experimental_session_count flag removal
* moduledoc
* Split interval and time modules
* Revert "Revert "APIv2: Replace breakdown module with QueryBuilder (#4283)" (#4292)"
This reverts commit ef5e0e0382.
* Allow querying events and pageviews from sessions table
This is not strictly accurate, especially with shorter time frames, but
is useful for a fallback mechanism. I'll figure out something around
shorter time frames in the future.
See also: https://github.com/plausible/analytics/pull/4292
* Only query events and pageviews in legacy breakdowns
* WIP: Breakdown using QueryBuilder
* Revert "Remove problematic test"
This reverts commit b442bb5d1f.
* Get more breakdown tests passing
* Preload goals, sort when dealing with time_on_page
* Handle conversion_rate in breakdowns
* Simplify ordering by using selected_as consistently for dimensions
* Get breakdown tests passing
* Strings to atoms in keys for StatsController.transform_keys calls to work
* Handle revenue metrics removal
* Add test for nil-removal case
* Include percentage metric
* Fix and test with imported locations
* Fixup time-on-page
* Fix country/region automatic filters
* Handle multiple imports (os/browser version) in importsv2
* Filter goals
* Default to ordering by page as well
* Calculate conversion rate on sessions if needed
* Order by event dimensions - handles event:page special case
* Update tests
* Update more tests, handle goal=0 case in imports
* Handle event:goal breakdowns correctly with filters
* Revenue to money
* Improved table deciding
* Also update event:page filters on event:page breakdown
* bounce_rate to 0
Previous behavior relied on two queries being made - new query leads to 0 naturally
* Update pagination test
* dont count non-pageviews as path goal completions
* Make revenue logic breakdown-specific
Its hard to fit into the new schema and likely needs a rethink for apiv2
* Retain previous behavior for TimeSeries module
* Get GA4 test passing
Most failures are related to ordering, pageviews shouldnt be read off of sessions
* Clean up old methods
* Simplify imported.ex
* Dont crash on garbage filters
* Reflect ordering-related change in test
* Fix test data
* Update table_decider
* Re-simplify get_revenue_tracking_currency
* Revert revenue changes
* Use Query.set
* Remove a TODO
* csv importer: no pageviews
Pageviews were incorrectly fetched from sessions table before, causing issues
* csv importer tweaking
* Remove use Plausible
* to_existing_atom
* Add some aggregates tests
* Port aggregates tests to do with filtering
* Session metrics can be queried with event: filters
* Solve a typo
* Update a validation message
* Add validations for views_per_visit
* Port an aggregation/imports test
* Optimize time dimension, add tests
* Add first timeseries test, update parsing tests
* Docs for SQL.Expression
* Test timeseries more
* Allow time explicitly in order_by
* Add multiple breakdowns test
* Refactor QueryOptimizer not to care about time dimension placement in dimensions array
* Add test breaking down by event:hostname
* Add hostname filtering logic to QueryOptimizer, unblock some tests
* WIP: Breakdown by goal
* conversion rate logic for query api
* Update more tests
* Set default order_by
* dimension_label
* preloaded_goals in tests
* inline load_goals
* Use Date functions over Timex
* Comments
* is_binary
* Remove special form used in tests
* Fix defmodule
* WIP: Fix memory leak, event:page breakdown logic
* Enable more tests, fix for group_conversion_rate without explicit visitors metric
* Re-enable a partially commented test
* Re-enable a partially commented test
* Get last test passing
* No imports order_by in apiv2
* Add a TODO
* Remove redundant Util call
* Update aggregate.ex
* Remove problematic test
* WIP new querying
* WIP: Move some aggregate code under new command
* WIP: Add joins, handling less metrics
* join events table to sessions if needed
* Merge imported results with built query
* Remove dead code
* WIP: /api/v2/query
* Allow grouping by time
* Use JOIN for main query
* Build query result
* update parse_time
* Make joinless order by work
* First test
* more breakdown tests
* Serialize event:goal filters in an json-encodable way/reflection
* Handle inner vs outer ORDER BY clauses properly
* Handle single conversion_rate metric
* Update more tests
* Get parsing tests passing again
* Validate filtered goal filter is configured
* Enable more validation tests
* Enable more event:name breakdown tests
* Enable more breakdown tests
* Validate site has access to custom props
* Validate conversion_rate metric which is only allowed in some situations
* Validate that empty event:props: is not valid
* handle query.dimensions properly in table_decider
* test more validations on metrics/dimensions
* Validate session metrics in combination with event dimension(s)
* Tests cleanup
* Parse include.imports
* Get imports working with new querying
* Make more imports tests work
* Make event:props:path imports-adjacent test work
* Get query imports warning-related tests running
* Remove dead pagination tests
* Solve dead import
* Solve some warnings
* Update aggregate metrics tests
* credo
* Improve test naming
* Lazy goal loading
* Use datetime methods
* Ecto -> SQL module name
* Remove Expression.dimension mode option