Commit Graph

8 Commits

Author SHA1 Message Date
Karl-Aksel Puulmann d620432227
Channels: Speed up clickhouse calculations (#4789)
* Fix interpolation in data_migration.ex

* Speed up calculating acquisition_channel in clickhouse

The previous `has` queries proved to be problematic and causing a lot of
CPU overhead.

Benchmarked via this query:

```sql
SELECT
  channel,
  count(),
  countIf(acquisition_channel(referrer_source, utm_medium, utm_campaign, utm_source, click_id_param) = channel) AS matches
FROM events_v2
WHERE timestamp > now() - toIntervalHour(48)
GROUP BY channel
ORDER BY count() desc
```

Before this fix:
```
query_duration_ms:                                                57960
DiskReadElapsedMs:                                                374.712
RealTimeMs:                                                       2891200.667
UserTimeMs:                                                       2704024.783
SystemTimeMs:                                                     1693.265
OSCPUWaitMs:                                                      90.253
OSCPUVirtualTimeMs:                                               2705709.58
```

After this fix:
```
query_duration_ms:                                                4367
DiskReadElapsedMs:                                                454.356
RealTimeMs:                                                       213892.207
UserTimeMs:                                                       199363.485
SystemTimeMs:                                                     1479.364
OSCPUWaitMs:                                                      13.739
OSCPUVirtualTimeMs:                                               200837.37
```

Note that the new tables are not tracked in our schema as usual as
they're pretty much temporary tables to create the dictionary without
needing to upload files to clickhouse servers.

* CREATE OR REPLACE table with SELECT
2024-11-11 10:39:51 +00:00
Karl-Aksel Puulmann dbf7a099a3
Acquisition channels: Functions to calculate channels in clickhouse (#4701)
* Expose a few data migration functions, add quiet option to do_run

* Create functions and test acquisition channel logic in clickhouse

Tests were lifted from test/plausible_web/controllers/api/external_controller_test.exs

* Clean up test code a bit

* Property test for acquisition channels

* Handle empty strings properly in reference implementation

* Fix spelling, minor issues

* Revert "Property test for acquisition channels"

This reverts commit 3fa0e0e4eb.

* Only test clickhouse functions

* Solve minor code issue

* update channels logic

* Revert "Only test clickhouse functions"

This reverts commit e12784031a.

* Add more tests

* Add small result assertion

* Make query options explicit in data migrations

* Move multi-query running logic to within datamigration lib

* Unbreak numeric ids migration

* Named params directly to Clickhouse

* Update reference test implementation

---------

Co-authored-by: Uku Taht <uku.taht@gmail.com>
2024-11-06 11:27:02 +00:00
Adrian Gruntkowski 17b12ddaeb
Implement basics of Teams (#4658)
* Extend schemas with new fields and relationships for teams

* Implement listing sites and sites with invitations with teams

* Implement creating invitations with teams

* Implement accepting invites with teams

* Add `Teams.SiteTransfer` schema

* Implement creating ownership transfers

* Implement accepting site transfer between teams

* Make results shapes from `Teams.Memberships` role functions more consistent

* Remove :team relation from ApiKey schema

* Pass and provision team on subscription creation

* Pass and provision team on enterprise plan creation

* Implement creating site for a team

* Keep team in sync during legacy ownership transfer and invitations

* Resolve conflict in `Teams.get_or_create` without transaction

* Abstract `GracePeriod` manipulation behind `Plausible.Users`

* Put `User.start_trial` behind `Plausible.Users` API

* Sync team fields on user update, if team exists

* Sync cleaning invitations, updating and removing members

* Transfer invitations too

* Implement backfill script

* Allow separate pg repo for backfill script

* Rollback purposefully at the end

* Update backfill script with parallel processing

* Use `IS DISTINCT FROM` when comparing nullable fields

* Handle no teams to backfill case gracefully when reporting

* Parallelize guest memberships backfill

* Remove transaction wrapping and query timeouts

* Make team sync check more granular and fix formatting

* Wrap single team backfill in a transatction for consistent restarts

* Make invitation and site transfer backfills preserve invitation ID

* Update migration repo config for easier dev access

* Backfill teams for users with subscriptions without sites

* Log timestamps

* Put teams sync behind a compile-time flag

* Keep timestamps in sync and fix subscriptions backfill

* Fix formatting

* Make credo happy

* Don't `use Plausible.Migration` to avoid dialyzer complaining

None of the tooling from there is used anywhere and `@repo` can
be defined directly in the migration script.

* Drop SSL workarounds in the backfill script

---------

Co-authored-by: Adam Rutkowski <hq@mtod.org>
2024-10-21 07:35:23 +00:00
Karl-Aksel Puulmann 26d41ddbb9
Mutation to populate event session columns (#3844)
* WIP mutation to populate event session columns

* Remove duplication

* report errors, allow_nondeterministic_updates

* use right columns

* Update existing columns instead of session_* ones

* Make dialyzer happy

* Fix issue with passing pre-existing params in

* Logger -> IO.puts

* Use IngestRepo.config for connection settings

* Make dictionary options configurable

* Move allow_nondeterministic_mutations to within the migration

* Solve credo warning about too deep nesting

* Missed logger call

* Pattern matching in function head
2024-03-08 09:27:24 +02:00
Karl-Aksel Puulmann 8ba2b934b3
Add data migration for moving sessions_v2 table to VersionedCollapsingMergeTree (#3802)
* Add data migration for moving to VersionedCollapsingMergeTree

This has been tested locally and partially on staging. Still requires a bit of work to verify.

Verification query:

```
SELECT main._partition_id, tmp.count, main.count
FROM (
SELECT _partition_id, count() AS count
FROM sessions_v2_tmp_versioned
GROUP BY _partition_id
) AS tmp
FULL OUTER JOIN (
SELECT _partition_id, count() AS count
FROM sessions_v2
GROUP BY _partition_id
) AS main
ON (tmp._partition_id == main._partition_id)
ORDER BY main._partition_id
```

* Add an early exit to migration

* cluster? extract common code
2024-02-22 09:54:39 +02:00
ruslandoga adcce15632
Make self-hosted data migration easier (#2865)
* default to v2

* allow N defaults in data migration prompt and custom messages

* join domains lookup

* remove duplicate test runs from ci (both are v2)
2023-04-21 09:33:57 +02:00
hq1 d2f2c69387
Conditionally support switching between v1 and v2 clickhouse schemas (#2780)
* Remove ClickhouseSetup module

This has been an implicit point of contact to many
tests. From now on the goal is for each test to maintain
its own, isolated setup so that no accidental clashes
and implicit assumptions are relied upon.

* Implement v2 schema check

An environment variable V2_MIGRATION_DONE acts like
a feature flag, switching plausible from using old events/sessions
schemas to v2 schemas introduced by NumericIDs migration.

* Run both test suites sequentially

While the code for v1 and v2 schemas must be kept still,
we will from now on run tests against both code paths.
Secondary test run will set V2_MIGRATION_DONE=1 variable,
thus making all `Plausible.v2?()` checks return `true'.

* Remove unused function

This is a remnant from the short period when
we would check for existing events before allowing
creating a new site.

* Update test setups/factories with v2 migration check

* Make GateKeeper return site id along with :allow

* Make Billing module check for v2 schema

* Make ingestion aware of v2 schema

* Disable site transfers for when v2 is live

In a separate changeset we will implement simplified
site transfer for when v2 migration is complete.
The new transfer will only rename the site domain in postgres
and keep track of the original site prior to the transfer
so we keep an ingestion grace period until the customers
redeploy their scripting.

* Make Stats base queries aware of v2 schema switch

* Update breakdown with v2 conditionals

* Update pageview local start with v2 check

* Update current visitoris with v2 check

* Update stats controller with v2 checks

* Update external controller with v2 checks

* Update remaining tests with proper fixtures

* Rewrite redundant assignment

* Remove unused alias

* Mute credo, this is not the right time

* Add test_helper prompt

* Fetch priv dir so it works with a release

* Fetch distinct partitions only

* Don't limit inspect output for partitions

* Ensure SQL is printed to IO

* Remove redundant domain fixture
2023-03-27 13:52:42 +02:00
Adam 6637751a5e
Implement Numeric IDs migration (#2762)
* Implement Numeric IDs migration

* Fix typo

* Mute credo for now

* Improve configurability and add stop_t

* Adjust to Ch/Chto only

* Fix opts key for dictionary password

* Add regular ecto migration with numeric ids v2 schemas (#2768)

* Add regular ecto migration

* Fix typo

* Update priv/ingest_repo/migrations/20230320094327_create_v2_schemas.exs

Co-authored-by: Vini Brasil <vini@hey.com>

* Implement v2 events/sessions schema modules (#2777)

* Implement v2 events/sessions schema modules

* Clean up session schemas

---------

Co-authored-by: Vini Brasil <vini@hey.com>

* Update moduledocs

---------

Co-authored-by: Vini Brasil <vini@hey.com>
2023-03-23 09:47:41 +01:00