* Fix interpolation in data_migration.ex
* Speed up calculating acquisition_channel in clickhouse
The previous `has` queries proved problematic, causing a lot of
CPU overhead.
Benchmarked via this query:
```sql
SELECT
    channel,
    count(),
    countIf(acquisition_channel(referrer_source, utm_medium, utm_campaign, utm_source, click_id_param) = channel) AS matches
FROM events_v2
WHERE timestamp > now() - toIntervalHour(48)
GROUP BY channel
ORDER BY count() DESC
```
Before this fix:
```
query_duration_ms: 57960
DiskReadElapsedMs: 374.712
RealTimeMs: 2891200.667
UserTimeMs: 2704024.783
SystemTimeMs: 1693.265
OSCPUWaitMs: 90.253
OSCPUVirtualTimeMs: 2705709.58
```
After this fix:
```
query_duration_ms: 4367
DiskReadElapsedMs: 454.356
RealTimeMs: 213892.207
UserTimeMs: 199363.485
SystemTimeMs: 1479.364
OSCPUWaitMs: 13.739
OSCPUVirtualTimeMs: 200837.37
```
Note that the new tables are not tracked in our schema as usual: they are
effectively temporary tables used to create the dictionary without
needing to upload files to the ClickHouse servers.
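For scale, the before/after profiles of the benchmark query can be reduced to ratios. This is a minimal sketch that only does the arithmetic on the figures quoted in this message; the module name `SpeedupSketch` is made up for illustration.

```elixir
defmodule SpeedupSketch do
  # Ratio of a "before" measurement to an "after" measurement,
  # rounded to one decimal place.
  def ratio(before, after_fix), do: Float.round(before / after_fix, 1)
end

# Figures copied from the two profiles quoted in this message.
before = %{query_duration_ms: 57_960, user_time_ms: 2_704_024.783}
after_fix = %{query_duration_ms: 4_367, user_time_ms: 199_363.485}

for key <- [:query_duration_ms, :user_time_ms] do
  IO.puts("#{key}: #{SpeedupSketch.ratio(before[key], after_fix[key])}x lower")
end
```

Both wall-clock duration and user CPU time come out roughly 13x lower.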
* CREATE OR REPLACE table with SELECT
* Expose a few data migration functions, add quiet option to do_run
* Create functions and test acquisition channel logic in clickhouse
Tests were lifted from test/plausible_web/controllers/api/external_controller_test.exs
* Clean up test code a bit
* Property test for acquisition channels
* Handle empty strings properly in reference implementation
* Fix spelling, minor issues
* Revert "Property test for acquisition channels"
This reverts commit 3fa0e0e4eb.
* Only test clickhouse functions
* Solve minor code issue
* update channels logic
* Revert "Only test clickhouse functions"
This reverts commit e12784031a.
* Add more tests
* Add small result assertion
* Make query options explicit in data migrations
* Move multi-query running logic to within datamigration lib
* Unbreak numeric ids migration
* Named params directly to Clickhouse
* Update reference test implementation
---------
Co-authored-by: Uku Taht <uku.taht@gmail.com>
* Extend schemas with new fields and relationships for teams
* Implement listing sites and sites with invitations with teams
* Implement creating invitations with teams
* Implement accepting invites with teams
* Add `Teams.SiteTransfer` schema
* Implement creating ownership transfers
* Implement accepting site transfer between teams
* Make results shapes from `Teams.Memberships` role functions more consistent
* Remove :team relation from ApiKey schema
* Pass and provision team on subscription creation
* Pass and provision team on enterprise plan creation
* Implement creating site for a team
* Keep team in sync during legacy ownership transfer and invitations
* Resolve conflict in `Teams.get_or_create` without transaction
* Abstract `GracePeriod` manipulation behind `Plausible.Users`
* Put `User.start_trial` behind `Plausible.Users` API
* Sync team fields on user update, if team exists
* Sync cleaning invitations, updating and removing members
* Transfer invitations too
* Implement backfill script
* Allow separate pg repo for backfill script
* Rollback purposefully at the end
* Update backfill script with parallel processing
* Use `IS DISTINCT FROM` when comparing nullable fields
* Handle no teams to backfill case gracefully when reporting
* Parallelize guest memberships backfill
* Remove transaction wrapping and query timeouts
* Make team sync check more granular and fix formatting
* Wrap single team backfill in a transaction for consistent restarts
* Make invitation and site transfer backfills preserve invitation ID
* Update migration repo config for easier dev access
* Backfill teams for users with subscriptions without sites
* Log timestamps
* Put teams sync behind a compile-time flag
* Keep timestamps in sync and fix subscriptions backfill
* Fix formatting
* Make credo happy
* Don't `use Plausible.Migration` to avoid dialyzer complaining
None of the tooling from there is used anywhere and `@repo` can
be defined directly in the migration script.
* Drop SSL workarounds in the backfill script
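On the `IS DISTINCT FROM` change above: a plain SQL `<>` yields NULL as soon as either side is NULL, so rows where a nullable field changed from or to NULL are silently skipped by a `WHERE` clause. The sketch below models both comparisons in plain Elixir, with `nil` standing in for NULL. The module and function names are made up for illustration, and the values are hypothetical, since this message does not say which fields were compared.

```elixir
defmodule NullSafeCompareSketch do
  # Models SQL `a <> b`: the result is NULL (nil) when either side is NULL.
  def sql_not_equal(nil, _b), do: nil
  def sql_not_equal(_a, nil), do: nil
  def sql_not_equal(a, b), do: a != b

  # Models SQL `a IS DISTINCT FROM b`: always true or false, with NULL
  # treated as an ordinary comparable value.
  def distinct_from?(nil, nil), do: false
  def distinct_from?(nil, _b), do: true
  def distinct_from?(_a, nil), do: true
  def distinct_from?(a, b), do: a != b
end

NullSafeCompareSketch.sql_not_equal(nil, "2024-01-01")
# => nil (a WHERE clause would drop the row)
NullSafeCompareSketch.distinct_from?(nil, "2024-01-01")
# => true (the row is reported as out of sync)
```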
---------
Co-authored-by: Adam Rutkowski <hq@mtod.org>
* WIP mutation to populate event session columns
* Remove duplication
* report errors, allow_nondeterministic_updates
* use right columns
* Update existing columns instead of session_* ones
* Make dialyzer happy
* Fix issue with passing pre-existing params in
* Logger -> IO.puts
* Use IngestRepo.config for connection settings
* Make dictionary options configurable
* Move allow_nondeterministic_mutations to within the migration
* Solve credo warning about too deep nesting
* Missed logger call
* Pattern matching in function head
* Add data migration for moving to VersionedCollapsingMergeTree
This has been tested locally and partially on staging. Still requires a bit of work to verify.
Verification query:
```sql
SELECT main._partition_id, tmp.count, main.count
FROM (
    SELECT _partition_id, count() AS count
    FROM sessions_v2_tmp_versioned
    GROUP BY _partition_id
) AS tmp
FULL OUTER JOIN (
    SELECT _partition_id, count() AS count
    FROM sessions_v2
    GROUP BY _partition_id
) AS main
ON (tmp._partition_id == main._partition_id)
ORDER BY main._partition_id
```
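Reading the result of the verification query comes down to finding partitions whose two counts differ. A minimal sketch, assuming the rows have already been fetched as `{partition_id, tmp_count, main_count}` tuples; the module name and the sample rows are made up. With ClickHouse's default `join_use_nulls = 0`, a partition missing from one side should show up with a count of 0 rather than NULL, so a plain inequality check covers that case too.

```elixir
defmodule PartitionCheckSketch do
  # Rows are {partition_id, tmp_count, main_count} tuples, in the column
  # order of the verification query. Returns the rows whose counts differ.
  def mismatches(rows) do
    Enum.filter(rows, fn {_partition_id, tmp_count, main_count} ->
      tmp_count != main_count
    end)
  end
end

# Made-up sample data: the second partition is missing from the tmp table.
rows = [{"202301", 1_200, 1_200}, {"202302", 0, 950}]
PartitionCheckSketch.mismatches(rows)
# => [{"202302", 0, 950}]
```

An empty list means every partition has the same row count in both tables.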
* Add an early exit to migration
* cluster? extract common code
* default to v2
* allow N defaults in data migration prompt and custom messages
* join domains lookup
* remove duplicate test runs from ci (both are v2)
* Remove ClickhouseSetup module
This has been an implicit point of contact for many
tests. From now on the goal is for each test to maintain
its own isolated setup, so that tests neither clash by accident
nor rely on implicit assumptions.
* Implement v2 schema check
The V2_MIGRATION_DONE environment variable acts as
a feature flag, switching Plausible from the old events/sessions
schemas to the v2 schemas introduced by the NumericIDs migration.
* Run both test suites sequentially
While the code for both the v1 and v2 schemas must still be kept,
we will from now on run tests against both code paths.
The secondary test run sets the V2_MIGRATION_DONE=1 variable,
making all `Plausible.v2?()` checks return `true`.
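The flag check can be pictured roughly as follows. This is a hypothetical stand-in, not the actual `Plausible.v2?()` implementation: the module name, the exact `"1"` comparison, and the `events` name for the old table are all assumptions made for illustration.

```elixir
defmodule V2FlagSketch do
  # Hypothetical stand-in for `Plausible.v2?()`: the flag counts as on
  # when V2_MIGRATION_DONE is set to "1" (an assumed convention).
  def v2? do
    System.get_env("V2_MIGRATION_DONE") == "1"
  end

  # Call sites branch on the flag to pick the schema to query.
  # "events" as the old table name is an assumption.
  def events_table do
    if v2?(), do: "events_v2", else: "events"
  end
end
```

Because the sketch reads the variable on every call, a second test run only needs to export V2_MIGRATION_DONE=1 to exercise the other code path.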
* Remove unused function
This is a remnant from the short period when
we would check for existing events before allowing
creating a new site.
* Update test setups/factories with v2 migration check
* Make GateKeeper return site id along with :allow
* Make Billing module check for v2 schema
* Make ingestion aware of v2 schema
* Disable site transfers for when v2 is live
In a separate changeset we will implement simplified
site transfer for when v2 migration is complete.
The new transfer will only rename the site domain in postgres
and keep track of the original site prior to the transfer
so we keep an ingestion grace period until the customers
redeploy their scripting.
* Make Stats base queries aware of v2 schema switch
* Update breakdown with v2 conditionals
* Update pageview local start with v2 check
* Update current visitors with v2 check
* Update stats controller with v2 checks
* Update external controller with v2 checks
* Update remaining tests with proper fixtures
* Rewrite redundant assignment
* Remove unused alias
* Mute credo, this is not the right time
* Add test_helper prompt
* Fetch priv dir so it works with a release
* Fetch distinct partitions only
* Don't limit inspect output for partitions
* Ensure SQL is printed to IO
* Remove redundant domain fixture