* Bump deps
* Bump stack
* Fix deprecation warnings
* Fix VCR cassettes mismatch due to OTP-18414
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Format & fix flaky tests
* Handle raw IPv4 hostnames; test public suffix TLD
* Configure locus db cache_dir
So that maxmind unavailability doesn't affect
application startup. PERSISTENT_CACHE_DIR env var is used
to point locus at the GeoIP DB file.
* WIP: Remove ExVCR
* Fix test env config
* Fixup exvcr
* Remove exvcr from deps
* Add convert script
* Remove exvcr cassettes
* Remove convert script
* Rename test
* Update moduledoc
* Update dockerfile
* Bump CI cache
* Tag more slow tests, why not?
* Use charlist for locus cache option
* Pin nodejs
* Merge google tests, make them async
---------
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Pass cached site struct down the ingestion pipeline
Revenue goals need the cached site struct during ingestion to get the
goals name and currency. This cache lookup is not necessary as
`GateKeeper.check/1`, which is called first in the ingestion pipeline,
could already return the site struct from the cache.
This commit changes `GateKeeper.check/1` to return the site struct
instead of the site ID. Moreover, this commit changes the ingestion
pipeline to avoid calling the sites cache twice.
Related: https://github.com/plausible/analytics/pull/2957#discussion_r1203921549
* Remove revenue_goals unnecessary fallback
* Change duplicate child_id in cache test
* Remove revenue goal condition from cache query
* Remove Plausible.DataCase.reload/1
* Add revenue fields to ClickHouse events
This commit adds 4 fields to the ClickHouse events_v2 table:
* `revenue_source_amount` and `revenue_source_currency` store revenue in
the original currency sent during ingestion
* `revenue_reporting_amount` and `revenue_reporting_currency` store
revenue in a common currency to perform calculations, and this
currency is defined by the user when setting up the goal
The type of amount fields is `Nullable(Decimal64(3))`. That covers all
fiat currencies and allows us to store huge amounts. Even though
ClickHouse does not suggest using `Nullable`, this is a good use case,
because otherwise additional work would have to be done to
differentiate missing values from real zeroes.
I ran a benchmark with the data pattern we expect in production, where
we have more missing values than real decimals. I created 100 million
records where 90% of decimals are missing. The difference between the
tables in storage is just 0.4Mb.
* Add revenue parameter to Events API
This commit adds support for sending revenue data in ingestion using the
`revenue` parameter - aliased to `$`.
* Add revenue parameter to mix send_pageview
* Add average and total revenue to breakdown queries
* Remove ClickhouseSetup module
This has been an implicit point of contact to many
tests. From now on the goal is for each test to maintain
its own, isolated setup so that no accidental clashes
and implicit assumptions are relied upon.
* Implement v2 schema check
An environment variable V2_MIGRATION_DONE acts like
a feature flag, switching plausible from using old events/sessions
schemas to v2 schemas introduced by NumericIDs migration.
* Run both test suites sequentially
While the code for v1 and v2 schemas must be kept still,
we will from now on run tests against both code paths.
Secondary test run will set V2_MIGRATION_DONE=1 variable,
thus making all `Plausible.v2?()` checks return `true'.
* Remove unused function
This is a remnant from the short period when
we would check for existing events before allowing
creating a new site.
* Update test setups/factories with v2 migration check
* Make GateKeeper return site id along with :allow
* Make Billing module check for v2 schema
* Make ingestion aware of v2 schema
* Disable site transfers for when v2 is live
In a separate changeset we will implement simplified
site transfer for when v2 migration is complete.
The new transfer will only rename the site domain in postgres
and keep track of the original site prior to the transfer
so we keep an ingestion grace period until the customers
redeploy their scripting.
* Make Stats base queries aware of v2 schema switch
* Update breakdown with v2 conditionals
* Update pageview local start with v2 check
* Update current visitoris with v2 check
* Update stats controller with v2 checks
* Update external controller with v2 checks
* Update remaining tests with proper fixtures
* Rewrite redundant assignment
* Remove unused alias
* Mute credo, this is not the right time
* Add test_helper prompt
* Fetch priv dir so it works with a release
* Fetch distinct partitions only
* Don't limit inspect output for partitions
* Ensure SQL is printed to IO
* Remove redundant domain fixture
* Use user-agent instead of screen_width to get device type
Co-authored-by: eriknakata <erik.nakata5@gmail.com>
* Fix credo
* Log on unhandled UAInspector device type
* Make 'browser' the default tab in devices report
* Remove device tooltip
* Remove screen_width from ingestion completely
* Remove browserstack harness, run playwright directly
* Select meta key based on OS platform
* Run CI tests in parallel
* Improve device match readability
* Add changelog
---------
Co-authored-by: eriknakata <erik.nakata5@gmail.com>
* Clickhouse migration: add ingest_counters table
* Configure ingest counters per MIX_ENV
* Emit telemetry for ingest events with rich metadata
* Allow building Request.t() with fake now() - for testing purposes
* Use clickhousex branch where session_id is assigned to each connection
* Add helper function for getting site id via cache
* Add Ecto schema for `ingest_counters` table
* Implement metrics buffer
* Implement buffering handler for `Plausible.Ingestion.Event` telemetry
* Implement periodic metrics aggregation
* Update counters docs
* Add toStartOfMinute() to ordering key
* Reset the sync connection state in `after` clause
* Flush counters on app termination
* Use separate Repo with async settings enabled at config level
* Switch to clickhouse_settings repo root config key
* Add AsyncInsertRepo module
This PR replaces geolix with locus to simplify self-hosted setup. locus can auto-update maxmind dbs which are recommended for self-hosters if they want city-level geolocation. locus is also a bit faster.
This PR also uses a test mmdb file from https://github.com/maxmind/MaxMind-DB for e2e geolocation tests without stubs.
* Ignore XX and T1 countries
* Add fallback if country_code=nil
* Lookup city overrides directly in CityOverrides module
* Changelog
* Add empty moduledoc
* Remove redundant comment
* Update Sites.Cache
So it's now capable of refreshing most recent sites.
Refreshing a single site is no longer wanted.
* Introduce Warmer.RecentlyUpdated
This is Sites Cache warmer that runs only for
most recently updated sites every 30s.
* Validate Request creation early
* Rename RateLimiter to GateKeeper and introduce detailed policies
* Update events API tests - a provisioned site is now required
* Update events ingestion tests
* Make limits visible in CRM Sites index
* Hard-deprecate DOMAIN_BLACKLIST
* Remove unnecessary clause
* Fix typo
* Explicitly delegate Warmer.All
* GateKeeper.allwoance => GateKeeper.check
* Instrument Sites.Cache measurments
* Update send_pageview task to output response headers
* Instrument ingestion pipeline
* Credo
* Make event telemetry test a sync case
* Simplify Request.uri/hostname handling
* Use embedded schema, apply action and rely on get_field
* Parse event URL in Plausible.Ingestion.Request
* Parse event domain in Plausible.Ingestion.Request
* Rework ingestion pipeline processing (#2462)
* Rework ingestion pipeline processing
So that Request can have multiple domains and
based on that each event is processed uniformly.
The build_and_buffer/1 function now returns an
accumulator with all the dropped/buffered events
for further inspection.
* Reduce function complexity
* Don't chain struct fields to check for an empty host
* Separate referrer and utm tags
* Fix up `with` clause, credo was right cc @vinibrsl
Co-authored-by: Adam Rutkowski <hq@mtod.org>
* Implement FF-driven DB lookup for sites during ingestion
We like to see the impact of doing a simple postgres lookup on each
ingestion event. The percentage-based feature flag `:ingestion_pg_lookup`
must be set in order for lookups to be executed.
* Fix resolving Cachex stats metrics
* Enable PromEx on dev env
This pull request improves the current OpenTelemetry implementation. Currently only 1% of the spans are sent, due to the high volume of ingestion requests to /api/event. I enabled the 1% sampling to /api/event only, recording 100% of the other traces.
* Replace Ingestion.Request headers with user_agent
* Replace generic Ingestion.Request params with specific fields
* Refactor event building function into small functions
* Move Plausible.Ingestion to Plausible.Ingestion.Event
* Add option to override event fields while building
* Rename Ingestion.Request meta to props
* Replace UTM-specific fields with generic query_params
* Remove Map.from_struct/1 call from ingestion pipeline
* Remove stash options from ingestion