analytics

Commit Graph

Author	SHA1	Message	Date
hq1	117eef000d	Upgrade Erlang/Elixir stack (#3454) * Bump deps * Bump stack * Fix deprecation warnings * Fix VCR cassettes mismatch due to OTP-18414 Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com> * Format & fix flaky tests * Handle raw IPv4 hostnames; test public suffix TLD * Configure locus db cache_dir So that maxmind unavailability doesn't affect application startup. PERSISTENT_CACHE_DIR env var is used to point locus at the GeoIP DB file. * WIP: Remove ExVCR * Fix test env config * Fixup exvcr * Remove exvcr from deps * Add convert script * Remove exvcr cassettes * Remove convert script * Rename test * Update moduledoc * Update dockerfile * Bump CI cache * Tag more slow tests, why not? * Use charlist for locus cache option * Pin nodejs * Merge google tests, make them async --------- Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>	2023-10-24 10:33:48 +02:00
Uku Taht	3f9ca35d58	Actually ignore unknown device type (#3267)	2023-08-16 09:40:21 +02:00
Uku Taht	a6bf951852	Ignore unknown device type (#3266)	2023-08-15 13:37:27 +02:00
Vini Brasil	8834486a19	Pass cached site struct down the ingestion pipeline (#3027) * Pass cached site struct down the ingestion pipeline Revenue goals need the cached site struct during ingestion to get the goal's name and currency. This cache lookup is not necessary as `GateKeeper.check/1`, which is called first in the ingestion pipeline, could already return the site struct from the cache. This commit changes `GateKeeper.check/1` to return the site struct instead of the site ID. Moreover, this commit changes the ingestion pipeline to avoid calling the sites cache twice. Related: https://github.com/plausible/analytics/pull/2957#discussion_r1203921549 * Remove revenue_goals unnecessary fallback * Change duplicate child_id in cache test * Remove revenue goal condition from cache query * Remove Plausible.DataCase.reload/1	2023-06-14 14:39:06 +01:00
Vini Brasil	e4d4f7d954	Revenue tracking: Ingestion and breakdown queries (#2957) * Add revenue fields to ClickHouse events This commit adds 4 fields to the ClickHouse events_v2 table: * `revenue_source_amount` and `revenue_source_currency` store revenue in the original currency sent during ingestion * `revenue_reporting_amount` and `revenue_reporting_currency` store revenue in a common currency to perform calculations, and this currency is defined by the user when setting up the goal The type of amount fields is `Nullable(Decimal64(3))`. That covers all fiat currencies and allows us to store huge amounts. Even though ClickHouse does not suggest using `Nullable`, this is a good use case, because otherwise additional work would have to be done to differentiate missing values from real zeroes. I ran a benchmark with the data pattern we expect in production, where we have more missing values than real decimals. I created 100 million records where 90% of decimals are missing. The difference between the tables in storage is just 0.4Mb. * Add revenue parameter to Events API This commit adds support for sending revenue data in ingestion using the `revenue` parameter - aliased to `$`. * Add revenue parameter to mix send_pageview * Add average and total revenue to breakdown queries	2023-06-12 18:29:17 +01:00
hq1	71ef0bd043	Clean up after V2 migration (#2868) * Clean up after V2 migration This PR removes all the leftovers and alternative code branching after v2 migration. The self-hosted release is being drafted at: https://github.com/plausible/hosting/issues/68 Refs: - https://github.com/plausible/analytics/pull/2865 - https://github.com/plausible/analytics/pull/2825 - https://github.com/plausible/analytics/pull/2780 * !fixup	2023-04-24 12:17:57 +02:00
hq1	d2f2c69387	Conditionally support switching between v1 and v2 clickhouse schemas (#2780) * Remove ClickhouseSetup module This has been an implicit point of contact to many tests. From now on the goal is for each test to maintain its own, isolated setup so that no accidental clashes and implicit assumptions are relied upon. * Implement v2 schema check An environment variable V2_MIGRATION_DONE acts like a feature flag, switching plausible from using old events/sessions schemas to v2 schemas introduced by NumericIDs migration. * Run both test suites sequentially While the code for v1 and v2 schemas must be kept still, we will from now on run tests against both code paths. Secondary test run will set V2_MIGRATION_DONE=1 variable, thus making all `Plausible.v2?()` checks return `true`. * Remove unused function This is a remnant from the short period when we would check for existing events before allowing creating a new site. * Update test setups/factories with v2 migration check * Make GateKeeper return site id along with :allow * Make Billing module check for v2 schema * Make ingestion aware of v2 schema * Disable site transfers for when v2 is live In a separate changeset we will implement simplified site transfer for when v2 migration is complete. The new transfer will only rename the site domain in postgres and keep track of the original site prior to the transfer so we keep an ingestion grace period until the customers redeploy their scripting. * Make Stats base queries aware of v2 schema switch * Update breakdown with v2 conditionals * Update pageview local start with v2 check * Update current visitors with v2 check * Update stats controller with v2 checks * Update external controller with v2 checks * Update remaining tests with proper fixtures * Rewrite redundant assignment * Remove unused alias * Mute credo, this is not the right time * Add test_helper prompt * Fetch priv dir so it works with a release * Fetch distinct partitions only * Don't limit inspect output for partitions * Ensure SQL is printed to IO * Remove redundant domain fixture	2023-03-27 13:52:42 +02:00
Uku Taht	43bf7dd09f	Use user-agent instead of screen_width to get device type (#2711) * Use user-agent instead of screen_width to get device type Co-authored-by: eriknakata <erik.nakata5@gmail.com> * Fix credo * Log on unhandled UAInspector device type * Make 'browser' the default tab in devices report * Remove device tooltip * Remove screen_width from ingestion completely * Remove browserstack harness, run playwright directly * Select meta key based on OS platform * Run CI tests in parallel * Improve device match readability * Add changelog --------- Co-authored-by: eriknakata <erik.nakata5@gmail.com>	2023-03-02 11:04:01 +01:00
Adam Rutkowski	867dad6da7	Implement ingest counters (#2693) * Clickhouse migration: add ingest_counters table * Configure ingest counters per MIX_ENV * Emit telemetry for ingest events with rich metadata * Allow building Request.t() with fake now() - for testing purposes * Use clickhousex branch where session_id is assigned to each connection * Add helper function for getting site id via cache * Add Ecto schema for `ingest_counters` table * Implement metrics buffer * Implement buffering handler for `Plausible.Ingestion.Event` telemetry * Implement periodic metrics aggregation * Update counters docs * Add toStartOfMinute() to ordering key * Reset the sync connection state in `after` clause * Flush counters on app termination * Use separate Repo with async settings enabled at config level * Switch to clickhouse_settings repo root config key * Add AsyncInsertRepo module	2023-02-23 14:34:24 +01:00
ruslandoga	166748dcf2	Replace Geolix with Locus (#2362) This PR replaces geolix with locus to simplify self-hosted setup. locus can auto-update maxmind dbs which are recommended for self-hosters if they want city-level geolocation. locus is also a bit faster. This PR also uses a test mmdb file from https://github.com/maxmind/MaxMind-DB for e2e geolocation tests without stubs.	2023-01-17 12:05:09 -03:00
Uku Taht	1785653b1e	Ignore unknown countries (#2556) * Ignore XX and T1 countries * Add fallback if country_code=nil * Lookup city overrides directly in CityOverrides module * Changelog * Add empty moduledoc * Remove redundant comment	2023-01-03 10:35:23 -03:00
Adam Rutkowski	356575ef78	Gatekeep ingestion pipeline (#2472) * Update Sites.Cache So it's now capable of refreshing most recent sites. Refreshing a single site is no longer wanted. * Introduce Warmer.RecentlyUpdated This is Sites Cache warmer that runs only for most recently updated sites every 30s. * Validate Request creation early * Rename RateLimiter to GateKeeper and introduce detailed policies * Update events API tests - a provisioned site is now required * Update events ingestion tests * Make limits visible in CRM Sites index * Hard-deprecate DOMAIN_BLACKLIST * Remove unnecessary clause * Fix typo * Explicitly delegate Warmer.All * GateKeeper.allwoance => GateKeeper.check * Instrument Sites.Cache measurements * Update send_pageview task to output response headers * Instrument ingestion pipeline * Credo * Make event telemetry test a sync case * Simplify Request.uri/hostname handling * Use embedded schema, apply action and rely on get_field	2022-11-28 15:50:55 +01:00
Vini Brasil	994e7d09de	Parse event URL and domain in Plausible.Ingestion.Request (#2351) * Parse event URL in Plausible.Ingestion.Request * Parse event domain in Plausible.Ingestion.Request * Rework ingestion pipeline processing (#2462) * Rework ingestion pipeline processing So that Request can have multiple domains and based on that each event is processed uniformly. The build_and_buffer/1 function now returns an accumulator with all the dropped/buffered events for further inspection. * Reduce function complexity * Don't chain struct fields to check for an empty host * Separate referrer and utm tags * Fix up `with` clause, credo was right cc @vinibrsl Co-authored-by: Adam Rutkowski <hq@mtod.org>	2022-11-23 14:05:44 +01:00
Adam Rutkowski	9364cebb4b	Fix up Cachex metrics (#2418) * Fix cachex size metrics * Collect Cachex hit ratio for UAs/Sessions * Revert profiling metrics from `101e5a68`	2022-11-03 12:28:02 +01:00
Adam Rutkowski	101e5a68b5	Allow Site DB lookups during ingestion phase (#2408) * Implement FF-driven DB lookup for sites during ingestion We like to see the impact of doing a simple postgres lookup on each ingestion event. The percentage-based feature flag `:ingestion_pg_lookup` must be set in order for lookups to be executed. * Fix resolving Cachex stats metrics * Enable PromEx on dev env	2022-11-01 17:11:50 +02:00
Vinicius Brasil	9220d0034d	OpenTelemetry (OTEL) Implementation (#2317) This pull request improves the current OpenTelemetry implementation. Currently only 1% of the spans are sent, due to the high volume of ingestion requests to /api/event. I enabled the 1% sampling to /api/event only, recording 100% of the other traces.	2022-10-18 12:11:30 -03:00
Vinicius Brasil	a10d44a0d7	Refactor event struct creation function (#2098) * Replace Ingestion.Request headers with user_agent * Replace generic Ingestion.Request params with specific fields * Refactor event building function into small functions * Move Plausible.Ingestion to Plausible.Ingestion.Event * Add option to override event fields while building * Rename Ingestion.Request meta to props * Replace UTM-specific fields with generic query_params * Remove Map.from_struct/1 call from ingestion pipeline * Remove stash options from ingestion	2022-08-16 14:43:10 +03:00

17 Commits