* add module name to service_error when check times out
Otherwise, the diagnostics can leave it unclear whether it was
InstallationV2 or InstallationV2CacheBust that timed out.
* Remove duplicate timeout logic
The current production logs show two types of verification timeouts:
* service_error: "Unhandled Browserless response status: 408" (vast
majority of cases)
* service_error: :timeout (only a few cases)
The latter happens when we hit the Req receive_timeout
(endpoint_timeout + 2s). I've seen Browserless not respect the timeout
param from time to time, so it's better to keep the timeout logic
"in-house" only.
* make service_error into a map with code and extra
* interpret temporary service errors
...but still consider them "unhandled" for telemetry, also notifying Sentry
and logging the warning.
* separate sentry messages (verification)
* make Verification.ChecksTest more DRY
* organize tests into describe blocks
* test verification telemetry and logging
* fix codespell
* get rid of legacy verification
* rename Checks.InstallationV2 -> Checks.VerifyInstallation
* delete Live.Installation and rename Live.InstallationV2 -> Live.Installation
* rename installationv2 (live) files as well
* delete old change-domain routes
Also rename current liveview modules and routes, removing the v2 suffix
* rename domain_change_v2 files, removing v2 suffix
* remove legacy JS verifier code
Also fix dockerignore and elixir.yml referencing a wrong priv path
* rename verification_v2_test -> verification_test
* remove v2 prefix from logs and sentry messages
* clean up duplicate external_sites_controller_test.exs tests
* remove flag
* fix typespec
* pass timeout as query param to Browserless too
* Fixup external sites controller test module (#5826)
* fix test description
* clean up detection sentry events + tests
* improve naming
---------
Co-authored-by: Artur Pata <artur.pata@gmail.com>
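
The timeout-related items above (module name in service_error, timeout logic kept "in-house", service_error as a map with code and extra) could be sketched roughly as follows. This is only an illustration under assumptions: the module name `CheckTimeoutSketch`, the `:check` and `:timeout_ms` keys inside `extra`, and the list of temporary codes are hypothetical and not taken from the codebase.

```elixir
defmodule CheckTimeoutSketch do
  @moduledoc false

  # Hypothetical: which error codes count as temporary is an assumption.
  @temporary_codes [:timeout]

  # Runs `fun` under a deadline enforced by the caller, instead of relying
  # on the remote service to honour its own timeout parameter.
  # A crash inside `fun` is not handled here: the task is linked, so it
  # propagates to the caller.
  def run(check_module, timeout_ms, fun) do
    task = Task.async(fun)

    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} ->
        {:ok, result}

      nil ->
        # The error carries the check module, so diagnostics show which
        # check timed out.
        {:error,
         %{code: :timeout, extra: %{check: check_module, timeout_ms: timeout_ms}}}
    end
  end

  # Temporary errors can be presented differently while still being
  # reported as unhandled elsewhere.
  def temporary?(%{code: code}), do: code in @temporary_codes
end
```

For example, `CheckTimeoutSketch.run(SlowCheck, 50, fn -> Process.sleep(500) end)` gives an `{:error, %{code: :timeout, ...}}` tuple whose `extra` map names `SlowCheck`.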
* Ensure `conn` from `Plug.Conn.read_body` is always passed down the pipeline
* Alter persistor related histogram metrics for better view of timings
* Update typespec
* Implement conversion of finch telemetry events to persistor specific ones
* Implement metrics and remove unused telemetry
* Adjust buckets
* Adjust buckets again and use milliseconds for unit uniformly
* detection handled/unhandled telemetry
* telemetry for verification too
* move sentry call next to telemetry event
* fix ce compile warning
* fix case clause
* remove implicit nil
* telemetry_event functions without argument
* move lib/plausible/installation_support/ -> /extra/lib/...
* extract Live.AwaitingPageviews exclusively for CE
* VerificationTest to ee only
* fix the rest of the compile/test errors on CE
* fix warning about not using default for optional argument
* move module attr
* Leverage TrackerScriptCache on ee
On ee, TrackerScriptCache only stores valid ids. This is then leveraged
to do no database queries when looking up tracker scripts for
non-existing ids.
For smoother onboarding purposes, refresh frequency for the script is also
reduced.
Note that the cache layout is not optimal (storing 'true' booleans) but
being more optimal would require changing the underlying cache
implementation significantly.
I tested out the cache - with 1M tracker script configs, it seems to be
~12MB in size.
* Wait on cache
* Add telemetry
* Remove cleverness in trying to reuse code
* new verifier script with tests + telemetry
* dataDomainMismatch tests
* more tests for callbackStatus and plausibleInstalled
* create priv/verifier subfolder + fix Elixir CI
* bump CI cache version
* organize verifier tests
* Remove accidentally committed verifier
* Rework compilation: Make it a variant, always return new verifier code in tests
* Make priv/tracker/verifier/ exist
* Handle static checks with grace
* Fix paths
* Fix paths
* Add some tests
* Add one more test
* split up the JS
* proxyLikely + code structure refactor + unit tests
* fix telemetry fields
* move most telemetry to logs
* run verifier tests only on chromium
* detect wordpressPlugin and wordpressLikely
* detect GTM
* rename JS checks
* detect cookiebot
* include new fields in logs
* different logs for browserless request vs js failures
* detect manual extension
* detect unknown attrs + fix logging
* stick to Elixir checks for snippet detection
* fix codespell
* fix IO.inspect
* remove unnecessary fields from test mock
* cookiebot doc
* move test into verifier subfolder
* do not duplicate ts types
* comma -> semicolon in log
* test dynamically loaded snippet
* improve logging on Browserless error
---------
Co-authored-by: Karl-Aksel Puulmann <oxymaccy@gmail.com>
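
The TrackerScriptCache note above (only valid ids are stored, as `true`, so lookups for non-existing ids need no database query) can be illustrated with a small ETS-based sketch. It is a stand-in under assumptions, not the actual cache implementation: `ValidIdCacheSketch`, `new/1`, `fetch/3` and the `load_fun` callback are hypothetical names.

```elixir
defmodule ValidIdCacheSketch do
  @moduledoc false

  # Builds a table holding {id, true} for every known-valid id.
  def new(valid_ids) do
    table = :ets.new(:valid_ids, [:set, :public, read_concurrency: true])
    :ets.insert(table, Enum.map(valid_ids, &{&1, true}))
    table
  end

  # `load_fun` stands in for the database query. It is only called when
  # the id is present in the cache, so unknown ids never reach the database.
  def fetch(table, id, load_fun) do
    case :ets.lookup(table, id) do
      [{^id, true}] -> load_fun.(id)
      [] -> :not_found
    end
  end
end
```

Storing a bare `true` per id keeps the entries small, which matches the trade-off described in the note: the layout is not optimal, but it avoids changing the underlying cache.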
* Revert "Temporarily disable ingest metrics (#5369)"
This reverts commit b96e96a7f6.
* Add :tools to MIX_ENV=dev
* Stop tracking caches hit ratio in favour of raw counters
* Create a regression demonstration test for race condition
* Use `ConCache.isolated/1` to force sequential processing of session events
* Revise comment in regression test
* Put lock call behind cache adapter API
* Add more explicit handling of failing lock
NOTE: Apparent double execution of lock function needs to be investigated.
* Improve slow lock cases tests
* Reduce number of session cache locks and instrument them w/ telemetry
* Format
---------
Co-authored-by: Adam Rutkowski <hq@mtod.org>
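
The race condition addressed above is a read-modify-write on a shared cache entry. The sketch below shows the general idea of forcing such updates for one key to run sequentially. It does not use ConCache: the lock here is a plain Agent chosen by hashing the key, and `IsolatedSketch`, `start_locks/1`, `isolated/3` and `bump/3` are hypothetical names introduced only for illustration.

```elixir
defmodule IsolatedSketch do
  @moduledoc false

  # Starts `n` lock processes. Each one runs submitted functions one at a time.
  def start_locks(n) do
    1..n
    |> Enum.map(fn _ ->
      {:ok, pid} = Agent.start_link(fn -> nil end)
      pid
    end)
    |> List.to_tuple()
  end

  # Runs `fun` inside the lock process selected by the key, so two calls
  # for the same key never overlap.
  def isolated(locks, key, fun) do
    lock = elem(locks, :erlang.phash2(key, tuple_size(locks)))
    Agent.get(lock, fn _state -> fun.() end)
  end

  # Read-modify-write that would lose updates without isolation.
  # The table must be :public, since `fun` runs in the lock process.
  def bump(locks, table, key) do
    isolated(locks, key, fn ->
      count =
        case :ets.lookup(table, key) do
          [{^key, n}] -> n
          [] -> 0
        end

      # Widen the race window on purpose.
      Process.sleep(1)
      :ets.insert(table, {key, count + 1})
      count + 1
    end)
  end
end
```

Hashing keys onto a fixed set of lock processes means unrelated keys may occasionally share a lock, in exchange for not needing one process per key.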
* Reapply "Replace caching engine (#3878)" (#3883)
This reverts commit c5881cdc6d.
* Ensure hit rate is tracked on `get_or_store`
* Remove :wx and :observer
* Remove unused deps
* Use `:set` table type
* Dependencies: swap Cachex for ConCache
* Implement Cache adapter wrapping ConCache
* Implement cache stats tracker, for metrics
* Use Cache.Adapter in Plausible.Cache
Marking the test as not slow anymore
* Use Cache Adapter when tracking sessions
* Use Cache Adapter for UA parsing
* Rename child identifiers - cachex is obsolete now
* Test stats tracking
* Update grafana metrics
* Put all caches under common child specification
* Try less
* Shorten the function delegation path
* Update Sites.Cache
So it's now capable of refreshing most recent sites.
Refreshing a single site is no longer wanted.
* Introduce Warmer.RecentlyUpdated
This is Sites Cache warmer that runs only for
most recently updated sites every 30s.
* Validate Request creation early
* Rename RateLimiter to GateKeeper and introduce detailed policies
* Update events API tests - a provisioned site is now required
* Update events ingestion tests
* Make limits visible in CRM Sites index
* Hard-deprecate DOMAIN_BLACKLIST
* Remove unnecessary clause
* Fix typo
* Explicitly delegate Warmer.All
* GateKeeper.allowance => GateKeeper.check
* Instrument Sites.Cache measurements
* Update send_pageview task to output response headers
* Instrument ingestion pipeline
* Credo
* Make event telemetry test a sync case
* Simplify Request.uri/hostname handling
* Use embedded schema, apply action and rely on get_field
* Implement FF-driven DB lookup for sites during ingestion
We'd like to see the impact of doing a simple postgres lookup on each
ingestion event. The percentage-based feature flag `:ingestion_pg_lookup`
must be set in order for lookups to be executed.
* Fix resolving Cachex stats metrics
* Enable PromEx on dev env
* Add Custom telemetry for Plausible.Event.WriteBuffer, Plausible.Event.WriteBuffer and Cachex
Signed-off-by: Manu S Ajith <neo@codingarena.in>
* Rename telemetry.ex to avoid confusion with Phoenix Telemetry supervisor
Signed-off-by: Manu S Ajith <neo@codingarena.in>
* Remove duplicate event
Signed-off-by: Manu S Ajith <neo@codingarena.in>
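
The percentage-based `:ingestion_pg_lookup` flag mentioned above gates an extra lookup for a share of traffic. A minimal sketch of one way such a gate can work is shown below. It is an assumption that the share is chosen per key by hashing; the real flag may be evaluated differently. `PercentageFlagSketch`, `enabled?/2` and `maybe_lookup/3` are hypothetical names.

```elixir
defmodule PercentageFlagSketch do
  @moduledoc false

  # Deterministic gate: the same key always gets the same decision, and
  # roughly `percent` out of 100 keys are enabled.
  def enabled?(key, percent) when percent in 0..100 do
    :erlang.phash2(key, 100) < percent
  end

  # `lookup_fun` stands in for the database query being measured.
  def maybe_lookup(domain, percent, lookup_fun) do
    if enabled?(domain, percent) do
      {:looked_up, lookup_fun.(domain)}
    else
      :skipped
    end
  end
end
```

With `percent` at 0 the lookup never runs and at 100 it always runs, so the impact can be ramped up gradually.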