More than pointing fingers at the bad guys, there is a monumental shift happening where people are trying to de-Google their lives. There is nothing wrong with Google as a service company, but it does track a vast array of personally identifying metrics about its users and uses those metrics to present targeted ads. It’s nothing new.
Many privacy-first web browsers now block Google Analytics scripts outright, along with AdSense and other similar trackers. That leaves webmasters with little choice but to switch, because staying with Google means the tracked data is never 100% accurate.
One way to solve this issue is by using privacy-friendly analytics platforms, and more specifically, platforms that are open-source. Working with an open-source project means you can host it yourself and, in the long run, tailor your platform to meet the requirements of regulations such as GDPR.
Google uses analytics data to show targeted ads
In other words, Google is tracking anything and everything about you. As long as a website has Google Analytics installed – Google can collect and build an extensive profile of who you are, what you like, and what Google thinks you’ll be interested in.
I myself have used GA many times over in the last 15 years.
And here are the reasons why I won’t be doing it anymore:
- The total size of the tracking script is around 45 KB.
- Excessive tracking of details about site users that, in 99% of cases, are not needed.
- You have to use an extensive Privacy Policy to meet the requirements set by GDPR.
- A number of web browsers (Firefox, Brave) directly block Google’s tracking scripts.
- Inconsistent analytics data because of blocked requests.
And, lastly, it is painfully slow. If you want to narrow down a very specific case where someone visited your site, the GA dashboard does not make it easy to find that data. Not because it isn’t there, but because the interface is extremely bloated.
Google Analytics legal problems
In recent news, the Austrian Data Protection Authority has ruled against Google Analytics and its compliance with the GDPR ruleset.
To translate, it is illegal for Google to be storing EU-based user data on US-based servers. At the moment, this ruling is specific to Austria, but there is a possibility that more countries will follow suit.
And there are only two ways to deal with a ruling like this.
The first is to use either a self-hosted solution in the EU or a cloud-based analytics provider that both reroutes and encrypts EU traffic.
The second way is user consent: a cookie and tracking notice. Such a notice should provide extensive details about the information that you’re tracking about the user, and the user has to confirm by accepting that you will track or share their data.
Here’s some additional reading on the topic:
- Austrian DSB: EU-US data transfers to Google Analytics illegal
- Schrems II a summary – all you need to know
- CJEU rules US cloud servers don’t comply with GDPR
In a new announcement, Google has stated that it will be retiring Universal Analytics and moving forward with Google Analytics 4, or GA4 for short. It looks like one of the features introduced in GA4 is going to be the anonymization of IP addresses. Though, as pointed out in this Hacker News thread, the wording cleverly hides the fact that these changes won't help you get around GDPR.
Can you track website analytics without GDPR notice?
The answer is a resounding yes.
As long as you’re not collecting any personal data about the users visiting your site, you can actually completely avoid the need to add a “cookie notice” to your website or blog.
Is it really that important to know if the same person visited your website twice? Cookieless tracking means you aren’t building a profile of your readers. In return, you don’t have to add intrusive widgets or complex privacy policies to explain why you track users’ data.
And this is especially important if you go the self-hosted route. In that case, you won’t be storing any personal data yourself, which leaves far less for you to be held liable for.
Does Open-Source mean it is free?
I think this question will inevitably come up, so I will answer it now.
While Open-Source does mean that the codebase is “free” to use, it doesn’t always translate into that in practical terms. As an example, self-hosting and managing an open-source analytics solution can be quite tedious and time-consuming.
You have to:
- Pay for the server costs, including managing the server. You can use any of the hosting platforms in this article to host them for free, at least for the time being.
- Keep the project updated to the latest version.
- Have some understanding of basic security precautions.
So, in a lot of cases, it’s easier to pay a small subscription fee. I think if the project author(s) are doing exceptional work to provide a privacy-first tracking solution, they deserve a little reward for keeping the project alive. But that’s just my opinion.
Plausible
Plausible is probably the fastest-growing solution on this entire list. Their business model revolves around providing GDPR-compliant analytics solutions that are easy to use, and their pricing reflects the market for small to medium-scale businesses. If you don’t want to pay, it is possible to deploy Plausible as a self-hosted solution using their open-source codebase.
It is released under the AGPLv3 license.
The back-end is built in Elixir, while on the front-end you’ll be using a lightweight (~1 KB) script written in vanilla JavaScript.
As for the User Interface, it is as simple as it is pleasant to look at. You can actually have a real-time overview of their UI by visiting the Plausible Analytics page. It shows actual analytics data for everyone visiting Plausible’s website.
You’ll find data about top pages, referring sources, and bounce rates, but also have the ability to play with filters and create custom analytical reports.
Privacy-first mindset
Plausible’s entire infrastructure resides in the EU, though that is almost beside the point. As a privacy-first company, Plausible is able to track website usage without needing to use cookies or other personal identifiers. It is completely anonymous.
But my favorite thing has to be their approach to building a brand. Not only is their entire motto to protect users’ privacy, but they have also done a lot of work in terms of providing useful information. And I sincerely mean that.
Blog posts, FAQ pages, technical articles – it is very rare to see an indie company take such huge leaps in informing their customers and supporters. Even if this specific solution isn’t what you’re after, you have to give the guys some credit for the dedication!
Matomo
Matomo – formerly Piwik – is the heavy-hitter in this entire list. It is what one might call a complete analytics solution. And, the closest thing to competing with GA while remaining privacy-friendly. You have the option to use Matomo through their cloud platform, or use their stable release and go the self-hosted route.
With the back-end being built on PHP and MySQL – it is quite easy to integrate Matomo in traditional setups, including WordPress. On top of that, Matomo is able to collect and report on some really interesting stats and use-cases. For example:
- Check which search engines send you the most traffic, but also for which specific keywords.
- Create detailed User Flow graphs with unlimited steps.
- Track product purchases and how they evolve over time.
As for staying GDPR compliant, Matomo provides all the necessary tools to ensure complete transparency. You can anonymize practically all data points about users coming to your site.
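To make the idea of anonymizing a data point concrete, here is a minimal Python sketch of one common approach: masking the host part of a visitor's IP address before it is stored. This is not Matomo's code. The function name `anonymize_ip` and the prefix lengths (24 bits for IPv4, 48 bits for IPv6) are assumptions chosen for illustration, and how much Matomo actually masks depends on its own privacy settings.

```python
import ipaddress


def anonymize_ip(ip: str) -> str:
    """Zero the host part of an address so it no longer points at one visitor."""
    address = ipaddress.ip_address(ip)
    # Assumed prefix lengths: keep 24 bits of IPv4 and 48 bits of IPv6.
    prefix = 24 if address.version == 4 else 48
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)
```

Called with `"203.0.113.42"`, the function keeps only the network part of the address, so every visitor from the same /24 block looks identical in the stored data.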
You can use a custom-built GDPR Manager to ensure that you’re always in compliance with certain rules and regulations. This is mostly relevant based on the features that you decide to use, and what kind of data you’d like to analyze.
PostHog
PostHog specializes in product analytics and specifically aims to solve the problem of needing to send event data to 3rd-parties. In other words, it is a complete in-house solution for understanding how customers are using your websites or apps.
You have the option to host PostHog yourself, thus giving you complete control over how you store and process data. They have also done a lot of work in making PostHog easy to deploy in certain environments. Whether you work with Ruby, React, or Android – most of these use cases are covered with ready-to-go libraries.
Product-first approach
The interesting thing about PostHog is that it provides a lot of tools to understand app usage. In fact, with a robust open-source community and strong capital backing on its side, PostHog is able to stand on its own feet against giants like Google, Mixpanel, Heap, and others.
Want to roll out new product features over time? PostHog provides a comprehensive Feature Flags solution to ensure you can roll back any time. Additionally, Feature Flags can be used to roll out new features to a specific subset of users.
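Rolling a feature out to a subset of users is often done by hashing each user into a stable bucket. The Python sketch below shows that general technique only. It does not use PostHog's API or reflect how PostHog implements flags, and the names `in_rollout`, `flag_key`, and `rollout_percent` are hypothetical.

```python
import hashlib


def in_rollout(flag_key: str, user_id: str, rollout_percent: float) -> bool:
    """Decide deterministically whether a user falls inside a partial rollout."""
    # Hash flag and user together so each flag gets its own independent split.
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    # Reduce the first 8 bytes of the digest to a bucket between 0 and 9999.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # A 25% rollout admits buckets 0..2499, a 100% rollout admits all of them.
    return bucket < rollout_percent * 100
```

Because the bucket depends only on the flag and the user, the same person gets the same answer on every request, and raising the percentage only ever adds users. Setting it back to 0 acts as the rollback.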
Pirsch
Pirsch is a lightweight and cookieless analytics solution you can plug directly into your website or application back-end. The project is written in Go and uses a fingerprinting technique to ensure the anonymity of incoming user data.
Pirsch generates a unique fingerprint for each visitor. The fingerprint is a hash of the visitor’s IP, User-Agent, the date, and a salt. The date guarantees that the data is separated by day, so visitors can only be tracked for up to one day.
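The description above can be sketched in a few lines of Python. This is an illustration of the idea, not Pirsch's implementation (which is written in Go). The choice of SHA-256, the `|` separator, and the function name `visitor_fingerprint` are all assumptions.

```python
import hashlib
from datetime import date


def visitor_fingerprint(ip: str, user_agent: str, day: date, salt: str) -> str:
    """Return a hex digest that identifies a visitor for a single day only."""
    # Join the four inputs with a separator so field boundaries stay unambiguous.
    material = "|".join([ip, user_agent, day.isoformat(), salt])
    # Assumed hash function: SHA-256. Only the digest is kept, never the inputs.
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Two requests from the same IP and User-Agent on the same day produce the same digest, so they count as one visitor. On the next day the date changes, the digest changes with it, and the two visits can no longer be linked.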
The best part, of course, is that Pirsch can track visitors to your platform even if they have an Ad Blocker installed. Some reports say that brands can lose up to 30% of their total analytics accuracy because of ad blockers or other script-blocking browser extensions.
As for the data reported, Pirsch has a variety of statistics you can track. Most notably, custom conversion goals, events, and keywords that were used to land on your pages from Google Search. Their subscription model includes APIs, SDKs, and data export functionality.
Umami
Umami is a strictly self-hosted analytics library that you can deploy in a matter of seconds.
The requirements are quite basic: MySQL or PostgreSQL for the database and Node.js for the backend. It can be easily deployed on a cheap DigitalOcean droplet or a similar VPS provider.
The author did a post-mortem on building the project. He actually mentions quite a few of the tools we have already seen but argues that they’re mostly paid tools, whereas Umami is not and always has to be self-hosted. Unsurprisingly, this approach has made Umami one of the hottest GitHub projects in the open-source analytics realm.
One interesting feature of Umami is Profiles. You can create custom accounts and pass them on to your friends or clients, letting them use this solution on their own website. Each profile has an entirely separate dashboard and reporting.
GoatCounter
GoatCounter delivers seamless tracking solutions for medium-sized businesses.
Martin Tournoij – the author – is keen on making it easy to monitor website analytics while remaining privacy-friendly. This is done by eliminating any potential tracking identifiers and letting you choose what you wish to track.
While the design might seem a little primitive compared to modern front-end possibilities, the technical side of GoatCounter is quite diverse. It’s also free for non-commercial projects, with a subscription model for more demanding websites.
Deploying the library yourself is of course free. WordPress and Gatsby users can enjoy pre-made integrations, so you can start tracking right off the bat.
Ackee
Ackee is a Node.js analytics script that you can deploy instantly. You can comfortably use Docker or deploy it with cloud platforms like Heroku, Vercel, or Netlify.
It is fully integrated with a GraphQL API: the analytics dashboard itself is built from the data the API provides. This in turn lets you build custom queries and parameters to collect data specific to your project needs.
You can turn on Detailed mode, which will track slightly more data, but Ackee recommends that you only enable it in tandem with a privacy notice. Data such as browser type and operating system falls under the category of personal data.
Counter
Counter is a relatively new addition to the privacy-friendly analytics space. Built by a team of three, Counter uses a pay-what-you-want approach to attract users to its service. At the time of writing, they’re also looking for an investor with a keen interest in pushing Counter further with development and maintenance.
The tech behind Counter is Go (Server, Static files) and Redis for data storage.
All in all, the project has some traction on GitHub, so it will be interesting to see if the authors can find success with it. I think there is plenty of room for genuine privacy-based projects that respect the user and their data.
Fugu
Fugu is the second product analytics solution on this list, though, admittedly, it is a lot simpler. It is the perfect fit for developers and creators who want a simplified overview of how their app is being used in real time. In their own words,
Fugu has an event-based tracking system. Every time you want to track an event, you call the Fugu API from your app. In your request, you need to provide an event name and can optionally provide event properties.
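As a rough illustration of that description, the Python sketch below builds the body of such a request: a required event name plus optional properties. The JSON field names `name` and `properties` and the function `build_event_body` are hypothetical, not taken from Fugu's documentation. The sketch deliberately stops short of sending anything, since the real endpoint and authentication are not covered here.

```python
import json
from typing import Any, Dict, Optional


def build_event_body(name: str, properties: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one tracking event as a JSON request body (hypothetical shape)."""
    if not name:
        raise ValueError("an event name is required")
    body: Dict[str, Any] = {"name": name}
    # Properties are optional, so only include them when the caller supplies some.
    if properties:
        body["properties"] = properties
    return json.dumps(body)
```

An app would call something like `build_event_body("signup", {"plan": "free"})` at the moment the event happens and hand the result to whatever HTTP client it already uses.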
It is free to host yourself, and their SaaS pricing is a modest $9/month.
Koko
If you are a WordPress user, you have the option to use the Koko Analytics plugin. It is both open-source and non-invasive for privacy. The stats themselves are rather simple: top pages and top referrers.
In the settings, you can choose between using cookies to track visitors or not using cookies. The latter choice means that the script won’t be able to detect return visitors. Which isn’t really a dealbreaker, anyway.
Koko is also performance-optimized, so short or big bursts of sudden traffic won’t stall your site. Above all, it’s a simple solution for WordPress sites.
Closing statement
I have to admit, in the process of writing and editing this article – I learned a lot. I can quite comfortably say that GDPR (and other regulations) is an extremely complex topic. In my research, I spoke with several developers who work in this space, including some who work on the projects mentioned here, and even they admit that it is easy to get side-tracked.
As for the general consensus, it is quite simple. If you have no intention of foul play, you can comfortably track website analytics without collecting any personal data. But if you do happen to collect some personal data, a carefully tailored privacy notice will suffice.
More importantly, these open-source projects provide a means to never share your user data with 3rd parties. If there is one way we can all contribute to making the web a better place, then it is through transparent non-invasive analytics.