Web Analytics with Piwik: keeping control over your own data

Web analytics is one of the essential tools for a website. In addition to measuring web traffic and counting visitors, it can be used to assess and improve the effectiveness of a website. The most common way to collect data is on-site web analytics, which measures visitors’ behaviour on your site with page tagging technology. This is how Google Analytics, a widely used web analytics service, works. But what would you use if you wanted to keep control over your own data?

Look no further! Piwik is an open source web analytics application which aims to be the ultimate alternative to Google Analytics. Here’s a short overview of Piwik and how to get started with it.

“Web analytics is the measurement, collection, analysis and reporting of web data for purposes of understanding and optimizing web usage.” – Wikipedia

Piwik Open Analytics Platform

Piwik is a web analytics application which tracks online visits to one or more websites and displays reports of these visits for analysis. In short, it aims to be the ultimate open source alternative to Google Analytics. The code is licensed under GPL v3 and available on GitHub. On the technical side, Piwik is written in PHP, uses a MySQL or MariaDB database, and can be hosted on your own server. If you don’t want to set up or host Piwik yourself, commercial services are also available.

Piwik provides the usual features you would expect from a web analytics application. You get reports on the geographic location of visits, the source of visits, the technical capabilities of visitors, what the visitors did and when they visited. Piwik also provides features for analysing the data it accumulates, such as attaching notes to data, setting goals for actions, using transitions to see how visitors navigate, overlaying analytics data on top of a website and displaying how metrics change over time. The easiest way to see what it has to offer is to check the Piwik online demo.

Feature highlights

You might ask how Piwik differs from other web analytics applications such as Google Analytics. One principal advantage of using Piwik is that you are in control. You can host Piwik on your own server, and the data is stored in your own MySQL or, preferably, MariaDB database. In contrast, software-as-a-service analytics providers have full access to the collected user data. Data privacy is essential for the public sector and for enterprises that can’t or don’t want to share their data, for example with Google. With Piwik you can ensure that your visitors’ behaviour on your website is not shared with advertising companies.

Another interesting feature is that Piwik provides advanced privacy options: the ability to anonymize IP addresses, regular purging of the tracking data (but not the report data), opt-out support and Do Not Track support. Your website visitors can decide whether they want to be tracked.
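To make the idea behind two of these options concrete, here is a minimal Python sketch. It is not Piwik’s own code (Piwik is written in PHP); the function names anonymize_ip and should_track are hypothetical and only illustrate the general technique of zeroing the trailing bytes of an IPv4 address and honouring the DNT request header.

```python
import ipaddress


def anonymize_ip(address: str, masked_bytes: int = 2) -> str:
    """Zero out the last `masked_bytes` bytes of an IPv4 address.

    Illustrative only: the number of masked bytes is an assumption,
    not a statement about Piwik's default setting.
    """
    if not 0 <= masked_bytes <= 4:
        raise ValueError("masked_bytes must be between 0 and 4")
    ip = ipaddress.IPv4Address(address)
    # Build a 32-bit mask that keeps only the leading bytes.
    mask = (0xFFFFFFFF << (8 * masked_bytes)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(ip) & mask))


def should_track(headers: dict) -> bool:
    """Return False when the browser sent the Do Not Track header (DNT: 1)."""
    return headers.get("DNT") != "1"


if __name__ == "__main__":
    print(anonymize_ip("192.168.12.34"))
    print(should_track({"DNT": "1"}))
```

With two bytes masked, the stored address still hints at the network a visit came from but no longer identifies a single machine.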

You can also schedule reports to be sent by e-mail, import data from web server logs, and use the API to access reports and administrative functions. Piwik also has a mobile app for accessing the analytics data. It is customizable with plugins, and you can integrate it with WordPress and other applications.
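As a rough illustration of the reporting API, the following Python sketch builds a request URL for a visits summary and fetches it as JSON. The host name piwik.example.org, the site ID 1 and the token value are placeholders, and the helper names build_report_url and fetch_report are hypothetical; check the API reference of your own Piwik installation for the methods and parameters it supports.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_report_url(base_url: str, token_auth: str, id_site: int = 1,
                     method: str = "VisitsSummary.get",
                     period: str = "day", date: str = "yesterday") -> str:
    """Build a reporting API URL for one site and one period."""
    query = urlencode({
        "module": "API",
        "method": method,
        "idSite": id_site,
        "period": period,
        "date": date,
        "format": "JSON",
        "token_auth": token_auth,  # placeholder; use your own token
    })
    return f"{base_url.rstrip('/')}/index.php?{query}"


def fetch_report(url: str) -> dict:
    """Fetch the report and decode the JSON response."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_report(url) against a real server.
    print(build_report_url("https://piwik.example.org", "YOUR_TOKEN"))
```

Because the token travels in the URL, such requests are best sent over HTTPS only.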

Piwik’s User Interface

Piwik has a clean and simple user interface, as seen in the following screenshots (taken from the online demo).

Setting up Piwik

Setting up Piwik is easy, and there’s good documentation available for running web analytics. All you need is a web server such as Nginx, PHP 7 and MariaDB. In some cases MariaDB has significantly improved Piwik’s query performance and reliability compared to MySQL. You can set up Piwik manually, but the easiest way to get started is to use the provided Docker image and docker-compose. The docker-compose file sets up four containers (MySQL, Piwik, Nginx and Cron), and with compose you can start them all at once. The Piwik image is available from Docker Hub.

The alternative is to build your own Docker image for Piwik and related services. In my opinion, it makes sense to have just two containers: one for the Piwik-related web services and the other for MariaDB. The Piwik container runs Piwik, Nginx and the Cron script under a process manager such as supervisor. The official image is based on Debian (inherited from the PHP image), but Piwik also runs nicely on Alpine Linux. One thing to tinker with when using Docker is giving MariaDB access to Piwik’s assets so that it can use LOAD DATA INFILE, which greatly speeds up Piwik’s archiving process.

A third choice is to spend some money, skip the technical setup and use the cloud-hosted Piwik Pro.

If you’re setting up Piwik manually, you can watch a video of the installation and after that a video on configuring the settings. After you’re done with the five-minute installation, you get the JavaScript tag, which is then included at the bottom of each page of your website. If you’re using React, there’s a Piwik analytics component for React Router. Piwik will then record the activity across your website into your database.
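Under the hood, the JavaScript tag reports each page view to Piwik’s tracking endpoint with an HTTP request. The Python sketch below builds such a request URL by hand, which can be handy for understanding what is recorded or for tracking from server-side code. The host name, site ID and page values are placeholders, build_tracking_url is a hypothetical helper, and only a handful of tracking parameters are shown.

```python
import secrets
from urllib.parse import urlencode


def build_tracking_url(base_url: str, id_site: int,
                       page_url: str, action_name: str) -> str:
    """Build a tracking request URL that records a single page view."""
    params = {
        "idsite": id_site,            # ID of the website in Piwik
        "rec": 1,                     # required for the hit to be recorded
        "url": page_url,              # full URL of the viewed page
        "action_name": action_name,   # page title shown in reports
        "rand": secrets.token_hex(8), # random value to defeat caching
        "apiv": 1,                    # tracking API version
    }
    return f"{base_url.rstrip('/')}/piwik.php?{urlencode(params)}"


if __name__ == "__main__":
    print(build_tracking_url("https://piwik.example.org", 1,
                             "https://example.org/pricing", "Pricing page"))
```

Requesting the resulting URL (for example with urllib.request.urlopen) should register one page view for the given site, assuming the installation accepts tracking requests from your client.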

And that’s about all there is to getting started with Piwik: set it up with Docker or manually, add the JavaScript tag, configure some options if needed, and then just wait for the data from visitors.

Piwik: your data, your analytics

Piwik is a good, feature-rich web analytics application. Setting it up isn’t as straightforward as using a hosted service such as Google Analytics, but that is the trade-off with self-hosted services. If you need web analytics, want to keep control of your own data and don’t mind hosting it yourself and thus paying for the server, then Piwik is a good choice.

This article was originally published on Rule of Tech, the author’s personal blog about technology and software development.