Show HN: I made a privacy-first minimalist Google Analytics

AdriaanvRossum | 968 points

Creator here. As a developer, I install analytics for clients, but I never feel comfortable installing Google Analytics because Google creates profiles for their visitors, and uses their information for apps (like AdWords). As we all know, big corporations unnecessarily track users without their consent. I want to change that.

So I built Simple Analytics. To ensure that it's fast, secure, and stable, I built it entirely using languages that I'm very familiar with. The backend is plain Node.js without any framework, the database is PostgreSQL, and the frontend is written in plain JavaScript.

I learned a lot while coding, like sending requests as JSON requires an extra (pre-flight) request, so in my script I use the "text/plain" content type, which does not require an extra request. The script is publicly available (https://github.com/simpleanalytics/cdn.simpleanalytics.io/bl...). It works out of the box with modern frontend frameworks by overwriting the "history.pushState"-function.
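For readers curious about the pre-flight point, here is a minimal sketch of the rule in Python. It is for illustration only and is not the Simple Analytics script. The function name `needs_preflight` is made up, and the model is simplified: it assumes no custom headers are set and ignores the other conditions in the CORS rules.

```python
from typing import Optional

# Simplified model of the CORS "simple request" rule.
# Assumption: the request sets no custom headers.
SAFELISTED_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method: str, content_type: Optional[str] = None) -> bool:
    """Return True if a cross-origin request would trigger an OPTIONS pre-flight."""
    if method.upper() not in SAFELISTED_METHODS:
        return True
    if content_type is None:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence not in SAFELISTED_CONTENT_TYPES
```

Under this model a POST with "application/json" needs the extra request, while the same POST sent as "text/plain" does not.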

I am transparent about what I collect (https://simpleanalytics.io/what-we-collect) so please let me know if you have any questions. My analytics tool is just the start for what I want to achieve in the non-tracking movement.

We can be more valuable without exploiting user data.

AdriaanvRossum | 6 years ago

At my work (The New York Public Library), we created a “Google Analytics Proxy” that receives requests and then proxies them to Google’s Measurement Protocol so you still get the benefit of using Google Analytics but can control exactly what’s sent/saved in real-time.

It’s intended as a mostly drop-in replacement for the GA analytics.js API and to be used as an AWS Lambda.

You can check it out here: https://github.com/NYPL/google-analytics-proxy
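The idea of controlling what gets forwarded can be sketched roughly as below. This is not the NYPL implementation, just an illustration of an allow-list applied before a hit is passed on. The choice of allowed parameters and the name `scrub_hit` are assumptions; the short keys are Measurement Protocol parameter names (`uip` and `ua` are the IP and user-agent overrides).

```python
from urllib.parse import urlencode

# Assumption: a hypothetical allow-list of parameters to forward.
# v = protocol version, tid = property ID, cid = client ID,
# t = hit type, dp = document path, dt = document title.
ALLOWED_PARAMS = ("v", "tid", "cid", "t", "dp", "dt")


def scrub_hit(params: dict) -> str:
    """Keep only allow-listed parameters and return a form-encoded payload."""
    kept = {key: params[key] for key in ALLOWED_PARAMS if key in params}
    return urlencode(kept)


hit = {
    "v": "1",
    "tid": "UA-XXXXX-Y",
    "cid": "555",
    "t": "pageview",
    "dp": "/home",
    "uip": "203.0.113.7",   # dropped: visitor IP override
    "ua": "Mozilla/5.0",    # dropped: user agent override
}
payload = scrub_hit(hit)
```

An allow-list is used rather than a deny-list so that any parameter nobody thought about is dropped by default.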

phprecovery | 6 years ago

I've moved away from using any kind of script embedded in my webpages for tracking and instead just use GoAccess (https://goaccess.io/) to analyze my logs. There are obvious caveats: you need to install it, configure the server logging to match it, and so on. But personally I find the benefits outweigh the cons: it all runs on the server, you are the sole owner of all the data, and this tracking doesn't require any kind of JS on the webpage.
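To give a feel for what log-based analytics involves, here is a tiny Python sketch that counts page hits from access-log lines. It is not how GoAccess works internally. It assumes the common/combined log format that Apache and nginx use by default, and the name `top_pages` is made up.

```python
import re
from collections import Counter

# Assumption: common/combined log format; adjust the pattern for other formats.
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def top_pages(lines, limit=10):
    """Count successful GET requests per path, with the query string removed."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match["method"] == "GET" and match["status"] == "200":
            hits[match["path"].split("?", 1)[0]] += 1
    return hits.most_common(limit)
```

Feeding it an open log file, for example `top_pages(open("access.log"))`, returns the most requested paths with their counts.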

sondr3 | 6 years ago

First: really slick site. I'm not so into the video which takes a while to get to the point, but the site makes it really easy to understand the point of your product (and that's something a lot of sites lack).

I do have some questions/comments and I apologize if they seem a bit rapid-fire.

* When I look at the "Top Pages", there are links. When I click the link, it brings me to that page on your site not a chart of hits for that page. Is that how it's meant to work?

* If I sign up for your service, do my stats become public? https://simpleanalytics.io/apple.com just says "This domain does not have any data yet" (presumably because Apple doesn't have your script installed). But that kinda indicates that any domain with your script installed would show up there. It might just be an error in the messaging, but probably something to fix.

* What's your backend like? I'm mostly curious because analytics at scale isn't an easy problem. Do you write to a log-structured system with high availability (like Kafka) and then process asynchronously? How do you handle making the chart of visitors? Do you roll up the stats periodically?

* Speaking of scale, if I started sending thousands or tens of thousands of requests per second at you, would that be bad? Is this more targeted at small sites?

* What do you do about bots? Bot traffic can be a large source of traffic that throws off numbers.

* How long before numbers are available? It's September 19th, but the last stats on the live demo are September 18th. Is it lagged by a day?

* Do you not want to track user-agents for privacy reasons as well? Seems like a UA doesn't really identify anyone, but it can be useful for determining if you want to support a browser.

* You're not counting anyone that has the "Do Not Track" header. To me, DNT is more about tracking than counting (which is different). Even if you counted my hit, it wouldn't be tracking me if you didn't record information like IP address and there were no cookies.
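To make the last point concrete, here is a rough Python sketch of the difference between counting and tracking. It is an assumption about how such a counter could look, not a description of the service's backend. The name `count_hit` and the `skip_dnt` flag are hypothetical, and header names are assumed to be normalized to the spelling "DNT".

```python
from collections import Counter


def count_hit(counts: Counter, path: str, headers: dict, skip_dnt: bool = True) -> bool:
    """Increment an aggregate per-page counter; return True if the hit was counted.

    Only the path is stored. No IP address, cookie, or user agent is kept,
    so there is nothing that links two hits to the same visitor.
    """
    if skip_dnt and headers.get("DNT") == "1":
        return False  # current behaviour as described: DNT visitors are skipped
    counts[path] += 1
    return True
```

With `skip_dnt=False` the hit is still counted, but nothing about the visitor is recorded, which is the distinction being drawn above.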

Kudos for launching something. I think my biggest suggestions would be fixing the live-demo page so it doesn't look like it's leaking other sites' data and providing some guidance about limits. It's easy to think that you don't want to put limits on people, but any architecture is made with a certain scale in mind. There's no shame in that. Sometimes what you want is a "let us know if you need more than X" message. At the very least, it lets you prepare. People sometimes use products in ways you wouldn't imagine and didn't intend, which the system doesn't handle gracefully.

Good luck with your product!

mdasen | 6 years ago

To everyone complaining about the price point for this service.

You are part of The Problem.

This is a solo dev's venture, that has a relatively pure and straightforward goal. If you can't afford it, don't use it and pick one of the others.

Do NOT compare this with a B2C offering that has nothing to do with analytics.

Do NOT compare this with a B2B offering that's free and feeds your user's data into the parent corporation's advertising revenue stream.

Do NOT compare this with a B2B offering that is open-source, with a team of a dozen core contributors that has had a decade of development under its belt.

nickdandakis | 6 years ago

This is a great idea and I love the design.

It looks like anyone can see the stats for any domain using the service without any authentication. I added the tracking code to my domain and was able to hit https://simpleanalytics.io/[mydomain.co.uk] without signing up or logging in. I was also able to see the stats for your personal site.

Is that intentional? If it is, it seems like an odd choice for a privacy-first service. If not, it seems like quite a worrying oversight in a paid-for product.

whylo | 6 years ago

Please give a comparison to Matomo¹ (formerly Piwik), the current obvious choice for doing this.

1. https://matomo.org/

teddyh | 6 years ago

I think there are a lot of misconceptions about how Google Analytics tracking works. I'm pretty sure a vanilla GA setup does not, in fact, create profiles that track you across the web. For one thing, all the cookies it creates are first-party (on your domain).

I still get objecting to Google products on principle, but their privacy policy for GA seems pretty reasonable to me: https://support.google.com/analytics/answer/6004245

eli | 6 years ago

I assume that it's more of a feeler/prototype than a real product, but even then it is really basic and through that it's ultimately useless.

A Summary page should show traffic volume, who exactly is driving it and where it arrives. That's the bare minimum needed to make shown information actually _useful_ and _actionable_. Things like "Top Domain Referrers" and "Top Pages" are aggregate vanity metrics, their effective utility is zero. If you have a spike in traffic, you want to know the reason and with your current design you can't.

huhtenberg | 6 years ago

I am using fathom [1] for this. They allow hosting the backend yourself and your analytics are not publicly accessible. Biggest con is that each installation can only track one domain as of now.

[1]: https://usefathom.com

ciex | 6 years ago

I would consider a self-hosted option to be the privacy-first approach.

markstos | 6 years ago

Are there any plans to support SRI? It's a pretty big security risk to incorporate 3rd party JS onto all pages - if someone compromises your CDN account then they have full control over every site that's using this code.

This is one of the top ways that credit card breaches are happening lately - e-commerce sites include tons of 3rd party tracking / analytics / remarketing / etc code on their checkout pages, one of them gets hacked and the modified JS posts the credit card form to some compromised server.
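For reference, an SRI value is just a base64-encoded hash of the exact file contents with the algorithm name in front. A minimal Python sketch (the function name `sri_hash` is made up for illustration):

```python
import base64
import hashlib


def sri_hash(script_bytes: bytes) -> str:
    """Return a Subresource Integrity value (sha384) for the given file contents."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")
```

The result goes into the `integrity` attribute of the script tag, together with `crossorigin="anonymous"` for a cross-origin script. The catch is that the browser refuses to run the file if it changes by even one byte, so this only works when the vendor serves versioned, immutable files.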

r1ch | 6 years ago

I don't doubt your intentions, but I simply don't believe that any kind of user analytics as-a-service is ever going to be good enough privacy wise.

Do you know what isn't creepy and privacy invading? Analysing the attributes of the visitors to FranksKebabShop.com, as part of the tooling that runs FranksKebabShop.com.

This could be analysing web server/cache logs. It could be a more active piece of software that operates via JS and reports back to a service running on the same domain.

I know, I know "everything is SaaS now, nobody installs software". Nobody can install it if you don't make it installable. Be part of the solution not part of the problem.

stephenr | 6 years ago

Well done brother. :-) The more privacy-aware tools, the better!

Something that would interest me, is a little explanation of https://github.com/simpleanalytics/cdn.simpleanalytics.io/bl....

You already have very brief comments at strategic points. If you would explain these one by one, I would learn a lot about optimizing for number of requests, skipping stuff to load, etc. Maybe a technical blog post at a later time when the dust settles?

MrQuincle | 6 years ago

This is very cool - I'm literally building the same thing right now!

pistoriusp | 6 years ago

For a project of mine I created an 'actions' table in my database. For every visit (only server-side data) I make an entry in that table. That way I keep track of the key metrics that I am interested in (basically: which page was loaded and where did the visit come from?). I also store the request id so that I can differentiate between visits. Entries in this table are made in a new thread so that any issues or slow-downs on that end don't affect load times too much. Works very well.
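A minimal sketch of that pattern in Python. Several things here are assumptions, since none of them is specified above: SQLite as the database, the column names, and the function names `log_action` and `log_action_async`.

```python
import sqlite3
import threading

# Hypothetical schema: one row per visit, server-side data only.
SCHEMA = """CREATE TABLE IF NOT EXISTS actions (
    request_id TEXT,
    path TEXT,
    referrer TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
)"""


def log_action(db_path, request_id, path, referrer):
    """Insert one row; opens its own connection so it is safe in a worker thread."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(SCHEMA)
            conn.execute(
                "INSERT INTO actions (request_id, path, referrer) VALUES (?, ?, ?)",
                (request_id, path, referrer),
            )
    finally:
        conn.close()


def log_action_async(db_path, request_id, path, referrer):
    """Fire-and-forget: the request handler does not wait for the insert."""
    worker = threading.Thread(
        target=log_action, args=(db_path, request_id, path, referrer), daemon=True
    )
    worker.start()
    return worker
```

Each worker opens its own connection because a SQLite connection should not be shared across threads by default. A busier site would probably want a queue with a single writer instead of one thread per request.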

mosselman | 6 years ago

Screen sizes is ambiguous here - are you measuring viewport width (`window.innerWidth` - helpful) or the display the window happens to be on (not too helpful)? Also something to make that data useful would be show the range of sizes, instead of the top specific size. E.g. 1280 may be _the most popular_ but there may be more users using larger width windows, just more variation in those sizes (1320, 1440, etc), so a top level range could be a nice differentiator here.
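The range idea could look something like this Python sketch. The bucket edges are arbitrary examples and the function names are made up.

```python
from collections import Counter

# Assumption: arbitrary example ranges, not breakpoints taken from the product.
BUCKETS = [(0, 374), (375, 767), (768, 1280), (1281, 1919), (1920, None)]


def bucket_label(width: int) -> str:
    """Map a viewport width in pixels to the label of the range it falls in."""
    for low, high in BUCKETS:
        if width >= low and (high is None or width <= high):
            return f"{low}+" if high is None else f"{low}-{high}"
    raise ValueError(f"invalid width: {width}")


def width_ranges(widths) -> Counter:
    """Count how many widths fall in each range."""
    return Counter(bucket_label(w) for w in widths)


widths = [1280, 1280, 1280, 1320, 1366, 1440, 1600]
top_single = Counter(widths).most_common(1)
ranges = width_ranges(widths)
```

In this made-up sample 1280 is the most common single value with three hits, yet four visitors fall in the wider 1281-1919 range, which a "top size" list would hide.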

But, how useful are these stats going to be without being able to see user journeys through a path of pages / actions? Yes, it's good to know which pages are getting how many views. But, in order to improve the UX, we often need to know how many users are able to go from Page A to Page C and whether they went through Page B first. Or e.g. if 90% of sessions that start on Page A (so we know what their purpose was), end on Page B but the main (perhaps beneficial) action for the user was on Page C. You can't just look at the pageviews for each, because you don't know where the session started.
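As a sketch of the kind of question meant here, in Python: given each session as an ordered list of pages, count how many sessions that start on one page reach another, and how many of those passed through a third page first. This assumes per-session page sequences are available, which a tool that avoids tracking may deliberately not collect. The function name `reached_via` is made up.

```python
def reached_via(sessions, start, goal, via):
    """Among sessions starting on `start`, count those that reach `goal`,
    and how many of those visited `via` before the first visit to `goal`."""
    started = reached = through = 0
    for pages in sessions:
        if not pages or pages[0] != start:
            continue
        started += 1
        if goal in pages:
            reached += 1
            if via in pages[: pages.index(goal)]:
                through += 1
    return {"started": started, "reached": reached, "via": through}


sessions = [["A", "B", "C"], ["A", "C"], ["A", "B"], ["B", "C"]]
summary = reached_via(sessions, start="A", goal="C", via="B")
```

Plain pageview totals for A, B and C cannot answer this, because they do not say which views belong to the same session.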

I fear that this would reduce people to "inferring" (guessing) too much about the data that they see, and making decisions they feel are backed with data when there's not enough data to conclude. Then again, I'm sure that happens when the data is there too :-)

petemill | 6 years ago

Just a heads up, it looks like the Referrals section of your Analytics is being vandalized

sincerely | 6 years ago

It doesn't offer anything close to the features of Google Analytics, and it costs $12/month, the same as a service like Netflix. The idea is nice, but look at the actual product here: https://simpleanalytics.io/simpleanalytics.io

It disappoints in every way; you can't even check yesterday's stats.

Cpmly | 6 years ago

I'm a potential user/customer. I support two small-scale websites that give my two businesses a presence on the web. Around 2013 I started to feel too anxious when accessing Google Analytics because the service was getting bigger and bigger. I could not see its "UI boundaries" anymore, and with that I got the impression I was leaving useful views/analysis behind. Unfortunately I am the kind of user who needs somebody to provide a set of pre-built views/analysis I can make sense of. I don't have the time to work out what I need at various levels and then build the views.

With that said, a minimalist approach to web analytics is attractive to me, especially if I can see its "boundaries", the set of reports, etc.

The argument on privacy (or lack of it) has no impact on my perception about this service's value proposition.

rodolphoarruda | 6 years ago

Hi, I work in digital analytics and have a question. A problem with Piwik is what happens if the database goes down. If your PSQL database goes down (a database is NEVER 100% up), what happens to the data your JavaScript snippet is sending?

I will also add that a lot of comments here are very unfair. I hope you take them with a grain of salt.

jackgolding | 6 years ago

Just a quick reminder, that Fathom started its Pro offering only a few days ago: https://usefathom.com/

It's also Open Source so you can see for yourself what is going on, or even self-host.

dna_polymerase | 6 years ago

Data collection for legitimate purposes came up in our GDPR compliance review.

This product (https://truestats.com) collects the I.P. address and user agent for the purpose of detecting fraud (not selling data or profiling users). It is used for frequency checking and other patterns that would indicate fraud. We are still going through the legal analysis of how to deal with this, even though we have no idea who the visitors are.

I think considering the I.P. address as PII is a little much if you are not using it in a way that would violate privacy or selling the data.

gator-io | 6 years ago

Looks good! I'm the founder of a similar service (Blockmetry). Obviously non-tracking web analytics is the future!

I'm curious why you chose to host the data yourself instead of giving customers the data immediately at the point of collection. That's the path we chose for Blockmetry, as it is genuinely required to be a non-tracking web analytics service and makes it impossible to profile users. Any service that hosts its data would still be open to being untrusted on the "no tracking, no profiling" argument.

Thanks, Pierre

PS - YC Startup School founders: ping me via the forums and get an extended-period free trial.

pierrefar | 6 years ago

This is not GDPR friendly.

Executing third party JS on your website is an access to the page content, so unless the customer never had any user data or sensitive data on the page, they'll have to categorise simpleanalytics as a data processor.

Referers are often private data on their own. For example, https://www.linkedin.com/in/markalanrichards/edit reveals not just that you looked at this user, but that you are this user, as it is the profile editing page, unique to this account.
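One common mitigation, sketched in Python, is to reduce a referrer to its hostname before storing it. This is an illustration only, not something the service is known to do, and the function name `referrer_host` is made up.

```python
from urllib.parse import urlsplit


def referrer_host(referrer: str) -> str:
    """Reduce a referrer URL to its hostname, dropping path, query and fragment."""
    return urlsplit(referrer).hostname or ""
```

For the example above this keeps "www.linkedin.com" and discards the profile path. It only limits what is stored; whoever receives the request still sees the full URL if the browser sends it.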

The difference between whether simpleanalytics get or store data might remove a GDPR issue for them, but it certainly is for customers. Having access to the IP addresses is sufficient for privacy to be invaded at any point or by accident (wrong logging parameter added by the next new dev), malice (how can we illegally use this and lie to customers) or compromise (hackers take control of the analytics system) and therefore puts users at risk of full tracking at any point. As mentioned earlier GDPR is also about access, it is definitely about storage but the part in between of being given data (not just access to take it and not putting it on disk) is definitely included too.

In summary, simpleanalytics need to stop lying and redo their privacy impact assessments. Meanwhile don't use third party analytics (I have no idea how you maintain security control on third party JS) and if you're silly enough to, then it definitely is a GDPR consideration that needs to be assessed, added to audit, added to privacy policies, etc.

marichards | 6 years ago

"We don't use cookies or collect any personal data."

An IP address is considered personal data. So when the browser visits a page with the JS, the IP address of the user is transferred to your server. That means the website I am visiting is sharing my IP address with a third party (you).

sleepyhead | 6 years ago

>No Evil Corp. Just me.

I would remove this, any company would hesitate to buy a service from a single guy.

sergiotapia | 6 years ago

I think everyone starts like that.

Then, clients that help keeping lights on start asking for this and that.

And suddenly you end up providing a service with user level insights, cross-device tracking and advanced behavioral segments powered by ML because why not.

GA was simple, before.

vassilyk | 6 years ago

Is there a way to track country and language as an aggregate? For businesses this information is extremely useful as it gives an idea of what countries to expand to or what languages should be supported.

tedivm | 6 years ago

So this is open to everyone?

I mean, can I just see stats of a site that uses the service?

e.g.

https://simpleanalytics.io/simpleanalytics.io

tzury | 6 years ago

Very cool! Was just looking into how to configure GA to not use cookies...

Just want to point out for all the front-end devs out there: 12% of traffic to this site atm is coming from screen-width < 375px.

pcmaffey | 6 years ago

Did Google just install your tool? https://simpleanalytics.io/google.com :)

chpmrc | 6 years ago

My feedback: someone else mentioned making the tiny live demo button bigger. I suggest scrapping it entirely... and embedding the demo statistics directly under the video, or very close to it, to go straight from "why" to "what it looks like". The chart/stats page design is sufficiently clean that shoving the whole thing onto the homepage won't actually be an information overload.

Speaking of the video, it's ridiculously professionally done, by the way; excellent acting to begin with and perfect line delivery (confident, well-timed, no hesitancy/awkwardness) as far as I'm concerned.

-

Apart from this, my only other advice is - reject buy offers, reject partner offers, sleep on VC offers for as long as you can (if, ideally, you don't outright reject these as well), and take this as far as possible on your own. I say this considering two standpoints.

a) Considering the developer: this is incredibly well done and you clearly have the competency to drive this forward without assistance. The website and video presentations are both great; the product defaults easily tick "sane enough"; and the only thing stopping me throwing money at the screen is that I have no projects that need this right now - but others definitely will, and I look forward to seeing this go viral.

b) Considering the product: "oooo internet privacy" is a well-trodden path with a thousand and one different options which are all terrible in their own way. You have the opportunity to differentiate by offering something that gains a reputation for actually not compromising, even months and years down the track by working to eliminate some of the sociopolitical cascade that can contribute to dilution of quality. Customers have sadly had good reason to associate buyouts with rapid decline in quality, so that sort of thing just looks bad at face value too.

To clarify what I mean by taking this as far as you can on your own: it's obvious others have already provided assistance - filming and acting in the video, and for all I know beta testing and maybe other development support - and I'm not pointing at that and suggesting it will bite you. I mean that, if you ever bring help on, find a good lawyer who will ensure the project remains _yours_ and make sure there are no implicit "50/50" partnership agreements or the like.

I can't find the references right now but I've read of a couple of projects/products that have exploded sideways (very sadly) because of jealousies and impedance mismatches creating imbalances that provoke partners brought onto projects to assume control and pivot things out of a creator's control, without the creator having any legal recourse.

exikyut | 6 years ago

Really slick! Could you throw some insight about the techstack, architectural decisions etc.? Would love to understand more about those.

borncrusader | 6 years ago

A few questions: How likely is this to be blocked by uBlock Origin/Firefox private mode (EasyList etc.)? Do they have any rules about what they consider to be 'ethical analytics'? How much overhead does this analytics package add to page load?

Have you considered a free tier for up to 1k page views a month for example?

How can this track conversions for A/B testing? This is one of the most common usages of analytics in my experience. Is there a way to have user based conversion tracking whilst still being GDPR compliant?

RealDinosaur | 6 years ago

This is a fun, self-monitoring prophecy of a kind. You can see clicks originating from ycombinator after this post went to ycombinator.

smolsky | 6 years ago

I love it. More of these, please.

As an author of SPAs and PWAs, though, I'd really like the ability to push a page hit programmatically.

MentallyRetired | 6 years ago

Great work! Nice design and everything has a genuine touch to it. The video is surprisingly amusing and well done too.

Best of luck with it!

Reedx | 6 years ago

How can I view a graph for individual pages? For example, how would I see the graph for /what-we-collect?

aembleton | 6 years ago

Does anyone here use Clicky?

https://clicky.com

ksec | 6 years ago

I'm feeling exactly like you. Each time I need to install GA, I am reluctant. Thanks.

eXorus84 | 6 years ago

In case some people are unaware, after GDPR google released an addon that allows you to opt-out from google analytics tracking across the web:

https://tools.google.com/dlpage/gaoptout/

xwvvvvwx | 6 years ago

This feels like a rant, but I've posted my https://trackingco.de/ here multiple times, which has a very similar proposal (and is cheaper), but never got a single line of feedback.

fiatjaf | 6 years ago

I love the personal video on a privacy-first site. It's a really nice touch (no sarcasm). It's really refreshing.

We will consider it. Thanks for making this. Hopefully more companies will follow suit.

artur_makly | 6 years ago

Yandex Metrica offers a pixel only tracking option.

lcnmrn | 6 years ago

The live-demo button needs to be better visible.

JepZ | 6 years ago

Why don't ya apply to YC with this

cvaidya1986 | 6 years ago

This is like saying "Google is too big, so let me start being evil!" Most stupid thing I've ever seen.

onecooldev24 | 6 years ago

Please give a comparison to Matomo https://www.programsnow.com/

kashosoft | 6 years ago

The idea is great, but the price is way too high for a simple site. Many people are interested in anonymised data like pageviews and geographical distribution, for example, but these people pay 10€/year for a domain and often 0 for hosting when they use a static site generator. 12€/month is just really expensive at this level, but good luck, and I’m sure for many people it’s a totally fine price.

gryzzly | 6 years ago