What does Google know about me?

After learning what Twitter knows about me, then what Spotify knows about me, and what Apple knows about me, I thought it was time for the big one: Google.

I’ve used Google services since 2004. Google will know more about me than anyone: I use Google heavily for my Android phone and watch; I use Chrome as a browser; I use Gmail as my main email service; I have Google Assistant throughout the house; I pay using Google Pay almost exclusively. I’m on Google’s Advanced Protection program, which requires me to use a physical 2FA key to log in. I also use NextDNS to block some connections, most notably to Google’s ad services.

Between 2009 and 2015 I used my Google Workspace account rather than my Google account. The below, though, is just for my Google account, not Google Workspace. So the data is perhaps half the size you’d expect, since a lot of information is missing.

I opted out of asking Google to send me my entire email inbox, and all my photos. It took three days for Google to send me my data. The data was incomplete: it couldn’t send me my fitness data from Google Fit for some reason. The result was still a zipped data dump of 14.25GB, which comes complete with a nice browse experience as an enclosed HTML file in the data.

So, what has it got?

Well, of the more interesting…

My Google Account

My account was created on 14 June 2004, it tells me: at the time that it was created I was probably at work. The IP address is listed, though it’s now owned by someone else.

There’s a file here with a list of every time I’ve logged in since 1 April 2022, including an IP address. I’m not sure it’s showing my real logins: there’s a period with seven logins within thirty minutes all on my Google Pixel 6 Pro, as one example.

Access log activity

A 106MB CSV file showing 645,096 accesses to my Google account from different devices over just the last 28 days: IP addresses, what device it was, and which Google service it was accessing (though there are a lot of “Other”).

If you’re interested, here’s how it breaks down:

  • 177,849 accesses from a smart speaker
  • 294,379 accesses from my (one) MacBook
  • 40,781 accesses from iOS (which is either my iPad or my iPhone)
  • 76,881 accesses from my Android mobile phone

or

  • 13,958 accesses from a Gmail app on iOS (only my iPad)
  • 9,018 accesses from a Gmail app on Android
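A breakdown like the one above can be produced by tallying one column of the CSV. Here is a minimal sketch, assuming the file has a header row; the column name `device` and the sample rows are made up for illustration, since the real header names are not shown here.

```python
import csv
from collections import Counter

def tally_by_column(lines, column):
    """Count rows per distinct value of `column` in CSV text with a header row."""
    reader = csv.DictReader(lines)
    return Counter(row[column] for row in reader)

# Tiny made-up sample; the real file's header names and values may differ.
sample = [
    "device,service",
    "MacBook,Gmail",
    "Smart speaker,Other",
    "MacBook,Other",
]
for value, count in tally_by_column(sample, "device").most_common():
    print(f"{count:,} accesses from {value}")
```

Passing an open file object instead of the sample list works the same way, and reads the file row by row rather than loading all 106MB at once.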

Google Assistant

All 32,378 times I’ve used Google Assistant since September 2016. Complete with audio clips where appropriate.

Here, for example, I’m in the middle of cooking (in May 2021):

https://files.james.cridland.net/2021-05-08_07_49_46_854_UTC.mp3

What’s interesting here is that these are from the Google Assistant speakers in the house; but the files only contain my own voice, not those of the rest of the family.

Goodness.

Google Chrome

bookmarks.html, a “Netscape Bookmark File” according to the metadata, containing all my bookmarks. Nice.

A 90MB “Browser History” file, in JSON, which appears to contain all 236,788 web pages that I have visited in the last calendar year. Here’s a bit of that: I’ve hidden the client_id. Of interest here is the page_transition field, which shows whether I followed a link, or typed it, or… etc. (This isn’t all my internet activity: I used Firefox for a period in the last twelve months).

        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "GENERATED",
            "title": "youtube bleep - Google Search",
            "url": "https://www.google.com/search?q=youtube+bleep&oq=youtube+bleep&aqs=chrome..69i57j0i512l6j69i64.1623j0j7&sourceid=chrome&ie=UTF-8",
            "client_id": "(snip)",
            "time_usec": 1640927119070785
        },
        {
            "page_transition": "TYPED",
            "title": "New Tab",
            "url": "chrome://newtab/",
            "client_id": "(snip)",
            "time_usec": 1640927115098531
        },
        {
            "favicon_url": "https://podjobs.net/_f/fav/favicon.svg",
            "page_transition": "TYPED",
            "title": "Podjobs - podcasting jobs",
            "url": "https://podjobs.net/",
            "client_id": "(snip)",
            "time_usec": 1640926766569034
        },
        {
            "favicon_url": "https://podnews.net/static/favicon.svg",
            "page_transition": "LINK",
            "title": "Maintenance Phase",
            "url": "https://podnews.net/podcast/i4jyw",
            "client_id": "(snip)",
            "time_usec": 1640926544852594
        },
        {
            "favicon_url": "https://podnews.net/static/favicon.svg",
            "page_transition": "FORM_SUBMIT",
            "title": "Search: Maintenance Phase",
            "url": "https://podnews.net/search?q=maintenance+phase",
            "client_id": "(snip)",
            "time_usec": 1640926543178359
        },
        {
            "favicon_url": "https://assets.nymag.com/media/sites/vulture/favicon.ico",
            "page_transition": "AUTO_TOPLEVEL",
            "title": "The Best Podcasts of 2021, According to the Podcast Industry",
            "url": "https://www.vulture.com/article/best-podcasts-industry-survey-2021.html",
            "client_id": "(snip)",
            "time_usec": 1640926523651274
        },
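To make records like these readable, the time_usec value can be turned into a date and the page_transition values can be counted. This is a rough sketch, assuming time_usec is microseconds since the Unix epoch (the values quoted above are consistent with late December 2021 under that reading); the function names are my own.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def visit_time(record):
    # Assumption: time_usec counts microseconds since the Unix epoch.
    return EPOCH + timedelta(microseconds=record["time_usec"])

def transition_counts(records):
    """Count how many visits used each page_transition value."""
    return Counter(r["page_transition"] for r in records)

# Three of the records quoted above, trimmed to the fields used here.
records = [
    {"page_transition": "GENERATED", "time_usec": 1640927119070785},
    {"page_transition": "TYPED", "time_usec": 1640927115098531},
    {"page_transition": "TYPED", "time_usec": 1640926766569034},
]
print(visit_time(records[0]).isoformat())
print(transition_counts(records).most_common())
```

Adding a timedelta to a fixed epoch keeps the microseconds exact, which dividing by a million and using a float timestamp might not.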

Elsewhere, under “News”, it lists all the news articles I’ve read (presumably following links from Google News and others).

Google Podcasts

My professional interest: this has a log of everything I’ve done in Google Podcasts, all the way back to 2017. 298 listens, 277 searches. As ever, it’s all an HTML file, so you can’t do much with this file (and you can’t see any real detail).

What’s a little odd here is that Google Podcasts actually launched in June 2018. So what is it actually measuring here, 12 months earlier?

A large 126MB HTML file showing every search I’ve done in Google, and where I was when I did that search.

This file is so large that Chrome can’t show all of it, and it crashes my text editor when I try to load it. I resort to using the tail command to grab the last 1,000 characters from the file: to my astonishment, it contains every search I’ve made since 23 May 2005 (when I searched for Paul Weller). It seems to have 121,078 searches in it.
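The same trick of reading only the end of a huge file can be done in a few lines of Python, similar in spirit to `tail -c`. This is a sketch rather than what I actually ran; the function name is my own, and it counts bytes rather than characters, so a multi-byte character at the cut point may be replaced.

```python
import os
import tempfile

def tail_bytes(path, n=1000):
    """Return the last n bytes of a file, decoded as text, without loading the rest."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)      # jump to the end to learn the file size
        size = f.tell()
        f.seek(max(size - n, 0))    # step back n bytes (or to the start if smaller)
        return f.read().decode("utf-8", errors="replace")

# Demo on a throwaway file, since the real export's filename is not given here.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
    tmp.write("<div>newest search</div>" * 1000 + "<div>oldest search</div>")
print(tail_bytes(tmp.name, 24))
os.remove(tmp.name)
```

Because it seeks straight to the end, this takes the same time on a 126MB file as on a tiny one.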

A separate folder marked YouTube contains every video and song I’ve listened to in the service: all 19,159. Again, it’s in this HTML format.

Google Fit

2,859 files marked “activities” from 2018–2022. They’re .tcx format files, and appear to be in a Garmin-authored XML format, namely http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2… I’m fascinated to see that it’s in a Garmin file format: while I have owned a Garmin watch, I didn’t own one in 2018, and haven’t used it in 2022. There is some location data in here: they’re mostly walking or cycling. The recent data looks to be where I’ve specifically asked Google Fit to track me (on my Pixel watch).

2,924 files in “all sessions” from 2018–2019, containing, I think, every time Google Fit has noticed I’ve been walking, or possibly every time I’ve told it I am. These are in a JSON format that doesn’t appear to include location data: just timing, and a little bit of data at the end:

..... "fitnessActivity": "walking",
    "startTime": "2019-04-07T22:16:47.507Z",
    "endTime": "2019-04-07T22:17:39.102Z"
  }, {
    "fitnessActivity": "walking",
    "startTime": "2019-04-07T22:19:52.060Z",
    "endTime": "2019-04-07T22:21:44.252Z"
  }],
  "aggregate": [{
    "metricName": "com.google.heart_minutes.summary",
    "floatValue": 6.0
  }, {
    "metricName": "com.google.calories.expended",
    "floatValue": 185.27497753500938
  }, {
    "metricName": "com.google.step_count.delta",
    "intValue": 2627
  }, {
    "metricName": "com.google.distance.delta",
    "floatValue": 1706.2184907197952
  }, {
    "metricName": "com.google.speed.summary",
    "floatValue": 0.9639261780371223
  }, {
    "metricName": "com.google.active_minutes",
    "intValue": 35
  }]
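Even without location data, a little can be derived from a session file like this. The sketch below works out how long each walking segment lasted and the average distance per step. It assumes the distance is in metres, which the file itself does not say, and the helper names are my own.

```python
from datetime import datetime

def parse_utc(timestamp):
    # Swap the trailing "Z" for an explicit offset so older Pythons can parse it.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))

def segment_seconds(segment):
    """Length of one activity segment in seconds."""
    return (parse_utc(segment["endTime"]) - parse_utc(segment["startTime"])).total_seconds()

def aggregate_value(aggregates, metric_name):
    """Look up one metric, whether it is stored as floatValue or intValue."""
    for entry in aggregates:
        if entry["metricName"] == metric_name:
            return entry.get("floatValue", entry.get("intValue"))
    return None

# Values copied from the excerpt above.
segments = [
    {"startTime": "2019-04-07T22:16:47.507Z", "endTime": "2019-04-07T22:17:39.102Z"},
    {"startTime": "2019-04-07T22:19:52.060Z", "endTime": "2019-04-07T22:21:44.252Z"},
]
aggregates = [
    {"metricName": "com.google.step_count.delta", "intValue": 2627},
    {"metricName": "com.google.distance.delta", "floatValue": 1706.2184907197952},
]

for s in segments:
    print(f"walked for {segment_seconds(s):.1f} seconds")
steps = aggregate_value(aggregates, "com.google.step_count.delta")
distance = aggregate_value(aggregates, "com.google.distance.delta")
print(f"{distance / steps:.2f} (assumed metres) per step")
```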

There are also 2,383 “daily activity metrics” files from 2015–2022, which break up my fitness data into 15-minute chunks.

Google Pay

In an HTML document for some reason, a list of all 4,492 times that I have used Google Pay to pay contactless. Most have the amounts listed, and a location in lat/lon only. There is no other data about where I spent the money (not the store).

Location History

I should point out, I have made strenuous efforts to ensure that Google keeps this information. For most people, I believe it auto-deletes.

It has a “Semantic Location History” folder, which includes a separate file for each month. The first few months, from 2009, are rather fine, since I’m on a round-the-world trip: I see places in San Francisco, Toronto and Thailand. The locations would have been collected using my iPhone: I didn’t have roaming data back then, so the data would have been stored and sent when I connected to wifi.

Here’s an example: a walk to the local coffee shop in the afternoon of 1 Aug 2015. It’s reasonably confident that I went there and stayed for an hour.

, {
    "placeVisit": {
      "location": {
        "latitudeE7": 516327448,
        "longitudeE7": -1279165,
        "placeId": "ChIJMTIk2hoZdkgRoUTmfCnZrD4",
        "address": "10 Chase Side, London, Southgate N14 5PA, United Kingdom",
        "name": "Harris and Hoole Coffee Shop",
        "locationConfidence": 63.52416
      },
      "duration": {
        "startTimestamp": "2015-08-01T15:11:19.997Z",
        "endTimestamp": "2015-08-01T16:13:28.205Z"
      },
      "placeConfidence": "MEDIUM_CONFIDENCE",
      "centerLatE7": 516324375,
      "centerLngE7": -1277414,
      "visitConfidence": 77,
      "otherCandidateLocations": [{
        "latitudeE7": 516327582,
        "longitudeE7": -1279884,
        "placeId": "ChIJ1-3StQYZdkgRfPpRqn40JBE",
        "address": "12 Chase Side, London N14 5PA, UK",
        "name": "The Charcoal Grill",
        "locationConfidence": 11.794518
      }, {
        "latitudeE7": 516327654,
        "longitudeE7": -1280680,
        "placeId": "ChIJif5N5hoZdkgRvi-DpqMubZ0",
        "address": "14 Chase Side, London N14 5PA, UK",
        "name": "Subway",
        "locationConfidence": 10.715063
      }, {
        "latitudeE7": 516325808,
        "longitudeE7": -1283597,
        "placeId": "ChIJoWOmwhoZdkgRbKe_F65qeh0",
        "address": "3 Chase Side, London N14 5PB, UK",
        "name": "Claud W Dennis Coffee",
        "locationConfidence": 2.636393
      }, {
        "latitudeE7": 516326005,
        "longitudeE7": -1285949,
        "placeId": "ChIJT4IwwxoZdkgRNmw0JUljAV0",
        "address": "7 Chase Side, London N14 5BP, UK",
        "name": "Maze Inn",
        "locationConfidence": 2.4405143
      }, {
        "latitudeE7": 516326760,
        "longitudeE7": -1287510,
        "placeId": "ChIJ1-3StQYZdkgRFRGzvf979AU",
        "address": "11 Chase Side, London N14 5BP, UK",
        "name": "Southgate Fish and Chips",
        "locationConfidence": 1.4867097
      }, {
        "latitudeE7": 516323200,
        "longitudeE7": -1277900,
        "placeId": "ChIJP88g7BoZdkgRVfH6ywn7tkw",
        "address": "High St N, London N14 5BH, UK",
        "name": "Southgate Station",
        "locationConfidence": 1.3347566,
        "isCurrentLocation": true
      }, {
        "latitudeE7": 516323200,
        "longitudeE7": -1277900,
        "placeId": "ChIJP88g7BoZdkgRVfH6ywn7tkw",
        "address": "High St N, London N14 5BH, UK",
        "name": "Southgate Station",
        "locationConfidence": 1.192504,
        "isCurrentLocation": true
      }, {
        "latitudeE7": 516327940,
        "longitudeE7": -1265178,
        "placeId": "ChIJRRUqXxoZdkgRPk3uU3yTheQ",
        "name": "Caf Bravo",
        "locationConfidence": 0.5965895
      }, {
        "latitudeE7": 516329280,
        "longitudeE7": -1288418,
        "placeId": "ChIJ1-3StQYZdkgRNBTjLfs_s88",
        "address": "40 Chase Side, London N14 5PA, UK",
        "name": "Oxfam Shop",
        "locationConfidence": 0.5554932
      }],
      "editConfirmationStatus": "NOT_CONFIRMED",
      "locationConfidence": 58,
      "placeVisitType": "SINGLE_PLACE",
      "placeVisitImportance": "MAIN"
    }
  },
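The latitudeE7 and longitudeE7 fields look like ordinary coordinates multiplied by ten million, and the two timestamps give the length of the visit. Here is a short sketch under that assumption (the "E7" reading is my interpretation of the field names, and the function names are my own):

```python
from datetime import datetime

def e7_to_degrees(value_e7):
    # Assumption: "E7" means the coordinate is stored as degrees times 10^7.
    return value_e7 / 1e7

def visit_minutes(place_visit):
    """Length of a placeVisit in minutes, from its duration timestamps."""
    duration = place_visit["duration"]
    start = datetime.fromisoformat(duration["startTimestamp"].replace("Z", "+00:00"))
    end = datetime.fromisoformat(duration["endTimestamp"].replace("Z", "+00:00"))
    return (end - start).total_seconds() / 60

# The coffee shop visit from the excerpt above, trimmed to the fields used here.
visit = {
    "location": {"latitudeE7": 516327448, "longitudeE7": -1279165},
    "duration": {
        "startTimestamp": "2015-08-01T15:11:19.997Z",
        "endTimestamp": "2015-08-01T16:13:28.205Z",
    },
}
lat = e7_to_degrees(visit["location"]["latitudeE7"])
lon = e7_to_degrees(visit["location"]["longitudeE7"])
print(f"{lat:.7f}, {lon:.7f} for {visit_minutes(visit):.0f} minutes")
```

Storing coordinates as integers like this avoids floating-point rounding in the export; dividing by ten million recovers a value that can be pasted into a map.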

Nothing in here comes as any surprise at all — it’s visible anyway on the timeline view in Google Maps. Even that journey in 2015.

Here’s where you control your location history, incidentally.

Ads

A large HTML file containing 1,319 webpages I’ve visited “from Google Ads”. The first, from 20 Jan 2007, is a paid click (using DoubleClick) to a UK transport website. Looking through them, they seem to be any pages I’ve clicked in an ad search result from Google, or any page that has been suggested to me from the Google Now feed (swiping to the left on Android).

Sometimes it also keeps my search queries: “Searched for 2005 Apple iBook White Laptops White Apple Laptops” was something I did three days ago. It has kept 105 searches.

Sometimes, too, it just notes when I’ve used an app on my phone: Waze, or Woolworths, or the Qatar Airways app, or a game or two, or even just visited a website.

This is possibly the most invasive bit I’ve discovered here: data going back to 2007. Since it’s in HTML, it’s hard to look through, too. You can turn this off.

Android

Another large HTML file. This seems to store whenever I open an app. Here’s a view inside it. Again, you can apparently turn this off using the link above.

Again, not entirely sure how I feel about this.

Nest

This appears to have a thing called home_away_assist which seems to contain data on when I’ve left the house, and when I’ve come home again. You’d probably expect that, but it’s still a bit strange to see it.

Other things of interest

Blogger: Atom files containing comments I’d made on other blog posts.

Google Voice: In 2010, I managed to sign up for a (US) Google Voice account. It’s got the records of five calls I made or received — but nothing other than the number (or name); the duration; and whether it was placed or received.

Google Chat: a few sets of messages with people who used this service to chat directly, dating from 2016.

Google Maps: A full list of the photos I contributed to the project while being a Local Guide. Some come with a JSON file showing the lat/lon and altitude, and how many views they’ve had on the service.

Google Keep: every note I’ve made on Google Keep, all in HTML with some in JSON.

Google Groups: .mbox files for three groups which I “own”.

Google Play Store: in a JSON file, a list of all 1,456 times I installed an app since 2016. Also, the 169 times I bought something from the Play Store since 2016, and the 54 reviews of apps I’ve made since 2016.

. . .

In summary

There are a few things that are interesting in all this Google data.

First: the amount of careful repurposing into an HTML file. This makes the data much more unwieldy to actually use. The HTML file of my Google Search history is so large, it’s almost impossible to work with other than using command-line tools like grep and tail. You can’t help feeling that this is on purpose.

Second: the lack of any algorithmically-generated data is interesting. Google has guesses about me which it uses to target ads to me — you can see those algorithmic guesses here. But none of that data appears in the Takeout data that I can see. I guess it isn’t “my” data.

I’d not have considered that Google keeps note of every time I open an app on my phone. I’m fascinated that a data corpus of 121,000 searches from up to 17 years ago is of any use to anyone.

There’s a lot of data here that I’ve knowingly been allowing Google to collect; and while I don’t feel very spooked out by any of it, it’s certainly making me think a bit. Do I really want Google keeping all this data?

Hmm. Interesting.