James Cridland

What does Twitter know about me?

A month or so ago, I discovered that I could download a big data file from Spotify to learn more about what the company knew about me.

And recently I learnt I can do the same from Twitter. So, I requested the data. It took a day or so to be compiled, and was a 162 MB file.

Opening it showed me a pretty little HTML file with this in it:

A few things about how I use Twitter, before we go on: I’ve been with Twitter since 2006, but deleted all tweets before about 2019. I also spent six months without a personal Twitter account for mental health reasons earlier in 2019. I use a variety of accounts, but this is only for my personal @jamescridland name. Twitter is installed on my Android phone, and also on an iPod Touch, as well as on my desktop Mac.

The reason why there are 505 blocked accounts in the list above? I tend to block all irrelevant, crappy advertisers. It seems to work. Try it.

With that out of the way: the “quick stats” section of the above HTML leads to pages containing relevant tweets. Here’s the top of the “likes” page, which is all on my desktop — a little odd to notice that it lists the tweet’s contents, but not the author.

And that’s nice, and all, but I was rather more interested in the entire contents of the data folder, which looks like… this.

There’s a super-helpful README.txt in here too, which contains details of every file, what they are and what they contain.

Because there are so many files in here, I’ll not look at all of them, but I’ll have a peek at some that look vaguely interesting.

“account-creation-ip” tells me the IP address I used when I signed up, which appears to (now) belong to the BBC. That’s curious, since “account” tells me I signed up on 10 December 2006 in the early afternoon, a day when I was most certainly working for Virgin Radio.

“app.js” is quite scary. The README file says:

- appId: Identifier of the app Twitter believes may be installed on devices associated with the user.
- appNames: Name of the app Twitter believes may be installed on devices associated with the user.

…and it is a list of some of the apps I have installed on my phone, including my bank, Audible (which I deleted a while back), Spotify, and Uber among others. I’m not very happy about seeing this, I’ll be frank: it seems invasive.
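For anyone wanting to check their own archive, here is a minimal Python sketch of reading a file like this one. It assumes each `.js` file is a JSON array sitting behind a JavaScript assignment (something like `window.YTD.app.part0 = [...]`) and that each record nests its fields under an `app` key. That prefix, the nesting, the helper name `load_archive_js` and the sample record are all assumptions for illustration; only `appId` and `appNames` come from the README.

```python
import json


def load_archive_js(text):
    """Parse the text of an archive .js file into Python objects.

    Assumption: the file is a JSON value preceded by a JavaScript
    assignment such as `window.YTD.app.part0 = `.
    """
    # Treat everything from the first '[' or '{' onwards as JSON.
    starts = [i for i in (text.find("["), text.find("{")) if i != -1]
    if not starts:
        raise ValueError("no JSON payload found")
    payload = text[min(starts):].rstrip().rstrip(";")
    return json.loads(payload)


# Invented sample data; a real file would be read from disk instead.
sample = 'window.YTD.app.part0 = [ {"app": {"appId": "123", "appNames": ["Example Bank"]}} ]'
records = load_archive_js(sample)

# Collect every app name Twitter believes is installed.
app_names = sorted(name for r in records for name in r["app"]["appNames"])
print(app_names)
```

If the real file is laid out differently, only the two key names in the last few lines should need changing.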

“connected-application” contains all those websites that I’ve connected my Twitter account with. I’ve got 98 in here, 74 of which have write access to my account. The first website I appear to have connected with is Disqus, in 2009.
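Counting how many connected applications can write to an account is easy to script. The sketch below assumes the records have already been parsed into Python dictionaries, that each sits under a `connectedApplication` key, and that it carries a `permissions` list containing strings such as `"read"` and `"write"`. Those key names and the sample entries are assumptions, not taken from the README.

```python
def count_write_access(records):
    """Return (total apps, apps whose permissions include 'write')."""
    # Assumed layout: {"connectedApplication": {"permissions": [...]}}
    apps = [r["connectedApplication"] for r in records]
    writers = [a for a in apps if "write" in a.get("permissions", [])]
    return len(apps), len(writers)


# Invented sample data for illustration only.
sample = [
    {"connectedApplication": {"name": "Example Comments", "permissions": ["read", "write"]}},
    {"connectedApplication": {"name": "Example Stats", "permissions": ["read"]}},
    {"connectedApplication": {"name": "Example Scheduler", "permissions": ["read", "write"]}},
]
total, with_write = count_write_access(sample)
print(total, with_write)
```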

“contact.js” is a big list of 1,795 of my contacts’ email addresses and phone numbers (but no names, other than what you can see in the email addresses). It’s this that is shared with Twitter if you ever agree to sharing your contacts so you can find your friends. Judging by the data in here, I last did that in about 2008. I’m surprised Twitter has kept this data for so long: it seems against the Data Protection Act rules in the UK, certainly, which require that data like this is kept up to date.

Compare the above with “device-token”, which contains all the devices I’ve logged into Twitter from over the last 18 months. That would appear to be a little better in terms of privacy.

“ip-audit” is a bit crazy, too: 2,072 different IPv4 addresses that I’ve logged into Twitter from. The oldest in this list is from 6 September 2021 (so this is 60 days of data). The IPv4 data in here includes my home IP address, my mobile operator, but also many random addresses in Amazon AWS’s Virginia `us-east` datacentres: I don’t run code there, so I’m presuming it’s some of the connected applications above.
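The two figures in that paragraph (how many distinct addresses, and how many days the list covers) can be worked out with a few lines of Python. This is a sketch under assumptions: it supposes each parsed record sits under an `ipAudit` key with `loginIp` and `createdAt` fields, and that timestamps look like `2020-01-01T08:00:00.000Z`. None of those names or formats are confirmed by the text above, and the sample records use reserved documentation addresses.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "2020-01-01T08:00:00.000Z".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def summarise_ip_audit(records):
    """Return (distinct IP count, oldest login, newest login)."""
    ips = set()
    times = []
    for record in records:
        entry = record["ipAudit"]  # assumed key name
        ips.add(entry["loginIp"])
        times.append(datetime.strptime(entry["createdAt"], TIMESTAMP_FORMAT))
    return len(ips), min(times), max(times)


# Invented sample data for illustration only.
sample = [
    {"ipAudit": {"loginIp": "192.0.2.10", "createdAt": "2020-01-01T08:00:00.000Z"}},
    {"ipAudit": {"loginIp": "192.0.2.10", "createdAt": "2020-01-15T09:30:00.000Z"}},
    {"ipAudit": {"loginIp": "198.51.100.7", "createdAt": "2020-03-01T12:00:00.000Z"}},
]
count, oldest, newest = summarise_ip_audit(sample)
days_covered = (newest - oldest).days
print(count, oldest.date(), days_covered)
```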

“user-link-clicks” contains all the URLs I’ve clicked on in the last 30 days in Twitter. Not that many, it seems.

And then there’s the stuff around advertising and personalisation…

“ad-online-conversions-attributed” and similar are described as “Web events associated with the account in the last 90 days which are attributable to a Promoted Tweet engagement on Twitter.” They’re all empty; this may be because I run quite a fierce network block on my phone and iPod.

“ad-engagements” is apparently “Promoted Tweets the account has engaged with and any associated metadata.” This is a large file, with 2,118 ads that I’ve engaged with, and 12,633 different engagements. Here’s the data of one of them.

…so this is quite interesting: you can see Westpac, an Australian bank most famous for facilitating child abuse, targeted me because I’m in Australia, I’m over 18, and I must have interacted with a tweet with a hashtag of #property for some reason. You can see that I must have viewed enough of a video here to have made an “engagement”. I don’t recall it.

Here’s Seek, an online jobs website, and I’m being targeted here because I visited their website apparently (not entirely sure I did), and Twitter even has a customer ID here from Seek.

Here, Amazon Prime Video is targeting me because I follow ABC Media Watch, a TV show.

…and here’s a WordPress plugin company (I don’t use WordPress) targeting me because I look as though I might be someone who might follow WordPress on Twitter (I don’t).

It’s all quite fascinating stuff, I’ll be honest.

“ad-impression” is more of the same, but also shows the ones I didn’t engage with. It’s almost identical otherwise, and shows all kinds of targeting.

And finally, “personalisation” is:

  • a list of languages it thinks I speak (English, French, Italian and Haitian Creole for some reason),
  • 571 interests that it thinks I have (“Travel news and general info”, yes, “Personal finance”, yes, “Peppa Pig”, no, “Google Brand Conversation”, I guess so, “Ed Vaizey”, no not really, “Bolton Wanderers”, absolutely not),
  • A list of “advertisers”, (“List of screennames for the advertisers that own the tailored audiences the account is a part of”), most of whom I don’t recognise, but including…
  • … a very big list of “lookalikeAdvertisers” like that.
  • A list of shows it thinks I watch. What’s Ink Master? Yes to BBC News, no to Victorious, yes to The Last Leg, no to WWE Monday Night RAW.
  • and finally an inferred age range, in case I hadn’t put my own birthdate in there. It has inferred that I am “13–54”. I can’t really quibble with that, but then, nor can most people in the world!

In summary

Much of this is benign. Some is quite interesting, especially how advertisers have targeted me.

Some, though, is a bit concerning. Why does Twitter still have my contact data from 12 years ago? And most importantly: why has Twitter wandered through my phone to work out what apps I use?