James Cridland

Odd podcast downloads

Figures from OP3 which are both great and bad

OP3 (https://op3.dev/show/55f8007aec094899a02fd44273aa6558) has recently been showing massively high numbers for Podnews Daily. As above, episodes that normally get about 2,500 downloads are suddenly getting 49,000. But it’s not all good news.

  • Stats for March 1 show 51% of downloads from Malaysia and a further 38% from Indonesia.
  • Top device for March 1: Android Phone (97.8% of all downloads).
  • Seemingly, the only user-agent being used is Dalvik.

Now, all of these audio downloads appear to be the AAC version, and are tagged with ?_from=googleaudionews, which means they’ve come from a special secret RSS feed that I serve to the systems that run Google News’s audio service (“Hey, Google, play the latest news”). Nobody would be asking for this file unless they’d seen the Google News RSS feed that I’ve given to Google (and Google alone).
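A minimal sketch of how a per-feed tag like this can work, assuming the feed generator appends a `_from` query parameter to each enclosure URL (the parameter name matches the request URLs quoted later in this post; the function names and the example.com URL are hypothetical, since the actual script isn’t shown). Any request arriving with that tag must have got the URL from that one feed.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_enclosure_url(url: str, source: str, param: str = "_from") -> str:
    """Return url with a source-tracking query parameter added (or replaced)."""
    parts = urlsplit(url)
    # Keep any existing query parameters, dropping an older tag of the same name
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, source))
    return urlunsplit(parts._replace(query=urlencode(query)))


def source_of(url: str, param: str = "_from") -> Optional[str]:
    """Read the tag back out of a requested URL; None if the request is untagged."""
    return dict(parse_qsl(urlsplit(url).query)).get(param)
```

Changing the `source` value (say, to "googleaudionews1") is then enough to confirm which copy of the feed a new wave of requests came from.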

I’ve checked the logs for the URL of that special RSS feed. For the entire month of February, I don’t see anything out of the ordinary - the only thing asking for that secret RSS feed is Googlebot, from an IP range that appears to be owned by Google. (Apart from one listener on Overcast who found it there. Weird. Anyway, that’s fine.)

So, this must somehow be coming from Google News. But, again, I don’t know how; I only know that the Google News secret RSS feed is the only way that we’d add ?_from=googleaudionews to the end of the audio URL.

So, to the audio files. Here are a day’s server logs for the audio files. Spot the ones saying ?_from=googleaudionews in this big list. I’ve removed the IP addresses for privacy reasons, but “Location” shows the Amazon CloudFront point that the user has connected to, which is usually the one closest to where they physically are; you can see plenty of KUL (Kuala Lumpur, Malaysia), SIN (Singapore), and CGK (Jakarta, Indonesia). All of the listeners are using normal IP addresses that look like residential Indonesian or Malaysian ones.
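Counting those edge locations by hand gets tedious. A sketch, assuming CloudFront’s standard access-log format (tab-separated rows described by a `#Fields:` header line, with edge locations like `KUL50-C1` whose first three letters are an airport code); `edge_counts` is a hypothetical helper, not part of any AWS tool.

```python
from collections import Counter


def edge_counts(lines, tag="googleaudionews"):
    """Count tagged requests per CloudFront edge airport code (e.g. KUL, SIN, CGK)."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # Column names come from the log file's own header line
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue
        row = dict(zip(fields, line.split("\t")))
        if tag not in row.get("cs-uri-query", ""):
            continue  # not a request for the Google News copy of the audio
        counts[row["x-edge-location"][:3]] += 1
    return counts
```

Feeding a day’s log lines through this gives a per-city tally of the tagged downloads only, which is easier to compare day to day than a screenshot.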

All of them have “Dalvik” as their user-agent (that’s the generic user-agent an Android phone sends if the developer doesn’t set one), but they’re all running on different phones. Here’s a sample: a mix of Android 13, 14 and 15 (and some even using Android 8!). The latest version of Android is v15.

(The other interesting thing about these audio downloads is that they’re also all HTTP/1.1 connections. It seems to be the case that some old Android APIs internally only support HTTP/1.1 - notably the HttpClientHandler class.)

Dalvik/2.1.0 (Linux; U; Android 13; 220333QAG Build/TKQ1.221114.001)
Dalvik/2.1.0 (Linux; U; Android 14; V2248 Build/UP1A.231005.007)
Dalvik/2.1.0 (Linux; U; Android 14; ALI-NX1 Build/HONORALI-N21)
Dalvik/2.1.0 (Linux; U; Android 14; RMX3997 Build/UP1A.231005.007)
Dalvik/2.1.0 (Linux; U; Android 15; FCP-N49 Build/HONORFCP-N49)
Dalvik/2.1.0 (Linux; U; Android 15; CPH2437 Build/AP3A.240617.008)
Dalvik/2.1.0 (Linux; U; Android 15; CPH2581 Build/AP3A.240617.008)
Dalvik/2.1.0 (Linux; U; Android 8.1.0; DUA-L22 Build/HONORDUA-L22)
Dalvik/2.1.0 (Linux; U; Android 14; CRT-NX1 Build/HONORCRT-N31)
Dalvik/2.1.0 (Linux; U; Android 8.1.0; CPH1803 Build/OPM1.171019.026)
Dalvik/2.1.0 (Linux; U; Android 15; CPH2607 Build/SP1A.210812.016)
Dalvik/2.1.0 (Linux; U; Android 14; LLY-NX1 Build/HONORLLY-N31)
Dalvik/2.1.0 (Linux; U; Android 15; ELP-NX9 Build/HONORELP-N39)
Dalvik/2.1.0 (Linux; U; Android 13; RMX3491 Build/RKQ1.211119.0
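To tally the Android versions and handset models behind strings like these, a regular expression is enough. This is a sketch that assumes the `Dalvik/x (Linux; U; Android N; MODEL Build/...)` shape seen above; user-agents that don’t fit it are simply ignored.

```python
import re
from collections import Counter

# Default Android HTTP user-agent, e.g.
# "Dalvik/2.1.0 (Linux; U; Android 14; RMX3997 Build/UP1A.231005.007)"
DALVIK_UA = re.compile(r"^Dalvik/[\d.]+ \(Linux; U; Android ([\d.]+); (.+?) Build/")


def parse_dalvik(ua):
    """Return (android_major_version, model), or None if ua isn't a default Dalvik string."""
    m = DALVIK_UA.match(ua)
    if not m:
        return None
    version, model = m.groups()
    return version.split(".")[0], model


def versions(uas):
    """Tally Android major versions across a list of user-agent strings."""
    return Counter(p[0] for p in map(parse_dalvik, uas) if p)
```

The lazy `(.+?)` lets model names containing spaces (such as "vivo 1807") still parse, and a line cut off after `Build/` still matches.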

Some of those files are also being downloaded by things like this:

Mozilla/5.0 (Fuchsia) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 CrKey/1.56.500000

… now, this is a Google Nest speaker, the fancy one with a screen, I happen to know (because that’s what uses Fuchsia). This seems a legitimate Google News client.

Also, these:

Mozilla/5.0 (Linux; Android 8.1.0; vivo 1807 Build/OPM1.171019.026; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/111.0.5563.58 Mobile Safari/537.36 GSA/14.14.16.26.arm64
Mozilla/5.0 (Linux; Android 9; SM-A505F Build/PPR1.180610.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/77.0.3865.92 Mobile Safari/537.36 GSA/13.48.11.26.arm64
Mozilla/5.0 (Linux; Android 7.1.1; CPH1801 Build/NMF26F; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/119.0.6045.193 Mobile Safari/537.36 GSA/13.20.11.23.arm64

These are from the Google app, which is pre-installed on Android phones: spot the GSA/ at the end. Again, I’d be comfortable seeing a Google News audio request coming from something like this.

But I don’t think the others are. A standard Dalvik call on its own isn’t that helpful. (Dalvik is the thing that runs Android apps, and it’s similar to AppleCoreMedia: it’s the name given by the device when the developer hasn’t bothered to add a name of their own.)
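The three kinds of client above can be told apart by a token in the user-agent. A rough heuristic sketch: `CrKey/` appears in Chromecast-family devices (like the Fuchsia string above), ` GSA/` marks the Google app’s webview, and a bare `Dalvik/` prefix means an app that never set its own name. The bucket names are made up for illustration.

```python
def classify_ua(ua: str) -> str:
    """Rough bucket for a podcast request's user-agent (heuristic, not exhaustive)."""
    if "CrKey/" in ua:
        return "cast-device"      # Chromecast / Nest display family
    if " GSA/" in ua:
        return "google-app"       # Google app (Google Search App) webview
    if ua.startswith("Dalvik/"):
        return "generic-android"  # default Android HTTP stack; app didn't name itself
    return "other"
```

The order matters only in that the more specific Google tokens are checked before the catch-all Dalvik prefix.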

I’m confused. Is this really Google traffic? If so, why are only users in Malaysia and Indonesia affected, and why is it suddenly happening now?

And if it isn’t Google traffic, how are they getting hold of the special feed’s details, when they’re not asking for it?

As of 4.30pm my time, I’ve changed the tag that my script adds to the audio URL to read “googleaudionews1”; at least that might help work out whether it is being re-ingested. I can put anything in there; I wonder what else might be interesting.

I don’t see any referrers in either set of logs.

So… what’s going on? Let me know. I’m most confused.

Day two

So, as above, I changed the googleaudionews code to a slightly different one, just so we’d know for certain whether these downloads come from the secret RSS feed that Google News uses. And they do: all these downloads are now marked as “googleaudionews1”.

So, all these downloads are coming from the secret RSS feed that only Google News uses - and, as I can see from the logs, only Google consumes.

Another slew of logs

The above is the start of the avalanche of downloads. First, at 11:27:00, a single download from Dallas, as Googlebot-News Audio grabs the audio. Then, five seconds later, the start of a deluge of downloads from Malaysia and Indonesia, with “Dalvik” as the user agent. There is no “Dalvik” until the Googlebot has been to visit. So, more proof tying it to Google.
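The “Googlebot first, Dalvik five seconds later” pattern can be checked mechanically rather than by eye. A sketch, assuming each log row has been reduced to a `(HH:MM:SS, user_agent)` pair for a single day, and that the crawler’s user-agent contains the substring “Googlebot” (an assumption; the full string isn’t quoted here). `first_follow_on` is a hypothetical helper.

```python
from datetime import datetime


def first_follow_on(events, crawler="Googlebot", follower="Dalvik"):
    """Return (seconds from first crawler fetch to first follower request, followers seen before the crawl).

    events: iterable of ("HH:MM:SS", user_agent) pairs from one day's log.
    Returns None if the crawler never appears.
    """
    parsed = sorted((datetime.strptime(t, "%H:%M:%S"), ua) for t, ua in events)
    crawl_time = next((t for t, ua in parsed if crawler in ua), None)
    if crawl_time is None:
        return None
    # Any follower requests before the crawl would weaken the link to Google
    before = sum(1 for t, ua in parsed if follower in ua and t < crawl_time)
    follow = next((t for t, ua in parsed if follower in ua and t >= crawl_time), None)
    gap = None if follow is None else (follow - crawl_time).total_seconds()
    return gap, before
```

A result of `(5.0, 0)` would match what the logs show: no Dalvik requests at all until the crawler has fetched the file, then a flood a few seconds later.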

I can also tell you that the IP addresses (hidden in the above screenshot) all belong to residential ISPs. I’m really pleased that so many Indonesians and Malaysians are listening to a scrappy podcast about podcasting news, but I doubt that they really are.

In the meantime, I’ve taken the M4A that I’m creating for Google and Siri, and changed it from a 128kbps/44.1kHz (6.8MB) file to a 32kbps/24kHz (1.9MB) file. If I’m getting 20x the downloads, the least I can do is produce a 3x smaller file. (To be honest, it sounds fine: AAC HE v2 in mono is actually quite decent at this low a bitrate.)
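The saving is easy to estimate. A back-of-envelope sketch using the figures in this post (49,000 downloads a day, 6.8MB versus 1.9MB files), assuming every download fetches the whole file; range requests or abandoned downloads would make the real numbers lower.

```python
def daily_egress_gb(downloads: int, file_mb: float) -> float:
    """Approximate data served per day, assuming each download fetches the whole file."""
    return downloads * file_mb / 1000  # MB -> GB, decimal units


old = daily_egress_gb(49_000, 6.8)  # 128kbps/44.1kHz M4A
new = daily_egress_gb(49_000, 1.9)  # 32kbps/24kHz M4A
print(f"{old:.1f} GB -> {new:.1f} GB per day ({old / new:.1f}x less)")
# prints: 333.2 GB -> 93.1 GB per day (3.6x less)
```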

The mystery continues! Hit refresh here tomorrow to see more…

Days three and four

A small delay while I deal with a power cut and preparations for a cyclone…

One reader tells me that with his Pixel 6a:

If you use Google Assistant and ask “listen to the news” and have Podnews configured as one of your news sources, it issues a single GET request (for podnews250304.mp3?_from=googleaudionews1) with the ua: Dalvik/2.1.0 (Linux; U; Android 14; Pixel 6a Build/AP2A.240605.024)

This was a good idea, and I should have thought of it. So, I am currently recharging my old Google Pixel 6 Pro on my desk, to see if I can also replicate this.

However, this support page from Google suggests that news briefings on Assistant are available in Indonesia, but not in Malaysia. (The message about COVID-19 at the top suggests this page is quite outdated, though.) Perhaps in these countries Podnews is the default provider?

Is this a red herring? But the only place this filename (including the ?_from bit) is referenced is in a secret RSS feed for Google News, which only Google News is accessing.

Downloads over time

More to the point (and rather more worrying), this graph appears to show that downloads are not user-initiated. There is a constant number of downloads every hour, and the Friday episode gets downloaded in a similar way over Saturday and Sunday. I can’t think of a user journey that would produce a consistent level of downloads like this (especially in only two countries). If this is initiated by some Google process, it would seem to be entirely automated.

My plan to serve a 32kbps AAC file would have worked better had the script not silently failed; instead, I’ve been serving an MP3 version. That’s now fixed, at least. Clients should now download a significantly smaller file, which should help with the costs of this thing.

Anyway. I’m still none the wiser, but I think it’s a) from Google somehow; b) automated; c) expensive.

Day 30

After hearing nothing more from Google, and prodding them a bit during the month, I’ve ended up with a $200 additional bill for podcast hosting for this month (one of the downsides of self-hosting).

Google hasn’t been very co-operative. Indeed, given that they’ve not asked me anything about this, other than promising that they’re looking into it, I think they’re just stringing me along, hoping I’ll forget about it.

So, I’ve published a lead story in Podnews today. I don’t really want to, and it’s a niche thing given that virtually nobody is in Google News audio briefings, but it’s worthwhile to at least see if we can get some answers. We’ve had precious little so far.

Day 65

I’ve heard nothing more from Google, in spite of emailing them a few times. But, I’ve kept looking at this.

Is it really Google News?

On April 22 I made a further amendment to the Google News private RSS feed - to stamp the audio files in that feed with the IP address of the machine requesting the Google News feed. So, if the Google News RSS feed is being requested by 192.168.0.1, the resulting audio file being requested will be called something like audio.mp3?_from=googleaudionews192.168.0.1 which I can then see in my logs.
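Reading the stamped IP back out of the audio requests is a small parsing job. A sketch, assuming the tag is exactly the string `googleaudionews` followed by the feed requester’s IP in the `_from` parameter, as in the example above; `feed_requester` and `tally` are hypothetical helpers. Older tags such as “googleaudionews1” are rejected because “1” isn’t a valid IP address.

```python
import ipaddress
import re
from collections import Counter
from urllib.parse import parse_qsl

# "googleaudionews" immediately followed by the IP of whoever fetched the feed
TAG = re.compile(r"^googleaudionews(?P<ip>.+)$")


def feed_requester(query: str):
    """Return the IP stamped into a request's _from tag, or None if absent or invalid."""
    value = dict(parse_qsl(query)).get("_from", "")
    m = TAG.match(value)
    if not m:
        return None
    try:
        return ipaddress.ip_address(m.group("ip"))
    except ValueError:
        return None  # e.g. the earlier "googleaudionews1" tag


def tally(queries):
    """Count audio requests per feed-requesting IP."""
    return Counter(str(ip) for ip in map(feed_requester, queries) if ip is not None)
```

Running `tally` over a day’s query strings shows which feed fetchers the downloads trace back to.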

It appears that the machines requesting the private RSS feed that I give to Google News are populating both the “real” Google News audio briefing plays and these automated requests in Indonesia and Malaysia. Looking through the logs, the IP addresses of those machines are:

66.249.91.34 rate-limited-proxy-66-249-91-34.google.com (in California)
66.249.91.36 rate-limited-proxy-66-249-91-36.google.com (in California)
66.249.91.38 rate-limited-proxy-66-249-91-38.google.com (in California)
66.249.69.65 crawl-66-249-69-65.googlebot.com (in California)
66.249.69.71 crawl-66-249-69-71.googlebot.com (in California)

From this data, the source of these downloads is provably the same set of machines that populates the Google News audio briefing.
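Hostnames like these can be confirmed as genuinely Google’s with a reverse-then-forward DNS check: look up the PTR name for the IP, check it ends in a Google domain, then resolve that name and make sure it points back to the same IP (otherwise anyone could publish a fake PTR record). A sketch using Python’s `socket` module; the domain list reflects my understanding of Google’s crawler-verification guidance and should be checked against their current documentation. The resolvers are parameters so the logic can be tested without network access.

```python
import socket

GOOGLE_DOMAINS = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_google_host(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check that an IPv4 address belongs to Google."""
    try:
        host = reverse(ip)[0]
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not host.endswith(GOOGLE_DOMAINS):
        return False
    try:
        addrs = forward(host)[2]
    except OSError:
        return False
    # The forward lookup must point back at the same IP, or the PTR could be spoofed
    return ip in addrs
```

Calling `is_google_host("66.249.91.34")` with the default resolvers performs live DNS lookups.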

Is it really automated?

I have been guessing that it’s automated, given the suspiciously constant number of requests. However, looking only at the “Dalvik” data, it appears to follow the ebb and flow of a typical day in the Malaysian time zone, with downloads slowing between 23h and 05h.

However, Malaysia and Indonesia don’t appear to be popular locations for our newsletter subscribers: https://podnews.net/about/subscriber-countries shows fewer than 100 subscribers for each, yet apparently I have 970,000 listeners there according to this data. I’ve also made a few verbal callouts for anyone listening in the two countries to get in touch, and nobody has. Plus, the podcast is a glorified advert for the newsletter, with a callout for it at the top and the end of the show, and copious other promotions of it in the audio. (That’s the strategy, after all!)

I still believe that the download is not being listened-to, and is being kicked off automatically by users who are doing something else with their Android phones. What seems clear to me is that it isn’t an alarm - given the spread of downloads through the day and through the hour.

Downloads per hour

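Kuala Lumpur time (UTC) - downloads
08h (00h UTC) - 2,999
09h (01h UTC) - 2,347
10h (02h UTC) - 1,903
11h (03h UTC) - 1,914
12h (04h UTC) - 2,092
13h (05h UTC) - 2,183
14h (06h UTC) - 2,160
15h (07h UTC) - 2,182
16h (08h UTC) - 2,453
17h (09h UTC) - 3,202
18h (10h UTC) - 2,992
19h (11h UTC) - 2,325
20h (12h UTC) - 2,061
21h (13h UTC) - 1,519
22h (14h UTC) - 1,071
23h (15h UTC) - 852
00h (16h UTC) - 457
01h (17h UTC) - 319
02h (18h UTC) - 209
03h (19h UTC) - 289
04h (20h UTC) - 588
05h (21h UTC) - 1,180
06h (22h UTC) - 2,897
07h (23h UTC) - 3,554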
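To put numbers on the “ebb and flow”, here is a short sketch that re-indexes the hourly table above by Kuala Lumpur time (UTC+8) and measures how much of the day’s traffic falls in the 23h to 05h lull. It only uses the figures in the table.

```python
# Downloads per hour from the table above, indexed by UTC hour 0..23
HOURLY_UTC = [2999, 2347, 1903, 1914, 2092, 2183, 2160, 2182, 2453, 3202, 2992, 2325,
              2061, 1519, 1071, 852, 457, 319, 209, 289, 588, 1180, 2897, 3554]
KL_OFFSET = 8  # Malaysia Time is UTC+8, no daylight saving


def local_profile(hourly_utc, offset=KL_OFFSET):
    """Re-index an hourly UTC series by local hour of day."""
    return {(utc + offset) % 24: n for utc, n in enumerate(hourly_utc)}


local = local_profile(HOURLY_UTC)
total = sum(HOURLY_UTC)
peak = max(local, key=local.get)
trough = min(local, key=local.get)
night = sum(local[h] for h in (23, 0, 1, 2, 3, 4, 5))  # 23h to 05h local

print(f"total {total:,}; peak {peak:02d}h ({local[peak]:,}); trough {trough:02d}h ({local[trough]:,})")
print(f"23h-05h share: {night / total:.1%} of downloads in {7 / 24:.0%} of the day")
# prints:
# total 43,748; peak 07h (3,554); trough 02h (209)
# 23h-05h share: 8.9% of downloads in 29% of the day
```

A flat, clock-driven schedule would put roughly 1,800 downloads in every hour; instead the busiest hour sees about 17 times as many as the quietest, which looks like people’s waking hours rather than a fixed timer.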

OP3 tells me that Podnews got 1,230,537 downloads in April; however, 90.5% of them came from Malaysia or Indonesia. I’m a bit bored of this, but I really don’t know what else to do.

Day 68

Android also offers “routines”, and perhaps that’s the cause. Routines are most often used for alarm clocks.

However, apparently, routines are only available in Indonesia, and not Malaysia. And Indonesia only supports routines in Indonesian, while Podnews Daily is most certainly in English.

But… if it were used in a routine like an alarm clock, I’d expect the “downloads per hour” data, above, to reflect that, and it doesn’t really.

I’d also suggest that alarms are mostly used at specific times. My alarm goes off at 7.00am - and I imagine that there’s a big spike at the top of the hour like that for alarms. What kind of psychopath sets their alarm for 6.56am?

So, in case you’re interested, from the download data, in Malaysia time, minute-by-minute:

6.55am: 67 downloads
6.56: 47 downloads
6.57: 55 downloads
6.58: 64 downloads
6.59: 57 downloads
7.00: 132 downloads
7.01: 101 downloads
7.02: 88 downloads
7.03: 63 downloads
7.04: 60 downloads
7.05: 46 downloads

… so there is certainly a spike at 7am; but not much of one.
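One way to size that spike: take the five minutes before 7am as a baseline and count how many downloads sit above it from 7.00 onwards. A sketch using only the minute-by-minute figures above.

```python
# Downloads per minute, Malaysia time, from the list above
per_minute = {"06:55": 67, "06:56": 47, "06:57": 55, "06:58": 64, "06:59": 57,
              "07:00": 132, "07:01": 101, "07:02": 88, "07:03": 63, "07:04": 60, "07:05": 46}

baseline = sum(per_minute[m] for m in ("06:55", "06:56", "06:57", "06:58", "06:59")) / 5
spike = per_minute["07:00"]
# Zero-padded "HH:MM" strings compare correctly as text
excess = sum(max(0, n - baseline) for m, n in per_minute.items() if m >= "07:00")

print(f"baseline {baseline:.0f}/min; 07:00 is {spike / baseline:.1f}x baseline; "
      f"about {excess:.0f} downloads above baseline from 07:00 to 07:05")
# prints: baseline 58/min; 07:00 is 2.3x baseline; about 154 downloads above baseline from 07:00 to 07:05
```

If alarm-style routines were driving most of the traffic, the excess at the top of the hour should dwarf the background rate; here it’s a modest bump on top of a steady stream.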

None of that would account for the massive change in late February, either. Perhaps a different supplier ceased doing news briefings in February, and I was the only one left in those two markets. But even if that were the case, surely I’d have seen a spike at the end of February, and then a slow decline from March 1, given that a) I’m just talking about podcasts, and b) I’m doing it in English; you’d expect most people to scramble to find a better solution. It doesn’t really add up.

Google haven’t responded further, by the way.

An aside: “What kind of psychopath sets their alarm for 6.56am?” - I have a routine that kicks in at 6.59am, actually, which reads me the weather forecast, and then tunes into ABC Radio Brisbane, getting the statewide news bulletin at 7.00am. At exactly 7.10am, the end of the news bulletin, it then switches to ABC News Radio, because the ABC Radio Brisbane breakfast show is not a show for me. So perhaps I’m the psychopath.

Day 74

Google responded last week.

“The investigation shows that the extra traffic is unlikely from some test setup from Google internal side, as in this case the spike would likely be uniform across days. From the reports that you provided, Fridays seem to have the most amount of traffic (more than double of Thursday). … Given the investigation results, it doesn’t seem likely your traffic increases are driven by Google’s side.”

I respond:

“I’m confused. I’ve given you one report - server logs showing a single 24 hour period, for a Wednesday, based on UTC. Where are you seeing Friday “having more than double of Thursday”, please?”

They respond:

“Regarding the spike we mentioned, we are referring to the screenshot you’ve provided showing the trend on traffic from Feb 27(Thursday) onwards.”

That’s the screenshot from the top of this page, which I shared with them 74 days ago. They’ve not looked at the server logs I sent them; they’ve not read the reports I’ve been giving them, nor the link to the live graph showing how much traffic I’m now being sent.

I ask:

You say: “our engineering team has also worked to fix any potential issues and turn down features that could potentially increase traffic.” - what potential issues have you fixed? What “features” have you “turned down”?

and they answer:

Regarding the potential issues we’ve fixed, unfortunately, we can’t provide much details on this as it contains internal info.

Or, perhaps, the potential issues they’ve fixed are… nothing at all.

Here’s what that graph looks like now: tens of thousands of downloads daily and, worryingly, still increasing, from 35,000 in early March to 41,000 in early May.

See this graph

I’ve given up

In the graph above, you’ll notice that on Thu 6 May, there are 1,441 downloads, compared with 41,423 on the Tuesday.

That’s because I’ve admitted defeat. I’ve replaced the Google News audio feed audio with a curt message…

And, as is obvious from the above graph, removing the Google News audio feed has removed the entire source of this extra traffic. So much for “it doesn’t seem likely your traffic increases are driven by Google’s side”. Provably nonsense.

I don’t believe that Google actually bothered to investigate this at all. This whole experience has shown a total lack of interest from Google’s team: “Jean”, someone who is so eager to take responsibility for her work that she doesn’t even give a surname or a personal email address.

I’m probably $500 down on this whole experience, but Google have shown that they really don’t care. A company that I was once a big fan of, showing that it couldn’t care less about me, about their news publishers, or about their users.

Incidentally, I’ve not bothered to tell Google that I’ve removed the feed, so let’s see how long it takes them to notice. I suspect it’ll be at least September.

(I recommend Kagi for a better search experience and iPhone for a better phone; and if I could find a decent alternative for email, I’d happily recommend that too. Unfortunately, Gmail remains the best there.)
