Tidal dropouts - probable cause

Apologies for starting yet another thread on this but I cannot find an open thread to post my findings on.

Tidal has been my main musical source for almost a year and it worked perfectly well until the drop-outs started about a month ago.  I tried all the obvious things like swapping ports on the hub before a search on here showed me just how common the problem is.  Having just upgraded my Virgin 200 Mbps broadband to their 300 Mbps service I was fairly certain it wasn't bandwidth related, which was confirmed by Simon's posts stating that the problem is caused by latency.  Then the penny dropped - my problem started at the same time that Virgin changed my Superhub 2 to a Superhub 3, which was required for the 300 Mbps connection.

I spoke to Virgin and both their 1st and 2nd line support denied all knowledge of any problems with the Superhub 3.  However, I have found something on google which explains things.  It appears that some modem manufacturers have stopped using a Broadcom chipset in their cable modems and replaced it with the Intel Puma 6 and any cable modem that uses the Intel chipset could have the problem.  There's actually an ongoing lawsuit over this and the following extract comes from a website which explains the cause:

According to the lawsuits, the root of the problem is the cable modem manufacturers' decision to swap out the Broadcom chipset in their modems with the Puma 6 chipset from Intel Corporation. Arris told online technology websites that the problem stems from Intel’s Puma 6’s chipset, which causes cable modems to suffer from significant jitter and latency on their network connections. Reports on multiple websites and forums indicate that these cable modems suffer from "latency jitter so bad it ruins online gaming and other real-time connections." Intel has also confirmed the defect, stating that the company is "aware of an issue with the Puma 6 system-on-chip software that impacts latency," but after numerous months, it has failed to release any update that fixes the issue.   

I'm not sure what the forum rules are about posting links, so if you google "Netgear Arris defective cable modems" and go to the classactionlawyers link dated 21 Apr 2017 you can read the article.  There's also a list of the affected cable modems, the Virgin Superhub 3 being amongst them.

As a short term fix I've spoken to Virgin about reverting to the Superhub 2 and 200 Mbps broadband but they're sending an 'engineer' out tomorrow to see if he can do anything first..........

Ask the question and you appear a fool for a moment.....don't ask the question and you are a fool for life.....

There hasn't been much (or any!) reaction to this posting here, but although I don't use Tidal I would like to thank Johnell for drawing this to our attention because I had been musing about getting VM to upgrade my broadband to 300 Mbps so that I could get a Superhub 3. So now I won't be doing that!

best

David

I've actually got the Virgin engineer here as I'm typing and he says this is a known issue with the Superhub 3 that is causing real problems for online gamers.  With that in mind it would be interesting to hear from anybody who is experiencing the drop-outs but isn't using a Virgin Superhub 3 or one of the cable modems on the list.

I pulled the Superhub 3 out and luckily I still had the 2ac. The issue is that the 3 has major problems with certain routers up the chain from the house, resulting in drop-outs at regular intervals. It was very frustrating and whilst they denied there was an issue they allowed me to swap back to the 2.

 

There was a 180-page forum thread on this very issue. No idea if they have fixed it yet.

At the risk of flogging a dead horse, there is a test that Tidal users who are experiencing regular drop-outs can run in real time to see if it is an increase in latency times that is causing their problem.

If you have a PC or laptop, open a command window and type: "ping -t 8.8.8.8" (without the quotation marks); this sets up a constant ping to a Google server.  Then start using Naim/Tidal and wait for the drop-out to occur.  As soon as it does, press Ctrl-C in the command window to stop the ping and you will get the minimum, maximum and average ping times for that run.  I usually have an average of around 25ms, which is a tad high, but I see peaks from 700ms upwards and each one of these has caused Tidal to drop out.

Please bear in mind that all this test will do is prove if it is latency that is causing your problem, it obviously won't tell you what is causing the latency.  However, and I may be completely wrong in this assumption, the fact that the ping is going to a google server which has nothing whatsoever to do with Tidal points to it being caused by something common.  Draw your own conclusions.
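For anyone who would rather log the ping run to a file and look through it afterwards, here is a minimal Python sketch of the same idea. It assumes the reply lines look like the usual Windows "Reply from ... time=25ms" format; the sample lines and the 200 ms spike threshold are made up for illustration and are not measurements from this thread.

```python
import re
from statistics import mean

# Matches the round-trip time in Windows-style ping replies,
# e.g. "time=25ms" or "time<1ms" (assumed output format).
TIME_RE = re.compile(r"time[=<](\d+)\s*ms")

def summarise_ping(lines, spike_ms=200):
    """Return count/min/max/average round-trip times plus replies at or above spike_ms.

    spike_ms is an arbitrary illustrative threshold, not a Tidal or Naim figure.
    Lines without a time (e.g. "Request timed out.") are ignored.
    """
    times = []
    for line in lines:
        match = TIME_RE.search(line)
        if match:
            times.append(int(match.group(1)))
    if not times:
        return None
    return {
        "count": len(times),
        "min": min(times),
        "max": max(times),
        "avg": mean(times),
        "spikes": [t for t in times if t >= spike_ms],
    }

# Hypothetical sample lines, not real measurements.
sample = [
    "Reply from 8.8.8.8: bytes=32 time=24ms TTL=117",
    "Reply from 8.8.8.8: bytes=32 time=26ms TTL=117",
    "Reply from 8.8.8.8: bytes=32 time=710ms TTL=117",
    "Request timed out.",
]
summary = summarise_ping(sample)
print(summary)
```

As with the manual test, this only shows whether latency spikes line up with the drop-outs; it says nothing about what is causing them.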

Hi, I have just seen this. Although the latency and subsequent UDP jitter issues typically seen with online gaming and real-time apps are different to Tidal streaming, which uses TCP (where jitter is mostly irrelevant), adding latency is not going to help, especially if nearing marginal conditions on the Naim streamer... but the occasional latency spike won't generally be an issue unless you are already very marginal. From what I understand the new Uniti models should be more resilient to this compared to the earlier Naim streamers...

However I hope Virgin provide you with an alternate/improved modem.

BTW to the post above, ICMP pings are only going to show specific latency conditions in certain scenarios, and are really of limited use here unless there is something fundamentally wrong... which would probably affect everything/most things. To look at latency issues here that affect Tidal you need to monitor the line and the TCP flow with Wireshark or similar and analyse the dynamics... and I have done this with Naim development... the odd increase in latency for a non-QoS-enabled service, like on the public internet, is fine and normal. This is why TCP is used for the transfers.

The OP is using the latest and current VM Superhub 3 and the only alternative is to go back to an earlier Superhub (2 or 2ac), neither of which VM have in stock, although from time to time they recover one from a customer moving onto their top tier broadband.

From googling the issue, the chip manufacturer hasn't yet issued a firmware upgrade, which, as it's nearly a year and affects many well known cable modem products used in North America, probably means that they have found they can't in practice solve it with a firmware upgrade or that the cable modem implementations can't upgrade that firmware with a software update. It feels rather like they will need a Superhub 4 avoiding that chipset.

In the meantime the OP is going to roll back to a Superhub 2 and it will be interesting to hear if that fixes his problem, which it probably will as it started when he was "upgraded" to a Superhub 3.

best

David

I do think this will make a marginal difference at best; however, the only way to tell is to look at the TCP trace. TCP is designed to manage exactly this (varying timing and quality of connection), and this is quite different from real-time apps which use UDP, which are essentially fire-and-forget streams where the timing of the packet can be important. It is the average latency over several seconds that is important with TCP (which in this case is showing 23 ms on the shown path, which is excellent)... with UDP (as with real-time apps such as gaming and enterprise voice) the spot latency at a point in time affects the packet at that time... and therefore that peak will have an issue.
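To illustrate the difference between spot latency and latency averaged over several seconds, here is a small Python sketch. It is only an arithmetic illustration with invented numbers (one 700 ms spike among 25 ms samples, a 10-sample window); it does not model how TCP or the Naim streamer's buffers actually behave.

```python
from collections import deque

def rolling_mean(samples, window):
    """Yield the mean of the most recent `window` samples after each new sample."""
    recent = deque(maxlen=window)
    for value in samples:
        recent.append(value)
        yield sum(recent) / len(recent)

# Hypothetical round-trip times in ms: steady 25 ms with a single 700 ms spike.
rtts_ms = [25] * 9 + [700] + [25] * 9
smoothed = list(rolling_mean(rtts_ms, window=10))

# A real-time (UDP-style) packet sent at the moment of the spike sees the full
# spot value, whereas the windowed average rises far less.
print("worst spot latency:", max(rtts_ms))
print("worst 10-sample average:", max(smoothed))
```

With these made-up figures the worst spot value is 700 ms while the worst 10-sample average is 92.5 ms, which is the sense in which a single spike matters less to a buffered transfer than to a real-time stream.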

With Tidal I quite often see occasional peak latencies to their Akamai servers - this is quite normal.... but a high average latency then incurring an additional peak can be too much for the older Naim streamers. On the trace one would look for packet resend requests following that peak latency to see if the TCP engine recovers.

The other caution is that Ping uses ICMP, which can be handled differently to other IP packet types such as TCP and UDP, and so the timing shown can be down to the responding ICMP state machine... one would typically need to look at multiple sources to gain any sort of definitive idea of what exactly is happening... although such a peak suggests further investigation.
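As a rough idea of what looking for resends on a trace involves, the Python sketch below flags data segments that repeat bytes already seen in one direction of a TCP flow. It is a deliberately simplified assumption-laden example: the (timestamp, sequence number, payload length) tuples are hypothetical values you would export from a capture yourself, sequence-number wraparound is ignored, and out-of-order delivery would be flagged too. A real analyser such as Wireshark applies considerably more careful rules.

```python
def find_retransmissions(segments):
    """Return timestamps of segments that only carry bytes already seen.

    segments: iterable of (timestamp_s, seq, payload_len) tuples for ONE
    direction of ONE TCP flow (a hypothetical export format, not a Wireshark API).
    """
    highest_end = None  # one past the highest byte seen so far
    hits = []
    for ts, seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data
        end = seq + length
        if highest_end is not None and end <= highest_end:
            hits.append(ts)  # nothing new in this segment: likely a resend
        else:
            highest_end = end if highest_end is None else max(highest_end, end)
    return hits

# Hypothetical flow: the segment starting at 1000 is sent again at t=0.9 s.
flow = [
    (0.0, 1000, 500),
    (0.1, 1500, 500),
    (0.9, 1000, 500),
    (1.0, 2000, 500),
]
resends = find_retransmissions(flow)
print(resends)
```

Lining up the flagged timestamps against the times of the latency peaks is one way to see whether the transfer recovered after each peak.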

 

Simon, you obviously have extensive knowledge of this area; indeed it was you stating the problem is down to latency that led me to find out what my possible/probable cause is.  I'm still waiting for the Virgin engineer to pop back and refit the SH2 but he's doing it as a favour so I'm loath to chase him.

I'm paying extra for the 300 Mbps service which the SH2 cannot handle so I'll see what speeds I get and whether to save some money or not.  TBH the speed is a secondary consideration and as long as the drop-outs stop I'll be happy. 

I hope I haven't misled people into thinking that the SH3 is the sole cause of this problem as there are so many other variables, but everything points to it in my case.  I will report back.........

@Johnell, I also suggest it is worth contacting Naim just to see if they have a later firmware for your streamer... it might be beta but it might address your issues better... good luck with your Virgin hub issue, and do let us know how you get on... yes, the latency issues may well be the issue here, but the occasional short latency spike shouldn't unduly affect things unless they are already on or close to the edge... grabbing a network trace is the only way to be sure of what is happening.

As I said in my first post, Virgin denied all knowledge of this when I put the question to them.  What I find even more unacceptable is that according to the guy that came to my house, Virgin have not even advised their own engineers that there is a problem with the SH3 even though they are regularly dealing with the issues.  It's doubtful whether you'll get any answers in their forum but please keep us posted.  

I'm hoping to have my SH2 refitted this week, did you keep yours?  

This has been ongoing for many months, long before you and I got the problem, and there are numerous forum threads about it.  With all due respect, if the ongoing lawsuit hasn't forced a resolution I very much doubt another forum thread will.  

Hopefully Intel will come up with a solution that can be "flashed" onto affected modems but I wouldn't bank on anything happening until the lawsuit is resolved. 

Maybe I should have used the word "possible" rather than "probable" in the thread title because as has been correctly pointed out by people in another thread, there are many factors to be taken into account.  However, the Intel chip issue is well known and documented, this is another excerpt from the webpage: 

According to The Register, "The problem appears to be that the x86 CPU in the modem is taking on too much work while processing network packets. Every couple of seconds or so, a high-priority maintenance task runs and it winds up momentarily hogging the processor, causing latency to increase by at least 200ms and, over time, about six per cent of packets to be dropped. It affects IPv4 and IPv6 – and it spoils internet gaming and other online real-time interaction that need fast response times."

Even users who do simple web browsing may be affected by the momentary high spikes in latency, causing websites to feel sluggish or not load.

We are actively investigating whether other cable modems containing the Puma 6 chipset, including modems from Linksys, Cisco, and Hitron, also suffer from the same severe network latency defect. Cable modems containing Intel's Puma 6 chipset that may be affected include:

Arris SB6190
Arris TG1672G
Arris TM1602
Super Hub 3 (Arris TG2492LG) (commonly, Virgin Media)
Hitron CGN3 / CDA / CGNV series modems:
Hitron CDA-32372
Hitron CDE-32372
Hitron CDA3-35
Hitron CGNV4
Hitron CGNM-3552 (commonly, Rogers)
Hitron CGN3 (eg CGN3-ACSMR)
Hitron CGNM-2250 (commonly, Shaw)
Linksys CM3024
Linksys CM3016
TP-Link CR7000
Netgear AC1750 C6300 AC1900
Netgear CM700
Telstra Gateway Max (Netgear AC1900 / C6300) (Australia)
Cisco DPC3848V
Cisco DPC3941B / DPC3941T (commonly, Comcast Xfinity XB3)
Cisco DPC3939
Compal CH7465-LG / Arris TG2492LG (commonly, Virgin Media Hub 3)
Samsung Home Media Server

Johnell posted:

You too mate....just make sure the BT modem isn't on the list!!!!!!

BT uses direct fibre or DSL technology - they don't use cable - different technology - BT also uses multicast for 4K/UHD IPTV streaming and such local processor induced delays would cause unacceptable performance glitches. BT tends to test its stuff rather rigorously in this regard.  Its latest Smart Hub 6 device uses a purpose-built Broadcom BCM63137 processor running at 1 GHz with two cores.

Thanks for the info Simon, it's something to bear in mind if BT ever manage to get their fibre technology up to Virgin speeds.     

As for BT testing stuff, I spent the last 16 years of my BT career at Martlesham Heath Research working in a couple of systems integration and software test teams.  They were happy days until the massive influx of.........best I stop there methinks, suffice to say it became intolerable so I opted out.  

It was during this time that I first met Alistair when he ran Signals from his home near the BT site.  

Hi Johnell, I guess it's not all about speed, contention has a big part to play as well... it's not so much the quantity, it's the quality... I wish Ofcom would look at this as well rather than unduly focussing on potentially misleading sync speeds... I guess Joe Public can understand a simple 'marketing' type number more, but for higher bandwidths above ADSL2 type speeds I think it means less and less, and latency and contention matter more at the CPE demarcation from a typical user perspective... in my humble opinion of course...

So you were in Suffolk then... by your name reference to the BT facility it sounds like a little while ago... but I suspect things have refreshingly changed for the better ... and a truly global workforce and ecosystem of tech partners is embraced on that campus  now... it's now one of the UK's top R&D facilities.

I guess it's a balance between all the factors Simon as my own case proves.  On the SH2 I got connection speeds around 220 Mbps and even at busy periods everything ran smoothly and more importantly Tidal didn't drop out.  The same cannot be said of the SH3.

As for Martlesham, I left in October 2007 and as my friends who are still there have told me, it is indeed a very different place.  I actually contracted back to BT for an integration project in 2011/2012 but luckily was able to work from home.  

Johnell, thanks, so what is the latest from Virgin Media on this? I guess if you have worked for the R&D facility at BT you might know your onions... can you Wireshark the media transfer TCP on Tidal and see what is happening on the line? From reading between the lines it looks like the SH3 router CPU is not up to the task in hand... rather than anything to do with the internet access itself... and in such scenarios it would be interesting to see what is happening... if it is throwing everything away for a period of time, I can see that being an unfair challenge for the relatively small TCP buffers on the Naim streamers.

The fault does appear to lie with the Intel Puma 6 CPU; this is an excerpt from an article in ISPreview dated March 2017.  Until I read this I wasn't aware that the problem is worse on 200Mbps+ connections.

As far as I'm aware the firmware patch still hasn't been released:

A nasty Intel Puma 6 chipset (x86 SoC) bug, which has caused latency spikes and packet loss for owners of Virgin Media’s latest SuperHub 3 (ARRIS TG2492S/CE) cable broadband router in the United Kingdom, will soon be patched by a new firmware release.

At the end of last year we reported that various Intel Puma 6 based routers (e.g. Arris Surfboard SB6190, Hitron CGNV4 and Compal CH7465-LG etc.), not just the SuperHub 3, all appeared to be suffering from the aforementioned problem and this was particularly noticeable on ultrafast (200Mbps+) broadband connections.

In short, the Central Processing Unit (CPU) inside the modem component of the router was taking on too much work while processing network packets, which caused the chipset to run a high-priority maintenance task every few seconds. Owners noticed that this task was occupying the CPU a bit too much and causing momentary latency spikes (increases of 200 milliseconds+), including a little packet loss.

A quick update: I finally got my Superhub 2ac refitted and as expected Tidal has behaved perfectly ever since.

During the conversation with Virgin tech support they actually admitted for the first time that there is a problem with the Superhub 3 and they expect to be releasing a firmware patch sometime soon but he couldn't / wouldn't say exactly when.  As soon as I hear the patch is available I'll refit the Superhub 3.......and cross my fingers.   

Given what has been said here and elsewhere, the issue looked like a hardware one, and this sort of issue often is hardware rather than software related... if so, firmware can try to minimise and work around the issue, but can rarely resolve it completely... I wouldn’t be surprised, if it is hardware, to see a new Virgin router appear before too long... alas not all faults can be ‘fixed’ with firmware... and hardware faults can be a nightmare to work around, with unintended side effects and consequences.

It definitely is a hardware problem as the following states but a firmware release that alters the priority or frequency of the tasks the CPU is running should cure it.......you would think..... 

In short, the Central Processing Unit (CPU) inside the modem component of the router was taking on too much work while processing network packets, which caused the chipset to run a high-priority maintenance task every few seconds. Owners noticed that this task was occupying the CPU a bit too much and causing momentary latency spikes (increases of 200 milliseconds+), including a little packet loss.

Presumably whether you can fix it in firmware will depend whether there is a) a way to change the way the chip prioritises its work by a firmware change and if so b) whether a firmware update for the router can itself update the firmware in the chip.

The fact it has taken so long and is not reported as fixed in any other cable modem with the same chip set suggests to me that the answer to both questions isn't yes and I also think a Superhub 4 (or maybe they will call it something else like "Superhub 3 gaming version" so as not to get overwhelmed by demand) is more likely than a successful firmware upgrade to the SH3.

best

David

Well Ofcom announced the compensation that will be paid if your broadband provider cannot fix your problem. At £8 a day, I think they would have sorted this problem by now, if the compensation rates had applied when this story first surfaced.

I agree that it has taken an inordinate amount of time to release what Virgin themselves said will be a firmware fix so maybe it's just more BS to keep customers quiet.  To be honest I would rather have a new mk4 hub than a strangled mk3 anyway.

The question of compensation is interesting.  I first reported this to Virgin over 3 months ago and downgrading my service to alleviate the problem doesn't constitute a fix in my book so it's now around £800 and counting........I'm obviously not holding my breath.......
