Since everyone is speculating about the Nintendo Switch and what hardware it will feature, I decided to throw my own thoughts and educated guess into the hat. (Warning: very long, TL;DR at bottom)
Tegra history
So far we only know that the Switch will be powered by a custom Nvidia Tegra-based chip. To date, Nvidia has released two major Tegra SoCs, the Tegra K1 and the Tegra X1. The original K1 featured 4 ARM A15 cores plus a low-power ARM companion core at up to 2.2 GHz. Graphically, it had a single SMX (https://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf) of 192 modified Kepler CUDA cores running at up to 950 MHz, which gives it a peak FP32 performance of roughly 365 Gflops (192 cores × 2 FLOPs per clock × 0.95 GHz). All of this is built on TSMC's 28nm node. (https://www.jonpeddie.com/download/media/slides/Nvidia_Tegra_K1_Deep_Dive_rev2.pdf) There was also a second version of the Tegra K1 that replaced the A15 cores with 2 custom ARM-based Denver cores designed by Nvidia itself. Although the Denver-equipped chips had only 2 cores, they performed as well as or better than the A15-equipped K1: clock for clock, a single Denver core delivers roughly 1.75 times the performance of an A15 core (http://www.androidauthority.com/tegra-k1-exynos-5433-snap-805-541582/ ). While there is no official die size for the chip, speculation puts it in the 120 mm² to 150 mm² range (www.fool.com/investing/general/2014/06/04/just-how-big-is-nvidia-corporations-tegra-k1.aspx), and an industry insider has stated it at 121 mm². Since there is an Nvidia Kepler GPU die with just 2 SMX units, GK107 (https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_700_series ), it's relatively easy to estimate the area of a single SMX: GK107 comes out to 118 mm², giving us ~59 mm² per SMX. The A15 cores take up around 16.5 mm² (see chart http://www.anandtech.com/show/8718/the-samsung-galaxy-note-4-exynos-review/2), which gives us about 75.5 mm² for the GPU and CPU cores alone.
Adding the wireless modem, the extra low-power core, the cache and all the other on-chip features, we can take the ~121 mm² total die size estimate as accurate. The last important note about the K1 is its power draw: in a tablet it draws around 0.6 to 4 watts, yet it can reach over 15 W if pushed hard, and the Jetson TK1 ships with a fan and heat sink. (http://elinux.org/Jetson/Jetson_TK1_Power)
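A quick way to sanity-check the K1 figures is to recompute them. This sketch assumes each CUDA core retires one fused multiply-add (2 FLOPs) per clock, which is how peak GFLOPS numbers are normally quoted; the helper name `peak_gflops` is just for illustration. At 950 MHz this gives about 365 Gflops FP32. Taking half of GK107 as one SMX is best read as an upper bound, since GK107 also contains memory controllers and other logic outside its two SMX units.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Peak GFLOPS = cores x FLOPs per core per clock x clock (GHz).

    flops_per_clock=2 assumes one fused multiply-add per core per clock.
    """
    return cores * flops_per_clock * clock_ghz


# Tegra K1: a single Kepler SMX with 192 CUDA cores at up to 950 MHz.
k1_fp32_gflops = peak_gflops(192, 0.95)

# Die-area estimate from the text: half of GK107 (which has 2 SMX) plus the A15 cluster.
# Half of GK107 overestimates one SMX, because GK107 also holds memory controllers etc.
gk107_area_mm2 = 118.0
smx_area_mm2 = gk107_area_mm2 / 2
a15_cluster_mm2 = 16.5
k1_gpu_plus_cpu_mm2 = smx_area_mm2 + a15_cluster_mm2

print(f"K1 peak FP32: {k1_fp32_gflops:.1f} GFLOPS")
print(f"One SMX: {smx_area_mm2:.1f} mm^2, SMX + A15 cluster: {k1_gpu_plus_cpu_mm2:.1f} mm^2")
```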
Moving on to the Tegra X1: it was a further evolution of the Tegra K1. Hardware-wise it features 4 ARM A57 cores and 4 ARM A53 cores at up to 2.0 GHz, and a 2-SMM, 256-core Maxwell GPU at ~1 GHz. In Maxwell, Kepler's SMX shrank into the SMM, which has 128 cores rather than 192. All of this is built on TSMC's 20nm process (http://www.anandtech.com/show/9289/the-nvidia-shield-android-tv-review/2 ) (http://international.download.nvidia.com/pdf/tegra/Tegra-X1-whitepaper-v1.0.pdf ). The new CPU cluster brings the multi-core Geekbench 3 score up to ~4900 points (https://browser.primatelabs.com/geekbench3/search?utf8=%E2%9C%93&q=shield+tv ); for comparison, an AMD A4-5000 scores roughly 2600 points (https://browser.primatelabs.com/geekbench3/search?utf8=%E2%9C%93&q=a4-5000 ). The A4-5000 is a useful reference because it is a slightly slower, smaller Jaguar part (4 cores at 1.5 GHz versus 8 cores at 1.75 GHz), which lets us estimate the power of the CPU in the Xbox One (https://en.wikipedia.org/wiki/Xbox_One ) and PS4: 2600 × 2 (4 → 8 cores) × (1.75/1.5) ≈ 6067 points for the Xbox One, and slightly less for the PS4. The new GPU in the X1 has a total performance of 512 Gflops FP32, but more importantly it can handle up to 4K60 output. For comparison, the GPU in the Xbox One has a compute performance of 1310 Gflops FP32 and the PS4's 1843 Gflops FP32. Unlike the K1, the X1 can also perform 2 FP16 operations in one clock cycle in place of 1 FP32 operation. This doubles the computing power when FP16 instructions are used, which many rendering workloads can tolerate, allowing the X1 to hit a peak of 1024 Gflops FP16.
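The CPU scaling and GPU throughput numbers can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes Geekbench scores scale linearly with core count and clock (optimistic for multi-core scores). The console GPU configurations (768 shaders at 853 MHz for the Xbox One, 1152 shaders at 800 MHz for the PS4) are public specs that are not stated in the text; they are included only to show where the 1310 and 1843 Gflops figures come from.

```python
def peak_gflops(cores, clock_ghz, flops_per_clock=2):
    """Peak GFLOPS, assuming one fused multiply-add (2 FLOPs) per core per clock."""
    return cores * flops_per_clock * clock_ghz


# Scale the A4-5000 (4 Jaguar cores @ 1.5 GHz) Geekbench 3 multi-core score to
# 8 cores @ 1.75 GHz, assuming perfectly linear scaling in cores and clock.
a4_5000_score = 2600
xbox_one_cpu_score = a4_5000_score * (8 / 4) * (1.75 / 1.5)

# Tegra X1 GPU: 2 SMMs x 128 Maxwell cores at ~1 GHz; packed FP16 doubles the rate.
x1_fp32 = peak_gflops(256, 1.0)
x1_fp16 = 2 * x1_fp32

# Console GPUs (public specs, not from the text): shaders and clock in GHz.
xbox_one_fp32 = peak_gflops(768, 0.853)
ps4_fp32 = peak_gflops(1152, 0.8)

print(f"Xbox One CPU estimate: ~{xbox_one_cpu_score:.0f} points")
print(f"X1: {x1_fp32:.0f} GFLOPS FP32, {x1_fp16:.0f} GFLOPS FP16")
print(f"Xbox One: {xbox_one_fp32:.0f} GFLOPS, PS4: {ps4_fp32:.0f} GFLOPS")
```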
All of this still falls within roughly the same power envelope as the prior K1 chip, peaking at 19.4 W in the Shield TV (http://www.anandtech.com/show/9289/the-nvidia-shield-android-tv-review/9 ). Since TSMC's 20nm node allows roughly 1.9x the density of the older 28nm process, it seems that with the X1 Nvidia focused on increasing total performance rather than on power optimization or cost reduction. And since the chip has roughly twice the amount of every hardware block, it is relatively safe to assume the die size is in the same ballpark, roughly 130 mm².
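The die-size argument boils down to one line of arithmetic. As a hypothetical back-of-envelope only, assuming the X1 carries exactly twice the K1's logic, that the 1.9x density figure applies uniformly across the whole chip, and starting from the 121 mm² insider figure for the K1:

```python
# Hypothetical back-of-envelope: X1 die area from the K1's, assuming twice the
# logic and a uniform 1.9x density gain going from 28nm to 20nm.
k1_die_mm2 = 121.0     # industry-insider figure for the K1
logic_scale = 2.0      # "roughly 2 times the amount of all hardware parts"
density_gain = 1.9     # TSMC 20nm vs. 28nm, per the text

x1_die_estimate_mm2 = k1_die_mm2 * logic_scale / density_gain
print(f"Estimated X1 die: ~{x1_die_estimate_mm2:.0f} mm^2")
```

This lands at about 127 mm², consistent with the "roughly 130 mm²" guess.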
All of this information from the prior Tegra chips will allow us to speculate about a possible future chip.
An in-depth speculation
Nvidia states on its own blog that the Switch will be powered by a GPU based on the same architecture as its GeForce cards (https://blogs.nvidia.com/blog/2016/10/20/nintendo-switch/ ), which in the current time frame means the Pascal architecture. Modern Pascal cards have a few major differences from prior architectures. The first is that they operate at much higher clock speeds: on average, 1.42 times the clock speed of a comparable Maxwell-based card. Pascal also keeps the same SMM units of 128 cores, with some minor changes ( http://www.geforce.com/hardware/10series/geforce-gtx-1080#specs ) (https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf ). The second is that they are built on TSMC's 16nm process, which provides up to 40% higher clock speed and 60% power savings. This explains the much higher clock speeds of Pascal chips and their lower TDPs. (http://www.tsmc.com/english/dedicatedFoundry/technology/16nm.htm ) Note that the 16nm process does not provide any improved density over the 20nm node. Also note that, compared clock for clock and shader for shader, Nvidia's Pascal architecture has no real performance increase over a Maxwell-based chip. This means any performance increase must come from either more shaders or a higher clock speed. (http://seekingalpha.com/instablog/45056646-clarence-spurr/4884330-pascal-new-king )
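If per-shader, per-clock throughput really is unchanged between Maxwell and Pascal, relative GPU throughput reduces to the shader-count ratio times the clock ratio. The sketch below illustrates this with Nvidia's reference boost clocks for the GTX 980 (Maxwell, 2048 cores, 1216 MHz) and GTX 1080 (Pascal, 2560 cores, 1733 MHz); these two cards are chosen here only as an example and are not taken from the text. Their clock ratio comes out to about 1.43, close to the 1.42 average quoted above.

```python
def relative_throughput(shaders_new, clock_new_mhz, shaders_old, clock_old_mhz):
    """Throughput ratio when the work done per shader per clock is identical."""
    return (shaders_new / shaders_old) * (clock_new_mhz / clock_old_mhz)


# Reference specs (CUDA cores, boost clock in MHz), used purely as an example.
gtx_980 = (2048, 1216)   # Maxwell
gtx_1080 = (2560, 1733)  # Pascal

clock_ratio = gtx_1080[1] / gtx_980[1]
speedup = relative_throughput(*gtx_1080, *gtx_980)
print(f"Clock ratio: {clock_ratio:.2f}x, estimated throughput ratio: {speedup:.2f}x")
```

Real games will not track this ratio exactly, since memory bandwidth and other units also changed between the two cards.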
CPU-wise, there have been a number of improvements in ARM cores since the release of the Tegra X1. Specifically, there are now faster high-end A-series cores such as the A72 and A73, as well as a myriad of custom designs such as Qualcomm's Kryo and Nvidia's Denver. All of them provide major improvements over the earlier A15 and A57 cores. The A72 provides 1.16 to 1.5 times the performance of the A57 clock for clock (http://www.anandtech.com/show/9184/arm-reveals-cortex-a72-architecture-details ), and the A73 another 1.05 to 1.15x on top of the A72. There have been innovations from ARM on the low-power front as well: in 2015 it launched the A35 low-power core. While it is actually less powerful than an A53 core, delivering anywhere from 80 to 100% of the performance at the same clock speed, it is considerably smaller and uses a lot less power: it is 75% the size of the A53 on the same process node and uses 68% of the power (http://www.anandtech.com/show/9769/arm-announces-cortex-a35 ). At the 2016 Hot Chips conference, Nvidia announced its newest SoC, codenamed Parker. It features 2 second-generation Denver cores and 4 A57 cores in a custom big.LITTLE setup, as well as a 256-core Pascal-based GPU that delivers up to 1536 Gflops FP16, equal to 768 Gflops FP32 (https://blogs.nvidia.com/blog/2016/08/22/parker-for-self-driving-cars/). The CPU in Parker may seem to have less computational power than the X1's, but the 2 Denver cores provide a large performance boost over the 4 A53 cores they replaced. However, they also increase the chip's power consumption by a large amount. As the Switch is primarily a mobile device, power consumption matters far more than it does in a car, the primary environment for a Parker chip.
This is where the A35 cores are really nice to have: being small, light and power efficient, they can conserve power in situations such as a menu screen or light apps where performance is not critical. Since Nvidia already has its own custom big.LITTLE setup, it would be entirely possible to see these cores backed up by Denver2 cores that provide the system's grunt when running games. Simply keeping 2 Denver2 cores on the chip alongside a 4-core cluster of A35 cores would most likely leave us in the same performance range as the earlier X1 chip. While that certainly isn't weak, extra performance would be needed to bring the system up to, and possibly past, the performance level of an Xbox One. This leaves me with my prediction for the CPU: 4 Nvidia Denver2 cores + 4 A35 cores in a big.LITTLE setup.
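The per-clock figures compound multiplicatively, so an A73 would land roughly 1.2x to 1.7x ahead of an A57 at the same clock. The sketch below works that out, along with a rough A35 performance-per-watt ratio versus the A53. That ratio assumes the 68% power figure and the 80 to 100% performance figure are both measured at the same clock, which is a simplification.

```python
# Per-clock speedups compound multiplicatively (ranges are from the text).
a72_over_a57 = (1.16, 1.5)
a73_over_a72 = (1.05, 1.15)
a73_over_a57 = tuple(x * y for x, y in zip(a72_over_a57, a73_over_a72))

# A35 vs. A53 on the same node: 80-100% of the performance for 68% of the power.
# Assumption: both figures hold at the same clock, so perf/W is simply their ratio.
a35_perf_range = (0.80, 1.00)
a35_power = 0.68
a35_perf_per_watt = tuple(p / a35_power for p in a35_perf_range)

# Parker's GPU: FP32 peak is half the quoted FP16 peak.
parker_fp32_gflops = 1536 / 2

print(f"A73 vs. A57 per clock: {a73_over_a57[0]:.2f}x to {a73_over_a57[1]:.2f}x")
print(f"A35 perf/W vs. A53: {a35_perf_per_watt[0]:.2f}x to {a35_perf_per_watt[1]:.2f}x")
print(f"Parker FP32: {parker_fp32_gflops:.0f} GFLOPS")
```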
Since, as shown above, there is no real per-clock gain going from Maxwell to Pascal, Parker's GPU must be clocked much higher than the X1's. Doing the math, 1536/1024 = x/1000, so the GPU must be clocked at ~1500 MHz, which is pretty much in line with the clock speed increases of Pascal and the 16nm node. There are rumors floating around that the Nintendo Switch will have roughly as much hardware power as an Xbox One. While Parker's GPU is a big improvement over the Tegra X1, it is still far from an Xbox One, with roughly 59% of the Xbox One's FP32 performance (768/1310). Since the smallest building block of a Pascal GPU is the SMM, any increase in GPU core count has to come in multiples of 128. This leaves us with the options of 384, 512, 640 or 768 cores, and up for a larger GPU. 384 cores at the same 1500 MHz clock would give us a peak of 1152 Gflops FP32, putting us within striking distance of the Xbox One, as the rumors claim. Bringing that up to 512 cores at 1500 MHz would give 1536 Gflops FP32, a good amount more than the Xbox One. However, a 512-core GPU could instead be clocked down to conserve power while providing the same performance: 512 cores at 1125 MHz would still give us 1152 Gflops FP32. This would also let them provide extra GPU performance in a non-power-limited environment, such as when the Switch is in its dock. So my prediction for the GPU is an Nvidia Pascal design with 4 SMMs, giving us 512 cores running at ~1125 MHz when mobile and clocking up to ~1500 MHz when docked, for between 1152 and 1536 Gflops FP32. Considering that we now know the Switch features a 6.2" 720p screen, these performance numbers seem reasonable. The Xbox One typically renders at 900p 30 fps and sometimes up to 1080p 60 fps.
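The clock and core-count arithmetic above can be written out directly. This sketch assumes, as the text does, that Pascal does the same work per core per clock as Maxwell and that each core retires 2 FP32 FLOPs per clock; `clock_for_target` is a hypothetical helper name.

```python
def peak_gflops(cores, clock_ghz, flops_per_clock=2):
    """Peak FP32 GFLOPS, assuming 2 FLOPs (one FMA) per core per clock."""
    return cores * flops_per_clock * clock_ghz


def clock_for_target(cores, target_gflops, flops_per_clock=2):
    """Hypothetical helper: clock (GHz) at which `cores` reach `target_gflops`."""
    return target_gflops / (cores * flops_per_clock)


# Parker's GPU clock from the FP16 ratio against the X1 (1024 GFLOPS FP16 at 1000 MHz),
# assuming the same 256 cores and no per-clock gain from Maxwell to Pascal.
parker_clock_mhz = 1536 / 1024 * 1000

# Share of the Xbox One's 1310 GFLOPS FP32 that Parker's 768 GFLOPS FP32 represents.
parker_vs_xbox_one = 768 / 1310

# Candidate GPU sizes come in whole SMMs (multiples of 128 cores), here at 1.5 GHz.
options = {smms * 128: peak_gflops(smms * 128, 1.5) for smms in range(3, 7)}

# Clock at which 512 cores match the 1152 GFLOPS of 384 cores at 1.5 GHz.
mobile_clock_ghz = clock_for_target(512, options[384])

print(f"Parker GPU clock: ~{parker_clock_mhz:.0f} MHz, {parker_vs_xbox_one:.0%} of Xbox One FP32")
for cores, gflops in options.items():
    print(f"{cores} cores @ 1.5 GHz: {gflops:.0f} GFLOPS FP32")
print(f"512 cores need {mobile_clock_ghz * 1000:.0f} MHz to reach {options[384]:.0f} GFLOPS")
```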
In the past, Nintendo has typically targeted 60 fps rather than 30. 1152 Gflops FP32 should be enough to hit that target, especially if the rendering is FP16-heavy, as this allows 2x the render performance on Pascal.
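One crude way to compare a 720p60 target with the Xbox One's typical 900p30 is to divide peak FLOPS by the number of pixels rendered per second. This is an illustrative sketch only: it assumes 900p means 1600×900, and it ignores memory bandwidth, ROP throughput, overdraw, post-processing and CPU load, all of which matter in practice.

```python
def flops_per_pixel(gflops, width, height, fps):
    """Crude ALU budget: peak FLOPS divided by pixels rendered per second."""
    return gflops * 1e9 / (width * height * fps)


switch_fp32 = flops_per_pixel(1152, 1280, 720, 60)
switch_fp16 = flops_per_pixel(2 * 1152, 1280, 720, 60)  # packed FP16 doubles the rate
xbox_one = flops_per_pixel(1310, 1600, 900, 30)         # assumes 900p = 1600x900

for name, budget in [("Switch 720p60 FP32", switch_fp32),
                     ("Switch 720p60 FP16", switch_fp16),
                     ("Xbox One 900p30", xbox_one)]:
    print(f"{name}: ~{budget:,.0f} FLOPs per pixel per frame")
```

Under these assumptions, the 720p60 target leaves about two-thirds of the Xbox One's 900p30 per-pixel budget in FP32, and more than the Xbox One if most shading can run in FP16.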
Even with the power-saving measures in my estimates, the chip powering the Switch will be quite power hungry: by my guess, around 15 W or more under heavy load. This pushes it into laptop territory in power consumption. Rumors say the Switch will have up to 3 hours of battery life undocked. Typical lithium-ion batteries have a nominal voltage of 3.7 volts. Assuming the Switch uses 15 W, a standard lithium-ion battery would have to supply roughly 4 amps (watts = volts × amps). Using this battery life calculator (http://www.digikey.com/en/resources/conversion-calculators/conversion-calculator-battery-life ), it would take a roughly 16000 mAh battery to power a Switch for around 3 hours. The most common lithium-ion cell is the 18650, a cylindrical cell measuring 1.86 cm in diameter and 6.52 cm high with a typical capacity of 1500 to 3500 mAh ( https://en.wikipedia.org/wiki/List_of_battery_sizes ). Converting its dimensions to volume gives about 17.7 cm³. Taking a fairly standard 3000 mAh cell and dividing by that volume gives a capacity density of ~170 mAh per cm³. Given that the Switch at my estimated 15 W draw would need a 16000 mAh battery to reach its rumored ~3 hour run time, its battery would have to be around 94 cm³, assuming the same capacity density. Now that we know the Switch fits at least a 6.2" screen, the estimated measurements given here (http://arstechnica.com/gaming/2016/10/how-big-is-the-nintendo-switch-an-ars-visual-analysis/) fall roughly in line with the actual measurements we have. This puts the tablet part of the Switch at ~18.4 cm × 10.6 cm, or 195.04 cm². So even if the Switch were just 1 cm thick, the battery needed to power it would take up no more than about 50% of the device.
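The battery estimate chains several unit conversions, so here it is as a short script. It assumes the 3.7 V nominal voltage and 15 W draw from the text, and it applies a 0.7 derating factor to capacity, which is the rule of thumb the linked Digi-Key calculator appears to use (treat that factor as an assumption). With a 16000 mAh pack this gives about 2.8 hours, i.e. "around 3 hours", and the pack stores about 59 Wh, which is where the TL;DR battery figure comes from.

```python
import math

# Load current at the assumed 15 W draw and 3.7 V nominal cell voltage (P = V * I).
watts = 15.0
volts = 3.7
current_ma = watts / volts * 1000

# Runtime with a 0.7 derating factor on capacity (assumed rule of thumb).
capacity_mah = 16000
derating = 0.7
runtime_h = capacity_mah / current_ma * derating

# 18650 cell: 1.86 cm diameter x 6.52 cm tall, taken as 3000 mAh.
cell_volume_cm3 = math.pi * (1.86 / 2) ** 2 * 6.52
density_mah_per_cm3 = 3000 / cell_volume_cm3

# Pack volume at the same density, and its share of a 1 cm thick 18.4 x 10.6 cm tablet.
pack_volume_cm3 = capacity_mah / density_mah_per_cm3
tablet_volume_cm3 = 18.4 * 10.6 * 1.0
pack_share = pack_volume_cm3 / tablet_volume_cm3

# Stored energy in watt-hours.
energy_wh = capacity_mah / 1000 * volts

print(f"Current: {current_ma:.0f} mA, runtime: {runtime_h:.1f} h")
print(f"18650 volume: {cell_volume_cm3:.2f} cm^3, density: {density_mah_per_cm3:.0f} mAh/cm^3")
print(f"Pack: {pack_volume_cm3:.0f} cm^3 ({pack_share:.0%} of a 1 cm thick tablet), {energy_wh:.1f} Wh")
```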
Furthermore, I did a rough approximation of the Switch's thickness from this picture (http://vgfaq.com/wp-content/uploads/2016/10/Nintendo-Switch-Cartridge-Slot.jpg ). Right next to the cartridge slot in the picture is what appears to be, and most likely is, a 3.5 mm headphone jack. Measuring the pixels across its center to get a scale, and then measuring the pixels across the entire Switch, gave me an estimated thickness of 118.4mm, making these numbers even more reasonable. In that picture we can also see what appears to be a set of cooling vents on the top of the Switch. Much like the older Jetson Tegra boards, a chip in this power envelope would require active cooling. Since the Switch must also be docked, having a radial fan that sucks air in through the back of the device and blows it out through the top, much like a laptop, makes a lot of sense, as it could still function while the Switch sits in its dock.
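The thickness estimate is a simple pixel-scale conversion: a feature of known size (the 3.5 mm jack) sets the millimetres-per-pixel scale, which is then applied to the whole device. The function name is hypothetical, and the pixel counts in the usage line are made-up placeholders, not measurements from the linked picture.

```python
def thickness_from_photo(device_px: float, reference_px: float, reference_mm: float = 3.5) -> float:
    """Scale a pixel measurement by a feature of known size in the same photo.

    reference_mm defaults to the 3.5 mm headphone jack. Only valid if the
    reference and the device edge are at the same distance and angle to the camera.
    """
    mm_per_px = reference_mm / reference_px
    return device_px * mm_per_px


# Placeholder pixel counts for illustration only, not measured from the photo.
print(f"{thickness_from_photo(device_px=160, reference_px=40):.1f} mm")
```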
Taken together, my guess at the hardware powering the Switch describes a powerful console capable of 720p 60 fps gaming while being very mobile and providing decent battery life.
TL;DR: Custom Tegra chip with 4 Denver2 cores + 4 ARM A35 cores at ~3 GHz + a 512-core Pascal GPU. 15 W TDP, 59 Wh battery. Renders at 720p 60 fps. Close to as powerful as an Xbox One, possibly more when docked. If this seems extreme, please read the full thing.
Edit: formatting
Submitted by zcskywire2