2. Marco Agus & Marcos Balsa
Outline
• Part 1: Evolution of mobile graphics
• Part 2: Graphics development for mobile systems
• Part 3: Mobile graphics trends and real-time visualization of massive models on mobile systems
3. Marco Agus & Marcos Balsa
Part 1
Evolution of mobile graphics
4. Marco Agus & Marcos Balsa
Mobile evolution… in movies
• Motorola DynaTAC
• Nickname: “brick phone”
• Weight: over 2 pounds
• Cost: thousands of dollars
• Battery life: around 35 minutes
“Money never sleeps… This is your wake-up call, pal… Go to work.”
Wall Street, 1987
Michael Douglas as Gordon Gekko
5. Marco Agus & Marcos Balsa
Mobile evolution… in movies
“Hello Neo… Do you know who this is?”
The Matrix, 1999
Laurence Fishburne as Morpheus
• Nokia 8110
• Nickname: “banana phone”
• 145 g, monochrome display, Smart SMS
• Cost: 1000 EUR
6. Marco Agus & Marcos Balsa
Mobile evolution… in movies
Skyfall, 2012
Daniel Craig as James Bond
• Sony Xperia T
• Android smartphone
• Display: 4.6” 1280x720
• Cost: 600 EUR
• 13 Mpixel camera + position sensors
7. Marco Agus & Marcos Balsa
Mobile evolution… in movies
Iron Man 3, 2013
Robert Downey Jr. as Tony Stark
• Future devices?
• Transparent and foldable high-resolution screens
• Gesture interfaces
• Wearable / integrated into the body
8. Marco Agus & Marcos Balsa
Mobile evolution… in games
• Nokia Snake
– From 1997, on an estimated 350 million devices, making it one of the most widely distributed games ever created
– Installed on Nokia devices until 2007
9. Marco Agus & Marcos Balsa
Mobile evolution… in games
• Angry Birds (Rovio)
– First released for Apple's iOS in December 2009
– 2 billion downloads across all platforms
– Widespread diffusion and popularity
– Adventure parks (Finland and Malaysia)
10. Marco Agus & Marcos Balsa
Mobile evolution… in games
• Unreal Engine (GDC & Google I/O 2014)
– Running on an Nvidia Tegra K1 processor
– Will support Google Tango and Samsung Gear VR
– Easy porting of games
– Sophisticated 3D effects
11. Marco Agus & Marcos Balsa
Mobile connectivity evolution
• Bandwidth is doubling every 18 months
• Mobile internet users have overtaken desktop internet users
• 2017 smartphone traffic expected at 2.7 GB per person per month
12. Marco Agus & Marcos Balsa
Displays and User Interface
• Before 2007 – old days
– PDA Palm OS/ Windows Pocket / Windows CE
– Stylus interaction (touch screens at early stages)
• Touch era
– 2007 – iOS /iPhone
– 2008 – Android / HTC Dream or G1
– Touch-enabled devices (no stylus required)
• Nowadays
– Wearables <2”
– Smartphones 3-6”
– Tablets >7-10”
– DLP projectors integrated
13. Marco Agus & Marcos Balsa
Chip evolution (1/2)
14. Marco Agus & Marcos Balsa
Chip evolution (2/2)
15. Marco Agus & Marcos Balsa
Scenario
• Modern smartphones (tablets) are compact visual computing powerhouses
• DIFFUSION: more than 4.6 billion mobile phone subscriptions
– [Ellison 2010]
• NETWORKING: high-speed internet connection (typical 1 GB/month plan)
– 3G: < 0.6-3 Mbps ~ 100-400 KB/s (latency ~ 100-125 ms)
– 4G: < 3-10 Mbps ~ 400 KB/s - 1 MB/s (latency ~ 60-70 ms)
– 5G: 1 Gbps (from 2016?)
• MEMORY: increasing RAM and storage space
– RAM: 1-3 GB
– Storage: 8-64 GB
• COMPUTING: increasing processing power
– CPU: 4-8 cores @ 2.5 GHz
– GPU: 72-192 cores (~ALUs)
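The networking figures above mix megabits per second and kilobytes per second. As a rough sketch of the conversion (decimal units; the slide's KB/s values are rounded), the snippet below also estimates how long a model would take to download. The 100 MB model size is a hypothetical value chosen only for illustration.

```python
def mbps_to_kb_per_s(mbps: float) -> float:
    """Convert megabits per second to kilobytes per second (decimal units)."""
    return mbps * 1000.0 / 8.0


def download_seconds(size_mb: float, mbps: float) -> float:
    """Seconds needed to transfer size_mb megabytes at a sustained rate of mbps."""
    return size_mb * 8.0 / mbps


if __name__ == "__main__":
    MODEL_MB = 100  # hypothetical model size, not a figure from the slides
    for label, low, high in [("3G", 0.6, 3.0), ("4G", 3.0, 10.0)]:
        print(f"{label}: {mbps_to_kb_per_s(low):.0f}-{mbps_to_kb_per_s(high):.0f} KB/s, "
              f"{MODEL_MB} MB in {download_seconds(MODEL_MB, high):.0f}-"
              f"{download_seconds(MODEL_MB, low):.0f} s")
```

Latency is ignored here; for many small requests it can dominate over raw throughput.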
16. Marco Agus & Marcos Balsa
Where are we going?
• Powerful devices for acquiring, processing and visualizing information
• Accessibility of information (anybody, any time, anywhere)
• Immense potential (integration of acquisition, processing, visualization, cloud computing, and collaborative tasks)
22. Marco Agus & Marcos Balsa
Operating Systems
• Linux based (Qt…)
– Ubuntu, Tizen, BBOS…
• Web based (HTML5)
– ChromeOS, FirefoxOS, WebOS (discontinued?)…
• Windows Phone
• iOS (~Unix + Cocoa)
• Android (Java VM)
http://www.theregister.co.uk/2014/08/04/android_beats_ios_for_first_time/
2014
23. Marco Agus & Marcos Balsa
Operating Systems
• Brief comparison (the focus here is research)
– Windows Phone 8
+ Best IDE: Visual Studio (2013)
- Windows development (porting Linux code?)
- Market is quite restrictive ($100/year, 5 free uploads/year)
- Certification + review
- OpenGL support? Through ANGLE (over D3D)
– iOS
+ Best devices? (homogeneity, at least)
+ Good IDE: Xcode + clang
- Market is rather restrictive ($100/year)
- Review
+ OpenGL support: PowerVR GPUs only
– Android
+ Best platform? (more open, more devices, more flexible?)
- Many IDEs; integration is not great (~tricky)
+ Visual Studio / Eclipse / QtCreator
+ With GCC / clang compiler
+ Market is very accessible ($25)
+ OpenGL support / OpenCL support
Callouts: Monetization? Research? Monetization??
24. Marco Agus & Marcos Balsa
Programming Languages
http://www.tops-int.com/blog/which-programming-languages-are-used-for-web-desktop-and-mobile-apps/
25. Marco Agus & Marcos Balsa
Programming Languages
• C/C++
– Classic, performance, codebase, control
• Objective-C
– Slightly different style (message based), well-documented API for iOS, mainly Cocoa/iOS
• Java
– Android is VM/JIT based, ~portability (API), well known, widespread, codebase
• C#
– VM based, ~Java evolution, Mono (Win, Android, iOS)
• Swift
– Apple's new language: simplicity, performance, easy, LLVM-based compilers
• HTML5/JS
– Web technologies, widespread, compatibility
• Perl, Python, Ruby, D, Go (Google), Hack (Facebook), …
– More options, not so popular?
28. Marco Agus & Marcos Balsa
Architectures
CPU architectures
x86 – ARM – MIPS
30. Marco Agus & Marcos Balsa
Architectures
• x86 (CISC 32/64 bit)
– Intel Atom Z3740/Z3770
• Bay Trail (2 W)
– AMD Mullins (not yet on the market)
• 4.5 W
- Power consumption
+ Performance (includes a desktop-class GPU)
+ Compatibility with old SW?
• ARM
– RISC 32/64 bit
+ Power efficiency
+ Performance/watt
+ Smaller area (RISC), lower cost
• MIPS
– RISC 32/64 bit
– Acquired by Imagination Technologies (as of 2014)
+ Demonstrated its capabilities on consoles (PS/PS2/PSP/N64…), also on SGI
31. Marco Agus & Marcos Balsa
Architectures – RISC vs. CISC but…
• CISC (Complex Instruction Set Computer)
– Fast program execution (optimized complex paths)
– Complex instructions (e.g. memory-to-memory instructions)
• RISC (Reduced Instruction Set Computer)
– Fast instructions (fixed cycles per instruction)
– Simple instructions (fixed/reduced cost per instruction)
• FISC (Fast Instruction Set Computer)
– Current RISC processors integrate many improvements from CISC: superscalar execution, branch prediction, SIMD, out-of-order execution
– Philosophy: fixed/reduced cycle count per instruction (SIMD?)
– Discussion (Post-RISC):
• http://archive.arstechnica.com/cpu/4q99/risc-cisc/rvc-5.html
32. Marco Agus & Marcos Balsa
Architectures
• RISC -> integrates complex instructions (ARM / MIPS)
• CISC -> reduces instruction complexity (Intel Atom)
• MMX/SSE/Out-of-Order
33. Marco Agus & Marcos Balsa
Architectures – X86
• Intel (32/64 bit)
– Competitive with Bay Trail Atom Z3740 ~4 W
– Pursuing low power consumption instead of performance
– GPU: Intel HD Graphics for Bay Trail ~ GF 8600M GT | GF210
– Present in many tablets (e.g. Surface) with Windows Phone/Android
– Present in a few smartphones
• AMD
– Not yet competitive at low power (> 4 W)
– Good GPU performance (GCN, 192 cores)
– No known smartphone/tablet shipped
• Supported on
– Android, Windows Phone, Tizen, Firefox OS, Ubuntu Touch, …
34. Marco Agus & Marcos Balsa
Architectures – ARM
• ARM Ltd.
– RISC processor (32/64 bit), moving to 64 bit ~ 2014/15
– IP (intellectual property): instruction set / reference implementation
– CPU / GPU (Mali)
• Licenses (instruction set OR reference design)
– Instruction set license -> custom-made design (Snapdragon, Hummingbird in iPhone 4 & Galaxy S)
• Optimizations (particular paths, improved core frequency control, …)
– Reference design (Cortex A9, Cortex A15, Cortex A53/A57…)
• Licensees (instruction set OR reference design)
– Apple, Qualcomm, Samsung, Nvidia, MediaTek, AMD @<2014…
– Few instruction set licenses; most adopt the reference design
• Manufacturers
– Contracted by licensees
• GlobalFoundries, United Microelectronics, TSMC, and Intel (@2013)
35. Marco Agus & Marcos Balsa
Architectures - MIPS
• MIPS
– RISC processor (32/64 bit)
– IP (intellectual property): licensing
– Recently acquired by Imagination Technologies
– Can provide a full solution (System on Chip, SoC): wireless/CPU/GPU
• Performance/watt should be comparable to that of ARM
• GPUs from Imagination have demonstrated their value
– iDevices have always included its PowerVR SGX / Rogue cores
– Good integration with the CPU and other SoC components could provide a very competitive solution (e.g. Qualcomm)
• Supported on
– Android, Mer (fork of MeeGo)
• Knowledge from previous HW (PSP, PS, PS2…)
– Pretty much the same as with ARM HW
36. Marco Agus & Marcos Balsa
Architectures
GPU architectures
39. Marco Agus & Marcos Balsa
Architectures - GPU
Desktop ~ 2880 cores (GTX 780 Ti) ~5000 Gflops
vs.
Mobile ~ 256 cores (Tegra X1) ~512 Gflops @ FP32
PS4 ~ 1152 cores (~1840 Gflops)
Xbox One ~ 768 cores (~1240 Gflops)
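The Gflops figures above follow from a simple peak-throughput estimate: cores x clock x operations per cycle. A minimal sketch, assuming one fused multiply-add (2 floating-point operations) per core per cycle; the clock values are assumptions picked to match the quoted totals, not numbers stated on the slide.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak FP32 throughput in Gflops (2 flops/cycle = one FMA)."""
    return cores * clock_ghz * flops_per_core_per_cycle


if __name__ == "__main__":
    # Clock frequencies below are assumed for illustration.
    print("Mobile, 256 cores @ 1.0 GHz:", peak_gflops(256, 1.0), "Gflops")
    print("Desktop, 2880 cores @ 0.875 GHz:", peak_gflops(2880, 0.875), "Gflops")
```

These are upper bounds; sustained performance on mobile is usually limited by bandwidth and thermal budget rather than by ALU count.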
40. Marco Agus & Marcos Balsa
Architectures – GPU
• Immediate Mode Rendering (IMR)
• Tile Based Rendering (TBR)
• Tile Based Deferred Rendering (TBDR)
41. Marco Agus & Marcos Balsa
Architectures – GPU
• Immediate Mode Rendering (IMR)
– Geometry is processed in submission order
• High overdraw (shaded pixels can be overwritten)
– Buffers are kept in system memory
• High bandwidth / power / latency
– Early-Z helps, depending on geometry sorting
• A fragment is discarded when the depth buffer already holds a closer value
[Diagram: vertex shader (VS) and fragment shader (FS) stages]
http://blog.imgtec.com/powervr/understanding-powervr-series5xt-powervr-tbdr-and-architecture-efficiency-part-4
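Why early-Z depends on geometry sorting can be seen by counting shaded fragments for a single pixel. This is a toy model of an immediate-mode renderer, not any vendor's implementation; the depth values are made up, and a smaller depth means closer to the camera.

```python
def shaded_fragments(depths_in_submission_order):
    """Count fragments that pass early-Z (and are therefore shaded) for one pixel."""
    nearest = float("inf")   # depth buffer cleared to "far"
    shaded = 0
    for depth in depths_in_submission_order:
        if depth < nearest:  # early-Z passes: the fragment is shaded and written
            nearest = depth
            shaded += 1
        # otherwise the fragment is discarded before shading
    return shaded


if __name__ == "__main__":
    layers = [0.9, 0.7, 0.5, 0.3]  # four overlapping opaque surfaces (hypothetical)
    print("back-to-front:", shaded_fragments(layers))
    print("front-to-back:", shaded_fragments(sorted(layers)))
```

Submitting back-to-front shades every layer, while front-to-back shades only the nearest one, which is the overdraw saving the slide refers to.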
42. Marco Agus & Marcos Balsa
Architectures – GPU
• Tile Based Rendering (TBR)
– Rasterizing per tile (triangles binned per tile), 16x16 or 32x32 pixels
• Buffers are kept in on-chip (GPU) memory: fast! Geometry limit?
– Triangles processed in submission order (TB-IMR)
• Overdraw (front-to-back -> early-Z cull)
– Early-Z helps, depending on geometry sorting
http://blog.imgtec.com/powervr/understanding-powervr-series5xt-powervr-tbdr-and-architecture-efficiency-part-4
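The binning step can be sketched as follows. This is a simplified, conservative version that assigns a triangle to every tile its screen-space bounding box touches; real hardware may use tighter tests. The function name and the sample triangles are illustrative assumptions, and coordinates are assumed to be non-negative pixel positions.

```python
from collections import defaultdict


def bin_triangles(triangles, tile_size=16):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box overlaps it."""
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        tx0, tx1 = int(min(xs)) // tile_size, int(max(xs)) // tile_size
        ty0, ty1 = int(min(ys)) // tile_size, int(max(ys)) // tile_size
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)  # submission order is preserved per tile
    return dict(bins)


if __name__ == "__main__":
    triangles = [
        [(2, 2), (10, 3), (5, 12)],       # fits inside tile (0, 0)
        [(10, 10), (40, 12), (20, 30)],   # spans several tiles
    ]
    for tile, indices in sorted(bin_triangles(triangles).items()):
        print(tile, indices)
```

Each tile can then be rasterized against on-chip buffers using only its own triangle list.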
43. Marco Agus & Marcos Balsa
Architectures – GPU
• Tile Based Deferred Rendering (TBDR)
– Fragment processing (tex + shade) ~waits for Hidden Surface Removal (HSR)
• Micro depth buffer: depth test before fragment submission
– Whole tile: 1 fragment/pixel
• iPad: 2x slower than a desktop GeForce at HSR (FastMobileShaders_siggraph2011)
– Possible to prefetch textures before shading/texturing
– Hard to profile!!! ~~~Timing?
– Limit: ~100K triangles + complex shader
http://blog.imgtec.com/powervr/understanding-powervr-series5xt-powervr-tbdr-and-architecture-efficiency-part-4
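The "1 fragment per pixel" property can be illustrated with a toy tile: visibility is resolved for the whole tile first, and only the surviving fragment of each pixel goes on to texturing and shading. This sketch assumes opaque geometry only; the tile contents and the function name are hypothetical.

```python
def hidden_surface_removal(tile):
    """For each covered pixel, return the submission index of the nearest fragment."""
    visible = {}
    for pixel, depths in tile.items():
        if depths:  # pixels without any fragment are skipped
            visible[pixel] = min(range(len(depths)), key=depths.__getitem__)
    return visible


if __name__ == "__main__":
    # pixel -> fragment depths in submission order (smaller = closer)
    tile = {(0, 0): [0.9, 0.5, 0.2], (1, 0): [0.4, 0.8], (2, 0): []}
    visible = hidden_surface_removal(tile)
    submitted = sum(len(depths) for depths in tile.values())
    print(f"{submitted} fragments submitted, {len(visible)} shaded")
```

The number of shaded fragments equals the number of covered pixels regardless of submission order, which is what makes sorting unnecessary for opaque geometry on this kind of renderer.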
44. Marco Agus & Marcos Balsa
Architectures – Power consumption
• Reduce working set -> Tiling
• Optimize bandwidth -> Deferring
• Minimize area/circuitry -> RISC?
Power consumption by memory access (courtesy of Shebanow, HPG 2013 keynote)
Shared memory -> fight for the bus! BUT fewer CPU <-> GPU copies (expected)
45. Marco Agus & Marcos Balsa
Architectures – GPU
• General issues
– Shared memory: no memory copy between CPU and GPU!!!
• ~70% of memory available for the app (GPU reserved memory + OS)
– Precision is relevant
• Halving precision ~doubles operations/second (1 FP128 = 8 FP16)
• Vertex shader (mediump/highp) | fragment shader (~lowp for color)
– Overdraw depends on the renderer: draw front-to-back on IMR/TBR
• A depth-only pass can work on IMR/TBR, depending on geometry count
– Texture compression!!! Bandwidth, power, performance
• Take ~5Gb/s as typical bandwidth on embedded devices (vs. 100Gb/s on desktop)
– Texture mipmapping / compression reduces bandwidth
– glReadPixels(), glCopyTexImage2D(), glTexSubImage2D() on FBO… Block! Sync!
– glDiscardFramebufferEXT(): signals that the contents of a render attachment are no longer needed
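To put the texture compression point in numbers, the sketch below compares the footprint of one 1024x1024 texture level at different bit rates and the memory traffic generated if that level were read in full once per frame. The texture size, the 60 fps rate and the "read once per frame" model are simplifying assumptions for illustration.

```python
def texture_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size in bytes of one mip level stored at the given bit rate."""
    return width * height * bits_per_pixel // 8


def traffic_mb_per_s(size_bytes: int, fps: int = 60) -> float:
    """MB/s (decimal) needed if the whole level is read once per frame."""
    return size_bytes * fps / 1e6


if __name__ == "__main__":
    # RGBA8 is 32 bpp; 8 and 4 bpp are common rates for block-compressed formats.
    for label, bpp in [("RGBA8 uncompressed", 32), ("8 bpp compressed", 8), ("4 bpp compressed", 4)]:
        size = texture_bytes(1024, 1024, bpp)
        print(f"{label}: {size // 1024} KiB, {traffic_mb_per_s(size):.1f} MB/s at 60 fps")
```

A 4 bpp format cuts both storage and traffic by a factor of eight relative to RGBA8, before any additional savings from mipmapping.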
46. Marco Agus & Marcos Balsa
Architectures – GPU
Texture Compression
Format | GL ES | Format (bpp) | Devices | Proprietary | Notes
PVRTC | >= 1 (ext.) | RGB (2, 4), RGBA (2, 4) | PowerVR | Imagination | Good quality, not widespread
S3TC (DXT1/3 & 5) | >= 1/2 (ext.) | RGB (4), RGBA (4, 8) | Tegra, Intel HD | S3 | D3D
ATC | >= 1 (ext.) | - | Adreno | AMD | Maps to DXT with minor conversion
ETC1 | Core in 1, ext. in 2 | RGB (4) | All GLES 1 devices | Free | Most widespread, RGB only
ETC2/EAC | Ext. in 2, core in 3 | R (4), RG (8), RGB (4, 8), RGBA (8) | GLES 3 devices, most GLES 2 devices | Free | Most widespread, good compression (ETC2), compatible with ETC1
ASTC | >= 2 (ext.) | Many (0.89 to 8 bpp) | Mali / GLES 3.1? | Free | Includes 3D, various formats and texture types; not yet widespread
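Because support differs per GPU family, an application typically chooses a compressed format at run time from the extension string reported by the driver (what glGetString(GL_EXTENSIONS) returns). A minimal sketch of that selection follows; the extension names are the registered Khronos ones, but the preference order and the function name are assumptions made for illustration, not a recommendation from the slides.

```python
# Vendor-specific formats, checked in an illustrative (assumed) order of preference.
VENDOR_FORMATS = [
    ("PVRTC", "GL_IMG_texture_compression_pvrtc"),
    ("S3TC", "GL_EXT_texture_compression_s3tc"),
    ("ATC", "GL_AMD_compressed_ATC_texture"),
]


def pick_texture_format(extension_string: str, gles_major: int) -> str:
    """Choose a compressed texture format from a space-separated extension string."""
    extensions = set(extension_string.split())
    if "GL_KHR_texture_compression_astc_ldr" in extensions:
        return "ASTC"
    if gles_major >= 3:
        return "ETC2"  # core from OpenGL ES 3.0 onwards
    for name, extension in VENDOR_FORMATS:
        if extension in extensions:
            return name
    if "GL_OES_compressed_ETC1_RGB8_texture" in extensions:
        return "ETC1"  # RGB only, no alpha channel
    return "uncompressed"


if __name__ == "__main__":
    sample = "GL_OES_compressed_ETC1_RGB8_texture GL_IMG_texture_compression_pvrtc"
    print(pick_texture_format(sample, gles_major=2))
```

In practice the asset pipeline would also have to ship the textures in each format the application may select.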
47. Marco Agus & Marcos Balsa
Architectures -- GPU
• ASTC (Adaptive Scalable Texture Compression)
– ARM's general solution for texture compression (open / complex HW)
– 2D / 3D formats (normal, LDR/HDR, luminance, alpha, …)
– 128 bits per block, mapping 4x4 to 12x12 pixels (2D) and 3^3 to 6^3 (3D)
– 0.59 bpp on 3D textures with 6^3 pixels per block
– Supported on ARM Mali T6xx GPUs; next-generation GPUs?
– Quality & compression ratio! Free
– Wait until GLES 3 is widespread
• ETC2
– Core in GLES 3 and GL 4.3: RGB + RGBA compressed formats (~ S3TC)
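The ASTC bit rates quoted here follow directly from the fixed 128-bit block size: bits per texel is 128 divided by the number of texels the block covers. A short check of the figures (the function name is just for illustration):

```python
def astc_bits_per_texel(*block_dims: int) -> float:
    """Bit rate of an ASTC block footprint, e.g. (4, 4) for 2D or (6, 6, 6) for 3D."""
    texels = 1
    for dim in block_dims:
        texels *= dim
    return 128 / texels  # every ASTC block occupies 128 bits


if __name__ == "__main__":
    for dims in [(4, 4), (12, 12), (3, 3, 3), (6, 6, 6)]:
        print("x".join(map(str, dims)), f"{astc_bits_per_texel(*dims):.2f} bpp")
```

This reproduces the 8 bpp to 0.89 bpp range for 2D blocks and the 0.59 bpp figure for 6^3 3D blocks.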
48. Marco Agus & Marcos Balsa
Architectures – GPU
• Profiling tools
– ARM SDK (ARM): Windows/Linux
• DS-5 Streamline ($$): software and GPU profiling and debugging
– PowerVR SDK (Imagination): Linux/Windows/OSX
• PVRUniSCo shader analyzer (#cycles)
– Tegra SDK (NVIDIA): Linux/Windows/OSX
• Tegra System Profiler
• NVIDIA PerfHUD ES
– Adreno SDK (Qualcomm)
• Adreno Profiler: Windows only
50. Marco Agus & Marcos Balsa
3D APIs
Mantle, Direct3D, Metal, OpenGL Next (5.0?)
51. Marco Agus & Marcos Balsa
3D APIs
• Direct3D
– 3D API from Microsoft for Windows (and Xbox)
– The ANGLE library provides GL support on top of D3D
• Mantle
– AMD 3D API with low-level access -> D3D12 | GL_NG
• Metal
– Apple 3D API with low-level access
• OpenGL Desktop/ES/WebGL
– GL for embedded systems (ES), now in version 3.0
• GLES 3.1 ~ GL 4.4 (GL_NG/Vulkan is coming…)
52. Marco Agus & Marcos Balsa
3D APIs
• Direct3D (Windows & game research)
– Games on Windows (mostly) / Xbox
– Defines the state of the art in 3D functionality
• OpenGL typically follows
• 3D graphics card vendors collaborate closely
• Multithreaded programming
– Proprietary, closed source (Microsoft)
– Tested & stable; good support + tools
• Metal (Mac/iOS future?)
– Apple 3D API with low-level access
– Much along the lines of Mantle?
• Buffer & image, command buffers, sync…
– Lean & mean: simple + ~flexible
53. Marco Agus & Marcos Balsa
3D APIs
• Mantle
– AMD effort: low-level, direct-access 3D API
– Direct control of memory (CPU/GPU); multithreading done well
• User-required synchronization
– API calls per frame: < 3K -> 100K
– Resources: buffer & image
– Simplified driver maintenance (vendors)
• High-level APIs/frameworks/engines will be developed
– Pipeline state
• Shaders + targets (depth/color…) + resources + geometry
– Command queues + synchronization
• Compute / draw / DMA (memory copy)
– Bindless: shaders can refer to state resources
– OpenGL NEXT seems to be moving in the ‘Mantle direction’
– Direct3D 12 is already pursuing low-level access
54. Marco Agus & Marcos Balsa
3D APIs
Pre-compiled pipeline: shaders + resources execute
http://www.slideshare.net/DevCentralAMD/mantle-introducing-a-new-api-for-graphics-amd-at-gdc14
55. Marco Agus & Marcos Balsa
3D APIs
Command queues generated for each processing unit: graphics/compute/memory access
http://www.slideshare.net/DevCentralAMD/mantle-introducing-a-new-api-for-graphics-amd-at-gdc14
56. Marco Agus & Marcos Balsa
3D APIs
The pipeline defines the association of variables to resource descriptors
http://www.slideshare.net/DevCentralAMD/mantle-introducing-a-new-api-for-graphics-amd-at-gdc14
57. Marco Agus & Marcos Balsa
3D APIs
• OpenGL (Desktop/ES/WebGL)
– Open / research / cross-platform
– Lagging behind D3D <- legacy support
• No more FIXED PIPELINE (1992)!! (scientific visualization…)
– GLSL (2003) … GL 3.1 (2009): deprecation / no fixed pipeline
• Compatibility profile: legacy again… (until GL 4)
• Core profile
– GLSL shader required
– VAO
» Group of VBOs
» A base VAO is needed to use VBOs!
– Simplifying: VBO + GLSL only!
58. Marco Agus & Marcos Balsa
3D APIs
– OpenGL ES 1.1
• Fixed pipeline; no glBegin/glEnd; no GL_POLYGON; VBOs
– OpenGL ES 2 (OpenGL 1.5 + GLSL) ~ GL 4.1
• No fixed pipeline (shaders mandatory), ETC1 texture compression…
– OpenGL ES 3 ~ GL 4.3
• Occlusion queries + geometry instancing
• 32-bit integer/float in GLSL
• Core 3D textures, depth textures, ETC2/EAC, many formats…
• Uniform Buffer Objects (packed shader parameters)
– OpenGL ES 3.1 ~ GL 4.4
• Compute shaders (atomics, load/store)
• Separate shader objects (reuse)
• Indirect draw (shader culling…)
• NO geometry/tessellation
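One practical use of this version breakdown is a capability table that an application consults before enabling a rendering path. The sketch below encodes only what the list above states; the feature keys and the helper name are hypothetical, and features the list marks as absent from ES 3.1 are stored as None.

```python
# Minimum OpenGL ES version per feature, taken from the list above.
MIN_GLES_VERSION = {
    "glsl shaders": (2, 0),
    "occlusion queries": (3, 0),
    "geometry instancing": (3, 0),
    "3d textures": (3, 0),
    "etc2/eac textures": (3, 0),
    "uniform buffer objects": (3, 0),
    "compute shaders": (3, 1),
    "separate shader objects": (3, 1),
    "indirect draw": (3, 1),
    "geometry shaders": None,   # not part of ES 3.1
    "tessellation": None,       # not part of ES 3.1
}


def supports(feature: str, version: tuple) -> bool:
    """True if the given (major, minor) ES version includes the feature in core."""
    required = MIN_GLES_VERSION[feature]
    return required is not None and version >= required


if __name__ == "__main__":
    for feature in ("uniform buffer objects", "compute shaders", "tessellation"):
        print(feature, "on ES 3.0:", supports(feature, (3, 0)),
              "| on ES 3.1:", supports(feature, (3, 1)))
```

Extensions can add features to older versions, so a real check would combine this table with the driver's extension list.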
59. Marco Agus & Marcos Balsa
3D APIs
• GPGPU
– OpenCL
• On Android it is not much loved
– Use the libraries provided in the GPU vendor SDK
• On iOS it is only available to system apps
– Use old-school GPGPU (fragment shader -> framebuffer)
– RenderScript
• Google's solution for processing on the GPU…
• Too niche! ~ Android
– Compute shaders
• GLES 3.1!!! General solution!!
– DirectCompute on D3D
60. Marco Agus & Marcos Balsa
Cross-development
http://www.appian.com/blog/enterprise-mobility-2/are-mobile-platform-choices-limiting-enterprise-process-innovation
65. Marco Agus & Marcos Balsa
Cross-development
• Codebase (from the Internet) -> toolchain (clang, gcc…) -> iOS, Android
– CMake: OpenCV
– qmake: Qt 5.4
– Autoconf: modify manually
* Set up the environment (needed for the manual case)
CC="clang -arch armv7 --sysroot=$SYSROOT …"
CXX="clang++ …"
LD="ld …"
AR="ar …"
* Point to NDK_DIR/toolchain/$ARCH/bin/, where $ARCH = {armv7, x86, …}; search for gcc/g++/clang inside the NDK directories
* Once the environment is set up, most tools ~"typically" work (DEFINES, architecture types, platform-supported functions, …)
66. Marco Agus & Marcos Balsa
Cross-development
• 3D framework / engine
– HW/platform abstraction -> abstraction^2
• L1) Use a 3D API -> portability issues! (HW, OS)
– Try using GL ES 2/3/3.1 < GL 2.1/3.3/4.4 (desktop GLES!!)
• L2) Abstract the 3D API (HW, OS)
– Minimum common function set {D3D, GLES, Metal, Vulkan…}
– (~) buffer, image, shader, pipeline (config pkg)
– Take a look at Metal!
[Diagram: Application -> wrapper (shader program, pipeline, buffer, image buffer) -> Metal / GL ES / GL desktop / Vulkan / D3D -> WinPhone / Android / iOS / Unix/Linux / Windows / MacOS]
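The second abstraction level (L2) can be pictured as a small interface exposing only the common objects named above (buffer, image, shader, pipeline), with one implementation per 3D API. The sketch below is a hypothetical illustration: all class and method names are invented, and the backend merely records calls instead of talking to a real graphics API.

```python
from abc import ABC, abstractmethod


class GraphicsBackend(ABC):
    """Minimum common function set; one subclass per 3D API (hypothetical design)."""

    @abstractmethod
    def create_buffer(self, data: bytes) -> int: ...

    @abstractmethod
    def create_image(self, width: int, height: int) -> int: ...

    @abstractmethod
    def create_shader(self, source: str) -> int: ...

    @abstractmethod
    def create_pipeline(self, shader: int, buffers: list, images: list) -> int: ...


class LoggingBackend(GraphicsBackend):
    """Stand-in backend: records calls and hands out integer handles."""

    def __init__(self, name: str):
        self.name = name
        self.calls = []

    def _new_handle(self, kind: str) -> int:
        self.calls.append(kind)
        return len(self.calls)

    def create_buffer(self, data): return self._new_handle("buffer")
    def create_image(self, width, height): return self._new_handle("image")
    def create_shader(self, source): return self._new_handle("shader")
    def create_pipeline(self, shader, buffers, images): return self._new_handle("pipeline")


def build_scene(backend: GraphicsBackend) -> int:
    """Application code: depends only on the abstract interface."""
    vertices = backend.create_buffer(b"\x00" * 36)
    texture = backend.create_image(256, 256)
    program = backend.create_shader("// shader source in the backend's language")
    return backend.create_pipeline(program, [vertices], [texture])


if __name__ == "__main__":
    backend = LoggingBackend("GL ES")
    print(backend.name, "pipeline handle:", build_scene(backend), backend.calls)
```

Keeping the interface this small is what makes it portable; anything API-specific (shader language, synchronization) stays inside the backend.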
67. Marco Agus & Marcos Balsa
Cross-development
• WebGL
– Based on OpenGL ES 2 (WebGL 2 <- ES 3)
• Exceptions in WebGL 2 (from ES 3): mapped buffers, drawRangeElements, program binaries
• JS performance
– TypedArrays [Khronos13]
• http://www.khronos.org/registry/typedarray/specs/latest/
– asm.js (Mozilla)
• JS used in an optimized way (e.g. var v1 = v2 | 0, ensuring the type is int)
• TypedArray: memory for large arrays is pre-reserved
• Porting C++ code
– Emscripten: C++ -> LLVM -> JS (TypedArrays + asm.js)
68. Marco Agus & Marcos Balsa
Mobile Graphics – Development
• Conclusions
– 1) Native + platform UI …
• C++ [any language] -> LLVM compiler -> target platform
• Platform framework -> front-end: 1 for each platform
• Performance + flexibility
• Call native code from platform code (JNI, Objective-C, …)
– 2) Native through framework …
• Qt | Marmalade …
• C++ code uses the framework API
– Framework API abstracts the platform API [N platforms]
– BUT less flexible integration?
– 3) Go web: HTML5/JS …
• Rewrite, or use Emscripten -> JS code + WebGL
• ~Free portability (Chrome / Firefox / IE …?)
• BUT performance is 0.5X at most with asm.js
Editor's Notes
Mobile development is nowadays very heterogeneous. There are many architectures, operating systems, programming languages, 3D APIs, and development environments.
The objective of this presentation is to unwrap the complexity and provide a brief overview of the available options and solutions, in order to develop with minimal pain.
This will be the roadmap, which I will skip since it is mainly for quick reference.
Mostly Linux based (except Windows Phone)
Some use an HTML5 framework (as OS API): web based
Some use the Qt framework: Ubuntu Phone
iOS: Unix based, Cocoa framework
Android: Java Virtual Machine
WinPho: +Visual Studio / -Windows porting / -market restrictive / -D3D vs GL
iOS: +best HW (homogeneity) / +Xcode / -market restrictive / +GL & PowerVR GPUs
Android: +open / ?many IDEs / +market very accessible / +GL/CL support / ?many GPUs
CISC – fast program execution, complex HW
RISC – fast instructions, simple HW, complexity on the compiler
FISC – starts from RISC, takes from CISC, improve performance at the cost of HW complexity
Intel/ARM
http://www.infoworld.com/d/computer-hardware/intel-vs-arm-two-titans-tangled-fate-237265?page=0,1
AMD
http://www.anandtech.com/show/7514/amd-2014-mobile-apu-update-beema-and-mullins
http://www.enterprisetech.com/2014/05/05/amd-unify-x86-arm-systems-skybridge/
RISC took vector instructions (SIMD) and HW improvements (superscalar, OoO)
CISC reduced instruction complexity looking for reduced transistor/improved power performance
Intel is applying lessons learned from the Intel Atom to its desktop-class processors
Intel is trying to get into the mobile market; so far only tablets are starting to integrate Atom CPUs
ARM vendors are growing in number and producing many compelling solutions (MediaTek in China for low cost, Qualcomm for high end, Samsung going for premium in-house solutions…)
Nvidia has taken up the challenge and is providing very competitive solutions (desktop-class GPUs in tablets and smartphones @2014)
Apple processors are manufactured by Samsung @<2014 (A4: Intrinsity + Samsung; >A5: Apple (Intrinsity?))
Global computing market share is almost 40-40% for ARM and x86, while mobile market share shows about 70% integrating ARM technology.
----------------------------------------------------------
These graphs illustrate the current market share for computing devices (mobile and desktop).
One can see that x86 (mostly Intel + AMD) still holds almost 50% of the market, with a decreasing trend, while ARM is increasing its share and getting very close to x86.
Looking at the mobile market, almost 70% of it integrates ARM technology through a variety of vendors and manufacturers.
http://www.androidauthority.com/intel-vs-arm-future-mobile-technology-338340/
Typical OpenGL 3D pipeline
2 types:
-fixed vertex / fragment processors (tegra 2/3/4, mali 400, PowerVR SGX 5XX)
-unified shaders (tegra K1/X1, mali 7XX, Snapdragon 400, PowerVR G6XXX)
Barrel processor: changes thread every N instructions, shared context or many registers to hold context and avoid context-switching
Mali
http://www.techradar.com/news/phone-and-communications/mobile-phones/arm-mali-mobile-graphics-everything-you-need-to-know-1095888
Adreno
http://www.anandtech.com/show/6112/qualcomms-quadcore-snapdragon-s4-apq8064adreno-320-performance-preview
http://kyokojap.myweb.hinet.net/gpu_gflops
PowerVR
http://www.idownloadblog.com/2014/02/24/imagination-details-powervr-gx6650/
http://withimagination.imgtec.com/powervr/graphics-cores-trying-compare-apples-apples
http://www.anandtech.com/show/7335/the-iphone-5s-review/7
http://www.anandtech.com/show/4413/ti-announces-omap-4470-and-specs-powervr-sgx544-18-ghz-dual-core-cortexa9
PS3/xbox360
http://www.3djuegos.com/foros/tema/2108290/0/xbox-360-vs-ps3-hardware/
Mobile GPU benchmarks
http://forum.beyond3d.com/showthread.php?t=64068
http://www.anandtech.com/show/735/3
Rather than process one polygon at a time without knowledge of other polygons in the scene, a tile based renderer first groups polygons together in groups called display lists. These display lists allow a scene to be broken into smaller blocks, known as tiles, which are then rendered independently.
The first advantage to rendering smaller portions of a scene at once is that it allows operations to be performed on-chip without having to access external memory. This allows all z-calculations to be performed without having to access an external z-buffer via the memory bus. Naturally, this eliminates the expensive z-buffer reads and writes that occur constantly on immediate mode renderers.
Rendering small tiles instead of a complete scene also means that pixels that are not visible can be thrown out before the rendering process begins. Since each tile consists of a display list that includes each polygon in that tile, hidden surface removal can occur before any textures are applied. Once again this significantly reduces the amount of information that must travel over the memory bus, as textures for non-visible surfaces do not need to be processed. Also located on chip is a tile buffer which acts as a frame buffer for an individual tile. This allows blending to be performed without costly memory reads and writes.
Pre-compiled pipeline – upfront, less validation on binding
Command queues for graphics, compute and memory access can be run in parallel from different threads and synchronization barriers can be used for synchronization
Bindless – associate pipeline layout with resource descriptor
Access mobile platform API through framework (monetization -> QtMonetisation)
Look for CMake profiles for Android/iOS on the web (e.g. OpenCV)